Canonical thermalization

For quantum systems that are weakly coupled to a much 'bigger' environment, thermalization of possibly far from equilibrium initial ensembles is demonstrated: for sufficiently large times, the ensemble is for all practical purposes indistinguishable from a canonical density operator under conditions that are satisfied under many, if not all, experimentally realistic conditions.


I. INTRODUCTION
It is commonly accepted that Quantum Mechanics describes the entire "physical world". In particular, equilibrium Statistical Mechanics, and possibly also its limitations, should follow from Quantum Mechanics, though a really satisfying "derivation" still does not seem available. The main objective of the present paper is such a derivation of the "summit" [1] of equilibrium Statistical Mechanics, namely the canonical ensemble, from Quantum Mechanics in combination with certain, very weak assumptions regarding the preparation, the observables, and the Hamiltonian of the system.
A key issue in Statistical Physics is the lack of knowledge about many "details" of any given "real" system. In particular, the initial condition of the typically 10 23 particles is unknown apart from a few "gross" features, for instance the (approximate) total energy and a few additional macroscopic properties in case of an nonequilibrium initial condition. The standard way to deal with this incomplete knowledge is to consider a statistical ensemble (many repetitions of the "same" experiment, formally described by a density operator) instead of one particular "realization" of the experiment (formally described by either a pure or a mixed quantum mechanical state). Much less appreciated is the fact that the concrete statistical ensemble for any given "real" system is also largely unknown and would be at least as hard to actually determine as the state of any single realization. It is therefore unavoidable to introduce certain postulates regarding this largely unknown statistical ensemble. Their only justification is their plausibility, an admittedly subjective concept. For instance, the present author finds it not plausible to assume for an isolated macroscopic system at equilibrium a microcanonical ensemble, be it as a postulate per se or as a consequence of some other hypothesis or principle (e.g. Jaynes principle): Its key claim is that if one would be able to actually determine the ensemble averaged occupation probabilities of all energy levels, one would always find that all levels within some small energy-interval are occupied with exactly equal probabilities and all other levels are not occupied at all. Since level populations are constants of motion, the same properties must apply to the possibly far from equilibrium initial ensemble. This seems indeed very unlikely to be true for any given "real" sys-tem, independently of any further "details" of the setup and the preparation. For instructive numerical examples see e.g. Ref. [2].
Put differently, we find it plausible that there exists a well-defined statistical ensemble describing the initial state of any given "real" system. But its details are unknown, probably very complicated, and already quite different even when the "same" experiment is repeated in two different labs. Hence the chance that the unknown "true" ensemble happens to agree with any particular ensemble we are postulating is virtually zero. Our present solution of this problem is not to assume any specific form of the "true" initial ensemble, but only some very general "gross" features. Namely, we will assume that the ensemble averaged occupation probabilities of the energy levels can be locally (on the energy axis) averaged in a well-defined manner. We will provide good reasons to expect that real systems satisfy the assumption by closer inspection of the preparation procedure at the origin of the initial condition.
A second important deficit of knowledge regards the appropriate Hamiltonian (and Hilbert space) of the system, generating the time evolution of the initial ensemble. To begin with, we will take it for granted that the system can be treated as strictly closed (isolated, autonomous), though in real systems small remnant interactions with the rest of the world are unavoidable. The main reason is that standard Quantum Mechanics is only able to describe the evolution in time of closed systems. Open systems, interacting and entangled with an environment, can be handled only "indirectly", by first including the entire relevant environment of the open system into a closed supersystem, then evolving the latter by standard Quantum Mechanics and finally eliminating the environment again. Likewise, we assume that after including all relevant perturbations "from outside" into the considered system, it must be possible to theoretically model it as strictly isolated from the rest of the world. If this were not possible, the problem could only be treated by means of a generalization of Quantum Mechanics. Many (often hidden) hypotheses of how small external perturbations may modify standard Quantum Mechanics have been proposed. We do not share the viewpoint that such extensions are a pivotal point in the foundation of Statistical Mechanics. Otherwise, an unavoidable consequence would be that Statistical Mechanics does not "work" for strictly closed systems. Also numerical evidence seems to support our present viewpoint.
Taking for granted a closed system, the "right" Hamiltonian (and Hilbert space) is still far from obvious, and even the existence of one particular "true" Hamiltonian for any given "real" system is questionable. Our present way to deal with this problem is to focus on "generic" Hamiltonians, while leaving any further details unspecified.
The Quantum Mechanical time evolution generated by the Hamiltonian will not be touched in any way, neither by heuristically modify it to account for small remnant external perturbations (see above), nor by introducing any kind of approximation. In other words, the wellknown time-inversion invariance of Quantum Mechanics is fully and rigorously maintained.
Within the above general framework, the main subject of our present work is the long-time evolution of a largely arbitrary and possibly far from equilibrium initial ensemble. Specifically, we address the two key claims of Statistical Mechanics in this context: (i) Equilibration: The ensemble approaches a stationary long-time behavior. (ii) Thermalization: Provided the ensemble exhibits a sharply peaked energy distribution, the longtime steady state is captured by (i.e. is experimentally indistinguishable from) the microcanonical ensemble corresponding to the given energy peak. In particular, if the total closed system consist of a subsystem of actual interest which is weakly coupled to a much "bigger" environment (canonical setup), then the steady state of the subsystem alone (after eliminating/tracing out the environment) is captured by the canonical ensemble.
In this work we address the questions in how far and in which sense equilibration and thermalization can be derived within our above specified general framework with particular emphasis on the canonical setup. The main new results of our present paper are established in Sects. IX and XI, demonstrating that in the longtime limit, the reduced state of the "small" system is for all practical purposes indistinguishable from a canonical ensemble. More precisely, the true ensemble itself remains time-dependent forever and thus quite different from the canonical density operator, but the experimentally observable differences between the two ensembles are unresolvably small for the overwhelming majority of times. The other sections of the present paper provide the pre-requisites needed in those central Sects. IX and XI. They mostly collect and unify, but partially also extend previously known material. The final section XII contains the summary and outlook with particular emphasis on closely related recent works and on the subject of (almost) integrable many-body quantum systems, be-ing at the focus of the present Special Issue.

II. GENERAL FRAMEWORK
According to Sect. 1, we consider an isolated system, incorporating all relevant parts of the environment (thermal baths, reservoirs etc.), and modeled according to standard Quantum Mechanics by an autonomous Hamiltonian H on a (separable) Hilbert space H. Specifically, we focus on spatially finite (compact) systems with a large (macroscopic) but finite particle number, corresponding of f degrees of freedom with 1 ≪ f < ∞ . (1) As a consequence, all eigenvectors of H represent bound states and hence the spectrum of H is discrete (quantized). As usual, |n (n = 0, 1, ...) denote the (typically infinitely many) eigenvectors of H and the corresponding eigenvalues E n are assumed to be ordered, with a finite ground state energy E 0 > −∞ [17]. In other words, the Hamiltonian can be written as where n indicates a summation over all n = 0, 1, .... For any function g : R → R we adopt the common definition In the special case of a power series, g(x) = k g k x k , one readily sees that the (4) reproduces k g k H k , as it must be. But (4) also covers more general functions g(x). Degenerate energies correspond to equality signs in (2). Yet, sums like in (3) and (4) are meant to run over all n-values.
For any energy eigenvalue E n , the projector onto the associated eigenspace of H is given by where Em=En indicates a summation over all m-values satisfying E m = E n . It follows that P Em = P En for degenerate energies E m = E n . Consequently, the identity operator can be rewritten as where En indicates a summation over all mutually different E n -values, i.e. degenerate energies only appear once in the sum. Likewise, the Hamiltonian from (3) can be rewritten as A. Level Counting, Entropy, Temperature In the following, we collect some "well-know" results of "elementary level counting". Though some details may be strictly speaking more subtle than we will say below, and possibly even not yet proven rigorously in sufficient generality, our present point of view is that these unsolved issues of Statistical Physics are substantially less critical than those at the actual focus of our present paper (equilibration and thermalization, see Sect. I). Differently speaking, we will henceforth restrict ourselves to Hamiltonians H which satisfy the properties given below and adopt the common opinion that they cover most cases of practical relevance.
We emphasize that the entire present section exclusively deals with properties of the energy eigenvalues of the Hamiltonian (3). In other words, we do not speak about system states at all, and hence the considerations of the present section do not depend on whether the system is in or out of equilibrium, since these are specifications of the system state, not of the system per se.
To begin with, the number of energy levels below any given upper limit E is defined as where Θ(x) := x −∞ dy δ(y) is the Heaviside stepfunction. The entropy then follows as ] with a small but finite ∆E is well-known to be equivalent to (10) but would be less convenient later on.) Focusing on the most common case, the entropy is an extensive quantity.
For a system with f degrees of freedom, S(E)/k B is thus very roughly speaking comparable in order of magnitude to f , It follows from (9) and (11) that in macroscopic systems with f = O(10 23 ) degrees of freedom the energy levels are unimaginably dense on any decent energy scale: for instance, within an energy interval of 10 (−10 20 ) Joule there will still be of the order of 10 (10 23 ) levels. On these exceedingly tiny scales, the step function Θ(x) appearing in (9) is considered to be actually "washed out", so that Ω(E) becomes a reasonably smooth function of E with a well defined derivative where also the delta-function δ(x) = Θ ′ (x) is considered as "washed out" over many energy levels. In other words, ω(E) represents the density of states. The correspondingly washed out entropy (10) gives rise to the usual definition of temperature, Combining (9), (10), and (12), (13) yields the useful relation We reiterate that in our present work, entropy and temperature are by definition given by (10) and (13), and as such are for the moment completely independent of the question of whether the considered system is at equilibrium or not. In particular, we did not establish any relation so far between the energy E and the state of the system.
The following statements can be readily verified for simple examples like the ideal gas. More detailed estimates, which we omit here, indicate that they in fact remain true quite generally: The actual dependence of the right hand side of Eq. (11) on E is such that the entropy approaches zero for E ↓ E 0 , but for the rest (i.e. for macroscopic values of E − E 0 ), the dependence on E is comparably weak, essentially of logarithmic form. Likewise, (13) approaches zero for E ↓ E 0 . Combining all these properties of S(E) with (11) we can conclude that and hence Note that if we did not avoid speaking about equilibrium states, then (15) could also be justified via the equipartition of energy for an extensive systems with energy E at equilibrium. Finally, we can infer from (15) and the derivative of this relation in combination with (13) Similarly as in the first paragraph of this section, we may take the alternative point of view that we only consider model Hamiltonians H which capture the following common property of real systems reasonably well: If the macroscopic system energy E is changed by an amount dE then the microscopic kinetic energy per degree of freedom k B T (E)/2 changes by an amount which is very roughly of the order of magnitude of dE/f , i.e.
Since T (E 0 ) = 0, integration of (18) implies (15) and (17). Similarly, the third law S(E 0 ) = 0 and (13) yield upon another integration the relation (11) and the logarithmic dependence of S(E) upon E − E 0 mentioned above (15). It is well-known that all these relations may become problematic for extremely low temperatures. Such cases are tacitly excluded from now on.

B. System States and Dynamics
According to standard Quantum Mechanics, the state of the system is at any time instant t given by a density operator ρ(t). While we are mainly interested in statistical ensembles (mixed states) in this paper, it is nevertheless worth to point out that formally our considerations will also cover pure states as special case. The time evolution can be written as with unitary propagator (20) where we have exploited (4) in the last relation. Denoting the matrix elements of ρ(t) by ρ mn (t) := m|ρ(t)|n , Eqs. (19) and (20) yield for an arbitrary initial condition ρ(0) at time t 0 = 0 the result where mn indicates a summation over all m, n = 0, 1, 2, ....

C. Level populations
Level populations are denoted by p n and refer to the ensemble averaged occupation probabilities of the energy eigenstates |n . In other words, p n is the expectation value of the observable |n n|. According to (21) and (22) it can be rewritten as p n := Tr{|n n| ρ(t)} = ρ nn (t) = ρ nn (0) independently of t.
Since every density operator ρ(t) is, for any value of t, non-negative and Hermitian, it follows that (ψ, φ) := ψ|ρ(t) + ǫ|φ satisfies the scalar product axioms for any ǫ > 0. Hence, Cauchy-Schwarz's inequality applies, i.e. |(ψ, φ)| 2 ≤ (φ, φ) (ψ, ψ). In the limit ǫ → 0 we thus obtain Similarly as in (23), the ensemble averaged occupation probability p En of an energy eigenvalue E n is given by the expectation value of the projector (5) onto the corresponding eigenspace and can be rewritten as independently of t. For convenience, also the p En will sometimes be called level populations.
The obvious normalization conditions are Since P En ρ(0)P En commutes with H from (8), we can without loss of generality, choose a basis in which both operators are simultaneously diagonal. In the following, we always work with this specific energy basis due to its convenient property that all the non-diagonal elements of P En ρ(0)P En vanish, Given any ensemble ρ(t), let H + ⊂ H be the sub-Hilbert space spanned by those basis vectors |n for which p En = 0, Exploiting (24) it follows that ρ nm (t) = 0 whenever ρ nn (t) = 0 or ρ mm (t) = 0. Denoting by P + the projector onto H + we thus can conclude that ρ(t) := P + ρ(t) P + .

As usual, observables are represented by Hermitian operators
A mn := m|A|n with expectation value A (t) := Tr{ρ(t)A} (32) and, without loss of generality, are assumed not to depend explicitly on time.
According to (29) it follows that where A + is the projection/restriction of A onto H + from (28), In other words, only the sub-Hilbert space H + and the projected/restricted observables (34) actually matter. Hence we can from now on replace H and A by H + and A + whenever it will be convenient.

III. THE PROBLEM OF EQUILIBRATION
Generically, the statistical ensemble ρ(t) is not stationary right from the beginning, in particular for an initial condition ρ(0) out of equilibrium. But if the right hand side of (22) depends on t initially, it cannot approach for large t any time-independent "equilibrium ensemble" whatsoever. In fact, any mixed state ρ(t) returns arbitrarily "near" to its initial state ρ(0) for certain, sufficiently large time-points t, and similarly for the expectation values (32), as demonstrated for instance in Appendix D of Ref. [18]. We emphasise, that these arbitrarily close recurrences do not refer to pure states only (as in the classical Poincaré recurrences) but rather to arbitrary statistical ensembles ρ(t).
More specifically, consider any ρ(t) which is not completely independent of t. Then, according to (22) there must exists at least one ρ mn (0) = 0 with ω := [E n − E m ]/ = 0. In fact, one expects that one usually finds pairs with ω-values ranging from extremely small to extremely large values on the scale of 1 Hz, thus including any experimentally "reasonable" frequency. Focusing on the specific observable A =Â +Â † ,Â := |m n|/ρ mn (0) (35) it readily follows from (22) that Tr{ρ(t)A} = 2 cos(ωt) .
In other words, the ensemble ρ(t) exhibits permanent oscillations rather than equilibration, at least as far as the observable A is concerned. The main implication of the two previous paragraphs is that equilibration, as specified in Sect. I, cannot be true and hence cannot be proven in full generality and rigor. Put differently, Quantum Mechanics and equilibration are strictly speaking incompatible. Equilibration can at most approximately hold true for a restricted class of observables A and initial conditions ρ(0). The main objective of our present work is to show that, and in which sense this is indeed the case under rather weak restrictions regarding observables and initial conditions. Those are the subject of the next two Sections.

IV. REALISTIC OBSERVABLES
The basic idea is that it is not necessary to theoretically admit any arbitrary Hermitian operator A as a pos-sible observable [19][20][21][22]. Rather it is sufficient to focus on experimentally realistic observables in the following sense [23]: Any observable A must represent an experimental device with a finite range of possible outcomes of a measurement: (37) where the maximization and minimization is over all normalized vectors |ψ ∈ H. Accordingly, a max and a min are the largest and smallest eigenvalues of A.
Moreover we require that this working range ∆ A of the device A is limited to experimentally reasonable values compared to its resolution limit δA. All measurements known to the present author yield less than 20 relevant digits, i.e. ∆ A /δA ≤ 10 20 . Maybe some day 100 or 1000 relevant digits will become feasible, but it seems reasonable that a theory which does not go very much beyond that will do. We also remark that range and resolution are specific to the given measurement device, but are (practically) independent of the properties (e.g. the size) of the observed system.
The above specified class of admissible observables clearly includes any realistic measurement apparatus. Yet it turns out that the class of observables which will be admissible in our main results below can still be substantially extended in two steps.
First, as said at the end of Sect. II D, we can replace the full Hilbert space H by the sub-Hilbert space H + defined in (28). Accordingly, the full range ∆ A from (37) can be replaced by the reduced range According to (28), H + is at most as large as H. However, in many cases the level populations (25) may be safely negligible e.g. beyond some finite upper energy threshold, yielding a finite-dimensional H + , while H is typically infinite dimensional. Hence, the reduced range ∆ ′ A from (38) will be finite even for operators A with an unbound spectrum on H, i.e. for which the full range ∆ A from (37) is infinite.
Second, we consider observables of the form with arbitrary real coefficients b := (b 0 , b 1 , ...). In particular, we can conclude from (4) that arbitrary functions g : R → R of H are of this form, As it will turn out, it is sufficient to consider instead of A in (38) any observable of the form A − B(b) with arbitrary coefficients b. As a consequence, ∆ ′ A from (38) can be replaced by where min b indicates a minimization over all real coefficients b := (b 0 , b 1 , ...). In particular, it follows that for some set of coefficients b or some function g(x). From (37), (38), and (41) we see that Rather than requiring that the full range-to-resolution ratio does not exceed 10 20 , as discussed below (37), it will be sufficient in our central result below to similarly limit the reduced ratio:

A. Population density
Our key requirement with respect to the initial condition ρ(0) is that the concomitant ensemble averaged level populations (23) can be written in the form with a smooth function h(E) and "unbiased fluctuations" δp n . Physically, h(E) thus represents a locally averaged level occupation probability, henceforth abbreviated as population density.
To be more precise with respect to (45), we recall that for a system with f degrees of freedom, there are roughly 10 O(f ) energy eigenvalues E n per Joule, see Sect. II A. Assumption (45) means that within any energy interval around some reference energy E > E 0 , which contains very many levels E n , but which is still exceedingly small on any experimentally resolvable scale, the ensemble averaged level populations p n can be split into an approximately constant "local" average value h(E) and "unbiased fluctuations" δp n , i.e. the average over all δp n belonging to this interval around E is negligibly small compared to h(E) itself. The key point is that h(E) must be independent of the exact choice of the considered energy interval around E.
Comparable assumptions of well-defined "local averages" are tacitly taken for granted in many different physical contexts. Likewise, we find it quite plausible that the ensemble averaged level populations p n , though largely unknown (see Sect. I), still satisfy our present assumption under experimentally realistic conditions. Further arguments are provided in Sect. V B below.
Finally, we emphasise once more that all these considerations concern ensemble averaged level populations, i.e. mean values over many repetitions of an experiment, which the experimentalist would denote as "identical" but which in fact are very different on the microscopic level, see Sect. I.

B. System preparation
The key idea is that the initial condition is the result of a preparation process, during which the system was not yet isolated, admitting conclusions about the initial condition itself.
The simplest case consists in a time-dependent parametric change of the Hamiltonian during the preparation phase. More complex, but ultimately applying to every real experiment, is some type of contact with the "rest of the world" prior to the actual isolation of the system.
The first consequence is an entanglement with the rest of the world during the preparation phase (t < 0), implying that the reduced initial state (at t = 0) of the system (after tracing out the rest of the world) will be a mixed state even for a single realization of the experiment. Already this reduction step brings along a certain "randomization" of the system level populations. More importantly, there will unavoidably arise some kind of time dependencies of the system Hamiltonian during the preparatory period t < 0, the last of them being caused by the actual shutting down of all connections with the rest of the world. Such a time dependence of the system Hamiltonian is known to generically entail an approximately diffusive "spreading" of occupation probabilities over neighboring energy levels [25]. Since the levels are so exceedingly dense, the diffusion will -already during a very short time span and even for a very weak timedependence of the Hamiltonian -effectively lead to a diffusive randomization of the p n 's in accordance with (45).
In theoretical studies it is quite common to generate the out of equilibrium initial condition by means of a "sudden" (discontinuous) parametric change of the Hamiltonian [2,15,16], called "quantum quench". Such a procedure thus misses the above mentioned diffusive "spreading" of the occupation probabilities.

C. Energy density
The energy probability density, or energy density for short, is defined as Accordingly, ρ(E) dE quantifies the ensemble averaged probability to find a value between E and E + dE when measuring the energy of the system. With (4), (22), (23), and (32) it follows that independently of t. In the same spirit as around (12) and in the discussion of h(E) below (45), the delta-functions in (46) and (47) are understood to be "washed out" over many energy levels in order to give rise to a well-defined, smooth energy density. As a consequence one finds that [24] ρ While a detailed derivation of this relation is provided in Appendix C , it is intuitively quite obvious: The probability ρ(E) dE to encounter an energy between E and E + dE is equal to the locally averaged population h(E) of the energy levels multiplied by the local level density ω(E) times the interval length dE.

D. Maximal level population
To get a feeling for the exotic orders of magnitudes arising in the context of (45), let us assume that there are exactly 10 (10 23 ) equally spaced energy levels E n per Joule and that our energy interval around E has a length between 10 −(10 22 ) J and a few J. Then, our interval contains at least roughly 10 O(10 23 ) energy levels. Assuming that h(E) is approximately constant within the interval, and zero outside, the normalization (26) implies that h(E) = 10 −O(10 23 ) within the interval. Recalling that this is the local average value of p n , it seems quite reasonable to assume that all the individual p n values do not exceed the range between zero and 10 (10 22 ) times the average value h (E). Otherwise, the average over the δp n 's would not be negligible compared to h(E) = 10 −O(10 23 ) for every possible choice of the interval.
Returning to the general case, we can conclude that even if h(E) varies very fast on any experimentally realistic scales and even if the energy levels are populated extremely unequally, we still expect that max n p n will be extremely small, typically On this rather heuristic level, (45) thus implies (49). Intuitively it even seems plausible that the two conditions are more or less equivalent. Next we remark that the mere existence of the level density (12) implicitly takes for granted that the multiplicities of degenerate energies are not exceedingly large, i.e. very much smaller than 10 O(f ) . There can be little doubt that this assumption will be fulfilled under all experimentally realistic conditions. Under the very same assumption we can infer from (25) and (49) the very rough estimate Note that p En is the occupation probability of E n from (25), and as such does not refer to any specific energy basis. In contrast, (49) is implicitly understood with respect to the specific basis introduced above (27). This is the main advantage of (50) compared to (49).

E. Physical arguments
Besides those already discussed in Sect. V B, there are the following additional physical reasons to expect that (49) and (50) are fulfilled under experimentally realistic circumstances.
First, the time-energy uncertainty relation seems to prohibit for all practical purposes the determination of the system energy with a precision that would be necessary to populate only a relatively small number of levels with appreciable probability so that (49) and (50) would be violated Second, while an ideal energy measurement would in principle allow us to prepare the system at one specific energy eigenvalue, every real (finite resolution) measurement will result in appreciable probabilities of very many levels.
F. Example (35) It is instructive to reconsider our example from (35), (36) and see what happens to the concomitant incompatibility with equilibration in case we restrict ourselves to realistic observables and initial conditions, satisfying (44) and (49), respectively. To begin with, one readily sees that the spectrum of A from (35) (within H + ) consists of the two eigenvalues a ± = ±|ρ nm (0)| −1 , and, in case dimH + > 2, of one further eigenvalue a 0 = 0. With (24), (37) and (38) we can conclude that A somewhat more tedious calculation shows that also ∆ ′′ A from (41) coincides with ∆ A in our present example. For experimentally realistic initial conditions we can infer with (49) that ∆ A ≥ O(10 f ). For macroscopic systems (f ≫ 1) it follows that the oscillations from (36) are beyond any realistic experimental resolution limit δA according to (44). The same thing may alternatively also be viewed as follows: Any single (ideal) measurement process always results in one of the three outcomes a + , a − , or a 0 . Hence an infeasible number of repetitions is needed to resolve the order-one variations of the ensemble average (36).
In short, while mathematically speaking the observable (35) indeed leads to perpetual oscillations (36), such oscillations cannot be resolved in practice for experimentally realistic observables and initial conditions.
The above example also suggests that our assumptions of experimentally realistic observables and initial conditions (or some similar restrictions) are almost unavoidable for taming the oscillations in (22) and thus overcoming the concomitant incompatibility with the basic Statistical Mechanical claim of equilibration, see Sect. III.

VI. GENERIC HAMILTONIANS
As detailed in Sect. I, the "true" Hamiltonian H of a given system is usually not known in detail. Therefore, we assume that these details are of "generic" character in so far as the level counting properties from Sect. II A are satisfied and energy differences E j − E k and E n − E m are never exactly equal apart from trivial cases. More precisely, we require that A condition similar to (52) is well known under the names "non-resonance condition" or "non-degenerate energy gap condition" and is considered to be satisfied by generic Hamiltonians, see e.g. [6,9,10,13], and, in particular, Sect. 3.2.1 in [26] and references therein. The essential intuitive argument is as follows: Consider an arbitrary "path" H(λ) in the "space of all Hamiltonians", parameterized by λ. In the absence of any special reasons like symmetries, it is quite plausible that every gap E n − E m evolves as a function of λ somewhat differently than all the other gaps. While we cannot exclude that two gaps may happen to coincide for specific λ-values, these special points are of measure zero. In other words, Hamiltonians with degenerate energy gaps are of measure zero compared to "all" Hamiltonians.
We remark that our present condition is weaker than the usual non-resonance condition [6,9,10,13,26] in so far as (52) still admits the possibility of degenerate energy eigenvalues.

VII. EQUILIBRATION FOR ISOLATED SYSTEMS
Being confident that the above discussed conditions (44), (50), and (52) are fulfilled under many, if not all, experimentally realistic conditions, we henceforth take them for granted and turn to the question, in how far they are sufficient to yield equilibration, i.e. a stationary long time behavior of the statistical ensemble (cf. Sect. I).

A. Equilibrium ensemble
Given an arbitrary but fixed ρ(0) evolving according to (22), we will see below that the pertinent equilibrium ensemble is given by the density operator (sometimes called the generalized Gibbs ensemble) where the time average of an arbitrary function or operator h(t) is defined as In other words, the equilibrium ensemble ρ eq is the time averaged "true" ensemble ρ(t). As such, it is timeindependent and moreover inherits all the defining properties of a genuine density operator from ρ(t). Namely, one readily sees that ρ eq is a non-negative, Hermitian operator of unit trace and satisfies the dynamics (22). Making use of the specific basis introduced above (27), one can conclude -as detailed in Appendix D -from (23) and (53) that i.e., ρ eq amounts to the (time-independent) diagonal part of ρ(t) from (22). Focusing on observables of the specific form (39) it follows with (22) and (55) that In particular, we can conclude with (40) that for arbitrary functions g : R → R.

B. Main result
It readily follows from (32), (53), and (54) that In other words, on the average over all times t ≥ 0, the "true" statistical ensemble ρ(t) is indistinguishable from the equilibrium ensemble ρ eq . The natural next step is to consider the mean square deviation The following relation is derived in Appendix D: where ∆ ′′ A is defined in (41). The last factor Tr{ρ 2 eq } in (60) is the so-called purity of ρ eq , i.e. the purity of the time-independent part of ρ(t), but not the purity of ρ(t) itself. In principle, ρ(t) may even be a pure state (see above (19)) with a purity of one, while the purity of the concomitant ρ eq may still be as small as 10 −O(f ) according to (50) and the relation (132) in Appendix B.
Observing that Tr{ρ 2 eq } = n ρ 2 nn (0) according to (55) and introducing relation (132) from Appendix B into (60) we finally obtain where p En is the occupation probability of E n , see (25). Considering Tr{ρ(t)A} as a random variable, generated by randomly sampling time points t according to a uniform distribution on [0, ∞), the corresponding mean value and variance are given by (58) and (59). The next step is to invoke Chebyshev's inequality [18,27], stating that for any random variable x with average µ and variance σ 2 and any given κ > 0, the probability Prob(|x − µ| > κ) that x deviates from µ by more than κ satisfies Prob(|x − µ| > κ) < (σ/κ) 2 . In our present case we thus can conclude that where δA is the resolution limit of A, see Sect. IV. With (61) we arrive at the first main result of our present paper: where p En is the ensemble averaged occupation probability of the (possibly degenerate) energy eigenvalue E n , see (25). We recall that the only ingredients in deriving this result were the (generalized) non-resonance condition (52) and the assumption that the measurement range ∆ ′′ A from (41) is finite, cf. Sect IV. For the rest, (63) is a completely general and rigorous relation, formally valid for any choice of δA > 0. It generalizes the previously known result from [12], which did not admit degenerate energy eigenvalues, nor the minimization over arbitrary B(b) in (41).

C. Discussion
For realistic initial conditions and generic Hamiltonians we can take for granted the rough estimate (50), yielding with (63) the result Focusing on large systems (1) we can conclude that for the overwhelming majority of times t ≥ 0 the difference between Tr{ρ(t)A} and Tr{ρ eq A} is way below the instrumental resolution limit δA for any experimentally realistic observable according to (44). In other words, the system looks exactly as if it were in the steady state ensemble ρ eq for the overwhelming majority of times t ≥ 0, though the "true" density operator ρ(t) is actually quite different, see Sect. III. This is the main result of our present paper regarding the question of equilibration, see Sect. I. Note that these conclusions do not really require a macroscopic number f of degrees of freedom. Put differently, our result also explains the common numerical observation that already quite small particle numbers often equilibrate and thermalize surprisingly well.
As promised in the introduction, the derivation of (61)-(64) is based on the exact Quantum Mechanical time evolution (19)- (22) without any modification or approximation. In other words, the full Quantum Mechanical time-inversion invariance is still contained in (64). In particular, (64) is compatible with the recurrence property of Tr{ρ(t)A} mentioned below (34), but implies that such excursions from the "apparent equilibrium state" ρ eq must be exceedingly rare events.
Exactly the same "apparent equilibration" towards ρ eq emerges if one propagates ρ(0) backward in time (keeping the system isolated also for t < 0). Along the entire real t-axis, an initial condition ρ(0) far from equilibrium thus closely resembles one of the above mentioned rare excursions, except that the location of this excursion is on purpose chosen as the time-origin.
In other words, Quantum Mechanical time inversion invariance is maintained, but when starting out of equilibrium, an "apparent time arrow" emerges with extremely high fidelity.
Note that any single excursion of Tr{ρ(t)A} from the "apparent equilibrium value" Tr{ρ eq A} is a priori not expected to exhibit any special symmetry with respect to time inversion. Only the probabilistic properties of an ensemble of such excursions are, in the absence of magnetic fields, expected to satisfy a microreversibility or detailed balance type of symmetry with respect to time inversion. Note that these considerations apply both to "small" and "large" excursions.
While (64) provides a bound for the relative amount of time the system exhibits notable deviations from equilibrium, the typical duration of one given excursion, or equivalently, the characteristic relaxation time of an out of equilibrium initial condition ρ(0) remains unspecified. Note that Statistical Mechanics itself also makes not statements in this respect. Hence, it is justified to omit them within a foundation of Statistical Mechanics. However, we remark that since our assumptions on initial condition and Hamiltonian were very weak and we kept the exact Quantum Mechanical time evolution (19)-(22), we expect that the actual relaxation time will be close to that of the real system we are modeling, provided this modeling is not too bad. Since one can easily imagine real systems with arbitrarily large or small relaxation times, any further quantification of the relaxation process inevitably would require a considerably more detailed specification of the Hamiltonian H, the initial state ρ(0), and the observable A. Thus, our main result (64) may well be already quite close to "the maximum one can say in full generality".
As mentioned above (19) and below (60), in principle ρ(t) may even be a pure state of the form |ψ(t) ψ(t)|. In this case, the occupation probabilities p En of the energy eigenvalues E n appearing in (63) can be rewritten according to (25) as As long as all these occupation probabilities are small, e.g. satisfying the rough estimate (50), the above relation (64) and the subsequent discussion still remain valid for pure states ρ(t) = |ψ(t) ψ(t)|.
If the system is prepared in a pure energy eigenstate, i.e. ρ(t) = ρ(0) = |n n| then ∆ ′′ A = 0 for arbitrary A according to (28) and (41). In other words, (64) still represents a tight upper estimate in this case, which in fact represents "the opposite extreme" compared to the property (49) or (50) of experimentally realistic initial conditions.
Likewise, for observables of the form (39) or of the form g(H) with arbitrary g we have ∆ ′′ A = 0 in (64) according to (42).
We close with the following conceptual remarks regarding the notion of "experimentally realistic observables and initial conditions". In Sects. IV and V we have specified certain properties which we are proposing to be necessary for observables or initial conditions to be considered as "experimentally realistic". But these properties are not meant to be sufficient. While establishing such exact (necessary and sufficient) conditions is not the subject of our present work, we provide some simple example to illustrate our point: For any A and any fixed τ , the observable B := U τ AU † τ satisfies Tr{ρ(t)B} = Tr{ρ(t − τ )A} according to (19). Whenever A was realistic according to the criterion from Sect. IV, the same applies to B (∆ B = ∆ A ). According to (32), B imitates (for any ρ(t)) the behavior of A with a time delay of τ . If ρ(0) is a far from equilibrium initial condition and τ exceeds the relaxation time for the time inverted dynamics, the observable B thus initially behaves as if the system were already equilibrated but then all of a sudden undergoes an excursion as if the system would transiently move very far from equilibrium. Turning to negative τ values, B would represent a device which can "look" -in principle arbitrarily far -into the future. There can be little doubt that such observables are not "realistic". Likewise, for any given ρ(t) satisfying the Quantum Mechanical time evolution (19), a hypothetical initial condition of the formρ τ (0) := ρ(−τ ) produces analogous "unrealistic" phenomena while being "experimentally realistic" according to Sect. V whenever ρ(0) was so (the level populations of ρ(0) and ofρ τ (0) are equal). To identify suitable criteria for sorting out such pathologies is a very subtle task, especially in view of the fact that so-called spin-echo experiments seem indeed to be able to realize such initial conditionsρ τ (0) to some extent [28].

VIII. THE PROBLEM OF THERMALIZATION
According to (55) and the discussion below (64), expectation values (32) become practically indistinguishable from after initial transients have died out. In this respect the problem of equilibration raised in Sect. I can be considered as settled and we henceforth can focus on (66).
Turning to the issue of thermalization, the key question is thus, in how far the equilibrium expectation value of A from (66) is in agreement with that predicted by the microcanonical ensemble, namely where the level populations p mic n are equal to a normalization constant if E n is contained within a small energy interval and zero otherwise [29].
In case (66) and (67) yield measurable differences for experimentally realistic ρ(0) and A, the "purely Quantum Mechanical" prediction (66) is commonly considered as "more fundamental" [2,15,16]. From this point of view, our derivation of equilibrium Statistical Mechanics is complete, provided the latter is valid itself.
What are these validity conditions, beyond which the microcanonical formalism of equilibrium Statistical Mechanics may break down?
A first well known validity condition for the microcanonical formalism is, as said below (67), that only E n within a small energy interval (68) have a non-vanishing occupation probability. More generally, as already mentioned in Sect. I, in equilibrium Statistical Mechanics it is taken for granted that the system energy is fixed up to unavoidable experimental uncertainties. On the other hand, realistic initial conditions according to Sect. V in particular require that this energy uncertainty is much larger than 10 −O(f ) Joule, which is obviously always fulfilled in practice, but we never introduced or exploited any type of upper limit for this uncertainty so far, i.e. the energy uncertainty may still be arbitrarily large in (64) and (66). In other words, for large energy uncertainties, our key relation (66) remains valid, while equilibrium Statistical Mechanics is likely to become invalid. This is clearly a not at all surprising case of disagreement between (66) and (67) .
To avoid such "almost trivial" cases, we henceforth take for granted that the system energy is known up to an uncertainty ∆E which is as small as possible, but still experimentally realistic, cf. Sect.V.
A second (often tacit) validity condition of the microcanonical formalism is that the expectation values (67) are required/assumed to be (practically) independent of the exact choice of of the interval I in (68), i.e. of its upper limit E and its width ∆E. But essentially this means nothing else than: In (66) the details of p n are largely irrelevant. (69) The same conclusion (69) follows from the equivalence of the microcanonical and canonical ensembles (for all energies E), considered as a self-consistency condition for equilibrium Statistical Mechanics [20,21].
Our first remark regarding property (69) itself, is that no experimentalist can control the populations p n of the unimaginably dense energy levels E n , apart from the very gross fact that they are "mainly concentrated within the interval I from (68)". If the details would matter, not only equilibrium Statistical Mechanics would break down, but also reproducing measurements, in particular in different labs, would be largely impossible, see also the discussion in Sects. I and V. Second, one can readily construct observables and initial conditions, being experimentally realistic according to our definitions in Sects. IV and V but still violating (69). The fact that equilibrium Statistical Mechanics is known to have an extremely wide experimental applicability implies that our so far notion of "experimentally realistic" is still too general (see also at the end of Sect. VII C).
The simplest way to guarantee property (69) seems to require/assume that the expectation values A nn = n|A|n hardly vary within any small energy interval of the form (68). This is similar in spirit to classical coarse graining, and, in fact, is part of a common conjecture about the semiclassical behavior of fully chaotic classical systems [30]. In particular, negligible variations of A nn for close by n-values imply Serdicki's "eigenstate thermalization hypothesis" [31] (anticipated in [7] and revisited in Ref. [2]), implying that each individual energy eigenstate |n behaves like the equilibrium ensemble.
An alternative way to guarantee property (69) follows from the argument by Peres [6] that even if the A nn may notably vary with n, the immense number of relevant summands in (66) may -for "typical" A and ρ(0) -lead to a kind of statistical averaging effect and thus a largely ρ(0)-independent overall value of the sum. Numerically, the validity and possible failure of such conjectures and of property (69) itself have been exemplified e.g. in [2,7,15,32]. While the details -in particular the role of "more basic" system properties like "ergodicity" and "(non-)integrability" -are still not very well understood [2,5,6,30,33,34], "equilibration" in agreement with (66) was seen numerically in all known cases.

IX. CANONICAL SETUP
The objective in the remainder of the paper is to establish thermalization without making use of the unproven property (69). While the general case seems extremely difficult to tackle, as discussed above, we focus on the most important special case, namely the so-called canonical setup: an isolated compound system, consisting of a (sub-)system which is weakly coupled to a "heat bath".
In analogy to Sect. II, the starting point is a system (subsystem, central system, system of actual interest, index "S") with f S degrees of freedom, Hamiltonian H S , and Hilbert space H S together with an environment (e.g. a heat bath, index "B") with f B degrees of freedom, Hamiltonian H B , and Hilbert space H B . As usual, the environment is assumed to be macroscopic and much "bigger" than the system, i.e.
The system S may or may not be macroscopic, i.e. f S may but needs not be a large number (in the range of 10 23 ). On the other hand f B is of the order of 10 23 or even larger (e.g. if f S is already of this order). The systemplus-environment compound (total system, supersystem, no index) thus has degrees of freedom and "lives" in the product space The contact (coupling) between system and environment is described by an interaction Hamiltonian H int : H → H and a "coupling strength" λ, resulting in a total Hamiltonian of the form where 1 HS indicates the identity on H S , and similarly for 1 HB . We will mainly be interested in observables which only concern system properties, i.e. which are of the form Within this general framework, we take for granted that all conditions for equilibration of the isolated system-plus-bath compound in the sense of Sect. VII are fulfilled, i.e. the observables (74) satisfy (44), the ensemble averaged energy level populations satisfy (50), and the Hamiltonian (73) satisfies the generalized nonresonance condition (52). As expected and demonstrated in detail later, these requirements in particular rule out λ = 0. A further requirement is the subject of the next subsection.

A. Weak coupling condition
Some kind of weak coupling assumption is an indispensable (though often tacit) prerequisite of the canonical formalism. The simplest possibility would be to require that λ in (73) is so small that the eigenvectors |n and eigenvalues E n of H(λ) deviate only very little from those of H(0). However, according to ordinary perturbation theory, these deviations will be governed by terms of the form λ m|H int |n /[E m − E n ]. Since some E m − E n are of the order of 10 −O(f ) Joule (see below (11)), the admissible λ-values would be so small that no realistic model would satisfy the condition.
For this reason, we henceforth focus on system-plusbath compounds (73) which satisfy the following "operational" weak coupling condition: After equilibration of the system-plus-bath compound, a reversible (adiabatically slow) decoupling of the system from the bath does not lead to any experimentally resolvable changes.
As far as expectation values of system observables A are concerned, we recall that the "true", time-dependent expectation values A (t) may, even after equilibration, still exhibit quite notable "excursions" at exceedingly rare time points t (cf. Sect. VII). In this case, the above weak coupling condition tacitly refers to time-averaged expectation values.
The quantities, for which the weak coupling condition will actually be taken for granted later on, are observables of the form (74) and the energy density (46) of the total system-plus-bath compound.
Physically, there can be little doubt that most real systems in contact with a heat bath satisfy the above weak coupling condition. Hence, the same is expected for "realistic models" of such systems. Yet, in view of the perturbation theoretical considerations above, a mathematical proof for any given model seems extremely difficult. In fact, the condition concerns not only the Hamiltonian (73) but simultaneously the observables (74) and initial conditions ρ(0) (cf. Sect. V) of the system-plus-bath compound. The nature and difficulty of the problem may become more evident by considering a particularly "simple" special case, namely observables of the form (74) and canonical density operators. In this special case, we may consider the relation  (72), respectively, and Z(λ) and Z S representing standart partition sums, normalizing the respective density operators. This relation (75) is an elementary identity for λ = 0, is usually considered as "obvious" for "weak coupling", but to the best of the present authors knowledge is unproven (and probably wrong and thus unprovable without strong extra assumptions on A S ) for small but still experimentally realistic coupling strengths λ.
Since the solution of these long standing and very subtle problems is not the actual main theme of our present work, we adopt the standpoint that the above weak coupling condition is an implicit additional requirement regarding experimentally realistic Hamiltonians, observables, and initial conditions on top of the previous requirements (44), (50), (52). In all what follows, we focus on systems which satisfy all those requirements, being confident that they include the majority of experimentally realistic models.

B. Main implications of weak coupling
The "true" time-averaged expectation values are, according to (58), given by where the argument λ of ρ eq has been added to remind us of the fact that we are dealing with the given, "true" Hamiltonian (73) with a "small" but non-vanishing coupling strength λ. Likewise, the "true" density operator from (22) is now rewritten in the form where ω mn (λ) := [E m (λ) − E n (λ)]/ , D mn (λ) := |m(λ) n(λ)|, and where |n(λ) and E n (λ) refer to the eigenvectors and eigenvalues of the Hamiltonian (73) for an arbitrary but fixed λ-value.
As soon as λ in (73) starts to change in the course of time, the density operator is no longer given by (22) but rather follows from the Liouville-von Neumann equation While an explicit solution for general protocols λ(t) is hopeless, in the special case of adiabatically slow (quasistatic) parameter changes, the Adiabatic Theorem can be invoked to yield where we have tacitly restricted ourselves to the simplest and most relevant case of non-degenerate energy levels, see also Appendix E.
The adiabatically slow decoupling process appearing in the weak coupling condition (Sect. IX A) means that λ(t) in (79) is given during a large but finite initial time span by the "true", finite coupling strength, then adiabatically slowly changes to the value zero, and afterwards remains zero for all later times t until infinity. Consequently, the time average of the density operator in (79) is governed by the infinitely long time period with λ(t) = 0, i.e.
Invoking the weak coupling condition from Sect. IX A it follows that the "true" time-averaged expectation values (76) are practically indistinguishable from those obtained with the help of (80), i.e.
A well-known further consequence of the Adiabatic Theorem (79) is the time-independence of the level populations n(λ(t))|ρ(t)|n(λ(t)) which can thus be identified with p n = ρ nn (0) for all times t. Considering λ rather than t as independent variable, we may equivalently say that the p n are λ-independent. It follows that in the relation p n = h(E n (λ))+δp n from (45), the right hand side must be λ-independent in the same sense. By locally averaging over many n-values (see below (45) and [35]) we can conclude that also h(E n (λ)) must be λ-independent. In view of the λ-dependence of E n (λ) it follows that also the function h(E) generally must acquire a dependence on λ. Indicating this fact by adding an index λ to h(E) we can conclude that for all n, and that the relation (45) now takes the form with λ-independent p n and δp n . Likewise, the density of states (12) and the energy density (47) acquire a λdependence and thus are now denoted by ω λ (E) and ρ λ (E), respectively. Similarly as in (48) one can conclude [35] that Finally, the weak coupling condition as discussed in Sect. IX A includes the practical indistinguishability of the "true" energy density from the energy density after decoupling the system from the bath, i.e. ρ λ (E) = ρ 0 (E) (85)

C. Zero coupling limit
We consider the Hamiltonian (73) in the limit of vanishing coupling strength λ. Denoting by |n S and E S n the eigenvectors and eigenvalues of H S and by |m B and E B m those of H B , those of (73) with λ = 0 follow as

Violation of the non-resonance condition
The fact that systems consisting of non-interacting sub-systems require special attention with respect to the generalized non-resonance condition (52) has first been noticed in Ref. [13].
Taking into account that the original indices in (52) now become double indices according to (87), condition (52) takes the modified form One readily sees that this condition is violated by considering the following specific choice [13]: j 1 = j 2 = k 1 = n 2 =: n and k 2 = m 1 = m 2 = n 1 =: m. In the generic case, it will be possible to find indices n and m so that all four energies E nn , E nm , E mn , E mm appearing in (88) in are different. With (87) it follows that E nn − E nm = E mn − E mm . In other words, condition (88) is violated.

Exploiting the product basis
The purpose of this subsection is to rewrite the pertinent relations from Sect. IX B in terms of the product energy basis (86) and the corresponding energy eigenvalues (87) in the zero coupling limit.
Essentially, we are just relabeling all eigenvectors and eigenvalues in a way which will turn out particularly convenient later on. Namely, all the single labels n(λ) are now replaced by double lables nm(λ) with the additional convention that the argument λ will be omitted in the case λ = 0, in agreement with the notation in Sect. IX C 1. Note that only in the zero coupling limit does the first of the two indices in nm refer to the system and the second to the bath, but not any more for λ = 0.
Specifically, |n(0) and E n (0) from Sect. IX B are now denoted as |nm and E nm and satisfy (86) and (87). Likewise, |n(λ) and E n (λ) are now denoted as |nm(λ) and E nm (λ). Finally, the λ-independent level populations p n now become p nm .
For observables of the form (74) the time-averaged expectation value (81) thus can be rewritten as Next, we rewrite the p nm according to (83) as Regarding the "unbiased fluctuation" δp nm (cf. Sect. V A) can conclude that for any fixed index n, the sub-set δp nm with variable indices m = 0, 1, 2, ... is unbiased as well, i.e.
More precisely, the energies E nm from (87), with n arbitrary but fixed and m variable, are still unimaginably dense, and the same property is inherited by E nm (λ). Further, there is no reason to expect the emergence of any special "correlations" between the ensemble averaged fluctuations δp nm by selecting any sub-set with a fixed n in (91). Note that during the preparation phase (cf. Sect. V B), there may exists an indirect, non-weak coupling between system and bath in the canonical setup, namely when both parts are interacting simultaneously with the rest of the world. Then, it is even more obvious to expect that (92) will be canonically fulfilled.

Additivity of Entropy
Here, we revisit the issues of level counting, entropy and temperatures from Sect. II A for the special Hamiltonian (73) with λ = 0.
The definitions and relations (9)-(17) can be taken over without any change to our present special case, except that all single indices n now become double indices nm (see (86), (87)). We reiterate that all those definitions and relations remain for the moment on a purely formal level without any reference to the actual system state. Their only purpose at this stage is to count levels in a convenient way.
Since λ = 0 in (73), we are dealing with two individual isolated systems and hence analogous definitions and equations as in (9)-(17), but now with indices "S" and "B", apply separately to the system and to the bath. As detailed in Appendix A, the following relations between those separate system and bath quantities and the original quantities for the system-plus-bath compound (73) with λ = 0 can be established: In the generic case this maximum is unique and is contained in the interval (E 0 , E). Adopting the definition the following results are established in Appendix A: More precisely, these relations are asymptotically exact approximations for f B → ∞. But since f B is at least of the order of 10 23 (see below (70)) they are satisfied with extremely high accuracy. In other words, in the zero coupling limit the entropies of the systems-plus-bath compound exhibit an additive behavior provided the total energy E is distributed among system and bath such that S B (E ′ ) + S S (E − E ′ ) is maximized, which in turn has the consequence that all temperatures are identical, i.e. the so-called equilibrium condition (or zeroth law of thermodynamics) is fulfilled.

X. OUTLOOK ON GENERAL NON-INTERACTING SYSTEMS
Systems consisting of non-interacting particles or other types of non-interacting sub-systems are popular models in many different contexts. Also the canonical systemplus-bath setup from the previous section is of this structure. Strictly speaking, every single sub-system would thus be isolated and its energy would be a conserved quantity, thus prohibiting any kind of equilibration or thermalization process between different sub-systems. To each of them, the discussion of thermalization and and the concomitant open questions from Sect. VIII apply. In particular, individual sub-systems which are "small" (e.g. single particles) may not even exhibit equilibration (cf. Sect. I). Accordingly, the term "non-interacting" actually means an interaction which is strictly speaking finite, thus giving rise to "normal" equilibration and thermalization, but in certain other respects still "negligibly small". A more precise formulation of such a weak coupling condition and its implications analogously to Sect. IX is straightforward.
Similarly as in Sect. IX C 1 one finds that the generalized non-resonance condition (52) is violated at zero coupling [13] but is generically restored as soon as the slightest interaction between the sub-systems is included. Note that in Sect. IX C 1 it is tacitly assumed that the two non-interacting sub-systems are distinguishable. In the opposite case of indistinguishable sub-systems (e.g. indistinguishable particles), all admitted states of the compound system must be symmetric (Bosonic sub-systems) or anti-symmetric (Fermionic sub-systems) against exchanging the indices of the two sub-systems, i.e. the pair (n, m) has to be identified with (is indistinguishable from) (m, n) for all n, m in (86), (87). Accordingly, in the identity E nn − E nm = E mn − E mm discussed below (88), the two energies E nm and E mn must be identified. Yet, condition (88) is still violated.
These considerations demonstrate that the generalized non-resonance condition (52) is a quite sensitive criterion in the context of equilibration and thermalization. It thus seem likely that a significantly weaker but still completely general condition of this type may not exist. Note that a system of strictly non-interacting particles (or other kinds of sub-systems) which is coupled to a bath (canonical setup) generically gives rise to a total systemplus-bath compound which cannot be decomposed into isolated sub-systems any more, and therefore generically fulfills condition (52).
Beyond the realm of "weak coupling", a partitioning of the total compound into physically meaningful subsystems becomes questionable. In particular, it does not make much sense to speak about properties of one "subsystem alone". Rather, the total compound amounts to an isolates system without any special properties, to which the general discussion form Sect VIII applies.

XI. THERMALIZATION FOR THE CANONICAL SETUP
The objective of the present section is to establish thermalization without making use of the unproven property (69) in the most important special case, namely the canonical setup from Sect. IX.
As detailed in Sect. VIII, it is taken for granted that the system energy is known with high accuracy. In other words, the energy density ρ(E) from (46) exhibits a very narrow peak within a close vicinity of E * . By combining (46) and (98) one recovers the expected relation The "star" in E * emphasises the fact that the ensemble averaged energy of the "real" system is fixed. Identifying E in Sect. IX C 3 with E * establishes the connection between the so far unrelated considerations in Sect. IX C 3 and our present issue of thermalization. Formally, the task is to show that in (66) the unknown details of p n are (practically) irrelevant. In particular, this then implies that (66) indeed agrees with the prediction of equilibrium Statistical Mechanics.
As detailed in Sect. IX, the canonical setup consists of a "small" system (S) which is weakly coupled to a much "bigger" bath (B). Equilibration of the isolated system-plus-bath compound is taken for granted, i.e. expectation values (32) become practically indistinguishable from (66). Focusing on system observables of the form (74) and observing (76) and (89), those equilibrium expectation values thus take the form where Tr S indicates the trace in H S . In other words, as far as system properties are concerned, the knowledge of the reduced equilibrium density operator ρ S eq : H S → H S is sufficient.
A. Boltzmann-form of p S n and canonical density Exploiting (93) we can conclude that where the delta-function is, as usual, considered as washed out. The first term under the integral can be rewritten by means of (14) Turning to the second term under the integral, we exploit (87), the definition of ω B (E) analogous to (12), and the corresponding relation (14), yielding Making use of (10), (103), and (104) we can rewrite (102) as where the dependence of Q on n has been dropped. Exploiting (95) and (96) we see that According to the mean value theorem there exists for any given x-and y-value a ϑ ∈ [0, 1] with the property that Choosing x = E B (E) and y = E S (E)−E S n and exploiting (13), we can rewrite (107) as where the dependence of ϑ and ∆ on E and n has been dropped. We first consider the simplest case of a deltadistributed energy density Thus, (105) takes the form Observing (95), the denominator T B (E * − E S n ) can first be rewritten as T B (E B (E * ) + E S (E * ) − E S n ) and then with (16) as (113) Finally, with (97) and relations like in (15) but with indices S and B we can conclude that (114) In view of (70), the last summand is negligible and (112) takes the form Similarly as in (114), one sees that T B (E B (E) + ∆) appearing in (109) can be approximated by With the usual definitions of the free energy F S (E) and the partition sum Z S (E) of the system S, namely in combination with (116), we can rewrite (115) as Taking into account the normalization condition n p S n = 1, we recover We finally turn to general energy densities ρ(E). Since ρ(E) enters linearly in (105), we simply can superimpose the results (119) for sufficiently many delta-functions approximating the true ρ (E). The fact that each deltafunction brings along a somewhat different value of E * in (119) has a negligible effect according to (16) as long as ρ(E) is still sharply peaked about its mean value. Denoting this mean value, in accordance with (98) and (99), again by the symbol E * , one thus recovers exactly the same relations as in (117)-(120).
In summary, the canonical formalism (117)-(120) is valid in full generality. Apart from the average energy E * , all the remaining details of the (unknown) energy density ρ(E) do not matter. The Boltzmann-distribution (119) together with (101) yields the canonical density operator where T (E * ) is the temperature corresponding to the given total energy E * of the isolated system-plus-bath compound. In practice, this energy E * is usually not known, and one thus rather considers the temperature T as "given". Accordingly, in (119)-(121) the state function T (E * ) is replaced by the "new" independent state variable T and similarly Z S (E * ) by Z(T ) := Z S (E * (T )).

XII. SUMMARY AND CONCLUSIONS
In the first part of this paper we considered general, isolated quantum systems with many degrees of freedom f , and being extensive in the sense of Eqs. (11) and (15). As a further "generic" property of the Hamiltonian H, the (generalized) non-resonance condition (52) was taken for granted. Our key assumptions concerning the "realistic modeling" of actual experimental systems were: (i) observables have a "reasonably bound" range-to-resolution ratio and (ii) initial conditions may be arbitrarily out of equilibrium but, on the average over the entire statistical ensemble (many repetitions of the "same" experiment), they give rise to a well-defined population density (average occupation probability of many neighboring energy levels). The latter assumption seems quite plausible per se, but can also be justified via the experimental preparation procedure at the origin of the initial condition.
All further "details" of the initial condition and the Hamiltonian were left unspecified, reflecting the unavoidable actual lack of knowledge in this respect.
Given the initial condition, the exact standard Quantum Mechanical time evolution was adopted without any approximation or modification.
Our first main result (64) implies that after initial transients have died out, the system looks for all practical purposes as if it were in a steady state described by the so-called generalized Gibbs ensemble ρ eq , in spite of the fact that the "true" density operator ρ(t) never becomes stationary, but rather exhibits the well-known Quantum Mechanical recurrence and time inversion invariance properties. Our key conclusion was that the mathematically undeniable differences between the "apparent equilibrium" ρ eq and the "true" density operator ρ(t) are either unobservably small or unobservably rare in time.
While the issue of equilibration can thus be considered as settled, that of thermalization still remains an open problem as far as completely general isolated systems are concerned, as detailed in Sect. VIII.
In the second part of the paper we focused on a special case of foremost practical relevance, namely the canonical setup, consisting of a system of actual interest (that may be macroscopic or not) which is weakly coupled to a much "bigger" environment. Provided the total system-plusbath compound satisfies the above conditions for equilibration, the corresponding "apparent equilibrium" ρ eq reduces, after eliminating (tracing out) the bath, to the canonical density operator (121), independently of all the unknown "microscopic details" of the possibly far from equilibrium initial condition. In other words, the "small" system is proven to exhibit "thermalization". The result even goes beyond the claim of Statistical Mechanics in so far as not only the system but also the bath may be initially out of equilibrium.
It seems not unlikely that our main prerequisites in deriving these results cannot be substantially weakened any more: Systems which can be decomposed into strictly non-interacting sub-units (e.g. non-interacting particles) are known not to thermalize, and indeed violate the nonresonance condition (52). Likewise, when either an unlimited range-to-resolution ratio or an initial condition without a well-defined population density is admitted, one readily finds examples which do not exhibit equilibration, see Sects. III and V F. While our main focus has been on statistical ensembles, we have noticed in Sect. VII C that all the above results also remain valid for pure states |ψ(t) , provided the initial energy level occupation probabilities | ψ(0)|n | 2 satisfy the condition that a well-defined population density exists. Specifically, within the canonical setup the reduced system density operator will again be practically indistinguishable from the canonical ensemble for the overwhelming majority of times t. This result is closely related to the issue of "canonical typicality" from Refs. [22-24, 26, 36, 37].
Next, we briefly address the issue of low temperatures. First, for extremely low temperatures, the rough estimates from (11) and (15) may break down. Since these estimates are at the heart of our present approach, also our main results may not be valid any more. Essentially this happens when the entropy becomes experimentally indistinguishable from zero. This may, but need not be the case for Bose-Einstein condensates [14].
Further, the common notion that a Bose-Einstein condensate exhibits a macroscopically populated ground state may be easily misunderstood in our present context. Namely, this notion refers to the fact that the total many particle product state contains a large number of single particle ground states. The word ground state thus refers to the individual (non-interacting) particles, not to the total many particle system. Indeed, besides the numerous single particles in their individual ground state, there may still remain many further particles which are in excited single particle states. Hence, we are in fact not dealing with the actual ground state of the many particle product Hilbert space. Rather, it may easily happen that the maximal population of all the many particle product states is still small and hence the conditions regarding the level populations from Sect. V may still be satisfied.
We close with a few remarks regarding the issues of (non-)integrability, ergodicity, chaos, decoherence, and entanglement. We first remark that (non-)integrability, ergodicity, and chaos are relatively well defined notions for classical systems, but that their role with respect to equilibration and thermalization is not really clear in the classical case. The corresponding notions in the realm of quantum systems are much less well and uniquely defined [2,33]. But even if this problem would be solved, in view of the classical situation, the usefulness of those concepts for equilibration and thermalization are likely to be limited in the quantum case as well.
The closest connection of our present approach to the above notions may be via the non-resonance condition (52). However, such a connection or analogy does not seem to offer any additional physical insight. Concerning our requirement that the initial condition must exhibit a well defined population density, we remark that during the preparation phase (which represents the physical origin of the initial condition), a distinction between integrable and non-integrable systems does not make much sense anyhow.
The concepts of integrability, ergodicity and the like may well play a crucial role for the following two issues: (i) The transient relaxation-process of the initial state towards equilibrium, both qualitatively (exponential decay or not) and quantitatively (estimating the relaxation time). This is suggested by the well-established role of level statistic in quantum chaos and the importance of energy differences throughout the present Appendix D.
(ii) The general problem of thermalization addressed in Sect. VIII, in particular the unproven key postulate (69) in this context.
As far as the issue of equilibration is concerned, our present approach and results demonstrate that entanglement and decoherence play no role since we are dealing with isolated systems without any external influence by the rest of the world. With respect to thermalization, the question remains open.

A. Comparison with related works
We first address several works with a finite but not too small overlap with the present one and then turn to the two most closely related Refs. [12,13]. The pertinent literature regarding the "missing link" (69) in the context of thermalization has already been addressed at the end of Sect. VIII.
Considering and estimating quantities like (59) is very natural and has a long tradition: Merits and shortcomings of the early works are reviewed e.g. in [5], most notably Ludwig's approach [3]. In particular, many of them [4,5] involve an extra average over initial conditions with the effect that any specific non-equilibrium initial condition (representing a given experiment) must be excluded as "potentially untypical" from the general conclusions. Turning to the more recent precursors, Peres' approach [6] is roughly comparable to ours up to Eq. (149) but then proceeds with the conjecture that theÃ mn are pseudorandom matrix elements, statistically independent of the ρ nm , for which there are general arguments [6] and numerical evidence [32] (and counter-evidence [2]) but no proof. For pure states, Srednicki obtained similar results [9] by exploiting a common conjecture about the semiclassical behavior of classically smooth observables A in systems with a fully chaotic classical limit. Again, this conjecture is based on good arguments [30] but no proof. Moreover, typical classical many-body systems are not expected to behave fully chaotic [28,38]. Somewhat similar conclusion have been reached even earlier by Deutsch [8] via additional hand waving arguments. Finally, rigorous results comparable to (63) are due to [10,16], but only for rather special Hamiltonians H and initial conditions. Within the same restrictions, the ground breaking work by Tasaki [10] also addresses the issue of thermalization by arguments which are similar in spirit to those in our present Sect. IX.
The first part of the present work (until the end of Sect. VIII) represents a generalization and more detailed explanation of the Letter [12]. The main extensions consist in the enlarged class of observables admitted in Sect. IV and the fact that degenerate energy eigenvalues are not any more excluded in our present work.
We finally turn to the closely related work [13]. In contrast to our present work, Ref. [13] is focused on Hilbert spaces which are finite dimensional and which exhibit a "system-plus-bath" product structure of the form (72). Apart from a non-resonance condition (excluding degeneracies, see below (52)), the Hamiltonian may still be completely arbitrary. Further, the system is assumed to be in a pure state on the total system-plus-bath Hilbert space, while the obtained results mainly concern the reduced (usually mixed) state of the "small" system after tracing out the "large" bath. Apart from these quite significant overall differences, the main findings with respect to equilibration are rather similar in character to ours. In particular, observables with finite range are implicitly taken for granted according to the discussion below Eq. (3) in Ref. [13], and the "effective dimension" d ef f from [13] is basically equivalent to Tr{ρ 2 eq } in our present approach, as discussed below (60). With respect to thermalization, the results from [13] are of a quite different character than ours, mostly concerning "typical" [22,26,37] properties of pure states which are randomly sampled according to a uniform probability density from certain sub-Hilbert spaces. In the opinion of the present author (see also Sects. I and V B), the main open question of this approach is in how far one particular pure state or an ensemble of uniformly distributed pure states are suitable to describe a real experimental setup.

Acknowledgment
Special thanks is due to Michael Kastner and Roderich Tumulka for pointing out Refs. [20,21] and to an anonymous referee for insisting in a more detailed discussion of the notion of weak coupling (Sect. IX).

XIII. APPENDIX A
In this Appendix we establish the relations (96) and (97). As detailed above (94), analogous definitions and equations as in (9)-(17), but with indices "S" and "B", hold true and are exploited in the following.
With the help of (12) and (87) we can conclude that where nm indicates a summation over all n, m = 0, 1, 2, .... It follows from (12) that ω S (E ′ ) = 0 for is a sufficiently smooth function of E ′ , it thus will have an absolute maximum in the interior of the interval [ . For any given E-value exceeding the ground state energy E 00 = E S 0 + E B 0 of the compound system, the absolute maximum will furthermore be generically unique.
Next we rewrite (122) by means of (11) and (14) as Similarly to the integrand in (122), the exponent in (123) generically exhibits a unique absolute maximum at some , henceforth denoted as E B (E). The next step is to evaluate (123) by means of a saddle point approximation, i.e. by expanding the exponent around its maximum up to the second order. At the maximum, the derivative of the exponent vanishes, yielding with (13) the relation where we have introduced We emphasise that, like in Sects. II A and IX C 3, we avoid speaking about system states. If we give up this viewpoint for a second, then (124) is nothing else than the so-called equilibrium condition for two systems with negligible (but non-zero) interaction in thermal equilibrium, sharing the total system energy E according to (125).
Expanding the exponent on the right hand side of (123) up to the second order about its maximum at E ′ = E B (E) in combination with (17) yields Since f B is at least of the order of 10 23 (see below (70)), it follows that only an extremely small neighborhood of the maximum notably contributes to the integral in (123), and within this neighborhood the variations of the nonexponential factors on the right hand side of (123) are negligibly small according to (16). Performing the remaining Gaussian integral in (123) yields (15) and (71) one can infer that R is of the order of f . Considering that the quantity in (127) scales like f according to (15) and that f is at least of the order of 10 23 , the contribution of ln(O(R)) in (127) is negligible. We thus obtain in extremely good approximation the relation Differentiating this relation with respect to E and taking into account (13), (124), and (125) yields The latter two relations are identical to (96) and (97) To do so, we first exploit (25) to conclude and hence With (26) we obtain (132).

XV. APPENDIX C
In this Appendix, the derivation of (48) is provided. To start with, we recall that the delta-functions in (46) and (47) are understood to be "washed out" over many energy levels. Hence, smoothening ρ(E) with the help of yet another washed out delta-function actually does not change ρ(E) any more: Introducing (45) and (47) on the right hand side yields Since the delta-functions are washed out, the last summand essentially amounts to a local average over many δp n and is thus negligible (see below (45)). In turn, the function h(E) hardly changes within the peak region of the delta-functions. Hence h(E n ) can be replaced by h(E ′ ) and then by h(E). Altogether we thus obtain The sum can be identified with ω(E ′ ) from (12). Since the latter is once again already a locally averaged quantity, the integral over the last remaining delta-function is trivial, yielding (48).

XVI. APPENDIX D
According to (54) it follows that e iat = 1 for a = 0 e iat = 0 for a = 0 .
Since, according to (138), the time averaged exponentials in (148) vanish if E j − E k + E m − E n = 0 we can conclude from the non-resonance condition (52) that where the first sum runs over all m, n with E m = E n and the second over all m, n. With (24) and (55) we thus obtain σ 2 A ≤ mnÃ mn ρ nnÃnm ρ mm = mn m|Ãρ eq |n n|Ãρ eq |m .
The sum over n amounts to an identity operator and that over m yields Next, we evaluate this trace with the help of the eigenvectors |χ n ofÃ, yielding σ 2 A ≤ mn χ m |ρ eqÃ |χ n χ n |ρ eqÃ |χ m .
The sum over n yields the identity operator and that over m amounts to Tr{ρ 2 eq }, yielding Finally, we note that according to (56) and (141) we can subtract from A in (140) an arbitrary function B(b) of the form (39) with the only consequence in the final result (154) that ∆ ′ A goes over into ∆ ′ A−B(b) . Since this conclusion holds for arbitrary B(b), the inequality even remains true after minimization over all B(b), i.e. we can replace ∆ ′ A in (154) by ∆ ′′ A from (41). In other words, we recover (60).

XVII. APPENDIX E
The purpose of this Appendix is a more detailed justification of the non-degeneracy assumption adopted in Eq. (79). More precisely, we will argue that the energy levels E n (λ), considered as functions of λ, do not cross each other for generic Hamiltonians H(λ).
Intuitively, one expects that the E n (λ) will be highly non-trivial functions of λ and that for different n these functions will behave notably "different" from each other. Since the energy levels are unimaginably dense (cf. Sect. II A), crossings of neighboring levels E n (λ) upon variation of λ thus might seem to be almost unavoidable.
But closer inspection shows that, on the contrary, such level crossings are actually avoided for generic Hamiltonians H(λ) due to the so-called level-repulsion mechanism: The levels E n (λ), considered as functions of λ, are governed by the following exact evolution equation, originally due to Pechukas and Yukawa [39]: where E n (λ) and |n(λ) are the "accompanying" eigenvalues and eigenvectors of H(λ). The closing evolution equations for V nm (λ) are also known, but not explicitly needed for our purpose. The main point is that looking upon λ as "time" and E n (λ) as "particle positions", Eq. (155) is nothing else than the Newtonian equation of motion for a one-dimensional "gas" of point particles, the so-called Pechukas-Yukawa gas [39]. The particles are repelling each other with "coupling strengths" |V nm (λ)| 2 which depend on "time" λ. In the generic case (no special symmetries or "selection rules"), the coupling of two neighboring particles, say |V nn+1 (λ)| 2 , will be positive with the exception of at most a discrete set of timepoints λ, and as a consequence, any "attempt" of the two neighboring levels E n (λ) and E n+1 (λ) to cross each other is inhibited by a repulsive "force" term in (155) of the form |V nn+1 (λ)| 2 [E n (λ) − E n+1 (λ)] −1 which diverges as E n (λ) → E n+1 (λ).