Partitioning a macroscopic system into independent subsystems

We discuss the problem of partitioning a macroscopic system into a collection of independent subsystems. The partitioning of a system into replica-like subsystems is nowadays a subject of major interest in several field of theoretical and applied physics, and the thermodynamic approach currently favoured by practitioners is based on a phenomenological definition of an interface energy associated with the partition, due to a lack of easily computable expressions for a microscopic (i.e.~particle-based) interface energy. In this article, we outline a general approach to derive sharp and computable bounds for the interface free energy in terms of microscopic statistical quantities. We discuss potential applications in nanothermodynamics and outline possible future directions.


Introduction
We consider a system of N interacting particles contained in a volume V at temperature T where the interactions in a general phase (or state) space X ⊂ R d are characterised by a Hamiltonian H. Its canonical partition function (1) Z(N, V, T ) = V ⊂X e −βH(x) dx encodes the statistical and thermodynamic properties of the system. In this work we discuss the possibility of partitioning a macroscopic system into weakly interacting subsystems. Generally, the partitioning of a macroscopic system into several independent or weakly interacting subsystems, in particular, subsystems of the same size (replicas) is relevant for the simulation of various kinds of thermodynamic systems [1]. Molecular modelling and simulation, for example, is often concerned with the question of how to quantify the trade-off between available computational resources and the desired spatial resolution of a model; running molecular simulations that allow for reliable and robust predictions often requires suitable techniques for the extrapolation of simulation data and estimated statistical quantities on macroscopic scales (see e.g. [2]). The extrapolation to larger scales requires the knowledge of the error of thermodynamic quantities invoked by the passage from a small system to a larger one, which often boils down to quantifying the error invoked by approximating a large system by independent subsystems, each of which has the size of the small system of interest. To solve the latter problem, it is necessary to determine the energy associated with creating interfaces between the subsystems. Roughly speaking, if such an interface energy is sizable, compared to the total energy of each subsystem, then the system is not separable and, as a consequence, its thermodynamic properties are not scalable, otherwise the system is separable and thus the extrapolation to a larger scale is straightforward. Beyond molecular simulation, the idea of partitioning a system into independent subsystems is known to be a powerful theoretical tool for proving the existence Key words and phrases. Molecular dynamics, small systems thermodynamics, canonical ensemble, Bogoliubov inequality, Donsker-Varadhan variational principle. of thermodynamic limits in particle systems [3,4]. A related conceptual problem, which is relevant in various disciplines and applications, is the definition of the size of a nanosystem, specifically, the question of how small a nanosystem must be so that one can manipulate it at the level of single molecules or atoms, while being large enough so that thermodynamic and statistical properties are still well defined (see e.g. [5,6,7]).
The idea of building an ensemble of independent replicas whose joint statistics mimics a large system has been used by T.L. Hill to lay the foundations for what is today known as "nanothermodynamics" [8,9,10]. A closely related framework has been proposed in connection with the thermodynamics of non-additive systems [11,12,13,14,15].
Nowadays, the partitioning of thermodynamic systems is mostly done following Hill's phenomenological approach that is based on empirical laws and ad hoc assumptions about the underlying thermodynamics. Despite its lack of theoretical foundation, Hill's approach represents a powerful tool for understanding small systems thermodynamics, and the aim of this work is to make a first step towards linking the key quantities in Hill's phenomenological approach, such as the free energy associated with the separation of a system into smaller units, with microscopic quantities that can be estimated from running computer simulations. In doing so, we start from the microscopic definition of the canonical partition function of the system and derive an upper and lower bound for the free energy difference between the fully coupled system and the collection of replicas with the coupling removed. We derive tight bounds for the interface free energy in terms of particle-based expressions, where it turns out that the precise form of the removed coupling terms is irrelevant for the analysis, so the considerations equally apply for collections of uncoupled or weakly coupled replicas. The fact that the bounds are particle-based has important practical consequences, in that it allows for a straightforward numerical implementation by molecular simulation algorithms.

Hill's approach to replica-like systems
Before we explain our approach, we briefly review Hill's replica model [8,9]. The approach proposed by T.L. Hill has been often taken as a paradigm for the construction of a macroscopic ensemble from statistically independent microscopic replicas (see e.g. [14,13]). Hill's empirical approach is targeted at systems that are so small so that their interaction with the environment is no more negligible, as a consequence of which the known laws of thermodynamics do not hold. The solution proposed by Hill is to consider the thermodynamics of a single nanosystem in an extended ensemble composed of non-interacting replicas, which represents a macroscopic system that is governed by the standard laws of thermodynamics. Here "non-interacting" refers to the absence of an explicit interaction potential between the particles in different replicas, but, as we will see below, the construction gives rise to an effective interaction on the level of the thermodynamic potentials. If a macroscopic system is characterized by energy E r , entropy, S r , volume, V r , and number of particles, N r , then such a macroscopic system can be represented as a collection of r independent or weakly interacting replicas. Letting E, S, V, N denote the corresponding quantities for each of the replicas, then E r = rE, S r = rS, V r = rV and N r = rN , and the fundamental equation reads (2) dE r = T dS r − P dV r + ρdN r + Xdr , where T , P and ρ are the temperature, pressure and chemical potential that are well defined for the macroscopic system. The key point of Hill' approach now is to introduce the unknown X, which is the energy per replica associated with splitting the system into subsystems, which plays the role of an effective interaction between replicas. This energy is only empirically defined, and most models following Hill's prescription tacitly assume that each replica can be characterized by a microscopic distribution (e.g. canonical) so that intensive thermodynamic quantities in (2), such as temperature or chemical potential are also microscopically well defined.
Despite the success of this approach, a problem remains: in a system of interacting particles one wishes to have a particle-based definition of X, according to the fundamental laws of statistical mechanics. In most applications, however, X is defined only empirically rather than being based on first principles. A rigorous way to compute X would require to have access to the difference ∆F = F − F 0 between the free energy F of the total system and the total free energy F 0 = F 1 + F 2 + . . . of the subsystems, so as to link the properties of the macroscopic system and its non-interacting constituents.
In the next section we will derive microscopic expressions that bound ∆F from above and from below, and discuss conditions under which these bounds are sharp. We refrain from presenting our results with maximum possible generality, so as to keep the arguments clear and concise. We stress, however, that most of the following reasoning can be easily generalized at the expense of introducing a more elaborate mathematical notation.

Bounds for the interface free energy
has the interpretation of the total energy of a finite collection of uncoupled (or weakly coupled) subsystems, and U is the coupling energy. We set β = 1 in what follows, and assume, without loss of generality, that H 0 and U are dimensionless, which can be easily obtained by scaling (H 0 , U ) → (βH 0 , βU ). The specific form of H 0 or U is irrelevant for the following considerations. Specifically, we seek upper and lower bounds for the free energy We assume that both Z and Z 0 are finite, so that ∆F is well-defined. It will be convenient to consider a dual representation of the thermodynamic free energy that is based on relative entropy or Kullback-Leibler divergence.
Definition 1. Let f, g : X → [0, ∞) be two probability density functions on X . The Kullback-Leibler divergence (or: relative entropy) between f and g is defined as Remark 1. The above definition is based on the convention 0 log 0 = 0, which can be justified by a limit argument. The requirement that division by zero is only allowed on null sets, i.e. {g(x) > 0} ⊂ X is a null set under the probability measure induced by the density function f , expresses the absolute continuity of the probability measure P with respect to the measure Q induced by g, so that Q(B) > 0 whenever P (B) > 0 for a measurable set B ⊂ X . Absolute continuity guarantees that the integral (7) is well defined, possibly having the value +∞.
Note that the Kullback-Leibler (KL) divergence is not symmetric in its arguments, i.e. KL(f, g) = KL(g, f ). A useful property, however, is that it is nonnegative, for all x ∈ X up to a set N ⊂ X of measure zero under the probability density f . To see this, notice that, by Jensen's inequality, where the last identity is implied by g being a probability density. As the logarithm is strictly concave, equality in Jensens inequality is attained if and only if the integrand log(g/f ) in the first equation is constant up to a set of measure zero under the probability density f . Since f ≥ 0 cannot be identically zero, the integral must be equal to zero to attain equality, therefore g = f up to null sets. In the following we assume that the first argument in the KL divergence (here: f ) is always strictly positive, which implies that null sets under f are Lebesgue null sets, i.e., the above statements hold almost everywhere (a.e.).
3.1. Two-sided Bogoliubov inequality. Finding an upper and a lower bound for the free energy difference ∆F = − log(Z/Z 0 ) is now straightforward. To this end we define the two probability densities on X , which are readily seen to be normalized.
Theorem 1 (Two-sided Bogoliubov inequality). Under the previous assumptions, it holds that denote the expectations with respect to π and π 0 .
Proof. The conditions on H 0 and U imply that U is integrable with respect π 0 and, as a consequence, it is integrable with respect to π as well. Moreover both π 0 and π are strictly positive. The non-negativity of the KL divergence (see Definition 1) then implies that in other words, Interchanging the arguments π and π 0 in the KL divergence, we see that which gives us the upper bound Hence the assertion is proved.
The upper bound ∆F ≤ E π0 [U ] of Theorem 1, that follows directly from (5) via Jensen's inequality, is known as Bogoliubov inequality (also: Peierls-Bogoliubov or Gibbs-Bogoliubov inequality) in the literature which is why we named the result of Theorem 1 after Bogoliubov; see [17,18,3,4]. We mention a related result by Isihara [16] who has proved a two-sided Bogoliubov inequality using a cumulant expansion of the free energy.
3.2. Variational bounds. By Jensen's inequality, the bounds in Theorem 1 are sharp if and only if U is constant, in which case ∆F = U . Excluding this somewhat pathological case, and assuming further that U is bounded, we can still improve the bounds for ∆F by using the duality relation [19,20,21] where, by definition, − log E π0 [e −U ] = ∆F and the infimum is taken over the set P + of a.e. positive probability densities ρ > 0 on X . Equation (19) is known as the Donsker-Varadhan principle [22], which, in our case, is again a simple consequence of Jensen's inequality: Noting that, for any probability density ρ > 0 (a.e.) and any bounded random variable W on X , we have where the weight L(x) = π 0 (x)/ρ(x), ρ(x) = 0 is the likelihood ratio between the two probability densities π 0 and ρ, and denotes the expectation of W with respect to ρ. (Without loss of generality, we can exclude the set {ρ(x) = 0} ⊂ X from our considerations, which is a null set under both ρ and π 0 .) Hence, by Jensen's inequality, it follows that The rightmost expression equals E ρ [U ] + KL(ρ, π 0 ). By the strict convexity of the exponential function in (22), equality can only be attained if U (x)+log(π 0 (x)/ρ(x)) is a.e. constant, since otherwise the inequality becomes strict. Moreover, since the right hand side in (22) is strictly convex in ρ, there is a unique probability density function ρ = ρ * for which the equality is attained. It is given by and it is therefore the unique, positive minimizer of (19). Here γ > 0 is the normalization constant, and it can be readily verified that γ is equal to ∆F which implies that ρ * = π and thus yields the known equality from the first step in the proof of Theorem 1.
In practice, the rightmost term in (24) will be difficult to evaluate, however we can get a good estimate by exploiting the fact that the Donsker-Varadhan principle (19) has a dual form that characterises the (relative) entropy in terms of the free energy and the internal energy (see [19,Prop. 2.3]): (25) KL(π, π 0 ) = sup where integrability ψ ∈ L 1 is understood with respect to π. By combining the equations (19)- (25), we obtain the following variational characterisation of ∆F .
Theorem 2 (Variational bounds). We have That is, for all ψ ∈ L 1 and ρ ∈ P + it holds that Even though the variational form (26) may not be particularly useful in practice, (27) can be the starting point for systematic improvements of the Bogoliubov bound (13). In particular (13) is obtained from (27) by setting ψ = 0 and ρ = π 0 , and so a sensible strategy to improve (13) could be to consider the homotopies in (27) that have the property that one recovers the original bound (13) for α 1 = α 2 = 0 and the sharp bound in (26) for α 1 = α 2 = 1. Here γ(α 2 ) is again the normalization constant that guarantees that ρ α2 is a probability density. By differentiating the variational bounds with respect to the homotopy parameters α 1 and α 2 , it can be readily seen that the lower bound in (27) is strictly increasing for small values of α 1 , whereas the upper bound is strictly decreasing for small values of α 2 , in other words, small perturbations to ψ = 0 and ρ = π 0 are sufficient to improve the bounds. (19) furnishes the Gibbs relation F = E − T S for the thermodynamic free energy F , with E being the internal energy, T temperature and S denoting the Gibbs entropy. We specify the previous assumptions by assuming that X ⊂ R d is compact and setting H 0 ≡ 0, so that π 0 ≡ 1/|X |. Then, setting U = βV with β = (k B T ) −1 , k B > 0 being Boltzmann's constant and V denoting any continuous energy function on X , equation (19) turns into

Remark 2. Equation
The unique minimizer is the Gibbs-Boltzmann density ρ * = exp(−βV )/Z with normalization constant Z = exp(−βF ). Hence the Donsker-Varadhan principle can be considered a measure-theoretic generalization of the Gibbs relation for the free energy in terms of internal energy and (relative) entropy.

3.3.
Examples and computational aspects. As a simple example, we consider a collection of two subsystems with separable Hamiltonian with H i : X i → R and X i ⊂ R 6Ni , i = 1, 2. The coupling is by virtue of the interaction Hamiltonian Here N = N 1 + N 2 , |V | = |V 1 | + |V 2 | are the total number of particles and the total volume, with constant density Theorem 1 now implies that denote the expected values of the interaction potential between the particles of subsystem 1 and subsystem 2 with (lower bound) or without coupling (upper bound).
Since the two subsystems are independent,Ū 0 has the interpretation of an average interdomain energy: the energy required to partition the system into two subsystems plus a "corridor", assuming a phase space volume δV that is negligible in comparison with V 1 and V 2 , so that Other examples that are relevant in applications are the calculation of thermodynamic bulk properties or the simulation of open systems [31,32]. Imagine, for example, a situation in which one wants to study the conformational change of a large biomolecule immersed in aqueous solution. Typically one would need a large system to properly represent the bulk of the solvent, however such calculations are very expensive, and one may think of separating the system into a solvation region and an environement, and simulating only the relevant subsystem consisting of the biomolecule and its solvation shell. If one were able to simulate only the subsystem while ignoring the interaction with the (possibly infinitely large) solvent environment, it would be possible to compute the relevant properties by simulating a system of a drastically reduced dimensionality. In order to quantify finite-size effects that are invoked by neglecting the interactions between the subsystem and its environment (mediated by the interaction potential U ), it is crucial to have precise estimates of the interface free energy associated with the solvation shell.
Roughly speaking, finite-size effects are negligible, and computing bulk properties by simulating a small system is justified, when ∆F is small compared to the characteristic energy scale of the system, otherwise the effects coming from the environment have to be taken into account, for example by increasing the system size or by changing the interaction potential [24]. As these free energy calculations are extremely costly and the simulations have to be repeated many times with different simulation parameters (e.g. see [25]) in order to determine the optimal system size or the parameters of the interaction potential, it is important to have free energy estimators that are computationally feasible.
Let us stress that the aforementioned examples are just representatives for a much larger class of problems, for most molecular dynamics models that involve either heat baths, reservoirs, or the alike are subject to finite size effects when implemented on a computer. Assessing whether these finite size effects are important or not is crucial, and in this paper we suggest a flexible tool to derive quantitative estimates to see how far a simulation is away from the (ideal) thermodynamic limit, without running an enormous number of trial simulations.
Being able to split a large system with d = 3N degrees of freedom into many, say, r independent subsystems moreover allows for easy parallelizability of MD codes [26]. This typically reduces the computational complexity of the corresponding algorithms from O(d α ), with α typically between 2 and 3, to O(d α 1 + . . . + d α r ) with N k = d k /3 being the number of particles in the k-th subsystem, even though one is simulating in total the same number of particles N = N 1 + . . . + N r . One resort is to replace the Monte Carlo estimator by its variational counterpart from Theorem 2 and use stochastic optimization tools to approximate the free energy as has been described in [21,27], or to use suitable moment estimates ("concentration inequalities") for the underlying probability distributions as in e.g. [29,30].

Discussion and Conclusions
Mimicking infinite physical systems by simulating finite domains is challenging because the statistics of the small system is by definition "marginal", in that one considers finitely many degrees of freedom of a system with, theoretically, infinitely many degrees of freedom. Therefore one needs suitable numerical techniques to account for the finite size effects in the corresponding probability distributions [23]. Particularly important are error indicators for interface free energies that allow for estimating the statistical error due to the separation of an infinite system into weakly interacting small subsystems. The difference between the upper and lower bounds as well as their variational counterparts provide easily computable error bounds to quantify this effect.
Getting precise estimates of the interface free energy ∆F is of interest in many practical problems, in which the separation of a system into non-interacting or weakly interacting subsystems plays a role [31,32]; examples include the formation of islands in colloidal systems [33] or the formation of nanosize domains through epitaxial growth [34], to mention just two important cases; specifically, in colloidal systems, the knowledge of the degree of independence of the various islands in terms of the ratio between the interface energy and the total energy of the islands gives a handy criterion as to whether it is justified to treat single islands as systems per se in a simulation and thus acquire a large gain in computational efficiency in analyzing the local microscopic details. Given the current development of nanothermodynamics in the direction of (bio-)technology, precise and computationally feasible free energy bounds may be further relevant for the statistical modeling of such nanosystems [35,36,37,38]. The variational free energy bounds can moreover serve as quantitative criteria to optimally design separable systems with specific degrees of separation by suitably adjusting the characteristic physical parameters (e.g. interaction range, strength of interaction, atom spacing). One such example are molecular wires in nanoelectronic devices, for which a high degree of separability implies that the independent subsystems are representative for locally acting devices, where the degree of locality can be engineered by a proper choice of the physical parameters controlled by the minimization of the interface energy [39]. We stress one more time that the above considerations can be readily generalized to non-stationary and path-space problems (cf. [40]). Future work should address bounds for specific observables.