Entropy minimization within the maximum entropy approach

A procedure for minimizing the maximized entropy with respect to an external driving force µ is proposed. Applied to coupled dynamic-stochastic systems, such an approach allows one to reduce the uncertainty in estimating the parameters of the dynamic counterpart (a probe or a model). This in turn permits estimating an optimal driving path that gives maximal information on the probability distribution f(x|µ) of the stochastic counterpart for a given probe/model θ(µ|x). It is found that the functional form of the model should be similar to the observed/measured one, θ(µ), while the minimum uncertainty is reached when the distribution becomes independent of the driving µ.


Introduction
The term Complex Systems has emerged as a recognized field of research in many domains (from physics to biology). Nevertheless, a strict definition of complexity is still lacking. There are, however, several common generic features, such as a multicomponent nature, emergence, and non-linear or even feedback relations among the components. In this situation one has to work under conditions of incomplete or partial information on the system. A deterministic description thus becomes almost impossible, and one has to resort to statistical/probabilistic methods. One of them is the maximum entropy approach [1], which is extensively invoked in the solution of inference problems appearing in various fields. The approach allows one to calculate a probability distribution of relevant variables from an incomplete knowledge of the microscopic mechanisms governing the system evolution. The central idea is to find a least biased distribution consistent with the information at hand. Therefore, the scheme operates with two main ingredients: the entropy measure (which estimates the uncertainty associated with the probability distribution) and the constraints encoding the available information. In this context various entropy functionals have been introduced (see [2] for a recent review) in order to derive the various nonexponential probability distributions observed in nature. These are generalizations of the famous Shannon form (4), which is recovered from the generalized forms as a limiting case. However, many of these generalized entropies are non-additive for statistically independent subsystems. This fact and its consequences are still a subject of extensive debate [2,3].
In a situation where many generalized entropy forms are being proposed [4], it might seem that the maximum entropy approach suffers from an ambiguity [5]: "in a sense that any probability distribution seems to be derivable from the maximization of any entropic measure if an appropriate constraint is used". In this respect it has been argued [5] that such an ambiguity does not appear when one searches for a parametric family of distributions f(x|µ). Here x is a relevant fluctuating variable and µ is an external parameter independent of x. For instance, such a parameter could enter through an observed mean value θ(µ) [6] or through a distribution width varying in response to some external perturbation µ, as in the process of insertion into a host matrix [7]. On the other hand, we have to note that we are not free to modify the constraints without logical necessity, because they contain the information (or our theoretical model) on the processes running in the system of interest.
Since we deal with a parametrized distribution f(x|µ), its entropy H(µ) is also a function of the parameter µ. This opens up the possibility of searching for a minimum of H(µ) as a function of the driving µ. In other words, given an observable θ(µ) and its model estimation θ(µ|x), we may try to find an "optimal" set of parameters reducing the uncertainty on the behavior of f(x|µ). This idea was briefly sketched in [8].
In this paper we explore such a possibility in application to coupled dynamic-stochastic systems [9,10,11], where the dynamic counterpart may serve as a probe for determining the behavior f(x|µ) of the stochastic subsystem. In this way the maximum entropy approach allows one to find the functional form of the distribution f(x|µ), while the subsequent minimization of the maximized entropy permits one to reduce the uncertainty on the relevant parameters.

Maximum entropy approach
Let us first summarize some basic principles of the maximum entropy approach. In the light of what is discussed above, we consider a parametrized entropy functional H(µ) of a quite general form [8]

H(µ) = D(Φ[f(x|µ)]).   (1)

Here H is simultaneously a function of the parameter µ and a functional of the parametrized distribution f(x|µ), and D(t) is some function with appropriate behavior. The functional Φ[f(x|µ)] can be represented as

Φ[f(x|µ)] = ∫ dx φ(f(x|µ)),   (2)

where φ(p) is another function ensuring suitable overall properties (continuity, concavity, ...). Combining eqs. (1) and (2) we arrive at

H(µ) = D(∫ dx φ(f(x|µ))).   (3)

It is clear that practically all generalized entropy functionals appearing in the current literature can be deduced from the definition above (except the forms involving the distribution derivatives, like the Fisher form). For instance, for D(t) = t and φ(p) = −p ln p we recover the Shannon entropy

H_S(µ) = −∫ dx f(x|µ) ln f(x|µ).   (4)

Let us consider the problem of maximizing H(µ) under two natural constraints: the normalization

∫ dx f(x|µ) = 1   (5)

and some known average θ(µ) of a conditional response function θ(µ|x),

θ(µ) = \overline{θ(µ|x)} = ∫ dx θ(µ|x) f(x|µ).   (6)

Here the overbar denotes the corresponding average taken with the distribution f(x|µ). Usually the average θ(µ) corresponds to the observed behavior (e.g. experimental data), and the conditional function θ(µ|x) represents our theoretical estimation (a model) of this behavior for a given state x of the random environment. The latter is characterized by a probability distribution f(x|µ) depending on the "driving" parameter µ (for instance, this could be an external field, pressure or chemical potential). Thus we have to find a variation of the following Lagrangian

L = H(µ) − κ ∫ dx f(x|µ) − λ ∫ dx θ(µ|x) f(x|µ),   (7)

where κ and λ are the Lagrange multipliers which should be found from the constraints (5) and (6).
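As a concrete illustration of the constrained maximization scheme above, the problem can be solved numerically on a discrete grid; the sketch below (in Python) does this for the Shannon case, where the maximizer takes the form f_i ∝ exp(−λθ_i) and the multiplier λ is fixed by matching the constrained average. The grid, the linear constraint θ(µ|x) = x and the target average 2.0 are arbitrary choices for illustration only.

```python
# Numerical sketch of the constrained maximization for the Shannon case on a
# discrete grid: the maximizer is f_i ∝ exp(-lam*theta_i), and the Lagrange
# multiplier lam is fixed by bisection so that the constrained average matches
# the "observed" value theta_bar.
import numpy as np

def maxent_distribution(theta_vals, theta_bar, lam_lo=-50.0, lam_hi=50.0):
    """Find f_i = exp(-lam*theta_i)/Z such that sum(f*theta) = theta_bar."""
    def constrained_avg(lam):
        z = -lam * theta_vals
        w = np.exp(z - z.max())        # shift the exponent for stability
        f = w / w.sum()                # normalization constraint on the grid
        return f @ theta_vals, f
    for _ in range(200):               # bisection: the average decreases in lam
        mid = 0.5 * (lam_lo + lam_hi)
        avg, f = constrained_avg(mid)
        if avg > theta_bar:
            lam_lo = mid
        else:
            lam_hi = mid
    return f, mid

x = np.linspace(0.01, 10.0, 2000)
f, lam = maxent_distribution(theta_vals=x, theta_bar=2.0)  # theta(mu|x) = x
print(f @ x)                           # constrained average, close to 2.0
print(-np.sum(f * np.log(f)))          # maximized (discrete) Shannon entropy
```

For the linear constraint used here the result is the discrete analogue of an exponential distribution, in line with the remark after eq. (9).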
The variation procedure δL/δf(x|µ) = 0 leads to

D′(Φ) φ′(f(x|µ)) = κ + λ θ(µ|x).   (8)

In the case of the Shannon entropy H(µ) = H_S(µ) (4) this procedure leads to an apparently exponential distribution

f(x|µ) = exp(−1 − κ − λ θ(µ|x)).   (9)

Nevertheless, if the relevant variable is x, then the actual form of the distribution (9) depends on the form of the constrained function θ(µ|x). For instance, as we have demonstrated [10,11], if θ(µ|x) is of logarithmic form, then (9) transforms into a Γ-distribution. Physically this corresponds to a non-equilibrium stationary state whose thermodynamic entropy is held at a given distance from the maximum corresponding to equilibrium. For θ(µ|x) = g(µ)x^α we obtain from (9) a stretched exponential distribution. This means that we can associate exponential distributions with the Shannon entropy only in the case of a linear constraint, i.e. θ(µ|x) = g(µ)x.
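The way the constrained function θ(µ|x) sets the family of distributions in (9) can be checked directly. The sketch below assumes, purely for illustration, a mixed linear-plus-logarithmic constraint θ = x − a ln x (an assumed form, not one taken from refs. [10,11]) and verifies numerically that exp(−λθ) then coincides with a Γ-distribution of shape λa + 1 and scale 1/λ.

```python
# Sketch: the constrained function theta(mu|x) fixes the family in (9).
# ASSUMED illustrative constraint theta = x - a*ln(x); then
#   exp(-lam*theta) = x**(lam*a) * exp(-lam*x),
# i.e. a Gamma density with shape lam*a + 1 and scale 1/lam.
import numpy as np
from scipy.stats import gamma

x = np.linspace(0.01, 40.0, 4000)
dx = x[1] - x[0]
lam, a = 1.5, 2.0

theta = x - a * np.log(x)              # linear + logarithmic constraint
f = np.exp(-lam * theta)
f /= f.sum() * dx                      # normalize on the grid

g = gamma.pdf(x, lam * a + 1.0, scale=1.0 / lam)   # Gamma(shape=4, scale=2/3)
print(np.max(np.abs(f - g)))           # small: the two curves coincide
```

Replacing theta by x**alpha in the same script produces the stretched exponential family mentioned above.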

Minimal maximized entropy
As discussed above, we are dealing with the parametrized entropy H(µ). Let us now consider the variation of H(µ) with respect to the parameter µ. From eqs. (1) and (2) we obtain

dH(µ)/dµ = D′(Φ) dΦ/dµ,   (10)

dΦ/dµ = ∫ dx φ′(f(x|µ)) ∂f(x|µ)/∂µ.   (11)

Here D′(Φ) could be a function of µ. Anyway, it does not depend on x, since Φ[f(x|µ)] is a functional of the probability distribution and simultaneously a function of µ. Therefore, combining these two equations we obtain

dH(µ)/dµ = ∫ dx D′(Φ) φ′(f(x|µ)) ∂f(x|µ)/∂µ.   (12)

Using now the condition (8) for the maximum entropy we arrive at

dH(µ)/dµ = ∫ dx [κ + λ θ(µ|x)] ∂f(x|µ)/∂µ = λ ∫ dx θ(µ|x) ∂f(x|µ)/∂µ + κ (d/dµ) ∫ dx f(x|µ).

It is clear that the last term disappears because of the normalization condition (5), and we finally get the maximum entropy rate as

dH(µ)/dµ = λ ∫ dx θ(µ|x) ∂f(x|µ)/∂µ.   (13)

Now, differentiating the constraint (6) we obtain

dθ(µ)/dµ = \overline{∂θ(µ|x)/∂µ} + ∫ dx θ(µ|x) ∂f(x|µ)/∂µ.   (14)

Combining these two equations we arrive at

dH(µ)/dµ = λ [dθ(µ)/dµ − \overline{∂θ(µ|x)/∂µ}].   (15)

Therefore, we have a "universal" relation among the entropy rate, the observed behavior θ(µ) and the average of the model estimation θ(µ|x). This relation does not depend on the form of the entropic functional (the form (1) is quite general) and thus could be helpful for drawing conclusions of general validity. A brief discussion of (15) for the case of the Shannon form can be found in [10,11]. In any case, it is clear that the behavior of H(µ) depends on what is known/measured, θ(µ), and on our conditional estimation θ(µ|x). Thus it seems logically acceptable to search for a way of reducing the ambiguity. Namely, one can analyze the conditions for a minimum of H(µ),

dH(µ)/dµ = 0,  d²H(µ)/dµ² > 0.   (16)

In such a way we can find some specific values of the parameters at which the uncertainty on the values of the relevant variable x is minimal. Such a procedure might help to determine which "driving path" is more informative. Let us illustrate this with a simple example. Suppose that there are two relevant choices,

θ(µ) = µ^q,  θ(µ|x) = µ^α x,   (17)

where the value of α is to be estimated. Maximization of the Shannon entropy (4) under these conditions leads to

f(x|µ) = µ^{α−q} exp(−µ^{α−q} x),  \overline{x} = µ^{q−α},  λ = µ^{−q}.   (18)

Estimating the terms in the relation (15) we get

dH(µ)/dµ = λ [dθ(µ)/dµ − \overline{∂θ(µ|x)/∂µ}] = (q − α)/µ.   (19)

Analyzing the first condition in (16) we arrive at α → q. From the second condition of (16) we conclude that α → q from above or below, depending on the value of q.
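The relation (15) can also be verified numerically. The sketch below assumes, as an illustration, the power-law pair θ(µ) = µ^q and θ(µ|x) = µ^α x (an assumed choice), for which the maximum entropy solution (9) is exponential with mean µ^(q−α) and multiplier λ = µ^(−q); a finite-difference estimate of dH/dµ is then compared with the right-hand side of (15).

```python
# Numerical check of the entropy-rate relation (15),
#   dH/dmu = lam * ( d(theta(mu))/dmu - <d(theta(mu|x))/dmu> ),
# for the ASSUMED pair theta(mu) = mu**q, theta(mu|x) = mu**alpha * x.
# The maxent solution is then exponential with mean mu**(q - alpha),
# and the Lagrange multiplier is lam = mu**(-q).
import numpy as np

q, alpha = 1.5, 0.7
x = np.linspace(1e-6, 200.0, 400000)
dx = x[1] - x[0]

def H(mu):
    m = mu ** (q - alpha)               # constrained mean of x
    f = np.exp(-x / m) / m              # exponential maxent density
    return -np.sum(f * np.log(f)) * dx  # Shannon entropy on the grid

mu, h = 2.0, 1e-5
lhs = (H(mu + h) - H(mu - h)) / (2 * h)              # dH/dmu, central difference
lam = mu ** (-q)
dtheta_obs = q * mu ** (q - 1)                       # d(theta(mu))/dmu
mean_dtheta = alpha * mu ** (alpha - 1) * mu ** (q - alpha)  # average of d(theta(mu|x))/dmu
rhs = lam * (dtheta_obs - mean_dtheta)
print(lhs, rhs)                         # both close to (q - alpha)/mu
```

Both sides reduce analytically to (q − α)/µ for this choice, so the entropy rate vanishes as α → q, consistent with the discussion above.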
Based on these arguments and returning to (19), one can easily see that the minimum uncertainty conditions point towards choosing the model close to the observed behavior θ(µ). In this way the probability distribution becomes independent of the driving µ. Thus, the minima of H(µ) occurring at µ_1, ..., µ_n correspond to the minima of the observed function θ(µ), and this is equivalent to a minimization of the distribution width while fixing the positions of the distribution maxima at x = µ_1, ..., µ_n. Therefore, equation (15) together with the conditions (16) allows one to translate the specificities of the observed behavior θ(µ) into the amount of information on the relevant random variable x. Obviously, an interpretation of this correspondence depends on the nature of the processes encoded in the function appearing in the constraint (6).
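The correspondence between the minima of H(µ) and those of θ(µ) can be illustrated with an assumed quadratic model θ(µ|x) = (x − µ)² (a hypothetical choice, not taken from the text): the maximum entropy distribution is then Gaussian with mean µ and variance θ(µ), so H(µ) = ½ ln(2πe θ(µ)) and the minima of H(µ) sit exactly at the minima of the observed width θ(µ), with the distribution maxima fixed at x = µ.

```python
# Illustration of the claim that minima of H(mu) track minima of theta(mu):
# for the ASSUMED model theta(mu|x) = (x - mu)**2 the maxent density is
# Gaussian with mean mu and variance theta(mu), hence
#   H(mu) = 0.5 * ln(2*pi*e*theta(mu))
# is minimal exactly where theta(mu) is.
import numpy as np

mu = np.linspace(0.0, 4.0, 4001)
width = 1.0 + ((mu - 1.0) * (mu - 3.0)) ** 2   # assumed observed theta(mu); minima at mu = 1, 3
H = 0.5 * np.log(2 * np.pi * np.e * width)     # Gaussian entropy

# interior local minima of H(mu)
idx = np.where((H[1:-1] < H[:-2]) & (H[1:-1] < H[2:]))[0] + 1
print(mu[idx])                                 # minima at mu = 1 and mu = 3
```

Since the logarithm is monotone, H(µ) inherits every local minimum of the width θ(µ), which is the content of the statement above for this simple Gaussian case.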

Conclusion
A procedure for minimizing the maximized entropy of a quite general entropic functional with respect to an external driving force µ is proposed. It is demonstrated that for a conditional or parametrized probability distribution f(x|µ) there is a "universal" relation among the entropy rate and the functions appearing in the constraint. This relation allows one to translate the specificities of the observed behavior θ(µ) into the amount of information on the relevant random variable x at different values of the parameter µ. It is clear that we are not free to change the constraint in an ad hoc manner without some physical (or logical) reasons, because it encodes our partial information θ(µ) into what we are supposed to know (θ(µ|x)) and what we are searching for (f(x|µ)). In application to coupled dynamic-stochastic systems such an approach allows one to reduce the uncertainty in estimating the parameters of the dynamic counterpart (a probe or a model). This in turn permits estimating an optimal driving path that gives maximal information on the probability distribution f(x|µ) of the stochastic counterpart for a given probe/model θ(µ|x). It is found that the functional form of the model should be similar to the observed/measured one, θ(µ), while the minimum uncertainty is reached when the distribution becomes independent of the driving µ.