| J. Phys. A: Math. Theor. 40 No 47 (23 November 2007) F1021-F1030 |
| doi:10.1088/1751-8113/40/47/F02 |
| PII: S1751-8113(07)60372-2 |
Statistics of partial minima
E Ben-Naim1, M B Hastings1 and D Izraelevitz2
1 Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
2 Decision Applications Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
E-mail: ebn@lanl.gov, hastings@lanl.gov and izraelevitz@lanl.gov
Received 19 September 2007, in final form 10 October 2007
Published 6 November 2007
Abstract. We study pseudo-optimal solutions to multi-objective optimization problems by introducing partial minima, defined as follows. Point $\mathbf{x}$ $k$-dominates $\mathbf{x}'$ when at least $k$ of the coordinates of $\mathbf{x}$ are smaller than the corresponding coordinates of $\mathbf{x}'$. A point not $k$-dominated by any other point in the set is a $k$-minimum, or partial minimum, generalizing the global minimum. We study statistical properties of partial minima for a set of $N$ points independently distributed inside the $d$-dimensional unit hypercube using exact probabilistic methods and heuristic scaling techniques. The average number of partial minima, $A$, decays algebraically with the total number of points, $A \sim N^{-(d-k)/k}$, when $1 \le k < d$. Interestingly, there are $k-1$ distinct scaling laws characterizing the largest coordinates: the distribution $P(y_j)$ of the $j$th largest coordinate, $y_j$, decays algebraically, $P(y_j) \sim y_j^{-\alpha_j-1}$, with $\alpha_j = j(d-k)/(k-j)$ for $1 \le j \le k-1$. The average number of partial minima grows logarithmically, $A \simeq (\ln N)^{d-1}/(d-1)!$, when $k = d$. The full distribution of the number of minima is obtained in closed form in two dimensions.
PACS numbers: 02.50.Cw, 05.40.–a, 89.20.Ff, 89.75.Da
1. Introduction
A host of decisions in computer science, economics, politics and everyday life involve multiple criteria or multiple objectives [1–4]. A pedestrian choosing a walking path considers the distance, the number of turns and the number of traffic lights. In business, takeover bids are decided on a multitude of complex conditions in addition to the total monetary offer. In elections, voters examine how candidates stand on multiple issues.
In multi-objective optimization, a solution that is optimal with respect to all criteria is rarely possible and instead, one faces a set of choices that are suboptimal on at least one criterion. Decisions require algorithms to weed out clearly inferior choices, sort through all the remaining imperfect choices and evaluate the relative trade-offs between costs.
Since a global optimum is unlikely to exist in multi-objective optimization, we are interested in identifying points that are close to optimal. In this paper, we propose a pseudo-optimality criterion and derive the likelihood of finding pseudo-optimal solutions as a function of the number of choices.
By definition, a global optimum is superior in all cost dimensions. Intuitively, one may define a pseudo-optimum as superior to all alternatives along a large number of cost dimensions. For example, in a three-cost scenario, there may not be any choice that is optimal with respect to all three costs, but we may be able to find choices that are better than any alternative along two costs, see figure 1. In a voting scenario, no candidate may have the most attractive position on all issues to a given voter. In this case, a voter might naturally restrict her attention to a candidate, or candidates, who have the most attractive position on as many issues as possible.

| Figure 1. Illustration of partial domination and partial minima. There are three costs associated with A, B and C. A and C each dominate B on one cost, and thus B is not a 1-minimum. Neither A nor C dominates B on two costs, and thus B is a 2-minimum. |
Let us represent our choices as $N$ points in $d$ dimensions with coordinates $\mathbf{x} \equiv (x_1, x_2, \ldots, x_d)$. Each coordinate $x_i > 0$ represents a distinct cost. By convention, small values of $x_i$ are superior and are considered dominant. Partial minima, a formalization of the pseudo-optimum concept discussed above, are defined as follows. A point $\mathbf{x}$ is $k$-dominated by $\mathbf{x}'$ when at least $k$ of the coordinates of $\mathbf{x}$ are larger than the corresponding coordinates of $\mathbf{x}'$. A point is a partial minimum, or formally a $k$-minimum, when it is not $k$-dominated by any other point in the set, as illustrated in figure 1. Equivalently, a $k$-minimum is smaller than every other point on at least $d-k+1$ coordinates. We stress that a partial minimum need not dominate all other points on the same $d-k+1$ coordinates; it may dominate different points along different coordinates. The parameter $1 \le k \le d$ quantifies the quality of the partial minimum: a smaller $k$ represents a more stringent condition. The two extremes are the global minimum, $k = 1$, where every coordinate is the minimum of the point set, and the efficient set, $k = d$, which includes all points that are not obviously dominated by other points, as shown later in figure 3. Partial minima are conditional multivariate extrema, and their properties are amenable to analysis from a statistical physics perspective [5–9].
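The definition above translates directly into a brute-force check. The following is a minimal sketch, assuming points are given as equal-length tuples of floats; the function names `k_dominates` and `k_minima` are hypothetical, and the three sample points are made up for illustration (they are not the points of figure 1).

```python
from typing import List, Sequence

Point = Sequence[float]


def k_dominates(a: Point, b: Point, k: int) -> bool:
    """True if a k-dominates b: at least k coordinates of a are strictly
    smaller than the corresponding coordinates of b."""
    return sum(ai < bi for ai, bi in zip(a, b)) >= k


def k_minima(points: Sequence[Point], k: int) -> List[int]:
    """Indices of the k-minima: points not k-dominated by any other point.

    Brute force, O(N^2 d); adequate for small illustrative sets."""
    return [
        i
        for i, p in enumerate(points)
        if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical example with d = 3 costs per choice.
points = [(0.1, 0.5, 0.6), (0.3, 0.2, 0.4), (0.5, 0.6, 0.1)]
for k in (1, 2, 3):
    print(k, k_minima(points, k))
```

With these points there is no global minimum ($k = 1$), only the second point is a 2-minimum, and all three points are 3-minima, i.e. they form the efficient set. Relaxing the criterion (increasing $k$) can only enlarge the set of partial minima.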
In this study, we obtain exact statistical properties of partial minima, including the multivariate density and its asymptotic behavior, as well as scaling properties such as the typical size and the average number of minima. We present two major results. First, as a function of the set size $N$, the average number of minima decays algebraically when $1 \le k < d$ and grows logarithmically when $k = d$. Second, the $k-1$ largest coordinates obey $k-1$ different scaling laws, each a power-law distribution with its own exponent, while the remaining $d+1-k$ coordinates have distributions with sharp tails. We also discuss the relevance of these results to the multi-objective shortest path problem on graphs, a central problem in multi-objective optimization.
We consider the situation where there are no correlations between the coordinates, that is, each coordinate is drawn independently from some distribution. As discussed below, this situation is equivalent to a uniform distribution in the unit hypercube. Thus, we conveniently assume that $x_i$ is uniformly distributed in $[0, 1]$ for all $1 \le i \le d$.
2. Heuristic arguments
Elementary scaling laws for the typical size of a partial minimum and the average number of minima can be derived heuristically. We assume that (i) the partial minimum is dominant on a fixed set of $k$ coordinates and (ii) all its coordinates are equal, $x_i = x$ for all $i$. By the definition of a partial minimum, the corresponding $k$-dimensional hypercube $[0, x]^k$ contains only the partial minimum itself, since any other point inside it would $k$-dominate it. The volume of this hypercube is $x^k$, and the expected number of points inside it must be of order 1, $N x^k \sim 1$. Consequently, the typical size $x$ decays algebraically with $N$,
$$x \sim N^{-1/k}. \tag{1}$$
This characteristic scale decreases as the minimum condition becomes more stringent, that is, as $k$ decreases.
The expected number of partial minima,
$$A \sim N x^d \sim N^{-(d-k)/k}, \tag{2}$$
follows from the expected number of points inside the $d$-dimensional hypercube with linear dimension $x$, namely $N x^d$. When $k < d$, partial minima are asymptotically rare and the scale (1) decays indefinitely. Furthermore, for large $N$ a partial minimum exists only with small probability, and then there is typically just one. The scaling estimate (2) coincides with the exact value $A = N^{-(d-1)}$ for $k = 1$, since any given point is a global minimum with probability $N^{-d}$: it must be the smallest in each of the $d$ independent coordinates, each with probability $1/N$. For $k = d$, the point with the minimum value of any one coordinate is a partial minimum, so there is always at least one partial minimum. Indeed, the decay exponent $(d-k)/k$ in (2) vanishes. This special case is discussed separately.
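The scaling estimate for the average number of partial minima can be probed numerically. The following Monte Carlo sketch draws $N$ uniform points in the unit hypercube and counts $k$-minima by brute force; it assumes only Python's standard `random` module, the function names and trial counts are hypothetical choices, and the statistical error decreases only as the inverse square root of the number of trials.

```python
import random
from typing import List, Sequence


def count_k_minima(points: Sequence[Sequence[float]], k: int) -> int:
    """Number of points not k-dominated by any other point (brute force)."""
    count = 0
    for i, p in enumerate(points):
        dominated = any(
            sum(qi < pi for qi, pi in zip(q, p)) >= k
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            count += 1
    return count


def average_k_minima(d: int, k: int, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of A_{d,k}(N) for N = n uniform points in [0, 1]^d."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts: List[List[float]] = [[rng.random() for _ in range(d)] for _ in range(n)]
        total += count_k_minima(pts, k)
    return total / trials


# For k = 1 the exact value is N^{-(d-1)}, e.g. 1/4 for d = 2 and N = 4.
print(average_k_minima(d=2, k=1, n=4, trials=20000))
```

For $k = 1$ the estimate should approach the exact value $N^{-(d-1)}$; for $k = d$ every sample contains at least one partial minimum; for $1 < k < d$ the decay with $N$ can be compared with $N^{-(d-k)/k}$, keeping in mind that this is an asymptotic statement with an unspecified prefactor.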
3. The density of minima
The density $P_{d,k}(\mathbf{x})$ of $k$-minima located at $\mathbf{x}$ is obtained analytically through a formal generalization of the heuristic argument above. For example, in two dimensions, the density is
$$P_{2,1}(x_1, x_2) = N\left[1 - (x_1 + x_2 - x_1 x_2)\right]^{N-1}, \qquad P_{2,2}(x_1, x_2) = N\left(1 - x_1 x_2\right)^{N-1}.$$
The factor $N$ is the number of ways to choose the minimum, and the second factor guarantees that the remaining $N-1$ points do not $k$-dominate the minimum at $(x_1, x_2)$. These points must not fall inside an L-shaped region of area $x_1 + x_2 - x_1 x_2$, or equivalently $1 - (1 - x_1)(1 - x_2)$, when $k = 1$, or inside a rectangle of area $x_1 x_2$ when $k = 2$, as illustrated in figure 2.

| Figure 2. Illustration of the excluded area for a global minimum (k = 1, left) and points on the efficient set (k = 2, right) in two dimensions. Points in the gray region k-dominate the distinguished point. |
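In general, the density of minima
$$P_{d,k}(\mathbf{x}) = N\left[1 - G_{d,k}(\mathbf{x})\right]^{N-1}$$
reflects that the $N-1$ other points are excluded from a $d$-dimensional region of volume $G_{d,k}(\mathbf{x})$. The excluded volume obeys the recursion
$$G_{d,k}(\mathbf{x}) = x_d\, G_{d-1,k-1}(\mathbf{x}) + (1 - x_d)\, G_{d-1,k}(\mathbf{x}). \tag{4}$$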
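In our notation, the dimensional index of a function dictates the dimension of its vectorial argument, so the vectors on the right-hand side of (4) have $d-1$ components, $(x_1, \ldots, x_{d-1})$. We obtain the recursion relation (4) by separating the excluded region into two parts: one in which the $d$th coordinate of the excluded point is smaller than $x_d$ (probability $x_d$, after which only $k-1$ of the remaining $d-1$ coordinates need to be smaller), and one in which it is not (probability $1 - x_d$, after which $k$ of the remaining coordinates must be smaller). Using the boundary conditions $G_{d,0} = 1$ and $G_{d,k} = 0$ when $k > d$, we recover $G_{1,1} = x_1$ and $G_{2,1} = x_1 + x_2 - x_1 x_2$. Furthermore,
$$G_{3,2}(\mathbf{x}) = x_1 x_2 + x_1 x_3 + x_2 x_3 - 2 x_1 x_2 x_3.$$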
In general, $G_{d,d} = \prod_{i=1}^{d} x_i$ and $G_{d,1} = 1 - \prod_{i=1}^{d} (1 - x_i)$.
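The recursion $G_{d,k} = x_d G_{d-1,k-1} + (1 - x_d) G_{d-1,k}$ and its boundary conditions are easy to check numerically. In the sketch below (function names hypothetical), `excluded_volume` implements the recursion directly, while `excluded_volume_direct` computes the same volume as the probability that a uniform point is smaller than $\mathbf{x}$ on at least $k$ coordinates, by summing over the exact subsets of coordinates on which it is smaller. Agreement of the two is a consistency check, not a proof.

```python
from itertools import combinations
from math import prod
from typing import Sequence


def excluded_volume(x: Sequence[float], k: int) -> float:
    """G_{d,k}(x) from the recursion, with G_{d,0} = 1 and G_{d,k} = 0 for k > d."""
    d = len(x)
    if k <= 0:
        return 1.0
    if k > d:
        return 0.0
    head, xd = x[:-1], x[-1]
    # The d-th coordinate of the excluded point is below x_d (probability x_d)
    # or it is not (probability 1 - x_d).
    return xd * excluded_volume(head, k - 1) + (1 - xd) * excluded_volume(head, k)


def excluded_volume_direct(x: Sequence[float], k: int) -> float:
    """Probability that a uniform point is smaller than x on at least k coordinates."""
    d = len(x)
    total = 0.0
    for m in range(max(k, 0), d + 1):
        for subset in combinations(range(d), m):
            total += prod(x[i] if i in subset else 1 - x[i] for i in range(d))
    return total


x = (0.2, 0.5, 0.7)
# Close to 0.45 = x1x2 + x1x3 + x2x3 - 2 x1x2x3 for this x.
print(excluded_volume(x, 2), excluded_volume_direct(x, 2))
```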
4. Scaling
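In the limit $N \to \infty$, the relevant coordinates are small, $x_i \sim 1/N$, so the product term $x_1 x_2$ in $P_{2,1} = N[1 - (x_1 + x_2 - x_1 x_2)]^{N-1}$ is negligible compared with the linear term $x_1 + x_2$. Using $(1 - G)^{N-1} \simeq e^{-NG}$ for small $G$, we thus obtain
$$P_{2,1}(x_1, x_2) \simeq N \exp\left[-N(x_1 + x_2)\right].$$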
Generally, only the terms of degree $k$ are asymptotically relevant, and the leading behavior is
$$P_{d,k}(\mathbf{x}) \simeq N \exp\left[-N F_{d,k}(\mathbf{x})\right]. \tag{5}$$
The auxiliary function $F_{d,k}(\mathbf{x})$ contains $\binom{d}{k}$ terms, each a distinct product of degree $k$. For example,
$$F_{3,2}(\mathbf{x}) = x_1 x_2 + x_1 x_3 + x_2 x_3.$$
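The auxiliary function equals the sum, $F_{d,1} = \sum_{i=1}^{d} x_i$, and the product, $F_{d,d} = \prod_{i=1}^{d} x_i$, in the two extremes. The function $F_{d,k}(\mathbf{x})$ is defined recursively,
$$F_{d,k}(\mathbf{x}) = x_d\, F_{d-1,k-1}(\mathbf{x}) + F_{d-1,k}(\mathbf{x}), \tag{6}$$
for $1 \le k \le d$, with the boundary condition $F_{0,k} = \delta_{k,0}$. This recursion follows from (4) by dropping the higher-degree term $x_d G_{d-1,k}(\mathbf{x})$.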
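The recursion $F_{d,k} = x_d F_{d-1,k-1} + F_{d-1,k}$ with $F_{0,k} = \delta_{k,0}$ generates the elementary symmetric polynomial of degree $k$ in the coordinates, which is why $F_{d,k}$ has $\binom{d}{k}$ terms. A minimal sketch (hypothetical function names) compares the recursion with an explicit sum over all $k$-element subsets; setting every $x_i = 1$ counts the terms.

```python
from itertools import combinations
from math import comb, prod
from typing import Sequence


def leading_form(x: Sequence[float], k: int) -> float:
    """F_{d,k}(x) from F_{d,k} = x_d F_{d-1,k-1} + F_{d-1,k}, F_{0,k} = delta_{k,0};
    a negative degree k contributes 0."""
    if k < 0:
        return 0.0
    if len(x) == 0:
        return 1.0 if k == 0 else 0.0
    return x[-1] * leading_form(x[:-1], k - 1) + leading_form(x[:-1], k)


def leading_form_direct(x: Sequence[float], k: int) -> float:
    """Sum over all k-element subsets of the product of the chosen coordinates."""
    return sum(prod(x[i] for i in s) for s in combinations(range(len(x)), k))


x = (0.2, 0.5, 0.7)
print(leading_form(x, 2), leading_form_direct(x, 2))  # both approximately 0.59
print(leading_form((1.0,) * 5, 2), comb(5, 2))  # number of degree-2 terms for d = 5
```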
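The asymptotic behavior (5) can be recast in the scaling form
$$P_{d,k}(\mathbf{x}) \simeq N\, \Phi_{d,k}\!\left(\mathbf{x} N^{1/k}\right) \tag{7}$$
as $N \to \infty$, since $F_{d,k}$ is homogeneous of degree $k$ and hence $N F_{d,k}(\mathbf{x}) = F_{d,k}(\mathbf{x} N^{1/k})$. The scaling variable is $\mathbf{z} = \mathbf{x} N^{1/k}$, in accord with (1), and the scaling function is
$$\Phi_{d,k}(\mathbf{z}) = \exp\left[-F_{d,k}(\mathbf{z})\right]. \tag{8}$$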
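The average number of $k$-minima equals the integral of the density, $A_{d,k} = \int d\mathbf{x}\, P_{d,k}(\mathbf{x})$, where $\int d\mathbf{x} \equiv \prod_{i=1}^{d} \int_0^1 dx_i$. When $k < d$, the asymptotic behavior of the average follows from the scaling form (7), since the change of variables $\mathbf{x} = \mathbf{z} N^{-1/k}$ contributes a factor $N^{-d/k}$:
$$A_{d,k} \simeq a_{d,k}\, N^{-(d-k)/k},$$
in agreement with (2). The proportionality constant $a_{d,k}$ equals the integral of the scaling function, $a_{d,k} = \int d\mathbf{z}\, \Phi_{d,k}(\mathbf{z})$, although now the integration range is unrestricted, $\int d\mathbf{z} \equiv \prod_{i=1}^{d} \int_0^\infty dz_i$. The prefactor is trivial for global minima, $a_{d,1} = 1$; otherwise, it can be obtained analytically only in a few exceptional cases.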
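As a small check of $A_{d,k} = \int d\mathbf{x}\, P_{d,k}(\mathbf{x})$, the two-dimensional densities $N[1 - G_{2,k}]^{N-1}$, with $G_{2,1} = x_1 + x_2 - x_1 x_2$ and $G_{2,2} = x_1 x_2$, can be integrated numerically. The sketch below uses a plain midpoint rule (an arbitrary choice of quadrature; the grid size is a hypothetical default). The exact values it should reproduce are $A_{2,1} = 1/N$ for global minima and $A_{2,2} = H(N)$, the harmonic number obtained in the section on efficient sets.

```python
def average_minima_2d(n: int, k: int, grid: int = 400) -> float:
    """Midpoint-rule estimate of A_{2,k}(N) = N * integral of [1 - G_{2,k}]^(N-1)
    over the unit square, for k = 1 (global minima) or k = 2 (efficient set)."""
    if k not in (1, 2):
        raise ValueError("k must be 1 or 2 in two dimensions")
    h = 1.0 / grid
    mids = [(i + 0.5) * h for i in range(grid)]
    total = 0.0
    for x1 in mids:
        for x2 in mids:
            # Excluded area: L-shaped region (k = 1) or rectangle (k = 2).
            g = x1 + x2 - x1 * x2 if k == 1 else x1 * x2
            total += n * (1.0 - g) ** (n - 1)
    return total * h * h


n = 5
print(average_minima_2d(n, 1), 1 / n)
print(average_minima_2d(n, 2), sum(1 / m for m in range(1, n + 1)))
```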
5. Extreme statistics
Since global minima are constrained along all cost coordinates, extremely large costs are exponentially rare, whenever such a global minimum exists. Because we have relaxed the minimality condition, this may not necessarily be the case for partial minima. In our voting example, a candidate who is attractive to a voter on a multitude of issues may be extremely unattractive on a particular one. How likely is such a scenario?
We begin our study of extremal statistics [10–12] by considering the distribution of the largest coordinate of a partial minimum. Without loss of generality, we order the coordinates $x_1 < x_2 < \cdots < x_{d-1} < x_d$. Our focus is on the tail of the distribution of $x_d$, corresponding to the regime $x_d \gg x_{d-1}$. We also restrict our attention to the limit $N \to \infty$. The distribution $Q_1(x_d)$ of the largest coordinate $x_d$ equals the integral of the multivariate distribution with respect to the remaining coordinates,
$$\begin{aligned}
Q_1(x_d) &= \int dx_1 \cdots dx_{d-1}\, P_{d,k}(\mathbf{x}) \\
&\simeq N \int dx_1 \cdots dx_{d-1}\, \exp\left[-N F_{d,k}(\mathbf{x})\right] \\
&\simeq N \int dx_1 \cdots dx_{d-1}\, \exp\left[-N x_d\, F_{d-1,k-1}(\mathbf{x})\right] \\
&\sim x_d^{-1}\, (N x_d)^{-(d-k)/(k-1)}.
\end{aligned} \tag{9}$$
The second line is obtained by substituting the leading asymptotic behavior (5), and the third line reflects that only the first term in (6) is relevant when $x_d \gg x_i$ for all $i < d$. Our last step is to multiply and divide the third line by $x_d$ and then invoke the scaling law (2) for the average number of $(k-1)$-minima in $d-1$ dimensions, $A_{d-1,k-1}(N x_d) \sim (N x_d)^{-(d-k)/(k-1)}$, with $N x_d$ playing the role of the number of points. In essence, when one of the coordinates is very large, the partial-minimum criterion involves one fewer constraint in one fewer dimension. The power-law decay of the distribution (9) shows that there is a substantial likelihood that $x_d$ is relatively large.
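The distribution $Q_2(x_{d-1})$ of the second-largest coordinate $x_{d-1}$ is obtained from the bivariate distribution $R(x_{d-1}, x_d)$ of the two largest coordinates. When $x_d, x_{d-1} \gg x_i$ for all $i < d-1$, the same reasoning gives
$$R(x_{d-1}, x_d) \sim (x_{d-1} x_d)^{-1}\, (N x_{d-1} x_d)^{-(d-k)/(k-2)}.$$
The distribution $Q_2(x_{d-1})$ equals the integral of the bivariate distribution with respect to the largest coordinate, $Q_2(x_{d-1}) = \int_{x_{d-1}}^{1} dx_d\, R(x_{d-1}, x_d)$. This integral is dominated by the divergence at the lower limit of integration, and consequently,
$$Q_2(x_{d-1}) \sim N^{-(d-k)/(k-2)}\, x_{d-1}^{-1-2(d-k)/(k-2)}.$$
The power-law tail is now steeper.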
A similar calculation applies to the distributions of the $k-1$ largest coordinates. In general, the distribution $Q_j(y_j)$ of the $j$th largest coordinate, $y_j \equiv x_{d+1-j}$, decays as a power law,
$$Q_j(y_j) \sim N^{-\alpha_j/j}\, y_j^{-\alpha_j-1},$$
for $1 \le j \le k-1$. The decay exponent increases monotonically with the index $j$,
$$\alpha_j = \frac{j(d-k)}{k-j}.$$
We can verify the decay law (2) using $A \sim \int_{N^{-1/k}} dy_j\, Q_j(y_j) \sim N^{-\alpha_j/j + \alpha_j/k} = N^{-(d-k)/k}$, where the lower limit of integration is set by the typical size scale (1). Interestingly, there are $k-1$ distinct scaling behaviors for the $k-1$ largest coordinates. Each of these extremal coordinates is distributed according to a power law characterized by a distinct exponent.
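The exponent bookkeeping in this verification can be checked with exact rational arithmetic. The sketch below (hypothetical function names) assumes the tail $Q_j(y_j) \sim N^{-\alpha_j/j} y_j^{-\alpha_j-1}$ with $\alpha_j = j(d-k)/(k-j)$ and a lower cutoff at $y_j \sim N^{-1/k}$, and confirms that the resulting exponent of $N$ equals $-(d-k)/k$ for every admissible $j$.

```python
from fractions import Fraction


def alpha(d: int, k: int, j: int) -> Fraction:
    """Tail exponent alpha_j = j (d - k) / (k - j), valid for 1 <= j <= k - 1."""
    if not 1 <= j <= k - 1:
        raise ValueError("alpha_j requires 1 <= j <= k - 1")
    return Fraction(j * (d - k), k - j)


def average_exponent_from_tail(d: int, k: int, j: int) -> Fraction:
    """Exponent of N in the integral of N^(-alpha_j/j) y^(-alpha_j-1) from N^(-1/k):
    the lower cutoff contributes a factor N^(alpha_j/k)."""
    a = alpha(d, k, j)
    return -a / j + a / k


for d in range(3, 8):
    for k in range(2, d):
        for j in range(1, k):
            assert average_exponent_from_tail(d, k, j) == Fraction(-(d - k), k)
print([alpha(4, 3, j) for j in (1, 2)])  # [Fraction(1, 2), Fraction(2, 1)]
```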
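This multiscaling behavior affects the moments $\langle y_j^m \rangle$, defined as $\langle y_j^m \rangle = I_m / I_0$, where $I_m = \int_{N^{-1/k}}^{1} dy_j\, y_j^m Q_j(y_j)$. The integral $I_m$ is dominated by the divergence at the lower cutoff when the order is small, $m \le \alpha_j$, but otherwise $I_m$ is finite. Consequently, the moments have the following scaling dependences on $N$:
$$\langle y_j^m \rangle \sim \begin{cases} N^{-m/k}, & m < \alpha_j, \\ N^{-\alpha_j/k} \ln N, & m = \alpha_j, \\ N^{-\alpha_j/k}, & m > \alpha_j. \end{cases}$$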
Low-order moments exhibit ordinary scaling behavior, as they are characterized by the typical size scale (1) that underlies the multivariate distribution function (8). As usual, there is a logarithmic correction at the crossover. High-order moments all scale as $N^{-\alpha_j/k}$, independently of the order $m$, an indication that there is a significant probability that the extreme coordinates are of order 1. Interestingly, the average sizes of the different coordinates may follow different scaling laws. For example, there are two scaling laws, $\langle y_1 \rangle \sim N^{-1/6}$ and $\langle y_2 \rangle \sim N^{-1/3}$, when $d = 4$ and $k = 3$. Of course, the sum $\sum_{i=1}^{d} x_i$ has the same extremal statistics as $x_d$.
The crossover moment, or equivalently the exponent $\alpha_j$, diverges as $j \to k$. Therefore, the smallest $d+1-k$ coordinates exhibit ordinary scaling behavior,
$$\langle y_j^m \rangle \sim N^{-m/k},$$
for $k \le j \le d$, and all moments of the respective distribution functions are finite. In these cases, the distribution functions $Q_j$ have tails that are as sharp as or sharper than an exponential. In the aforementioned case $d = 4$ and $k = 3$, the third and fourth largest coordinates exhibit ordinary scaling, $\langle y_3 \rangle \sim \langle y_4 \rangle \sim N^{-1/3}$.
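The moment scaling laws of this section can be collected into a small lookup. The sketch below (a hypothetical helper, valid for $k < d$) returns the exponent $e$ in $\langle y_j^m \rangle \sim N^{e}$ together with a flag for the extra $\ln N$ factor at the crossover, using $\alpha_j = j(d-k)/(k-j)$ for $j < k$ and ordinary scaling $N^{-m/k}$ for $j \ge k$.

```python
from fractions import Fraction
from typing import Tuple


def moment_exponent(d: int, k: int, j: int, m) -> Tuple[Fraction, bool]:
    """Return (e, log) such that <y_j^m> ~ N^e, times ln N when log is True.

    Assumes 1 <= k < d, 1 <= j <= d and m > 0 (int or Fraction)."""
    if not 1 <= k < d or not 1 <= j <= d:
        raise ValueError("requires 1 <= k < d and 1 <= j <= d")
    m = Fraction(m)
    if j >= k:  # the smallest d + 1 - k coordinates: ordinary scaling
        return -m / k, False
    a = Fraction(j * (d - k), k - j)
    if m < a:  # lower cutoff dominates: ordinary scaling
        return -m / k, False
    return -a / k, m == a  # m-independent exponent, with ln N at m = alpha_j


# d = 4, k = 3: <y_1> ~ N^(-1/6), while <y_2>, <y_3>, <y_4> ~ N^(-1/3).
for j in range(1, 5):
    print(j, moment_exponent(4, 3, j, 1))
```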
6. Efficient sets
The points that are not dominated on all coordinates by any other point are exactly the partial minima with $k = d$ (figure 3). We refer to this set as the 'efficient set'. The efficient set, also termed the efficient frontier or Pareto equilibria, plays a central role in multi-objective optimization and has been studied extensively in economics, computer science, operations research and game theory [13, 14]. Since there is no objective trade-off between costs, every point in the efficient set is potentially a solution to the multi-objective optimization problem. The study of the properties of efficient sets was the original motivation for our research.

| Figure 3. Illustration of the efficient set in two dimensions. Filled squares are on the efficient set and unfilled squares are not. Only four of the filled squares are on the convex hull. |
In the special case $k = d$, the expected size of the efficient set, $E_d(N) \equiv A_{d,d}(N)$, obeys the recursion
$$E_d(N) = E_d(N-1) + \frac{E_{d-1}(N)}{N}. \tag{16}$$
The point with the largest $x_d$ coordinate cannot dominate any other point on all coordinates, so its presence does not affect whether the other $N-1$ points belong to the efficient set; hence the first term. Furthermore, this point is on the efficient set if and only if its remaining $d-1$ coordinates are not dominated by those of any other point. By symmetry, this event occurs with probability $E_{d-1}(N)/N$, and hence the second term in the recursion. We note that the recursion (16) can also be obtained by performing the integration over $x_d$ in $E_d(N) = N \int d\mathbf{x}\, [1 - x_1 x_2 \cdots x_d]^{N-1}$. This integration is analytically feasible only if $k = 1$ or $k = d$.
The recursion relation (16) is subject to the boundary condition $E_1(N) = 1$. In two dimensions,
$$E_2(N) = E_2(N-1) + \frac{1}{N},$$
or alternatively $E_2(N) = H(N)$, where
$$H(N) = \sum_{n=1}^{N} \frac{1}{n}$$
is the harmonic number. The average size of the efficient set grows logarithmically, $E_2(N) = \ln N + \gamma + \cdots$, where $\gamma \approx 0.57721$ is Euler's constant. In three dimensions, we have $E_3(N) = \sum_{n=1}^{N} H(n)/n$, and asymptotically, $E_3(N) \simeq \frac{1}{2}(\ln N)^2$. The large-$N$ behavior is obtained in general by converting the difference equation (16) into the differential equation $dE_d/dN = E_{d-1}/N$. The expected size of the efficient set grows as a power of $\ln N$,
$$E_d(N) \simeq \frac{(\ln N)^{d-1}}{(d-1)!}. \tag{18}$$
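The recursion $E_d(N) = E_d(N-1) + E_{d-1}(N)/N$, with $E_d(0) = 0$ and $E_1(N) = 1$, yields exact rational values. The memoized sketch below (hypothetical function names) uses Python's `fractions.Fraction` and `functools.lru_cache`; its output can be compared with the harmonic number in two dimensions and, in three dimensions, with the elementary identity $\sum_{n=1}^{N} H(n)/n = [H(N)^2 + H^{(2)}(N)]/2$, where $H^{(2)}(N) = \sum_{n=1}^{N} n^{-2}$.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def efficient_set_size(d: int, n: int) -> Fraction:
    """Exact E_d(N) from E_d(N) = E_d(N-1) + E_{d-1}(N)/N,
    with E_d(0) = 0 and E_1(N) = 1 for N >= 1."""
    if n == 0:
        return Fraction(0)
    if d == 1:
        return Fraction(1)
    return efficient_set_size(d, n - 1) + efficient_set_size(d - 1, n) / n


def harmonic(n: int, power: int = 1) -> Fraction:
    """Generalized harmonic number H^{(power)}(n) = sum_{m=1}^n m^(-power)."""
    return sum((Fraction(1, m**power) for m in range(1, n + 1)), Fraction(0))


print(efficient_set_size(2, 10) == harmonic(10))  # True
print(efficient_set_size(3, 10) == (harmonic(10) ** 2 + harmonic(10, 2)) / 2)  # True
print(float(efficient_set_size(3, 200)))
```

For moderate $N$ the asymptotic form (18) is approached slowly, since the corrections are smaller only by factors of $1/\ln N$.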
This logarithmic growth reflects that the integral of the scaling function, $\int d\mathbf{z}\, \Phi_{d,d}(\mathbf{z})$, diverges at the upper limit. A straightforward generalization of the calculation above shows that the distribution of the extremal coordinates has a logarithmic correction,
for 1 ≤ j ≤ d – 1. We can verify that the average number of points is consistent with the exact behavior
as in 18. The crossover moment vanishes and the moments decay logarithmically,
where m > 0 and 1 ≤ j ≤ d – 1.
7. Two dimensions
For the case d = 2, we obtain closed form expressions for the distribution function of partial minima. This permits us to establish central limit-type behaviors for the distribution of the size of the efficient set.
In two dimensions, the probability $P_n(N)$ that the efficient set includes exactly $n$ points, where $1 \le n \le N$, satisfies the recursion
$$P_n(N) = \left(1 - \frac{1}{N}\right) P_n(N-1) + \frac{1}{N}\, P_{n-1}(N-1), \tag{21}$$
subject to the boundary condition $P_n(0) = \delta_{n,0}$. On the square there are two coordinates, $x_1$ and $x_2$. We can derive (21) by the same reasoning behind (16): the point with the largest $x_2$ coordinate is on the efficient set if and only if its $x_1$ coordinate is minimal, an event that occurs with probability $1/N$.
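The ordering argument behind this recursion also suggests a fast way to find the two-dimensional efficient set. The sketch below is the standard sort-and-sweep approach (not taken from the paper; function names hypothetical): after sorting by $x_1$, a point is on the efficient set exactly when its $x_2$ is smaller than that of every point with smaller $x_1$, so each efficient point is a running record minimum of $x_2$. It assumes distinct coordinates, which holds with probability one for continuous distributions, and is checked against the brute-force definition.

```python
import random
from typing import List, Sequence, Tuple


def efficient_set_2d(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of points not dominated in both coordinates, in O(N log N)."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    best_x2 = float("inf")
    result = []
    for i in order:  # increasing x1
        if points[i][1] < best_x2:  # new record minimum of x2
            result.append(i)
            best_x2 = points[i][1]
    return sorted(result)


def efficient_set_brute(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Direct definition: no other point is smaller in both coordinates."""
    return [
        i
        for i, (a, b) in enumerate(points)
        if not any(c < a and e < b for j, (c, e) in enumerate(points) if j != i)
    ]


rng = random.Random(3)
pts = [(rng.random(), rng.random()) for _ in range(1000)]
# On average H(1000), about 7.49; individual samples fluctuate.
print(len(efficient_set_2d(pts)))
```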
Recursion equations for the average $E(N) = \langle n \rangle$ and the variance $V(N) = \langle n^2 \rangle - \langle n \rangle^2$, with $\langle f(n) \rangle \equiv \sum_{n=1}^{N} f(n) P_n(N)$, are obtained by summing (21). The average satisfies $E(N) = E(N-1) + N^{-1}$, in accord with (16), and the variance satisfies $V(N) = V(N-1) + N^{-1} - N^{-2}$. Thus, the variance equals the difference between the first and the second harmonic numbers,
$$V(N) = H(N) - H^{(2)}(N),$$
where $H^{(2)}(N) = \sum_{n=1}^{N} n^{-2}$. The variance and the average have identical leading asymptotic behaviors, $V(N) \simeq E(N) \simeq \ln N$.
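With the transformation $P_n(N) = s_n(N)/N!$, the auxiliary function $s_n(N)$ satisfies the recursion
$$s_n(N) = (N-1)\, s_n(N-1) + s_{n-1}(N-1),$$
with $s_n(0) = \delta_{n,0}$. This recursion defines the unsigned Stirling numbers of the first kind, $\genfrac{[}{]}{0pt}{}{N}{n}$ [15], so $s_n(N) = \genfrac{[}{]}{0pt}{}{N}{n}$. Therefore, the full probability distribution is expressed in closed form,
$$P_n(N) = \frac{1}{N!} \genfrac{[}{]}{0pt}{}{N}{n},$$
for $0 \le n \le N$.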
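The closed form can be checked against recursion (21) with exact arithmetic. In the sketch below (hypothetical function names), `size_distribution` iterates $P_n(N) = (1 - 1/N) P_n(N-1) + P_{n-1}(N-1)/N$ from $P_n(0) = \delta_{n,0}$, `stirling_row` builds the unsigned Stirling numbers of the first kind from $c(N, n) = (N-1)\,c(N-1, n) + c(N-1, n-1)$ with $c(0, 0) = 1$, and the mean and variance are compared with $H(N)$ and $H(N) - H^{(2)}(N)$.

```python
from fractions import Fraction
from math import factorial
from typing import List


def size_distribution(n_points: int) -> List[Fraction]:
    """P_n(N) for n = 0..N from recursion (21), starting at P_n(0) = delta_{n,0}."""
    p = [Fraction(1)]  # N = 0
    for N in range(1, n_points + 1):
        prev = p + [Fraction(0)]  # P_n(N-1) for n = 0..N
        p = [
            (1 - Fraction(1, N)) * prev[n] + (Fraction(1, N) * prev[n - 1] if n > 0 else 0)
            for n in range(N + 1)
        ]
    return p


def stirling_row(N: int) -> List[int]:
    """Unsigned Stirling numbers of the first kind c(N, n) for n = 0..N."""
    row = [1]  # c(0, 0) = 1
    for a in range(1, N + 1):
        prev = row + [0]
        row = [(a - 1) * prev[n] + (prev[n - 1] if n > 0 else 0) for n in range(a + 1)]
    return row


N = 8
p = size_distribution(N)
c = stirling_row(N)
assert all(p[n] == Fraction(c[n], factorial(N)) for n in range(N + 1))
mean = sum(n * pn for n, pn in enumerate(p))
var = sum(n * n * pn for n, pn in enumerate(p)) - mean**2
H1 = sum(Fraction(1, m) for m in range(1, N + 1))
H2 = sum(Fraction(1, m * m) for m in range(1, N + 1))
print(mean == H1, var == H1 - H2)  # True True
```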
The general asymptotic behavior, derived in [16],
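applies in the limit $n \to \infty$, $N \to \infty$ with the ratio $n/\ln N$ finite. For small $n \ll \ln N$, the distribution is Poissonian, $P_n(N) \simeq N^{-1} (\ln N)^{n-1}/(n-1)!$, and for large $n$, the distribution approaches a Gaussian centered at the average $E(N) \simeq \ln N$ with variance $V(N) \simeq \ln N$,
$$P_n(N) \simeq \frac{1}{\sqrt{2\pi \ln N}} \exp\left[-\frac{(n - \ln N)^2}{2 \ln N}\right].$$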
We note that the convex hull, a subset of the efficient set (see figure 3), is characterized by similar statistical properties including a limiting Gaussian distribution and logarithmic growths, albeit with different prefactors, of the average and the variance [17–19].
8. Multi-objective shortest path
The multi-objective shortest path on a graph is defined as follows. Consider a graph, possibly with multiple edges connecting pairs of nodes, with $d$ different costs on each edge. Fix the source and the destination nodes, and then consider all paths from source to destination, assigning $d$ total costs to each path, computed as the sums of the $d$ individual costs of the path's constituent edges. The multi-objective shortest path problem consists of finding the efficient set of paths. Generally, finding the efficient set is an NP-hard problem, although less demanding approximation schemes exist [20, 21]. However, the computation time of these approximation schemes depends crucially on the size of the efficient set.
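For very small graphs, the efficient set of paths can be found by brute force, which makes the definition concrete even though the cost grows exponentially with graph size (the approximation schemes of [20, 21] are not reproduced here). In the sketch below (hypothetical names and edge costs), each directed edge carries a tuple of $d$ costs, parallel edges are allowed, every simple source-to-destination path is enumerated by depth-first search, and a path is kept unless another path is strictly cheaper in all $d$ costs, matching the $k = d$ criterion.

```python
from typing import Iterator, List, Sequence, Tuple

Edge = Tuple[int, int, Tuple[float, ...]]  # (from, to, cost vector)


def path_costs(edges: Sequence[Edge], src: int, dst: int) -> Iterator[Tuple[float, ...]]:
    """Yield the summed cost vector of every simple directed path src -> dst."""
    d = len(edges[0][2])

    def dfs(node, visited, total):
        if node == dst:
            yield total
            return
        for u, v, cost in edges:
            if u == node and v not in visited:
                new_total = tuple(t + c for t, c in zip(total, cost))
                yield from dfs(v, visited | {v}, new_total)

    yield from dfs(src, {src}, (0.0,) * d)


def efficient_set(vectors) -> List[Tuple[float, ...]]:
    """Cost vectors not strictly dominated in every coordinate by another vector."""
    vs = list(vectors)
    return [v for v in vs if not any(all(w_i < v_i for w_i, v_i in zip(w, v)) for w in vs)]


# A two-link chain 0 -> 1 -> 2 with a pair of parallel edges per link (d = 2).
edges = [
    (0, 1, (1.0, 5.0)), (0, 1, (4.0, 1.0)),
    (1, 2, (2.0, 3.0)), (1, 2, (5.0, 4.0)),
]
print(sorted(efficient_set(path_costs(edges, 0, 2))))  # [(3.0, 8.0), (6.0, 4.0)]
```

Note how the two paths sharing an edge have correlated totals; in the two-node topology, by contrast, each path is a single edge and the path costs are simply independent random points.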
Suppose the edge costs are independent random draws from a common distribution. We can consider two limiting topologies. First, for a graph of two nodes connected by $N$ edges, each path is a single edge, so the path costs are $N$ independent random points and the number of elements in the efficient set grows poly-logarithmically in the number of edges, as shown in (18). Second, for a one-dimensional chain of nodes where each pair of neighboring nodes is connected by a pair of edges, the total path costs become correlated [20], even though the individual edge costs are not. Our numerical studies find that the size of the efficient set is highly sensitive to the distribution of edge costs. Assuming each edge has two costs $(w_1, w_2)$, both chosen from some continuous distribution, the convex hull grows linearly in the length of the chain. Interestingly, we observed various behaviors for the size of the efficient set, ranging from linear in the length of the chain, to power-law behavior characterized by exponents greater than unity, up to stretched-exponential behavior.
Finally, we consider Erdős–Rényi random graphs [22–24]. Using the facts that the shortest path between two randomly chosen nodes grows logarithmically with the total number of nodes in the graph, and that paths close in length to the shortest path overlap weakly, so that their costs are weakly correlated, the results in this paper can be used to show heuristically [25] that the size of the efficient set of paths grows poly-logarithmically with the number of nodes, as in (18). This number is much smaller than that found for chains, where the paths are correlated.
9. Conclusions
We proposed partial minima as a protocol for identifying pseudo-optimal solutions to multi-objective optimization problems. Partial minima are defined by a parameter $k$: a point in $d$ dimensions that is not $k$-dominated by any other point, that is, one that is smaller than every other point on at least $d-k+1$ coordinates, is a partial minimum. As this optimality criterion becomes more stringent, partial minima improve in quality but are less probable. In the extreme case $k = d$, the number of partial minima grows logarithmically with the total number of points.
Remarkably, there is a series of distinct power-law distributions that characterize the largest coordinates, with consequent multiscaling of the moments, while the remaining coordinates obey ordinary scaling. Viewed as quasi-optimal solutions to multi-objective optimization problems, partial minima involve a trade-off. When the optimality criterion is relaxed, these quasi-optima become more likely, but are also more likely to incur at least one extremely large cost.
Our results hold as long as the coordinates are uncorrelated, that is, as long as they are drawn from independent distributions. These distributions need not be identical. If the $i$th coordinate is drawn from the distribution $f_i(x_i)$, the transformation
$$x_i \to \int_0^{x_i} dx'\, f_i(x'), \qquad dx_i \to f_i(x_i)\, dx_i,$$
maps the problem to a uniform distribution in the unit hypercube; since this transformation is monotonic in each coordinate, it leaves all domination relations unchanged. Correlations present an interesting challenge, and we anticipate serious modifications to the scaling laws above. For instance, it is simple to show that the size of the efficient set grows as a power of the number of points, $\sim N^{1/2}$, rather than as a power of a logarithm, when the points are uniformly distributed inside the unit circle. Incidentally, this growth is much faster than the $N^{1/3}$ growth of the number of points on the convex hull [17].
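The reduction to the unit hypercube is the probability integral transform: replacing each coordinate by its cumulative distribution function is monotonic, so every comparison between coordinates, and hence every $k$-domination relation, is unchanged. A minimal sketch, assuming hypothetical independent marginals (exponential with rates 1 and 3, and uniform on $[0, 5]$) and a brute-force $k$-minimum test:

```python
import math
import random
from typing import Callable, List, Sequence


def k_minima(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of points not k-dominated by any other point (brute force)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(
            sum(qi < pi for qi, pi in zip(q, p)) >= k
            for j, q in enumerate(points)
            if j != i
        )
    ]


# Cumulative distribution functions of the hypothetical marginals.
cdfs: List[Callable[[float], float]] = [
    lambda t: -math.expm1(-t),        # Exp(1): 1 - exp(-t)
    lambda t: -math.expm1(-3.0 * t),  # Exp(3)
    lambda t: t / 5.0,                # Uniform(0, 5)
]

rng = random.Random(7)
raw = [(rng.expovariate(1.0), rng.expovariate(3.0), rng.uniform(0.0, 5.0)) for _ in range(50)]
unit = [tuple(F(t) for F, t in zip(cdfs, p)) for p in raw]

for k in (1, 2, 3):
    assert k_minima(raw, k) == k_minima(unit, k)  # same partial minima
print(all(0.0 <= u <= 1.0 for p in unit for u in p))  # True
```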
Another interesting issue is the crossover from the algebraic decay (2) to the logarithmic growth (18). The average number of partial minima decreases monotonically with $N$ when $k$ is small, but is a non-monotonic function of $N$ when $k$ is large. For example, when $d = 4$ and $k = 3$, the average $A_{d,k}$ peaks at $N = 16$. It will be interesting to elucidate how the height and the location of this peak depend on $d$ and $k$.
Acknowledgments
We thank Gunes Ercal-Ozkaya for suggesting random graphs and Paul Krapivsky for useful discussions on convex hulls. We acknowledge financial support from DOE grant DE-AC52-06NA25396.