
Statistical mechanics analysis of generalized multi-dimensional knapsack problems


Published 18 August 2022 © 2022 The Author(s). Published by IOP Publishing Ltd
Citation: Yuta Nakamura et al 2022 J. Phys. A: Math. Theor. 55 375003. DOI: 10.1088/1751-8121/ac8381


Abstract

The knapsack problem (KP) is a representative combinatorial optimization problem that aims to maximize the total profit by selecting a subset of items under given constraints on the total weights. In this study, we analyze a generalized version of KP, termed the generalized multidimensional knapsack problem (GMDKP). Unlike the basic KP, GMDKP allows multiple choices per item type under multiple weight constraints. Although several efficient algorithms are known and the properties of their solutions have been examined to a significant extent for basic KPs, there is a paucity of known algorithms and studies on the solution properties of GMDKP. To gain insight into the problem, we assess the typical achievable limit of the total profit for a random ensemble of GMDKP using the replica method. Our findings are summarized as follows: (1) when the profits of item types are normally distributed, the total profit grows, in the leading order with respect to the number of item types, as the maximum number of choices per item type xmax increases, whereas it depends on xmax only at a sub-leading order if the profits are constant among the item types. (2) A greedy-type heuristic can find, with a low computational cost, a nearly optimal solution whose total profit is lower than the optimal value only by a sub-leading order. (3) The sub-leading difference from the optimal total profit can be improved by a heuristic algorithm based on the cavity method. Extensive numerical experiments support these findings.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Combinatorial optimization is a popular topic related to numerous research fields. It is deeply connected to computer science and the theory of algorithms, and it is frequently applied to real-world problems in the field of operations research. The knapsack problem (KP) is a major combinatorial optimization problem, as are the traveling salesman problem and the minimum spanning tree problem [1]. There are many variants of KP, but all of them aim to maximize the total profit by selecting a subset of items as long as they do not violate the given constraints on the total weights. KP has been extensively examined for a long time because of its wide applicability. Its applications include resource allocation problems [2], cutting or packing stock problems [3], and capital budgeting problems [4, 5].

The basic and most well-known version of KP, which we refer to as the 0–1 one-dimensional KP (0–1 1DKP), is defined as follows: suppose that there are N item types. Each item type i, 1 ⩽ i ⩽ N, has two characteristic quantities vi ∈ (0, ∞) and wi ∈ (0, ∞), where vi stands for the profit of an item of type i and wi for its weight. The 0–1 1DKP aims to find a way to select items to put in a knapsack such that the total profit is maximized under the constraints that each item type can be selected at most once and that the total weight does not exceed the capacity of the knapsack. The problem is mathematically formulated as follows:

$\underset{\boldsymbol{x}}{\mathrm{maximize}}\enspace U={\sum }_{i=1}^{N}{v}_{i}{x}_{i}\quad \text{subject to}\quad {\sum }_{i=1}^{N}{w}_{i}{x}_{i}\leqslant C,\enspace {x}_{i}\in \left\{0,1\right\}\enspace (i=1,\dots ,N),$

where U and ${\sum }_{i=1}^{N}{w}_{i}{x}_{i}$ denote the total profit and total weight, respectively. The variable xi denotes the number of items of type i to be placed in the knapsack, and C denotes the capacity of the knapsack.

We herein address the generalized multidimensional knapsack problem (GMDKP) by extending 0–1 1DKP in the following two directions:

  • (a)  
    Introduction of multiple constraints on the total weights as ${\sum }_{i=1}^{N}{w}_{\mu i}{x}_{i}\leqslant {C}_{\mu }$ for μ ∈ {1, ..., K}, which is termed multi-dimensionalization [6].
  • (b)  
    Relaxation of the maximum number of times each item type can be chosen [6]. That is, we allow each item type i to be selected up to ${x}_{i}^{\mathrm{max}}\enspace $ times. The 0–1 1DKP corresponds to the case of ${x}_{i}^{\mathrm{max}}=1$.

Specifically, GMDKP is expressed as follows:

$\underset{\boldsymbol{x}}{\mathrm{maximize}}\enspace U={\sum }_{i=1}^{N}{v}_{i}{x}_{i}\quad \text{subject to}\quad {\sum }_{i=1}^{N}{w}_{\mu i}{x}_{i}\leqslant {C}_{\mu }\enspace (\mu =1,\dots ,K),\enspace {x}_{i}\in \left\{0,1,\dots ,{x}_{i}^{\mathrm{max}}\right\}\enspace (i=1,\dots ,N).$
Two issues are worth discussing here. The first issue is related to the theoretically achievable limit of the total profit of the GMDKP, which offers a baseline for examining the performance of search algorithms. Korutcheva et al [7] considered a multidimensional version and analyzed the typical properties of solutions under a random setup while maintaining ${x}_{i}^{\mathrm{max}}=1\ (\forall i\in \left\{1,\dots ,N\right\})$. We term this version the multidimensional knapsack problem (MDKP). Later, Inoue [8] examined MDKP by relaxing the non-negative integers xi to spherically constrained real numbers. However, no such typical-case performance analyses have been obtained for GMDKP.

The second issue concerns algorithms for finding the solution of GMDKP. Even in the basic case, KPs are known to be NP-hard [9]. A significant amount of effort has been made to overcome this computational hardness, and it can be classified into two directions. The first direction involves searching for exact solutions. It has been empirically shown that methods based on dynamic programming [10] and the branch-and-bound method [11] can find exact solutions very efficiently for many instances of 0–1 1DKP. However, such methods are not applicable to GMDKP. Therefore, for the extended KPs, we should resort to the second direction, which aims to efficiently search for good approximate solutions. In this direction, it has been experimentally shown that a greedy-type heuristic algorithm can find fairly good approximate solutions for randomly generated GMDKPs [12]. However, its possibilities and limitations have not been theoretically clarified yet.
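
For illustration, the dynamic-programming approach of the first direction can be sketched in a few lines of Python; this is a minimal textbook version for 0–1 1DKP assuming integer weights and capacity (the pseudo-polynomial O(NC) scheme), not one of the solvers benchmarked in section 4.

def knapsack_01_dp(v, w, C):
    # dp[c] = best total profit achievable within capacity c
    dp = [0.0] * (C + 1)
    for vi, wi in zip(v, w):
        # sweep capacities downwards so that each item type is used at most once
        for c in range(C, wi - 1, -1):
            dp[c] = max(dp[c], dp[c - wi] + vi)
    return dp[C]

# toy instance: profits, integer weights, capacity
print(knapsack_01_dp([6.0, 10.0, 12.0], [1, 2, 3], 5))  # -> 22.0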

In view of this situation, we analyze a random ensemble of GMDKP using the replica method to clarify the typically achievable limit of the total profit. The result of the analysis indicates that when the profits of item types are normally distributed, the total profit grows, in the leading order with respect to the number of item types, as the maximum number of choices per item type xmax increases, whereas it depends on xmax only at a sub-leading order if the profits are constant among the item types. Moreover, the analysis implies that a nearly optimal solution whose total profit is lower than the optimal value only by a sub-leading order can be found by the aforementioned greedy-type algorithm with a low computational cost. However, further improving the total profit at the sub-leading order is still non-trivial. To accomplish this, we develop a heuristic algorithm based on the cavity method. Extensive numerical experiments support the analytical results and the usefulness of the developed algorithm.

The remainder of this paper is organized as follows. In the next section, we introduce the random ensemble of GMDKP to be analyzed. In section 3, we theoretically examine the typically achievable limit of the total profit for the introduced problem ensemble. In section 4, we numerically validate the results obtained in section 3. We also develop a heuristic algorithm for the sub-leading order improvement. The final section is devoted to discussion and future prospects.

2. Problem setup

To examine the typical properties of GMDKP, we consider an ensemble that is characterized by the following simplified conditions:

  • Fix ${x}_{i}^{\mathrm{max}}$ to a constant xmax for ∀i ∈ {1, ..., N}.
  • Cμ = CN for ∀μ ∈ {1, ..., K}, where C > 0 is a proportionality constant.
  • vi  (∀i ∈ {1, ..., N}) are independently drawn from an identical Gaussian distribution $\mathcal{N}(V,{\sigma }_{V}^{2})$, where 0 ⩽ σV ≪ V.
  • wμi  (∀i ∈ {1, ..., N}, ∀μ ∈ {1, ..., K}) are independently drawn from another identical Gaussian distribution $\mathcal{N}(W,{\sigma }_{W}^{2})$, where 0 < σW ≪ W.

A distinctive feature of the KP is the positivity of the profit (vi ) and weight (wμi ) parameters. The current simplified setup incorporates this feature with a small number of parameters, although possible correlations among the parameters that may exist in realistic problems are ignored.
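
For reference, the ensemble can be sampled as in the following Python sketch; the function name and default values are ours, chosen to match the experiments of section 4 (e.g. ${\sigma }_{W}^{2}=0.01$ corresponds to sigma_W = 0.1).

import numpy as np

def sample_instance(N, alpha=0.1, C=0.5, V=1.0, sigma_V=0.1, W=1.0, sigma_W=0.1, x_max=1, seed=0):
    # sample profits v_i, weights w_{mu i}, and capacities C_mu of a random GMDKP
    rng = np.random.default_rng(seed)
    K = int(alpha * N)                       # number of weight constraints, K = alpha N
    v = rng.normal(V, sigma_V, size=N)       # v_i ~ N(V, sigma_V^2)
    w = rng.normal(W, sigma_W, size=(K, N))  # w_{mu i} ~ N(W, sigma_W^2)
    Cap = np.full(K, C * N)                  # C_mu = C N
    return v, w, Cap, x_max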

3. Statistical mechanics analysis on the optimal solution

We compute the typical value of the achievable U in the limit of N, K → ∞ while maintaining their ratio K/N = α ∈ [0, ∞). In the following, we use N → ∞ as a shorthand notation for this scaling limit to avoid cumbersome expressions. To this end, we first transform the total weights to appropriate expressions under the assumption that the solution is placed in the vicinity of the boundaries of the weight constraints [7], which means that the total number of chosen items satisfies

${\sum }_{i=1}^{N}{x}_{i}=\frac{NC}{W}+\sqrt{N}M.\qquad (1)$

Inserting wμi = W + ξμi , where ${\xi }_{\mu i}\sim \mathcal{N}(0,{\sigma }_{W}^{2})$, into the definitions of the total weights yields the following decomposition:

${\sum }_{i=1}^{N}{w}_{\mu i}{x}_{i}=NC+\sqrt{N}WM+{u}_{\mu },\qquad (2)$

where ${u}_{\mu }={\sum }_{i=1}^{N}{\xi }_{\mu i}{x}_{i}\enspace (\mu =1,\dots ,K)$.
M controls the difference of the total number of chosen items from its leading term NC/W in the scale of $O(\sqrt{N})$. These expressions indicate that the total weights are constant in the leading order of O(N) and vary in the next order of $O(\sqrt{N})$ based on the choice of $\boldsymbol{x}\in {\left\{0,\dots ,{x}^{\mathrm{max}}\right\}}^{N}$. Handling $-U=-{\sum }_{i=1}^{N}{v}_{i}{x}_{i}=-NVC/W-{\sum }_{i=1}^{N}{\eta }_{i}{x}_{i}$ as a Hamiltonian, where ${\eta }_{i}\sim \mathcal{N}(0,{\sigma }_{V}^{2})$, we compute the partition function with the inverse temperature β > 0 as

${Z}_{\beta }(\boldsymbol{\xi },\boldsymbol{\eta },M)={\mathrm{Tr}}_{\boldsymbol{x}}\,\delta \left({\sum }_{i=1}^{N}{x}_{i},\frac{NC}{W}+\sqrt{N}M\right)\left[\prod _{\mu =1}^{K}{\Theta}\left(-\sqrt{N}WM-{u}_{\mu }\right)\right]\mathrm{exp}\left(\beta {\sum }_{i=1}^{N}{\eta }_{i}{x}_{i}\right),\qquad (3)$

where δ(·, ·) denotes Kronecker's delta, Θ(x) = 1 (x ⩾ 0) and 0 otherwise, and ${\mathrm{Tr}}_{\boldsymbol{x}}$ denotes the summation with respect to all possible choices of $\boldsymbol{x}=({x}_{i})\in {\left\{0,1,\dots ,{x}^{\mathrm{max}}\right\}}^{N}$.

${Z}_{\beta }(\boldsymbol{\xi },\boldsymbol{\eta },M)$ varies randomly depending on the realization of ξ = (ξμi ) and η = (ηi ), and it is supposed to scale exponentially with respect to N. This implies that its typical behavior can be examined by assessing the average of its logarithm (free entropy). This naturally leads to the use of the replica method. More specifically, for $n=1,2,\dots \in \mathbb{N}$, we compute the moment ${\mathbb{E}}_{\boldsymbol{\xi },\boldsymbol{\eta }}[{Z}_{\beta }^{n}(\boldsymbol{\xi },\boldsymbol{\eta },M)]$, where ${\mathbb{E}}_{\boldsymbol{\xi },\boldsymbol{\eta }}[\dots ]$ denotes the average operation with respect to ξ and η , as a function of n and continue the obtained functional expression to $n\in \mathbb{R}$. Subsequently, we evaluate the average free entropy per item type using the identity

$\phi ={\mathrm{lim}}_{N\to \infty }\frac{1}{N}{\mathbb{E}}_{\boldsymbol{\xi },\boldsymbol{\eta }}\left[\mathrm{ln}\,{Z}_{\beta }(\boldsymbol{\xi },\boldsymbol{\eta },M)\right]={\mathrm{lim}}_{N\to \infty }\,{\mathrm{lim}}_{n\to 0}\frac{\partial }{\partial n}\frac{1}{N}\,\mathrm{ln}\,{\mathbb{E}}_{\boldsymbol{\xi },\boldsymbol{\eta }}\left[{Z}_{\beta }^{n}(\boldsymbol{\xi },\boldsymbol{\eta },M)\right].$
After some calculations (details are provided in appendix A), this procedure provides the concrete expression of the average free entropy under the replica symmetric (RS) assumption as follows:

Equation (4)

where extrX {...} denotes the operation of the extremization of ... with respect to X, and

The typical value of the maximum total profit (per item type) is assessed as

$\mathcal{U}=\frac{VC}{W}+{\mathrm{lim}}_{\beta \to \infty }\frac{\phi }{\beta }.\qquad (5)$

In the following, we describe the results obtained by the above computation for ${\sigma }_{V}^{2} > 0$ and ${\sigma }_{V}^{2}=0$ separately, as the results differ considerably between the two cases.

Case of ${\sigma }_{V}^{2} > 0$.

In the limit of β → ∞, the variables in (4) scale so as to satisfy χ = β(Q − q) ∼ O(1), $E=(\hat{Q}+\hat{q})/\beta \sim O(1)$, $F=\hat{q}/{\beta }^{2}\sim O(1)$, and $G=\hat{M}/\beta \sim O(1)$. Using these new variables, the extremum condition is expressed as

where

Equation (6)

The solution determined by these offers the maximum per item type total profit $\mathcal{U}$ as

and entropy (per item type) as

However, we must keep in mind that the solution is obtained under the RS ansatz, which may not be valid for β → ∞. The stability analysis against perturbations that break the replica symmetry [13] indicates that the RS solution is locally unstable if

Equation (7)

is satisfied. Equation (6) means that x* varies discontinuously by unity at certain values of z, which leads to $\int \mathrm{D}z{\left(\frac{\partial {x}^{\ast }}{\partial G}\right)}^{2}=+\infty $ [14]. These facts imply that (7) is satisfied, and hence that the RS solution is invalid, as long as M is finite. As replica symmetry breaking (RSB) implies that finding the lowest-energy (optimal-profit) solution is challenging due to rugged energy landscapes, this also suggests that optimally packing $NC/W\pm O(\sqrt{N})$ items in the knapsack is computationally difficult.

Meanwhile, (7) also implies that the validity of the RS solution is recovered for M → −∞ as $H\left(-\frac{WM}{{\sigma }_{W}\sqrt{Q}}\right)\to 0$ holds. In addition, as E, F → 0 holds, we can obtain an analytical expression of the RS solution in this limit as

Equation (8)

which yields

$\mathcal{U}=\frac{VC}{W}+{x}^{\mathrm{max}}{\sigma }_{V}\frac{{\mathrm{e}}^{-{A}^{2}/2}}{\sqrt{2\pi }},\qquad (9)$

where A is the solution of H(A) = C/(xmax W).

Equation (8) corresponds to the solution obtained by choosing NC/W + o(N) items from the item types of higher vi values, which we term 'greedy packing'. This bounds the leading order term of U from above by $N(VC/W+{x}^{\mathrm{max}}{\sigma }_{V}\frac{{\mathrm{e}}^{-{A}^{2}/2}}{\sqrt{2\pi }})$ since choosing O(N) items further on top of the NC/W + o(N) items typically breaks some of the weight constraints. On the other hand, the upper bound is easily achieved by choosing $NC/W-O\left({N}^{1/2+{\epsilon}}\right)\ (0< {\epsilon}< 1/2)$ items by greedy packing. This is because the largest value of the fluctuation terms uμ  (μ = 1, ..., αN) in (2) typically scales as $O\left({(N\,\mathrm{log}(N))}^{1/2}\right)$, and therefore, all the weight constraints are typically satisfied if we set a sufficiently large 'margin' by reducing the total number of chosen items from NC/W by $O\left({N}^{1/2+{\epsilon}}\right)$ without changing the leading O(N) term of U.
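
As a numerical illustration, (9) is straightforward to evaluate; the following Python sketch (illustrative, with function names of our own choosing) computes A from H(A) = C/(xmax W) by root finding and reproduces the prediction for the parameters of figure 3.

import numpy as np
from scipy.optimize import brentq
from scipy.special import erfc

def H(x):
    # Gaussian tail function H(x) = \int_x^infty Dz
    return 0.5 * erfc(x / np.sqrt(2.0))

def U_greedy_limit(V, sigma_V, C, W, x_max):
    # A solves H(A) = C / (x_max W); a root exists when 0 < C / (x_max W) < 1
    A = brentq(lambda a: H(a) - C / (x_max * W), -10.0, 10.0)
    return V * C / W + x_max * sigma_V * np.exp(-A**2 / 2) / np.sqrt(2.0 * np.pi)

print(U_greedy_limit(V=1.0, sigma_V=0.1, C=0.5, W=1.0, x_max=1))  # approx 0.5399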

In summary, we obtain the following results:

  • The optimal total profit grows with xmax in the leading term of O(N).
  • Finding the truly optimal solution would be computationally difficult.
  • However, achieving the total profit that is lower than the optimal value only by $O\left({N}^{1/2+{\epsilon}}\right)$ would be easy by employing the greedy packing.

Case of ${\sigma }_{V}^{2}=0$.

Unlike the case of ${\sigma }_{V}^{2} > 0$, the variables in (4) remain O(1) even in the limit of β → ∞. This is because the constraint (1) on the total number of chosen items makes β irrelevant to the extremum condition of (4). More precisely, expressing $\tilde{M}=\hat{M}+\beta V$, the extremum of (4) is characterized by

independently of β, where

The resultant solution yields the per item type total profit for N → ∞ as

$\mathcal{U}=\frac{VC}{W},\qquad (10)$

which, unlike the case of ${\sigma }_{V}^{2} > 0$, does not vary with xmax. The solution also offers the per item type entropy as

Equation (11)

which does not depend on β.

Figure 1 shows S versus M for several values of α for xmax = 1, indicating that for all values of α, S is positive for M below a certain critical value Mopt(α), at which S vanishes. The same behavior is observed for xmax ⩾ 2. Figure 2 plots Mopt(α) versus xmax. This indicates that Mopt(α) increases with xmax but almost saturates for xmax ≳ 10. The local stability of the RS solution would be broken if

Equation (12)

were satisfied. However, (12) does not hold for M < Mopt(α), indicating that the RS solution is valid.

Figure 1.  S versus M for several values of α. Parameters are set as W = 1.0, C = 0.5, and ${\sigma }_{W}^{2}=0.01$.

Figure 2. Behavior of Mopt for xmax = 1, 2, 3, 10, and 100. Mopt almost saturates for xmax ≳ 10. The parameters are set as in figure 1.

To summarize, we reach the following conclusions:

  • The optimal total profit does not depend on xmax in the leading term of O(N).
  • Solutions are highly degenerate up to the sub-leading term of $O(\sqrt{N})$ of the total profit. This has also been pointed out for the case of xmax = 1 in [7]. For M < Mopt(α), exponentially many solutions achieve the value of total profit $U=NVC/W+\sqrt{N}M$. The solutions vanish at M = Mopt(α), which indicates that the optimal total profit is provided as
    $U=\frac{NVC}{W}+\sqrt{N}{M}_{\mathrm{opt}}(\alpha ).\qquad (13)$
  • The total profit is improved monotonically at the order of $O(\sqrt{N})$ by increasing xmax, but almost saturates for xmax greater than a moderate value.

4. Search algorithms

4.1. Achieving leading order optimality by greedy packing

The results in the previous section imply that greedy packing can find a nearly optimal solution with a low computational cost. To confirm this, we carried out numerical experiments using OR-Tools by Google [15] and PECHγ by Akçay et al [12].

OR-Tools provides an exact algorithm that can efficiently find the exact solutions for problems of moderate sizes. However, the necessary computational cost still grows exponentially with respect to N in the worst case. Therefore, its use is practically limited to N of several tens. Additionally, it is applicable only to 0–1 MDKP. On the other hand, PECHγ is a greedy-type heuristic in which the greediness is controlled by a parameter γ ∈ (0, 1]. By setting γ = 1, in each iteration it chooses at most xmax items from the remainders so as to maximize the increase of the total profit, until a certain weight constraint is violated. This realizes the greedy packing.
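
A minimal Python sketch of greedy packing in the spirit of PECH with γ = 1 (a simplified illustration of the idea, not the full PECHγ of [12]): item types are visited in decreasing order of profit, and the search stops as soon as adding one more item would violate a weight constraint.

import numpy as np

def greedy_packing(v, w, Cap, x_max):
    K, N = w.shape
    x = np.zeros(N, dtype=int)
    load = np.zeros(K)                        # current total weights
    for i in np.argsort(-v):                  # item types by decreasing profit v_i
        for _ in range(x_max):
            if np.any(load + w[:, i] > Cap):  # a weight constraint would be violated
                return x                      # stop at the first violation
            x[i] += 1
            load += w[:, i]
    return x

For instance, greedy_packing(*sample_instance(1000)) packs a random instance generated by the sampler sketched in section 2.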

Figure 3 plots the achieved per item type total profit versus the number of item types N for a case of ${\sigma }_{V}^{2} > 0$. As mentioned in section 3, the largest value of uμ  (μ = 1, ..., αN) in the weight constraints (2) scales as O((N log N)1/2). This implies that for finite N, the difference of the per item type total profit achieved by PECH1.0 from $\mathcal{U}$ of (9) is proportional to N−1/2(log N)1/2 in the leading order. On the other hand, such dependence for the exact solution obtained by OR-Tools is nontrivial, but we speculate that the leading order is O(N−1/2). This is because we should be able to construct physically valid replica solutions at least in a certain range of finite M by taking RSB into account, which means that the total number of chosen items can grow from NC/W − O((N log N)1/2) to NC/W − O(N1/2) by optimizing the choice of items. Values extrapolated from the experimental data to N → ∞ assuming the abovementioned scaling forms exhibit considerably good agreement with the theoretical prediction (9), in view of corrections by terms of o(N−1/2(log N)1/2).

Figure 3. Per item type total profit obtained experimentally by PECH1.0 (greedy packing) and OR-Tools (exact algorithm) for C = 0.5, W = 1.0, ${\sigma }_{W}^{2}=0.01$, V = 1.0, ${\sigma }_{V}^{2}=0.01$, and α = 0.1. Red and black symbols stand for data by PECH1.0 and OR-Tools, respectively. Error bars denote one standard error. Data of OR-Tools are plotted only for xmax = 1 as the algorithm is not applicable for xmax ⩾ 2. Blue symbols on the vertical lines denote the theoretical prediction (9) for N → ∞. Left panel: extrapolated values for N → ∞, $\mathcal{U}(\infty )$, were determined under the assumptions $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}{(\mathrm{log}\,N)}^{1/2}$ (PECH1.0) and $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}$ (OR-Tools). These show considerably good agreement with the theoretical predictions. For reference, extrapolated values based on the assumption $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}$ are also plotted as black symbols for PECH1.0. The larger deviations from the theoretical predictions support the relevance of the log correction in the scaling. Right panel: the same data plotted versus N−1/2 for both PECH1.0 and OR-Tools.

The results for ${\sigma }_{V}^{2}=0$ are plotted in figure 4. We employed the scaling forms of O(N−1/2(log N)1/2) and O(N−1/2) for PECH1.0 and OR-Tools, respectively, as in the case of ${\sigma }_{V}^{2} > 0$. The employment of O(N−1/2) for OR-Tools is better justified than in the case of ${\sigma }_{V}^{2} > 0$, as the finiteness of M for the optimal solution is supported by the stability of the RS solution. Meanwhile, as shown in figure 2, Mopt, which corresponds to the prefactor of the term of O(N−1/2) in $\mathcal{U}(N)$ of the exact solution, almost vanishes for the examined parameter setting. We therefore determined the value of $\mathcal{U}(\infty )$ for OR-Tools by extrapolation under the assumption of $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}-b{N}^{-1}$, taking into account the contribution of the higher order term of O(N−1). This, as well as the extrapolation for PECH1.0, again results in very good accordance with the replica prediction. However, unlike the case of ${\sigma }_{V}^{2} > 0$, the achieved per item type total profit depends little on xmax.

Figure 4. Per item type total profit obtained experimentally by PECH1.0 (greedy packing) and OR-Tools (exact algorithm) for ${\sigma }_{V}^{2}=0$ and α = 0.5. The other parameters and the meanings of symbols and panels are the same as in figure 3. In this setup, Mopt, which corresponds to the prefactor of the term of O(N−1/2) in $\mathcal{U}(N)$, almost vanishes, as shown in figure 2. This indicates that the extrapolation for OR-Tools should be carried out under the assumption of $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}-b{N}^{-1}$, taking into account the higher order contribution of O(N−1). The extrapolated value, as well as those for PECH1.0, exhibits very good agreement with the replica prediction.

4.2. Performance improvement in sub-leading order by cavity method

The greedy packing implemented by PECH1.0 achieves the leading order optimality with an O(N) computational cost. However, for finite N, the achieved total profit is still lower than the truly optimal profit by O((N log N)1/2). Here, we develop a method for reducing this gap using the cavity method [16].

For this purpose, we consider the 'canonical distribution' over $\boldsymbol{x}=({x}_{i})$, ${x}_{i}\in \left\{0,1,\dots ,{x}_{i}^{\mathrm{max}}\right\}$ (i = 1, ..., N), satisfying all the weight constraints given by $D=\left\{{w}_{\mu i},{C}_{\mu }\right\}\ \left(\mu =1,\dots ,K,\ i=1,\dots ,N\right)$ as

$p(\boldsymbol{x}\vert D)=\frac{1}{{\Xi}}\,\mathrm{exp}\left(\beta {\sum }_{i=1}^{N}{v}_{i}{x}_{i}\right)\prod _{\mu =1}^{K}{\Theta}\left({C}_{\mu }-{\sum }_{i=1}^{N}{w}_{\mu i}{x}_{i}\right),\qquad (14)$

and evaluate the marginal distributions ${p}_{i}\left({x}_{i}\vert D\right)={\sum }_{\boldsymbol{x}{\backslash}{x}_{i}}\;p\left(\boldsymbol{x}\vert D\right)$, where β > 0 is a parameter for controlling the emphasis on the total profit and Ξ is a normalization constant. Then, we find the item type i* that maximizes the probability of being non-zero, ${p}_{i}({x}_{i}\ne 0\vert D)={\sum }_{{x}_{i}\ne 0}{\;p}_{i}\left({x}_{i}\vert D\right)$, put one item of i* in the knapsack, and reduce ${x}_{{i}^{\ast }}^{\mathrm{max}}$ by one as ${x}_{{i}^{\ast }}^{\mathrm{max}}{\leftarrow}{x}_{{i}^{\ast }}^{\mathrm{max}}-1$. We also subtract the weight upper bounds Cμ as ${C}_{\mu }{\leftarrow}{C}_{\mu }-{w}_{\mu {i}^{\ast }}\ \left(\mu =1,\dots ,K\right)$. We repeat these procedures as long as the weight constraints are satisfied. After the final repetition, the items in the knapsack constitute an approximate solution. We refer to this procedure as the 'marginal-probability-based greedy strategy' (MPGS). The pseudo-code for the procedure is summarized in algorithm 1.

Algorithm 1. MPGS.

Input: $D=\left\{{w}_{\mu i},{C}_{\mu }\right\},{v}_{i},{x}_{i}^{\mathrm{max}}\enspace (i=1,\dots ,N,\ \mu =1,\dots ,K)$
Output: xi  (i = 1, ..., N)
 xi := 0 (i = 1, ..., N)
 i* := 0
 while Cμ ⩾ 0, ∀μ and ${x}_{i}^{\mathrm{max}} > 0,\exists i$ do
  Evaluate ${p}_{i}({x}_{i}\vert D)\ (i=1,\dots ,N,{x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$ with algorithm 2 or 3
  i* ←argmaxi pi (xi ≠ 0|D)
  if ${x}_{{i}^{\ast }}^{\mathrm{max}}=0$ or ${C}_{\mu }-{w}_{\mu {i}^{\ast }}< 0,\enspace \exists \mu $ then
   break
  else
   ${x}_{{i}^{\ast }}{\leftarrow}{x}_{{i}^{\ast }}+1$
   ${x}_{{i}^{\ast }}^{\mathrm{max}}\enspace {\leftarrow}{x}_{{i}^{\ast }}^{\mathrm{max}}-1$
  for μ = 1 to K do
   ${C}_{\mu }{\leftarrow}{C}_{\mu }-{w}_{\mu {i}^{\ast }}$
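
A Python sketch of this loop (illustrative; it mirrors the pseudo-code rather than constituting a reference implementation), with the marginal evaluation delegated to a marginals callable for which the BP sketch after algorithm 2 can serve as a starting point:

import numpy as np

def mpgs(v, w, Cap, x_max, marginals):
    K, N = w.shape
    x = np.zeros(N, dtype=int)
    x_rem = np.full(N, x_max)                    # remaining allowances x_i^max
    Cap = Cap.astype(float).copy()
    while np.all(Cap >= 0) and np.any(x_rem > 0):
        # p[i, k] approximates p_i(x_i = k | D); column 0 is the probability of zero
        p = marginals(v, w, Cap, x_rem)
        i_star = int(np.argmax(1.0 - p[:, 0]))   # most probably non-zero item type
        if x_rem[i_star] == 0 or np.any(Cap - w[:, i_star] < 0):
            break
        x[i_star] += 1                           # put one item of type i* in the knapsack
        x_rem[i_star] -= 1
        Cap -= w[:, i_star]                      # shrink the remaining capacities
    return x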

Unfortunately, it is computationally difficult to conduct this greedy search because the computational cost of assessing the marginal distributions ${p}_{i}\left({x}_{i}\vert D\right)$ from the joint distribution $p\left(\boldsymbol{x}\vert D\right)$ grows exponentially with respect to N. We employ the cavity method to resolve this problem. To perform this, we depict the joint distribution by a factor graph (figure 5) and recursively update the messages defined on the edges between the factor and variable nodes as

${\mathcal{M}}_{\mu \to i}({x}_{i})={c}_{\mu \to i}\,{\mathrm{Tr}}_{\boldsymbol{x}{\backslash}{x}_{i}}\,{\Theta}\left({C}_{\mu }-{w}_{\mu i}{x}_{i}-{\sum }_{j\ne i}{w}_{\mu j}{x}_{j}\right)\prod _{j\ne i}{\mathcal{M}}_{j\to \mu }({x}_{j}),\qquad (15)$

${\mathcal{M}}_{i\to \mu }({x}_{i})={c}_{i\to \mu }\,{\mathrm{e}}^{\beta {v}_{i}{x}_{i}}\prod _{\nu \ne \mu }{\mathcal{M}}_{\nu \to i}({x}_{i}),\qquad (16)$

by following the recipe of belief propagation (BP), which provides efficient algorithms for finding the solution of the cavity method [16]. After the messages are determined, the marginal distribution is approximately assessed as follows:

${p}_{i}({x}_{i}\vert D)\simeq {c}_{i}\,{\mathrm{e}}^{\beta {v}_{i}{x}_{i}}\prod _{\mu =1}^{K}{\mathcal{M}}_{\mu \to i}({x}_{i}),$

where ${c}_{\mu \to i}$, ${c}_{i\to \mu }$, and ci are normalization constants.

Figure 5. Factor graph representation of (14). The messages are passed through the edges in both directions.

Exact execution of the BP algorithm is still computationally infeasible, as the cost of evaluating (15) grows exponentially with respect to N. This problem is resolved by the Gaussian approximation employed in the approximate message passing technique [17, 18]. More precisely, utilizing the central limit theorem, we regard ∑ji wμj xj in (15) as a Gaussian random variable characterized by mean Δμi = ∑ji wμj mjμ and variance ${V}_{\mu \to i}={\sum }_{j\ne i}{w}_{\mu j}^{2}{\chi }_{j\to \mu }$, where mjμ and χjμ denote the mean and variance of ${\mathcal{M}}_{j\to \mu }\left({x}_{j}\right)$, respectively. Hence, it is possible to analytically evaluate (15) as follows:

${\mathcal{M}}_{\mu \to i}({x}_{i})\simeq {c}_{\mu \to i}\,H\left(\frac{{w}_{\mu i}{x}_{i}+{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right).$

This reduces BP of (15) and (16) to update equations with respect to the 4NK variables miμ , χiμ , Δμi , and Vμi , which are defined on the edges of the factor graph. The reduced BP algorithm is summarized as pseudo-code in algorithm 2. The computational cost can be further reduced by converting the BP algorithm into one for variables defined on nodes, which is sometimes referred to as generalized approximate message passing (GAMP) [17, 19, 20]. Its derivation and the pseudo-code (algorithm 3) are provided in appendix B.

Algorithm 2. BP.

Input: $D=\left\{{w}_{\mu i},{C}_{\mu }\right\},{v}_{i},{x}_{i}^{\mathrm{max}}\enspace (i=1,\dots ,N,\ \mu =1,\dots ,K),\beta $
Output: ${p}_{i}({x}_{i}\vert D)\ (i=1,\dots ,N,\ {x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$
 ${\mathcal{M}}_{i\to \mu }({x}_{i}){:=}\frac{{\text{e}}^{\beta {v}_{i}{x}_{i}}}{{\sum }_{{x}_{i}\;\in \left\{0,\dots ,{x}^{\mathrm{max}}\enspace \right\}}{\text{e}}^{\beta {v}_{i}{x}_{i}}}\ (i=1,\dots ,N,\ \mu =1,\dots ,K,\ {x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$
 ${\mathcal{M}}_{\mu \to i}({x}_{i}){:=}1/({x}_{i}^{\mathrm{max}}+1)\ (i=1\,\dots \,N,\ \mu =1\,\dots \,K,\ {x}_{i}=0\,\dots ,{x}_{i}^{\mathrm{max}})$
 while messages not converged do
  Δμ := 0, Vμ := 0 (μ = 1, ..., K)
  ${\mathcal{M}}_{i}({x}_{i}){:=}\frac{{\text{e}}^{\beta {v}_{i}{x}_{i}}}{{\sum }_{{x}_{i}\;\in \left\{0,\dots ,{x}^{\mathrm{max}}\enspace \right\}}{\text{e}}^{\beta {v}_{i}{x}_{i}}}\ (i=1,\dots ,N,\ {x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$
  for i = 1 to N do ▹Pre-calculation
   for μ = 1 to K do
   ${m}_{i\to \mu }{:=}{\sum }_{{x}_{i}\;\in \left\{0,\dots ,{x}^{\mathrm{max}}\enspace \right\}}{x}_{i}{\mathcal{M}}_{i\to \mu }({x}_{i})$
   ${\chi }_{i\to \mu }{:=}{\sum }_{{x}_{i}\;\in \left\{0,\dots ,{x}^{\mathrm{max}}\enspace \right\}}{x}_{i}^{2}{\mathcal{M}}_{i\to \mu }({x}_{i})-{m}_{i\to \mu }^{2}$
   Δμ ← Δμ + wμi miμ
   ${V}_{\mu }{\leftarrow}{V}_{\mu }+{w}_{\mu i}^{2}{\chi }_{i\to \mu }$
   for xi = 0 to ${x}_{i}^{\mathrm{max}}\enspace $ do
    ${\mathcal{M}}_{i}({x}_{i}){\leftarrow}{\mathcal{M}}_{i}({x}_{i})\times {\mathcal{M}}_{\mu \to i}({x}_{i})$
  for i = 1 to N do
   for μ = 1 to K do
   Δμi := Δμ wμi miμ
   ${V}_{\mu \to i}{:=}{V}_{\mu }-{w}_{\mu i}^{2}{\chi }_{i\to \mu }$
   for xi = 0 to ${x}_{i}^{\mathrm{max}}\enspace $ do
    ${\mathcal{M}}_{\mu \to i}({x}_{i}){\leftarrow}H\left(\frac{{w}_{\mu i}{x}_{i}+{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right)$
    ${\mathcal{M}}_{i\to \mu }({x}_{i}){\leftarrow}{\mathcal{M}}_{i}({x}_{i})/{\mathcal{M}}_{\mu \to i}({x}_{i})$
   Normalize ${\mathcal{M}}_{\mu \to i}({x}_{i}),{\mathcal{M}}_{i\to \mu }({x}_{i})$
 for i = 1 to N do
  for xi = 0 to ${x}_{i}^{\mathrm{max}}\enspace $ do
   ${p}_{i}\left({x}_{i}\vert D\right){:=}{\text{e}}^{\beta {v}_{i}{x}_{i}}{\prod }_{\mu =1}^{K}\;{\mathcal{M}}_{\mu \to i}\left({x}_{i}\right)$
  Normalize ${p}_{i}\left({x}_{i}\vert D\right)$
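
For concreteness, algorithm 2 can be vectorized as in the following Python sketch (illustrative: for large K the product over factor messages should be taken in the log domain to avoid underflow, and damping of the updates may be needed for convergence).

import numpy as np
from scipy.special import erfc

def H(x):
    # Gaussian tail function H(x)
    return 0.5 * erfc(x / np.sqrt(2.0))

def bp_marginals(v, w, Cap, x_max, beta=5.0, n_sweeps=50):
    K, N = w.shape
    xs = np.arange(x_max + 1)
    logits = beta * np.outer(v, xs)                     # beta v_i x_i, shape (N, x_max+1)
    prior = np.exp(logits - logits.max(axis=1, keepdims=True))
    prior /= prior.sum(axis=1, keepdims=True)
    M_iu = np.repeat(prior[:, None, :], K, axis=1)      # M_{i->mu}(x_i), (N, K, x_max+1)
    belief = prior.copy()
    for _ in range(n_sweeps):
        m = (M_iu * xs).sum(axis=2)                     # means m_{i->mu}, (N, K)
        chi = (M_iu * xs**2).sum(axis=2) - m**2         # variances chi_{i->mu}
        Delta = (w * m.T).sum(axis=1)                   # Delta_mu, (K,)
        V = (w**2 * chi.T).sum(axis=1)                  # V_mu, (K,)
        D_ui = Delta[:, None] - w * m.T                 # cavity means Delta_{mu->i}
        V_ui = V[:, None] - w**2 * chi.T                # cavity variances V_{mu->i}
        arg = w[:, :, None] * xs + D_ui[:, :, None] - Cap[:, None, None]
        M_ui = H(arg / np.sqrt(V_ui + 1e-12)[:, :, None]) + 1e-300
        M_ui /= M_ui.sum(axis=2, keepdims=True)         # M_{mu->i}(x_i), (K, N, x_max+1)
        belief = prior * np.prod(M_ui, axis=0)          # unnormalized p_i(x_i | D)
        M_iu = belief[:, None, :] / M_ui.transpose(1, 0, 2)  # divide out one factor message
        M_iu /= M_iu.sum(axis=2, keepdims=True)
    return belief / belief.sum(axis=1, keepdims=True)   # approximate marginals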

Three issues are noteworthy. The first is the necessary computational cost. Given that it is necessary to assess summations over μ = 1, ..., K and i = 1, ..., N for each of i = 1, ..., N and μ = 1, ..., K, respectively, the computational cost of this algorithm is O(NK) per update, where ${x}_{i}^{\mathrm{max}}\enspace (i=1,\dots ,N)$ is assumed to be O(1). Furthermore, we should repeat the computation until convergence for each choice of one item, which implies that the cost of finding an approximate solution scales as O(NKT), where T denotes the number of selected items, as long as the number of iterations necessary for convergence is O(1) per choice. Although this may not be low-cost, it is still feasible in many practical situations. When selecting the next item, starting from the converged messages of the previous choice, which is termed a 'warm start', is effective for suppressing the number of iterations. In addition, as the leading order optimality is achieved by greedy packing, we can limit the employment of MPGS to the 'final stage' of the solution search. For instance, after obtaining a solution by PECH1.0, it would be reasonable to improve the solution by redoing the last 10% of the search with MPGS. In such cases, the practical system size is reduced considerably, so the computational cost would not be a serious problem.

The second issue is the setting of β. Larger values emphasize the greediness of the solution search; that is, larger β favors item types with larger vi . However, too large a β prevents BP from converging due to the occurrence of RSB, unless vi is constant among the item types. Hence, we have to tune the value of β. In the experiments shown below, we set β = 2–10, for which BP converged.

The final issue is the validity of the current approximate treatment. The developed algorithm yields exact results under appropriate conditions, as N → ∞, if the wμi 's are provided as independent random variables sampled from a distribution with zero mean and finite variance [20, 21]. Unfortunately, they are biased toward positive values in KPs, including GMDKP, which does not guarantee the accuracy of the obtained solutions. Nevertheless, the cavity/BP framework provides another advantage in terms of the technical ease of computing marginal distributions. The naive mean field method (nMFM) [22] and Markov chain Monte Carlo (MCMC) are representative alternatives for assessing marginals. However, nMFM cannot be directly employed for (14) because $\mathrm{log}\,p\left(\boldsymbol{x}\vert D\right)$ diverges to −∞ for x that do not satisfy the weight constraints. Additionally, MCMC for systems such as (14) hardly converges, as such systems are in frozen states at any temperature [23]. These considerations offer a rational reason for selecting the cavity/BP framework as the basis of the approximate search algorithm. Replacing BP with expectation propagation (EP) [24, 25], which can somewhat incorporate the correlations among the wμi 's, can be another option. However, EP requires a higher computational cost of $O\left({N}^{3}\right)$ per update than BP, which limits its employment to relatively small systems.

Figure 6 compares the rescaled difference ${N}^{-1/2}\left(U-N\mathcal{U}(\infty )\right)$ of the achieved total profit among OR-Tools, PECH1.0, and MPGS (based on BP) for N = 80 and xmax = 1 under the setting of figure 3, which corresponds to ${\sigma }_{V}^{2} > 0$. Data of OR-Tools are plotted only for α = 0.1, as obtaining data for larger α is difficult due to the limitation of computational resources. Although the total profit of MPGS is presumably lower than that of OR-Tools, it is larger than that of PECH1.0 at all of the examined values of α. A similar tendency is also observed for ${\sigma }_{V}^{2}=0$ (figure 7). Meanwhile, the error bars of MPGS for ${\sigma }_{V}^{2} > 0$ are considerably larger than those for ${\sigma }_{V}^{2}=0$, which may be due to the influence of RSB. This implies that the total profit for the ${\sigma }_{V}^{2} > 0$ cases could be further improved by generalizing BP so as to take RSB into account [26, 27].

Figure 6. Rescaled difference from the leading order term $N\mathcal{U}$ of the achieved total profit for ${\sigma }_{V}^{2}=0.01$, N = 80, and xmax = 1. Other parameters are the same as in figure 3. Error bars denote one standard error. Data of OR-Tools are plotted only for α = 0.1, as obtaining data for larger α is difficult due to the limitation of computational resources.

Figure 7. Rescaled difference from the leading order term $N\mathcal{U}$ of the achieved total profit for ${\sigma }_{V}^{2}=0$, N = 80, and xmax = 1. Other parameters are the same as in figure 4. Error bars denote one standard error. The red curve represents the replica prediction (13) for the exact solution. Data of OR-Tools are plotted only for α = 0.1 and 0.5, as obtaining data for larger α is difficult due to the limitation of computational resources.

5. Summary

In summary, we analyzed a random ensemble of GMDKP, which is a generalized version of the KP, a representative NP-hard optimization problem. Using the replica method, we assessed the achievable limit of the total profit under multiple weight constraints for typical samples of the ensemble. Our analysis showed that despite the NP-hardness, one can achieve a nearly optimal total profit that accords with the truly optimal value in the leading order with respect to the number of item types N, with an O(N) computational cost. Several earlier studies report that KPs may be among the 'easiest' NP-hard problems [28–31]. Although those studies address not the approximation accuracy but the computational cost of finding the exact solution, our analysis may offer a useful clue for understanding the 'easiness' of solving KPs.

We also developed a heuristic algorithm to improve the total profit in the sub-leading order by the cavity method. Extensive numerical experiments showed that the developed algorithm outperforms other existing algorithms.

In this study, we assumed that the weight and profit parameters of GMDKP were independently provided from certain distributions. However, these parameters can show some correlations in realistic problems. Hence, examining the properties of solutions for such cases is an important future task.

Acknowledgments

Useful comments from anonymous referees are appreciated. This study was partially supported by JSPS KAKENHI Grant Nos. 21K21310 (TT), 17H00764 (YK), and JST CREST Grant No. JPMJCR1912 (YK).

Data availability statement

No new data were created or analysed in this study.

Appendix A. Details of replica calculation

Equation (A.1)

where ${\mathrm{Tr}}_{\left\{{x}_{i}^{a}\right\}}$ represents the summation with respect to all possible choices of $({x}_{i}^{a})\in {\left\{0,1,\dots ,{x}^{\mathrm{max}}\enspace \right\}}^{nN}$. By introducing the RS assumption

we have

The last term can be computed as follows:

Finally, we have

Appendix B. GAMP

GAMP provides the following update equations for the node variables:

${a}_{i}={\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}^{2}}{{V}_{\mu }}{A}_{\mu },\qquad {m}_{i}={\left.\frac{\partial }{\partial h}{\phi }_{i}\left({a}_{i},{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0},\qquad {\chi }_{i}={\left.\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i},{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}$

for i = 1, ..., N, and

${V}_{\mu }={\sum }_{i=1}^{N}{w}_{\mu i}^{2}{\chi }_{i},\qquad {B}_{\mu }={\left.\frac{\partial }{\partial \theta }\,\mathrm{ln}\,H\left(\frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu }+\theta \right)\right\vert }_{\theta =0},\qquad {A}_{\mu }=-{\left.\frac{{\partial }^{2}}{\partial {\theta }^{2}}\,\mathrm{ln}\,H\left(\frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu }+\theta \right)\right\vert }_{\theta =0}$

for μ = 1, ..., K, where ${\phi }_{i}\left(a,b\right)=\mathrm{ln}\left({\sum }_{{x}_{i}=0}^{{x}_{i}^{\mathrm{max}}}\mathrm{exp}\left(-\frac{a}{2}{x}_{i}^{2}+b{x}_{i}\right)\right)$. After obtaining these variables, the marginal distributions are assessed as ${p}_{i}\left({x}_{i}\vert D\right)\propto \mathrm{exp}\left\{-\frac{{a}_{i}}{2}{x}_{i}^{2}+\left({\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}\right){x}_{i}\right\}$. This update rule is summarized in algorithm 3.

Algorithm 3. GAMP.

Input: $D=\left\{{w}_{\mu i},{C}_{\mu }\right\},{v}_{i},{x}_{i}^{\mathrm{max}}\enspace (i=1,\dots ,N,\ \mu =1,\dots ,K),\beta $
Output: ${p}_{i}({x}_{i}\vert D)\ (i=1,\dots ,N,\ {x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$
 while not converged do
  for i = 1 to N do
   ${a}_{i}{\leftarrow}{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}^{2}}{{V}_{\mu }}{A}_{\mu }$
   ${\left.{m}_{i}{\leftarrow}\frac{\partial }{\partial h}{\phi }_{i}\left({a}_{i},{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}$
   ${\left.{\chi }_{i}{\leftarrow}\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i},{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}$
  for μ = 1 to K do
   ${V}_{\mu }{\leftarrow}{\sum }_{i=1}^{N}{w}_{\mu i}^{2}{\chi }_{i}$
   ${\left.{B}_{\mu }{\leftarrow}\frac{\partial }{\partial \theta }\,\mathrm{ln}\,H\left(\frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu }+\theta \right)\right\vert }_{\theta =0}$
   ${\left.{A}_{\mu }{\leftarrow}-\frac{{\partial }^{2}}{\partial {\theta }^{2}}\,\mathrm{ln}\,H\left(\frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu }+\theta \right)\right\vert }_{\theta =0}$
 for i = 1 to N do
  for xi = 0 to ${x}_{i}^{\mathrm{max}}\enspace $ do
   ${p}_{i}\left({x}_{i}\vert D\right){:=}\mathrm{exp}\left\{-\frac{{a}_{i}}{2}{x}_{i}^{2}+\left({\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}\right){x}_{i}\right\}$
  Normalize ${p}_{i}\left({x}_{i}\vert D\right)$

Its derivation is as follows. We employ a Taylor expansion of $\mathrm{ln}\,H\left(\frac{{w}_{\mu i}{x}_{i}+{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right)$ up to the second order, handling $\frac{{w}_{\mu i}{x}_{i}}{\sqrt{{V}_{\mu \to i}}}$ as a small number, which leads to the following expression:

$\mathrm{ln}\,H\left(\frac{{w}_{\mu i}{x}_{i}+{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right)\simeq \mathrm{ln}\,H\left(\frac{{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right)+{B}_{\mu \to i}\frac{{w}_{\mu i}{x}_{i}}{\sqrt{{V}_{\mu \to i}}}-\frac{{A}_{\mu \to i}}{2}{\left(\frac{{w}_{\mu i}{x}_{i}}{\sqrt{{V}_{\mu \to i}}}\right)}^{2},$
where ${A}_{\mu \to i}=-\frac{{\partial }^{2}}{\partial {\theta }^{2}}\,\mathrm{ln}\,{\left.H\left(\frac{{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}+\theta \right)\right\vert }_{\theta =0}$ and ${B}_{\mu \to i}=\frac{\partial }{\partial \theta }\,\mathrm{ln}\,{\left.H\left(\frac{{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}+\theta \right)\right\vert }_{\theta =0}$.

This provides the mean and variance of ${\mathcal{M}}_{i\to \mu }\;\left({x}_{i}\right)$ as

${m}_{i\to \mu }={\left.\frac{\partial }{\partial h}{\phi }_{i}\left({a}_{i\to \mu },{b}_{i\to \mu }+\beta {v}_{i}+h\right)\right\vert }_{h=0},\qquad {\chi }_{i\to \mu }={\left.\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i\to \mu },{b}_{i\to \mu }+\beta {v}_{i}+h\right)\right\vert }_{h=0},$

and those of ${p}_{i}\left({x}_{i}\vert D\right)$, ${m}_{i}$ and ${\chi }_{i}$, as

${m}_{i}={\left.\frac{\partial }{\partial h}{\phi }_{i}\left({a}_{i},{b}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0},\qquad {\chi }_{i}={\left.\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i},{b}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0},$
where ${a}_{i\to \mu }={\sum }_{\nu \ne \mu }\frac{{w}_{\nu i}^{2}}{{V}_{\nu \to i}}{A}_{\nu \to i}$, ${b}_{i\to \mu }={\sum }_{\nu \ne \mu }\frac{{w}_{\nu i}}{\sqrt{{V}_{\nu \to i}}}{B}_{\nu \to i}$, ${a}_{i}={\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}^{2}}{{V}_{\mu \to i}}{A}_{\mu \to i}$, and ${b}_{i}={\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu \to i}}}{B}_{\mu \to i}.$ The small size of $\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu \to i}}}$ validates handling ${a}_{i\to \mu }\simeq {a}_{i}$ and ${\chi }_{i\to \mu }\simeq {\chi }_{i}$ for i = 1, ..., N and μ = 1, ..., K. This yields ${V}_{\mu \to i}\simeq {V}_{\mu }={\sum }_{i=1}^{N}{w}_{\mu i}^{2}{\chi }_{i}$ for μ = 1, ..., K, and i = 1, ..., N. Similarly, as the difference between Δμi and ${{\Delta}}_{\mu }={\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i\to \mu }$ is relatively small, ${A}_{\mu \to i}\simeq {A}_{\mu }=-\frac{{\partial }^{2}}{\partial {\theta }^{2}}\,\mathrm{ln}\,{\left.H\left(\frac{{{\Delta}}_{\mu }-{C}_{\mu }}{\sqrt{{V}_{\mu }}}+\theta \right)\right\vert }_{\theta =0}$. Furthermore, we expand miμ as ${m}_{i\to \mu }\simeq \frac{\partial }{\partial h}{\phi }_{i}{\left.\left({a}_{i},{b}_{i}-\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu \to i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}\simeq \frac{\partial }{\partial h}{\phi }_{i}{\left.\left({a}_{i},{b}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}-\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu \to i}\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i},{b}_{i}+\beta {v}_{i}+h\right){\left.\right\vert }_{h=0}={m}_{i}-\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu \to i}{\chi }_{i}\simeq {m}_{i}-\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }{\chi }_{i}$ and provide

$\frac{{{\Delta}}_{\mu }-{C}_{\mu }}{\sqrt{{V}_{\mu }}}\simeq \frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu },$
where ${\left.{B}_{\mu }=\frac{\partial }{\partial \theta }\,\mathrm{ln}\,H\left(\frac{{{\Delta}}_{\mu }-{C}_{\mu }}{\sqrt{{V}_{\mu }}}+\theta \right)\right\vert }_{\theta =0}$. Additionally, the Taylor expansion ${B}_{\mu \to i}\simeq {B}_{\mu }-\left(\frac{{\partial }^{2}}{\partial {\theta }^{2}}{\left.\,\mathrm{ln}\,H\left(\frac{{{\Delta}}_{\mu }-{C}_{\mu }}{\sqrt{{V}_{\mu }}}+\theta \right)\right\vert }_{\theta =0}\right)\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{m}_{i}=\ {B}_{\mu }+\frac{{w}_{\mu i}{A}_{\mu }}{\sqrt{{V}_{\mu }}}{m}_{i}$ yields

${b}_{i}\simeq {\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}.$
The aforementioned expressions provide the update equations for the node variables.
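
The scalar building blocks of algorithm 3 are cheap to evaluate. A Python sketch (illustrative) uses the identities ∂φi/∂b = ⟨xi⟩ and ∂²φi/∂b² = ⟨xi²⟩ − ⟨xi⟩², together with d ln H(u)/du = −g(u)/H(u), where g denotes the standard normal density.

import numpy as np
from scipy.special import erfc

def phi_moments(a, b, x_max):
    # first and second derivatives of phi_i(a, b) in b, i.e. the mean and
    # variance of x under weights exp(-a x^2 / 2 + b x) for x = 0, ..., x_max
    xs = np.arange(x_max + 1)
    logw = -0.5 * a * xs**2 + b * xs
    p = np.exp(logw - logw.max())
    p /= p.sum()
    m = (p * xs).sum()
    return m, (p * xs**2).sum() - m**2

def lnH_derivs(u):
    # B = d ln H(u)/du = -g/H and A = -d^2 ln H(u)/du^2 = r(r - u), with r = g/H;
    # for large u, scipy.special.erfcx gives a numerically stabler ratio
    Hval = 0.5 * erfc(u / np.sqrt(2.0))
    g = np.exp(-u**2 / 2) / np.sqrt(2.0 * np.pi)
    r = g / Hval
    return -r, r * (r - u)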
