
Statistical mechanics analysis of generalized multi-dimensional knapsack problems


Published 18 August 2022 © 2022 The Author(s). Published by IOP Publishing Ltd
Citation: Yuta Nakamura et al 2022 J. Phys. A: Math. Theor. 55 375003. DOI: 10.1088/1751-8121/ac8381


Abstract

The knapsack problem (KP) is a representative combinatorial optimization problem that aims to maximize the total profit by selecting a subset of items under given constraints on the total weights. In this study, we analyze a generalized version of KP, termed the generalized multidimensional knapsack problem (GMDKP). Unlike the basic KP, GMDKP allows multiple choices per item type under multiple weight constraints. Although several efficient algorithms are known and the properties of their solutions have been examined to a significant extent for basic KPs, there is a paucity of known algorithms and studies on the solution properties of GMDKP. To gain insight into the problem, we assess the typical achievable limit of the total profit for a random ensemble of GMDKP using the replica method. Our findings are summarized as follows: (1) when the profits of item types are normally distributed, the total profit grows, in the leading order with respect to the number of item types, as the maximum number of choices per item type xmax increases, whereas it depends on xmax only at a sub-leading order if the profits are constant among the item types. (2) A greedy-type heuristic can find, with a low computational cost, a nearly optimal solution whose total profit is lower than the optimal value only by a sub-leading order. (3) The sub-leading difference from the optimal total profit can be improved by a heuristic algorithm based on the cavity method. Extensive numerical experiments support these findings.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Combinatorial optimization is a popular topic related to numerous research fields. It is deeply connected to computer science and the theory of algorithms, and it is frequently applied to real-world problems in the field of operations research. The knapsack problem (KP) is a major combinatorial optimization problem, as are the traveling salesman problem and the minimum spanning tree problem [1]. There are many variants of KP, but all of them aim to maximize the total profit by selecting a subset of items as long as they do not violate the given constraints on the total weights. KP has been extensively examined for a long time because of its wide applicability. Its applications include resource allocation problems [2], cutting or packing stock problems [3], and capital budgeting problems [4, 5].

The basic and most well-known version of KP, which we refer to as the 0–1 one-dimensional KP (0–1 1DKP), is defined as follows: suppose that there are N item types. Each item type i, 1 ⩽ i ⩽ N, has two characteristic quantities vi ∈ (0, ∞) and wi ∈ (0, ∞), where vi stands for the profit of an item of type i and wi for its weight. The 0–1 1DKP aims to find a way to select items to put in a knapsack such that the total profit is maximized under the constraints that each item type can be selected at most once and that the total weight does not exceed the capacity of the knapsack. The problem is mathematically formulated as follows:

$\underset{\boldsymbol{x}}{\mathrm{maximize}}\enspace U={\sum }_{i=1}^{N}{v}_{i}{x}_{i}\quad \text{subject to}\quad {\sum }_{i=1}^{N}{w}_{i}{x}_{i}\leqslant C,\enspace {x}_{i}\in \left\{0,1\right\}\enspace (i=1,\dots ,N),$

where U and ${\sum }_{i=1}^{N}{w}_{i}{x}_{i}$ denote the total profit and total weight, respectively. The variable xi denotes the number of items of type i to be placed in the knapsack, and C denotes the capacity of the knapsack.

We herein address the generalized multidimensional knapsack problem (GMDKP) by extending 0–1 1DKP in the following two directions:

  • (a)  
    Introduction of multiple constraints on the total weights as ${\sum }_{i=1}^{N}{w}_{\mu i}{x}_{i}\leqslant {C}_{\mu }$ for μ ∈ {1, ..., K}, which is termed multi-dimensionalization [6].
  • (b)  
    Relaxation of the maximum number of times each item type can be chosen [6]. That is, we allow each item type i to be selected up to ${x}_{i}^{\mathrm{max}}\enspace $ times. The 0–1 1DKP corresponds to the case of ${x}_{i}^{\mathrm{max}}=1$.

Specifically, GMDKP is expressed as follows:

$\underset{\boldsymbol{x}}{\mathrm{maximize}}\enspace U={\sum }_{i=1}^{N}{v}_{i}{x}_{i}\quad \text{subject to}\quad {\sum }_{i=1}^{N}{w}_{\mu i}{x}_{i}\leqslant {C}_{\mu }\enspace (\mu =1,\dots ,K),\enspace {x}_{i}\in \left\{0,1,\dots ,{x}_{i}^{\mathrm{max}}\right\}\enspace (i=1,\dots ,N).$
Two issues are worth discussing here. The first issue is related to the theoretically achievable limit of the total profit of the GMDKP, which offers a baseline for examining the performance of search algorithms. Korutcheva et al [7] considered a multidimensional version and analyzed the typical properties of solutions under a random setup while maintaining ${x}_{i}^{\mathrm{max}}=1\ (\forall i\in \left\{1,\dots ,N\right\})$. We term this version the multidimensional knapsack problem (MDKP). Later, Inoue [8] examined MDKP by relaxing the non-negative integers xi to spherically constrained real numbers. However, no such typical-case performance analyses have been obtained for GMDKP.

The second issue concerns algorithms for finding the solution of GMDKP. Even in the basic case, KPs are known to be NP-hard [9]. A significant amount of effort has been made to overcome this computational hardness, and it can be classified into two directions. The first direction involves searching for exact solutions. It has been empirically shown that methods based on dynamic programming [10] and the branch-and-bound method [11] can find exact solutions very efficiently for many instances of 0–1 1DKP. However, such methods are not applicable to GMDKP. Therefore, for the extended KPs, we should resort to the second direction, which aims to efficiently search for good approximate solutions. In this direction, it has been experimentally shown that a greedy-type heuristic algorithm can find fairly good approximate solutions for randomly generated GMDKPs [12]. However, its possibilities and limitations have not been theoretically clarified yet.
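
For illustration, the dynamic-programming approach of the first direction can be sketched in a few lines of Python; this is a minimal textbook version for 0–1 1DKP assuming integer weights and capacity (the pseudo-polynomial O(NC) scheme), not one of the solvers benchmarked in section 4.

def knapsack_01_dp(v, w, C):
    # dp[c] = best total profit achievable within capacity c
    dp = [0.0] * (C + 1)
    for vi, wi in zip(v, w):
        # sweep capacities downwards so that each item type is used at most once
        for c in range(C, wi - 1, -1):
            dp[c] = max(dp[c], dp[c - wi] + vi)
    return dp[C]

# toy instance: profits, integer weights, capacity
print(knapsack_01_dp([6.0, 10.0, 12.0], [1, 2, 3], 5))  # -> 22.0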

In view of this situation, we analyze a random ensemble of GMDKP using the replica method to clarify the typically achievable limit of the total profit. The result of the analysis indicates that when the profits of item types are normally distributed, the total profit grows, in the leading order with respect to the number of item types, as the maximum number of choices per item type xmax increases, whereas it depends on xmax only at a sub-leading order if the profits are constant among the item types. Moreover, the analysis implies that a nearly optimal solution whose total profit is lower than the optimal value only by a sub-leading order can be found by the aforementioned greedy-type algorithm with a low computational cost. However, further improving the total profit at the sub-leading order is still non-trivial. To accomplish this, we develop a heuristic algorithm based on the cavity method. Extensive numerical experiments support the analytical results and the usefulness of the developed algorithm.

The remainder of this paper is organized as follows. In the next section, we introduce the random ensemble of GMDKP to be analyzed. In section 3, we theoretically examine the typically achievable limit of the total profit for the introduced problem ensemble. In section 4, we numerically validate the results obtained in section 3. We also develop a heuristic algorithm for the sub-leading order improvement. The final section is devoted to discussion and future prospects.

2. Problem setup

To examine the typical properties of GMDKP, we consider an ensemble that is characterized by the following simplified conditions:

  • Fix ${x}_{i}^{\mathrm{max}}$ to a constant xmax for ∀i ∈ {1, ..., N}.
  • Cμ = CN for ∀μ ∈ {1, ..., K}, where C > 0 is a proportionality constant.
  • vi  (∀i ∈ {1, ..., N}) are independently drawn from an identical Gaussian distribution $\mathcal{N}(V,{\sigma }_{V}^{2})$, where 0 ⩽ σV ≪ V.
  • wμi  (∀i ∈ {1, ..., N}, ∀μ ∈ {1, ..., K}) are independently drawn from another identical Gaussian distribution $\mathcal{N}(W,{\sigma }_{W}^{2})$, where 0 < σW ≪ W.

A distinctive feature of the KP is the positivity of the profit (vi ) and weight (wμi ) parameters. The current simplified setup incorporates this feature with a small number of parameters, although possible correlations among the parameters that may exist in realistic problems are ignored.
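
For reference, the ensemble can be sampled as in the following Python sketch; the function name and default values are ours, chosen to match the experiments of section 4 (e.g. ${\sigma }_{W}^{2}=0.01$ corresponds to sigma_W = 0.1).

import numpy as np

def sample_instance(N, alpha=0.1, C=0.5, V=1.0, sigma_V=0.1, W=1.0, sigma_W=0.1, x_max=1, seed=0):
    # sample profits v_i, weights w_{mu i}, and capacities C_mu of a random GMDKP
    rng = np.random.default_rng(seed)
    K = int(alpha * N)                       # number of weight constraints, K = alpha N
    v = rng.normal(V, sigma_V, size=N)       # v_i ~ N(V, sigma_V^2)
    w = rng.normal(W, sigma_W, size=(K, N))  # w_{mu i} ~ N(W, sigma_W^2)
    Cap = np.full(K, C * N)                  # C_mu = C N
    return v, w, Cap, x_max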

3. Statistical mechanics analysis on the optimal solution

We compute the typical value of the achievable U in the limit of N, K → ∞ while maintaining their ratio K/N = α ∈ [0, ∞). In the following, we use N → ∞ as a shorthand notation for this scaling limit to avoid cumbersome expressions. To this end, we first transform the total weights to appropriate expressions under the assumption that the solution is placed in the vicinity of the boundaries of the weight constraints [7], which means that the total number of chosen items satisfies

${\sum }_{i=1}^{N}{x}_{i}=\frac{NC}{W}+\sqrt{N}M.\qquad (1)$

Inserting wμi = W + ξμi , where ${\xi }_{\mu i}\sim \mathcal{N}(0,{\sigma }_{W}^{2})$, into the definitions of the total weights yields the following decomposition:

${\sum }_{i=1}^{N}{w}_{\mu i}{x}_{i}=NC+\sqrt{N}WM+{u}_{\mu },\qquad (2)$

where ${u}_{\mu }={\sum }_{i=1}^{N}{\xi }_{\mu i}{x}_{i}\enspace (\mu =1,\dots ,K)$.
M controls the difference of the total number of chosen items from its leading term NC/W in the scale of $O(\sqrt{N})$. These expressions indicate that the total weights are constant in the leading order of O(N) and vary in the next order of $O(\sqrt{N})$ based on the choice of $\boldsymbol{x}\in {\left\{0,\dots ,{x}^{\mathrm{max}}\right\}}^{N}$. Handling $-U=-{\sum }_{i=1}^{N}{v}_{i}{x}_{i}=-NVC/W-{\sum }_{i=1}^{N}{\eta }_{i}{x}_{i}$ as a Hamiltonian, where ${\eta }_{i}\sim \mathcal{N}(0,{\sigma }_{V}^{2})$, we compute the partition function with the inverse temperature β > 0 as

${Z}_{\beta }(\boldsymbol{\xi },\boldsymbol{\eta },M)={\mathrm{Tr}}_{\boldsymbol{x}}\,\delta \left({\sum }_{i=1}^{N}{x}_{i},\frac{NC}{W}+\sqrt{N}M\right)\left[\prod _{\mu =1}^{K}{\Theta}\left(-\sqrt{N}WM-{u}_{\mu }\right)\right]\mathrm{exp}\left(\beta {\sum }_{i=1}^{N}{\eta }_{i}{x}_{i}\right),\qquad (3)$

where δ(·, ·) denotes Kronecker's delta, Θ(x) = 1 (x ⩾ 0) and 0 otherwise, and ${\mathrm{Tr}}_{\boldsymbol{x}}$ denotes the summation with respect to all possible choices of $\boldsymbol{x}=({x}_{i})\in {\left\{0,1,\dots ,{x}^{\mathrm{max}}\right\}}^{N}$.

${Z}_{\beta }(\boldsymbol{\xi },\boldsymbol{\eta },M)$ varies randomly depending on the realization of ξ = (ξμi ) and η = (ηi ), and it is supposed to scale exponentially with respect to N. This implies that its typical behavior can be examined by assessing the average of its logarithm (free entropy). This naturally leads to the use of the replica method. More specifically, for $n=1,2,\dots \in \mathbb{N}$, we compute the moment ${\mathbb{E}}_{\boldsymbol{\xi },\boldsymbol{\eta }}[{Z}_{\beta }^{n}(\boldsymbol{\xi },\boldsymbol{\eta },M)]$, where ${\mathbb{E}}_{\boldsymbol{\xi },\boldsymbol{\eta }}[\dots ]$ denotes the average operation with respect to ξ and η , as a function of n and continue the obtained functional expression to $n\in \mathbb{R}$. Subsequently, we evaluate the average free entropy per item type using the identity

$\phi ={\mathrm{lim}}_{N\to \infty }\frac{1}{N}{\mathbb{E}}_{\boldsymbol{\xi },\boldsymbol{\eta }}\left[\mathrm{ln}\,{Z}_{\beta }(\boldsymbol{\xi },\boldsymbol{\eta },M)\right]={\mathrm{lim}}_{N\to \infty }\,{\mathrm{lim}}_{n\to 0}\frac{\partial }{\partial n}\frac{1}{N}\,\mathrm{ln}\,{\mathbb{E}}_{\boldsymbol{\xi },\boldsymbol{\eta }}\left[{Z}_{\beta }^{n}(\boldsymbol{\xi },\boldsymbol{\eta },M)\right].$
After some calculations (details are provided in appendix A), this procedure provides the concrete expression of the average free entropy under the replica symmetric (RS) assumption as follows:

Equation (4)

where extrX {...} denotes the operation of the extremization of ... with respect to X, and

The typical value of the maximum total profit (per item type) is assessed as

$\mathcal{U}=\frac{VC}{W}+{\mathrm{lim}}_{\beta \to \infty }\frac{\phi }{\beta }.\qquad (5)$

In the following, we describe the results obtained by the above computation for ${\sigma }_{V}^{2} > 0$ and ${\sigma }_{V}^{2}=0$ separately, as the results differ considerably between the two cases.

Case of ${\sigma }_{V}^{2} > 0$.

In the limit of β → ∞, the variables in (4) scale so as to satisfy χ = β(Q − q) ∼ O(1), $E=(\hat{Q}+\hat{q})/\beta \sim O(1)$, $F=\hat{q}/{\beta }^{2}\sim O(1)$, and $G=\hat{M}/\beta \sim O(1)$. Using these new variables, the extremum condition is expressed as

where

Equation (6)

The solution determined by these offers the maximum per item type total profit $\mathcal{U}$ as

and entropy (per item type) as

However, we must keep in mind that the solution is obtained under the RS ansatz, which may not be valid for β → ∞. The stability analysis against perturbations that break the replica symmetry [13] indicates that the RS solution is locally unstable if

Equation (7)

is satisfied. Equation (6) means that x* varies discontinuously by unity at certain values of z, which leads to $\int \mathrm{D}z{\left(\frac{\partial {x}^{\ast }}{\partial G}\right)}^{2}=+\infty $ [14]. These facts imply that (7) is satisfied, and hence that the RS solution is invalid, as long as M is finite. As replica symmetry breaking (RSB) implies that finding the lowest-energy (optimal-profit) solution is challenging due to rugged energy landscapes, this also suggests that optimally packing $NC/W\pm O(\sqrt{N})$ items in the knapsack is computationally difficult.

Meanwhile, (7) also implies that the validity of the RS solution is recovered for M → −∞ as $H\left(-\frac{WM}{{\sigma }_{W}\sqrt{Q}}\right)\to 0$ holds. In addition, as E, F → 0 holds, we can obtain an analytical expression of the RS solution in this limit as

Equation (8)

which yields

$\mathcal{U}=\frac{VC}{W}+{x}^{\mathrm{max}}{\sigma }_{V}\frac{{\mathrm{e}}^{-{A}^{2}/2}}{\sqrt{2\pi }},\qquad (9)$

where A is the solution of H(A) = C/(xmax W).

Equation (8) corresponds to the solution obtained by choosing NC/W + o(N) items from the item types of higher vi values, which we term 'greedy packing'. This bounds the leading order term of U from above by $N(VC/W+{x}^{\mathrm{max}}{\sigma }_{V}\frac{{\mathrm{e}}^{-{A}^{2}/2}}{\sqrt{2\pi }})$ since choosing O(N) items further on top of the NC/W + o(N) items typically breaks some of the weight constraints. On the other hand, the upper bound is easily achieved by choosing $NC/W-O\left({N}^{1/2+{\epsilon}}\right)\ (0< {\epsilon}< 1/2)$ items by greedy packing. This is because the largest value of the fluctuation terms uμ  (μ = 1, ..., αN) in (2) typically scales as $O\left({(N\,\mathrm{log}(N))}^{1/2}\right)$, and therefore, all the weight constraints are typically satisfied if we set a sufficiently large 'margin' by reducing the total number of chosen items from NC/W by $O\left({N}^{1/2+{\epsilon}}\right)$ without changing the leading O(N) term of U.
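
As a numerical illustration, (9) is straightforward to evaluate; the following Python sketch (illustrative, with function names of our own choosing) computes A from H(A) = C/(xmax W) by root finding and reproduces the prediction for the parameters of figure 3.

import numpy as np
from scipy.optimize import brentq
from scipy.special import erfc

def H(x):
    # Gaussian tail function H(x) = \int_x^infty Dz
    return 0.5 * erfc(x / np.sqrt(2.0))

def U_greedy_limit(V, sigma_V, C, W, x_max):
    # A solves H(A) = C / (x_max W); a root exists when 0 < C / (x_max W) < 1
    A = brentq(lambda a: H(a) - C / (x_max * W), -10.0, 10.0)
    return V * C / W + x_max * sigma_V * np.exp(-A**2 / 2) / np.sqrt(2.0 * np.pi)

print(U_greedy_limit(V=1.0, sigma_V=0.1, C=0.5, W=1.0, x_max=1))  # approx 0.5399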

In summary, we obtain the following results:

  • The optimal total profit grows with xmax in the leading term of O(N).
  • Finding the truly optimal solution would be computationally difficult.
  • However, achieving the total profit that is lower than the optimal value only by $O\left({N}^{1/2+{\epsilon}}\right)$ would be easy by employing the greedy packing.

Case of ${\sigma }_{V}^{2}=0$.

Unlike the case of ${\sigma }_{V}^{2} > 0$, the variables in (4) remain O(1) even in the limit of β → ∞. This is because the constraint (1) on the total number of chosen items makes β irrelevant to the extremum condition of (4). More precisely, expressing $\tilde{M}=\hat{M}+\beta V$, the extremum of (4) is characterized by

independently of β, where

The resultant solution yields the per item type total profit for N → ∞ as

$\mathcal{U}=\frac{VC}{W},\qquad (10)$

which, unlike the case of ${\sigma }_{V}^{2} > 0$, does not vary with xmax. The solution also offers the per item type entropy as

Equation (11)

which does not depend on β.

Figure 1 shows S versus M for several values of α for xmax = 1, indicating that for all values of α, S is positive for M below a certain critical value Mopt(α), at which S vanishes. The same behavior is observed for xmax ⩾ 2. Figure 2 plots Mopt(α) versus xmax. This indicates that Mopt(α) increases with xmax but almost saturates for xmax ≳ 10. The local stability of the RS solution would be broken if

Equation (12)

were satisfied. However, (12) does not hold for M < Mopt(α), indicating that the RS solution is valid.

Figure 1.  S versus M for several values of α. Parameters are set as W = 1.0, C = 0.5, and ${\sigma }_{W}^{2}=0.01$.

Figure 2. Behavior of Mopt for xmax = 1, 2, 3, 10, and 100. Mopt almost saturates for xmax ≳ 10. The parameters are set as in figure 1.

To summarize, we reach the following conclusions:

  • The optimal total profit does not depend on xmax in the leading term of O(N).
  • Solutions are highly degenerate up to the sub-leading term of $O(\sqrt{N})$ of the total profit. This has also been pointed out for the case of xmax = 1 in [7]. For M < Mopt(α), exponentially many solutions achieve the value of total profit $U=NVC/W+\sqrt{N}M$. The solutions vanish at M = Mopt(α), which indicates that the optimal total profit is provided as
    $U=\frac{NVC}{W}+\sqrt{N}{M}_{\mathrm{opt}}(\alpha ).\qquad (13)$
  • The total profit is improved monotonically at the order of $O(\sqrt{N})$ by increasing xmax, but almost saturates for xmax greater than a moderate value.

4. Search algorithms

4.1. Achieving leading order optimality by greedy packing

The results in the previous section imply that greedy packing can find a nearly optimal solution with a low computational cost. To confirm this, we carried out numerical experiments using OR-Tools by Google [15] and PECHγ by Akçay et al [12].

OR-Tools provides an exact algorithm that can efficiently find the exact solutions for problems of moderate sizes. However, the necessary computational cost still grows exponentially with respect to N in the worst case. Therefore, its use is practically limited to N of several tens. Additionally, it is applicable only to 0–1 MDKP. On the other hand, PECHγ is a greedy-type heuristic in which the greediness is controlled by a parameter γ ∈ (0, 1]. By setting γ = 1, in each iteration it chooses at most xmax items from the remainders so as to maximize the increase of the total profit, until a certain weight constraint is violated. This realizes the greedy packing.
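
A minimal Python sketch of greedy packing in the spirit of PECH with γ = 1 (a simplified illustration of the idea, not the full PECHγ of [12]): item types are visited in decreasing order of profit, and the search stops as soon as adding one more item would violate a weight constraint.

import numpy as np

def greedy_packing(v, w, Cap, x_max):
    K, N = w.shape
    x = np.zeros(N, dtype=int)
    load = np.zeros(K)                        # current total weights
    for i in np.argsort(-v):                  # item types by decreasing profit v_i
        for _ in range(x_max):
            if np.any(load + w[:, i] > Cap):  # a weight constraint would be violated
                return x                      # stop at the first violation
            x[i] += 1
            load += w[:, i]
    return x

For instance, greedy_packing(*sample_instance(1000)) packs a random instance generated by the sampler sketched in section 2.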

Figure 3 plots the achieved per item type total profit versus the number of item types N for a case of ${\sigma }_{V}^{2} > 0$. As mentioned in section 3, the largest value of uμ  (μ = 1, ..., αN) in the weight constraints (2) scales as O((N log N)1/2). This implies that for finite N, the difference of the per item type total profit achieved by PECH1.0 from $\mathcal{U}$ of (9) is proportional to N−1/2(log N)1/2 in the leading order. On the other hand, such dependence for the exact solution obtained by OR-Tools is nontrivial, but we speculate that the leading order is O(N−1/2). This is because we should be able to construct physically valid replica solutions at least in a certain range of finite M by taking RSB into account, which means that the total number of chosen items can grow from NC/W − O((N log N)1/2) to NC/W − O(N1/2) by optimizing the choice of items. Values extrapolated from the experimental data to N → ∞ assuming the abovementioned scaling forms exhibit considerably good agreement with the theoretical prediction (9), in view of corrections by terms of o(N−1/2(log N)1/2).

Figure 3. Per item type total profit obtained experimentally by PECH1.0 (greedy packing) and OR-Tools (exact algorithm) for C = 0.5, W = 1.0, ${\sigma }_{W}^{2}=0.01$, V = 1.0, ${\sigma }_{V}^{2}=0.01$, and α = 0.1. Red and black symbols stand for data by PECH1.0 and OR-Tools, respectively. Error bars denote one standard error. Data of OR-Tools are plotted only for xmax = 1 as the algorithm is not applicable for xmax ⩾ 2. Blue symbols on the vertical lines denote the theoretical prediction (9) for N → ∞. Left panel: extrapolated values for N → ∞, $\mathcal{U}(\infty )$, were determined under the assumptions $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}{(\mathrm{log}\,N)}^{1/2}$ (PECH1.0) and $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}$ (OR-Tools). These show considerably good agreement with the theoretical predictions. For reference, extrapolated values based on the assumption $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}$ are also plotted as black symbols for PECH1.0. The larger deviations from the theoretical predictions support the relevance of the log correction in the scaling. Right panel: the same data plotted versus N−1/2 for both PECH1.0 and OR-Tools.

The results for ${\sigma }_{V}^{2}=0$ are plotted in figure 4. We employed the scaling forms of O(N−1/2(log N)1/2) and O(N−1/2) for PECH1.0 and OR-Tools, respectively, as in the case of ${\sigma }_{V}^{2} > 0$. The employment of O(N−1/2) for OR-Tools is better justified than in the case of ${\sigma }_{V}^{2} > 0$, as the finiteness of M for the optimal solution is supported by the stability of the RS solution. Meanwhile, as shown in figure 2, Mopt, which corresponds to the prefactor of the term of O(N−1/2) in $\mathcal{U}(N)$ of the exact solution, almost vanishes for the examined parameter setting. We therefore determined the value of $\mathcal{U}(\infty )$ for OR-Tools by extrapolation under the assumption of $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}-b{N}^{-1}$, taking into account the contribution of the higher order term of O(N−1). This, as well as the extrapolation for PECH1.0, again results in very good accordance with the replica prediction. However, unlike the case of ${\sigma }_{V}^{2} > 0$, the achieved per item type total profit depends little on xmax.

Figure 4. Per item type total profit obtained experimentally by PECH1.0 (greedy packing) and OR-Tools (exact algorithm) for ${\sigma }_{V}^{2}=0$ and α = 0.5. The other parameters and the meanings of symbols and panels are the same as in figure 3. In this setup, Mopt, which corresponds to the prefactor of the term of O(N−1/2) in $\mathcal{U}(N)$, almost vanishes, as shown in figure 2. This indicates that the extrapolation for OR-Tools should be carried out under the assumption of $\mathcal{U}(N)=\mathcal{U}(\infty )-a{N}^{-1/2}-b{N}^{-1}$, taking into account the higher order contribution of O(N−1). The extrapolated value, as well as those for PECH1.0, exhibits very good agreement with the replica prediction.

4.2. Performance improvement in sub-leading order by cavity method

The greedy packing implemented by PECH1.0 achieves the leading order optimality with an O(N) computational cost. However, for finite N, the achieved total profit is still lower than the truly optimal profit by O((N log N)1/2). Here, we develop a method for reducing this gap using the cavity method [16].

For this purpose, we consider the 'canonical distribution' over $\boldsymbol{x}=({x}_{i})$, ${x}_{i}\in \left\{0,1,\dots ,{x}_{i}^{\mathrm{max}}\right\}$ (i = 1, ..., N), satisfying all the weight constraints given by $D=\left\{{w}_{\mu i},{C}_{\mu }\right\}\ \left(\mu =1,\dots ,K,\ i=1,\dots ,N\right)$ as

$p(\boldsymbol{x}\vert D)=\frac{1}{{\Xi}}\,\mathrm{exp}\left(\beta {\sum }_{i=1}^{N}{v}_{i}{x}_{i}\right)\prod _{\mu =1}^{K}{\Theta}\left({C}_{\mu }-{\sum }_{i=1}^{N}{w}_{\mu i}{x}_{i}\right),\qquad (14)$

and evaluate the marginal distributions ${p}_{i}\left({x}_{i}\vert D\right)={\sum }_{\boldsymbol{x}{\backslash}{x}_{i}}\;p\left(\boldsymbol{x}\vert D\right)$, where β > 0 is a parameter for controlling the emphasis on the total profit and Ξ is a normalization constant. Then, we find the item type i* that maximizes the probability of being non-zero, ${p}_{i}({x}_{i}\ne 0\vert D)={\sum }_{{x}_{i}\ne 0}{\;p}_{i}\left({x}_{i}\vert D\right)$, put one item of i* in the knapsack, and reduce ${x}_{{i}^{\ast }}^{\mathrm{max}}$ by one as ${x}_{{i}^{\ast }}^{\mathrm{max}}{\leftarrow}{x}_{{i}^{\ast }}^{\mathrm{max}}-1$. We also subtract the weight upper bounds Cμ as ${C}_{\mu }{\leftarrow}{C}_{\mu }-{w}_{\mu {i}^{\ast }}\ \left(\mu =1,\dots ,K\right)$. We repeat these procedures as long as the weight constraints are satisfied. After the final repetition, the items in the knapsack constitute an approximate solution. We refer to this procedure as the 'marginal-probability-based greedy strategy' (MPGS). The pseudo-code for the procedure is summarized in algorithm 1.

Algorithm 1. MPGS.

Input: $D=\left\{{w}_{\mu i},{C}_{\mu }\right\},{v}_{i},{x}_{i}^{\mathrm{max}}\enspace (i=1,\dots ,N,\ \mu =1,\dots ,K)$
Output: xi  (i = 1, ..., N)
 xi := 0 (i = 1, ..., N)
 i* := 0
 while Cμ ⩾ 0, ∀μ and ${x}_{i}^{\mathrm{max}} > 0,\exists i$ do
  Evaluate ${p}_{i}({x}_{i}\vert D)\ (i=1,\dots ,N,{x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$ with algorithm 2 or 3
  i* ←argmaxi pi (xi ≠ 0|D)
  if ${x}_{{i}^{\ast }}^{\mathrm{max}}=0$ or ${C}_{\mu }-{w}_{\mu {i}^{\ast }}< 0,\enspace \exists \mu $ then
   break
  else
   ${x}_{{i}^{\ast }}{\leftarrow}{x}_{{i}^{\ast }}+1$
   ${x}_{{i}^{\ast }}^{\mathrm{max}}\enspace {\leftarrow}{x}_{{i}^{\ast }}^{\mathrm{max}}-1$
  for μ = 1 to K do
   ${C}_{\mu }{\leftarrow}{C}_{\mu }-{w}_{\mu {i}^{\ast }}$
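
A Python sketch of this loop (illustrative; it mirrors the pseudo-code rather than constituting a reference implementation), with the marginal evaluation delegated to a marginals callable for which the BP sketch after algorithm 2 can serve as a starting point:

import numpy as np

def mpgs(v, w, Cap, x_max, marginals):
    K, N = w.shape
    x = np.zeros(N, dtype=int)
    x_rem = np.full(N, x_max)                    # remaining allowances x_i^max
    Cap = Cap.astype(float).copy()
    while np.all(Cap >= 0) and np.any(x_rem > 0):
        # p[i, k] approximates p_i(x_i = k | D); column 0 is the probability of zero
        p = marginals(v, w, Cap, x_rem)
        i_star = int(np.argmax(1.0 - p[:, 0]))   # most probably non-zero item type
        if x_rem[i_star] == 0 or np.any(Cap - w[:, i_star] < 0):
            break
        x[i_star] += 1                           # put one item of type i* in the knapsack
        x_rem[i_star] -= 1
        Cap -= w[:, i_star]                      # shrink the remaining capacities
    return x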

Unfortunately, it is computationally difficult to conduct this greedy search because the computational cost of assessing the marginal distributions ${p}_{i}\left({x}_{i}\vert D\right)$ from the joint distribution $p\left(\boldsymbol{x}\vert D\right)$ grows exponentially with respect to N. We employ the cavity method to resolve this problem. To perform this, we depict the joint distribution by a factor graph (figure 5) and recursively update the messages defined on the edges between the factor and variable nodes as

${\mathcal{M}}_{\mu \to i}({x}_{i})={c}_{\mu \to i}\,{\mathrm{Tr}}_{\boldsymbol{x}{\backslash}{x}_{i}}\,{\Theta}\left({C}_{\mu }-{w}_{\mu i}{x}_{i}-{\sum }_{j\ne i}{w}_{\mu j}{x}_{j}\right)\prod _{j\ne i}{\mathcal{M}}_{j\to \mu }({x}_{j}),\qquad (15)$

${\mathcal{M}}_{i\to \mu }({x}_{i})={c}_{i\to \mu }\,{\mathrm{e}}^{\beta {v}_{i}{x}_{i}}\prod _{\nu \ne \mu }{\mathcal{M}}_{\nu \to i}({x}_{i}),\qquad (16)$

by following the recipe of belief propagation (BP), which provides efficient algorithms for finding the solution of the cavity method [16]. After the messages are determined, the marginal distribution is approximately assessed as follows:

${p}_{i}({x}_{i}\vert D)\simeq {c}_{i}\,{\mathrm{e}}^{\beta {v}_{i}{x}_{i}}\prod _{\mu =1}^{K}{\mathcal{M}}_{\mu \to i}({x}_{i}),$

where ${c}_{\mu \to i}$, ${c}_{i\to \mu }$, and ci are normalization constants.

Figure 5. Factor graph representation of (14). The messages are passed through the edges in both directions.

Exact execution of the BP algorithm is still computationally infeasible, as the cost of evaluating (15) grows exponentially with respect to N. This problem is resolved by the Gaussian approximation employed in the approximate message passing technique [17, 18]. More precisely, utilizing the central limit theorem, we regard ∑ji wμj xj in (15) as a Gaussian random variable characterized by mean Δμi = ∑ji wμj mjμ and variance ${V}_{\mu \to i}={\sum }_{j\ne i}{w}_{\mu j}^{2}{\chi }_{j\to \mu }$, where mjμ and χjμ denote the mean and variance of ${\mathcal{M}}_{j\to \mu }\left({x}_{j}\right)$, respectively. Hence, it is possible to analytically evaluate (15) as follows:

${\mathcal{M}}_{\mu \to i}({x}_{i})\simeq {c}_{\mu \to i}\,H\left(\frac{{w}_{\mu i}{x}_{i}+{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right).$

This reduces BP of (15) and (16) to update equations with respect to the 4NK variables miμ , χiμ , Δμi , and Vμi , which are defined on the edges of the factor graph. The reduced BP algorithm is summarized as pseudo-code in algorithm 2. The computational cost can be further reduced by converting the BP algorithm into one for variables defined on nodes, which is sometimes referred to as generalized approximate message passing (GAMP) [17, 19, 20]. Its derivation and the pseudo-code (algorithm 3) are provided in appendix B.

Algorithm 2. BP.

Input: $D=\left\{{w}_{\mu i},{C}_{\mu }\right\},{v}_{i},{x}_{i}^{\mathrm{max}}\enspace (i=1,\dots ,N,\ \mu =1,\dots ,K),\beta $
Output: ${p}_{i}({x}_{i}\vert D)\ (i=1,\dots ,N,\ {x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$
 ${\mathcal{M}}_{i\to \mu }({x}_{i}){:=}\frac{{\text{e}}^{\beta {v}_{i}{x}_{i}}}{{\sum }_{{x}_{i}\;\in \left\{0,\dots ,{x}^{\mathrm{max}}\enspace \right\}}{\text{e}}^{\beta {v}_{i}{x}_{i}}}\ (i=1,\dots ,N,\ \mu =1,\dots ,K,\ {x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$
 ${\mathcal{M}}_{\mu \to i}({x}_{i}){:=}1/({x}_{i}^{\mathrm{max}}+1)\ (i=1\,\dots \,N,\ \mu =1\,\dots \,K,\ {x}_{i}=0\,\dots ,{x}_{i}^{\mathrm{max}})$
 while messages not converged do
  Δμ := 0, Vμ := 0 (μ = 1, ..., K)
  ${\mathcal{M}}_{i}({x}_{i}){:=}\frac{{\text{e}}^{\beta {v}_{i}{x}_{i}}}{{\sum }_{{x}_{i}\;\in \left\{0,\dots ,{x}^{\mathrm{max}}\enspace \right\}}{\text{e}}^{\beta {v}_{i}{x}_{i}}}\ (i=1,\dots ,N,\ {x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$
  for i = 1 to N do ▹Pre-calculation
   for μ = 1 to K do
   ${m}_{i\to \mu }{:=}{\sum }_{{x}_{i}\;\in \left\{0,\dots ,{x}^{\mathrm{max}}\enspace \right\}}{x}_{i}{\mathcal{M}}_{i\to \mu }({x}_{i})$
   ${\chi }_{i\to \mu }{:=}{\sum }_{{x}_{i}\;\in \left\{0,\dots ,{x}^{\mathrm{max}}\enspace \right\}}{x}_{i}^{2}{\mathcal{M}}_{i\to \mu }({x}_{i})-{m}_{i\to \mu }^{2}$
   Δμ ← Δμ + wμi miμ
   ${V}_{\mu }{\leftarrow}{V}_{\mu }+{w}_{\mu i}^{2}{\chi }_{i\to \mu }$
   for xi = 0 to ${x}_{i}^{\mathrm{max}}\enspace $ do
    ${\mathcal{M}}_{i}({x}_{i}){\leftarrow}{\mathcal{M}}_{i}({x}_{i})\times {\mathcal{M}}_{\mu \to i}({x}_{i})$
  for i = 1 to N do
   for μ = 1 to K do
   Δμi := Δμ wμi miμ
   ${V}_{\mu \to i}{:=}{V}_{\mu }-{w}_{\mu i}^{2}{\chi }_{i\to \mu }$
   for xi = 0 to ${x}_{i}^{\mathrm{max}}\enspace $ do
    ${\mathcal{M}}_{\mu \to i}({x}_{i}){\leftarrow}H\left(\frac{{w}_{\mu i}{x}_{i}+{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right)$
    ${\mathcal{M}}_{i\to \mu }({x}_{i}){\leftarrow}{\mathcal{M}}_{i}({x}_{i})/{\mathcal{M}}_{\mu \to i}({x}_{i})$
   Normalize ${\mathcal{M}}_{\mu \to i}({x}_{i}),{\mathcal{M}}_{i\to \mu }({x}_{i})$
 for i = 1 to N do
  for xi = 0 to ${x}_{i}^{\mathrm{max}}\enspace $ do
   ${p}_{i}\left({x}_{i}\vert D\right){:=}{\text{e}}^{\beta {v}_{i}{x}_{i}}{\prod }_{\mu =1}^{K}\;{\mathcal{M}}_{\mu \to i}\left({x}_{i}\right)$
  Normalize ${p}_{i}\left({x}_{i}\vert D\right)$
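
For concreteness, algorithm 2 can be vectorized as in the following Python sketch (illustrative: for large K the product over factor messages should be taken in the log domain to avoid underflow, and damping of the updates may be needed for convergence).

import numpy as np
from scipy.special import erfc

def H(x):
    # Gaussian tail function H(x)
    return 0.5 * erfc(x / np.sqrt(2.0))

def bp_marginals(v, w, Cap, x_max, beta=5.0, n_sweeps=50):
    K, N = w.shape
    xs = np.arange(x_max + 1)
    logits = beta * np.outer(v, xs)                     # beta v_i x_i, shape (N, x_max+1)
    prior = np.exp(logits - logits.max(axis=1, keepdims=True))
    prior /= prior.sum(axis=1, keepdims=True)
    M_iu = np.repeat(prior[:, None, :], K, axis=1)      # M_{i->mu}(x_i), (N, K, x_max+1)
    belief = prior.copy()
    for _ in range(n_sweeps):
        m = (M_iu * xs).sum(axis=2)                     # means m_{i->mu}, (N, K)
        chi = (M_iu * xs**2).sum(axis=2) - m**2         # variances chi_{i->mu}
        Delta = (w * m.T).sum(axis=1)                   # Delta_mu, (K,)
        V = (w**2 * chi.T).sum(axis=1)                  # V_mu, (K,)
        D_ui = Delta[:, None] - w * m.T                 # cavity means Delta_{mu->i}
        V_ui = V[:, None] - w**2 * chi.T                # cavity variances V_{mu->i}
        arg = w[:, :, None] * xs + D_ui[:, :, None] - Cap[:, None, None]
        M_ui = H(arg / np.sqrt(V_ui + 1e-12)[:, :, None]) + 1e-300
        M_ui /= M_ui.sum(axis=2, keepdims=True)         # M_{mu->i}(x_i), (K, N, x_max+1)
        belief = prior * np.prod(M_ui, axis=0)          # unnormalized p_i(x_i | D)
        M_iu = belief[:, None, :] / M_ui.transpose(1, 0, 2)  # divide out one factor message
        M_iu /= M_iu.sum(axis=2, keepdims=True)
    return belief / belief.sum(axis=1, keepdims=True)   # approximate marginals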

Three issues are noteworthy. The first is the necessary computational cost. Given that it is necessary to assess summations over μ = 1, ..., K and i = 1, ..., N for each of i = 1, ..., N and μ = 1, ..., K, respectively, the computational cost of this algorithm is O(NK) per update, where ${x}_{i}^{\mathrm{max}}\enspace (i=1,\dots ,N)$ is assumed to be O(1). Furthermore, we should repeat the computation until convergence for each choice of one item, which implies that the cost of finding an approximate solution scales as O(NKT), where T denotes the number of selected items, as long as the number of iterations necessary for convergence is O(1) per choice. Although this may not be low-cost, it is still feasible in many practical situations. When selecting the next item, starting from the converged messages of the previous choice, which is termed a 'warm start', is effective for suppressing the number of iterations. In addition, as the leading order optimality is achieved by greedy packing, we can limit the employment of MPGS to the 'final stage' of the solution search. For instance, after obtaining a solution by PECH1.0, it would be reasonable to improve the solution by redoing the last 10% of the search with MPGS. In such cases, the practical system size is reduced considerably, so the computational cost would not be a serious problem.

The second issue is the setting of β. Larger values emphasize the greediness of the solution search; that is, larger β favors item types with larger vi . However, too large a β prevents BP from converging due to the occurrence of RSB, unless vi is constant among the item types. Hence, we have to tune the value of β. In the experiments shown below, we set β = 2–10, for which BP converged.

The final issue is the validity of the current approximate treatment. The developed algorithm yields exact results under appropriate conditions, as N → ∞, if the wμi 's are provided as independent random variables sampled from a distribution with zero mean and finite variance [20, 21]. Unfortunately, they are biased toward positive values in KPs, including GMDKP, which does not guarantee the accuracy of the obtained solutions. Nevertheless, the cavity/BP framework provides another advantage in terms of the technical ease of computing marginal distributions. The naive mean field method (nMFM) [22] and Markov chain Monte Carlo (MCMC) are representative alternatives for assessing marginals. However, nMFM cannot be directly employed for (14) because $\mathrm{log}\,p\left(\boldsymbol{x}\vert D\right)$ diverges to −∞ for x that do not satisfy the weight constraints. Additionally, MCMC for systems such as (14) hardly converges, as such systems are in frozen states at any temperature [23]. These considerations offer a rational reason for selecting the cavity/BP framework as the basis of the approximate search algorithm. Replacing BP with expectation propagation (EP) [24, 25], which can somewhat incorporate the correlations among the wμi 's, can be another option. However, EP requires a higher computational cost of $O\left({N}^{3}\right)$ per update than BP, which limits its employment to relatively small systems.

Figure 6 compares the rescaled difference ${N}^{-1/2}\left(U-N\mathcal{U}(\infty )\right)$ of the achieved total profit among OR-Tools, PECH1.0, and MPGS (based on BP) for N = 80 and xmax = 1 under the setting of figure 3, which corresponds to ${\sigma }_{V}^{2} > 0$. Data of OR-Tools are plotted only for α = 0.1, as obtaining data for larger α is difficult due to the limitation of computational resources. Although the total profit of MPGS is presumably lower than that of OR-Tools, it is larger than that of PECH1.0 at all of the examined values of α. A similar tendency is also observed for ${\sigma }_{V}^{2}=0$ (figure 7). Meanwhile, the error bars of MPGS for ${\sigma }_{V}^{2} > 0$ are considerably larger than those for ${\sigma }_{V}^{2}=0$, which may be due to the influence of RSB. This implies that the total profit for the ${\sigma }_{V}^{2} > 0$ cases could be further improved by generalizing BP so as to take RSB into account [26, 27].

Figure 6. Rescaled difference from the leading order term $N\mathcal{U}$ of the achieved total profit for ${\sigma }_{V}^{2}=0.01$, N = 80, and xmax = 1. Other parameters are the same as in figure 3. Error bars denote one standard error. Data of OR-Tools are plotted only for α = 0.1, as obtaining data for larger α is difficult due to the limitation of computational resources.

Figure 7. Rescaled difference from the leading order term $N\mathcal{U}$ of the achieved total profit for ${\sigma }_{V}^{2}=0$, N = 80, and xmax = 1. Other parameters are the same as in figure 4. Error bars denote one standard error. The red curve represents the replica prediction (13) for the exact solution. Data of OR-Tools are plotted only for α = 0.1 and 0.5, as obtaining data for larger α is difficult due to the limitation of computational resources.

5. Summary

In summary, we analyzed a random ensemble of GMDKP, which is a generalized version of the KP, a representative NP-hard optimization problem. Using the replica method, we assessed the achievable limit of the total profit under multiple weight constraints for typical samples of the ensemble. Our analysis showed that despite the NP-hardness, one can achieve a nearly optimal total profit that accords with the truly optimal value in the leading order with respect to the number of item types N, with an O(N) computational cost. Several earlier studies report that KPs may be among the 'easiest' NP-hard problems [28–31]. Although those studies address not the approximation accuracy but the computational cost of finding the exact solution, our analysis may offer a useful clue for understanding the 'easiness' of solving KPs.

We also developed a heuristic algorithm to improve the total profit in the sub-leading order by the cavity method. Extensive numerical experiments showed that the developed algorithm outperforms other existing algorithms.

In this study, we assumed that the weight and profit parameters of GMDKP were independently provided from certain distributions. However, these parameters can show some correlations in realistic problems. Hence, examining the properties of solutions for such cases is an important future task.

Acknowledgments

Useful comments from anonymous referees are appreciated. This study was partially supported by JSPS KAKENHI Grant Nos. 21K21310 (TT), 17H00764 (YK), and JST CREST Grant No. JPMJCR1912 (YK).

Data availability statement

No new data were created or analysed in this study.

Appendix A. Details of replica calculation

Equation (A.1)

where ${\mathrm{Tr}}_{\left\{{x}_{i}^{a}\right\}}$ represents the summation with respect to all possible choices of $({x}_{i}^{a})\in {\left\{0,1,\dots ,{x}^{\mathrm{max}}\enspace \right\}}^{nN}$. By introducing the RS assumption

we have

The last term can be computed as follows:

Finally, we have

Appendix B. GAMP

GAMP provides the following update equations for the node variables:

${a}_{i}={\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}^{2}}{{V}_{\mu }}{A}_{\mu },\qquad {m}_{i}={\left.\frac{\partial }{\partial h}{\phi }_{i}\left({a}_{i},{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0},\qquad {\chi }_{i}={\left.\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i},{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}$

for i = 1, ..., N, and

${V}_{\mu }={\sum }_{i=1}^{N}{w}_{\mu i}^{2}{\chi }_{i},\qquad {B}_{\mu }={\left.\frac{\partial }{\partial \theta }\,\mathrm{ln}\,H\left(\frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu }+\theta \right)\right\vert }_{\theta =0},\qquad {A}_{\mu }=-{\left.\frac{{\partial }^{2}}{\partial {\theta }^{2}}\,\mathrm{ln}\,H\left(\frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu }+\theta \right)\right\vert }_{\theta =0}$

for μ = 1, ..., K, where ${\phi }_{i}\left(a,b\right)=\mathrm{ln}\left({\sum }_{{x}_{i}=0}^{{x}_{i}^{\mathrm{max}}}\mathrm{exp}\left(-\frac{a}{2}{x}_{i}^{2}+b{x}_{i}\right)\right)$. After obtaining these variables, the marginal distributions are assessed as ${p}_{i}\left({x}_{i}\vert D\right)\propto \mathrm{exp}\left\{-\frac{{a}_{i}}{2}{x}_{i}^{2}+\left({\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}\right){x}_{i}\right\}$. This update rule is summarized in algorithm 3.

Algorithm 3. GAMP.

Input: $D=\left\{{w}_{\mu i},{C}_{\mu }\right\},{v}_{i},{x}_{i}^{\mathrm{max}}\enspace (i=1,\dots ,N,\ \mu =1,\dots ,K),\beta $
Output: ${p}_{i}({x}_{i}\vert D)\ (i=1,\dots ,N,\ {x}_{i}=0,\dots ,{x}_{i}^{\mathrm{max}})$
 while not converged do
  for i = 1 to N do
   ${a}_{i}{\leftarrow}{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}^{2}}{{V}_{\mu }}{A}_{\mu }$
   ${\left.{m}_{i}{\leftarrow}\frac{\partial }{\partial h}{\phi }_{i}\left({a}_{i},{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}$
   ${\left.{\chi }_{i}{\leftarrow}\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i},{\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}$
  for μ = 1 to K do
   ${V}_{\mu }{\leftarrow}{\sum }_{i=1}^{N}{w}_{\mu i}^{2}{\chi }_{i}$
   ${\left.{B}_{\mu }{\leftarrow}\frac{\partial }{\partial \theta }\,\mathrm{ln}\,H\left(\frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu }+\theta \right)\right\vert }_{\theta =0}$
   ${\left.{A}_{\mu }{\leftarrow}-\frac{{\partial }^{2}}{\partial {\theta }^{2}}\,\mathrm{ln}\,H\left(\frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu }+\theta \right)\right\vert }_{\theta =0}$
 for i = 1 to N do
  for xi = 0 to ${x}_{i}^{\mathrm{max}}\enspace $ do
   ${p}_{i}\left({x}_{i}\vert D\right){:=}\mathrm{exp}\left\{-\frac{{a}_{i}}{2}{x}_{i}^{2}+\left({\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}+\beta {v}_{i}\right){x}_{i}\right\}$
  Normalize ${p}_{i}\left({x}_{i}\vert D\right)$

Its derivation is as follows. We employ a Taylor expansion of $\mathrm{ln}\,H\left(\frac{{w}_{\mu i}{x}_{i}+{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right)$ up to the second order, handling $\frac{{w}_{\mu i}{x}_{i}}{\sqrt{{V}_{\mu \to i}}}$ as a small number, which leads to the following expression:

$\mathrm{ln}\,H\left(\frac{{w}_{\mu i}{x}_{i}+{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right)\simeq \mathrm{ln}\,H\left(\frac{{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}\right)+{B}_{\mu \to i}\frac{{w}_{\mu i}{x}_{i}}{\sqrt{{V}_{\mu \to i}}}-\frac{{A}_{\mu \to i}}{2}{\left(\frac{{w}_{\mu i}{x}_{i}}{\sqrt{{V}_{\mu \to i}}}\right)}^{2},$
where ${A}_{\mu \to i}=-\frac{{\partial }^{2}}{\partial {\theta }^{2}}\,\mathrm{ln}\,{\left.H\left(\frac{{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}+\theta \right)\right\vert }_{\theta =0}$ and ${B}_{\mu \to i}=\frac{\partial }{\partial \theta }\,\mathrm{ln}\,{\left.H\left(\frac{{{\Delta}}_{\mu \to i}-{C}_{\mu }}{\sqrt{{V}_{\mu \to i}}}+\theta \right)\right\vert }_{\theta =0}$.

This provides the mean and variance of ${\mathcal{M}}_{i\to \mu }\;\left({x}_{i}\right)$ as

${m}_{i\to \mu }={\left.\frac{\partial }{\partial h}{\phi }_{i}\left({a}_{i\to \mu },{b}_{i\to \mu }+\beta {v}_{i}+h\right)\right\vert }_{h=0},\qquad {\chi }_{i\to \mu }={\left.\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i\to \mu },{b}_{i\to \mu }+\beta {v}_{i}+h\right)\right\vert }_{h=0},$

and those of ${p}_{i}\left({x}_{i}\vert D\right)$, ${m}_{i}$ and ${\chi }_{i}$, as

${m}_{i}={\left.\frac{\partial }{\partial h}{\phi }_{i}\left({a}_{i},{b}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0},\qquad {\chi }_{i}={\left.\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i},{b}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0},$
where ${a}_{i\to \mu }={\sum }_{\nu \ne \mu }\frac{{w}_{\nu i}^{2}}{{V}_{\nu \to i}}{A}_{\nu \to i}$, ${b}_{i\to \mu }={\sum }_{\nu \ne \mu }\frac{{w}_{\nu i}}{\sqrt{{V}_{\nu \to i}}}{B}_{\nu \to i}$, ${a}_{i}={\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}^{2}}{{V}_{\mu \to i}}{A}_{\mu \to i}$, and ${b}_{i}={\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu \to i}}}{B}_{\mu \to i}.$ The small size of $\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu \to i}}}$ validates handling ${a}_{i\to \mu }\simeq {a}_{i}$ and ${\chi }_{i\to \mu }\simeq {\chi }_{i}$ for i = 1, ..., N and μ = 1, ..., K. This yields ${V}_{\mu \to i}\simeq {V}_{\mu }={\sum }_{i=1}^{N}{w}_{\mu i}^{2}{\chi }_{i}$ for μ = 1, ..., K, and i = 1, ..., N. Similarly, as the difference between Δμi and ${{\Delta}}_{\mu }={\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i\to \mu }$ is relatively small, ${A}_{\mu \to i}\simeq {A}_{\mu }=-\frac{{\partial }^{2}}{\partial {\theta }^{2}}\,\mathrm{ln}\,{\left.H\left(\frac{{{\Delta}}_{\mu }-{C}_{\mu }}{\sqrt{{V}_{\mu }}}+\theta \right)\right\vert }_{\theta =0}$. Furthermore, we expand miμ as ${m}_{i\to \mu }\simeq \frac{\partial }{\partial h}{\phi }_{i}{\left.\left({a}_{i},{b}_{i}-\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu \to i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}\simeq \frac{\partial }{\partial h}{\phi }_{i}{\left.\left({a}_{i},{b}_{i}+\beta {v}_{i}+h\right)\right\vert }_{h=0}-\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu \to i}\frac{{\partial }^{2}}{\partial {h}^{2}}{\phi }_{i}\left({a}_{i},{b}_{i}+\beta {v}_{i}+h\right){\left.\right\vert }_{h=0}={m}_{i}-\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu \to i}{\chi }_{i}\simeq {m}_{i}-\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }{\chi }_{i}$ and provide

$\frac{{{\Delta}}_{\mu }-{C}_{\mu }}{\sqrt{{V}_{\mu }}}\simeq \frac{{\sum }_{i=1}^{N}{w}_{\mu i}{m}_{i}-{C}_{\mu }}{\sqrt{{V}_{\mu }}}-{B}_{\mu },$
where ${\left.{B}_{\mu }=\frac{\partial }{\partial \theta }\,\mathrm{ln}\,H\left(\frac{{{\Delta}}_{\mu }-{C}_{\mu }}{\sqrt{{V}_{\mu }}}+\theta \right)\right\vert }_{\theta =0}$. Additionally, the Taylor expansion ${B}_{\mu \to i}\simeq {B}_{\mu }-\left(\frac{{\partial }^{2}}{\partial {\theta }^{2}}{\left.\,\mathrm{ln}\,H\left(\frac{{{\Delta}}_{\mu }-{C}_{\mu }}{\sqrt{{V}_{\mu }}}+\theta \right)\right\vert }_{\theta =0}\right)\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{m}_{i}=\ {B}_{\mu }+\frac{{w}_{\mu i}{A}_{\mu }}{\sqrt{{V}_{\mu }}}{m}_{i}$ yields

${b}_{i}\simeq {\sum }_{\mu =1}^{K}\frac{{w}_{\mu i}}{\sqrt{{V}_{\mu }}}{B}_{\mu }+{a}_{i}{m}_{i}.$
The aforementioned expressions provide the update equations for the node variables.
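
The scalar building blocks of algorithm 3 are cheap to evaluate. A Python sketch (illustrative) uses the identities ∂φi/∂b = ⟨xi⟩ and ∂²φi/∂b² = ⟨xi²⟩ − ⟨xi⟩², together with d ln H(u)/du = −g(u)/H(u), where g denotes the standard normal density.

import numpy as np
from scipy.special import erfc

def phi_moments(a, b, x_max):
    # first and second derivatives of phi_i(a, b) in b, i.e. the mean and
    # variance of x under weights exp(-a x^2 / 2 + b x) for x = 0, ..., x_max
    xs = np.arange(x_max + 1)
    logw = -0.5 * a * xs**2 + b * xs
    p = np.exp(logw - logw.max())
    p /= p.sum()
    m = (p * xs).sum()
    return m, (p * xs**2).sum() - m**2

def lnH_derivs(u):
    # B = d ln H(u)/du = -g/H and A = -d^2 ln H(u)/du^2 = r(r - u), with r = g/H;
    # for large u, scipy.special.erfcx gives a numerically stabler ratio
    Hval = 0.5 * erfc(u / np.sqrt(2.0))
    g = np.exp(-u**2 / 2) / np.sqrt(2.0 * np.pi)
    r = g / Hval
    return -r, r * (r - u)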
