Optimization approach in quadratic assignment and graph isomorphism

Graph matching problem, also known as quadratic assignment problem (QAP) in a practical context, aims at finding the best correspondence between two given graphs. If two nodes are connected in one graph, it would be ideal if their corresponding images are connected in the other. The optimal solution should maximize the number of matches between the two graphs, while minimizing the weight of mismatches. Graph isomorphism is a particular case where an exact correspondence with no mismatches could be found. The quadratic assignment problem itself is hard to solve due to its high computational complexity. In this paper, the researcher analyzes two optimization approaches in detail: convex relaxation and the Fast Approximate QAP algorithm, both trying to solve the problem with lower computational effort. The performances of these two methods are further compared through numerical experiments. This paper focuses on pairs of equal-sized undirected graphs. ρ correlated graphs are constructed to simulate the noise in inexact graph matching problems. After comparing two existing methods, we propose an even faster optimization approach with comparable accuracy. This new approach provides a particular way to approximate an ideal initialization for the traditional Fast Approximate QAP algorithm. The significant reduction in the runtime makes obtaining good correspondence between large graphs feasible.


Introduction
Graph matching problem considers finding a bijective correspondence between the vertex sets of two equal-sized graphs that minimizes edge disagreements [1].With two nodes in one graph connected, we expect the correspondence we sought connects the images of these two nodes.Graph matching has aroused heated discussion in a wide range of scientific fields and practical problems, including computational biology, machine learning, and computer vision, and it is frequently formulated as a quadratic assignment problem (QAP) [2].If the exact isomorphism between two graphs exists, the quadratic assignment problem is equivalent to seeking the global optimal that maintains the vertex adjacency [3].However, the two graphs given might not be of the same structure in real-world applications, so it is impossible to always find a bijection providing exact matches [4].Thus, in inexact matching cases, the optimal correspondence should minimize the quadratic disagreements, i.e., the number of mismatches.
Graph matching problem has been studied and implemented in various scientific fields.However, it is extremely challenging in terms of computational complexity.The complexity of a graph matching problem is neither classified as NP-complete nor P [5].Although it can be solved directly in small graphs with good properties, it is impossible to solve for graphs in most real-life cases involving at least millions of vertices.In general, the quadratic assignment is intractable in polynomial time.Therefore, it is important to study different optimization approaches to make the original problem tractable and with acceptable accuracy.
This paper mainly focuses on exact and inexact graph matching problems for pairs of equal-sized undirected graphs.We analyze two optimization approaches: convex relaxation and the Fast Approximate QAP (FAQ) algorithm.Both of them solve the problem with lower computational effort.The original problem is often relaxed as a convex quadratic program, in which the space of doubly stochastic matrices replaces the set of permutation matrices [1].The Fast Approximate Quadratic Assignment algorithm is designed to efficiently and accurately match the large graphs given by big data [6].We will further compare the performances of these two methods by conducting numerical experiments.After comparing, we propose an even faster two-step approximate approach, whose accuracy is comparable with the best result of the previous two methods under the same scenario.The newly proposed two-step approximate approach significantly reduces the runtime, which makes obtaining optimal correspondence in large graphs much more feasible.

Mathematical description
Let   and   be the two graphs, with   = (  , ),   = (  , ).  is the vertex set of   , and  is the corresponding adjacency matrix.Since   and   are equal-sized undirected graphs, we have   =   and ,  are symmetric.Quadratic assignment problem implies finding the bijective map between   and   , such that the number of mismatches is minimized.Finding a bijective map is the same as finding a permutation matrix  in  = { ∈ {0,1} × :   =   ,   ⊤  =   ⊤ }.We could also write it as a permutation τ on [] s.t.τ() =  if  , = 1 [1]. is the cardinality of   and   .The correspondence maps node  in   to node τ() in   , node  in   to node τ() in   .Thus, the image of  , is  τ(),τ() .A mismatch exists when  , ≠  τ(),τ() .Viewing the mapping as a whole, the quadratic disagreements is measured as the distance between  and the  ⊤ .Therefore, we should minimize ‖ −  ⊤ ‖.
If   and   are isomorphic, the optimal correspondence  should make the distance between  and  ⊤  equal to zero [7].In the following sections, we will use Frobenius norm to measure the distance between  and  ⊤ .The mathematical description of graph matching problem is finding permutation  * s.t.

Convex relaxation
Since the target set of the original graph matching problem is the set of permutation matrices, which is not a convex set, it is frequently relaxed as a convex quadratic program in which the space of doubly stochastic matrices is substituted for the space of permutations [1].In this case, the target set is amplified to be a convex and continuous set [8].Using the "cvxpy" package in Python, we can perform numerical experiments on this convex relaxed program and compare the loss in various scenarios.

Birkhoff polytope and convexity
The Birkhoff polytope is the polytope of doubly stochastic matrices named after Garrett Birkhoff.A stochastic matrix is a square matrix used to describe the transitions of a Markov chain.A doubly stochastic matrix P is both right stochastic and left stochastic.Thus,  =  ⊤  = .Note that  is the column vector of 1 in every entry.
The Birkhoff polytope,  = { ≥ 0:  =  ⊤  = }, is a convex polytope in  × .It has ! vertices since there are ! different permutations among  items.According to Birkhoff-von Neumann theorem, the vertices of the Birkhoff polytope are the permutation matrices.Thus, every doubly stochastic matrix can be written as a convex combination of permutation matrices.Birkhoff polytope in  × is the convex hull of the set of  ×  permutation matrices [9].The relaxed graph matching problem becomes finding a doubly stochastic matrix  * in  = { ≥ 0:  =  ⊤  = } s.t.
To identify a convex program, the objective function, () = ∥ − ∥  2 , should also be a convex function.This could be proved by triangle inequality.Alternatively, we can prove explicitly that () is a quadratic program using Kronecker product and vectorization [10].Therefore, the relaxed quadratic assignment problem in Equation ( 2) is a convex problem.

Projection after relaxation
Aflalo's paper defines a broad class of graphs where the relaxed program guarantees the optimal correspondence, thus the problem is relaxed with no loss.It proves that if   and   are isomorphic and both satisfies (1) all eigenvalues are distinct (2) the entrywise sum of each eigenvector never equal to 1, the convex relaxed program is equivalent to the original problem [1].They share the same global optimal.Therefore, if two graphs   and   with these properties are provided, the loss of the relaxed program is quite informative to determine whether the exact isomorphism exists.We can infer there is no exact matching whenever the disagreement is larger than zero.
Nonetheless, when the properties are not satisfied in   or   , we should be aware that in general cases, the relaxed program gives a non-permutation output.The doubly stochastic matrix  * given by the convex relaxed program has to be projected back onto the space of permutation matrices to give a valid solution for the original quadratic assignment problem [1].In mathematical form, the projection step is described as obtaining a  ×  permutation matrix  * s.t.
Using Hungarian Algorithm, the projection step has a polynomial solver.We implement it through Python package "lap" in simulation.

Relax with noise: numerical experiment on 𝜌 correlated graphs
In an inexact graph matching problem, the convex relaxed program cannot guarantee how close the obtained correspondence is to the optimal one.It is worthwhile to evaluate the performance of the convex relaxation with projection step for non-isomorphic   and   .In the following discussion and numerical experiments, the noise level between two graphs is measured by between graph correlation.
Construct   and   with noise of level  ∈ [0,1]: 1. Let  be symmetric matrix with all zeros on its diagonal.Other entries of  can be any values in [, 1 − ],  ∈ (0, 1 2 ). 2.  , ∼ ( , ), and each entry in  is independent of the others.Let  and  be the adjacency matrices for  correlated () graphs with  ∈ (0, ) ,  , ∈ [, 1 − ] for all  ≠ .Take  ∈ .Let  =  ⊤ .According to the theorem in Lyzinski's paper, if  is strictly smaller than 1, it almost always holds that  ∉ argmin ∈ ∥ − ∥  [3].This theorem can be proved using the KKT condition, and the Borel-Cantelli Lemma.When the primal problem is convex, and with differentiable objective and constraint functions, the KKT condition is sufficient to verify whether a point is the primal and dual optimal or not [8].In other words, by comparing the KarushKuhn-Tucker point to the planted solution, we determine whether the convex relaxed program can recover the global optimal.
To conclude, the relaxed problem and the original one are almost always not equivalent if graph correlation is smaller than 1.Even with the projection step, numerical experiments verify that the optimal correspondence could not be found in most cases, since the doubly stochastic result is too far from the ground truth [3].
Lyzinski's work also presents that if (1 − )(1 − ) < 0.5, it almost always holds that the planted solution  is the only global optimal.If this condition is satisfied, the align rate could be the metric for loss.The higher the align rate, the lower the quadratic disagreements.We set  = 0.3 in the simulations, thus  is the only optimal correspondence when  is larger than 0.29.In spite of the fact that the convex relaxed program does not provide an optimal solution to the original graph matching problem under certain conditions, we are still intrigued about the extent to which the optimal correspondence can be recovered.Therefore, we implement the convex relaxed program and project the solution onto the permutation space.Without loss of generality,  =  is chosen as an in-built optimal for numerical experiments.We then compute how the projected result aligns with the in-built correspondence.Denote  ̂ as the final result, the align rate is defined as According to Figure 3, an obvious transition occurs around  = 0.8, indicating that the quality of this optimization approach is acceptable when  is sufficiently large.Though the convex relaxation cannot guarantee complete success, it is applicable depending on the noise level between the two graphs.

Fast approximate algorithm description
Though we are aware that the convex relaxed program performs well in certain graph isomorphism scenarios and provides an acceptable rate of alignment when the correlation between two graphs is sufficiently large, there are two main concerns with this optimization approach.To begin with, the convex relaxed program fails to retrieve the global optimal for most inexact graph matching problems.Consequently, we wish to find an alternative strategy that yields a higher rate of alignment.Moreover, when the size of graphs increases, the runtime of the convex relaxed program explodes for finding a doubly stochastic optimal matrix  * .In practice, however, the vast majority of problems involve a massive number of nodes, which reduces the applicability of the convex relaxed approach.Therefore, we intend to seek a faster approach that performs well on exact and inexact matching problems.
4. Compute the step size .
To minimize the original problem along the line segment from  0 to  0 , we should find  ∈ [0,1] s.t.
Since  is a quadratic function of , the exact solution is easy to obtain.
, the one gives minimum is the .Otherwise, we only compare the values at 0 and 1, and preserve the one giving smaller  value.

Numerical experiment on runtime and different initializations
The most time-consuming part of the Fast Approximate algorithm is the linear assignment.Compared with convex relaxation with the projection step, the Fast Approximate algorithm with a noninformative initialization has a significantly faster runtime, particularly as the number of vertices increases.
Considering the same sequence of exact graph matching problems, the runtime comparison between two programs is shown in Figure 4. Since the objective function in Equation ( 5) leads to an indefinite program, finding the optimal correspondence is not guaranteed.If the non-informative initialization  =     ⊤  is employed, it is likely that the algorithm will halt at a local minimum.This also explains why the algorithm fails at some exact matching cases.The importance of selecting an appropriate initialization becomes obvious in inexact matching problems.The outcome is much improved if the initialization is close to the global optimal, hence reducing the likelihood of being trapped in a local minimum.In this instance, the solution provided by Equation ( 2) might be a reasonable option for initialization, since it is relatively close to the optimal correspondence.At the very least, it is superior to the non-informative one.In the following numerical experiments, the performance of the Fast Approximate approach with various initializations is presented.It illustrates how adjusting the initialization may significantly affect the error.We reuse the  correlated graph construction method with  = 0.3.For the same sequence of inexact matching problems, we set different initializations:  computed by the convex relaxed program and the non-informative initialization , and check the align rate of the outcome  * with the planted solution . Figure 5 demonstrates that when  is larger than 0.6, the initialization has a crucial impact.The Fast Approximate program with convex relaxed initialization performs significantly better than the program with non-informative initialization.If  is sufficiently large,  ∈ (0.8,1], the program with initialization  could almost retrieve the planted correspondence .This result is better than the result in Figure 3, where the planted solution is lost in a wider range of .

Error comparison of existing methods
In the preceding sections, we discussed two optimization approaches, and the second approach has two initializations proposed.In addition to the align rate with the planted solution , we also wish to examine how the absolute error, ‖ − ‖  , varies with graph correlation.Figure 6 shows that the Fast Approximate algorithm with initialization  delivers the best performance with the lowest error compared to other algorithms.However, the time complexity of solving the initialization  remains unacceptable in large graph matching problems, as it relies on the convex relaxed program.Therefore, developing a more efficient method to approximate this initialization is necessary.If we overcome this limitation, it is conceivable to accurately solve the graph matching problems with a relatively large number of nodes.After changing the objective function, the updated Fast Approximate algorithm does not converge as fast as before.It takes more iterations to reach the stopping criteria.Figure 7 takes a series of inexact graph matching problems as examples and shows how the error decreases with an increasing number of iterations.Regardless of the graph size, the error always decreases fast at the beginning and changes slightly after a modest number of iterations.Thus, it is necessary to constrain the number of iterations to avoid meaningless oscillation.We set 1000 as the maximum number of iterations in the following numerical experiments.

Double fast approximate algorithm
The Fast Approximate program with initialization  yields outstanding results in inexact graph matching problems, and an updated Fast Approximate program can approximate this initialization using a different objective function.The proposed Double Fast Approximate approach utilizes the Fast Approximate algorithm twice.First, a proper initialization   ≈  is obtained by applying the definite Fast Approximate approach with a norm objective and a non-informative initialization.Then, we implement the classical indefinite Fast Approximate algorithm initialized by   to obtain the solution to the original problem.Using this two-step approximate approach, it is worth investigating whether solving a much larger inexact matching problem efficiently and accurately is possible.To verify the validity of our assumption regarding the first step initialization approximation, we compare the extra error when initialization  is solved by the convex relaxed program versus when it is approximated by the definite Fast Approximate program with norm objective.The extra error is computed by subtracting the Frobenius norm error derived from the optimal correspondence from the error given by our solution.In Figure 8, we observe that regardless of the graphs' size, even if  is acquired from different programs, the final extra errors are comparable.Therefore, approximating initialization  using the definite Fast Approximate method is legitimate and does not result in additional loss compared with  calculated by the convex relaxed program.
The Fast Approximate algorithm usually has a significantly better computational complexity than the convex relaxation.After verifying that the definite Fast Approximate program with a new objective function preserves the accuracy while obtaining the proper initialization , simulations are conducted to demonstrate how this program saves us runtime when it approximates the initialization.The same series of inexact matching problems are given to both of the following programs: (1) Solve the convex relaxed program to find   = .(2) Use  as the initialization for the Fast Approximate program with the norm objective to obtain   ≈ .
The scatter plot, Figure 9, shows how the runtime of two programs varies as the graph size increases.Though the convex relaxed program slightly outperforms the Fast Approximate approach for smallsized graphs, the Fast Approximate algorithm has a much superior runtime for large-sized graphs.To understand the relationship between graph size and runtime of different programs, we find the leading order of the polynomial regression.The results from the previous experiment are of the form (, ). is runtime calculated in seconds.Assume that there are constants  and  s.t.() = () + ().
The leading order  helps evaluate the runtime as (  ).To improve the estimation accuracy of the order , we allow the Fast Approximate program to solve problems with larger dimensions, such as graphs with more than 100 nodes.It turns out that the order of the convex relaxed program   is much higher than that of the definite Fast Approximate approach   .Since we have limited data points collected, the exact solution of  has not converged to an integer yet.But the gap between   and   is obvious.This runtime difference in obtaining initialization benefits our two-step approximate approach.It results in at least 7.3 times faster runtime for obtaining an ultimate correspondence in large (exceeds 60 vertices) graph matching problems when compared to the conventional convex relaxed approach.The superiority of the Double Fast Approximate approach becomes even more evident when we seek to obtain a good correspondence between larger graphs.We then implement the Double Fast Approximate approach for quadratic assignment problems with  = 100.The matching problems with this graph size are easy to solve using convex relaxation.From Figure 10, we discover that our two-step approximation program starts to provide a decent align rate when  is larger than 0.75, similar to the previous experiment utilizing initialization   , whereas the runtime is much faster.Therefore, the optimization approach we developed solves large-sized inexact matching problems efficiently and precisely.We can now consider graph matching problems in a broader size range.In the future, we can also explore the range of graph correlation in which the exact alignment is guaranteed.

Discussion
Two optimization approaches are analyzed in detail, solving the exact and inexact graph matching problems for equal-sized undirected graphs.Numerical simulations are conducted to compare their performances under several scenarios.In the convex relaxed program, the matching problem between two isomorphic graphs, whose all eigenvalues are distinct and the entrywise sum of each eigenvector does not equal one, can be solved exactly.However, if the two graphs are not isomorphic, the convex relaxed program does not recover the in-built solution.After projecting the doubly stochastic solution onto the set of permutation matrices, the align rate increases with the between graph correlation, and there is a transition value of the correlation from which the align rate increases rapidly.An important limitation of the convex relaxed program is its computational complexity.To reduce the runtime for large  and acquire better alignment performance regarding inexact matching problems, we analyze the Fast Approximate algorithm, which effectively reduces the runtime and provides a better alignment rate if a good initialization is chosen.For this algorithm, choosing a proper initialization is crucial; otherwise, the optimal correspondence is not guaranteed even between two isomorphic graphs.
To preserve the accuracy and simultaneously further reduce the runtime, we proposed the Double Fast Approximate approach.Use a definite Fast Approximate method with norm objective to get an initialization close to the doubly stochastic solution given by the convex relaxed program.With such an initialization, we then run the indefinite Fast Approximate program with trace objective.Our two-step approximate approach, combining the ideas from the two existing optimization approaches, does preserve the desirable accuracy and significantly reduces runtime when solving inexact matching problems.The advantages are even more apparent when the graph size gets larger, which makes obtaining good correspondence between large graphs feasible.
In the future, this research can be extended in several directions.We want to fine-tune the stopping criteria for the Fast Approximate algorithm and compare how the gap between   and   changes.We should try a series of parameters and explore how the accuracy of the initialization approximation step in our Double Fast Approximate approach varies with the stopping criteria.In addition, it remains to be discussed if any other initialization is superior to  and if there are other optimization procedures that approximate it with less computational cost.Furthermore, in our numerical experiments, we use an in-built permutation matrix  to construct   according to   .Our evaluation of accuracy is dependent on the recovery rate of .However, there will be some cases in which the planted solution  is not the global optimal permutation to the original problem described in Equation (1), for example, when the noise is too large.Our numerical experiments are valid because we create the graphs in accordance with the theorem established in Lyzinski's paper, which shows that, under certain conditions, the planted correspondence is the only global optimal [3].However, it is important to prove the precise threshold rigorously when the planted correspondence is no longer the only global optimal or even not optimal.In the context of  correlated graphs, the discussion of the align rate will only be meaningful when the planted solution is the only global optimal, since it guarantees that the higher the align rate, the lower the quadratic disagreements.

Conclusion
In this study, the convex relaxation and Fast Approximate QAP algorithm for solving undirected graph matching problems are examined in depth.When two graphs are not isomorphic, the convex relaxed program seldom identifies the global optimal.Moreover, its runtime becomes unacceptable when the graph size exceeds a particular value.The Fast Approximate algorithm with a proper initialization chosen can effectively reduce the runtime and provide a better rate of alignment.However, the computational effort required to obtain an ideal initialization must be considered.The Double Fast Approximate approach we proposed optimizes both the initialization and the outcome in large graph matching problems.It results in a runtime that is at least 7.3 times faster than the conventional convex relaxed approach for obtaining an ultimate correspondence in graph matching problems with more than 60 vertices.As the number of vertices increases, the superiority of the Double Fast Approximate method becomes even more evident.The significant reduction in runtime makes obtaining good correspondence in real-life graph matching problems more efficient.

Figure 3 .
Figure 3. Change of align rate by correlation.

Figure 6 .
Figure 6.Change of error by correlation.
The classical Fast Approximate algorithm uses () = −trace( ⊤  ⊤ ) as the objective function, which makes the program indefinite.If we change () to the convex function, () = ∥ − ∥  2 , the program becomes definite.The updated Fast Approximate algorithm with norm objective brings us around the global minimum before it stops.Without the projection step, the updated program should give a solution close to what the convex relaxed program provides.The changes triggered by the convex objective function are as follow.For step 2, we now have ∇   = 2 ⊤  − 2 ⊤  − 2 ⊤ + 2 ⊤ .For step 4, the program is convex and has an exact solution.Step 7 is skipped.Other steps remain the same as the original algorithm.

Figure 8 .
Figure 8. Extra error given by initialization  from different programs.