A New Fast Computation of a Permanent

Store-zechin is a general algorithm for computing the permanent of a square matrix. Its core ideas are reuse, recursion and storage: every sub-item is calculated only once, and whenever it is needed again the stored result is substituted into the calculation. The advantage of the Store-zechin algorithm is that it makes full use of computer memory to accelerate the calculation. Computing the permanent of an $n \times n$ matrix with the Store-zechin algorithm needs $2^{n-1}(n-2)+1$ additions and $(2^{n-1}-1)n$ multiplications. For the same matrix, the Ryser algorithm requires $(2^{n}-n)(n+1)-2$ additions and $(2^{n}-1)(n-1)$ multiplications, and the R-N-W algorithm requires $2^{n-1}(n+1)+n^{2}-n-1$ additions and $2^{n-1}n+n+2$ multiplications. Store-zechin therefore uses $n2^{n-1}+2^{n+1}-n^{2}-n-3$ fewer additions and $n2^{n-1}-2^{n}+1$ fewer multiplications than the Ryser algorithm, and $2^{n-1}+2^{n}+n^{2}-n-2$ fewer additions and $2n+2$ fewer multiplications than the R-N-W algorithm. This confirms that Store-zechin calculates a permanent in fewer steps.


Introduction
In 1812, Cauchy treated the determinant as a special type of alternating symmetric function. To distinguish such functions from ordinary symmetric functions, he called the latter "fonctions symétriques permanentes" [1]. In the same work, Cauchy introduced a subclass of the symmetric functions which was later named permanents by T. Muir [2]. Computing the permanent of a matrix is known to be more difficult than computing its determinant. The hardness of the boson sampling problem is closely tied to the hardness of computing a permanent. In recent years, with the advance of quantum computing technologies, the permanent has often been used as a benchmark for quantum supremacy, by which people can judge whether quantum computers are worthy of research and development. It has therefore received more and more attention.

Basic Definition and Properties
The permanent of a square matrix is a number that is defined in a way similar to the determinant. Let A be an n × n matrix. The permanent of A is defined as
$$\mathrm{Per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i,\sigma(i)}, \quad (1)$$
where $S_n$ is the symmetric group over the set {1, 2, ..., n}, and $\sigma$ is an element of $S_n$, namely a permutation of the numbers 1, 2, ..., n [3], while the definition of a determinant is
$$\det(A) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}, \quad (2)$$
where $\mathrm{sgn}(\sigma)$ represents the parity sign of a group element [4]. The only difference between the determinant and the permanent is the parity sign of a group element, so they share some similar properties [5][6], such as:
1) Per(I) = 1, where I represents the identity matrix of order n (normalization);
2) Per($A^T$) = Per(A), where $A^T$ represents the transpose of A (transpose invariance);
3) Per(A) becomes k · Per(A) when any row or column of A is multiplied by a scalar k.
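As a small worked illustration of definitions (1) and (2), consider a 2 × 2 matrix. The matrix and its entry names a, b, c, d are chosen here for illustration only and do not come from the original text.

```latex
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
\mathrm{Per}(A) = ad + bc, \qquad
\det(A) = ad - bc .
```

The two permutations of {1, 2} contribute the same products in both cases; only the sign of the second term differs.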

Computation Methods
At present, the well-known methods for calculating a permanent are the Naive algorithm, the Ryser algorithm, and the R-N-W algorithm.
The Naive algorithm is based directly on formula (1). It computes the permanent term by term, and its complexity is $O(n \cdot n!)$.
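The direct evaluation of the definition can be sketched as follows. This is a minimal Python illustration, assuming the matrix is given as a list of rows; the function name `permanent_naive` is hypothetical and not taken from the paper.

```python
from itertools import permutations
from math import prod


def permanent_naive(a):
    """Permanent by direct summation over all n! permutations (the definition)."""
    n = len(a)
    # Each permutation sigma contributes the product a[0][sigma(0)] * ... * a[n-1][sigma(n-1)].
    return sum(
        prod(a[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )
```

Each of the n! terms costs n - 1 multiplications, which is where the factorial growth of this approach comes from.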
The Ryser algorithm is a more efficient method [7]. It was proposed by H. Ryser in 1963 and uses the inclusion-exclusion principle to calculate the permanent. It is defined as
$$\mathrm{Per}(A) = \sum_{k=0}^{n-1} (-1)^{k}\, T_k, \quad (3)$$
where $A_k$ is a matrix obtained from A by removing k columns, $P(A_k)$ is the product of the row sums of $A_k$, and $T_k$ is the sum of the values of $P(A_k)$ over all possible $A_k$. From formula (3), it can be deduced that the complexity of the Ryser algorithm is $O(n^{2} 2^{n-1})$.
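The inclusion-exclusion sum described above can be sketched in Python as follows. This is an unoptimized illustration that enumerates the kept columns explicitly, assuming a non-empty matrix given as a list of rows; the name `permanent_ryser` is hypothetical.

```python
from itertools import combinations
from math import prod


def permanent_ryser(a):
    """Permanent via Ryser's inclusion-exclusion sum over deleted column sets."""
    n = len(a)
    total = 0
    for k in range(n):  # k = number of deleted columns
        t_k = 0
        # Choosing which n - k columns are kept is the same as choosing k to delete.
        for kept in combinations(range(n), n - k):
            # P(A_k): product over rows of the row sums restricted to kept columns.
            t_k += prod(sum(row[j] for j in kept) for row in a)
        total += (-1) ** k * t_k
    return total
```

Recomputing every row sum from scratch for each column subset is what produces the extra factor of n compared with the Gray-code variant discussed next.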
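The R-N-W algorithm was developed shortly after the Ryser algorithm [8]. Nijenhuis and Wilf used several techniques to improve the Ryser algorithm and reduced the complexity to $O(n2^{n-1})$. This algorithm can be described as
$$\mathrm{Per}(A) = (-1)^{n-1} \cdot 2 \sum_{S} (-1)^{|S|} \prod_{i=1}^{n} \lambda_i(S), \quad (4)$$
$$x_i = a_{i,n} - \frac{1}{2} \sum_{j=1}^{n} a_{i,j}, \quad (5)$$
where S runs over the subsets of {1, 2, …, n-1}. For each subset $S \subseteq \{1, 2, \ldots, n-1\}$, we have to calculate
$$\lambda_i(S) = x_i + \sum_{j \in S} a_{i,j} \quad (i = 1, \ldots, n). \quad (6)$$
Suppose that the current subset S differs from its predecessor S' by a single element j. Then
$$\lambda_i(S) = \lambda_i(S') \pm a_{i,j} \quad (i = 1, \ldots, n). \quad (7)$$
Thus, instead of requiring n(|S| + 1) operations to compute $\lambda_1, \ldots, \lambda_n$ by (6), we can obtain them in just n operations by (7). The key to moving from (6) to (7) is to enumerate the subsets in Gray code order, so that consecutive subsets differ by exactly one element and the update can be applied to the corresponding subsets.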
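The Gray-code update can be sketched in Python as follows. The sketch assumes the standard Nijenhuis-Wilf form, Per(A) = (-1)^(n-1) · 2 · Σ_S (-1)^|S| Π_i λ_i(S) with λ_i(∅) = a_{i,n} minus half the i-th row sum, and assumes integer entries so that exact fractions can be used. The name `permanent_nw` is hypothetical.

```python
from fractions import Fraction
from math import prod


def permanent_nw(a):
    """Permanent via the Nijenhuis-Wilf sum, visiting subsets in Gray code order."""
    n = len(a)
    # lam[i] holds lambda_i(S); for S = {} it is a[i][n-1] - (row sum) / 2.
    lam = [Fraction(2 * row[n - 1] - sum(row), 2) for row in a]
    total = prod(lam)            # contribution of the empty subset, sign +1
    in_s = [False] * (n - 1)     # membership of columns 0..n-2 in S
    sign = 1
    for step in range(1, 2 ** (n - 1)):
        # Reflected Gray code: step k flips the bit at the lowest set bit of k.
        j = (step & -step).bit_length() - 1
        in_s[j] = not in_s[j]
        delta = 1 if in_s[j] else -1
        for i in range(n):       # n operations per subset, as in update (7)
            lam[i] += delta * a[i][j]
        sign = -sign             # |S| changed by one, so (-1)^|S| flips
        total += sign * prod(lam)
    return (-1) ** (n - 1) * 2 * total
```

The result is returned as a `Fraction`; for integer matrices it has denominator 1 and compares equal to the corresponding integer.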
In addition, for the permanents of some special square matrices, 0-1 matrices for example, there are several fast computing methods [9][10].

Thought of the Algorithm
Store-zechin is an algorithm designed by us around an idea that seems to have been overlooked by pure mathematicians: computer memory and stored data can be reused repeatedly to speed up the computation of a permanent. The key idea of the Store-zechin algorithm is to calculate the permanent recursively and to replace any item that has already been calculated with its stored result. For example, for n = 4 the permanent is expanded recursively as in formula (8), where $A_{i;j}$ denotes the matrix obtained from A by removing the i-th row and the j-th column, so that $A_{3,4;1,2}$ is A with rows 3, 4 and columns 1, 2 removed. According to (8), we can find that Per($A_{3,4;1,2}$), Per($A_{3,4;1,3}$), Per($A_{3,4;1,4}$), Per($A_{3,4;2,3}$), Per($A_{3,4;2,4}$) and Per($A_{3,4;3,4}$) are repeated.
So the second calculation of each of these items is replaced by the result of the first.
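The repetition can be made explicit by writing the n = 4 expansion out. The following is a sketch under the assumption that the expansion always proceeds along the last remaining row, which is consistent with the notation $A_{3,4;j,k}$ but is not spelled out in the surviving text.

```latex
\begin{aligned}
\mathrm{Per}(A) &= \sum_{j=1}^{4} a_{4,j}\,\mathrm{Per}(A_{4;j}),\\
\mathrm{Per}(A_{4;1}) &= a_{3,2}\,\mathrm{Per}(A_{3,4;1,2}) + a_{3,3}\,\mathrm{Per}(A_{3,4;1,3}) + a_{3,4}\,\mathrm{Per}(A_{3,4;1,4}),\\
\mathrm{Per}(A_{4;2}) &= a_{3,1}\,\mathrm{Per}(A_{3,4;1,2}) + a_{3,3}\,\mathrm{Per}(A_{3,4;2,3}) + a_{3,4}\,\mathrm{Per}(A_{3,4;2,4}),\\
\mathrm{Per}(A_{4;3}) &= a_{3,1}\,\mathrm{Per}(A_{3,4;1,3}) + a_{3,2}\,\mathrm{Per}(A_{3,4;2,3}) + a_{3,4}\,\mathrm{Per}(A_{3,4;3,4}),\\
\mathrm{Per}(A_{4;4}) &= a_{3,1}\,\mathrm{Per}(A_{3,4;1,4}) + a_{3,2}\,\mathrm{Per}(A_{3,4;2,4}) + a_{3,3}\,\mathrm{Per}(A_{3,4;3,4}).
\end{aligned}
```

Each of the six 2 × 2 permanents appears exactly twice on the right-hand sides, so storing it after its first evaluation saves one full recomputation per item.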

Data Structure of the Algorithm
To store the results produced during the recursion, we build a global linked list. Before calculating each recursive item, we check whether the item has already been calculated. If yes, the stored result is returned. Otherwise, the permanent of the item is calculated and stored in the linked list.
We first create two structures, HeadNode and BodyNode. BodyNode contains three variables: Array, value and pbNext. Array is a one-dimensional integer array that stores the columns to be removed. value is an integer holding the permanent of the submatrix obtained after the columns and rows are removed. The number of removed columns, denoted m, can be read from Array, and the rows removed are always the last m rows of the original matrix, so only the removed columns need to be recorded. pbNext is a pointer to the next BodyNode. The structure of BodyNode is shown in Figure 1.
HeadNode also contains three variables: size, phNext and pbody. size is an integer giving the number of BodyNode nodes linked after this node. phNext is a pointer to the next HeadNode. pbody is a pointer to the BodyNode nodes linked to this HeadNode. The structure of HeadNode is shown in Figure 2.
The whole linked list is constructed from these two structures as shown in Figure 3. For convenience, we specify that only BodyNodes that remove one column are linked to the first HeadNode, only BodyNodes that remove two columns are linked to the second HeadNode, and so on.
Fig. 3. The structure of the linked list
In general, namely when A is a square matrix of order n, expanding along the last row gives the following formula:
$$\mathrm{Per}(A) = \sum_{j=1}^{n} a_{n,j}\,\mathrm{Per}(A_{n;j}). \quad (9)$$
The termination condition of the recursion is given by formula (10). Formulas (9) and (10), together with the rule that only sub-items that have not yet been calculated are computed, constitute the Store-zechin algorithm for calculating a permanent.

Description of the Algorithm
Based on the key idea and the data structure, we can describe the general Store-zechin algorithm in detail.
Calling statement: Store-zechin(pHead, A, n, del_index, exist_index, del_order);
pHead: the pointer to the linked list;
A: the matrix whose permanent is to be calculated;
n: the order of A;
del_index: the array of the columns that need to be removed;
exist_index: the array of the columns that still exist after the removal operation;
del_order: the number of columns that need to be removed.

Algorithm steps:
S1: Search the linked list pointed to by pHead for a BodyNode whose Array is the same as del_index.
S1.1: If it exists, return the value of the node.
S1.2: If it doesn't exist, go to S2.
($a_{i,j}$ is the number at the i-th row and j-th column in A).
Create a new BodyNode node, assigning del_index and sum to its Array and value respectively. Then link the BodyNode to the linked list.
S2.2: If n > 2, then let i ← 1, and go to S3.
S4: Let temp_exist_index ← exist_index, and delete the i-th number of temp_exist_index.
S8: If del_order ≠ 0, create a new BodyNode node, assigning del_index and sum to its Array and value respectively, then link it to the global linked list.
In fact, some global variables need to be initialized before the algorithm starts. The initialization steps are as follows.
S1: Create an empty linked list, and let pHead point to it.
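The overall idea of the steps above (look up del_index, otherwise expand recursively and store the result) can be sketched in Python as follows. This is not the authors' implementation: it assumes expansion along the last remaining row, replaces the linked list with a dictionary keyed by the sorted removed columns, and stops the recursion at 1 × 1 submatrices. The name `permanent_store` is hypothetical; `del_index` and `exist_index` follow the paper's parameter names.

```python
def permanent_store(a):
    """Permanent by memoized last-row expansion (a sketch of the Store-zechin idea)."""
    n = len(a)
    cache = {}  # sorted tuple of removed columns -> permanent of that submatrix

    def per(exist_index, del_index):
        k = len(exist_index)          # order of the submatrix; it uses rows 0..k-1
        if k == 1:
            return a[0][exist_index[0]]
        if del_index in cache:        # the item was calculated before: reuse it
            return cache[del_index]
        total = 0
        for pos, col in enumerate(exist_index):
            rest = exist_index[:pos] + exist_index[pos + 1:]
            removed = tuple(sorted(del_index + (col,)))
            # Coefficient from the last remaining row times the stored or
            # freshly computed sub-permanent.
            total += a[k - 1][col] * per(rest, removed)
        cache[del_index] = total
        return total

    return per(tuple(range(n)), ())
```

Because the removed rows are always the last ones, the set of removed columns alone identifies a submatrix, which is why it can serve as the cache key.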

Analysis of Time Complexity of the New Algorithm
Since the Store-zechin algorithm is recursive, the numbers of multiplication and addition operations of each sub-item can be derived from those of the lower-order sub-items.

Multiplication Operations
According to the derivation of the Store-zechin algorithm, the number of multiplication operations required by each sub-item satisfies the following condition.
Namely, when n = i, the first sub-item from right to left requires i - 1 multiplication operations (0 for i = 1, 2), and for j > 1 the j-th sub-item from right to left satisfies the following relationship: (the number of multiplication operations for the (j-1)-th sub-item from right to left when n = i) + (the number of multiplication operations for the (j-1)-th sub-item from right to left when n = i - 1) = (the number of multiplication operations for the j-th sub-item from right to left when n = i). The number of multiplications we need can therefore be derived from the sequence 0, 0, 2, 3, 4, 5, ..., n, arranged as a triangular array $A_1$: the sequence forms the rightmost column, and the counts of all sub-items are obtained by moving from the rightmost column to the left and following the rule $a_{i,j} = a_{i,j-1} + a_{i-1,j-1}$. Because the second term of the sequence 0, 0, 2, 3, 4, 5, ..., n is 0, which breaks the general pattern, it is more convenient to consider the sequence 0, 1, 2, 3, 4, 5, ..., n; following the same process as for $A_1$, we get $A_2$.
The i-th item of the n-th row of $A_2$ can be expressed as $2^{i-1}n-(2^{i-1}+(i-1)2^{i-2})$, and sumn($A_2$), the sum of the n-th row of $A_2$, can be expressed as
$$\mathrm{sumn}(A_2) = \sum_{i=1}^{n} \left(2^{i-1}n-(2^{i-1}+(i-1)2^{i-2})\right) = n2^{n-1}-n.$$
Now, according to relation (12), we can derive sumn($A_1$) for n ≥ 3 as
$$\mathrm{sumn}(A_1) = \mathrm{sumn}(A_2) - n = n2^{n-1}-2n. \quad (15)$$
Formula (15) represents the number of multiplication operations required inside the recursive items, but it is not yet the total for the Store-zechin algorithm. Looking back at formula (9), we can see that each recursion term is also multiplied by its preceding coefficient, and there are n such multiplications.
In summary, the number of multiplication operations needed to calculate the permanent of a square matrix of order n by Store-zechin is, in general,
$$n2^{n-1}-2n+n = (2^{n-1}-1)\,n.$$
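The triangular arrays used in this derivation can be regenerated from their rightmost columns. The sketch below applies the rule a_{i,j} = a_{i,j-1} + a_{i-1,j-1} and checks the row sums against the closed forms; the function name `triangle` is hypothetical, and each row is listed from the rightmost entry (j = 1) to the leftmost.

```python
def triangle(first_column):
    """Build the triangular array whose column j = 1 is `first_column`.

    Row i (1-based) has i entries, listed from j = 1 to j = i, and each
    further entry follows a[i][j] = a[i][j-1] + a[i-1][j-1].
    """
    rows = []
    for i, start in enumerate(first_column):
        row = [start]
        for j in range(1, i + 1):
            row.append(row[j - 1] + rows[i - 1][j - 1])
        rows.append(row)
    return rows


a1 = triangle([0, 0, 2, 3, 4, 5])   # multiplication counts per sub-item
a2 = triangle([0, 1, 2, 3, 4, 5])   # auxiliary array with the regular pattern

for n in range(3, 7):
    # Row sum of a1 plus the n top-level coefficient multiplications.
    print(n, sum(a1[n - 1]) + n, (2 ** (n - 1) - 1) * n)
```

For n = 4, for example, the row of `a1` reads 3, 5, 7, 9 from right to left: the first 3 × 3 sub-item to be evaluated costs 9 multiplications, and each later one costs less because more of its 2 × 2 sub-items are already stored.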

Addition Operations
Similar to the multiplication operations, the number of addition operations of each sub-item in the Store-zechin algorithm also satisfies a certain rule. The number of addition operations required for all sub-items can be listed from the sequence 0, 0, 1, 2, 3, 4, ..., n, and the process of getting $A_3$ is similar to that of getting $A_1$. The first term of the sequence 0, 0, 1, 2, 3, 4, ..., n does not satisfy the general pattern in n, which hinders the generalization of the derivation. So we consider the sequence -1, 0, 1, 2, 3, 4, ..., n, and after going through the same calculation we get $A_4$ (shown for n = 6, with columns j numbered from right to left):
$$A_4 = \begin{array}{c|rrrrrr}
i \backslash j & 6 & 5 & 4 & 3 & 2 & 1 \\ \hline
1 & & & & & & -1 \\
2 & & & & & -1 & 0 \\
3 & & & & 0 & 1 & 1 \\
4 & & & 4 & 4 & 3 & 2 \\
5 & & 16 & 12 & 8 & 5 & 3 \\
6 & 48 & 32 & 20 & 12 & 7 & 4
\end{array}$$
By comparing $A_3$ and $A_4$, we can conclude that the entry $a_{i,i}$ of $A_3$ is 1 larger than the entry $a_{i,i}$ of $A_4$ (i = 1, 2, ..., n), and the other values in the two matrices are equal. The sum of the n-th row of $A_3$, recorded as sumn($A_3$), can therefore be obtained by first calculating the sum of the n-th row of $A_4$, recorded as sumn($A_4$). They satisfy the relationship
$$\mathrm{sumn}(A_3) = \mathrm{sumn}(A_4) + 1.$$
The i-th item of the n-th row of $A_4$ can be expressed as $2^{i-1}n-(2^{i}+(i-1)2^{i-2})$, and sumn($A_4$) can be expressed as
$$\mathrm{sumn}(A_4) = \sum_{i=1}^{n} \left(2^{i-1}n-(2^{i}+(i-1)2^{i-2})\right). \quad (21)$$
However, formula (21) only accounts for the addition operations inside the sub-items. The total must also include the additions between the sub-items at the top layer of the recursion; see formula (9) for details. There are n sub-items, so n - 1 further addition operations are needed. Now we can deduce the number of addition operations needed to calculate the permanent of a square matrix by Store-zechin under general conditions:
$$\sum_{i=1}^{n} \left(2^{i-1}n-(2^{i}+(i-1)2^{i-2})\right) + 1 + (n-1). \quad (22)$$
After summing, formula (22) becomes
$$2^{n-1}(n-2)+1.$$
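Both operation counts can be cross-checked by simulating the recursion without touching any matrix entries. The sketch below assumes a memoized expansion along the last remaining row in which a sub-item with k remaining columns costs k multiplications and k - 1 additions the first time it is evaluated and nothing afterwards; `count_operations` is a hypothetical helper, not part of the paper.

```python
def count_operations(n):
    """Count (multiplications, additions) of a memoized last-row expansion."""
    seen = set()
    counts = {"mul": 0, "add": 0}

    def visit(cols):
        k = len(cols)
        if k == 1 or cols in seen:   # 1 x 1 items are free; stored items are reused
            return
        seen.add(cols)
        counts["mul"] += k           # one coefficient product per remaining column
        counts["add"] += k - 1       # summing the k products
        for col in cols:
            visit(cols - {col})

    visit(frozenset(range(n)))
    return counts["mul"], counts["add"]


for n in range(3, 11):
    mul, add = count_operations(n)
    print(n, mul, (2 ** (n - 1) - 1) * n, add, 2 ** (n - 1) * (n - 2) + 1)
```

Under these assumptions the simulated counts coincide with the closed forms $(2^{n-1}-1)n$ and $2^{n-1}(n-2)+1$, which follows from summing k and k - 1 over all column subsets of size k ≥ 2.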

Comparison of Complexities between New Algorithm and Existing Algorithms
As mentioned above, the current well-known algorithms for calculating the permanent are the Naive algorithm, the Ryser algorithm and the R-N-W algorithm. Here, the addition operations, the multiplication operations and the total bit operations (assuming the maximum integer allowed is $2^{64}$) are used as the criteria for comparing the Store-zechin algorithm with the above algorithms.
Firstly, we count the relevant data of each algorithm for n = 3, 4, ..., 10; the results are shown in Tables 1-3. From the comparison in Tables 1-3, we can see that, when n > 5, the addition operations, the multiplication operations and the total bit operations all satisfy Naive > Ryser > R-N-W > Store-zechin. Besides, the differences between them grow as n increases. This shows that the Store-zechin algorithm can complete the calculation of a permanent of order five or higher with fewer operations.
To support the above statement, the general expressions for the addition operations, multiplication operations, and total bit operations of the four algorithms are compared next. The results are shown in Table 4.
As can be seen from the comparison in the table, all three indicators show that the Naive algorithm has the largest expression, so its computational complexity is the highest, and the Ryser algorithm ranks second. Although the R-N-W algorithm has the same leading term as Store-zechin, its lower-order terms are larger, so Store-zechin has the lower computational complexity.
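The closed-form counts stated in the abstract can be evaluated directly for n = 3, ..., 10. The sketch below covers only the three algorithms for which this text gives explicit addition and multiplication formulas (Ryser, R-N-W and Store-zechin); the function name `operation_counts` is hypothetical.

```python
def operation_counts(n):
    """Return {algorithm: (additions, multiplications)} from the stated formulas."""
    p = 2 ** (n - 1)
    return {
        "Ryser": ((2 * p - n) * (n + 1) - 2, (2 * p - 1) * (n - 1)),
        "R-N-W": (p * (n + 1) + n * n - n - 1, p * n + n + 2),
        "Store-zechin": (p * (n - 2) + 1, (p - 1) * n),
    }


for n in range(3, 11):
    counts = operation_counts(n)
    # One line per order: n followed by (additions, multiplications) per algorithm.
    print(n, counts["Ryser"], counts["R-N-W"], counts["Store-zechin"])
```

The gap between R-N-W and Store-zechin in multiplications is the constant-per-order amount 2n + 2, while the gap in additions grows like $2^{n-1}+2^{n}$, in line with the differences quoted in the abstract.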

Conclusion
Although the Store-zechin algorithm has been neglected by mathematicians, it fully exploits the storage capabilities of the computer, and as the order of the matrix grows it calculates the permanent more efficiently. Through theoretical analysis, we also confirm that Store-zechin has lower computational complexity than the Naive algorithm and the Ryser algorithm. The R-N-W algorithm has the same leading term as Store-zechin, but its lower-order terms are larger. As the order of the matrix increases, the advantage of the Store-zechin algorithm in operation counts becomes more pronounced. Moreover, the Store-zechin algorithm is designed around the storage characteristics of computers, so it is well matched to them. Therefore, in some performance tests, the Store-zechin algorithm can more fully reflect some of the features of the device, and it has good application prospects.

Table 1: Comparison of the Addition Operations of Four Algorithms

Table 2: Comparison of the Multiplication Operations of Four Algorithms

Table 3: Comparison of the Total Bit Operations of Four Algorithms

Table 4: Comparison of Computational Complexity of Four Algorithms