Population Cross Learning Algorithm Combining Greedy Search for Community Detection

To improve the precision of modularity optimization and community detection, this paper presents a community detection algorithm for complex networks based on cross learning among the individuals of a population, combined with greedy search. Each individual's code represents a community partition. Individuals learn from one another by comparison, which spreads good genes and optimizes modularity quickly. In addition, to improve the algorithm further, a greedy search that maximizes the local modularity increment finds the best target communities for a set of randomly selected nodes to move into. The algorithm was tested on artificial networks and several typical real networks and compared with several typical algorithms. The results show that the algorithm converges quickly, achieves higher modularity values, and identifies community structures at a fine granularity.


Evaluation Function
Modularity [5,12] is a mathematical measure of the quality of a community partition of a complex network. It is expressed as follows:

$$Q = \frac{1}{2m} \sum_{i,j} \left( a_{ij} - \frac{k_i k_j}{2m} \right) \delta(C_i, C_j)$$

where $m$ is the total number of edges in the network, $i$ and $j$ are two nodes of the network, $a_{ij}$ is the element of the network adjacency matrix, $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, and $C_i$ and $C_j$ are the communities to which $i$ and $j$ belong. $\delta(C_i, C_j)$ equals 1 when $i$ and $j$ belong to the same community; otherwise it equals 0. Generally, the higher the modularity, the more accurate the community division.
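As a minimal sketch, the modularity described above can be computed community by community, using the equivalent form $Q = \sum_c [L_c/m - (d_c/2m)^2]$, where $L_c$ is the number of edges inside community $c$ and $d_c$ is the summed degree of its nodes. The function name, the edge-list input, and the dictionary encoding of the partition are assumptions made for illustration; the graph is assumed to be undirected, unweighted, and free of self-loops.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition (illustrative helper, not from the paper).

    edges     -- list of (u, v) pairs of an undirected simple graph
    community -- dict mapping each node to its community label
    """
    m = len(edges)                      # total number of edges
    degree = defaultdict(int)           # k_i for every node
    internal = defaultdict(int)         # L_c: edges with both ends in c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    total = defaultdict(int)            # d_c: summed degree of nodes in c
    for node, k in degree.items():
        total[community[node]] += k

    # Q = sum over communities of L_c / m - (d_c / 2m)^2
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, two_groups))
```

For this toy graph the two-triangle partition gives $Q = 5/14$, while placing all six nodes in a single community gives $Q = 0$, which illustrates why a higher value indicates a better division.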

Algorithm and Its
The above formula shows that the closer the modularity of individual $i$ is to that of individual $j$, the larger the probability threshold and the smaller the probability that $j$ learns from $i$.
Conversely, the greater the modularity difference between individuals $i$ and $j$, the smaller the threshold $p_c$ for $j$ and $i$ and the greater the probability that $j$ learns from $i$. $\mathbf{1}$ is the $n$-dimensional vector. The update of an individual is determined by the modularity difference and a random probability, which ensures the transmission of high-quality genes, avoids premature homogeneous convergence, and preserves diversity during optimization.

Local Greedy Search
The local search uses a greedy search operator: it calculates the modularity gain $\Delta Q$ obtained when a node $i$ leaves its current community and enters a neighbouring community, and selects the community with the largest $\Delta Q$ as the target community for the move.
There are two main steps: 1) node $i$ leaves its original community and becomes an independent node, which produces the increment $\Delta Q_1$, see formula (5); 2) the independent node $i$ moves into the new community, which produces the increment $\Delta Q_2$, see formula (6).
Let $C$ be the community that node $i$ is to join (or leave). $\Sigma_{in}$ is the sum of the weights of the connections inside $C$. $\Sigma_{tot}$ is the sum of the weights of the connections incident to the nodes in $C$, including connections inside the community and connections to the outside. $k_i$ is the degree of node $i$, and $k_{i,in}$ is the sum of the weights of the connections between node $i$ and the nodes in $C$. Each edge inside the community is counted twice in the internal weight.
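The two-step move can be sketched as follows. Formulas (5) and (6) are not reproduced here, so this sketch uses the simplified gains that follow directly from the definition of modularity for an unweighted graph: leaving community $C$ changes $Q$ by $-k_{i,in}/m + k_i(\Sigma_{tot} - k_i)/(2m^2)$, and joining community $D$ as an isolated node changes it by $k_{i,in}/m - k_i\Sigma_{tot}/(2m^2)$. In this form $\Sigma_{in}$ cancels out. The function names and the adjacency-dictionary input are assumptions made for illustration, and the code may differ in form from the paper's formulas.

```python
from collections import defaultdict


def community_degrees(adj, comm):
    """Sigma_tot for every community: summed degree of its member nodes."""
    tot = defaultdict(int)
    for node, nbrs in adj.items():
        tot[comm[node]] += len(nbrs)
    return tot


def links_to_communities(adj, comm, i):
    """k_{i,in} for every community that contains a neighbour of node i."""
    k_in = defaultdict(int)
    for nbr in adj[i]:
        k_in[comm[nbr]] += 1
    return k_in


def best_move(adj, comm, i):
    """Return (target_community, delta_q) for the greedy move of node i.

    If no neighbouring community yields a positive gain, the node stays
    where it is and the returned gain is 0.0.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # number of edges
    k_i = len(adj[i])
    tot = community_degrees(adj, comm)
    k_in = links_to_communities(adj, comm, i)
    home = comm[i]

    # Step 1: node i leaves its community and becomes an isolated node.
    dq_leave = -k_in[home] / m + k_i * (tot[home] - k_i) / (2 * m * m)

    best_c, best_dq = home, 0.0
    for c, k_ic in k_in.items():
        if c == home:
            continue
        # Step 2: the isolated node i joins neighbouring community c.
        dq_join = k_ic / m - k_i * tot[c] / (2 * m * m)
        if dq_leave + dq_join > best_dq:
            best_c, best_dq = c, dq_leave + dq_join
    return best_c, best_dq


# Two triangles joined by the edge (2, 3); node 2 starts in the wrong group.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
comm = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}
print(best_move(adj, comm, 2))
```

Only communities that contain a neighbour of the node are evaluated, which matches the restriction to neighbouring communities described above and keeps the cost of one move proportional to the node's degree.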

Algorithm Framework
Following the ideas above, the implementation framework of the algorithm is given as follows:
Community detection algorithm
7. Update the global maximum modularity value $Q_{max}$ of the current population, and replace $best$ with the corresponding individual solution.
8. Perform the local search algorithm LocalSearch($X$, $pa$):
1) According to the probability threshold $pa$, select the candidate individuals to which the local greedy search operator is applied. Decode each candidate individual $X_i$ to obtain the corresponding community division.
2) Calculate $\Delta Q$ for each dimension $x_{ij}$ ($1 \le j \le n$) of candidate individual $X_i$: when node $j$ leaves its community to join another neighbouring community, $\Delta Q$ can be calculated from formulas (5) and (6).
3) Find the community that gives the largest $\Delta Q$; node $j$ joins this community.
4) Obtain the new population code.
9. If the iteration count $t$ has not reached the maximum $T$, go to step 4; otherwise, go to step 10.
10. Exit the iteration, decode the optimal solution $best$, and obtain the community partition.

Experiment and analysis
The numerical tests were run on a computer with an Intel i5 processor, 4 GB of memory, and the Windows 7 operating system, and were programmed in MATLAB. In CLGSCDA, the population size is set to 30. The networks used in the experiments are the real networks [14] Karate, Dolphin, Hamster, PolBook, and Football, the Benchmark test network [15], and the synthetic networks Lfr500 with 500 nodes and Lfr1000 with 1000 nodes.

Artificial Benchmark Test Network
The benchmark test network proposed by Lancichinetti can be used to examine the ability of algorithms to identify communities. The network has 128 nodes and 1024 edges and is divided into four communities of 32 nodes each. The average node degree is 16.
The mixing parameter $\mu$ used to generate the network strongly affects the community structure. It controls how a node's degree is split between links inside and outside its community: each node has a fraction $1 - \mu$ of its links with nodes in the same community and a fraction $\mu$ with the rest of the network. Here $\mu$ ranges from 0.1 to 0.5, and the community structure changes from clear to vague. When $\mu < 0.5$, a node and its neighbours are more likely to belong to the same community than to different communities, and the community structure is easy to identify. When $\mu = 0.5$, on average half of the connections of each node point to nodes outside its community, and the community structure is rather vague.
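To make the role of this parameter concrete, the following sketch measures it on a given graph and partition as the average, over nodes, of the fraction of each node's links that leave its community. The function name and the adjacency-dictionary input are assumptions made for illustration; this is a measurement on an existing network, not the benchmark generator itself.

```python
def mixing_parameter(adj, comm):
    """Average fraction of each node's links that leave its community.

    adj  -- dict mapping each node to a list of its neighbours
    comm -- dict mapping each node to its community label
    Assumes at least one node has a neighbour.
    """
    fractions = []
    for node, nbrs in adj.items():
        if not nbrs:
            continue  # isolated nodes have no links to classify
        external = sum(1 for nbr in nbrs if comm[nbr] != comm[node])
        fractions.append(external / len(nbrs))
    return sum(fractions) / len(fractions)


# Two triangles joined by the edge (2, 3): only nodes 2 and 3 have an
# external link, and it is one of three links for each of them.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
comm = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(mixing_parameter(adj, comm))
```

In the benchmark described above, a node of degree 16 generated with a value of 0.5 would on average have 8 links inside its community and 8 outside, which is why the structure becomes hard to detect at that setting.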
This experiment uses normalized mutual information (NMI) [4] to evaluate the quality of community identification, see formula (8), where $A$ and $B$ are two community partitions and $C$ is the confusion matrix.
Table 2 shows that CLGSCDA fully identified the community structure of the nine benchmark networks with $\mu$ from 0.05 to 0.45. For the last network, with $\mu = 0.5$, the NMI reaches 0.6022, which is the best performance of all the algorithms. The best of the remaining algorithms is Walktrap, which also accurately identifies the nine networks with $\mu$ from 0.05 to 0.45 but is slightly worse than CLGSCDA on the last network. In addition, BGLL and Edge Betweenness performed well and correctly identified the eight networks with $\mu$ from 0.05 to 0.4.
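A minimal sketch of the NMI measure follows. Because formula (8) is not reproduced here, the code assumes the widely used confusion-matrix form, $NMI(A,B) = -2\sum_{ij} C_{ij}\log\frac{C_{ij}N}{C_{i\cdot}C_{\cdot j}} \Big/ \left(\sum_i C_{i\cdot}\log\frac{C_{i\cdot}}{N} + \sum_j C_{\cdot j}\log\frac{C_{\cdot j}}{N}\right)$, where $C_{ij}$ counts the nodes that are in community $i$ of $A$ and community $j$ of $B$, and $N$ is the number of nodes. The function name and the label-list input are assumptions made for illustration.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions.

    labels_a, labels_b -- community label of every node, in the same order
    """
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # confusion matrix entries C_ij
    row = Counter(labels_a)                   # row sums C_i.
    col = Counter(labels_b)                   # column sums C_.j

    numerator = -2.0 * sum(
        c * log(c * n / (row[a] * col[b])) for (a, b), c in joint.items()
    )
    denominator = sum(c * log(c / n) for c in row.values()) + sum(
        c * log(c / n) for c in col.values()
    )
    if denominator == 0:
        return 1.0  # both partitions put every node in a single community
    return numerator / denominator


# Same grouping under different label names, then two unrelated groupings.
print(nmi([0, 0, 1, 1], [5, 5, 3, 3]))
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

The value is 1 when the two partitions are identical up to relabelling and 0 when they are statistically independent, so it can be compared directly across the benchmark networks.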

Real Network
Table 3 shows that, compared with the other algorithms, CLGSCDA achieves superior modularity values on several classic real network datasets. For the Karate network, CLGSCDA obtained the optimal value of 0.4198; only the LGA algorithm also reached 0.4198. For the Dolphin network, CLGSCDA achieved the optimal value of 0.5282, which is better than all the other algorithms. For the PolBook network, CLGSCDA achieved 0.5272, the same as LGA and better than the other algorithms. For the Football network, the algorithm achieved 0.6046, the same as LGA and BGLL and better than the other algorithms. For the Hamster network, CLGSCDA obtained the optimal value of 0.5527, which is better than all the other algorithms.
As examples, the specific community partitions of Karate and Football obtained by CLGSCDA are illustrated as follows. The modularity of the real Karate community structure is 0.37. CLGSCDA obtained the optimal modularity value of 0.4198 and thus achieved the best result from the perspective of optimization. The real Karate structure consists of two communities, whereas this algorithm divides the network into four, which shows that the real community division deviates from the division obtained by mathematical optimization. From another perspective the result is reasonable: the four communities are subdivisions of the two real communities, as shown in Table 4.
There are 12 real communities in the Football network. Five nodes (36, 42, 80, 82, 90) that form one community represent relatively independent teams, which play more games with teams from other federations (communities) than among themselves. In this sense, the topology should be divided into 11 communities. During the running of the algorithm, the most complete community partition is the 11-cluster network, as shown in Figure 2.
Comparing the 11-cluster partition with the 12-cluster real partition, the nodes {80, 82, 110, 36, 42, 58, 59, 63, 97, 28, 90} are misclassified, and the error rate is 11/115 = 9.6%. Among them, {36, 42, 80, 82, 90} belong to a real community that is quite loose and is not recognized by the algorithm.
In general, CLGSCDA optimizes the modularity and detects the community structure of the network at a finer granularity.

Conclusions
Community mining is a hot topic in the field of complex networks. This paper proposed a community detection algorithm based on population cross learning combined with local greedy search. In the implementation, inferior population members absorb the genes of better members with a certain probability, which depends on the modularity difference and a random value. At the same time, to improve the quality of the population, a local greedy search optimization is performed in each iteration. The algorithm has few parameters and requires neither prior knowledge nor a special initialization method. Probability-based individuals cross learning algorithms and local