A Robust Controller Deployment Algorithm under Multiple Constraints for SDON

The deployment of controllers for SDON (Software Defined Optical Networks) is an NP-hard multi-objective optimization problem, and it is difficult to find an optimal solution that satisfies all objectives. Therefore, to improve the performance of the control plane, we construct a model based on three constraints: survivability, delay, and control redundancy. The model has three advantages. First, survivability is improved by not sharing risk between the main control channel and the protection control channel, which also improves the utilization of network resources. Second, the accuracy of the delay calculation is improved by using a delay optimization model, and a threshold is set to constrain the delay. Finally, a greedy strategy is used to achieve full network coverage of the control range and to improve the convergence speed. Simulation results show that the algorithm improves the fault recovery probability by an average of 15% compared with routing computed by the KSP algorithm. In addition, the proposed algorithm achieves the same survivability as the C-MPC algorithm while using 11.1% fewer controllers on average.


Introduction
Since SDN technology adopts centralized control, the control plane plays a crucial role for the whole network, and local link damage or the failure of a single control node may lead to network-wide collapse. The deployment of the control plane therefore needs to be studied in order to improve its survivability. Existing work has studied controller deployment under delay constraints and under survivability constraints, respectively.
Research on delay. The literature [1] first posed the two basic problems of controller deployment, namely how many controllers are needed and where to deploy them. It took the average delay and the maximum delay as test indices, calculated the deployment scheme based on Dijkstra's algorithm and a greedy algorithm, and concluded that one controller can meet the demand of the network. However, [1] considered only the transmission delay, so the conclusion is not convincing. Lin yuan Yao et al. proposed a delay optimization model [2] that studies the average and maximum delay of the network control channels, combines them with an analysis of the network response feedback delay, and uses Dijkstra's algorithm and a greedy algorithm to calculate the optimal deployment scheme. The model, however, is only applicable to static networks and cannot meet user-specified requirements. For this reason, Zeng Shuai et al. proposed a controller deployment strategy under specified delay constraints [3], which satisfies the survivability requirement as far as possible under the maximum control delay tolerated by the user; the strategy adds controllers as the user's delay requirement tightens.
Research on survivability. In 2016, Yu Xiong et al. proposed a controller deployment scheme based on minimum point coverage [4]. It uses the principle of minimum point coverage in graph theory to ensure that each device in the transmission plane is adjacent to two controllers, uses a minimal dominating set to select local controllers among the selected controllers, and then selects a global controller at the center of the local controllers. This greatly improves network survivability and avoids conflicts among multiple controllers, but it deploys too many controllers and introduces too much redundancy in the control plane to meet user customization requirements. To meet user requirements, the literature [5] designed a deployment scheme with specified survivability constraints, defined the 100-km failure probability, and used it to calculate the average failure probability and analyze survivability. In real networks, however, the 100-km failure probability cannot simply be set to a fixed value. For this reason, Lourenco R B R et al. proposed a disaster-model-based method that calculates the failure probability with a linear programming model [6] and derives the controller deployment scheme from the failure probability of the actual network. The standard [7] also defines the concept of SRLG to improve survivability.
The controller deployment scheme we propose is based on depth-first search (DFS) and a greedy algorithm, and it can meet the control plane performance requirements of most networks in terms of delay and survivability. At the same time, the control redundancy problem is considered: the number of controllers is reduced as far as possible while the survivability and delay requirements are still met.

Design thinking and deployment architecture
A. Design thinking
We propose a controller deployment scheme based on depth-first search (DFS) and a greedy algorithm to improve control plane performance. The algorithm helps meet the control plane performance requirements, such as delay and survivability, of most networks. It has the following advantages.
1) The algorithm analyzes the composition of the delay in more detail, taking into account both the link delay and the forwarding delay. The link delay is the delay of information on the transmission link, and the forwarding delay is the time between entering and leaving an optical switching device, as shown in Fig. 1 and Fig. 2. Most other algorithms ignore the forwarding delay and simply treat the link delay as the network delay. In practice, control information is often carried over the traditional DCN channel, and its forwarding delay is measurable. In our experimental network, the measured optical fiber transmission delay is about 5 μs/km, and the forwarding delay of control information through a Huawei OSN8800 device is about 20 μs, which is equivalent to about 4 km of fiber per device. The forwarding delay therefore cannot be ignored.
2) The algorithm draws on the concept of the shared risk link group (SRLG) and requires that the control channel and the protection channel do not share risk, that is, the main control channel and the protection control channel are routed separately. This avoids both channels being affected by the same fault and thus ensures higher survivability and network resource utilization. Risk sharing and risk non-sharing are shown in Fig. 3 and Fig. 4.
3) The algorithm selects a minimum set of first-level controller nodes such that the control plane still meets the delay and survivability requirements, which reduces cost and control redundancy and improves controller efficiency.
Therefore, the strategy of the algorithm is to meet the survivability, delay, and control redundancy requirements of the control plane by ensuring that the control channel and the protection channel do not share risk and by minimizing the number of control nodes within the delay range. Parameter settings are explained in Table 1.
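The delay composition described in advantage 1) can be illustrated with a short calculation. The sketch below is only an illustration, not the authors' delay model: the function name `path_delay_us` is hypothetical, the two constants are the measured values quoted in the text, and it assumes that only the intermediate devices on a path contribute a forwarding delay.

```python
# Measured values quoted in the text.
FIBER_DELAY_US_PER_KM = 5.0   # optical fiber transmission delay
FORWARDING_DELAY_US = 20.0    # forwarding delay per traversed device


def path_delay_us(link_lengths_km, forwarding_hops=None):
    """Return link delay plus forwarding delay of one path, in microseconds.

    link_lengths_km: lengths of the consecutive links on the path.
    forwarding_hops: number of devices that forward the control message.
    Assumption (not from the paper): if omitted, only the intermediate
    devices forward, i.e. one fewer than the number of links.
    """
    if forwarding_hops is None:
        forwarding_hops = max(len(link_lengths_km) - 1, 0)
    link_delay = FIBER_DELAY_US_PER_KM * sum(link_lengths_km)
    forwarding_delay = FORWARDING_DELAY_US * forwarding_hops
    return link_delay + forwarding_delay


# Three 1 km links: 15 us of link delay, 40 us of forwarding delay.
short_path = path_delay_us([1, 1, 1])
```

On short metro links such as this one, the forwarding term dominates the total, which is the situation in which ignoring it distorts the delay estimate most.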

B. Hierarchical deployment architecture
For larger networks, a reasonably deployed single controller may be able to meet a relaxed network delay tolerance and the SRLG (Shared Risk Link Group) requirements, but controller clustering techniques are needed to scale the single controller to the computational load. Moreover, because the deployment relies on a single node, damage to that node may interrupt the whole network. In addition, a single controller node can hardly meet a low network delay tolerance. A hierarchical controller deployment architecture is therefore still an efficient and secure deployment strategy. The hierarchical controller architecture adopted in this model is deployed in the following steps.
Step 1. Deployment of C1 controllers
First, all optical switching nodes in the network must be covered using a minimum number of single-domain controllers C1. Each single-domain controller determines a partition, and the optical switching nodes in the partition are governed by that controller. The optical switching node at which a controller is placed is controlled by the local controller. Between each controller and every optical switching node in its control range, a control channel Pathcont and a protection channel Pathprot are established that share no risk at all. C1 deployment is shown in Fig. 5.

Figure 5. C1 deployment
Step 2. Deployment of C2 controllers
The C2 controller is a coordinating master controller through which information about the whole network can be obtained, enabling global cross-domain resource scheduling and deployment. To achieve better survivability, the C2 controller is deployed in 1+1 remote cold backup mode, in which the database is backed up offline and the standby controller is not running. This hierarchical controller system improves control efficiency and security and avoids problems such as the overload and low survivability of a single controller. C2 deployment is shown in Fig. 6.

C1 node selection
The following are the selection steps.
Step 1. The depth-first search (DFS) algorithm is used to find all paths from every network node to every other node and to record the length and delay of each path. The DFS algorithm is similar to tree traversal and can be implemented in the following steps.
1) Select the initial node i and mark it as selected.
2) Search for the first neighboring node n of node i.
3) If n exists, continue with step 4; if not, return to step 1 and start again from the next node after i.
4) If n has not been selected, recursively perform a depth-first traversal on n (that is, treat n as a new i and repeat the steps above).
5) Search for the next neighboring node n of node i and go to step 3.
After the paths from every node to all other nodes have been found, record the number of paths between every pair of nodes in the PathNum matrix.
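The path search of Step 1 can be sketched as a recursive DFS that enumerates every loop-free path between two nodes. This is a minimal illustration under stated assumptions rather than the authors' implementation: the names `all_simple_paths` and `count_paths` are hypothetical, the topology is an adjacency dictionary, and a node is unmarked when the search backtracks so that every alternative path is found.

```python
def all_simple_paths(adj, src, dst):
    """Enumerate all loop-free paths from src to dst by depth-first search.

    adj maps each node to a list of its neighboring nodes.
    """
    paths = []

    def dfs(node, visited, path):
        if node == dst:
            paths.append(list(path))  # store a copy of the finished path
            return
        for nxt in adj[node]:
            if nxt not in visited:    # skip nodes already on this path
                visited.add(nxt)
                path.append(nxt)
                dfs(nxt, visited, path)
                path.pop()            # backtrack so other branches can reuse nxt
                visited.remove(nxt)

    dfs(src, {src}, [src])
    return paths


def count_paths(adj):
    """Build a PathNum-style table: number of paths for each ordered node pair."""
    return {(s, t): len(all_simple_paths(adj, s, t))
            for s in adj for t in adj if s != t}


# Hypothetical four-node ring A-B-C-D-A used only as an example topology.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
path_num = count_paths(ring)
```

Exhaustive enumeration grows quickly with network size, so in practice the delay threshold of Step 2 could also be used to prune the search early; that pruning is not shown here.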
Step 2. This step filters the paths found in Step 1.
1) Check the delay reachability of all paths between every pair of nodes. Keep the paths that meet the given delay threshold, discard the unreachable paths, and update the PathNum matrix immediately.
2) From the delay-reachable paths between each pair of nodes, select the channels as follows.
First, determine whether the number of paths is greater than 1. If it equals 1, the path is used directly as the control channel.
Second, if the number of paths is greater than 1, determine whether there are two completely disjoint paths and how many such combinations exist. If there are none, select the path with the smallest delay as the control channel.
Third, if disjoint combinations exist, determine whether there is exactly one. If so, the path with the smaller delay in the combination is the control channel, and the path with the larger delay is the protection channel. If there is more than one combination, compare the average delays of the combinations, take the combination with the smallest average delay, and assign the control channel and protection channel by the same rule.
Fourth, examine all nodes in the network. If node i has a control channel and a protection channel to some other node, node i is a candidate C1 node and is put into the C1 set.
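The selection rules of Step 2 for a single node pair can be sketched as follows. This is an illustration under assumptions, not the authors' code: the name `choose_channels` is hypothetical, each candidate is a `(path, delay)` tuple, and "completely disjoint" is interpreted here as sharing no link (a stricter SRLG-based test would replace `links_of`).

```python
from itertools import combinations


def links_of(path):
    """Return the undirected links of a path given as a list of nodes."""
    return {frozenset(edge) for edge in zip(path, path[1:])}


def choose_channels(paths_with_delay, max_delay):
    """Pick (control, protection) paths for one node pair.

    paths_with_delay: list of (path, delay) tuples found by the path search.
    max_delay: delay threshold; slower paths are discarded.
    Returns (None, None) if nothing is reachable, and (path, None) if no
    protection channel can be found.
    """
    reachable = [(p, d) for p, d in paths_with_delay if d <= max_delay]
    if not reachable:
        return None, None
    if len(reachable) == 1:
        return reachable[0][0], None
    # Assumption: two paths are disjoint when they share no link.
    pairs = [(a, b) for a, b in combinations(reachable, 2)
             if not (links_of(a[0]) & links_of(b[0]))]
    if not pairs:
        # No disjoint pair: use the fastest path as the control channel only.
        return min(reachable, key=lambda pd: pd[1])[0], None
    # Smallest average delay is equivalent to smallest delay sum for a pair.
    a, b = min(pairs, key=lambda pair: pair[0][1] + pair[1][1])
    control, protection = sorted((a, b), key=lambda pd: pd[1])
    return control[0], protection[0]


candidates = [(["A", "B", "C"], 120.0),
              (["A", "D", "C"], 150.0),
              (["A", "B", "D", "C"], 400.0)]
control, protection = choose_channels(candidates, max_delay=200.0)
```

With the 200 μs threshold the third path is discarded, the remaining two paths share no link, and the faster one becomes the control channel.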
Step 3. The greedy algorithm is used to find the smallest number of nodes whose control ranges together cover all network nodes, and the number of nodes is recorded. The greedy algorithm, shown in Fig. 7, is a simple and convenient solution strategy that makes the locally optimal choice at each step, although the result is not necessarily globally optimal. Applied to point coverage, it repeatedly selects the node that covers the largest part of the remaining target range and combines the selected nodes into the set of target nodes.

Figure 7. Greedy algorithm for network coverage
The algorithm solves the coverage problem in the following steps.
1) Determine the target coverage range, which is the node set N of the network topology. Then traverse every candidate C1 node and intersect N with the set of nodes that the candidate can control, in order to find the controller C1 that covers the largest region. Save this C1 node.
2) Delete the covered region from the set to obtain the new uncovered set, and then traverse all candidate C1 nodes again to find the controller C1 that covers the largest part of it.
3) Repeat the previous steps until all network switching nodes are covered. The saved C1 nodes form the final set of C1 nodes, and the number of nodes in this set is recorded as PathNum.
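The three coverage steps above correspond to the classic greedy set-cover heuristic, which can be sketched as follows. The function name `greedy_cover` and the example data are hypothetical, and the control range of each candidate is assumed to have been computed beforehand from the delay-reachable, risk-disjoint channels.

```python
def greedy_cover(all_nodes, coverage):
    """Greedily choose candidate C1 nodes until every node is covered.

    all_nodes: iterable of all switching nodes (the target range N).
    coverage: dict mapping each candidate C1 node to the set of nodes
              it can control.
    Returns the chosen candidates in the order they were selected.
    """
    uncovered = set(all_nodes)
    chosen = []
    while uncovered:
        # Candidate whose control range intersects the uncovered set most.
        best = max(coverage, key=lambda c: len(coverage[c] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError("remaining nodes cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gained   # delete the newly covered region
    return chosen


# Hypothetical control ranges for four candidate C1 nodes.
ranges = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1, 6}}
c1_set = greedy_cover(range(1, 7), ranges)   # ["A", "C"]
```

Each round makes the locally best choice, so the result is not guaranteed to be the global minimum, which matches the caveat stated for the greedy strategy.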

C2 node selection
The C2 controller is the upper-layer controller that coordinates the C1 controllers, so its deployment location is selected after the C1 nodes have been chosen. To improve the timeliness of C2 control over C1, the node in the whole network with the shortest delay to all C1 nodes is selected as the C2 deployment node according to the following formula. At the same time, the secondary controller C2 should be deployed at a node where no primary controller C1 is deployed, to avoid risk sharing. In addition, to satisfy the 1+1 cold backup strategy, a second location should be selected to deploy an additional C2 controller for backup use. Each C1 controller then selects its assigned C2 controller according to the control delay and uses the other C2 controller as a backup.
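The C2 placement rule described above can be sketched in a few lines. The sketch rests on assumptions that are not confirmed by the text: "shortest delay to all C1 nodes" is interpreted as the smallest sum of delays, the backup location is taken as the second-best candidate, and the names `select_c2` and `assign_c2` are hypothetical.

```python
def select_c2(delay, c1_nodes, all_nodes):
    """Choose a primary and a backup C2 location.

    delay[n][c]: control delay between node n and C1 node c.
    Nodes hosting a C1 controller are excluded to avoid risk sharing.
    Assumption: candidates are ranked by their total delay to all C1 nodes.
    """
    candidates = [n for n in all_nodes if n not in c1_nodes]
    if len(candidates) < 2:
        raise ValueError("need at least two non-C1 nodes for 1+1 backup")
    ranked = sorted(candidates,
                    key=lambda n: sum(delay[n][c] for c in c1_nodes))
    return ranked[0], ranked[1]


def assign_c2(delay, c1_nodes, c2_pair):
    """For each C1 node, return (assigned C2, backup C2) ordered by delay."""
    return {c: tuple(sorted(c2_pair, key=lambda n: delay[n][c]))
            for c in c1_nodes}


# Hypothetical delays (in microseconds) from non-C1 nodes to C1 nodes A and E.
delay = {"B": {"A": 10, "E": 40},
         "C": {"A": 20, "E": 20},
         "D": {"A": 45, "E": 10}}
c2_pair = select_c2(delay, ["A", "E"], ["A", "B", "C", "D", "E"])
assignment = assign_c2(delay, ["A", "E"], c2_pair)
```

If the worst-case delay matters more than the total, the `sum` in the ranking key could be replaced by `max` without changing the rest of the sketch.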

System Performance Testing
To verify the control plane performance, the survivability of the control plane is tested using the failure occurrence rate and the failure recovery rate of the control plane under the 100-km failure probability. The failure probability and failure recovery probability are defined according to the following equations, where Pathcont,C1 is the control channel path length to all C1, Pathprot,C1 denotes the protection channel path length, copath is the part where the control route and the protection route overlap, K denotes the path sharing degree, P1 is the average failure incidence, and P2 is the average failure recovery rate. P1 and P2 are calculated and plotted as shown in Fig. 8. The plots show that the channel risk non-sharing model has a slightly higher failure rate than the channel risk sharing model but achieves a considerably higher failure recovery rate in exchange, which indicates the high survivability of this model. In addition, the algorithm in this paper uses 11.1% fewer controllers on average while achieving the same survivability as the C-MPC algorithm.

Conclusions
The SDON controller deployment model proposed in this paper uses network algorithms such as DFS and a greedy strategy, draws on the concept of SRLG, and optimizes the delay model. It meets the requirements of user-specified delay, high control plane survivability, and reduced control redundancy, with some improvement in overall effectiveness compared with SVVR, C-MPC, and delay-constrained deployment methods. It can provide better guidance for network planning and optimization.

Table 1. List of symbols