Cloud edge cooperative attack recognition based on CNN

In recent years, cloud-edge collaboration has been proposed as a way to divide work between the edge and the cloud, and it has produced applications in many fields. The convolutional neural network (CNN) is an active research topic and performs very well in image and speech processing. Combining the two, this paper proposes a CNN-based cloud-edge collaborative attack recognition method. The attack recognition model is trained with a CNN on an open-source data set, and cloud-edge collaboration is realized by transmitting data and models between the edge and the cloud. The method realizes security collaboration between the cloud and the edge and reduces the resource pressure on the cloud. At the same time, cooperation among multiple edge nodes strengthens the cloud's ability to identify attacks and improves security coordination between the cloud and the edge.

IOP Publishing doi: 10.1088/1742-6596/1693/1/012143
Reference [2] proposes another multi-level hybrid attack type recognition model, which uses a support vector machine and an extreme learning machine to improve the efficiency of detecting known and unknown attacks. K-means is used to construct the training data set and improve the performance of the classifier. Reference [3] combines a modified density peak clustering algorithm with a deep belief network for intrusion detection. The improved density peak fuzzy clustering algorithm reduces the size of the training set, and the deep belief network classifies the data, which addresses the problem of sample imbalance. Reference [4] proposes a network-based intrusion detection model built on a hidden Markov model that resists adversarial countermeasures. That work defines the concept of pattern entropy and uses dynamic window and threshold techniques to improve the adaptive, anti-adversarial and online learning capabilities of the system.
In this paper, a cloud-edge collaborative attack recognition method based on deep learning is proposed. Attacks are identified by a model trained on open-source attack data sets. Training is distributed across the edge computing nodes, each of which has some capacity for training and for collecting information. Finally, the data set of an edge computing node with high recognition accuracy is transmitted to the cloud to update the cloud's training set.

Cloud edge collaboration model
Cloud-edge collaboration refers to data interaction and collaborative work between the edge and the cloud center. Edge computing is introduced between the cloud and the terminal to help deliver smoother, lower-latency services from the cloud to the end terminal. Current cloud-edge architectures are therefore generally three-tier: cloud, edge, and end. The specific model is shown in Figure 1. The cloud tier is provided by cloud servers, which manage all the edge computing nodes. It is the data center and service center of the whole model and provides technical service support for the edge computing nodes. The edge computing nodes are deployed on servers near the user terminals; they provide data support for users, respond faster, and improve the user experience. Their hardware is weaker than that of the cloud and the data they can store are limited, so they must periodically transmit data to the cloud and tidy up their local data. The end tier is the user side, that is, the terminals that users operate.

Convolutional neural network and attack recognition
The purpose of attack identification is to identify the types of various malicious attacks. It provides information support for the subsequent defense and is an effective strategy for protecting computer networks and systems. Attack identification is a form of intrusion detection. Using machine learning, this paper extracts features from the network traffic data in a computer network and classifies them in order to judge whether attack behavior is present, which is an active defense approach. It is proactive, real-time and dynamic, and can make up for the deficiencies of static defense tools. The common network intrusion detection and identification model is shown in Figure 2. By monitoring [...]
The convolutional neural network has become one of the most active topics in the field. Its excellent performance in image and speech processing has drawn many researchers to it, and in recent years some researchers have used deep learning for attack type recognition and detection tasks. A CNN generally consists of five parts: input layer, convolution layer, activation function layer, pooling layer and fully connected layer. The input layer mainly preprocesses the original input data. The convolution layer uses filters, each defined by a set of parameters, to extract features of the input data at different levels. The output of the convolution layer is mapped by the activation function layer. The pooling layer compresses the parameters of the network and reduces overfitting. The neurons in the final fully connected layer are connected to all the neurons in the previous layer and compute the probability that the input belongs to each category.

Model design
This model is a cloud-edge collaborative attack recognition model. Attack recognition training is carried out at each edge computing node, and newly learned data are added to the current training set. The overall framework of the attack recognition model is shown in Figure 3. It is divided into five modules: data acquisition module, data preprocessing module, feature selection module, CNN attack recognition module, and data transmission and update module. The data acquisition module collects malicious attack data from the clients. After cleaning, these data are added to the data set, which gives the model its self-learning function. The data preprocessing module digitizes, standardizes and normalizes the data to make later training easier. The feature selection module selects the feature data in order to improve the later training accuracy. The CNN attack recognition module first converts the input vector data into image form, then trains on the training data set, and finally performs detection on the test data set. The data transmission and update module works as follows: the initial data set of every edge computing node is the same, and each node later updates its own data set by collecting malicious data from the clients it manages, which achieves the self-learning function. At fixed intervals, the edge computing node with a high recognition rate transmits its data set to the cloud. The cloud updates its data set and retrains, and the better model is distributed to every edge computing node.
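The step of converting an input vector into image form can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the vector is arranged, so the function name `vector_to_image`, the zero padding and the row-major square layout are all hypothetical choices.

```python
import math


def vector_to_image(vector, pad_value=0.0):
    """Arrange a 1-D feature vector as a square 2-D matrix (row-major).

    Assumption: the vector is padded with pad_value up to the next
    perfect square; the paper does not specify the layout it uses.
    """
    n = len(vector)
    if n == 0:
        return []
    # Smallest side length whose square is at least n.
    side = math.isqrt(n - 1) + 1
    padded = list(vector) + [pad_value] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]
```

With a 25-dimensional input this yields a 5 x 5 matrix with no padding, which a two-dimensional convolution layer can consume directly.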

Feature module design
CNN performs well in image processing and has also had some success in language processing. It is suitable for training on multidimensional data and on data with strong local correlation. The dimension of the input vector is an important factor that directly affects the number of intermediate parameters, so it is very important to select the best data features. Purity is usually used to select features, and information gain and information gain rate are the usual measures of purity. Because the number of values taken by different features in the data set selected in this paper varies widely, the information gain rate is used for feature selection. The information gain rate is defined as $\mathrm{Gain\_ratio}(D, a) = IG(D, a) / IV(a)$, where $IG(D, a)$ is the information gain, $D$ is the sample set, and $IV(a)$ is the intrinsic value of attribute $a$. The more distinct values attribute $a$ can take, the larger $IV(a)$ tends to be.
This model selects features according to the information gain rate of the different feature data.
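Feature selection by information gain rate can be sketched as below. This is a minimal sketch, not the authors' code: the function names are hypothetical, entropy is measured in bits, and every attribute is assumed to be discrete (continuous attributes would first need to be binned).

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain rate IG(D, a) / IV(a) of one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # IG(D, a): entropy of D minus the weighted entropy of each subset D_v.
    info_gain = entropy(labels) - sum(
        len(group) / total * entropy(group) for group in groups.values())
    # IV(a): entropy of the attribute's own value distribution.
    intrinsic_value = entropy(values)
    # A constant attribute has IV(a) = 0 and carries no information.
    return info_gain / intrinsic_value if intrinsic_value > 0 else 0.0


def select_features(columns, labels, k):
    """Return the names of the k attributes with the highest gain rate."""
    ranked = sorted(columns,
                    key=lambda name: gain_ratio(columns[name], labels),
                    reverse=True)
    return ranked[:k]
```

A constant attribute scores 0 in this sketch, so it falls to the bottom of the ranking.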

3.3 Design of attack recognition based on CNN
(1) Classification output process. In essence, classification output uses a fully connected neural network to learn from and classify the one-dimensional feature vector produced by the feature extraction layers: $c^{(l+1)} = W^{(l+1)} a^{(l)} + b^{(l+1)}$ and $a^{(l+1)} = f(c^{(l+1)})$, where $l = 1, 2, 3, \ldots, L$; $L$ is the total number of layers of the output network; $a^{(l+1)}$ is the output of layer $l+1$; $c^{(l+1)}$ is the weighted input sum vector of layer $l+1$; $f(x)$ is the excitation function; $W^{(l+1)}$ is the weight of layer $l+1$; and $b^{(l+1)}$ is the bias of layer $l+1$.
Parameter updating in a neural network is essentially error back propagation. Let the error function be $J(W, b)$. In the fully connected network, the parameters are updated in vector form as $W^{(l)} = W^{(l)} - \alpha \, \partial J(W, b) / \partial W^{(l)}$ and $b^{(l)} = b^{(l)} - \alpha \, \partial J(W, b) / \partial b^{(l)}$, where $\alpha$ is the learning rate and the partial derivatives are obtained by the chain rule.
In the chain-rule expressions, $\delta(\cdot)$ is called the energy function. According to equations (5)-(9), all parameters of the output layer can be updated after $l$ iterations.
(2) Feature extraction process. In the forward pass of feature extraction, the input data are convolved with different convolution kernels to obtain the feature matrices. In each convolution layer, every input feature map is convolved with several different kernels. In the expression for this process, $W^{(i)}$ denotes the i-th convolution layer, and the k-th input feature map corresponds to the j- [...]
The sampling layer performs down-sampling based on local correlation, which reduces the amount of data while retaining useful information. The pooling layer generally follows the convolution layer. Let the output of the i-th sampling layer be $X^{(i+1)}$ and its input be $X^{(i)}$, which is the output of the i-th convolution layer. The sampling function subsampling(x) usually takes the maximum or the average value of the window area. In the sampling process, the multiplicative bias is 1 and the additive offset is 0. After several convolution and sampling operations, a one-dimensional feature vector is finally generated.
In the back propagation stage, the neuron parameters are updated from the derivatives of the error function. Compared with the classification output process, the difference is that the sampling layer reduces the matrix dimension. Therefore, $\delta^{(i+1)}$ must be up-sampled to the matrix dimension of the convolution layer, and the up-sampling function up() is introduced.
In the resulting expressions, $\circ$ denotes element-wise multiplication. The parameter update formula is then obtained in the same way as in the classification output process.
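The three matrix operations named in this section (convolution with a kernel, sub-sampling, and the up-sampling function up()) can be illustrated in plain Python. This is a sketch under assumptions rather than the paper's implementation: the function names other than `up` are hypothetical, windows are non-overlapping, no padding or stride options are modelled, and `up` is written for average pooling, where each error value is shared equally by its window.

```python
def convolve_valid(matrix, kernel):
    """Valid 2-D cross-correlation, the operation CNNs call convolution."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [[sum(matrix[r + i][c + j] * kernel[i][j]
                 for i in range(k_rows) for j in range(k_cols))
             for c in range(len(matrix[0]) - k_cols + 1)]
            for r in range(len(matrix) - k_rows + 1)]


def subsample(matrix, size=2, mode="max"):
    """Down-sample with non-overlapping size x size windows (max or mean)."""
    out = []
    for r in range(0, len(matrix) - size + 1, size):
        row = []
        for c in range(0, len(matrix[0]) - size + 1, size):
            window = [matrix[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max"
                       else sum(window) / len(window))
        out.append(row)
    return out


def up(delta, size=2):
    """Up-sample an error matrix back to the pre-pooling dimension.

    Assumes average pooling: each value is spread evenly over its window.
    """
    scale = 1.0 / (size * size)
    out = []
    for row in delta:
        expanded = [value * scale for value in row for _ in range(size)]
        out.extend(list(expanded) for _ in range(size))
    return out
```

For max pooling, up() would instead route each error value only to the position that held the window maximum; that variant is omitted here for brevity.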

Cloud edge data transmission design
Data transmission and update work as follows. The initial data set of every edge computing node is the same. Each node later updates its own data set by collecting malicious data from the clients it manages, which achieves the self-learning function. At fixed intervals, the edge computing node with a high recognition rate transmits its data set to the cloud. After the cloud updates its data set and retrains, the better model is distributed to every edge computing node.
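One round of this exchange might look like the sketch below. Everything in it is an assumption made for illustration: the class names `EdgeNode` and `Cloud`, the `train` callback standing in for CNN training, and the rule of picking the single node with the highest recognition rate. The paper does not specify the selection rule or how the new model is compared with the old one, so that comparison is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class EdgeNode:
    name: str
    dataset: List[Any]        # cleaned records collected from managed clients
    recognition_rate: float   # measured on the node's local test set
    model: Any = None


@dataclass
class Cloud:
    dataset: List[Any] = field(default_factory=list)
    model: Any = None


def collaboration_round(cloud: Cloud, edges: List[EdgeNode],
                        train: Callable[[List[Any]], Any]) -> EdgeNode:
    """Run one fixed-period update: upload, retrain, distribute."""
    # Assumption: "high recognition rate" is read as the best node this round.
    best = max(edges, key=lambda edge: edge.recognition_rate)
    # Merge the uploaded data set into the cloud's, skipping duplicates.
    for record in best.dataset:
        if record not in cloud.dataset:
            cloud.dataset.append(record)
    # Retrain in the cloud and push the resulting model to every edge node.
    cloud.model = train(cloud.dataset)
    for edge in edges:
        edge.model = cloud.model
    return best
```

Uploading only one node's data set per round keeps the traffic to the cloud small, which matches the stated goal of reducing the resource pressure on the cloud.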

Data preprocessing
The data set used in this paper is the KDD Cup 99 data set, a classic data set in the field of intrusion detection built by collecting 90000 network connections and system audit data. The data set contains normal samples and 39 types of attack samples, 17 of which are unknown attack types. The abnormal samples can be divided into four categories: DoS (denial of service attack); U2L (unauthorized access from a remote host, usually called R2L); U2R (unauthorized local super user privilege access); and Probe (scan attack). Each record in the data set has 41 dimensions, of which 38 are numeric features and 3 are symbolic features. Each feature attribute must be preprocessed to eliminate the differences between features and make them comparable. The continuous features are therefore standardized and the discrete features are normalized.
(1) Digitization of symbolic features. The items to be digitized are the three symbolic features and the class labels. Because the algorithm requires a numeric matrix as input, the three symbolic features protocol_type, flag and service are converted into numeric form. The protocol_type feature has three symbols, TCP, UDP and ICMP, which are represented by the numbers 1, 2 and 3. Similarly, the service feature has 70 symbols, which are represented by the positive integers 1 to 70, and the flag feature has 11 symbols, which are represented by the positive integers 1 to 11. The 23 kinds of labels are divided into five categories, normal, DoS, Probe, U2L and U2R, which are represented by 0, 1, 2, 3 and 4 respectively.
(2) Standardization of continuous data. Of the 41 features in the data set, 22 are continuous, and their value ranges and units of measurement differ. The tanh() function is used as the activation function, and its output is distributed around zero. The continuous attributes are therefore standardized so that their values lie near zero. Moreover, because the absolute slope of the activation function is large near zero, this accelerates the convergence of the convolutional neural network model. The standardization formula is $X^{*} = (X - \mu) / \sigma$, where $X$ is the original data, $\mu$ is the mean of the sample, $\sigma$ is the standard deviation of the sample, and $X^{*}$ is the standardized data, which has mean 0 and variance 1.
(3) Normalization of discrete data. To improve the efficiency of the algorithm, all discrete features are normalized in the feature selection process. As with the continuous features, the aim is to speed up the convergence of the convolutional neural network and to eliminate the large differences between features caused by their different scales. The other 19 discrete features are normalized with the max-min method: $X^{*} = (X - x_{min}) / (x_{max} - x_{min})$, where $X$ is the original value of the feature, $x_{min}$ and $x_{max}$ are the minimum and maximum values of that feature, and $X^{*}$ is the normalized data. After normalization, the values of every feature lie in [0, 1].
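The three preprocessing steps can be sketched with the Python standard library. The function names are hypothetical, and two details are assumptions the text does not settle: symbols other than protocol_type are numbered in order of first appearance, and the population standard deviation is used so that the standardized column has variance exactly 1.

```python
import statistics

# Codes stated in the text for protocol_type and for the five categories.
PROTOCOL_CODES = {"tcp": 1, "udp": 2, "icmp": 3}
CATEGORY_CODES = {"normal": 0, "dos": 1, "probe": 2, "u2l": 3, "u2r": 4}


def encode_symbols(column):
    """Map each distinct symbol to 1..n in order of first appearance."""
    codes = {}
    return [codes.setdefault(symbol, len(codes) + 1) for symbol in column]


def standardize(column):
    """Z-score standardization: X* = (X - mu) / sigma."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zero.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def min_max(column):
    """Max-min normalization: X* = (X - x_min) / (x_max - x_min)."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(x - low) / (high - low) for x in column]
```

In practice the mean, standard deviation, minimum and maximum would be computed on the training set and reused for the test set and for newly collected data, so that all data are scaled the same way.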

Experimental environment and evaluation method
The experiments use the TensorFlow deep learning framework in a Linux environment, and the algorithm is written in Python. The evaluation criteria are the accuracy rate (AC), the false alarm rate (FPR) and the recall rate (recall): $AC = (TP + TN) / (TP + TN + FP + FN)$, $FPR = FP / (FP + TN)$ and $recall = TP / (TP + FN)$. Here TN is the number of normal behavior samples correctly classified; TP is the number of abnormal behavior samples correctly classified; FP is the number of normal behavior samples wrongly classified; and FN is the number of abnormal behavior samples wrongly classified.
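The three criteria follow directly from the four counts. The sketch below assumes the usual confusion-matrix definitions (accuracy over all samples, false alarm rate over normal samples, recall over abnormal samples); the function name `evaluate` is hypothetical.

```python
def evaluate(tp, tn, fp, fn):
    """Return accuracy, false alarm rate and recall from the four counts."""
    total = tp + tn + fp + fn
    normal = fp + tn      # all samples that are actually normal
    abnormal = tp + fn    # all samples that are actually abnormal
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_alarm_rate": fp / normal if normal else 0.0,
        "recall": tp / abnormal if abnormal else 0.0,
    }
```

For example, with 80 attacks detected, 20 attacks missed, 90 normal samples passed and 10 normal samples flagged, the accuracy is 170/200, the false alarm rate is 10/100 and the recall is 80/100.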

Experimental results and analysis
In this paper, 10% of the KDD Cup 99 data set is used as the experimental data set. First, the gain rate of each feature in the data set is calculated to filter the features, and the 20 features with the highest gain rate are selected, as shown in Table 1. The is_hot_login feature is discarded because its value is always 0.
The input data are then selected according to the gain rate of the different features, and the accuracy rates for the four attack types are obtained by training on the training data set, as shown in Table 2. When the dimension of the input data is 25 or more, the accuracy rates of the four attacks change little: the DoS accuracy rate is 92.1%-95.8%, the Probe accuracy rate is 87.9%-90.7%, the U2R accuracy rate is 90.1%-92.7%, and the U2L accuracy rate is 92.8%-93.8%. Therefore, 25 can be chosen as the dimension of the input vector.
Real DoS attack data are then added and the model is retrained. Before preprocessing, the real data must be cleaned: unnecessary data are removed and dirty data are cleaned so that they meet the preprocessing standards of the experiment, after which they are digitized, standardized and normalized. After the real data are added, the DoS attack accuracy decreases by 21.3%. Table 3 shows the accuracy rates under different dimensions after adding real DoS attacks.
The method is also compared with the convolutional neural network intrusion detection scheme of reference [10], which does not apply this feature processing. Because that scheme only studies binary classification, the four attack types are merged into a single abnormal class, which turns the multi-class problem into a binary classification problem. Table 4 shows the average training time of the model under different input data dimensions.
Compared with the method in reference [10], the method adopted in this paper reduces the training time by 198.04 s, a reduction of 75%.

5. Conclusion
This paper proposes a cloud-edge collaborative attack recognition model based on CNN. Attack recognition training is carried out at each edge computing node, and newly learned data are added to the current training set. The model is divided into five modules: data acquisition module, data preprocessing module, feature selection module, CNN attack recognition module, and data transmission and update module. To speed up feature extraction, the gain rate is used to select features, and experiments show that this reduces the average training time. The model as a whole also gains a self-learning ability: the data obtained from the edge are added to the training set after cleaning, which makes the model better suited to real scenarios. However, because real data are complex, the accuracy of the model decreases after real data are added, and this needs to be improved in future work.