Overcoming challenges in deep inspection of VPN and proxy traffic with deep learning

The rapid growth of proxying and VPN techniques has presented formidable challenges for internet operators in classifying network traffic. These methods not only hinder effective traffic policing and resource restriction but also introduce complexities by allowing VPNs to masquerade under application protocols. Furthermore, the increasing popularity of applications like Voice over IP (VoIP) and peer-to-peer (P2P) technologies further exacerbates the difficulty in controlling and classifying such traffic. Conventional techniques like Server Name Indication (SNI) analysis are witnessing diminishing effectiveness over time. Network providers now face the critical task of acquiring detailed knowledge about the specific applications and protocols utilised by their customers, enabling them to accurately allocate resources and ensure robust traffic management in this challenging landscape.


Introduction
Classifying network traffic, also referred to as application protocol identification, is a crucial process that involves matching traffic to the applications responsible for generating it. This classification serves as the foundation for various networking capabilities, ranging from network management and security to service differentiation, traffic management, trend analysis, and network research. In this paper, our focus lies on classifying the application protocol of network traffic packets.
Traffic classification methods can generally be categorised into three main approaches: (1) IP and port-based classification, (2) payload verification, and (3) statistical machine learning techniques. Each approach has its strengths and limitations.
IP and port-based classification, commonly used with techniques like Server Name Indication (SNI), offers the advantage of speed. However, it suffers from inaccuracies due to factors such as port obfuscation, network address translation (NAT), port forwarding, protocol injection, and random port assignment.
Payload inspection, which involves analysing the application layer payload, provides high accuracy. However, it relies on template-based methods that require frequent updates whenever new protocols are released. Moreover, this method often raises privacy concerns among users.
Statistical machine learning leverages traffic statistics to train models for classification. While this approach can be accurate, it is often expensive and inefficient, as it typically requires human intervention, and the execution of machine learning models can be slow.
In this work, we propose an alternative approach that employs deep learning techniques instead of static, traditional machine learning methods. We have developed an adapted ResNet architecture for traffic classification, optimising its hyperparameters specifically for VPN and proxy traffic identification. By selecting the optimal hyperparameters, we aim to achieve accurate and efficient traffic classification using deep learning.

Related Work
Several surveys have extensively explored network traffic classification from various perspectives. Finsterbusch et al. [4] conducted a comprehensive analysis of open-source Deep Packet Inspection (DPI) modules, including OpenDPI, nDPI, libprotoident, IPP2P, HiPPIE, and L7-filter. Their study focused on classifying protocols such as SMTP, IMAP, and HTTP.
Valenti et al. [5] concentrated on supervised machine learning algorithms and underscored the importance of feature selection in constructing highly effective classifiers. They developed two classifiers with distinct feature sets to enhance classification accuracy.
Salman et al. [6] undertook a detailed review of traffic classification techniques, with a specific focus on machine learning approaches, including supervised, unsupervised, and semi-supervised classifiers. They extensively discussed data collection methods and presented various strategies for machine learning classifiers.
Wang and Chen [7] examined the application of deep learning algorithms to mobile network traffic classification. Their study provided in-depth discussions of data preparation, pre-processing techniques, input design considerations, and model architecture.
Zhao et al. [8] conducted a comprehensive review of traffic classification techniques and categorised them based on the features they deploy: port-based, payload-based, correlation-based, behaviour-based, and statistical-based classification. For each category, they analysed the workflow, advantages, disadvantages, and features utilised.
Collectively, these surveys have contributed significant insights into network traffic classification. However, in the context of VPN applications, there is a notable gap in the literature regarding the use of deep learning for deep inspection of traffic. This article aims to bridge this gap by exploring the application of deep learning algorithms specifically to the classification of VPN network traffic.

Method
To investigate the effectiveness of different architectures in classifying network traffic, two distinct models were employed: a classical CNN and the ResNet architecture. These architectures were chosen for their proven track records in handling complex data patterns and achieving high accuracy in classification tasks; ResNet additionally offers benefits when training deep networks. By comparing the performance of the two models, valuable insights can be gained regarding their suitability for network traffic analysis.

Architecture Convolutional Neural Network
The proposed method utilises a Convolutional Neural Network (CNN) [9] for the task of network traffic classification. The CNN architecture consists of several key components (see Fig. 1).

Convolutional Layers.
The CNN begins with two sequential convolutional layers. The first convolutional layer takes an input with one channel, representing the traffic data. It applies a convolution operation with a specified number of output channels (c1_output_dim); the kernel size (c1_kernel_size) and stride (c1_stride) are also configurable parameters. ReLU activation is applied after the convolution operation to introduce non-linearity. Similarly, the second convolutional layer operates on the output of the first layer. It takes an input with c1_output_dim channels and produces an output with c2_output_dim channels, using a kernel size (c2_kernel_size) and stride (c2_stride) determined during model configuration.

Max-Pooling Layer.
After each convolutional layer, a max-pooling layer is applied. This layer performs downsampling by partitioning the input feature maps into non-overlapping regions of size 2. Within each region, the maximum value is selected as the representative value. This operation reduces the spatial dimensions of the feature maps while preserving the most significant information.

Fully Connected Layers.
Following the max-pooling layer, the tensor is flattened to a one-dimensional form. It is then passed through a series of three fully connected layers (fc1, fc2, fc3). The sizes of these layers are pre-defined as 200, 100, and 50, respectively. Dropout regularisation with a dropout probability of 0.05 is applied to each fully connected layer to prevent overfitting. A ReLU activation function is also employed after each fully connected layer.

Output Layer.
The final fully connected layer, named 'out', serves as the output layer. It takes the output of the previous layer, which has 50 features, and maps it to the desired output dimension (output_dim) specified in the model's hyperparameters.
During model setup, a dummy input of size (1, 1, signal_length), where signal_length is a hyperparameter representing the length of the input signal, is passed through the network to calculate the output size after the max-pooling layers. This information is needed to determine the input size of the first fully connected layer. Overall, this CNN architecture, composed of convolutional layers, max-pooling, and fully connected layers, aims to capture relevant features and dependencies in network traffic data, leading to effective classification results. The model configuration and hyperparameters can be adjusted to optimise performance for the specific classification task.
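The CNN described above can be sketched in PyTorch as follows. The hyperparameter names (c1_output_dim, c1_kernel_size, signal_length, output_dim, and so on) follow the text; the default values below are illustrative assumptions, not the paper's tuned settings.

```python
import torch
import torch.nn as nn

class TrafficCNN(nn.Module):
    """Sketch of the CNN described in the text: two conv layers, each
    followed by ReLU and 2-wide max-pooling, then three fully connected
    layers (200, 100, 50) with dropout 0.05 and a final 'out' layer."""

    def __init__(self, signal_length=128, output_dim=10,
                 c1_output_dim=16, c1_kernel_size=5, c1_stride=1,
                 c2_output_dim=32, c2_kernel_size=3, c2_stride=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, c1_output_dim, c1_kernel_size, stride=c1_stride),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(c1_output_dim, c2_output_dim, c2_kernel_size, stride=c2_stride),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # dummy forward pass to infer the flattened size, as described in the text
        with torch.no_grad():
            flat = self.features(torch.zeros(1, 1, signal_length)).numel()
        self.classifier = nn.Sequential(
            nn.Linear(flat, 200), nn.ReLU(), nn.Dropout(0.05),
            nn.Linear(200, 100), nn.ReLU(), nn.Dropout(0.05),
            nn.Linear(100, 50), nn.ReLU(), nn.Dropout(0.05),
            nn.Linear(50, output_dim),  # the 'out' layer
        )

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TrafficCNN()
logits = model(torch.zeros(4, 1, 128))  # a batch of 4 dummy traffic signals
```

The dummy-input trick in the constructor mirrors the setup step described above: it sizes the first fully connected layer without hand-computing the post-pooling dimensions.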

Architecture Residual Network
The ResNet (Residual Network) architecture [10] is a deep neural network used here for network traffic classification. It consists of multiple residual blocks that facilitate the learning of complex representations in the traffic data (see Fig. 2).

Convolutional Layers.
ResNet starts with a series of convolutional layers that extract features from the input traffic data. These layers perform convolutions, applying filters to capture local patterns and detect relevant features in the data. The number of output channels (c_output_dim) and the kernel size (c_kernel_size) are configurable parameters.

Residual Blocks.
The key component of ResNet is the residual block. Each residual block consists of two convolutional layers. The number of output channels for these layers (rb_output_dim) and the kernel size (rb_kernel_size) are configurable parameters. The skip connection in the residual block allows the network to learn residual representations, i.e. the differences between the input and the desired output.

Batch Normalisation.
Batch normalisation is applied after each convolutional layer within the residual blocks. It normalises the activations to stabilise and speed up the training process. The momentum parameter (bn_momentum) determines the contribution of the previous batch's statistics to the current batch's normalisation.

Activation Function.
An activation function (e.g., ReLU) is applied after each convolutional layer and within the residual blocks. It introduces non-linearity, allowing the network to learn complex and non-linear relationships in the traffic data.
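A single residual block combining the pieces above can be sketched in PyTorch as follows. The parameter names (rb_output_dim, rb_kernel_size, bn_momentum) follow the text; 'same' padding is an assumption made here so that the skip addition lines up, since the paper does not state the padding scheme.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of one residual block: two convolutions, each followed by
    batch normalisation, with ReLU activations and a skip connection."""

    def __init__(self, rb_output_dim=64, rb_kernel_size=3, bn_momentum=0.1):
        super().__init__()
        self.conv1 = nn.Conv1d(rb_output_dim, rb_output_dim,
                               rb_kernel_size, padding="same")
        self.bn1 = nn.BatchNorm1d(rb_output_dim, momentum=bn_momentum)
        self.conv2 = nn.Conv1d(rb_output_dim, rb_output_dim,
                               rb_kernel_size, padding="same")
        self.bn2 = nn.BatchNorm1d(rb_output_dim, momentum=bn_momentum)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # skip connection: the convolutions learn the residual relative to x
        return self.relu(out + x)

block = ResidualBlock()
y = block(torch.zeros(2, 64, 32))  # (batch, channels, signal length)
```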

Pooling.
Pooling layers, such as max pooling, are occasionally used to downsample the feature maps, reducing their spatial dimensions while preserving the most important information. The pooling size (p_pool_size) and stride (p_stride) are configurable parameters.

Fully Connected Layers.
Following the convolutional and pooling layers, the feature maps are flattened to a one-dimensional form and passed through fully connected layers. The sizes of these layers (fc1_output_dim, fc2_output_dim, fc3_output_dim) are pre-defined parameters.

Output Layer.
The final layer of the ResNet architecture is the output layer. It takes the features from the previous layers and maps them to the specific number of output classes or labels (output_dim) corresponding to the network traffic categories to be classified. Overall, the ResNet architecture, with its convolutional layers, residual blocks, batch normalisation, and fully connected layers, aims to capture and model intricate patterns and dependencies within the network traffic data. By leveraging skip connections and residual learning, ResNet effectively tackles the challenges of training deep neural networks and achieves high classification accuracy in network traffic classification tasks. The specific parameter values can be adjusted to optimise performance for the requirements of the classification task.
The study focuses on comparing the suitability of the individual architectures rather than exploring interactions between different neural network models. It assesses each architecture's effectiveness in analysing network traffic independently; the classification of traffic is therefore performed by a single neural network without any interaction with other models.
It is worth noting that per-packet predictions were accumulated, and for all packets sharing the same IP address and port the most frequently predicted class was selected.
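The per-flow accumulation step can be sketched as a simple majority vote over per-packet predictions. The function and variable names below are illustrative, not taken from the paper's code.

```python
from collections import Counter

def aggregate_by_flow(predictions):
    """Accumulate per-packet predictions and return, for each (IP, port)
    pair, the most frequently predicted class (majority vote).
    `predictions` is an iterable of ((ip, port), predicted_class) tuples."""
    votes = {}
    for flow, cls in predictions:
        votes.setdefault(flow, Counter())[cls] += 1
    return {flow: counter.most_common(1)[0][0]
            for flow, counter in votes.items()}

# three packets of one flow, two voting "vpn"; one packet of another flow
packets = [
    (("10.0.0.1", 443), "vpn"),
    (("10.0.0.1", 443), "https"),
    (("10.0.0.1", 443), "vpn"),
    (("10.0.0.2", 1194), "vpn"),
]
result = aggregate_by_flow(packets)  # majority class per (IP, port) pair
```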

Dataset
In this work, the classification of network traffic relied on a carefully curated dataset collected in the NetFlow version 10 (IPFIX) format. IPFIX, the Internet Protocol Flow Information Export protocol, is similar to NetFlow and allows network professionals to gather and analyse flow information from network devices.
IPFIX was developed to standardise the formatting and transmission of IP flow information from exporters to collectors, and is based on NetFlow Version 9. It is supported by various vendors, including Cisco Systems, Solera, VMware, and Citrix. Although not widely popular in the networking community, IPFIX offers advantages such as data analysis during the collection process, making it useful for tasks like network monitoring, security, and advertising strategy development.
IPFIX works by monitoring IP activity within the network: packets are observed by exporters and flow records are forwarded to a collector. The exporters use pre-defined templates to send information sets through IPFIX messages. One notable advantage of IPFIX over NetFlow is its support for variable-length fields, enabling the export of URLs, messages, or HTTP hosts. Additionally, the specification of vendor IDs grants the flexibility to export any required information.
The IPFIX standards requirements were initially outlined in RFC 3917, with Cisco NetFlow Version 9 serving as the foundation for IPFIX. The basic specifications for IPFIX can be found in RFC 7011 through RFC 7015 [11], and RFC 5103 [12].

Dataset Mining
The data collection process involved recording data dumps in the IPFIX format on a machine equipped with Deep Packet Inspection (DPI) capabilities. This machine was connected to others generating traffic through various VPNs. Since all traffic from these machines passed through a VPN, the IP addresses and ports observed on them were recorded over time so as to gather as many distinct IP and port combinations as possible for each VPN. Consequently, a substantial amount of data with unique IP and port combinations was collected to train the neural network, since VPNs obtain new IP and port assignments once blocked.

Dataset Structure
To ensure privacy, artificial identifiers and time-related information were removed from the collected data. Moreover, the data was enriched by appending Autonomous System Numbers (ASN) and assigning classes to the records. The features listed in Table 3 were used for training.

Dataset Processing
For both the application classification and traffic classification tasks, the dataset was initially split into a training set and a test set using an 80%:20% ratio. To address class imbalance, undersampling was employed on each class within the training set. While all available data was used for the application classification task, the traffic classification task only utilised the traffic from specific applications within each traffic category. This distinction arose from the absence of traffic category information for certain applications mentioned in the dataset description.
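The split-and-undersample preparation described above can be sketched as follows. This is a minimal stdlib sketch, assuming undersampling down to the smallest class in the training set; the function and variable names are illustrative, not the paper's.

```python
import random
from collections import defaultdict

def split_and_undersample(samples, train_frac=0.8, seed=0):
    """80/20 train/test split, then undersample each class in the
    training set to the size of its smallest class.
    `samples` is a list of (features, label) pairs."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]

    # group training items by class and draw n_min items from each
    by_class = defaultdict(list)
    for item in train:
        by_class[item[1]].append(item)
    n_min = min(len(items) for items in by_class.values())
    balanced = [item for items in by_class.values()
                for item in rng.sample(items, n_min)]
    return balanced, test

# toy imbalanced dataset: 30 "vpn" flows vs 10 "https" flows
data = [([i], "vpn") for i in range(30)] + [([i], "https") for i in range(10)]
train_set, test_set = split_and_undersample(data)
```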

Train
The collected dataset was used to train four distinct neural network models. Among them were two CNN models, each with its own set of hyperparameters as outlined in Table 1, with the architecture shown in Figure 1. Additionally, two ResNet models were trained, with their respective hyperparameters detailed in Table 2 and the architecture shown in Figure 2.
During the training phase, the models were exposed to the labelled network traffic data to learn and extract meaningful patterns and features for accurate classification.The training process involved iteratively adjusting the model's parameters to minimise the discrepancy between predicted and actual class labels.
The overall distribution of protocols in the training sample is given in Table 4.

Test
After training, the trained models were evaluated on a separate test dataset to assess their performance. The test results are summarised in Table 5 and Table 6, which present various performance metrics such as accuracy, precision, recall, and F1 score [13]. These metrics provide insights into the models' ability to accurately classify network traffic into the predefined categories.
The equations of the four metrics are as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
where TP is true positive, TN is true negative, FN is false negative, and FP is false positive.
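These metrics can be computed directly from the confusion-matrix counts; the counts used in the example are arbitrary illustrative numbers, not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# e.g. 80 true positives, 90 true negatives, 10 false positives, 20 false negatives
acc, prec, rec, f1 = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```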

Experiments Setup
In the conducted experiments, a machine equipped with Deep Packet Inspection (DPI) technology was employed to analyse VPN traffic, which constituted a substantial portion of the collected dataset. The neural network ran on the machine with the DPI system to facilitate traffic classification. The traffic generated by machines connected to the DPI via various VPNs was routed through the DPI and captured in dumps for analysis.
The neural network processed these captured data dumps, which were stored in a designated directory, and analysed the network packets formatted in IPFIX. The results were then recorded in a database, creating a comprehensive record of the classified IP addresses, ports, their associated application protocol classes, and the source machine numbers. The experiment was carried out using the same VPNs on which the neural network models were trained, but with a different IP range to keep the experiment clean.
Leveraging all four trained models, the neural network classified the traffic by assigning to each packet the class with the highest predicted probability.
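The decision rule can be sketched as a standard argmax over per-class probabilities. The paper does not spell out the exact rule, so this softmax-then-argmax form, the function name, and the stand-in linear model are assumptions for illustration only.

```python
import torch

def predict_classes(model, batch):
    """For each packet in `batch`, pick the class with the highest
    predicted probability (argmax over softmax outputs)."""
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
    return probs.argmax(dim=1)

# stand-in linear model over 16 dummy IPFIX-derived features, 5 classes
model = torch.nn.Linear(16, 5)
labels = predict_classes(model, torch.zeros(8, 16))  # one label per packet
```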
This experimental setup made it possible to calculate the classification accuracy of models in real conditions for each protocol.By evaluating the performance of the models in accurately classifying traffic from different protocols, the experiments provided insight into the performance and reliability of neural network models in practical scenarios involving VPN traffic analysis.

Result
The neural network classified the traffic and recorded the corresponding class in the database. Subsequently, the real user protocol was compared with the class identified by the neural network to determine the numbers of correctly and incorrectly classified protocols. The accuracy was then calculated from these results. Since our focus is on assessing the trained models' ability to detect the VPN protocol, we illustrate the recognition results specifically for VPN. These results are presented in Table 7 and Table 8.

Conclusion and future work
The experiment revealed an interesting result that was not initially evident from the Confusion Matrix: the ResNet architecture, when properly configured with appropriate hyperparameters, outperformed the CNN architecture in classifying user protocols.The CNN models struggled with protocols that had limited representation in the training set.
The superior performance of the ResNet model can be attributed to its ability to capture and model complex patterns and dependencies within network traffic data using residual connections.Additionally, the carefully chosen hyperparameters contributed to better performance compared to the initial model.
These findings underscore the effectiveness of the ResNet architecture and well-tuned hyperparameters in achieving high accuracy in network traffic classification tasks. To further improve the classification of user protocols, future development can involve incorporating new protocols, such as VPN, into the CNN and ResNet architectures. Additionally, enriching the traffic data with additional samples can enhance the distinctiveness and distinguishability of traffic packets, leading to improved accuracy in user protocol classification.

Figure 1 .
Figure 1. Convolutional Neural Network (CNN) Architecture for IPFIX Traffic Classification. The figure illustrates the CNN architecture specifically designed for IPFIX traffic classification. The architecture consists of three main components: convolutional layers, fully connected layers, and the output layer.

Figure 2 .
Figure 2. Architecture of the Hierarchical Residual Network (ResNet) for IPFIX Traffic Classification. The figure depicts the proposed hierarchical ResNet architecture designed for efficient feature extraction in IPFIX traffic classification. The ResNet comprises two main components: "Residual Blocks" and "ResNet Blocks." Residual Blocks consist of convolution, batch normalisation, and ReLU activation, allowing for direct gradient flow during training. The ResNet Blocks represent a hierarchy of feature extraction with increasing filter sizes (64, 128, and 256 filters), capturing complex patterns. Skip connections between certain blocks enable enhanced feature reuse, contributing to the model's accuracy and convergence.

Table 3 .
Netflow columns for training.

Table 4 .
Total ratio of protocols.

Table 5 .
Confusion Matrix for models of Convolutional Neural Network (CNN).

Table 6 .
Confusion Matrix for models of Residual Network (ResNet). The Confusion Matrix results indicate that the CNN models equalled the ResNet models in application protocol classification accuracy and outperformed them in transport protocol classification. For both architectures, hyperparameter set №1 gave better results than hyperparameter set №2.

Table 7 .
Experiment Results for models of Convolutional Neural Network (CNN).

Table 8 .
Experiment Results for models of Residual Network (ResNet).