A Multi-Level Deep Learning Model for Network Slicing Classification in 5G Networks

5G is considered a key contributor and infrastructure supplier in the communication technology industry, capable of supporting a wide range of services such as virtual reality, driverless automobiles, e-health, and a variety of intelligent applications. Network slicing is designed to support this diversity of service applications, with their increased performance and flexibility needs, by dividing the physical network into many logical networks. Service classification allows 5G service providers to accurately select the network slice for each service. We propose a network slicing classifier that uses a multi-level deep learning model. First, we created a 5G network slicing dataset containing attributes associated with various network services. Next, we built a multi-level model consisting of a set of machine learning and deep learning models (Deep Neural Network, Random Forest, and Decision Tree) as a first level, followed by a second level that is an enhanced Attentive Interpretable Tabular Learning (TabNet) model. The experimental results showed that the proposed model exceeds the individual models with high performance.


Introduction
The utilization of telecommunication networks has evolved considerably over the last two decades, affecting both the number of connected users and the volume of data transmitted. The service-oriented design of the 5G architecture enables a multi-service network to handle a variety of communication scenarios with a wide range of performance and service needs, and to supply all types of services to all types of user requirements [1][2][3]. To meet the various service requirements, the Cloud Radio Access Network (C-RAN) architecture has been suggested as a possible 5G design. The primary idea behind C-RAN is to separate Base Band Units (BBUs) from antennas and consolidate processing power into centralized data centers, or BBU pools. To increase resource usage and reduce network costs, several base stations share the BBU computation pools [4,5]. Figure 1 shows the architecture of C-RAN [6]. An essential component of the 5G network is network slicing, which enables the creation of numerous isolated logical networks over the same physical infrastructure. A slice is created with the goal of enabling various use cases with different service needs. Due to the wide array of new networking services, 5G mobile networks have diverse service requirements. The network slicing model has therefore evolved, since the "one size fits all" networking notion is not suitable for 5G and beyond. So that each logical network can be tailored to provide certain network capabilities and features for a particular use case, the physical network is divided into a number of logical networks (referred to as "network slices") [2,7]. To be more precise, a network slice is a tenant's own private, isolated subnetwork, complete with its own topology, virtual resources, provisioning rules, and traffic flows. Logical networks are constructed and deployed for various services to satisfy the diverse communication requirements, by monitoring the demands of users and administrators. Because network slicing enables dynamic, flexible, and scalable networks that adapt quickly to changing business requirements, it is often recognized as a key enabler of 5G systems [8]. Smart decisions must be made regarding the construction, design, operation, deployment, management, and administration of a network slice in order to effectively meet the quality of service (QoS) requirements of the service intended to be delivered through it. Because massive amounts of data must be analyzed in a short time, it is challenging for a human to manually design and run network slices. Therefore, automation of these tasks is required. Network slicing operations can be automated using Machine Learning (ML) [7]. ML can be helpful in a variety of scenarios, including QoS prediction, power savings, operation, fault management, maintenance, network setup, power control, coverage, service classification, resource allocation, and throughput [9]. In addition to ML there is Deep Learning (DL), a subfield of ML, which can also enable network automation to update the available resources and make changes immediately. In addition to processing and supplying information, DL will be tasked with adapting network resource use without human involvement. For every particular slice, DL will perform real-time analysis to assess network performance, establish a prospective performance baseline, proactively predict issues, examine various network components, and look for any abnormalities [10]. The authors of [11] suggested a service classifier that employs a supervised machine learning technique to enhance classification and enable improved resource and traffic allocation over 5G/B5G-based networks. Proper service classification provides a way of offering higher network QoS and optimizing QoE, making it critical in the area of telecommunications and, especially, in the implementation of 5G/B5G networks. When a user makes a service request, it must be precisely classified so that the network operator can choose the optimal network slice for that service. They carried out simulations using different ML algorithms and showed that a Random Forest classifier achieves the best prediction, with an accuracy of 96.6%. Their biggest restriction was the lack of a real-world 5G operational dataset; to categorize 5G services, it was necessary to develop a dataset derived from standards papers and projects. Similarly, one of the challenges that we faced in this work is the lack of datasets for network slicing in 5G. Therefore, we used the information gathered in the paper above to generate a 5G network slicing dataset. In this paper, our aim is to generate data attributes highly associated with network slicing, and to classify the optimal network slices using a multi-level model; the first level contains three ML and DL models: Deep Neural Network (DNN), Random Forest (RF), and Decision Tree (DT); the second level contains an enhanced model: Attentive Interpretable Tabular Learning (TabNet). The work in [11] is the only literature we compare our findings with. We also apply our model to the generated data and show that it outperforms several ML and DL techniques on both the dataset from the paper above and the generated one, as will be explained later. The remainder of this paper is organized as follows: section 2 presents the proposed model; section 3 shows the results and discussion; and section 4 concludes the paper.

The proposed model
The classification of 5G services into the optimal network slice that meets user demands is becoming increasingly significant for effective resource utilization and network management as 5G networks evolve. However, labeled data for training accurate classification models is frequently lacking or limited, making reliable service classification difficult. The proposed model is implemented by the 5G core network, which will select the optimal network slice that satisfies the demands of the network's users. The proposed model consists of five main phases, as shown in figure 2:
- Data Generation.
- Data Normalization.
- Level 1: DNN, RF and DT.
- Level 2: Enhanced TabNet.
- Model Evaluation.
Figure 2 below shows the architecture of the proposed model.

Data Generation
The fundamental restriction of this study was the lack of a real dataset containing 5G operational information. In this paper, we suggest a method for generating a 5G network slicing classification dataset based on the information gathered by the researchers in [11], who explored (manually) constructing a synthetic dataset by examining parameters derived from ITU standards papers as well as other European studies and analysis documents produced by telecoms corporations. The dataset contains 13 features (E2E Latency, Jitter, Bit Rate, Packet Loss Rate, Peak Data Rate DL, Peak Data Rate UL, Mobility, Service Reliability, Availability, Survival Time, Experienced Data Rate Downlink, Experienced Data Rate Uplink, and Interruption Time). Table 1 shows the features and their corresponding value types, all of which are numeric. The dataset also has 9 classes (Ultra-High Definition (UHD) Video Streaming, Immersive Experience, Smart Grid, e-Health, Intelligent Transportation Systems (ITS), Voice over 5G (Vo5G) services, Connected Vehicles, Industry Automation, and Video Surveillance).
The method is as follows (see the sketch after this list):
- Gather user input by asking for the number of services in the different classes such as UHD Video Streaming, Immersive Experience, etc.
- Create a list and populate it with the selected services based on the entered numbers.
- For each class, generate random feature values based on specific ranges and store them in the respective lists.
- Create a file to store the generated data.
- Shuffle the data.
- Encode the service names in the dataset into numerical representations.
With this method we are able to generate any number of instances for each class, as well as to make the number of instances different for each class (unbalanced). The total number of instances in our dataset is 3000, while the dataset in [11] contains only 165 instances. Table 2 shows the classes and the number of instances of both the dataset in [11] (165 instances in total) and our generated dataset (3000 instances in total).
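To illustrate the generation procedure, the following is a minimal Python sketch. The class names follow the paper, but the file name, the feature subset, and the value ranges shown here are illustrative placeholders, not the exact ranges extracted from the ITU documents and [11].

```python
import csv
import random

# Illustrative value ranges per class; the real ranges come from ITU standards
# documents and [11], so the numbers below are placeholders only.
CLASS_RANGES = {
    "UHD Video Streaming": {"E2E Latency (ms)": (10, 20), "Bit Rate (Mbps)": (50, 100), "Packet Loss Rate": (1e-4, 1e-3)},
    "Smart Grid":          {"E2E Latency (ms)": (3, 10),  "Bit Rate (Mbps)": (0.1, 1),  "Packet Loss Rate": (1e-6, 1e-5)},
    "e-Health":            {"E2E Latency (ms)": (1, 5),   "Bit Rate (Mbps)": (1, 10),   "Packet Loss Rate": (1e-6, 1e-4)},
}
FEATURES = ["E2E Latency (ms)", "Bit Rate (Mbps)", "Packet Loss Rate"]

def generate_dataset(counts, out_path="slicing_dataset.csv", seed=42):
    """counts: dict mapping class name -> number of instances to generate."""
    random.seed(seed)
    label_encoding = {name: idx for idx, name in enumerate(CLASS_RANGES)}
    rows = []
    for cls, n in counts.items():
        ranges = CLASS_RANGES[cls]
        for _ in range(n):
            # Draw each feature uniformly from its class-specific range.
            row = [random.uniform(*ranges[f]) for f in FEATURES]
            row.append(label_encoding[cls])  # encode the service name numerically
            rows.append(row)
    random.shuffle(rows)  # shuffle so classes are not stored in contiguous blocks
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FEATURES + ["Slice Class"])
        writer.writerows(rows)
    return rows

# Example: an unbalanced dataset with a different number of instances per class.
generate_dataset({"UHD Video Streaming": 500, "Smart Grid": 300, "e-Health": 200})
```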

Data normalization
The data generated within this work is tabular. Tabular data, which consists of a collection of items with a set of characteristics, is the most prevalent data format in real-world applications. Data normalization is a data preprocessing operation that is often the initial step in intelligent analysis, especially with tabular data. Its significance stems from the need to lower the artificial intelligence model's sensitivity to the feature values in the dataset in order to improve the adequacy of the examined model. Data normalization transforms the value of a feature in the original dataset into a specific range; the requirement for such a step is determined by the potential sensitivity of the chosen model to the feature values. Data normalization is regarded as a generic transformation that ensures the converted data has specific statistical features [12,13]. In this step, all numeric feature values are converted to the [0,1] range to scale the data and enhance model accuracy.
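A minimal sketch of this step, assuming min-max scaling to [0,1] with scikit-learn (the paper does not name the exact library used) and the file name from the generation sketch above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Load the generated dataset; the last column is the encoded slice class.
data = np.loadtxt("slicing_dataset.csv", delimiter=",", skiprows=1)
X, y = data[:, :-1], data[:, -1].astype(int)

# Min-max normalization: x' = (x - min) / (max - min), mapping each feature to [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
X_normalized = scaler.fit_transform(X)
```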

Level 1-DNN, RF and DT
In the first level of our proposed model, each of the following models was used: DNN, RF, and DT. A Neural Network (NN) is known to be a highly effective paradigm for problem solving. It is an information processing model that functions similarly to the organic nervous system of the human brain. NNs, like humans, may learn by example and may be created for specialized applications such as data classification [14]. An NN is formed of inputs that are multiplied by weights. These weighted inputs are then processed by a mathematical function that determines neuron activity, and another function calculates the artificial neuron's output. The artificial neurons are structured in layers; their responses are sent "forward", while errors are propagated backwards. Neurons in the input layer provide input to the network, and neurons in the output layer provide its output. There might be one or more hidden intermediate layers. Training starts with random weights, and the purpose is to adjust them such that the error is as small as possible. The implementation of the back-propagation model is divided into two stages: the first stage is training, while the second stage is testing. Back-propagation training is based on the gradient descent rule, which seeks to adjust weights and minimize the system error in the network [15]. A DNN is made up of multiple layers of nodes; various designs have been developed to solve challenges in different fields or use-cases [16,17].
Other models, such as RF and DT, were also employed in this work. The RF method predicts one (or a group of) outcomes by combining many 'random' binary decision trees that comprise the forest. RF is a supervised learning technique. Its primary working steps are as follows: pick random samples from the dataset; build a DT for each sample and receive a prediction from each DT; hold a vote for each predicted result; and choose the prediction with the most votes as the final prediction [18]. A DT is another example of a supervised learning approach. This model is based on a tree structure, with the tree's root at the very top. The branches are constructed using objective criteria based on the attributes of the dataset, and the decision tree is grown continuously; it works much like a flowchart. A DT node can be viewed as a point of intersection that leads to two separate branches, or "leaf nodes". These nodes, which might become decisions themselves, each reflect a distinct consequence of a particular decision. All of the decisions made result in a final categorisation [19]. At this level of the proposed system:
- The generated dataset is normalized and input to each of the aforementioned models; the data is entered as a whole, without splitting it into training and testing sets.
- The outputs of each model, which are its predictions, become the features of a newly created dataset; this new dataset has 3 features, each representing the outputs of a specific model.
- The class, on the other hand, is carried over from the previously generated dataset.
- This new dataset is entered into the second level.
The goal here is that the data is first categorized by the first-level models, providing a preliminary insight for the proposed model's second level (enhanced TabNet), as explained in the following paragraphs.
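A minimal sketch of this level, reusing X_normalized and y from the normalization sketch above and assuming scikit-learn for RF and DT and Keras for the DNN; the paper does not specify the frameworks, layer sizes, or training hyperparameters, so those shown here are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from tensorflow import keras

n_classes = 9  # nine 5G service classes

# First-level models; the hyperparameters below are illustrative, not the paper's.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_normalized, y)
dt = DecisionTreeClassifier(random_state=0).fit(X_normalized, y)

dnn = keras.Sequential([
    keras.layers.Input(shape=(X_normalized.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(n_classes, activation="softmax"),
])
dnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
dnn.fit(X_normalized, y, epochs=50, batch_size=32, verbose=0)

# Each model's class predictions become one feature of the level-2 dataset;
# the slice label is carried over from the generated dataset.
level2 = pd.DataFrame({
    "dnn_pred": np.argmax(dnn.predict(X_normalized), axis=1),
    "rf_pred": rf.predict(X_normalized),
    "dt_pred": dt.predict(X_normalized),
    "slice_class": y,
})
```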

Level 2-Enhanced TabNet
As mentioned earlier, the data generated within this work is of tabular type, which includes a collection of items with a set of characteristics and is the most prevalent data format in real-world applications. Many issues emerge when working with tabular data, including lack of locality, mixed feature types (numeric, ordinal, categorical), data sparsity, and a lack of prior knowledge of the dataset structure (unlike with text or images) [20]. Although deep learning approaches succeed at classification and data generation tasks on homogeneous data (e.g., image, audio, and text data), tabular data remains a barrier; tabular datasets have been called the last "unconquered castle" for deep models. In contrast to image or language data, tabular data is heterogeneous, combining dense numerical and sparse categorical features. Furthermore, the feature correlation is weaker than the spatial or semantic association in image or audio data [21]. Therefore, many models have appeared recently to deal with tabular data, including the TabNet model used in this work. Google Cloud researchers (Arik and Pfister) proposed TabNet in 2019 [22]. TabNet selects the features to draw inferences from at each step of the model using a sequential attention-based machine learning technique. This strategy enables the model to increase its predictive power while also offering insights into its own decision-making processes. TabNet's design not only outperforms other neural networks and decision trees in terms of performance, but it also delivers understandable feature attributions. TabNet provides fast and easy-to-understand deep learning on tabular data.
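For reference, the following is a minimal sketch of training a stock TabNet classifier on the level-2 dataset (reusing level2 from the previous sketch), assuming the open-source pytorch-tabnet package. The hyperparameters and the train/test split are illustrative; the enhancement described below modifies activation functions inside this architecture rather than how it is called:

```python
import numpy as np
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.model_selection import train_test_split

# Level-2 features are the three first-level predictions; labels are the slice classes.
features = level2[["dnn_pred", "rf_pred", "dt_pred"]].values.astype(np.float32)
labels = level2["slice_class"].values

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=0)

clf = TabNetClassifier(n_d=8, n_a=8, n_steps=3)  # illustrative TabNet hyperparameters
clf.fit(X_train, y_train, eval_set=[(X_test, y_test)], max_epochs=100, patience=20)
preds = clf.predict(X_test)
```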

Attentive transformer block. The attentive transformer block incorporates attention mechanisms,
allowing the model to focus on important features while disregarding noise. It learns relationships between different features and enhances the model's ability to capture interactions.

Feature Masking Block.
In this block, the model learns to select informative features by applying feature masks. This enables the model to emphasize relevant features while suppressing less important ones, aiding better feature selection.

Split Block. The split block divides the input into two branches, one for making predictions and another
for updating the feature masks. This separation allows the model to learn complex relationships in the data while also improving its interpretability [23]. In summary, TabNet employs these four blocks to transform features, attend to important information, select relevant features, and handle predictions and updates separately. This architecture enhances both the predictive power and the interpretability of the model for tabular data tasks. In this work, an enhancement was implemented in the feature transformer block, utilizing a different activation function, the Self-Gated activation function (Swish), in place of the GLU. The Swish activation function was introduced by the researchers in [24] in 2017; their experiments show that it outperforms several other common activation functions. For the proposed model as well, good results have been achieved, as will be clarified later in the results and discussion section. Swish mixes the identity function's linearity with the sigmoid function's nonlinearity. This combination enables the activation to transition seamlessly between linear and nonlinear behavior. In addition, the effectiveness of an activation function can vary across different layers, allowing better optimization and generalization [25]. The Swish activation function is defined as [24]:

$f(x) = x \cdot \sigma(\beta x)$

where $\sigma$ is the sigmoid function:

$\sigma(z) = \frac{1}{1 + e^{-z}}$

Another enhancement was implemented in this work by using the Exponential Linear Unit (ELU) activation function instead of the Rectified Linear Unit (ReLU). ELU was proposed by the researchers in [26]; their experiments show that this activation function speeds up learning and leads to higher classification accuracy, outperforming several other common activation functions. Likewise, positive outcomes have been achieved for the proposed model, which will be further elaborated upon in the following section. In contrast to ReLU, which has an abrupt kink at 0, ELU is continuous everywhere, including at zero. This smoothness can improve optimization stability, resulting in faster training convergence. In terms of training time, convergence, and generalization across several deep learning tasks, ELU outperforms ReLU [27].
The ELU activation function is defined as [26]:

$f(x) = \begin{cases} x & x > 0 \\ \alpha\,(e^{x} - 1) & x \le 0 \end{cases}$

In this equation, $x$ is the input to the function and $\alpha$ is a hyperparameter controlling the value to which ELU saturates for negative inputs. Figure 8 shows the structure of the TabNet model after enhancement, while the graph of the ELU activation function is shown in figure 9 [28].
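A minimal sketch of the two activation functions used in the enhancement, written with NumPy purely for illustration (in the actual model they would be applied inside the TabNet feature transformer layers, e.g. as framework-level tensor operations):

```python
import numpy as np

def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def swish(x, beta=1.0):
    """Self-Gated (Swish) activation: f(x) = x * sigmoid(beta * x).
    With beta = 1 this reduces to the common x * sigmoid(x) form."""
    return x * sigmoid(beta * x)

def elu(x, alpha=1.0):
    """Exponential Linear Unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# Example: both functions are smooth around zero, unlike ReLU's kink at 0.
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(swish(x))
print(elu(x))
```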

Model evaluation
One of the most important steps is evaluating the classification effectiveness of the proposed model by calculating the accuracy, precision, recall, and F1-score metrics. The correctness of the model's predictions is measured by accuracy, while precision measures how efficient the model is at identifying positive cases. Recall measures the model's success in capturing all positive instances. Finally, the F1-score is the harmonic mean of precision and recall. These metrics were calculated as follows [29]:

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

$\text{Precision} = \frac{TP}{TP + FP}$

$\text{Recall} = \frac{TP}{TP + FN}$

$\text{F1-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. For network slicing classification, classes might have different distributions and importance, so model performance may not be fully represented by accuracy alone; it is therefore necessary to use other metrics such as precision, recall, and F1-score. Precision is valuable for avoiding false positives such as misclassifying a network slice. Recall is important for ensuring that no network slice is overlooked, since a missed network slice leads to service degradation. Finally, using the F1-score to achieve a balance between precision and recall is crucial for evaluating the model's effectiveness.
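A minimal sketch of this evaluation step, assuming scikit-learn's metric functions and macro averaging across the nine slice classes (the paper does not state which averaging mode was used):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    """Return the four evaluation metrics used for the slice classifier."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Macro averaging weights every slice class equally, which matters
        # when the class distribution is unbalanced, as in the generated dataset.
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1_score": f1_score(y_true, y_pred, average="macro"),
    }

# Example with the held-out TabNet predictions from the earlier sketch:
# print(evaluate(y_test, preds))
```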

Results and discussion
In this work, two cases were implemented. Case 1 concerns implementing our proposed model on the dataset of [11]. The results show that our proposed model exceeds the model in [11] on all measures (accuracy, precision, recall, and F1-score).
It has greater accuracy, precision, and recall, implying that it makes more accurate predictions, particularly for positive classifications. In addition, it has a higher F1-score, indicating a better balance of precision and recall. These findings indicate that our proposed model outperforms the model in [11] in terms of predictive performance. The comparison results are shown in table 3 and figure 10. For case 2, the proposed model was applied to the dataset generated using the aforementioned method. The outputs and results demonstrate that the proposed model yielded better outcomes compared to the individual models (DT, RF, DNN, and TabNet).
After implementing the individual models and our proposed model, the results show that DT presents moderate performance across all metrics, so in the context of network slice classification it is suitable for simpler slice classification. RF, on the other hand, demonstrates higher performance across the metrics, so it is capable of handling a wider array of slice categories. Similarly, the DNN achieved competitive results, exhibiting its ability to catch subtle patterns within the network slice data and its potential for numerous categorization tasks. The TabNet model demonstrated strong performance, with balanced precision and recall and a solid F1-score, demonstrating its efficacy in dealing with complicated categorization challenges. The proposed model stood out, excelling in accuracy, precision, recall, and F1-score, establishing it as an attractive candidate for network slice categorization across multiple categories. The outstanding performance of the proposed model implies a profound comprehension of the underlying data patterns, which lends itself well to correct categorization in the changing world of network slicing. Table 4 shows the results comparison between the individual models (DT, RF, DNN, and TabNet) and the proposed model, while figures 11-14 sequentially show the metric comparisons (accuracy, precision, recall, and F1-score) between the individual models (DT, RF, DNN, and TabNet) and the proposed model.

Conclusion
In this paper, we created a 5G dataset containing significant data attributes that are strongly related to network slicing and developed a multi-level deep learning model to classify the optimal network slices in a 5G network. The proposed model's first level is a set of models consisting of DNN, RF, and DT. As the model's second level, we employ an enhanced TabNet model. The proposed model is validated using the 5G dataset from [11] as well as our own 5G dataset. The testing findings are promising, with accuracies of 97% and 98% for the dataset from [11] and our dataset, respectively. In this work, we concluded that we can generate a dataset for 5G network slicing with any number of slices, and the proposed model was able to achieve very high results compared to the separate models.

Figure 3. The architecture of the TabNet model.

Figure 4. The structure of the feature transformer block.

Figure 5. The structure of the attentive transformer block.

Figure 6. The structure of the feature transformer block after enhancement.

Figure 7. The graph of the Swish activation function [24].

Figure 8. The structure of the TabNet model after enhancement.

Figure 10. Results comparison between the model in [11] and our proposed model.