Self-testing Algorithm of Metro One-button Switch Station Equipment Based on a Neural Network

According to the operation and management requirements of subway stations, equipment managed by the station, such as lighting, escalators and broadcasting, must be operated daily with a one-button switch according to a fixed process. The first step in one-button switch station operation is to perform routine self-testing on these devices. Traditional self-testing mainly relies on manual handling by operators, which greatly reduces their work efficiency. To optimize the operational process, it is necessary to establish intelligent self-testing algorithms. In this paper, an AC-CNN (Attention Causal Convolutional Neural Network) model was established to increase the self-testing efficiency of subway switch stations. The AC-CNN model is an attention-based deep learning prediction network: it uses causal convolutional layers, which capture the temporal (before-and-after) relationships in the data, as the basic network structure; it optimizes the convergence process through residual connections; and it fuses global and local attention representations through matrix multiplication in the attention mechanism module to reduce the loss of useful information. The algorithm uses one-hot encoding and sliding window feature engineering to preprocess the historical self-testing operation data and predicts the self-testing information of metro one-button switch station equipment through the deep analysis of AC-CNN. On-site application results indicate that AC-CNN has greater advantages in accuracy and operation time than the existing BP neural network, GBDT, and LSTM algorithms and can significantly improve the work efficiency of subway staff in one-button switch station operation.


Introduction
Subways serve as the main component of the transportation systems of large and medium-sized cities, with technical characteristics such as punctuality, speed, environmental protection, and safety. In addition, subways hold notable importance in relieving urban traffic pressure, stimulating urban vitality, and constructing new development patterns. As a key aspect of subway transportation, the level of modern technology applied in subway stations directly affects service quality. To build a high-level subway operation service system, major domestic rail transit companies have launched the construction and operation of smart stations [1][2][3][4]. Changing the self-testing mode of station equipment is a key step in the construction of smart stations. However, station equipment currently still adopts the traditional regular self-testing mode, also known as the one-button switch station remote switch mode, whose essence is to use traditional integrated monitoring systems to achieve this remote switching. In the proposed network, the attention mechanism module reduces the loss of useful information and improves prediction accuracy. For the different modules mentioned above, this article conducted ablation experiments to demonstrate their effectiveness.
3) Based on the self-testing information data of an actual subway on-site one-button switch station, one-hot encoding and sliding window feature engineering were used to preprocess the complex historical self-testing operation record data, train the AC-CNN network, and predict the daily equipment self-testing information of the subway one-button switch station.
4) Through actual cases, it has been confirmed that AC-CNN outperforms the BP neural network, GBDT, and LSTM in terms of prediction accuracy and final staff operation time. On-site experiments showed that this algorithm has the advantages of clear principles and strong operability and can effectively improve the work efficiency of subway one-button switch stations.

Basis of the Algorithm
When analyzing fault diagnosis and target detection cases, the preprocessing algorithm is usually selected according to the characteristics of the original data to ensure that the timeliness and reliability of the algorithm meet the engineering needs. The alarm warning and confirmation information from the self-testing of one-button switch station equipment is continuous and complex. Therefore, one-hot encoding and sliding window feature engineering methods were selected when constructing the self-testing algorithm, as shown in Figure 1.
The working principle of the self-testing algorithm can be summarized as follows. The preprocessing method converts the original historical self-testing operation records of all equipment in the station into a dataset for supervised learning. The causal convolutional network is improved by adding residual connections and a self-attention mechanism module. After the model is trained, the system generates single-instance alarm notifications for all station equipment and proposes operational modes for the daily inspection plan.

Neural Network.
A neural network (NN) is one of the main supervised learning methods in machine learning. Supervised learning first uses labeled training data to learn a model and then uses the model to make predictions on new samples, as shown in Figure 2. The fundamental purpose of supervised learning is therefore to construct a mapping from input to output, which can be described by a model F(x). This is the main reason why neural networks are widely used in problems such as regression and classification. Figure 3 shows the working principle diagram of a typical neural network algorithm. In a neural network structure, determining appropriate parameter values is particularly critical, because the weights of each layer directly affect the processing of its input data. Thus, the network learning process is essentially the process of seeking the optimal weight values of each layer. In specific applications, a loss function is usually used to measure the deviation between the actual output and the desired output of the neural network, and the deviation value is used as a feedback signal by the optimizer to gradually correct the corresponding weights, which reflects the fundamental role of the BP algorithm [11].
Without loss of generality, if a neural network is considered as a function, it can be written as y = F(x), where F is the neural network model, x ∈ R^n and y ∈ R^m. Suppose that the neural network to be studied has an L-layer structure, the layers are connected by neurons, and each layer has a weight w, a bias b and an activation function a. Then, the corresponding functional expression of the neural network can be defined as

F(x) = a_L(w_L · a_{L-1}(⋯ a_1(w_1 · x + b_1) ⋯) + b_L)

Figure 4 shows the common model framework of a convolutional neural network, which mainly includes four functional module layers: a convolutional layer, a pooling layer, a fully connected layer and a SoftMax layer. The specific algorithm steps are as follows: Step 1: The multilayer convolutional and pooling layers are used to extract features from the input information.
Step 2: The fully connected layer is used to realize local association and data compression.
Step 3: The SoftMax layer or support vector machine and other models are used to complete classification processing.
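The three steps above can be sketched end to end in pure Python on a toy one-dimensional signal. This is a minimal illustration of the convolution → pooling → fully connected → SoftMax pipeline, not the paper's actual network; all sizes and weights are invented for the example.

```python
import math

def conv1d(x, kernel):
    # valid (no-padding) 1-D convolution
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def max_pool(x, size=2):
    # non-overlapping max pooling
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def fully_connected(x, weights, bias):
    # one output neuron per row of `weights`
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(z):
    m = max(z)                      # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0]
feat = max_pool(conv1d(signal, [1.0, 0.0, -1.0]))     # Step 1: feature extraction
logits = fully_connected(feat,
                         [[1.0, -1.0, 0.5], [1.0, -1.0, 0.5], [0.2, 0.2, 0.2]],
                         [0.0, 0.1, -0.1])            # Step 2: association/compression
probs = softmax(logits)                               # Step 3: classification
```

The SoftMax output is a probability distribution over the (here, three) classes, which is what the classification step consumes.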

AC-CNN
2.2.1. Network Structure. Given that convolutional neural networks are often used to process two-dimensional image data [12] and that the causal convolutional layer in WaveNet [13] is good at processing and analyzing time-series data, a causal convolutional neural network based on an attention mechanism, the AC-CNN algorithm, was proposed on the basis of optimizing the causal convolutional network and applied to the state prediction of the self-testing operation of a one-button switch station in a subway. In the network structure shown in Figure 5, the final output is obtained after the input data are processed by 3 residual blocks, an attention block and a fully connected layer. The basic principles of each block are analyzed below.

1) Residual block
The residual block consists of two identical submodules, and each submodule is divided into a causal convolution layer (causal conv), a batch normalization (BN) layer, a rectified linear unit (ReLU) activation function and a dropout layer, as shown in Fig. 5(a). Since the output of the convolutional layer has a non-sparse, Gaussian-like distribution, its normalized distribution is more stable. In addition, the BN layer improves the gradient flow of the network, which reduces the network's sensitivity to weight initialization. It also reduces the need for a dropout layer, improving the learning rate and training efficiency. The specific forward-propagation normalization expression is

y_i = γ · (x_i − μ_B) / √(σ_B² + ε) + β

where μ_B and σ_B² are the mean and variance of the batch data, respectively, ε is a small constant greater than 0, γ is the scale parameter, and β is the shift (deviation) value.
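The normalization expression above can be sketched in pure Python for a single feature over one batch. The values of gamma, beta and the batch are illustrative only; in the real network these are learned per channel.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # batch mean and (biased) batch variance
    mu = sum(batch) / len(batch)
    var = sum((x - mu) ** 2 for x in batch) / len(batch)
    # normalize, then scale by gamma and shift by beta
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
# the normalized batch has approximately zero mean and unit variance
```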
The ReLU activation function can alleviate the problems of gradient explosion and model overfitting during the training process. The analytic expression of this function is

ReLU(x) = max(0, x)

In addition, the residual block introduces the skip connections of a residual network (ResNet) [14]. The aim is to mitigate gradient vanishing and network degradation by implementing an identity mapping from inputs to outputs for the redundant layers of a deep network model. In this paper, we chose to include a 1×1 ordinary convolutional layer in the skip connections.
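The skip-connection idea can be illustrated with a minimal scalar sketch: the block output is the transformed input plus a shortcut of the input itself. Here a scalar multiplication stands in for the 1×1 convolution on the skip path, and all weights are illustrative.

```python
def relu(x):
    # ReLU(x) = max(0, x)
    return max(0.0, x)

def residual_unit(x, w_main, w_skip):
    main = relu(w_main * x)   # stand-in for the conv + BN + ReLU submodule
    skip = w_skip * x         # stand-in for the 1x1 conv on the shortcut
    return relu(main + skip)  # the two paths are summed, then activated

y = residual_unit(2.0, w_main=0.5, w_skip=1.0)
```

Because the shortcut passes the input through almost unchanged, gradients can flow around the main path, which is what mitigates degradation in deep stacks.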
2) Causal conv block
Figure 5(b) shows the structure of the WaveNet causal convolution layer. Causal convolution is a strictly time-constrained model, leading to the unidirectional character of the processing results, i.e., it is impossible to "see" future data, which is fundamentally different from traditional convolutional neural networks. To achieve longer dependencies between temporal data, the expansion (dilation) coefficient d (the number of intervals) and the size of the convolution kernel k are introduced to construct the causal convolution layer model shown in Figure 6. The process of implementing causal convolution is analogous to applying zero-padding to a one-dimensional convolution kernel. In the hierarchical structure, for the lowest layer, d = 1, which means that every input point is sampled; for the middle layer, d = 2, which means that every 2nd input point is sampled; and the size of the convolution kernel in each layer is 3.
The higher the layer, the larger the value of the expansion coefficient d. In addition, dilated convolution causes the size of the effective receptive window to expand exponentially. Thus, such convolutional networks can usually achieve relatively good results with a relatively small number of layers.
In this scheme, a model with 2 causal convolutional layers was chosen.The size of the convolutional kernel of each layer was 3, and the expansion coefficients were 2 and 4.
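A minimal pure-Python sketch of one dilated causal convolution layer as described above: the output at time t depends only on inputs at t, t−d, t−2d, … and never on the future. The kernel size k = 3 and dilation d = 2 match the text; the kernel weights and input are illustrative.

```python
def causal_dilated_conv(x, kernel, d):
    # output[t] = sum_j kernel[j] * x[t - j*d], with zero-padding for t - j*d < 0
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - j * d          # only past (or current) samples
            if idx >= 0:
                acc += kernel[j] * x[idx]
        out.append(acc)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = causal_dilated_conv(x, kernel=[0.5, 0.25, 0.25], d=2)
```

With k = 3 and d = 2, each output looks back (k − 1) · d = 4 steps, which is how stacking layers with growing d expands the receptive field exponentially.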
3) Attention block
Figure 5(c) shows the attention block of the network model, whose algorithm is the self-attention mechanism [15]. The specific process of the attention block is as follows. The output of the residual block is processed through a 3-step operation: two 1×1 ordinary convolution operations and one operation utilizing both maximum pooling (MaxPooling) and average pooling (AvgPooling); the results of the convolution and pooling operations are then concatenated; and the final step is a 1×1 convolution followed by activation with the sigmoid function. The specific computational model is

A = σ( f_{1×1}( Concat( f_{1×1}(RB), MaxPooling(RB), AvgPooling(RB) ) ) )

where RB is the result matrix output by the last residual block, MaxPooling and AvgPooling are the maximum and average pooling functions, respectively, f_{1×1} is a convolution with a 1×1 kernel, and σ is the sigmoid activation function.
The maximum and average pooling layers reinforce the network, allowing it to focus more attention on local features. The function of the sigmoid activation is twofold. First, it captures the nonlinear elements of the data through a nonlinear representation, making up for the limited expressiveness of a linear model. Second, it doubles as a feature selection approach. The analytical formula of the sigmoid activation function is

σ(x) = 1 / (1 + e^(−x))

By multiplying the result of one of the 1×1 convolution operations with the result of the attention calculation, the transition matrix D is obtained:

D = f_1(x) ⊙ A

where f_1(x) is the result of a 1×1 convolution. After the transition matrix D is processed by softmax and multiplied by the 1×1 convolution result of the remaining path, the output of the attention block, which is also the final output, is obtained. Evidently, the softmax prevents network gradient explosion to some extent.
The introduction of a self-attention mechanism into the convolutional neural network structure effectively excludes the influence of outlier information on the prediction results. This is achieved by enhancing the network's ability to construct features from the local information of the time series, which in turn improves the prediction accuracy.
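The attention-fusion steps above can be sketched in simplified form in pure Python: a sigmoid-gated attention map is multiplied element-wise with one branch to form the transition values D, passed through softmax, and used to weight the other branch. The 1×1 convolutions of the real block are replaced by identity branches here, and all numbers are illustrative.

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

rb = [0.2, 1.5, -0.3, 0.8]                  # output of the last residual block
pooled = [max(rb)] * len(rb)                # stand-in for the max/avg pooling path
attn = [sigmoid(a + p) for a, p in zip(rb, pooled)]   # attention gate in (0, 1)
d = [a * r for a, r in zip(attn, rb)]       # transition values D (elementwise product)
weights = softmax(d)                        # softmax-normalized attention weights
out = [w * r for w, r in zip(weights, rb)]  # weighted fusion with the other branch
```

Because softmax normalizes the weights to sum to 1, the magnitudes stay bounded, which reflects the text's remark that softmax limits gradient explosion.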

Loss Function.
In a neural network model, the loss function L_e is an important measure of whether the model output converges. In this paper, the cross-entropy loss was chosen, with the specific definition

L_e = −(1/N) Σ_{i=1}^{N} y_i · log(ŷ_i)

where y_i is the true value, ŷ_i is the predicted value, and N is the sample size.
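The cross-entropy definition above can be sketched directly in pure Python for one-hot targets y and predicted probabilities ŷ. The small epsilon guards against log(0); the sample values are illustrative.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    # L_e = -(1/N) * sum_i y_i * log(y_hat_i)
    n = len(y_true)
    return -sum(t * math.log(p + eps)
                for t, p in zip(y_true, y_pred)) / n

# one-hot target says class 1; the model assigns it probability 0.8
loss = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.8, 0.1])
```

The loss shrinks toward 0 as the predicted probability of the true class approaches 1, which is the convergence signal the optimizer feeds back through the network.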

Backpropagation Updates Weights.
The error result of the loss function is fed back to the network, and after repeated training, the loss function is minimized and the model converges until training is completed. The specific iterative calculation is

w ← w − α · ∂L_e/∂w

where w is the weight and α is the learning rate. The initial learning rate was set to 0.001 in this method.
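The update rule above can be illustrated with a one-parameter toy problem: repeatedly stepping opposite the gradient of a simple quadratic loss drives the weight to the minimizer. The learning rate 0.001 matches the paper's initial setting; the loss function itself is purely illustrative.

```python
def grad(w):
    # dL/dw for the toy loss L(w) = (w - 3)^2, minimized at w = 3
    return 2.0 * (w - 3.0)

alpha = 0.001     # initial learning rate used in the paper
w = 0.0
for _ in range(10000):
    w -= alpha * grad(w)   # w <- w - alpha * dL/dw
# w has converged to (very nearly) 3.0
```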

Dataset Creation and Preprocessing Experiments
More than 60 historical equipment self-testing record cases of integrated operation platforms from stations at Tianjin Metro were collected.Data from ten stations were selected as examples to establish datasets, and preprocessing was conducted as necessary.
In the process of one-button switch station equipment self-testing, the specific operation categories can be divided into three types: "No alarm information is displayed", "Show this alarm information, select to confirm" and "Show this alarm information, select to ignore".
Considering that these three kinds of data are text data and that there is no ordinal relationship among the categories, the one-hot binary coding mechanism was used in this paper to numerically convert the information [16]. The conversion results are shown in Table 1.

Table 1. One-hot encoding of the operation categories.

Order | Operation category                            | One-Hot
0     | No alarm information is displayed             | [0,0]
1     | Show this alarm information, select to confirm | [1,0]
2     | Show this alarm information, select to ignore  | [1,1]

In this coding mechanism, the first element of the code represents whether the alarm message is displayed, and the second element represents whether the alarm message is ignored. Table 2 shows the actual self-testing devices and their alarm message categories for the metro station. For ease of presentation, Tables 3 and 4 show the relevant operation types using the serial numbers of Table 1; the actual training uses the one-hot encoded form. Table 3 is part of the historical record of all equipment self-testing operations, i.e., the historical daily alarm information of each system or device. Rows represent consecutive days, in chronological order, of the self-testing operation records, including the alarm-information selection made after each equipment self-test, which differs with the number of alarm categories of each device.

Table 3. Historical self-testing operation records of some subway stations in Tianjin.
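The Table 1 mapping can be written as a simple lookup table, a minimal sketch of the encoding step: the first bit marks whether an alarm is shown, the second whether it is ignored. The function name is illustrative, not from the paper.

```python
# one-hot codes from Table 1, keyed by the operation-category serial number
ONE_HOT = {
    0: [0, 0],   # no alarm information is displayed
    1: [1, 0],   # alarm shown, operator selects "confirm"
    2: [1, 1],   # alarm shown, operator selects "ignore"
}

def encode(records):
    # convert a sequence of category serial numbers to one-hot code pairs
    return [ONE_HOT[r] for r in records]

codes = encode([0, 1, 2, 1])
```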

(Table 3 column headers: escalators, heliport, wastewater pump.)

If the historical device self-testing operation records are set to a nested matrix Y, where n represents the number of consecutive days and m is the number of devices inspected, then

Y = {Y1, Y2, …, Yn}

where Y1 = {y11, y12, y13, …, y1m}, Y2 = {y21, y22, y23, …, y2m}, and so on.
If the prediction matrix Yprec has the same structure and the values are encoded as 0 = [0,0], 1 = [1,0] and 2 = [1,1], then the prediction matrix Yprec can be characterized using the serial numbers in Table 2. To standardize the data for supervised learning, the features need to be created using feature engineering [4]. The canonical form of the dataset is a set of feature-label pairs. A sliding window was adopted in this paper. Assuming that the statistical data recorded one week before day T are chosen as the trainable dataset for day T, the window length of the sliding window is 7. Xi = {Xi1, Xi2, Xi3, …, Xij} is defined as the feature value of day i. Based on the weekly statistics X(4) of each device, it is appropriate to select the number of ignored alarms X(1) in the previous week, the number of confirmed alarms X(2) in the previous week, and the number of un-alarmed days X(3) in the previous week as the feature values. The reason is that statistics over one week are frequent and cover diverse operation types. Table 4 presents the construction results of the feature data. Since the sliding-window feature construction leads to zero data for the first 7 days in the constructed feature X, the constructed feature of the subsequent week is assigned to them. Table 4 only shows the data before and after the assignment of one week's statistic X(4); the same holds for the remaining features.
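The sliding-window counts described above can be sketched for one device in pure Python. This is an assumed form of the feature construction: for day T, the three features are the counts of each operation type (0 = no alarm, 1 = confirmed, 2 = ignored) over the previous 7 days; the history values are illustrative.

```python
WINDOW = 7   # window length: one week before day T

def window_features(history, day):
    # records strictly before day T, at most WINDOW days back
    window = history[max(0, day - WINDOW):day]
    return {
        "no_alarm": window.count(0),    # X(3): un-alarmed days in the prior week
        "confirmed": window.count(1),   # X(2): confirmed alarms in the prior week
        "ignored": window.count(2),     # X(1): ignored alarms in the prior week
    }

history = [0, 0, 1, 2, 1, 0, 0, 1, 2]   # one device's daily operation records
feats = window_features(history, day=8)
```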
Table 5 shows the datasets of the 10 stations. Each dataset consists of a training set, a validation set and a test set, with a ratio of approximately 8:1:1. The data in the test set originate from the last 30 days of the historical data Y. Considering that one-button switch stations only need to predict the current day's records from historical records, a single-step prediction method was chosen for the test set, avoiding the cumulative error of recursive multistep prediction. Figure 7 shows the algorithm flow of the dataset composition, covering the training process after inputting the available dataset into the network and preprocessing the feature-engineered data. The iterative training of the network weight parameters continues until the model converges, and the model is then saved. The testing process involves inputting the test data into the saved model and outputting the test results.
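The 8:1:1 split described above can be sketched as follows, with the test set taken from the most recent records to match the "last 30 days" evaluation. The function and ratios-as-integers representation are an assumed form, not the paper's code.

```python
def split_dataset(samples, ratios=(8, 1, 1)):
    # chronological split: train on the oldest data, test on the newest
    total = sum(ratios)
    n = len(samples)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]   # most recent records
    return train, val, test

samples = list(range(100))             # 100 days of feature records (illustrative)
train, val, test = split_dataset(samples)
```

Keeping the split chronological (rather than shuffled) matters here, because the test set must represent the most recent operating conditions of the station.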

Evaluation Standards
In the practical application of one-button switch station equipment self-testing in metro stations, it is only necessary to predict the data for one day. This information should be displayed in an appropriate interface as a small pop-up reminder for the relevant operators.
For convenience of elaboration, it is assumed that variable T represents the actual time required for the click operations, and the model is evaluated using the accuracy and the mean square error. The accuracy and the mean square error E_MSE (averaged over the 30 items in the test set) can be defined as

Accuracy = N_correct / N × 100%,   E_MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²
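The two metrics above can be sketched in pure Python: accuracy as the fraction of exactly correct daily predictions, and the mean squared error over the test records. Sample values are illustrative.

```python
def accuracy(y_true, y_pred):
    # fraction of predictions that exactly match the recorded operation
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def mse(y_true, y_pred):
    # E_MSE = (1/N) * sum_i (y_i - y_hat_i)^2
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

acc = accuracy([1, 0, 2, 1], [1, 0, 1, 1])            # 3 of 4 correct
err = mse([1.0, 0.0, 2.0, 1.0], [1.0, 0.0, 1.0, 1.0])
```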

Main Comparative Experimental Results
In the system solution, the convolutional neural network model was constructed using the TensorFlow framework, and a specific prediction for the day's self-testing operation is provided. Figure 8 shows the descending curve of the loss values during training. As the number of iterations increases, the training loss and validation loss continuously decrease. When the number of training iterations approaches 10, the two loss functions stabilize close to 0, indicating that the engineering requirements can be met after about 10 training iterations. Table 7 shows that the AC-CNN method has a higher test accuracy and a lower error value than the other machine learning methods. The accuracy of the new method is 3.6% higher than that of the LSTM method, which is commonly used to handle time-series data, and 2.3% higher than that of the GBDT boosting method. Therefore, the new self-testing algorithm with AC-CNN as its main body is more effective.

Longitudinal Comparison between Different Devices.
In this section, the proposed method was used to conduct longitudinal comparisons among the 10 stations, comparing and analyzing the average accuracy of each device within them to validate the method's applicability. The average accuracy of each device is given in Table 8. As seen from Table 8, the self-testing accuracy of the PIS and other systems is high, while the self-testing accuracy of the escalator and rolling shutter equipment is low.
In addition, for equipment with a higher number of alarm types, it is difficult to achieve 100% satisfaction in the daily operational simplicity provided by the system, because of the extensive number of historical operation types recorded by the relevant personnel. However, the engineering requirements are still met. For equipment with a relatively low number of alarm types, the processing results of the historical alarm and operation records are stable and reliable. Therefore, the new AC-CNN self-testing method for a metro one-button switch station can provide more objective evaluation suggestions for the stations under its jurisdiction.

Horizontal Comparison before and after Application.
Figure 9 shows a histogram of the comparative results before and after the application of the new method at the 10 stations. After the application of the novel self-testing algorithm, the average time saved at each station is more than 90 s. The largest saving, 103 s, is achieved at both the Second Hospital of the University of Medical Sciences and the Cultural Centre stations. As a result, the operation time of the staff can be reduced from the established average of 5 min to an average of 3.5 min.
Table 9 presents the comparison of the workload before and after the application of the novel method. It shows that, after the new method was applied at a metro one-button switch station, the average time consumption of the relevant staff is reduced by approximately 96.4 s, i.e., more than 1.5 min. The average number of clicks by the relevant staff is reduced from 23 per day to 2~3 per day. Therefore, after the application of the new algorithm for the self-testing operation of the equipment in a metro one-button switch station, work efficiency is effectively improved and the workload is substantially reduced.

Experiments on the Selection of Relevant Parameters
To verify the impact of the relevant parameters on network performance, features with sliding window sizes of 1, 7, 14, and 30 were selected to train the model, and the average accuracy over all station data tests is included in Table 10. Table 10 shows that the accuracy follows a convex pattern with respect to the length of the sliding window, peaking at a window length of approximately 7 days. This arises from the observation that, under otherwise equal conditions, staff work quality follows a weekly (seven-day) pattern.

Ablation Experiment
Table 11 presents the results of a comparison of ablation experiments.Among them, the base model is the basic model without residual connections or attention mechanisms, the +residual block adds only residual connections to the base model, the +attention block adds only attention blocks to the base model, and the new method adds both residual connections and attention blocks to the base model.
Table 11 shows the following: 1) The base model consists of directly connected causal convolutional layers, and its test accuracy is above 90%, which shows that the causal convolutional layer is effective for time-series data prediction.
2) Adding only residual connections or only attention blocks to the base model improves the test accuracy by 0.6% and 1.4%, respectively. These results show that both additions improve the model to some extent, with the attention blocks being more effective. In addition, the model that combines these two blocks further improves the test accuracy, which is 3.8% higher than that of the base model.

Conclusions
To optimize the process of equipment self-testing, save operation time and improve the rigor and efficiency of one-button switch station operation, this paper proposed a convolutional neural network, AC-CNN, based on an attention mechanism. Ablation experiments on the network structure verified the effectiveness of the residual connection module and the attention mechanism module in improving network prediction performance. Experiments were conducted on the self-testing data of one-button switch stations of Tianjin Metro Line 6. They showed that AC-CNN has better prediction accuracy than existing classic prediction methods and can improve the self-testing efficiency of subway one-button switch stations, thereby effectively improving the work efficiency of station staff.

Figure 1 .
Figure 1. Working principle of the self-testing algorithm.

Figure 3 .
Figure 3. Working principle of the neural network method.

Figure 4 .
Figure 4. Model framework of the convolutional neural network.

Figure 8 .
Figure 8. Loss function value of the training process.

Figure 9 .
Figure 9. Time consumption comparison for each station using the conventional method (manual equipment self-inspection mode) and our method.

Table 2 .
Self-inspection alarm information category for the station.

Table 6 .
Experimental results.

The calculation results show that the accuracy rates for all stations exceed 93%. The stations that have applied the system have higher test accuracy rates, and the operation times of the corresponding staff are lower. Some stations have low accuracy rates, mainly because the staff at these stations have not yet fully adopted the new self-testing function.

Horizontal Comparison with Other Learning Methods.
To further verify the efficiency of the new self-testing method AC-CNN, it was evaluated in a side-by-side comparison with the BP neural network, gradient boosting decision tree (GBDT), and long short-term memory (LSTM) network, using the 10 stations as a benchmark. The comparison results are included in Table 7.

Table 7 .
Comparison with other machine learning methods.

Table 8 .
Comparison of the results of different equipment/systems.

Table 9 .
Comparison with manual mode.

Table 10 .
Comparison of different sliding window size characteristics.

Table 11 .
Comparison of model ablation experiments.