Automatic X-ray image analysis for aviation security within limited computing resources

The objective of this work is to develop approaches to automating the inspection procedure at airports. The article examines the deficiencies of the existing inspection system, which stem from the negative impact of the human factor. It is proposed to use convolutional neural networks for automatic X-ray image analysis of passenger baggage. The paper presents the results of a convolutional neural network with various input data and architectures under limited computing resources. With a view to further development, this study can contribute to specialized software that helps aviation security screeners through partial automation of their work.


Introduction
Technical vision systems are currently becoming more and more capable. Until recently, detection and recognition algorithms were based on explicit mathematical image models [1,2], since computing tools were quite limited and machine learning methods provided sufficient efficiency only for a narrow range of recognition tasks. The scope of vision tasks being solved has expanded significantly with the advent of convolutional neural networks [3,4]. Among them, the task of recognizing forbidden objects in X-ray images, currently performed by aviation security screeners, can be distinguished. The airport screening procedure is one of the key measures in aviation security. However, its effectiveness largely depends on the human factor: insufficient aviation security training, the impact of fatigue, the inability to process large volumes of data, and distraction. To mitigate these factors, it is necessary to implement automatic recognition of X-ray images of passenger baggage and hand luggage.
In addition, the question of applying deep learning remains relevant, since it requires significant resources to ensure high efficiency of the trained networks. Typically, computer clusters are used that perform parallel computing on multiple graphics processing units [5].
However, the question arises of what to do if resources are limited and an initial solution needs to be obtained in the near future. This paper presents basic steps that can significantly increase recognition efficiency even for complex images, using X-ray images in aviation security as an example. (Published in MIP: Engineering-2020, IOP Conf. Series: Materials Science and Engineering 862 (2020) 052009, doi:10.1088/1757-899X/862/5/052009.)

Source data and network architecture
At the initial stage, an image database was assembled. Several of its features should be emphasized. Firstly, although binary classification is performed, the task is in fact more complex: within the classes "dangerous" and "safe", which are the outputs of the network, images of several completely different objects can be represented. Secondly, the significant area in the images can be of different sizes. In addition, given the low target prevalence of prohibited items, the original database of X-ray images of clean baggage will always outnumber the corresponding database with prohibited items. There is also an uneven distribution of images across the subclasses within the "dangerous" and "safe" division. Figure 1 shows examples of source images: in the first row, various images of dangerous objects; in the second, safe ones. Preliminary operations, including various rotations and zoom operations, were carried out on the source dataset to enlarge the sample.
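The rotation and zoom augmentations mentioned above can be sketched as follows. The exact parameters used in the paper are not stated, so the 90-degree rotations and the 75 % central zoom below are illustrative assumptions, implemented in plain numpy:

```python
import numpy as np

def augment(image):
    """Yield rotated and zoomed variants of one image.

    A minimal numpy-only sketch of the augmentations mentioned in the
    text (rotations and zoom); the angles and zoom factor are
    illustrative choices, not the paper's actual parameters.
    """
    h, w = image.shape[:2]
    # three 90-degree rotations of the original image
    for k in (1, 2, 3):
        yield np.rot90(image, k)
    # central zoom: crop the middle 75 % of the frame
    ch, cw = int(h * 0.75), int(w * 0.75)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = image[y0:y0 + ch, x0:x0 + cw]
    # nearest-neighbour resize back to the original shape
    yi = np.arange(h) * ch // h
    xi = np.arange(w) * cw // w
    yield crop[yi][:, xi]

variants = list(augment(np.zeros((112, 112, 3))))
# four extra samples are produced per source image
```

Each source image thus contributes several training samples, which partly compensates for the small database described above.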
The training was carried out on convolutional neural networks with the architecture shown in figure 2. It is important to note that the number of layers could vary. However, given the characteristics of the available hardware (NVIDIA GeForce GTX 1060, 3 GB GDDR5, 1708 MHz), memory limitations did not allow research on more complex architectures. Nevertheless, some changes to the network architecture made it possible to increase the percentage of correct recognition.
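To illustrate how layer widths drive the memory footprint on a 3 GB GPU, here is a small parameter-count sketch for the convolutional layers. The channel configurations are taken from the experiments described below, while the 3x3 kernel size and the third-layer width of the widened network are assumptions the paper does not state:

```python
def conv2d_params(in_ch, out_ch, k=3):
    # weights (k*k*in_ch per filter) plus one bias per output filter
    return (k * k * in_ch + 1) * out_ch

def total_params(channels, k=3):
    # channels: [input_channels, layer1_filters, layer2_filters, ...]
    return sum(conv2d_params(i, o, k) for i, o in zip(channels, channels[1:]))

small = total_params([3, 40, 36, 25])    # the 40-36-25 network of item 1
large = total_params([3, 128, 128, 32])  # layers 1 and 2 widened to 128 (item 7)
print(small, large)  # 22241 188064
```

Widening the first two layers roughly octuples the convolutional parameter count, and the activation maps (which dominate GPU memory during training) grow in proportion to the filter counts as well.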

Training and results
For an adequate analysis, the initial sample was divided into training, validation and test samples, according to table 1. According to table 1, if the network always outputs the "safe" class, the test sample will yield 64 % correct answers. Let us consider iterations that can increase this figure within the shortest time.
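The 64 % figure is simply the majority-class baseline: a classifier that always answers "safe" is right exactly as often as "safe" images occur in the test set. A minimal sketch (the counts below are hypothetical, chosen only to match the stated 64 % share; the real counts are in table 1):

```python
def majority_baseline_accuracy(n_safe, n_dangerous):
    # accuracy of a degenerate classifier that always predicts
    # the more frequent of the two classes
    return max(n_safe, n_dangerous) / (n_safe + n_dangerous)

acc = majority_baseline_accuracy(640, 360)  # hypothetical test-set counts
print(acc)  # 0.64
```

Any trained network must beat this figure to be doing better than guessing the dominant class.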
1. At the beginning, we use the more user-friendly MATLAB interface to track the learning process. First, we reveal how sensitive the results are to the color features of the source images. Since processing images in grey levels can reduce the required computing resources, a network with 3 levels of convolution (40, 36 and 25 neurons) was implemented. For the best speed, optimization by gradient descent was chosen, with a training duration of 8 epochs.
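Grey-level conversion itself is cheap and cuts the input volume threefold. A numpy sketch, assuming the common ITU-R BT.601 luma weights (the paper does not state which conversion was used):

```python
import numpy as np

def to_grayscale(rgb):
    # weighted sum of the R, G, B channels
    # (ITU-R BT.601 luma weights; an assumed, common choice)
    return rgb @ np.array([0.299, 0.587, 0.114])

img = np.random.rand(112, 112, 3)   # one hypothetical input image
grey = to_grayscale(img)
# grey has shape (112, 112): one value per pixel instead of three
```

The first layer of the network then sees a single input channel instead of three, shrinking its weights and the input tensors accordingly.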
The results of correct recognition on the test sample: black-and-white images, 62.34 %; color, 66.48 %. Therefore, the color attribute is taken into account further.
2. We check whether, and by how much, network efficiency improves when moving to the Adam optimizer. The network architecture remains the same. Correct recognition result on the test sample: 69.62 %.
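For reference, a single Adam update step in numpy. This is a generic sketch of the optimizer switched to in item 2, not the authors' training code; the hyperparameters are the usual published defaults:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # exponential moving averages of the gradient and its square
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # bias correction for the zero-initialized averages
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # update scaled by the per-parameter adaptive step size
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = adam_step(w=0.0, grad=1.0, m=0.0, v=0.0, t=1)
# the first step moves w by roughly -lr regardless of gradient magnitude
```

This per-parameter adaptive scaling is what typically lets Adam converge faster than plain gradient descent on the same architecture, consistent with the improvement observed here.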
3. Next, we complicate the network architecture slightly, building a network with 3 levels of convolution: 64, 48, and 32 neurons.
Correct recognition result on the test sample: 75.32 %. Figure 3 shows the network learning process. Analysis of the presented graphs shows that convergence is not achieved within 8 epochs, while training in MATLAB takes significant time. Therefore, further steps were performed in Python.
4. The network from item 3, trained for 200 epochs, completed training in 1690 s, which is much less than the time spent by MATLAB on 8 epochs. Correct recognition result on the test sample: 69.32 %. The analysis shows that this network was overfitted, so the results on the test sample decreased.
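Overfitting of this kind is commonly caught by early stopping on the validation curve rather than by hand-tuning the epoch count. A generic sketch (the `patience` value is an arbitrary choice, and the paper itself instead simply reduced the number of epochs):

```python
def early_stop_epoch(val_accuracies, patience=10):
    """Return the epoch whose weights should be kept: training halts
    once validation accuracy has not improved for `patience` epochs.
    A generic early-stopping sketch, not the authors' procedure."""
    best, best_epoch = -1.0, 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

# hypothetical validation curve: improves for 3 epochs, then degrades
stop = early_stop_epoch([0.50, 0.60, 0.70, 0.65, 0.64, 0.63, 0.62],
                        patience=3)
print(stop)  # 3
```

Keeping the weights from the best validation epoch would avoid the accuracy drop seen after 200 epochs without having to guess the right epoch count in advance.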
5. The next test was performed with 100 epochs of training. Correct recognition result on the test sample: 81.75 %.
6. Next, image resizing was used: instead of 112x112 images, images converted to 128x128 were fed to the input. This allowed the percentage of correct recognition to be increased. Correct recognition result on the test sample: 85.71 %.
7. The last change was an increase in the number of neurons in layers 1 and 2 to 128, which also improved the results. At the same time, taking the image conversion into account, the number of training epochs was reduced to 50, since after 100 epochs the result on the training sample reached 100 %, which indicates overfitting. In addition, this reduces the computational effort by a factor of 2. Correct recognition result on the test sample: 90.47 %. Figure 4 shows an example of processing some images with this network.
8. Further experiments with training parameters, input data and neural networks did not improve these results; it should be noted, however, that due to the lack of time the optimization was not exhaustive. It is important to note that table 2 presents the obtained solutions in increasing order. Overall, for the original task, it was possible to increase recognition efficiency from 64 % to 90.47 %, which is usually considered a successful solution in non-critical recognition problems.
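The resizing step of item 6 can be sketched with a nearest-neighbour resize; the interpolation method is not stated in the paper, so nearest-neighbour is an illustrative assumption:

```python
import numpy as np

def resize_nn(image, new_h, new_w):
    # nearest-neighbour resize by integer index mapping:
    # each output pixel copies its closest source pixel
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows][:, cols]

big = resize_nn(np.zeros((112, 112, 3)), 128, 128)
# big.shape == (128, 128, 3)
```

In practice a library resize (e.g. with bilinear interpolation) would usually be preferred, but the index-mapping version makes the operation explicit.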

Conclusion
A neural network for X-ray images of the airport security screening service was trained using simple modifications of the input data and the network architecture. The training provided a test result of more than 90 %. Further improvement can be obtained by deeper research into other modifications, such as enlarging the training sample or organizing more complex networks. This study is expected to contribute to the development of specialized software to help aviation security screeners through partial automation of their work process.