Real time mobile based license plate recognition system with neural networks

This paper describes the implementation of real-time license plate localization and recognition on a mobile device using neural networks. Two neural networks are used in this research: a Convolutional Neural Network (CNN) and a Backpropagation Feed Forward Neural Network (BPFFNN). The image processing algorithms for pre-processing, localization and segmentation are chosen for their ability to cope with the limited computational resources of a mobile device. The proposed license plate localization combines the Sobel edge detection method with a morphological method. The detected license plate image is segmented using connected component analysis (CCA) and the bounding box method. Each cropped character is fed into the CNN or BPFFNN model for character recognition. The neural network models were pretrained on a desktop computer and then exported to and run on an Android mobile device. The experiment was conducted in a moving vehicle on selected driving routes. The results showed that the CNN performed better than the BPFFNN in a real-time environment.


Introduction
Automatic License Plate Recognition (ALPR) is an imaging technology application that detects, extracts and recognizes license plate information from an image or video frame. The extracted information can be used in many applications, such as real-time traffic monitoring, vehicle access control and electronic payment systems, including toll and parking fee payment [1]. In real-life applications, an ALPR system must cope with various challenges, such as different types of license plate with varying fonts and colors and inconsistent license plate positions, all of which increase the difficulty of detecting and recognizing a license plate [2]. A conventional ALPR system is usually installed as a fixed-point camera, which is not portable, is expensive, and requires a high-resolution camera and a desktop or notebook computer to process the image using complex image processing algorithms [3]. The accuracy of license plate detection and recognition is often higher with a high-quality image than with a low-quality one. Fortunately, smartphones nowadays are equipped with high-performance processors and decent camera hardware that can produce high-quality images, which makes it possible to implement complex image processing algorithms for an ALPR system. The Android mobile platform provides infrastructure libraries that allow developers to access the mobile device camera hardware programmatically. There are also many open-source computer vision libraries that support Android mobile development, such as the OpenCV SDK [4].
The implementation of a real-time ALPR system is complex and challenging. In this research, a moving camera approach is used instead of a static camera. The moving camera approach is more complicated than the static one because the background changes continuously and there is no fixed position at which the license plate will appear in the video frame. A moving camera LPR system relies on real-time computer vision techniques, where optimized algorithms are vital for detecting and recognizing license plates instantly. Image processing algorithms are computationally intensive and resource exhaustive, so the limited CPU and memory resources of a mobile device must be taken into account when choosing them. In this research, image processing algorithms that are less complex and require less processing time are preferred. For example, the morphological based method and the Sobel edge detector are suitable due to their simplicity, better performance and shorter processing time compared to other localization algorithms [5]. Meanwhile, the training process for the NN is carried out on a desktop computer instead of a mobile device to speed up the process.
There are particular challenges associated with real-time LPR on mobile devices. For example, the camera installed onboard might be unstable when driving at high speed or on an uneven road surface, which may cause the camera to lose focus. The driving speed also needs to be limited, because driving at higher speed increases the safe following distance to the car in front and thus the distance between the camera and the target license plate. This makes it difficult for the target car to come within the detection range of the system, and the license plate appears much smaller in the video frame. In addition, selecting the right camera resolution is important to ensure that a high-quality image is obtained, as a low-quality image reduces the license plate detection accuracy.
In [6], the author proposed a real-time LPR method for a moving camera in a complex scene using the color information of the braking lights on the vehicle, which yielded very good results. However, the method was implemented on a computer, and it consists of several major processing steps such as filtering the image background using the color information of the braking lights and histogram equalization. These methods are not suitable for implementation on a mobile device, as processing the color pixels of an image takes longer than processing a binary image. The work in [7] implemented a real-time ALPR system for outdoor parking lots using a CNN with GPU acceleration. This method can effectively deploy complex network models in real time and yielded very good results. However, it needs to run on a computer equipped with a powerful GPU, which makes it unsuitable for mobile device implementation.
Artificial Neural Network (ANN), also commonly referred to as neural network (NN), is among the popular methods for solving LPR problems. Researchers have reported that ANN performs very well in image classification and recognition [8] [9], game development and design [10] [11], and prediction and estimation [12] [13]. BPFFNN and CNN are among the widely used NNs in LPR research [14]. Both BPFFNN [15] and CNN [16] [17] have been shown to perform well and to deliver satisfactory results. This is the main reason why BPFFNN and CNN are adopted in this study.

Proposed method
The proposed method is divided into four stages: image/video frame pre-processing, license plate localization, license plate segmentation and license plate character recognition.

Pre-processing
The first step in an ALPR system is to acquire images or video frames from a camera. In this research, an Android mobile device is used as a moving camera to capture video for the LPR process. This stage consists of three steps: video to video frame conversion, grayscale conversion and noise reduction. The captured video is converted into video frames before undergoing further processing. Each frame is converted to grayscale, as grayscale simplifies the algorithm and reduces the computational requirements compared to color images, which require more color data to be processed [18]. Next, a median filter is used to reduce noise in the frames due to its ability to preserve edges while removing a substantial amount of noise.
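The grayscale conversion and median filtering steps can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this research, which would more likely rely on a library such as OpenCV. The function names, the 3x3 window size, the replicated border handling and the BT.601 luma weights are assumptions made for illustration.

```python
from statistics import median


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of 0-255 gray values."""
    # Assumption: ITU-R BT.601 luma weights; the paper does not state the formula.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def median_filter_3x3(gray):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Border pixels reuse the nearest valid pixel (replicated border).
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out
```

On a frame with a sharp dark-to-bright boundary and a single bright noise pixel, the filter removes the isolated pixel while leaving the boundary in place, which is the edge-preserving behaviour referred to above.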

Localization
The second stage in an ALPR system is to detect and extract the license plate image from the video frame based on the rectangular shape and boundary of the license plate. The proposed method first converts the grayscale image into a binary image using Otsu's binarization method. As license plates are usually rectangular, the vertical Sobel edge operator is applied to the binary input frame to find all possible rectangles in the video frame. A morphological closing operation is then applied to fill the gaps between the license plate characters and to smooth the outer edges.
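The three localization operations described above (Otsu binarization, vertical Sobel edge detection and morphological closing) can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, "vertical Sobel" is taken to mean the kernel that responds to vertical edges, and the 5x1 rectangular structuring element for closing is an assumed value that the paper does not specify.

```python
def otsu_threshold(gray):
    """Return the threshold t that maximises the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0
    for t in range(256):
        w_bg += hist[t]          # number of pixels with value <= t
        sum_bg += t * hist[t]
        w_fg = total - w_bg      # number of pixels with value > t
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [[1 if value > t else 0 for value in row] for row in gray]


# Sobel kernel that responds to vertical edges (horizontal intensity change).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))


def vertical_edges(binary):
    """Mark interior pixels with a non-zero vertical-edge Sobel response."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            g = sum(SOBEL_X[j][i] * binary[y + j - 1][x + i - 1]
                    for j in range(3) for i in range(3))
            out[y][x] = 1 if g != 0 else 0
    return out


def _morph(img, kw, kh, op):
    """Apply op (max = dilation, min = erosion) over a kw x kh window."""
    h, w = len(img), len(img[0])
    ry, rx = kh // 2, kw // 2
    return [[op([img[yy][xx]
                 for yy in range(max(0, y - ry), min(h, y + ry + 1))
                 for xx in range(max(0, x - rx), min(w, x + rx + 1))])
             for x in range(w)] for y in range(h)]


def closing(img, kw=5, kh=1):
    """Morphological closing: dilation followed by erosion."""
    return _morph(_morph(img, kw, kh, max), kw, kh, min)
```

A wide, flat structuring element is chosen here because the gaps to be filled lie between horizontally adjacent characters. The appropriate size depends on the frame resolution and would need tuning.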
In order to remove unwanted objects in the video frames, such as trees, buildings and road signboards, the contours are identified and the area of each contour is analysed. A valid license plate area is predefined, and contours whose area falls outside it are removed. Each remaining contour is enclosed in a bounding box for the license plate verification process. The bounding box is analysed in terms of its height and width. Since the bounding box of a license plate is usually wider than it is tall, any bounding box that does not fulfil this requirement is eliminated as a non-license plate region.
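The area and width-to-height checks described above can be sketched as a simple filter over candidate bounding boxes. All numeric thresholds below are hypothetical placeholders, since the paper states only that the plate area is predefined and that the width must exceed the height. The bounding-box area is used here as a stand-in for the contour area.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def is_plate_candidate(box: Box,
                       min_area: int = 1000,     # hypothetical lower area bound (pixels)
                       max_area: int = 20000,    # hypothetical upper area bound (pixels)
                       min_aspect: float = 2.0,  # hypothetical minimum width / height
                       max_aspect: float = 6.0   # hypothetical maximum width / height
                       ) -> bool:
    """Keep boxes whose area and width/height ratio look like a license plate."""
    if box.h <= 0 or box.w <= 0:
        return False
    area = box.w * box.h
    if not (min_area <= area <= max_area):
        return False
    aspect = box.w / box.h
    return min_aspect <= aspect <= max_aspect


def filter_candidates(boxes: List[Box]) -> List[Box]:
    """Drop boxes that fail the area or aspect-ratio test."""
    return [b for b in boxes if is_plate_candidate(b)]
```

With these placeholder values, a 120x30 box passes both tests, while a building-sized box fails the area test and a tall, narrow box fails the aspect-ratio test.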

Segmentation
After the license plate is extracted from the video frame, each license plate image is segmented in order to extract and separate the individual characters. A binarization method is applied first. Next, connected component analysis (CCA) is used to find the individually connected contours in the license plate image. CCA analyses the binary image and assigns a unique label to each group of connected pixels. Each labelled object can then be extracted separately according to its label [19]. After that, each separated digit and letter in the license plate image is enclosed using the bounding box method. Lastly, a rectangular region is identified from each bounding box, and each bounded character is cropped out of the original image as a single input. Each cropped character is normalized to a 28x28 image before it is fed into the NN model for character recognition.
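The segmentation steps above (labelling connected pixels, deriving a bounding box per label and normalizing each crop to 28x28) can be sketched as follows. This is a minimal illustration rather than the method's actual code. The use of 8-connectivity, breadth-first labelling, left-to-right sorting and nearest-neighbour resizing are assumptions, and the function names are hypothetical.

```python
from collections import deque


def connected_components(binary):
    """Label 8-connected foreground pixels.

    Returns (labels, boxes), where boxes holds one (x, y, w, h) tuple per
    component, sorted left to right.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    boxes = []
    next_label = 1
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] == 0 or labels[y0][x0] != 0:
                continue
            labels[y0][x0] = next_label
            queue = deque([(y0, x0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                y, x = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
            next_label += 1
    boxes.sort()  # reading order: smallest x first
    return labels, boxes


def crop_and_resize(binary, box, size=28):
    """Crop a bounding box and rescale it to size x size (nearest neighbour)."""
    x, y, w, h = box
    return [[binary[y + (r * h) // size][x + (c * w) // size]
             for c in range(size)] for r in range(size)]
```

In practice, very small components (noise, screws, plate borders) would also need to be discarded before recognition. That filtering is omitted here for brevity.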

Character recognition
This section describes the two algorithms used, CNN and BPFFNN. BPFFNN is a supervised learning algorithm which uses the backpropagation learning method to calculate the gradient of the loss in the network [20]. The input is first propagated forward through the network to generate the output activations. When the forward propagation is complete, its output is compared with the correct output by computing a loss function, and the gradient of this loss is calculated. The gradient is then propagated back through the network to update the weights of each neuron using gradient descent. This process is repeated until the error of the network output is reduced to an acceptable level. The BPFFNN model has 784 input neurons (one per pixel of the 28x28 character image), 500 hidden neurons and 62 output neurons with one bias. The NN is trained with datasets from Chars74k [21]. Table 1 shows a summary of the BPFFNN architecture used.
CNN is a class of deep NN which contains multiple hidden layers and levels of abstraction [22]. A CNN is composed of three main types of layer, namely the convolutional layer, the pooling layer and the fully connected layer, which can be stacked as [INPUT - CONV - RELU - POOL - FC]. The convolutional layer applies filters to the original image to detect low-level features such as edges and curves. Each filter acts as a feature identifier which slides across all areas of the input image and, at each position, multiplies the filter values with the underlying pixel values and sums the products. The pooling layer performs a downsampling operation such as max pooling or average pooling. It progressively reduces the dimensionality of the network and helps to control overfitting. The fully connected layer takes the output of the preceding convolutional, ReLU or pooling layer as its input volume and generates an N-dimensional output vector. In this research, the CNN model is a four-layer architecture which consists of three convolutional layers and one fully connected layer. Each convolutional layer uses a 3 by 3 filter. ReLU is used as the activation function, which allows the network to train faster due to its computational efficiency without making a significant difference to the accuracy [23]. Max pooling is applied to take the maximum value from each filter region of the input volume. The dataset used to train the CNN model is the same as for the BPFFNN. Table 2 shows a summary of the CNN architecture used.
TensorFlow [24] is a NN framework which provides support and extensions for mobile platform implementation. The CNN and BPFFNN models can therefore be trained on a desktop computer instead of on the mobile device. The weight matrices and NN structure resulting from training are imported into the Android device for character recognition, without the capability to retrain the weights there. This approach saves a large amount of NN training time, and it is more efficient to export the trained NN model directly to the mobile device.
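The BPFFNN training cycle described in this section (forward propagation, comparison with the correct output, gradient computation and weight update) can be sketched for a network with a single hidden layer. The sigmoid hidden activation, softmax output, cross-entropy loss, learning rate and weight initialization are assumptions, since the paper does not specify them. The helper count_parameters assumes one bias term per hidden and output neuron.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Create small random weights and zero biases (assumed initialization)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [0.0] * n_out,
    }


def forward(net, x):
    """Forward propagation: returns hidden activations and class probabilities."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    logits = [sum(w * v for w, v in zip(row, hidden)) + b
              for row, b in zip(net["w2"], net["b2"])]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return hidden, [e / total for e in exps]


def train_step(net, x, label, lr=0.1):
    """One backpropagation update on a single sample; returns the loss before it."""
    hidden, probs = forward(net, x)
    loss = -math.log(probs[label])  # cross-entropy loss
    # Gradient of the loss with respect to the output logits.
    d_out = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Propagate the gradient back to the hidden layer before touching w2.
    d_hid = [sum(net["w2"][k][j] * d_out[k] for k in range(len(d_out)))
             * hidden[j] * (1.0 - hidden[j]) for j in range(len(hidden))]
    # Gradient descent updates.
    for k, dk in enumerate(d_out):
        for j, hj in enumerate(hidden):
            net["w2"][k][j] -= lr * dk * hj
        net["b2"][k] -= lr * dk
    for j, dj in enumerate(d_hid):
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * dj * xi
        net["b1"][j] -= lr * dj
    return loss


def count_parameters(n_in, n_hidden, n_out):
    """Weights plus one bias per hidden and output neuron (an assumption)."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out
```

Under that bias assumption, count_parameters(784, 500, 62) gives 423,562 trainable values for the 784-500-62 layout. A pure-Python loop like this is far too slow for a network of that size and is meant only to make the update rule concrete.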
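The CNN building blocks described in this section (a 3 by 3 convolution, ReLU activation and max pooling) can be illustrated on a single-channel image. This is a sketch only. The use of "valid" convolution without padding, the stride of 1, the 2x2 pooling window and the example kernel are assumptions, because the paper gives only the filter size.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    At each position the kernel values are multiplied element-wise with the
    underlying pixels and summed, as is conventional for CNN layers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[j][i] * image[y + j][x + i]
                 for j in range(kh) for i in range(kw))
             for x in range(out_w)] for y in range(out_h)]


def relu(feature_map):
    """Set negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 region."""
    return [[max(feature_map[y][x], feature_map[y][x + 1],
                 feature_map[y + 1][x], feature_map[y + 1][x + 1])
             for x in range(0, len(feature_map[0]) - 1, 2)]
            for y in range(0, len(feature_map) - 1, 2)]


# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
EDGE_KERNEL = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]
```

Under these assumptions, a 28x28 character image shrinks to 26x26 after one such convolution and to 13x13 after pooling. A framework layer with "same" padding would instead preserve the 28x28 size before pooling.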

Experimental results
The LPR process is recorded, and the results are analysed. The average real-time LPR accuracy rate and the individual character recognition rate of CNN and BPFFNN are compared. Table 3 tabulates the results by the number of license plates that are fully recognized, partially recognized and not recognized. A license plate is considered fully recognized when all the characters on the license plate are recognized correctly, and partially recognized when only some of its characters are recognized. A not recognized license plate is a valid license plate that is not recognized at all by the ALPR system. The experimental results show that the average recognition success rate of CNN is higher than that of BPFFNN. CNN was able to fully recognize 64% of the license plates, while BPFFNN fully recognized 56%. On the other hand, BPFFNN partially recognized 21% of the license plates, as opposed to 17% for CNN. BPFFNN was not able to recognize 23% of the license plates, whilst CNN was not able to recognize 19%. Several factors might have contributed to the plates that were not recognized. A smartphone camera tends to lose focus, which may prevent the system from recognizing a license plate positioned on the left or right side of the video frame. Another reason could be the speed at which the vehicle with the target license plate is moving, which results in a blurred image in the video frame. When driving at a speed above 50 km/h, the minimum safe distance to the vehicle in front increases, which in turn greatly increases the distance between the license plate and the camera. This makes it difficult to detect the target vehicle's license plate, which also appears smaller in the video frame.
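The three result categories defined above can be expressed as a small tallying routine. This is a sketch under an assumed interpretation: a plate counts as partially recognized when at least one character matches position by position, and as not recognized when no reading is produced or no character matches. The plate strings in the usage below are made-up examples, not data from the experiment.

```python
from collections import Counter

CATEGORIES = ("full", "partial", "not_recognized")


def classify_result(truth, predicted):
    """Classify one reading as full, partial or not_recognized."""
    if not predicted:
        return "not_recognized"
    if predicted == truth:
        return "full"
    # Assumption: characters are compared position by position.
    matches = sum(1 for t, p in zip(truth, predicted) if t == p)
    return "partial" if matches > 0 else "not_recognized"


def summarize(pairs):
    """Return the percentage of plates in each category."""
    counts = Counter(classify_result(t, p) for t, p in pairs)
    total = len(pairs)
    return {name: 100.0 * counts[name] / total for name in CATEGORIES}


# Made-up readings for illustration only.
example_pairs = [
    ("WXY1234", "WXY1234"),  # every character correct
    ("ABC123", "ABC128"),    # one wrong character
    ("JKL88", None),         # no reading produced
    ("PQR7", "PQR7"),        # every character correct
]
example_summary = summarize(example_pairs)
```

As a consistency check on the reported figures, the three categories sum to 100% for each model: 64 + 17 + 19 for CNN and 56 + 21 + 23 for BPFFNN.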
The character recognition analysis in Table 4 shows that CNN obtains a higher character recognition accuracy rate than BPFFNN for the majority of the characters. CNN is able to successfully recognize a total of 13 characters, which are 'E', [...] license plate font used is not the standard font approved by the Malaysia Road Transport Department (JPJ). Among these characters, both NNs had difficulty in recognizing the character 'Q' correctly, and its recognition accuracy rate is the lowest for both NNs. Figure 3 shows an example of the CNN and BPFFNN recognition results.

Conclusion and future work
This research investigates the potential of implementing a low-cost, portable and efficient ALPR system on a smartphone for use in a real-time environment. The ALPR system can be installed on the user's smartphone, which is easily accessible to the user. This reduces the need to carry multiple devices for separate uses. The image processing steps involved are discussed, and the image processing algorithms that work best on a mobile device are applied. The implementation of both NNs on the mobile device is discussed as well.
The performance of the proposed CNN and BPFFNN was tested on video captured from a moving vehicle on the road. Based on the results obtained, CNN performed better than BPFFNN in terms of license plate recognition success rate. Neither NN model was able to achieve a 100% success rate in recognizing all the characters. This is due to the complex real-time environment, in which the system must cope with a rapidly changing background, a varying distance between the license plate and the camera, the smartphone camera losing focus, and motion blur caused by the moving vehicles and camera. However, both NN models were able to achieve an average success rate of more than 70%.
The CNN model may be further refined to improve recognition accuracy. The ALPR system can also be tested in varying weather and lighting conditions (rain, cloud, or low light at night). In addition, the proposed method can be further refined to improve recognition accuracy while driving at higher speed.