Study on Droplet Composition Recognition Based on LSTM Network Combined with Convolutional Neural Networks

The collision and spreading processes of droplets differ because droplets of different compositions have different physical and chemical characteristics. A droplet collision experiment system was set up to capture videos of droplet collision and spreading, which vary as the droplet composition changes. A cascaded network was applied to extract spatial-temporal features from the videos and recognize droplet composition. The proposed method achieved 80% classification accuracy, a relatively better performance than a single back-propagation network.


Introduction
Droplet analysis technology obtains the physical and chemical characteristic parameters of a liquid by examining its droplets. Four droplet analysis techniques are commonly applied: optical fiber analysis technology [1], capacitive analysis technology [2], optical-fiber capacitive analysis technology [3] and image analysis technology [4]. Droplet analysis technology aims to qualitatively and quantitatively identify liquids, while droplet composition analysis technology further aims to identify the composition of droplets; the latter therefore builds directly on the former.
Song and her team obtained the liquid drop fingerprint (LDF) from a fiber-capacitive drop analyzer and identified different liquids by LDF [5]. They claimed that the LDF is unique and specific to a given liquid. However, this method places requirements on the transmittance and volatility of the measured droplet. Image technology is a very promising solution to these problems. With its development, traditional image processing is no longer the only way to analyze droplet pictures; machine learning and neural networks are used for droplet image analysis as well. Using the uniqueness of the droplet microscopic image, combined with machine learning to extract image features, accurate recognition of the droplet type has been achieved [6]. Although that research, based on macroscopic droplet images, requires images with high contrast, its rather accurate results show the potential of neural networks.
In this paper, based on the correlation between droplet composition and droplet collision and spreading process, the droplet composition recognition system was established. The proposed recognition neural network was trained on droplet video data which was collected by self-built experiment platform.

Theoretical basis
Because different droplets have different physical and chemical characteristic parameters, their collision and spreading processes differ. Owing to the feature-extraction strengths of convolutional neural networks and LSTM, the two networks are combined and applied to the image sequence data.

Droplet collision and spreading process
The following three dimensionless parameters are used to describe the impact of droplets: the Weber number $We$, the Reynolds number $Re$ and the Ohnesorge number $Oh$. The Weber number represents the ratio of inertial force to surface tension: the smaller the Weber number, the greater the influence of surface tension. The Reynolds number represents the ratio of inertial force to viscous force: the larger the Reynolds number, the greater the influence of the inertial force, and the smaller the Reynolds number, the greater the influence of the viscous force. The Ohnesorge number is a measure of the relationship between viscous force, inertial force and surface tension. The formulas of the Weber, Reynolds and Ohnesorge numbers are as follows:

$$We = \frac{\rho u^2 D}{\sigma} \quad (1)$$
$$Re = \frac{\rho u D}{\mu} \quad (2)$$
$$Oh = \frac{\mu}{\sqrt{\rho \sigma D}} = \frac{\sqrt{We}}{Re} \quad (3)$$

In formula (1), $\rho$ represents the droplet density, $u$ the droplet impact velocity, $D$ the droplet diameter, and $\sigma$ the droplet surface tension coefficient. In formula (2), $\mu$ represents the droplet viscosity coefficient. When the droplet composition differs, the Reynolds, Weber and Ohnesorge numbers differ. Based on the Weber and Ohnesorge numbers, the spreading behavior of droplets after collision can be classified into four categories; that is, when the droplet composition differs, the spreading process after collision differs. Therefore, the collision and spreading process of a droplet is composition-specific.
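The three dimensionless numbers can be sketched as a short computation; the fluid property values below are illustrative (typical water-like values, not measurements from the paper), and the identity $Oh = \sqrt{We}/Re$ follows directly from the definitions:

```python
import math

def weber(rho, u, d, sigma):
    # We = rho * u^2 * D / sigma : inertial force vs. surface tension
    return rho * u ** 2 * d / sigma

def reynolds(rho, u, d, mu):
    # Re = rho * u * D / mu : inertial force vs. viscous force
    return rho * u * d / mu

def ohnesorge(rho, d, sigma, mu):
    # Oh = mu / sqrt(rho * sigma * D), equivalently sqrt(We) / Re
    return mu / math.sqrt(rho * sigma * d)

# Illustrative water-like droplet: density (kg/m^3), impact velocity (m/s),
# diameter (m), surface tension (N/m), dynamic viscosity (Pa*s)
rho, u, d, sigma, mu = 998.0, 1.0, 2e-3, 0.072, 1.0e-3
we = weber(rho, u, d, sigma)
re = reynolds(rho, u, d, mu)
oh = ohnesorge(rho, d, sigma, mu)
# The identity Oh = sqrt(We) / Re holds by construction:
assert abs(oh - math.sqrt(we) / re) < 1e-12
```

Two droplets of different composition (e.g. water vs. alcohol) plug different σ and μ into these formulas, which is why their post-impact spreading regimes differ.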

VGG-LSTM net
In the experiment, the video data obtained during the collision and spreading of the droplets is actually an image sequence, which requires the proposed model both to extract the spatial features in each image and to learn the temporal features of the sequence. Convolutional Neural Networks (CNNs) use convolution kernels to extract effective features and exploit spatial structure to reduce the number of parameters that need to be learned. The VGG network, built mainly from 3×3 convolutional kernels and 2×2 pooling kernels, presents outstanding performance in image classification tasks [7]. In this paper, each frame of a video sample is fed into the VGG-16 network, and the extracted per-frame features form a feature sequence. To extract temporal features, the RNN (Recurrent Neural Network) is a common solution, but it suffers from long-term dependency problems. LSTM, short for Long Short-Term Memory, was proposed by Hochreiter and Schmidhuber in 1997 to overcome this shortcoming [8]. Because of its long-term memory, the extracted features are applied as the input of the LSTM model. A video sample is viewed as a complete sequence $x^{(i)} = [x_0^{(i)}, x_1^{(i)}, \cdots, x_T^{(i)}]$, and the droplet composition label $y^{(i)}$ is the output of the LSTM network.
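The per-frame feature extraction can be sketched in Keras, the framework the paper uses. This is a minimal sketch, not the authors' exact code: the 224×224 input size is the standard VGG-16 default (an assumption here), and `weights=None` is used instead of the pre-trained ImageNet weights the paper mentions, to keep the example self-contained:

```python
import numpy as np
import tensorflow as tf

def build_frame_encoder(weights=None):
    # VGG-16 without its classifier head; global average pooling yields a
    # 512-dimensional feature vector per frame. The paper uses pre-trained
    # weights (weights="imagenet"); weights=None here avoids a download.
    return tf.keras.applications.VGG16(include_top=False, weights=weights,
                                       pooling="avg", input_shape=(224, 224, 3))

def encode_video(frames, encoder):
    # frames: (num_frames, 224, 224, 3) preprocessed frames of one video.
    # Returns a (num_frames, 512) feature sequence to feed into the LSTM.
    return encoder.predict(frames, verbose=0)

encoder = build_frame_encoder()
video = np.random.rand(2, 224, 224, 3).astype("float32")  # stand-in frames (66 per sample in the paper)
features = encode_video(video, encoder)
print(features.shape)  # (2, 512)
```

Each video thus becomes a fixed-length feature sequence, which is exactly the form of input an LSTM expects.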

Experiments and results
The droplet collision experiment system, shown in Fig. 1, consists of two subsystems: a droplet generation system and a droplet imaging system. The two major components of the droplet generation system are the droplet generator and a height-adjustable support structure. The droplet generator is composed of two syringes connected by valves and hoses: the 30 ml syringe functions as the liquid storage container, and the 60 ml syringe functions as the hydraulic power source. The height-adjustable support structure is used to control the distance between the syringe needle tip and the upper surface of the surface modification system, and also to ensure that the syringe is perpendicular to the horizontal plane.
As the syringe is pushed, the liquid flows through the needle under pressure. Due to surface tension, the liquid gathers at the end of the needle. As liquid continues flowing through the needle, gravity eventually overcomes surface tension, and the droplet detaches from the needle tip with zero initial velocity. Finally, the droplet collides with the surface of the board and spreads. The imaging system collects droplet data during this process. The droplet imaging system comprises two subsystems: a light source system and a video capture system. The light source system is composed of three LED array light sources placed at different positions to improve image quality; the overall light intensity and illumination angle can be changed by adjusting the intensity of each LED light source. The core of the video capture system is a high-speed camera, connected to a computer to upload droplet data. The videos captured by the camera at 15000 frames per second are selected and saved via software.
Three types of liquid were used in the experiment: deionized water, absolute alcohol and a DMMP mixed liquid (deionized water : DMMP = 10:1). Droplets of the three types fall from the end of the needle, collide with the surface of the board and then spread on the board under different environmental conditions. Each original video sample is split into fifteen video samples by frame extraction in order to achieve data augmentation. The selected target area of each frame of these new video samples is then reorganized into a new sequence. After normalization of each frame, each video sample is labeled according to the droplet composition. A processed sequence sample is displayed in Fig. 2. There are 75 video samples in the dataset, including 60 training samples, each with 66 frames.

As shown in Fig. 3, after the spatial features of each frame were extracted by the VGG-16 model, the resulting sequence data was used as the input of the LSTM. Each sequence consists of 66 frames and is labeled according to the composition of the droplet. The VGG-16 model uses pre-trained weights to extract spatial characteristics. The LSTM model has one LSTM layer with 6 cells and one dense layer, and was trained using the Adam optimizer with an initial learning rate of 10^-4. The batch size was 32 samples and the number of epochs was 5. A softmax activation function was applied because droplet composition recognition is a multi-class task. All the neural networks were implemented in Keras using Python 3.7.

The classification accuracy of the proposed method is 80% on the test set, while Hu's research achieved 50% classification accuracy [9]. In Hu's research, the neural network was not trained on the droplet image dataset but on characteristic data extracted by the Canny algorithm and principal component analysis (PCA). However, that approach may lose the spatial and temporal features of the droplet image sequence.
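The LSTM classifier and its training configuration described above (one LSTM layer with 6 cells, one dense layer, softmax output, Adam with learning rate 10^-4, batch size 32, 5 epochs) can be sketched in Keras as follows. The 512-dimensional feature size is an assumption based on VGG-16 pooled features, and the training call is shown only as a comment because the dataset arrays are not reproduced here:

```python
import tensorflow as tf

NUM_FRAMES, FEAT_DIM, NUM_CLASSES = 66, 512, 3  # 66 frames per sample, 3 liquid types

def build_lstm_classifier():
    # One LSTM layer with 6 cells plus a softmax dense layer, matching the
    # architecture described in the paper (FEAT_DIM assumes VGG-16 pooled features).
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(NUM_FRAMES, FEAT_DIM)),
        tf.keras.layers.LSTM(6),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Adam optimizer with initial learning rate 1e-4, as in the paper;
    # categorical cross-entropy suits the 3-class softmax output.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_lstm_classifier()
# Training setup from the paper (feature/label arrays are assumed to exist):
# model.fit(train_features, train_labels, batch_size=32, epochs=5)
print(model.output_shape)  # (None, 3)
```

The softmax output gives one probability per liquid type (deionized water, absolute alcohol, DMMP mixture), and the predicted composition is the class with the highest probability.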
These results show that the network successfully learned the temporal and spatial features of the collision and spreading processes of different droplets.

Conclusions
In this paper, droplet composition is classified by a cascaded network consisting of a VGG model and an LSTM model. The model was trained and tested on the droplet video dataset. The proposed method presents relatively good performance (80% classification accuracy) on the test set compared to Hu's research. The results show that the network has a promising advantage in extracting the spatial-temporal features of droplet image sequences. In the future, the proposed method will be further trained and tested on larger datasets.