Spirits quality classification based on machine vision technology and expert knowledge

By combining machine vision technology and expert knowledge, this paper proposes an online intelligent classification solution for Chinese spirits, which effectively improves the classification accuracy and production efficiency of spirits. Specifically, an intelligent spirits quality classification system is first designed, including spirits collectors, image sampling cameras, and computing devices. Based on the principle that the size and shape of the bubbles in the spirits collector change with the alcohol content of the spirits, a spirits quality classification method based on the convolutional neural network (CNN) and bubble region of interest (ROI) selection is proposed. Furthermore, a post-processing method based on expert knowledge is proposed to improve the accuracy of the classification algorithm. A spirits quality classification dataset containing 139 199 images is created, and 15 CNNs are tested. Test results show that the highest spirits quality classification accuracy is 98.62% after using the bubble ROI selection method, and the highest classification accuracy reaches 99.82% after adopting the post-processing method. Furthermore, practical application tests show that the solution proposed in this paper can improve the production quality and efficiency of spirits.


Introduction
In recent years, the application of artificial intelligence (AI) technology in industrial manufacturing has exploded, giving birth to industrial AI [1], which has become the technical core of intelligent manufacturing. AI-driven manufacturing has significantly improved many aspects of the closed-loop production chain, from the manufacturing processes to the final product logistics [2]. In particular, the use of machine vision technology has greatly benefited the field of production monitoring [3][4][5].
Chinese spirits are distilled spirits with a long history, usually obtained by natural fermentation [6]. They are usually made from sorghum or a blend of barley, corn, rice, wheat, and sorghum, and are rich in volatile components such as esters and organic acids. Traditional solid-state fermentation and brewing mainly include the following steps: material preparation, starter preparation, fermentation, distillation, aging, and blending [7].
During the distillation process, the quality of spirits changes continuously with distillation time due to the constantly changing content of ethylene glycol, ethyl butyrate, acetaldehyde, and other substances. Sorting and collecting spirits according to their quality is a key link in spirits production, directly related to the final quality and production efficiency. Traditional production lines classify spirits of different qualities by manually observing the characteristics (shape, size, and quantity) of the bubbles produced by pouring the spirits. Although this method is simple and easy to implement, it suffers from inconsistent classification standards and relies heavily on the personal experience of workers, which is not conducive to maintaining the stability of spirits quality or improving production efficiency. Therefore, studying intelligent quality classification technology for spirits is both necessary and valuable.
There are few studies on online spirits quality classification based on machine vision technology. Existing studies on spirits quality classification are mainly based on analytical instruments (such as spectrometers, chromatographs, and mass spectrometers) that analyze the chemical composition of spirits. Based on Fourier transform analysis and Raman spectroscopy, Mandrile et al [8] identified the variety, origin, and aging time of red wine; the identification accuracy for variety and origin is 90%, and that for aging time is 84%. Wu et al [9] measured the fermentation parameters of rice wine by Raman spectroscopy and classified the fermentation stage of rice wine based on the CARS-support vector machine (CARS-SVM) algorithm, with a classification accuracy of 94.90%. Pérez-Caballero et al [10] used ultraviolet-visible spectroscopy to analyze the chemical composition of tequila and used various machine learning algorithms, such as random forest and support vector machine, to classify the category of tequila; the best classification accuracy is greater than 94%.
To the authors' knowledge, there are no machine vision studies specifically focused on the classification of spirits bubbles. Fortunately, several studies have used machine vision technology to analyze bubbles generated by gas-liquid two-phase flow in chemical production processes, which inspired the research in this paper.
Liu et al [11] proposed an online bubble size distribution monitoring scheme through a fully convolutional network with multi-scale deblurring and multi-stage jumping feature fusion to identify the health status of the flotation froth process. Peng et al [12] proposed a watershed algorithm with an optimized marker and edge constraint for accurate segmentation of flotation froth images. Haas et al [13] proposed a faster region-based convolutional neural network (CNN) detector to locate bubbles and a shape regression CNN to approximate bubble shapes in gas-liquid multi-phase flows. Qaddoori et al [14] used the traditional Hough circle detection algorithm to locate tiny bubbles in water flow and designed a shallow CNN for bubble size classification. Cerqueira and Paladino [15] proposed a method to reconstruct bubble morphology based on anchor points and boxes, and trained a CNN to extract bubbles that approximate ellipses. Wang et al [16] proposed a fast 3D reconstruction method for dilute bubbly flow based on DIF-LeNet by combining bubble information in light field images. Wang et al [17] proposed a framework combining a deep edge-aware network and a marker-controlled watershed algorithm to extract bubble parameters from hysteroscopic images. However, since spirits bubbles are too dense, it is difficult to extract individual bubbles for processing by image segmentation or object detection methods. Therefore, this paper adopts the image classification approach to classify spirits bubbles.
Based on machine vision technology and expert knowledge, this paper proposes a complete solution for the quality classification of spirits in the actual production process. For the first time, the use of machine vision technology to replace manual spirits quality classification in the actual production process has been successfully implemented, thereby improving the final quality and production efficiency of spirits. The main contributions of this paper are: (a) a bubble region of interest (ROI) selection algorithm based on the gray-level co-occurrence matrix (GLCM) and non-maximum suppression (NMS) is proposed, which can effectively improve the spirits quality classification accuracy of the CNN; (b) a post-processing method based on expert knowledge is proposed to improve the spirits quality classification accuracy; (c) by combining machine vision technology and expert knowledge, a high-precision spirits quality classification method is proposed, and the best classification accuracy reaches 99.82%; (d) a spirits quality classification solution is proposed and applied in actual production, which effectively improves the final quality and production efficiency of spirits.
The rest of this paper is organized as follows. Section 2 introduces the principle of spirits quality classification and the spirits distillation process. The spirits quality classification algorithm based on the CNN model and the bubble ROI selection algorithm is proposed in section 3, where the preprocessing and post-processing methods are also described. In section 4, a spirits quality classification dataset is established, and different CNN models are tested to verify the effectiveness of the proposed methods; practical application tests are also carried out in this section. Finally, the conclusions are provided in section 5.

Spirits distillation processes
Ethanol and water are the main components of distilled spirits, accounting for about 98%. Spirits also contain more than 1700 trace substances [18], including alcohols (such as isobutanol and n-pentanol), esters (such as ethyl caproate, ethyl lactate, and ethyl acetate), organic acids (such as acetic acid and lactic acid), and aldehydes (such as furfural, acetaldehyde, and acetal). Take the distillation process of strong-flavor spirits as an example: it can be divided into five stages from I to V, namely the initial distillate (stage I), the second distillate (stage II), the third distillate (stage III), the last distillate (stage IV), and the tailwater (stage V). As the distillation time increases, the alcohol by volume (ABV) decreases and the flavor gradually deteriorates, which means that the quality of the spirits decreases. Figure 1 shows the effect of distillation time on the content of alcohol, ester, acid, and aldehyde in strong-flavor spirits [19].
The entire distillation lasts about 20 min. Figure 1(a) shows the division of the five stages. In stage I, the ABV of the distillate is very high and contains more alcohol-soluble esters; the spirits in stage I taste fragrant but miscellaneous. The ABV and ethyl hexanoate content of the distillate in stage II is still relatively high, and substances such as esters, aldehydes, and acids are relatively balanced, resulting in the best flavor and quality of spirits in this stage. As shown in figure 1(b), ethyl lactate, organic acids, and aldehydes readily soluble in water begin to increase in stage III due to the decrease of ABV in the distillate, resulting in a slightly sour taste and insufficient flavor; however, this distillate can still be used as ordinary spirits after storage and blending. In stage IV, the ABV of the distillate drops sharply, resulting in a significant increase in the content of ethyl lactate, organic acids, aldehydes, and some poorly soluble oils. At this stage, the distillate begins to settle and becomes sour, irritating, and greasy. The ABV of the distillate in stage V drops to zero, and many ethanol-insoluble substances are distilled out.

Principles of spirits quality classification
The overall and partial details of distillate bubbles at different distillation stages are shown in figure 2. A golden bowl is used to collect the distillate to highlight the bubble morphology. Among the various substances in the distillate, the content of ethanol has the greatest influence on the visual morphology of bubbles. At 30 °C, the relationship between the surface tension σ of the ethanol solution and the ethanol concentration c is given in [20]. As shown in figure 1(a), the alcohol content of the distillate decreases, and its surface tension increases, as the distillation progresses. The continued increase in surface tension leads to a decrease in bubble stability. Therefore, there are more bubbles at stages I, II, and III, as shown in figure 2(a). When entering stages IV and V, the sudden drop in alcohol content leads to a sharp increase in surface tension and a sharp deterioration in bubble stability, resulting in a sharp decrease in the number of bubbles.
Alcohol also indirectly affects the visual characteristics of bubbles by affecting the content and solubility of trace substances in the distillate, as shown in figure 2(b). The distillate at stage I is high in alcohol content and low in water-soluble foaming substances such as organic acids and ethyl lactate. At this stage, the amount of adsorbent molecules is sufficient to maintain the stability of bubbles only when the bubbles coalesce to a certain extent and the total surface area of bubbles is small. Therefore, the bubble diameter at this stage is relatively large and can last for a few seconds. In stage II, the alcohol content of the distillate begins to decrease, the content of water-soluble organic acids, ethyl lactate, and other substances increases, and the foaming effect is pronounced, resulting in fine and dense bubbles. During the last two distillation stages, the alcohol content suddenly drops, the content of foaming substances is too high, and the bubble stability is greatly reduced. Moreover, the higher content of alcohols, ethyl oleate, and ethyl linoleate in the last two distillation stages negatively affects the formation of bubbles, as esters and other poorly water-soluble liquid substances precipitate and form droplets, thereby preventing bubble formation.
In short, the ethanol content in the distillate directly affects the visual characteristics of bubbles by affecting the surface tension. The ethanol content also affects the content and solubility of various trace substances in the distillate, and indirectly affects the visual characteristics of bubbles.

Hardware system
An intelligent spirits quality classification system is developed, and the schematic diagram is shown in figure 3. The system is mainly composed of a high-speed camera (model: HIKVISION DS-2CD7A47FWD-LZS/ZJ, shutter speed 1/2000 s, frame rate 25 fps), industrial Ethernet, an AI server (model: Dell PowerEdge R740, CPU: 1x Intel Xeon Silver 4114 2.2 GHz 10-core, GPU: 1x NVIDIA Tesla P4), a distributed control system (DCS) (model: SUPCON ECS700), and segment switching devices (valves), as well as multiple industrial application software packages (OPC, Visual Field, etc).  The system can obtain real-time images of spirits bubbles, judge the current spirits quality, and control valves to collect spirits from different distillation stages.
The working process of the whole distillation system is as follows: (a) steam first enters the distiller from the bottom, driving the alcohol substances to be distilled from the distiller's grains in the form of vapor, which then enters the condenser. (b) In the condenser, the vapor condenses into liquid spirits, which then flow out of the bottom of the condenser. (c) Bubbles are formed after the spirits flow into the golden bowl collector. The camera captures the bubble image in real time and sends it to the server. The server runs the spirits quality classification algorithm and sends the result to the DCS. The DCS controls the opening and closing of valves according to the classification result so that spirits of different quality flow to the corresponding storage tanks. The detailed structure of the spirits quality classification system is shown in figure 4.

Classification algorithm
The flowchart of the proposed CNN-based method for spirits quality classification is shown in figure 5, which includes three steps: image preprocessing, image classification, and post-processing. The image preprocessing algorithm uses traditional machine vision methods to quickly extract the foreground region of the collector from the original image obtained by the camera. The image classification method first extracts the bubble ROI, and then classifies the image based on the CNN. The post-processing algorithm detects and corrects the classification results of CNN based on expert knowledge to eliminate unreasonable stage switching, and then outputs the final classification results.
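At a high level, the three steps compose sequentially. The sketch below is a hypothetical skeleton (the function bodies are stubs, not the authors' implementation), meant only to show how the stages chain together:

```python
# Hypothetical skeleton of the three-step pipeline; each stage is a stub.

def preprocess(frame):
    # Would extract the golden-collector foreground from the raw frame.
    return frame

def classify(foreground):
    # Would select bubble ROIs and run the CNN; returns a raw stage label.
    return "II"

def postprocess(raw_label, history):
    # Would apply the expert-knowledge switching rules to the raw label.
    history.append(raw_label)
    return raw_label

def run_pipeline(frame, history):
    return postprocess(classify(preprocess(frame)), history)

print(run_pipeline("frame-0", []))  # "II" from the classifier stub
```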

Prior knowledge constraints.
The quality classification of spirits is closely related to the actual production process. Therefore, this section summarizes some prior empirical knowledge as auxiliary information for the classification  algorithm, as follows: (a) the sequence of the distillation process must be from stage I to stage V, and there will be no disorder; (b) stage I (initial distillate) usually lasts about 2 min, and the algorithm can identify this stage according to the distillation time; (c) due to the low quality of stage IV (last distillate), the distillation time of this stage is usually determined by the brewery, which means that the identification of stage V is also related to distillation time; (d) the classification algorithm in this study only needs to classify three stages of the distillation process: stages II, III, and IV.

Image preprocessing.
The useful foreground is only the region of the original image where the golden collector is located. Finding the foreground region and discarding the irrelevant background can speed up the inference of the CNN and reduce the input of noise. Besides, during the production process, the camera and collector may shake or move due to environmental vibration, and the foreground position in the image will change accordingly. Therefore, it is necessary to detect the foreground region of the collector in the image.
A Gaussian filter is used to remove noise in the original image. A golden collector is specially designed to enhance the contrast of the bubble image. Since the golden color of the collector differs from the background color (as shown in figure 6), a color space segmentation method is adopted. Specifically, the original image in the RGB color space is converted to the hue, saturation, and value (HSV) color space [21], and the hue (h), saturation (s), and value (v) channels of the pixels inside and outside the collector are calculated. The threshold segmentation condition is

h_min ⩽ h ⩽ h_max, s_min ⩽ s ⩽ s_max, v_min ⩽ v ⩽ v_max, (2)

where h_min, s_min, and v_min are the lower thresholds of the three color channels, and h_max, s_max, and v_max are the upper thresholds. All thresholds are determined based on prior knowledge. All pixels that satisfy equation (2) are kept, while other pixels are set to 0. The last step is to crop out all zero-value regions and perform contour detection; the foreground region of the collector is then input into the CNN model for inference.
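The per-pixel band test described above can be sketched in a few lines of pure Python; the threshold values below are illustrative placeholders, not the paper's calibrated thresholds:

```python
def hsv_mask(pixels, lo, hi):
    """Keep pixels whose (h, s, v) values all fall within the [lo, hi] bands.

    pixels: list of (h, s, v) tuples in the 0-255 range
    lo, hi: (h_min, s_min, v_min) and (h_max, s_max, v_max) thresholds
    Returns a boolean mask, one entry per pixel.
    """
    return [all(lo[c] <= p[c] <= hi[c] for c in range(3)) for p in pixels]

# Illustrative thresholds (NOT the paper's calibrated values):
lo = (10, 60, 100)
hi = (40, 255, 255)
pixels = [(20, 120, 180), (200, 30, 50)]  # golden-ish pixel, background pixel
print(hsv_mask(pixels, lo, hi))  # [True, False]
```

In a production setting this test would typically run vectorized over the whole image (e.g. an `inRange`-style operation) rather than pixel by pixel.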

Bubble ROI selection algorithm.
Since more convolution operations can extract deeper semantic features, deeper and larger CNN models can theoretically achieve higher image classification accuracy and better generalization. However, a complex CNN model with many parameters requires better hardware and more training data, which undoubtedly increases the difficulty and cost of the classification algorithm. A bubble ROI selection algorithm based on the GLCM and NMS is proposed to improve the accuracy and generalization of lightweight CNN models. With the bubble ROI selection algorithm, the CNN models no longer need to learn from global images but instead accept local bubble images as inputs. The proposed bubble ROI selection algorithm is shown in figure 6 and consists of three steps: grid division, GLCM property calculation, and NMS. The detailed description of each step is as follows.

Grid division.
First, assume that the global image resolution is W × H, and the resolution of each bubble ROI is w × h. Then, divide the global image into multiple grids at equal intervals in the row and column directions with division step s. The list of candidate grid boxes is B = [b_1, b_2, . . . , b_K], where b_i (i = 1, . . . , K) is the mark of each grid and K is the total number of grids, which can be calculated as

K = (⌊(W − w)/s⌋ + 1) × (⌊(H − h)/s⌋ + 1). (3)
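Grid generation reduces to two nested strides over the image. A minimal sketch, using the settings reported later in the paper (900 × 900 foreground, 224 × 224 ROI, s = 20) as assumed inputs:

```python
def grid_boxes(W, H, w, h, s):
    """Enumerate the top-left corners of all w x h candidate grids
    placed at stride s over a W x H image."""
    return [(x, y)
            for y in range(0, H - h + 1, s)
            for x in range(0, W - w + 1, s)]

boxes = grid_boxes(900, 900, 224, 224, 20)
# K = (floor((900 - 224) / 20) + 1) ** 2 = 34 * 34
print(len(boxes))  # 1156
```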

GLCM property calculation.
The GLCM is a matrix that describes the gray-level relationship between pixel pairs and can be used for image texture analysis [37]. The calculation of the GLCM requires three parameters, namely the gray quantization level l, the direction set A, and the distance set D. Each element in A and D is a selected value of direction and distance, and m and n are the numbers of elements in A and D, respectively. Then, the GLCM is calculated for each w × h grid, and K matrices of size l × l × m × n are generated, which are the matrices labeled glcm_1,...,K in figure 6. In this paper, only one direction and one distance are selected to calculate the GLCM, and the angular second moment (ASM) is used as the texture feature index of each matrix. The ASM describes the uniformity of the gray distribution and the coarseness of the texture: if the gray values of a region fluctuate very little, the value of the index is small; otherwise, it is large. Therefore, the ASM of a bubble ROI will be larger than that of regions without bubbles. The ASM can be calculated as

ASM = Σ_{i=1}^{l} Σ_{j=1}^{l} P(i, j)², (4)

where P(i, j) is the element of the GLCM with coordinates (i, j). Each element of the evaluation parameter set ASM = [asm_1, asm_2, . . . , asm_K] corresponds to an element of B in figure 6.
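A minimal pure-Python sketch of the standard GLCM/ASM computation for a single grid, using one direction (horizontal, angle 0) and distance 1; the toy patches are illustrative assumptions, not data from the paper:

```python
def glcm_asm(patch):
    """Build the normalized GLCM of horizontal pixel pairs at distance 1,
    then return the angular second moment: the sum of squared entries."""
    counts = {}
    total = 0
    for row in patch:
        for a, b in zip(row, row[1:]):          # co-occurring pair (i, i+1)
            counts[(a, b)] = counts.get((a, b), 0) + 1
            total += 1
    return sum((c / total) ** 2 for c in counts.values())

uniform = [[3, 3, 3, 3]] * 4                 # flat texture: one dominant pair
varied = [[0, 1, 2, 3], [3, 2, 1, 0]] * 2    # mixed texture: many pairs
print(glcm_asm(uniform))  # 1.0 (all mass on a single GLCM entry)
print(glcm_asm(varied))   # well below 1.0
```

With the standard normalized definition, a perfectly flat patch concentrates all GLCM mass in one entry and yields the maximum ASM of 1.0, while textured patches spread the mass and yield smaller values.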

NMS.
The NMS is a type of algorithm used to select an entity (e.g. a bounding box) from many overlapping entities. This paper uses the NMS to select the k most suitable bubble regions from the K candidate grids and their corresponding ASM indexes. The NMS keeps the proposal regions with the highest index scores, while the intersection over union (IOU) between any two selections must remain smaller than the IOU threshold. Finally, k bubble ROIs are selected and marked as R = [r_1, r_2, . . . , r_k].
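Greedy NMS over the ASM-scored grids can be sketched as follows; boxes are (x, y, w, h) tuples, and the example scores and threshold are illustrative:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

def nms(boxes, scores, k, iou_thresh):
    """Pick up to k boxes in descending score order, skipping any box
    whose IoU with an already-kept box exceeds iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
        if len(kept) == k:
            break
    return [boxes[i] for i in kept]

grids = [(0, 0, 100, 100), (10, 0, 100, 100), (300, 300, 100, 100)]
asm = [0.9, 0.8, 0.7]  # illustrative ASM scores
print(nms(grids, asm, k=2, iou_thresh=0.5))
# [(0, 0, 100, 100), (300, 300, 100, 100)] -- the overlapping grid is suppressed
```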
The detailed bubble ROI selection process is summarized as pseudo-code.

Post-processing method.
The raw classification results of the CNN need to be post-processed. When the distillation process and the distillate flow rate become unstable, the output of the classification algorithm may be unreasonable. To further improve the performance of the classification algorithm, a post-processing method based on a state-machine switching rule and expert knowledge is designed to correct the classification results, as shown in figure 7. In figure 7, there are five switching conditions. When the duration of stage I reaches the threshold based on expert knowledge (2 min in this paper) and the CNN produces n (n = 5 in this paper) consecutive diagnoses of stage II, the distillation process is considered to have entered stage II. The switching rules between the other stages are similar.
The post-processing step has two main functions. The first is to ensure that the order of the classification results matches the actual production process; for example, stage III can only appear after stage II, not before it. The second is to filter out incorrect classification results and improve the classification accuracy.
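The switching rule described above (advance only to the immediately following stage, and only after n consecutive identical CNN votes) can be sketched as a small state machine. The stage labels and the consecutive-vote rule follow the paper, while the class interface is an illustrative assumption; the duration condition for stage I (2 min) is omitted for brevity:

```python
STAGES = ["I", "II", "III", "IV", "V"]

class StagePostProcessor:
    """Correct raw CNN labels so stages only advance in order, and only
    after n consecutive votes for the immediately following stage."""

    def __init__(self, n=5):
        self.n = n
        self.stage_idx = 0      # start in stage I
        self.streak = 0         # consecutive votes for the next stage

    def update(self, raw_label):
        nxt = self.stage_idx + 1
        if nxt < len(STAGES) and raw_label == STAGES[nxt]:
            self.streak += 1
            if self.streak >= self.n:
                self.stage_idx = nxt
                self.streak = 0
        else:
            self.streak = 0     # any other label breaks the streak
        return STAGES[self.stage_idx]

pp = StagePostProcessor(n=3)
votes = ["II", "II", "I", "II", "II", "II", "III"]
print([pp.update(v) for v in votes])
# ['I', 'I', 'I', 'I', 'I', 'II', 'II'] -- the isolated vote resets the streak
```

Because the state machine never moves backwards, spurious early votes for a later stage (or late votes for an earlier one) are filtered out, which is exactly the correction behavior the post-processing step provides.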

Datasets
Twenty-eight sets of spirits classification systems are deployed in a winery in Sichuan, China, as shown in figure 8. There are 28 spirits production lines in figure 8, including 28 distillation units, 28 cameras, 7 switches, 7 AI servers, 1 OPC server, and 1 DCS. Each production line has a separate distillation unit and camera. Each server is connected to four production line cameras and deploys four spirits quality classification algorithms, which operate independently. All servers communicate with the DCS via the OPC communication protocol.  In this paper, videos of 13 different spirits production lines are used as datasets and labeled, among which 10 production lines are used as the training dataset and 3 production lines are used as the test dataset. Each video covers the complete spirits distillation process, which includes bubbles images from stage I to stage V. The resolution of videos is 2560 × 1440, and the length of each video is about 1 h. The authors sample a frame from the video every 500 ms and manually label it as a dataset sample. Finally, a spirits quality classification dataset containing 139 199 images is established, including a training dataset of 128 532 images and a test dataset of 10 667 images. The image samples of different stages are shown in table 1. According to expert knowledge, stage I and stage V in the spirits production process can be directly identified and controlled by duration. Therefore, the proposed classification method is trained to classify stage II, stage III, and stage IV.

Preprocessing results
The original frame is a red, green and blue (RGB) image with a resolution of 2560 × 1440, and the size of the foreground region is 900 × 900, as shown in figure 9(a). The entire image is converted from the RGB color space to the HSV color space, with pixel values ranging from 0 to 255. The HSV histograms of the entire image and of the foreground region are shown in figures 9(b) and (c), respectively. The three graphs from left to right are the histogram distributions of the hue, saturation, and value channels. The horizontal axis represents the pixel value of each color channel, and the vertical axis represents the normalized probability density. The value channel of the foreground region mainly covers the range from 80 to 255, with three peaks around 130, 200, and 255, which are close to the peaks of the entire image. Therefore, the value channel segmentation thresholds can only be set to v_min = 100 and v_max = 255. Figure 10(a) shows the mask obtained by HSV color space segmentation of figure 9(a) using the thresholds set above, where white pixels indicate that the thresholds are met and are retained, while black pixels are discarded. Due to the reflection of light, some parts of the foreground region are no longer golden and are omitted from the mask, so some remedial measures are needed. Morphological opening operations are performed to filter out small noise points outside the foreground region, and a closing operation fills the holes inside it. Hough circle detection is then used to find the smallest circumscribed circle. The extraction result of the foreground region is shown in figure 10(b): the golden collector is accurately extracted and used as the input to the CNN model in the following classification step.

Bubble ROI selection results
For the bubble ROI selection algorithm, two parameters need to be determined when using the GLCM, namely the pixel-pair distance offset and the pixel-pair angle. A grid search is used to select appropriate parameters: eight different values of D = [0, 1, 2, 3, 4, 5, 6, 7] and A = [0, π/4, π/2, 3π/4, π, 5π/4, 3π/2, 7π/4] are selected to calculate the GLCM and ASM indexes of randomly selected foreground images. The calculation results are shown in figure 11. When the distance is fixed, the ASM index increases monotonically as the angle increases. When the angle is fixed, the ASM index fluctuates within a small range as the distance increases. Although the ASM index is neither a convex nor a concave function of the angle and distance, its variation with distance and angle is predictable. Therefore, it is sufficient to select a single pair of distance and angle values to calculate the ASM index.
The bubble ROI resolution is set to 224 × 224 to fit the input size of the CNN model, and the grid division stride is set to s = 20. The number of candidate grid boxes, K, can be calculated by formula (3). The distance parameter of the GLCM is set to D = [1], and the angle parameter is set to A = [π/4]. For three different distillation stages, the ASM index for each grid of the collector foreground image is shown in figure 12. It is worth noting that the horizontal and vertical axes refer to the grid number coordinates after division. It is easy to see that grids with more bubbles have higher ASM values. Then, NMS is applied to select the k most suitable grids as bubble ROIs for CNN model training and testing. In this paper, N_t is set to 0.5 and k is set to 5. Figure 13 shows the bubble ROI selection corresponding to different combinations of N_t and k. When the value of N_t is too small (N_t ⩽ 0.3), large areas without bubbles appear in some ROI images (such as ROI r3 in figures 13(b), (f) and (j)). When the value of N_t is too large (N_t ⩾ 0.7), the ROI images overlap severely, and the significance of selecting multiple ROI images is lost (such as figures 13(d), (e), (h) and (i)). In order to select as many bubble areas as possible while ensuring sufficient feature differences between different ROI images, N_t = 0.5 is set in this paper.

Parameter settings
The setting of parameter k is similar to that of N_t. As shown in figure 13, when the value of k is too small (k ⩽ 3), the number of extracted ROI images is small, and the bubble features may not be rich enough. When the value of k is too large (k ⩾ 7), there are many repeated areas in the ROI images, and they may also cover bubble-free areas. Therefore, k = 5 is set in this paper.

Classification results
As mentioned above, only three stages (II, III, and IV) need to be classified by the CNN, while stages I and V are identified based on expert knowledge. An NVIDIA TITAN XP GPU is utilized for training the CNN models, with hyper-parameters set as follows: the input image size is 224 × 224 × 3 and the batch size is 64. The accuracy with the bubble ROI input reached 96.77%, an 8.51% improvement over the global image input; here, the improvement is the difference obtained by subtracting the classification accuracy of the original method from that of the improved method.
Among the nine lightweight models, SqueezeNet has the lowest classification accuracy of 75.25% when using the global image input; with the bubble ROI input, its accuracy reaches 83.32%, an 8.07% improvement. Generally, the bubble ROI selection method greatly improves the classification accuracy of CNN models whose accuracy with global image input is poor. For CNN models that already achieve high accuracy with global image input, the bubble ROI input still improves accuracy, but by a smaller margin. At the same time, the MobileNetv3_Small model also has advantages in terms of total parameters, floating point operations (FLOPs), model size, and inference time.
In addition, the overfitting problem of CNN training is also examined using ablation experiments. Data augmentation (including normalization, random flipping, and random brightness and contrast adjustment), regularization (L2 regularization), and dropout are applied while training the MobileNetv3_Small model with the bubble ROI input, as shown in table 3. The classification accuracy of the MobileNetv3_Small model using pre-trained weights (transfer learning) is 8.9% higher than that of the model using random weights (random initialization). Data augmentation, regularization, and dropout slightly reduce classification accuracy. The lowest accuracy (95.7%) is obtained by combining L2 regularization, dropout, and random flipping. After combining L2 regularization, dropout, random flipping, and random brightness and contrast adjustment, the MobileNetv3_Small model shows the smallest drop in accuracy (only 0.72%). The ablation results in table 3 show that the accuracy of the MobileNetv3_Small model is not greatly reduced after applying these anti-overfitting methods, which verifies that there is no serious overfitting problem.

Post-processing results
The post-processing results of different CNN models are shown in table 4. The bold values in table 4 represent the results with the highest accuracy in the different experiments. The comparison shows that the classification accuracy of CNN models can be effectively improved by using the post-processing method. After post-processing, the classification accuracy obtained by CNN models with bubble ROI input is generally higher than that of CNN models with global image input. The lower the classification accuracy of a CNN model before post-processing, the more significant the improvement after post-processing. After post-processing, the classification accuracy of CNN models with bubble ROI input exceeds 99%. The post-processing results of the MobileNetv3_Small model with the bubble ROI input are shown in figure 14: the misclassifications originally occurring in stage III and stage IV are corrected by post-processing.
As shown in table 2, compared with VGG16, VGG19 achieves 1.46% and 1.32% higher accuracy without post-processing on global image input and bubble ROI input, respectively. Similarly, VGG19 has higher classification accuracy than VGG16 after post-processing, as shown in table 4: with post-processing, VGG19 improves accuracy by 1.15% on global image input and 0.10% on bubble ROI input. Although VGG19 outperforms VGG16, it does not outperform CNN models such as Xception, NasNet Mobile, and MobileNetv3_Small. Therefore, VGG19 is not the preferred model for practical application deployment in this paper.
In general, the deeper a CNN-based classification model, the higher its image classification accuracy. However, no similar relationship appears to hold between model computational complexity (operations), model complexity (number of parameters), and image classification accuracy. Bianco et al [39] summarize the relationship between image classification accuracy, model computational complexity, and the number of model parameters, and their results show that these three quantities are not simply proportional. For example, in [39], VGG-13 and ResNet-18 have similar accuracy but differ greatly in computational complexity and number of parameters; a similar comparison holds between SENet-154 and SE-ResNeXt-101 (32 × 4d). The main reason for the large number of parameters in the VGG models is the three cascaded fully connected layers at the end of the network (FC-4096/FC-4096/FC-1000) [22]. For example, the first fully connected layer (FC-4096) in VGG16 alone accounts for about three quarters of the entire model's parameters. Unlike the VGG models, other models usually have only a small fully connected layer (FC-1000); it is deeper convolutional layers, rather than more fully connected layers, that improve image classification accuracy. Therefore, among all the models compared in this paper, although the VGG16 model has the largest number of parameters, it is not the most accurate model, which is consistent with the conclusions of other literature. At the same time, the main reason VGG16 has the shortest inference time is that its convolutional stack is shallow and its computational complexity is low. In conclusion, fewer convolutional layers mean shorter inference time and lower image classification accuracy, while more fully connected layers mean a larger number of parameters.
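The fully-connected-layer claim can be checked with back-of-envelope arithmetic, assuming the standard 224 × 224 ImageNet configuration of VGG16 (last convolutional feature map 7 × 7 × 512, published total of about 138 M parameters):

```python
# Parameter counts (weights + biases) of VGG16's three fully
# connected layers in the standard 224x224 ImageNet configuration.
fc1 = (7 * 7 * 512) * 4096 + 4096   # first FC-4096, fed by the 7x7x512 map
fc2 = 4096 * 4096 + 4096            # second FC-4096
fc3 = 4096 * 1000 + 1000            # FC-1000 classifier
total_vgg16 = 138_357_544           # published VGG16 parameter count

print(fc1, fc2, fc3)
print(f"FC share of all parameters: {(fc1 + fc2 + fc3) / total_vgg16:.1%}")
```

The first FC layer contributes roughly 103 M of the 138 M parameters, and the three FC layers together account for nearly 90% of the model, which is why replacing them with a single small classifier head shrinks later architectures so dramatically.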

Practical application test results
In practical application tests, the MobileNetv3_Small model with the bubble ROI selection and post-processing methods is deployed on spirits production lines. The running period of the classification algorithm is 500 ms. The parameters of the bubble ROI selection method are s = 20, D = [1], A = [π/4], N_t = 0.5, k = 5. For the post-processing method, the durations of stage I and stage VI are 2 min and 30 min, respectively. With these parameter settings, the resources consumed by the server to run the classification algorithm are shown in table 5.
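With a 500 ms running period, the expert-knowledge stage durations translate directly into classifier cycle counts. The sketch below shows this conversion and one possible way of gating the CNN output by elapsed time; only the 500 ms period and the 2 min / 30 min durations come from the text, while the gating rule and the `total` run length are assumptions.

```python
PERIOD_S = 0.5                     # classification period: 500 ms

def cycles(duration_min):
    """Number of 500 ms classification cycles in a stage duration."""
    return int(duration_min * 60 / PERIOD_S)

def gated_stage(cycle, cnn_label, stage1_min=2, stage6_min=30, total=None):
    """Force stage I at the start of a run and stage VI at the end.

    Illustrative gating only; the paper's expert-knowledge rules are
    richer. `total` (cycles in the whole run) is an assumed input.
    """
    if cycle < cycles(stage1_min):
        return "I"
    if total is not None and cycle >= total - cycles(stage6_min):
        return "VI"
    return cnn_label

print(cycles(2), cycles(30))       # prints: 240 3600
```

That is, stage I spans the first 240 classification cycles and stage VI the last 3600, so the CNN prediction only needs to be trusted in between.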
Ten spirits samples from different production lines are randomly selected for alcohol content testing at the moment the distillation stage is switched. Test results are shown in table 6. With the classification solution proposed in this paper, the spirits have an average alcohol content of 63.7% VOL with a standard deviation of 0.8% VOL when switching from stage II to stage III, and an average alcohol content of 46.1% VOL with a standard deviation of 0.6% VOL when switching from stage III to stage IV. The test results in table 6 verify that the proposed solution is accurate and stable in practical spirits production applications. One month after the deployment and operation of the spirits quality classification system, output increased by 7.7% and production efficiency increased by 29.0%.

Conclusions
This paper proposes a spirits quality classification solution that combines machine vision technology and expert knowledge, classifying spirits at different distillation stages by analyzing bubble morphology. A spirits quality classification dataset is established and labeled based on the self-designed classification system and actual spirits production videos. A complete spirits quality classification algorithm from preprocessing to post-processing is proposed, and 15 different CNN models are trained and tested. Test results show that the proposed bubble ROI selection and post-processing methods effectively improve the classification accuracy of spirits quality: with the bubble ROI selection method, the highest classification accuracy reaches 98.62%, and with the post-processing method, the highest classification accuracy reaches 99.82%. Furthermore, practical application tests show that the solution proposed in this paper can significantly improve spirits' final quality and production efficiency.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/MengchiCMC/Chinese-Spirits-Bubble-Datasets.