Application of Swin-Unet for pointer detection and automatic calculation of readings in pointer-type meters

Pointer-type meters are widely used in military, industrial, and aerospace applications. In this paper, we propose a method for automatically calculating the readings of pointer-type meters that is robust to complex backgrounds, tilted meters, blurred images, and uneven illumination. First, the mask maps of the scales and the pointer are obtained using the Swin-Unet semantic segmentation network. For the mask map of the scales, the Swin Transformer image classification network is used to identify the value of each scale and the coordinates of its centroid; for the mask map of the pointer, the least skeleton circle (LSC) method is proposed to fit the linear equation of the pointer. Second, the influence of the pointer and several adjacent scales on the meter reading is considered, and the weighted angle method (WAM) is proposed to calculate the meter reading. To verify the robustness of the algorithm, the proposed pointer detection method is compared with traditional pointer detection methods and is found to work better. It is also applied to the outputs of different semantic segmentation networks, which verifies that it adapts well to different segmentation results. The proposed algorithm is further compared with existing meter reading calculation methods. The experiments show that using WAM on uncorrected meter images reduces the error by 30% compared with the traditional angle method, and using WAM on corrected meter images reduces the error by about 50%, which verifies the effectiveness of the algorithm.


Introduction
The Industrial Internet is developing rapidly, and artificial intelligence (AI) is following close behind. The application of AI in the industrial sector has set off a fresh boom. In a large factory line, most equipment inspection is based on instrumentation, which is used to monitor the current operating status of each interface in real time [1]. Instrumentation is widely used in chemical testing, aerospace, rail transit, ship transportation, substations and other places [2]. Pointer-type meters are common in many applications because of their ease of reading, wide range of applications, long life cycle, low environmental impact, high accuracy and simple construction. Conventional methods require significant manual effort to obtain the meter readings and rely on both scheduled and unscheduled field collection, which does not permit real-time acquisition of the meter readings [3]. The complexity and heavy workload lead to fatigue at work, resulting in inaccurate readings and subjective judgments. Manual reading is also dangerous in some high-altitude, high-radiation, high-temperature and high-pressure environments [4]. Our research, which focuses on rapidly and accurately obtaining automated readings from pointer-type meters in a variety of complex environments, is therefore of practical value.
AI can be considered a major component of industrial transformation, enabling intelligent machines to autonomously perform tasks such as self-monitoring, interpretation, diagnosis and analysis [5]. Different types of networks are used for different types of tasks, with convolutional neural networks (CNNs) dominating the field of image and video processing [6]. CNNs have yielded excellent results in areas such as image classification [7], behaviour recognition [8], target detection [9] and semantic segmentation [10]. Many researchers have also accomplished different types of tasks based on the Transformer self-attention mechanism. Du et al [11] used Transformer to extract high-level semantic features of sign language videos to recognize sign language information. Peer et al [12] provided a method to dynamically resize the Transformer model using the Greedy-layer method. Yang et al [13] proposed a language-based Transformer video question and answer model to encode complex semantics in video clips.
Inspired by the use of Transformer in NLP tasks, many researchers are now applying Transformer to computer vision tasks [6]. Examples include DETR (DEtection TRansformer) [14], iGPT (image Generative Pre-Training) [15], ViT (Vision Transformer) [16] and Swin-Transformer [17], all of which perform well on computer vision tasks. Among them, Swin-Transformer addresses the two major problems of the Transformer model, its large scale and its many parameters, and achieves better results in target detection and semantic segmentation tasks [18]. He et al [19] proposed the ST-Unet semantic segmentation model to extract contextual feature information from remote sensing images to improve ground object recognition. Gao et al [20] proposed the Variant Swin Transformer network structure, which achieved satisfactory results in the task of detecting defects on the surface of an object. Cao et al [21] used Swin-Unet to accomplish multi-organ and heart segmentation tasks.
Therefore, based on previous research, this paper proposes a segmentation algorithm based on the Swin-Unet semantic segmentation network, which completes the segmentation of the meter pointer and scales. The overall flow of the algorithm is shown in figure 1. The mask maps of the meter pointer and scales are obtained simultaneously by the Swin-Unet semantic segmentation network, and then the value of each scale is identified by the Swin Transformer feature extraction network. Meanwhile, the least skeleton circle (LSC) method is used to reduce the pointer pixel region to a skeleton of unit-pixel width and to calculate the center of mass of the pixel points in concentric circles, which further eliminates the branches of the skeleton generated by the skeleton refinement algorithm. The linear equation of the pointer is then fitted using the least squares method. In the reading calculation process, the distance between the pointer and its adjacent scales is considered, and the weighted angle method (WAM) is proposed to complete the meter reading. The dataset used in this paper is derived from meter images captured by security cameras in real scenarios, and the readings of the pointer meters are calculated on images that contain only the dashboard. The circular pointer meter images in this paper include three types: 2.5 MPa, 10 MPa and 25 MPa. The meter image dataset contains complex backgrounds, uneven illumination, blurring, occlusion, and tilting. The algorithm is evaluated on all the cases contained in the dataset, which validates its robustness and shows that it effectively solves the problem of automatically calculating the meter readings.
The main contributions of this paper are summarized as follows.
(i) A Swin-Unet semantic segmentation network based on a pure Swin Transformer structure is used to obtain both a mask map of the scales and a mask map of the pointer. Based on the Swin Transformer feature extraction network, the values of the scales are identified to provide the information underlying the calculation of the meter readings.
(ii) The linear equation of the pointer is fitted using the LSC algorithm. The skeleton of the pointer pixels is extracted using the skeleton refinement algorithm, concentric circles of different radii are constructed around the centroid of this skeleton, the centroids of all pixels in each circle are calculated, the branches of the pointer skeleton are further eliminated, and the straight-line equation of the pointer is fitted using the least-squares method on the resulting simplified pixel points.
(iii) The influence of several scales adjacent to the pointer on the reading result is considered, and a WAM algorithm based on distance weighting is proposed to calculate the meter reading from the distance between the pointer and each scale.
The paper is organised as follows: section 2 reviews existing research methods; section 3 addresses the proposed approach in detail; section 4 compares the algorithms in this paper with existing methods experimentally; and section 5 is a summary and an outlook for future work.

Related work
Pointer-type meters are used in a wide range of applications, and many scholars have begun to investigate how to automatically read pointer meters in complex environments. Image acquisition has evolved from fixing a camera in front of each meter to using a movable, rotatable device to capture the image of the meter, from which the meter reading is then calculated automatically. However, there are still some problems in this process, such as fitting the linear equation of the pointer, identifying the value of the scale, locating the center of rotation of the pointer and calculating the meter reading. In this paper, we review the following aspects: calibration of the meter image and enhancement of the image quality; detection of information such as the instrument panel, pointer and scale; and fitting the linear equation of the pointer and calculating the meter reading.
Meter Image Preprocessing: The movable device can be tilted and rotated when capturing the meter image, and correcting the meter image can improve the accuracy of the meter reading calculation [22]. One calibration approach calculates the similarity between the features of the image to be detected and those of a template image [23][24][25]. This feature-based detection method has limitations when the meter image is affected by natural lighting and the natural environment [26]. Deep learning based approaches therefore accomplish meter calibration by detecting meter features, using the relationship between circles and ellipses [27,28] or the slope of a square meter detection frame [29]. High quality meter images facilitate the extraction of the basic information needed for the reading recognition process, so image quality enhancement is necessary after obtaining the meter images. Image processing-based methods for meter image quality enhancement include image greyscaling to reduce computational effort [30], threshold segmentation to obtain target regions [31], and histogram equalisation to improve image contrast [32]. Because real datasets are difficult to obtain, researchers have also generated virtual meter data in large quantities by means of deep learning [33,34].
Detection of meter panels, meter pointers and scales: Dashboard images acquired using image acquisition equipment may contain a large number of background pixels and therefore require accurate detection of the gauge dial, pointer and scale values from the acquired image. Initially, methods such as threshold segmentation [35], region growing [36], straight line detection [37], ellipse detection [33] and edge detection [29] were used to fit dashboard and pointer straight line equations. These methods are limited in that they require manual setting of specific parameters for each image, and if the image is affected by natural lighting, external occlusions and so on, the set parameters lose their value. Lv et al [38] proposed a Multi-Classifier under Feature Engineering to recognize the values of meter scales, and also designed a multilayer kernel regression positioning to improve the accuracy of meter recognition. Zhang et al [39] used Yolov4 to recognize each pointer dial in a water meter and used multi-feature fusion RFB-Net to detect scales and hands in each dial. Hou et al [40] localized the dial position of the meter by Mask R-CNN, identified all the numerical scale regions to determine the center of the dial, and finally extracted the pointer pixel regions by using the region growing method. A similar process was used by Wang et al [41]. Zhang et al [42] proposed the EF-SSD algorithm to detect pointers in automobile instrument clusters, which improves the accuracy of pointer localization by calculating the distance between fuzzy sets and measuring how much each pixel belongs to the edge of the pointer. Ma et al [43] effectively solved the pointer shading problem using a symmetry-based binarized threshold segmentation method. Compared to manually designed feature extraction methods, deep learning methods can extract deep semantic information from images.
Calculate Meter Readings: The ultimate goal is an accurate calculation of the meter reading. Current methods for obtaining meter readings include the distance method [22], the angle method [44], the template method [45] and the depth regression method [46]. Wang [47] used different states of the pointer in a linear scanner to calculate the parameters of the pointer translation function and predicted the pointer reading based on the response obtained at the center of mass of the pointer's point of light and a given input amplitude. Zhang et al [48] used a bi-directional heterogeneous network to extract instrument image features, constructed a soft depth regression network to predict the rotation angle of the instrument pointer, and finally calculated the reading of the instrument pointer based on the angle method. Zhou et al [28] similarly used the angle method to calculate meter readings. Human vision takes into account other scales around the needle when taking meter readings. Similarly, a computer should consider multiple scales around the needle when calculating meter readings.

Methods
In this paper, both the pointer and the scales of the meter image are segmented. The pointer pixel points are reduced by the image skeleton refinement algorithm to fit the linear equation of the meter pointer, the value and the center of each meter scale are determined, and the center of rotation of the pointer is also determined. Finally, the meter reading is calculated by WAM. Each step of the reading calculation is described separately in this section.

Scale and pointer segmentation methods
Current approaches to obtaining scale and pointer information for calculating meter readings include threshold segmentation, edge detection, semantic segmentation, and object detection. Most threshold segmentation and edge detection methods rely on hand-designed parameters and cannot adapt to most meter images from natural environments, where the background varies considerably. Approaches based on semantic segmentation and object detection can detect and identify scale location information. However, in these approaches the pointer and the scales are acquired in separate stages. In view of these shortcomings, this paper uses semantic segmentation to segment the pointer and the scales of the meter simultaneously.
Traditional CNNs, which compute image features with a small number of convolution kernels, are unable to extract the global features and long-range information of an image. The Swin Transformer improves the computation of global features by shifting the window partition between consecutive layers, so that the windows of layer l+1 include features from neighbouring windows of layer l [17]. The pointer pixel area in the meter image is spread over a long region of the image, which matches the problem addressed by Swin Transformer. This paper uses Swin-Unet [21], which takes the pure Swin Transformer block as its feature computation block, to segment the scale values and the pointer simultaneously. Swin-Unet consists of an encoder, a decoder, a bottleneck layer and skip connections. The Swin Transformer block is used as the deep feature computation block of the image, patch merging performs the downsampling and patch expanding performs the upsampling, giving a symmetrical, pure Transformer image feature extraction network with a U-shaped structure. The Swin Transformer block is shown in figure 2: it consists of a shifted-window-based MSA module followed by a two-layer MLP with a GELU nonlinearity in between. An LN layer is added before each MSA module and each MLP module, and a residual connection is added after each module.
In this paper, the Swin-Unet semantic segmentation network is used to obtain the pixel regions of the scales and the pointer in the meter image. The segmentation results of the meter images are shown in figure 3. The Swin-Unet semantic segmentation network can completely acquire the pixel regions of the scales and the pointer for meter images of different meter types captured in different environments.

Identify the value of the scale and the method of locating the center of rotation
For the data set used in this paper, the scales are located on the circumference of the pointer rotation, and the meter reading can be calculated from the relation between the position of the pointer and the scales. Identifying the scales therefore provides the information on which the meter reading is based. The scales are segmented at the same time as the pointer, but their specific values and positions have not yet been determined. In this paper, each scale pixel region is cropped from the mask image produced by the Swin-Unet network into a separate image, and the Swin Transformer image classification network, built from Swin Transformer blocks, classifies each cropped scale image. A total of 14 scale values ranging from 0 to 25 are covered in this paper, and each class corresponds to one scale value. This paper uses all pixel points of a scale to determine the coordinates of its center point. The scale images segmented by Swin-Unet contain some background pixels. Therefore, to make it easier to determine the pixel points of the scale, this paper uses the OTSU threshold segmentation method to adaptively locate the center of each individual scale image without setting the threshold parameters manually, which improves the flexibility of automatic meter reading recognition. For the pixel points of each individual scale, a skeleton refinement algorithm is used to eliminate a portion of the edge pixel points to form a skeleton of unit-pixel width, from which the center point coordinates of the scale are finally determined. The results of the centroid determination for each type of scale are shown in figure 4.
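The thresholding and centroid step described above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the authors' implementation: it assumes an 8-bit greyscale crop of one scale, it skips the skeleton refinement step and takes the centroid of all foreground pixels directly, and the function names `otsu_threshold` and `scale_centroid` are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold t of an 8-bit image (foreground: gray > t)."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the class gray <= t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    denom = omega * (1.0 - omega)
    between = np.zeros(256)
    valid = denom > 0                        # skip thresholds with an empty class
    # Between-class variance for every candidate threshold.
    between[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(between))

def scale_centroid(gray):
    """Centroid (x, y) of the pixels above the Otsu threshold.

    Assumption: the skeleton refinement used in the paper is omitted here.
    """
    mask = np.asarray(gray) > otsu_threshold(gray)
    rows, cols = np.nonzero(mask)
    return cols.mean(), rows.mean()

# Usage on a synthetic crop: dark background with one bright scale region.
crop = np.full((20, 20), 10, dtype=np.uint8)
crop[5:8, 10:15] = 200
centroid = scale_centroid(crop)
```

No threshold is passed in by hand, which is the property the text relies on: the split between scale and background pixels is chosen per image from its own histogram.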
This paper uses the center of the ellipse on which the scales are located as the center of rotation of the pointer. There are two reasons for this. On the one hand, the ellipse on which the scales are located is used throughout the calculation of the meter reading. If the center of the ellipse is used as the center of rotation of the pointer, the same linear relationship is maintained between each scale and the center of rotation, which reduces the reading calculation error. On the other hand, the center of rotation of the pointer does not need to be computed separately, which reduces both the running time and the complexity of the algorithm. In computer vision, ellipses are usually used to represent the distribution of spatial points. Common ellipse fitting methods include the five-point method, the Hough transform and the least squares method [49]. The number of scales detected in the meter image during the semantic segmentation phase is uncertain. Therefore, after all scale centroids have been located, this paper uses least squares to fit an ellipse equation to all scale centroids and uses the ellipse center as the center of rotation of the meter pointer.
In this paper, a rectangular coordinate system with the upper left corner of the image as the origin is used, which satisfies the right-hand rule: the positive direction of the X-axis points to the right of the origin and the positive direction of the Y-axis points downward from the origin. Let the general equation of the ellipse be equation (1). The process of fitting the ellipse equation is to find the parameter set (A, B, C, D, E) for which all scale coordinate points obtained from the segmentation minimize the error in equation (1), that is, for which the value of equation (2) is minimized. After the ellipse equation has been fitted, the coordinates of the center of the ellipse, which are the coordinates of the center of rotation of the pointer $(x_r, y_r)$, are calculated by equation (3). The final results for the identified scale values, the coordinates of the scale center points and the computed center of rotation of the pointer are shown in figure 5.
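The least-squares ellipse fit and the center computation can be sketched as follows. This is a minimal sketch, assuming that equation (1) has the normalized conic form $x^2 + Axy + By^2 + Cx + Dy + E = 0$, a common five-parameter choice that is consistent with the parameter set (A, B, C, D, E) but is an assumption here. The function names `fit_ellipse` and `ellipse_center` are hypothetical.

```python
import numpy as np

def fit_ellipse(points):
    """Least-squares fit of x^2 + A*x*y + B*y^2 + C*x + D*y + E = 0.

    The conic form is an assumption; returns (A, B, C, D, E).
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Moving x^2 to the right-hand side makes the problem linear in the parameters.
    design = np.column_stack([x * y, y ** 2, x, y, np.ones_like(x)])
    params, *_ = np.linalg.lstsq(design, -(x ** 2), rcond=None)
    return tuple(params)

def ellipse_center(A, B, C, D, E):
    """Center of the conic: the point where both partial derivatives vanish."""
    denom = A ** 2 - 4.0 * B
    x_r = (2.0 * B * C - A * D) / denom
    y_r = (2.0 * D - A * C) / denom
    return x_r, y_r

# Usage: synthetic scale centroids on part of an ellipse centered at (120, 95).
t = np.linspace(0.3, 5.2, 12)
scale_centroids = np.column_stack([120 + 80 * np.cos(t), 95 + 60 * np.sin(t)])
center = ellipse_center(*fit_ellipse(scale_centroids))
```

Because the fit is an ordinary linear least-squares problem, it works with any number of detected scale centroids (five or more), which matters when the number of scales found by the segmentation varies between images.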

Fitting the pointer linear equation using the LSC method
An accurate fit of the linear equation of the meter pointer provides better information for meter reading recognition [42]. Existing methods for pointer recognition start by detecting pointer regions from the dashboard. Methods for detecting dashboard pointers include template methods, threshold segmentation, Hough line detection, edge detection, object detection, and semantic segmentation. The pointer straight line equation is then fitted based on the detected pixel points. The meter pointer in this paper has a thin pointer head, a large pointer root, and a center of rotation region protruding from the pointer edge. The Hough line detection method is suited to pointers with elongated regions and straight edges, which is not the case here. Methods such as threshold segmentation and edge detection require some parameters to be set manually, so they are limited and not robust, and they do not achieve good results on images collected in real time. With deep learning methods, a robustly trained model can adapt to the pointer region of complex images. For the semantic segmentation results presented in this paper, a method based on an image skeleton refinement algorithm is proposed to fit the pointer line equation.
In this paper, Swin-Unet is used to segment the meter pointer pixel area. The segmented pixel points are evenly distributed, but the edge of the segmented pointer is jagged. The LSC algorithm is proposed to solve the problem of imprecisely segmented pointer edges. First, edge pixels are eliminated by the image skeleton thinning algorithm to refine the pointer region to a skeleton of single-pixel width. Second, concentric circles of different radii are drawn around the centroid of all the refined skeleton pixels, the centroid of the pixels in each ring between adjacent circles is computed, and the centroids of all rings form the set of pointer pixels. Finally, the pointer straight line equation is fitted on this set using the least squares method. On the one hand, this method reduces the number of pixels in the pointer region and reduces the effect of inaccurately segmented pointer edge pixels on the pointer line fit. On the other hand, using the centroids of pixels in different rings reduces the influence of pixels from branches of the refined pointer skeleton on the fitted pointer straight line equation. Locating the meter pointer mainly consists of determining the pointer linear equation and determining the direction of the pointer. The centroid of the pointer skeleton is $x_c = \frac{1}{n}\sum_{i=1}^{n} x_i$, $y_c = \frac{1}{n}\sum_{i=1}^{n} y_i$ (4), where $(x_c, y_c)$ is the centroid of the pointer skeleton pixel points, $n$ is the number of pointer skeleton pixel points, and $(x_i, y_i)$ is the $i$-th pixel point.
For the special case where the pointer coincides with a scale, the algorithm extracts the skeleton of the pixels, so the part that coincides with the scale is refined to unit-pixel width and the excess scale pixels are refined into branches of the pointer skeleton. Since the LSC algorithm eliminates the branches of the pointer skeleton, the linear equation of the pointer is also fitted well in this special case.
(iv) Fitting the equation of the pointer line using least squares: Let the general equation of the pointer line be equation (5). The pointer linear equation is fitted using least squares based on $(x_c, y_c)$ and all the $(x_{Ci}, y_{Ci})$ points. The fitting process is to find the parameter set (A, B) for which the centroids $(x_{Ci}, y_{Ci})$ of the detected pixel points in the pointer skeleton circles have the smallest error in equation (5), that is, to minimize equation (6), where $n$ is the number of centroids of the pixel points in the circles and $(x_i, y_i)$ is the $i$-th centroid $(x_{Ci}, y_{Ci})$.
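The LSC steps described above can be sketched as follows, starting from skeleton pixel coordinates that are assumed to be already available from a thinning algorithm. This is an illustrative sketch with several assumptions that do not come from the source: the line in equation (5) is taken to be $y = Ax + B$ (which breaks down for near-vertical pointers), the rings have a fixed hypothetical width `ring_width`, and each ring is split into the two sides of the centroid along the dominant direction of the skeleton, because the text does not say how pixels on opposite sides of the centroid within one ring are handled. The name `lsc_line_fit` is hypothetical.

```python
import numpy as np

def lsc_line_fit(skeleton_xy, ring_width=5.0):
    """Sketch of the LSC idea: ring centroids followed by a least-squares line.

    skeleton_xy: (n, 2) array of (x, y) skeleton pixel coordinates.
    Returns (A, B, samples) for the assumed line form y = A*x + B.
    """
    pts = np.asarray(skeleton_xy, dtype=float)
    centroid = pts.mean(axis=0)                        # (x_c, y_c)
    offsets = pts - centroid
    # Index of the ring (between two adjacent circles) that holds each pixel.
    ring = np.floor(np.linalg.norm(offsets, axis=1) / ring_width).astype(int)
    # ASSUMPTION: separate the two arms of the skeleton with the sign of the
    # projection onto the dominant direction (first right singular vector).
    _, _, vt = np.linalg.svd(offsets, full_matrices=False)
    side = offsets @ vt[0] > 0
    samples = [centroid]
    for k in np.unique(ring):
        for s in (False, True):
            members = pts[(ring == k) & (side == s)]
            if len(members):
                samples.append(members.mean(axis=0))   # centroid of one half-ring
    samples = np.array(samples)
    A, B = np.polyfit(samples[:, 0], samples[:, 1], 1)  # least-squares line
    return A, B, samples

# Usage: a straight skeleton along y = 0.5*x + 10 with a short spurious branch.
xs = np.arange(0, 101)
main = np.column_stack([xs, 0.5 * xs + 10])
branch = np.column_stack([np.full(6, 70), 45 + np.arange(1, 7)])
A, B, ring_centroids = lsc_line_fit(np.vstack([main, branch]))
```

Averaging within each ring means a short branch contributes to at most a few sample points, and only as a small shift of their centroids, instead of adding many off-line pixels to the fit.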
Figure 9 compares the results of the proposed LSC algorithm with those of the plain least squares method for fitting the pointer straight line equation, where the least squares method fits the line over all the pointer skeleton pixel points. The LSC algorithm accurately fits the pointer straight line equation for pointer images with different orientations. The least squares method, however, is affected by the pointer skeleton branches, which produces a large error.
To determine the direction of the pointer, proceed as follows:   The determined pointer direction is shown in figure 10.

Reading calculation methods
Based on the information obtained in the previous steps, the meter reading can now be calculated, which is the ultimate goal. In this paper, a WAM is proposed to calculate the meter reading; it builds on the angle method and considers multiple scales near the pointer. The angle method uses three straight lines: the line between the center of pointer rotation and the tip of the pointer, and the lines between the center of pointer rotation and each of the two adjacent scales. The meter reading is calculated from the angles formed by these three lines using equation (7).
Using only two neighboring scales may introduce some error into the calculated reading. To solve this problem, this paper proposes a WAM based on the distance between the pointer and the four neighboring scales to calculate the meter reading more accurately. Take the four scales adjacent to the pointer on its left and right, $P_{l2}$, $P_{l1}$, $P_{r1}$ and $P_{r2}$, with $P_{l2} < P_{l1} \leqslant P_{r1} < P_{r2}$, so that the pointer is located between $P_{l1}$ and $P_{r1}$. The four scale points form four pairs: $(P_{l1}, P_{r1})$, $(P_{l1}, P_{r2})$, $(P_{l2}, P_{r1})$ and $(P_{l2}, P_{r2})$. Each pair is given a weight according to its distance from the pointer: the greater the distance, the smaller the weight. The four pairs correspond to weights of 0.4, 0.25, 0.25 and 0.1 respectively, and an explanatory diagram of the WAM is shown in figure 12. The formula for the WAM uses the weight set W = {0.4, 0.25, 0.25, 0.1}.
The proposed method has a number of hyperparameters, such as the initial parameter values for training the Swin-Unet model, the radii of the concentric circles in the LSC algorithm, and the weights of the readings produced at different angles in the WAM algorithm, whose initial values must be set manually. However, the values given in this paper can be used as a reference and tuned for the actual problem to reduce the time spent searching for the optimal parameters. The algorithm also has some limitations: it can only be used for meter types whose scale is uniform and lies on a concentric circle. For a meter with a uniform scale, the increase in the rotation angle of the pointer is theoretically proportional to the increase in the reading. Therefore, the proposed method of calculating meter readings has a theoretical basis.
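The reading calculation can be illustrated with the following sketch. The exact forms of equation (7) and of the WAM formula are not reproduced in the text, so two assumptions are made here: the angle method is taken to be linear interpolation between two scale values in proportion to the angle swept from the left scale to the pointer, and the WAM is taken to be the weighted sum of the four angle-method readings with the weights W given above. All function names are hypothetical.

```python
import math

def angle_at(center, p, q):
    """Unsigned angle in radians at `center` between the rays to p and q.

    Assumes the true angle between the two rays is below 180 degrees.
    """
    a1 = math.atan2(p[1] - center[1], p[0] - center[0])
    a2 = math.atan2(q[1] - center[1], q[0] - center[0])
    d = abs(a1 - a2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def angle_reading(center, tip, left, right):
    """Angle method (assumed form). `left` and `right` are (x, y, value)."""
    theta_total = angle_at(center, left[:2], right[:2])
    theta_ptr = angle_at(center, left[:2], tip)
    return left[2] + (right[2] - left[2]) * theta_ptr / theta_total

def wam_reading(center, tip, l2, l1, r1, r2, weights=(0.4, 0.25, 0.25, 0.1)):
    """Weighted angle method (assumed form): weighted sum over four scale pairs."""
    pairs = [(l1, r1), (l1, r2), (l2, r1), (l2, r2)]
    return sum(w * angle_reading(center, tip, a, b)
               for w, (a, b) in zip(weights, pairs))

def on_circle(deg, value=None, radius=100.0):
    """Helper for the usage example: a point on a circle around the origin."""
    x = radius * math.cos(math.radians(deg))
    y = radius * math.sin(math.radians(deg))
    return (x, y) if value is None else (x, y, value)

# Usage: uniform scales 2, 4, 6, 8 spaced 30 degrees apart, pointer at 142 degrees.
reading = wam_reading(
    center=(0.0, 0.0),
    tip=on_circle(142),
    l2=on_circle(100, 2.0), l1=on_circle(130, 4.0),
    r1=on_circle(160, 6.0), r2=on_circle(190, 8.0),
)
```

On a perfectly uniform, undistorted scale all four pairs give the same reading, so the weighting has no effect; the pairs only disagree when individual scale centroids are mislocated or the dial is tilted, which is the situation the weighting is meant to dampen.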

Experiment
The experimental data in this paper come from a real factory with modest image acquisition equipment: a fixed, security-style, rotatable camera is used to capture the meter images. A diverse dataset was acquired in a complex environment and from different camera angles. Most of the images captured by the camera include background, and to better study the gauge readings, only well-segmented dashboard images taken from the images of real scenes are used in this paper. The gauges in this paper include three types: 2.5 MPa, 10 MPa and 25 MPa. Figure 13 shows the data acquisition device, the flow of data preprocessing and the key issues to be solved in this paper. For the surveillance images captured by the rotatable camera, the dashboard is first located by a target detection algorithm. This paper focuses on the resulting single images that contain only the dashboard: the scale and pointer pixel regions are recognized by the semantic segmentation algorithm, and then the pointer is fitted and the meter reading is calculated.

Experimental results for scale and pointer semantic segmentation
In this paper, the task of segmenting both scales and pointers is accomplished by the Swin-Unet semantic segmentation network. The number of epochs is set to 200 and the batch size is 36. The 95% Hausdorff distance (hd95) and the Dice coefficient (dice) are used to measure the semantic segmentation results of the pointer and scales. The dice and hd95 of the test set predictions, compared against the ground truth, are shown in figure 15. The average dice over the whole test set was 90.23% and the average hd95 was 7.96. Figure 15 shows that the dice of most test predictions is close to 90%, with individual images below 75%; when dice is lower, the corresponding hd95 is higher. The meter images corresponding to the abnormal points in figure 15 are analyzed and shown in figure 16. As can be observed for meter images No. 34, No. 35 and No. 106, the network produces poor segmentation results for the more blurred meter images, with some confusion between the target and background regions. Meter No. 35 has a pointer shadow because the scale is illuminated by oblique light, and the shadow is segmented as part of the true pointer area, resulting in a lower dice. For meter No. 51, the dice of 89.24% is relatively good, but its hd95 is higher. As can be seen from the segmentation results, the scale line near the scale 2.5 is predicted as pointer area and lies far from the pointer position, which produces a larger hd95. For meter No. 106, the shadows of the natural environment are segmented into the pointer pixel area, resulting in a higher hd95. This analysis shows that the proposed Swin-Unet model depends heavily on the data used to train it, and the performance of the algorithm will be affected if the training data represent only some of the possible scenarios. Therefore, increasing the diversity of the data and improving the generalization of the model should be considered in future research.
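For reference, the two evaluation metrics can be sketched on binary masks as follows. This is a simplified illustration and not the evaluation code used in the experiments: the Dice coefficient follows its standard definition, while the hd95 variant below measures distances between all foreground pixels by brute force (library implementations typically work on boundary pixels and use distance transforms) and assumes that both masks are non-empty.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2 * |P intersect T| / (|P| + |T|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks match perfectly
    return 2.0 * np.logical_and(pred, target).sum() / total

def hd95(pred, target):
    """Simplified 95% Hausdorff distance in pixels (both masks non-empty)."""
    p = np.argwhere(np.asarray(pred, dtype=bool)).astype(float)
    t = np.argwhere(np.asarray(target, dtype=bool)).astype(float)
    # Pairwise Euclidean distances between the foreground pixels of both masks.
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=2)
    forward = d.min(axis=1)    # each predicted pixel to its nearest target pixel
    backward = d.min(axis=0)   # each target pixel to its nearest predicted pixel
    return max(np.percentile(forward, 95), np.percentile(backward, 95))

# Usage: two 4x4 squares that overlap by half.
pred = np.zeros((10, 10), dtype=bool)
target = np.zeros((10, 10), dtype=bool)
pred[2:6, 2:6] = True
target[2:6, 4:8] = True
dice = dice_coefficient(pred, target)
distance = hd95(pred, target)
```

The two metrics respond to different failures: a small, distant false-positive region barely changes dice but can raise hd95 sharply, which matches the behaviour reported for meter No. 51.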
This paper sets up the same experimental environment and compares the Swin-Unet semantic segmentation network with other semantic segmentation networks; the results of the comparison are shown in figure 17. In the presence of different illumination, pointer shadowing, and image blur, the network in this paper achieves segmentation results similar to those of the other networks. For the Unet network, the edge segmentation of the target region is better, fitting the scale and pointer regions correctly. However, the pointer appears truncated in the segmentation result of meter d because of a small overexposed area near the tip of the pointer. Pspnet also segments the pointer and scale regions, but the segmented pointer tips are strongly jagged. For Deeplabv3+, the pointer edges and pointer regions are well fitted, but in the scale value segmentation the results for the marked scale region are close to the true value yet deviate somewhat from the marked results. From the Swin-Unet segmentation results in this paper, it can be observed that the segmentation of the pointer and scale edges is not accurate enough, with small jagged edges.

Results of the scale value identification experiment
In this paper, the identification of the scale values is done by image classification. A total of 14 scale categories are involved for the three types of meters, and the dataset for scale classification is obtained by cropping from the results of meter image segmentation. Each category is well differentiated in the confusion matrix results for the test set, but there are some regions where the scale categories are easily confused. In the 2.5 MPa type of meter, the scale values are easily confused with each other. This occurs because the scale values 0.5, 1.5, and 2.5 contain the three other scale values 0, 1, and 2 respectively, so these six scale values interact with each other to produce the results shown in figure 18. For the 0 scale, a certain amount of misclassification also arises, as it resembles the scale values 8 and 10. Similar cases include scale values 6 and 8, scale values 20 and 25, and others. Therefore, this paper can conclude from figure 18   this paper with a fast convergence rate. The Top-1 Acc on the validation set reached 83.91% at the 8th epoch and then gradually converged toward the average accuracy of all networks. This verifies that, for the scale value classification problem, an image classification network built purely from Swin Transformer blocks can also reach the average accuracy.
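The confusion analysis above can be reproduced from predicted and true class labels. The sketch below is only an illustration: the helper names `confusion_matrix`, `top1_accuracy` and `most_confused_pairs` are hypothetical, and class labels are assumed to be integer indices (for example 0 to 13 for the 14 scale categories).

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t, p] counts samples of true class t predicted as class p."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def top1_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return cm.trace() / cm.sum()


def most_confused_pairs(cm, k=3):
    """Up to k (true, predicted, count) triples with the largest off-diagonal counts."""
    off = cm.copy()
    np.fill_diagonal(off, 0)
    order = np.argsort(off, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, off.shape)
    return [(int(i), int(j), int(off[i, j]))
            for i, j in zip(rows, cols) if off[i, j] > 0]
```

Listing the largest off-diagonal entries is one simple way to surface pairs such as 0.5 and 0 that share visual content.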

Results of the pointer fitting experiment
Pointer detection is essential for meter reading recognition [42]. Pointer recognition methods mainly include the following. Silhouette methods compare the image to be detected with a template image to obtain the position of the meter pointer with respect to the initial pointer. The Hough line detection method segments the meter image by threshold segmentation and edge detection, and then fits the pointer line equation by Hough line fitting. Object detection methods use an object detection network to detect the area where the meter pointer is located and determine the pointer line equation from the diagonal of the rectangular box enclosing the pointer. Semantic segmentation methods obtain the pointer region with a semantic segmentation network and then fit the pointer line equation to its pixels. In order to verify the effectiveness of the proposed LSC algorithm, it is compared with the existing pointer line equation fitting methods, and the comparison results are shown in figure 20.
As shown in figure 20, different pointer fitting methods are used, including Hough line detection, target detection, and threshold segmentation methods.By comparing the results, it can be found that the proposed algorithm has high robustness and fits the pointer linear equations of various meter images well.
In the experiments on Hough line detection, the parameters of threshold segmentation and Hough line detection are set manually. With fixed parameters, the Hough line detection method works well for elongated pointers, such as No.f. For pointers with wide pixel areas and short edge lines, the fit is not ideal and multiple fit results are possible. Hough line detection is also limited by the results of image binarization. For No.g, light produces different reflections in different image regions, which invalidates the fixed threshold segmentation parameter. For No.e and No.k, interference factors in the background of the meter image, such as the acquisition time and the pointer shadow, cause multiple lines to be fitted, which introduces errors into the final line fitting result. Therefore, the Hough line detection method is highly limited and less robust, and is not suitable for real-world scenarios.
In the experiment on target detection, the tip of the pointer was examined. It can be observed that the deep learning method is robust and detects the pointer well in images from different environments. However, because the rectangular detection boxes differ in size, the resulting diagonal is offset. In the extreme cases where the pointer is parallel or perpendicular to the horizontal axis, the diagonal of the detection box is not the best pointer line equation, as in No.j and No.k. In the threshold segmentation experiment, the line equation of the pointer is fitted accurately, but the best segmentation threshold must be found for each image in order to obtain the pointer pixel region accurately. This method is therefore limited and difficult to apply in industrial production.
In the experiments based on semantic segmentation, the least squares method is used to fit all pixels of the pointer to complete the pointer line equation fit, which has no additional parameters to set and is more robust than the alternative methods. However, if all pixels are fitted, the accuracy of the pointer line equation is determined by the accuracy of the semantic segmentation result. Moreover, because the pointer contains a large number of pixels, fitting with all of them takes additional time. The proposed LSC algorithm therefore eliminates the jaggedness of the pointer segmentation result, reduces the number of pixels to be computed, and eliminates the pixel branches of the pointer skeleton. It can be observed that the algorithm in this paper accurately completes the pointer fitting of the meter images in each case.
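The all-pixel baseline described above can be sketched as follows. This is not the LSC algorithm and not necessarily the exact least squares variant used in the experiments: as an assumption, the sketch uses a total least squares fit (principal direction from an SVD of the centred pixel coordinates) instead of regressing y on x, so that vertical pointers are handled without a special case. The function name `fit_pointer_line` is hypothetical.

```python
import numpy as np


def fit_pointer_line(mask):
    """Fit a*x + b*y + c = 0 to all foreground pixels of a binary pointer mask.

    Returns (a, b, c) with (a, b) a unit normal of the fitted line.
    """
    ys, xs = np.nonzero(np.asarray(mask))
    if xs.size < 2:
        raise ValueError("need at least two pointer pixels")
    pts = np.column_stack([xs, ys]).astype(float)
    centroid = pts.mean(axis=0)
    # The first right-singular vector is the direction of largest variance.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    dx, dy = vt[0]
    a, b = -dy, dx  # normal vector is perpendicular to the direction
    c = -(a * centroid[0] + b * centroid[1])
    return a, b, c
```

Because every foreground pixel contributes equally, a falsely segmented shadow or scale line pulls the fitted line away from the pointer, which is the sensitivity to segmentation quality noted above.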
In order to verify the effectiveness of the LSC pointer detection algorithm, it is applied to the results of different semantic segmentation networks, and the comparison results are shown in figure 21. The experiments show that the LSC algorithm is robust and can still fit the line equation of the pointer well for the Deeplabv3+ and Swin-Unet networks, whose segmentation is poorer. The line equation can also be fitted well when the segmented pointer region is defective. For example, in image No.d, the Unet segmentation result contains a broken pointer region, yet the line equation of the pointer is still fitted well. Therefore, when deploying in real-time application tasks or on resource-limited devices, a suitable semantic segmentation network can be selected and the LSC algorithm used to fit the pointer from its segmentation results.

Comparative experiment to calculate meter readings
In this paper, considering the influence of multiple scale values adjacent to the pointer on the reading, the WAM algorithm is proposed to calculate the meter reading using the angles between the pointer line and the adjacent scale values. To evaluate the accuracy of the reading results, this paper uses the relative error δ and the quoted error η of the meter reading as evaluation indicators. The calculation formula is shown in equation (9), where $V_{true}$ indicates the true value read manually, $V_{test}$ indicates the test value read by the algorithm, and $V_{max}$ indicates the maximum range of the meter.
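As an illustration of the two indicators, the sketch below assumes the conventional definitions, δ = |V_test − V_true| / V_true × 100% and η = |V_test − V_true| / V_max × 100%. Equation (9) is not reproduced in this section, so these forms are an assumption rather than a quotation, and the function names are hypothetical.

```python
def relative_error(v_true, v_test):
    """Relative error in percent: deviation scaled by the true reading."""
    if v_true == 0:
        raise ValueError("relative error is undefined for a true value of 0")
    return abs(v_test - v_true) / abs(v_true) * 100.0


def quoted_error(v_true, v_test, v_max):
    """Quoted error in percent: deviation scaled by the full range of the meter."""
    return abs(v_test - v_true) / v_max * 100.0
```

For example, a manual reading of 1.0 MPa and an algorithm reading of 1.1 MPa on a 2.5 MPa meter would give δ of about 10% and η of about 4% under these definitions.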
In the manual reading, the estimated reading is controlled to 1/5 of the minimum scale spacing, according to the pressure gauge reading specification. The WAM algorithm of this paper and the traditional angle method were applied to the meter images of figure 17, and their relative and quoted errors with respect to the manual readings are compared in table 2. The results show that for the No.b and No.c meter images, the error of this paper's algorithm is larger than that of the traditional angle method. The analysis shows that these two images were acquired with an angular tilt. In the tilt direction, the unit angle of the scale near the tip of the pointer tends to change less, and the unit angle of the scale far from the pointer changes more. Because the traditional angle method considers only the two adjacent scales, whereas this paper considers multiple adjacent scales, the traditional angle method obtains a lower reading error for these images. Overall, after using the WAM algorithm, the calculated meter reading is closer to the true value of the meter, and the relative and quoted errors are reduced by about 35% compared with the traditional algorithm.
In order to verify the robustness of the algorithm in this paper, it is applied to the results of different semantic segmentation networks, and its readings are compared with the manual readings. The comparison results are shown in table 3. The proposed algorithm adapts well to the meter segmentation results of different semantic segmentation networks and obtains a small meter reading error.
In order to verify the performance of this paper's algorithm when processing low-quality images, this paper compares the reading results for three kinds of low-quality images: blurred, overexposed, and dark. The low-quality images are shown in figure 22 and the reading comparison is shown in table 4. For low-quality images, the performance of the algorithm degrades, and the relative and quoted errors of the meter readings vary around 10%. For most normal images, however, the calculated readings have high accuracy.
To verify the soundness of the proposed algorithm, this paper compares it with similar algorithms for meter reading. Ma and Jiang [43] used a modified random sample consensus (RANSAC) algorithm to detect the meter pointer and eliminate pointer shadowing, then calculated the center of rotation by fitting a circular region of pointer rotation, and finally calculated the meter reading using the angle between the 0 scale value and the pointer. Gao et al [50] designed a HOG multiclassification SVM numerical classifier to detect the gauge scale values, fitted the gauge pointer using the Progressive Probabilistic Hough Transform algorithm, and finally calculated the automotive dashboard readings using the angle between the scale values adjacent to the pointer. Ji et al [4] used a threshold segmentation method to obtain the pixel values in the root region of the pointer, and then determined the line equation of the pointer by fitting an external ellipse to the region. However, this method needs to calculate the optimal segmentation threshold for each image when detecting the pointer, which is problematic in practical applications.
In this paper, we fit the linear equation of the pointer based on the LSC algorithm, which does not require additional parameters and has better automation performance.This paper also uses the meter image correction algorithm proposed by Ji et al combined with the meter reading algorithm in this paper to calculate the readings of the corrected meter images.The above method is compared with the method of this paper on the meter image in figure 17.
As shown in table 5, the reading errors of the different algorithms are compared. The algorithm proposed by Ma et al calculates the reading from the angle between the 0 scale and the pointer. It therefore relies on less information, and the angle between the 0 scale and the pointer is larger, resulting in a larger error in the calculated meter reading, as in the results for No.e, No.f and No.g. Gao et al used the angle between the two scales adjacent to the pointer, which reduces the angle between the pointer and the scale and gives more accurate readings. However, the time consumption for calculating each image is unstable, resulting in a higher final time consumption. Ji et al calculated the meter readings on corrected meter images and also considered the effect of the scales adjacent to the pointer, obtaining better readings than on uncorrected meter images. For the method proposed in this paper, firstly, the LSC and WAM algorithms are used on the uncorrected meter image, and the results are better than those of Ma et al and Gao et al, with an error reduction of about 50%. Secondly, using the LSC and WAM algorithms on the corrected meter image reduces the reading error by about 40% compared with the previous results. Therefore, when calculating meter readings, several scales adjacent to the pointer should be considered and meter images collected in the natural environment should be corrected, which yields more accurate final readings.
As shown in table 6, the time consumption of the algorithm in this paper is compared with that of the other algorithms. While the accuracy of the readings is guaranteed, the time consumption of the algorithm in this paper is high, so it is suited to environments with less stringent timeliness requirements. For the algorithm of Gao et al, the time consumption is not uniform, because it varies with the number of lines detected by the Hough line fitting algorithm in different meter images.
As shown in table 7, this paper also gives the time consumed by each process when calculating the meter readings. On the one hand, more time is consumed in the segmentation of pointers and scales. On the other hand, the proposed method has multiple computational steps, including semantic segmentation, pointer fitting, and reading calculation. Therefore, subsequent research will consider reducing the complexity of the semantic segmentation algorithm and further optimizing the meter reading calculation process.

Conclusion
In this paper, based on the Swin-Unet semantic segmentation network, both the scale and the pointer pixel region are obtained.The numerical value of the scale is recognized by the Swin Transformer image classification network and the center point coordinates of the scale are determined.The LSC algorithm is used to fit the pointer straight line equation.
Considering the influence of multiple scales on the reading of the meter, the WAM algorithm is proposed to calculate the reading. The accurate results obtained verify that multiple scales adjacent to the pointer need to be considered when calculating the meter reading, which improves the accuracy of the results. The LSC pointer detection algorithm in this paper is more robust than the traditional method, and it can accurately fit the line equation of the pointer in different semantic segmentation results. Based on the results of manual readings, the WAM algorithm reduces the error by 25% compared with the traditional angle method. Compared with existing methods, readings from uncorrected meter images show a 30% reduction in error and readings from corrected meter images show a 50% reduction in error. The pointer meter reading algorithm proposed in this paper requires fewer manually set parameters and has high application value in real environments. However, this paper also has certain shortcomings: the overall time consumption of the algorithm is long. Future studies will continue to optimize the model so that meter readings can be calculated in a shorter time. The algorithm in this paper applies to the calculation of single-pointer circular meter readings, and the reading calculation of multi-pointer meters will be considered in subsequent work. The proposed algorithm is still in the theoretical research stage, and some real-world variables may arise in moving from theoretical research to practical applications. For example, to address the degraded performance of the algorithm on low-quality images, the generalization performance can be improved by increasing the diversity of the dataset. To address the problem that the meter reading calculation process is complex and can only handle specific meter types, subsequent research will consider whether a more automated method can be designed to regress meter readings directly from meter images.

Figure 1. Flow chart for automatic reading of pointer-type meter.

Figure 3. Segmentation results for different types of meters in the Swin-Unet network.

Figure 4. Results of the center coordinate localization for different values of the scale.

Figure 5. Recognition of scale and display of pointer rotation center results.

Figure 6. Result of obtaining the pointer pixel area.

Figure 8. A schematic of the center points of all pixels in different rings.

Figure 9. Interpretation plots of the results of the pointer line equations fitted by different methods.
indicates the final reading of the meter, $P_l$ and $P_r$ are the scale values adjacent to the left and right of the pointer, α is the angle between the line connecting the center of rotation to the tip of the pointer and the line connecting the center of rotation to the smaller of the scale values $P_l$ and $P_r$, and β is the angle between the lines connecting the center of rotation to the two adjacent scale values. A schematic diagram of the angle method is shown in figure 11.
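The angle method defined by these symbols can be sketched as a linear interpolation between the two adjacent scale values. The formula itself is not reproduced in this section, so the form used here, reading = P_small + (α / β) · (P_large − P_small), is an assumption inferred from the definitions of α and β; the function names and the point arguments are hypothetical. This is the two-scale angle method, not the WAM algorithm.

```python
import math


def angle_between(center, p, q):
    """Unsigned angle in radians between the rays center->p and center->q."""
    v1 = (p[0] - center[0], p[1] - center[1])
    v2 = (q[0] - center[0], q[1] - center[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.atan2(abs(cross), dot)


def angle_method_reading(center, tip, small_pt, large_pt, p_small, p_large):
    """Interpolate the reading between two adjacent scale values.

    small_pt / large_pt are the centroids of the adjacent scale marks with the
    smaller and larger value; the pointer tip is assumed to lie between them.
    """
    alpha = angle_between(center, tip, small_pt)
    beta = angle_between(center, small_pt, large_pt)
    if beta == 0:
        raise ValueError("adjacent scale marks must not be collinear with the center")
    return p_small + alpha / beta * (p_large - p_small)
```

For instance, with the two scale marks 90° apart and the pointer halfway between them, the reading is the midpoint of the two scale values.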

Figure 11. Diagram of the angle method.

Figure 13. Schematic diagram of data collection and issues to be addressed.
, 2.5 MPa, 10 MPa and 25 MPa, each containing one pointer and six scales. The dataset consists of 770 meter images with an image size of 224 × 224 × 3, of which 615 are in the training set and 155 in the test set. The tiny model provided by Swin-Unet, with input size 224 × 224 × 3, is used in this paper. In each step of calculating the meter readings, a diverse selection of meter images is compared with existing methods to verify the effectiveness of the algorithms in this paper. Image capture equipment: fixed security-style, rotatable cameras. Image labeling tool: labelme. Hardware and software environment: one NVIDIA A40 graphics card, 64-bit Windows 10 operating system, 8 GB RAM, Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz, AMD Radeon(TM) 530 (2048 MB), Python 3.8.1, PyTorch 1.11.0 and OpenCV 4.2.0.

Figure 14. Transformation trend of loss values in the Swin-Unet training set.

Figure 15. Comparison of hd95 and dice results for each image in the test set.

Figure 16. The meter image and segmentation results corresponding to the anomalies in figure 15.

Figure 17. Segmentation results for scales and pointers of different semantic segmentation networks.

Figure 18. Plot of the resulting confusion matrix produced by the test set in a scale value classification network.

Figure 19. Top-1 Acc comparison results of test sets for scale value classification on different image classification networks.
that in scale classification experiments, if a meter contains scales from another class of meters, this may lead to a decrease in the accuracy of scale identification. Using the same experimental environment and experimental parameters, the Swin Transformer used in this paper is compared with the EfficientNet, RegNet, ResNet, VggNet, and ViT image classification networks, with each network model using its minimal network structure. The Top-1 Acc results for each network on the validation set are shown in figure 19. The EfficientNet network was the most effective overall, achieving 95.62% accuracy on the test set. The VggNet network was less effective, achieving 89.77% accuracy on the test set. The Swin Transformer network achieved 91.09% accuracy on the test set, close to the average accuracy of all networks. Compared with other network models, the Swin Transformer network used in

Figure 20. Comparison of the results of the pointer linear equation fit.

Figure 21. Comparison results of pointer fitting of the proposed algorithm in different semantic segmentation results.

Table 1. Comparison table of evaluation metrics for different semantic segmentation networks.
The scale images were resized to 224 × 224 × 3. In order to produce balanced samples for each scale category, a scale classification dataset of 13 562 images is constructed in this paper by cropping the scale images from different semantic segmentation results. During training, the dataset is divided into a training set of 10 844, a test set of 1369, and a validation set of 1349, with the number of epochs set to 100, the optimizer set to AdamW, and the batch size set to 32. This paper takes the scale classification model that performed best on the validation set and evaluates it on the test set, obtaining 91.09% for Top-1 Acc, 98.61% for Top-5 Acc and 90.92% for Mean Precision. As shown in figure 18,

Table 2. Comparison of manual readings with the results of the angular method and WAM algorithm readings.

Table 3. Comparison results of the readings of this paper's algorithm applied to different semantic segmentation methods.

Table 4. Comparative results of low quality image readings.

Table 5. Comparison results of reading errors between the algorithm in this paper and other methods (C+WAM: WAM algorithm is used on the corrected meter image).

Table 6. Results of the comparison of the time consumed by the algorithm in this paper and other methods (C+WAM: WAM algorithm is used on the corrected meter image).

Table 7. Time consumption of each process in the meter reading calculation.