An Improved Convolutional Neural Network for Particle Image Velocimetry

With the wide application of Particle Image Velocimetry (PIV) technology in various engineering and research fields, the requirements for the accuracy, computational efficiency, and robustness of PIV algorithms are increasing. Although traditional algorithms are widely applicable, they suffer from low accuracy, high computational cost, and poor robustness. Recently, deep learning algorithms have provided new solutions; in particular, convolutional neural networks with various structures have achieved good performance on synthetic PIV datasets. This paper proposes a structural improvement scheme for PIV convolutional neural network models. Experiments verify that the proposed method significantly improves model performance on synthetic PIV datasets, providing a novel approach for improving other convolutional neural networks for PIV analysis.


Introduction
Particle Image Velocimetry (PIV) is a non-contact measurement technique that has been extensively applied in explosion analysis, combustion analysis, fluid dynamics analysis in aerospace and marine engineering, biofluid analysis, industrial process measurement, and other fields [1][2][3]. The principle of PIV is first to add tracer particles to the fluid or exploit the self-luminous characteristics of the fluid, then use a pulsed laser to illuminate the particles within the measurement volume and capture images at different times, and finally analyze the resulting image pairs. The core of PIV algorithms is the matching of similar regions or features between images. Flow characteristics can be obtained by calculating the displacement of particles [4]. Traditional PIV algorithms rely heavily on statistical feature matching of regions or on brightness constancy assumptions, which limits their accuracy under method constraints and noise. These algorithms are also computationally expensive and lack robustness [5][6][7][8]. In recent years, deep learning methods, especially supervised learning methods, have made it possible to learn optical flow estimation automatically from data [9][10][11]. These methods achieve fast, robust, high-precision optical flow estimation and can be adapted to PIV analysis with performance superior to traditional PIV algorithms [12][13][14].
Our work focuses on developing an effective architecture to optimize the performance of existing neural network models for optical flow tasks and applying it to PIV analysis. This paper introduces a novel neural network architecture named PIV-NetS-Plus that taps the potential of the model's own feature representations, thereby improving its performance on PIV data. The rest of the paper is organized as follows: Section 2 reviews work related to ours. Section 3 describes the methods used in this work to improve model performance. Section 4 presents the performance of the improved network structure on synthetic PIV datasets and validates the efficacy of each introduced module through ablation experiments. Section 5 concludes the paper.

Traditional PIV algorithms
Traditional PIV algorithms can be mainly divided into cross-correlation and optical flow algorithms. Cross-correlation algorithms [15] use the pixel information of corresponding regions in two images of the flow field to measure their relative displacement, focusing on the matching of corresponding patterns or regions in image pairs. Such an algorithm first divides the two images into subregions of a specified size, i.e., interrogation windows. The correlation coefficient is then calculated to measure the correlation between subregions in the two images, and the displacement is taken at the correlation maximum. In [8], an iterative multi-grid algorithm with deformable windows (WIDIM) was proposed for cross-correlation, which first calculates the offsets in larger windows and then iteratively recomputes them in smaller windows pre-offset by the previously obtained vector field [16]. This improves accuracy when displacement scales vary widely. In general, cross-correlation algorithms compute within windows and obtain sparse, averaged displacement information with large overlaps between windows, resulting in high computational cost. Optical flow algorithms are mostly variational: variational optical flow estimation casts the problem as minimizing an energy functional. Horn and Schunck proposed in 1981 to obtain the displacement value at each pixel by minimizing a global energy function [6]. This global energy function combines a data term based on the brightness constancy assumption with a smoothness term. By optimizing the global loss function, dense displacement information can be obtained, but the results are over-smoothed and lose edge information. Much subsequent research has extended the energy formulation and achieved better performance [17]. These traditional methods have been widely used in PIV research and are applicable to most scenarios. However, traditional optical flow algorithms are susceptible to noise interference and have limited measurement accuracy and poor robustness.
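The window-matching idea behind cross-correlation PIV can be sketched in a few lines. The following is a minimal illustration only (no sub-pixel peak fitting, no window deformation as in WIDIM); the function name and the toy single-particle image are our own constructions, not from the cited works. The displacement of a window is estimated by locating the peak of the circular cross-correlation map, computed via the FFT:

```python
import numpy as np

def window_displacement(win_a, win_b):
    """Estimate the mean displacement between two interrogation windows
    by locating the peak of their cross-correlation map (computed via FFT).
    Sketch of the basic idea only: no sub-pixel fit, no window deformation."""
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    # Circular cross-correlation via the Fourier transform.
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    corr = np.fft.fftshift(corr)              # move zero displacement to the center
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    center = np.array(corr.shape) // 2
    dy, dx = np.array(peak) - center
    return int(dx), int(dy)

# Toy example: a single bright "particle" shifted by (dx, dy) = (3, 2) pixels.
img_a = np.zeros((32, 32))
img_a[10, 12] = 1.0
img_b = np.roll(np.roll(img_a, 2, axis=0), 3, axis=1)  # shift down 2, right 3
print(window_displacement(img_a, img_b))  # (3, 2)
```

In practice each image is tiled into many overlapping interrogation windows and this estimate is computed per window, which is exactly why the resulting vector field is sparse and the computation is expensive.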

PIV analysis based on Deep Learning
Benefiting from their inherent position encoding, convolutional neural networks (CNNs) have achieved success in optical flow estimation, providing many fast and accurate end-to-end models for this purpose. State-of-the-art deep learning optical flow algorithms often adopt cost volume modules and coarse-to-fine pyramid frameworks to obtain better performance [9]. The cost volume introduces prior knowledge and constrains the solution space. Our work, by contrast, focuses on structures that learn such matching knowledge on their own, which leaves more capacity to fit real conditions. The pyramid structure decomposes the task and makes better use of multi-scale information to improve model performance, and is thus widely adopted. Since the principle of PIV is consistent with that of optical flow estimation, deep learning optical flow algorithms can be modified and applied to PIV analysis. The early FlowNetS optical flow model [9] was first adapted to PIV analysis in [12]. This end-to-end model can generate dense displacement fields. In [12], synthetic PIV data were also created for supervised learning; these are now used as benchmark datasets in most PIV neural network studies. PIV-LiteFlowNet [13] adapted the LiteFlowNet [10] optical flow model to PIV by introducing a cost volume and a pyramid structure, improving performance on synthetic datasets. However, real PIV data are often difficult or costly to obtain, and synthetic data often fail to fully reflect real situations; supervised learning on synthetic data therefore tends to lead to poor generalization. Based on the unsupervised learning method proposed for optical flow estimation in [18], an unsupervised model, Un-LiteFlowNet, was proposed for PIV in [14], but its performance is limited by the lack of sufficient supervision. In optical flow estimation, a semi-supervised model combining supervised and unsupervised learning was proposed in [19], which may become a future research direction for PIV by integrating the advantages of both. Since the FlowNetS network has a large model size, its solution space is large as well. FlowNetS uses a structure comprising downsampling, upsampling, and skip connections, which makes full use of both low-level and high-level information, but previous research did not fully tap the potential of its representations. The architecture of FlowNetS is shown in Figure 1 and Figure 2. In the downsampling part, the network gradually reduces the resolution with convolutional layers while increasing the number of feature channels, so that features from the two images can be extracted and matched, yielding a final feature scale of 1/64 of the original resolution with 1024 channels. The upsampling part is also the refinement part. It adopts a coarse-to-fine structure and directly predicts staged optical flow outputs from the distilled features, the upsampled previous flow prediction (when available), and the low-level features passed from the downsampling path through skip connections. The final output after bilinear upsampling is used as the prediction of the ground truth.
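To make the cost volume idea mentioned above concrete, the sketch below computes a local cost volume in the style used by LiteFlowNet/PWC-Net-type models: every feature vector in the first feature map is correlated with the second map's features over a small search range. This is an illustrative reimplementation under our own naming, not code from the cited models:

```python
import numpy as np

def cost_volume(feat1, feat2, max_disp=2):
    """Local cost volume: for every position in feat1, correlate its feature
    vector with feat2's features at all displacements within +/- max_disp.
    feat1, feat2: (H, W, C) feature maps. Returns (H, W, D*D), D = 2*max_disp+1."""
    H, W, C = feat1.shape
    D = 2 * max_disp + 1
    # Zero-pad feat2 so displaced lookups near the border stay in bounds.
    padded = np.pad(feat2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    cv = np.empty((H, W, D * D))
    idx = 0
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            shifted = padded[max_disp + dy : max_disp + dy + H,
                             max_disp + dx : max_disp + dx + W, :]
            cv[:, :, idx] = (feat1 * shifted).sum(axis=-1) / C  # normalized dot product
            idx += 1
    return cv

rng = np.random.default_rng(0)
feat1 = rng.random((8, 8, 16))
cv = cost_volume(feat1, np.roll(feat1, 1, axis=1))  # feat2 = feat1 shifted right by 1
print(cv.shape)  # (8, 8, 25)
```

The fixed search range is precisely the prior that constrains the solution space; a network such as FlowNetS, which learns matching implicitly from concatenated inputs, carries no such hard constraint.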
The convolutional neural network proposed here, named PIV-NetS-Plus, is based on the FlowNetS-BN (FlowNetS with batch normalization) structure from the latest version for optical flow estimation, which adds one more layer, conv6_1, and one more output flow predicted from conv6_1. Through appropriate structural modifications and the introduction of submodules, we adapt it to PIV analysis, tap into the model's latent representational capacity, and obtain encouraging performance on synthetic PIV data. The modifications are mainly in the following two aspects:
• Convex Upsampling: Since the predicted optical flows of the original network are output at 1/64, 1/32, 1/16, 1/8, and 1/4 of the original resolution, the convex upsampling method [11] is adopted to learn, from the final feature layers, the convex combination relating missing flow values to nearby flow values, and to upsample the output flow with the corresponding convex operations, as shown in Figure 3. This method is applied at all five places where transposed convolution was used before modification or where the final flow is output. Compared with transposed convolution, bilinear upsampling [9], and de-convolution [12], it alleviates information loss, reduces redundant computation during upsampling, makes full use of the information surrounding the position to be solved, and tends to perform better.
• Squeeze-and-Excitation Block: The squeeze-and-excitation block was proposed in [20] to learn the relationships between feature channels and focus on the important ones; it plays the role of an attention mechanism in CNN models and effectively boosts performance. In principle, the squeeze part obtains a global compressed feature for each channel by applying global average pooling over that channel's feature map. The excitation part adopts two fully connected layers to generate a weight for each channel. After re-weighting each channel with the generated weights, the weighted feature map is used as input to the next module, as shown in Figure 4. This block is adopted to direct the focus of the layers onto important features after every residual block of the downsampling part and after every de-convolution block.
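The convex upsampling operation described above can be sketched as follows: each fine-grid flow value is a convex (softmax-weighted) combination of the 3x3 coarse-grid neighborhood, in the style introduced by RAFT [11]. In the real network the combination logits are predicted by a convolutional layer from the final features; in this hedged, numpy-only sketch they are simply passed in as an argument:

```python
import numpy as np

def convex_upsample(flow, mask_logits, k=4):
    """Upsample a coarse flow field by factor k. Each fine-grid value is a
    convex combination (softmax weights) of the 3x3 coarse neighborhood.
    flow: (H, W, 2); mask_logits: (H, W, k, k, 9). Returns (H*k, W*k, 2)."""
    H, W, _ = flow.shape
    w = np.exp(mask_logits - mask_logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)                # softmax -> convex weights
    padded = np.pad(flow * k, ((1, 1), (1, 1), (0, 0)))  # flow magnitudes scale with k
    up = np.zeros((H * k, W * k, 2))
    for y in range(H):
        for x in range(W):
            nb = padded[y:y + 3, x:x + 3].reshape(9, 2)  # 3x3 neighborhood: (9, 2)
            # (k, k, 9) @ (9, 2) -> (k, k, 2) fine-grid values for this coarse cell
            up[y * k:(y + 1) * k, x * k:(x + 1) * k] = w[y, x] @ nb
    return up

coarse = np.ones((4, 4, 2))           # uniform flow of 1 px at the coarse scale
logits = np.zeros((4, 4, 4, 4, 9))    # zero logits -> uniform convex weights
fine = convex_upsample(coarse, logits)
print(fine.shape)  # (16, 16, 2)
```

Because every output is a convex combination of true neighboring flow values, the upsampled field cannot overshoot its neighborhood, unlike a transposed convolution with unconstrained weights.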
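The squeeze-and-excitation computation described above is compact enough to show end to end. This is a generic numpy sketch of the block from [20], not our trained network's code; the weight matrices would normally be learned, and here they are random inputs:

```python
import numpy as np

def se_block(feat, w1, b1, w2, b2):
    """Squeeze-and-excitation reweighting of a (C, H, W) feature map.
    Squeeze: global average pooling per channel -> (C,).
    Excite: two fully connected layers (ReLU, then sigmoid) -> per-channel
    weights in (0, 1), which rescale the original channels."""
    s = feat.mean(axis=(1, 2))                      # squeeze: (C,)
    h = np.maximum(w1 @ s + b1, 0.0)                # FC + ReLU: (C/r,)
    z = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))        # FC + sigmoid: (C,)
    return feat * z[:, None, None]                  # excite: reweight channels

rng = np.random.default_rng(0)
C, r = 16, 4                                        # channels, reduction ratio
feat = rng.standard_normal((C, 8, 8))
w1, b1 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C)
out = se_block(feat, w1, b1, w2, b2)
print(out.shape)  # (16, 8, 8)
```

Note that the sigmoid keeps every channel weight in (0, 1), so the block can only attenuate channels relative to one another; this soft gating is what lets it steer the layers toward the informative feature channels.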

Datasets and assessment indicators
The synthetic PIV dataset used in our work is from [12]. The generation process is as follows: synthetic example images and flow motion patterns are first generated, then the coordinates of the particles are updated according to the flow motion. According to [12], particles can be described by a two-dimensional Gaussian function. The flow motion patterns in this dataset come from CFD simulations implemented in [12] and from online sources, as shown in Table 1. The generated particle images use different seeding densities, particle diameters, and peak intensities. More details about this dataset can be found in [12]. The evaluation uses the averaged endpoint error (AEE) between the model output and the ground truth as the metric [14]. The algorithms and neural networks used for comparison are WIDIM [13], the window deformation iterative multi-grid method; HS [13], the Horn-Schunck optical flow algorithm; PIV-Net-noRef [12]; PIV-NetS [12]; PIV-LiteFlowNet [13]; PIV-LiteFlowNet-en [13]; and UnLiteFlowNet-PIV [14], introduced in Section 2. Model performance is evaluated on the Backstep, Cylinder, JHTDB Channel, DNS Turbulence, and SQG test datasets. For easier comparison, the unit of error in Table 2 and Table 3 is pixel per 100 pixels.
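The AEE metric used for evaluation is simply the mean Euclidean distance between predicted and ground-truth displacement vectors, which can be stated in a few lines (the toy flow fields below are illustrative, not from the dataset):

```python
import numpy as np

def aee(pred, gt):
    """Averaged endpoint error: mean Euclidean distance between predicted
    and ground-truth displacement vectors. pred, gt: (H, W, 2) flow fields."""
    return np.sqrt(((pred - gt) ** 2).sum(axis=-1)).mean()

gt = np.zeros((4, 4, 2))
pred = np.full((4, 4, 2), [3.0, 4.0])  # every vector off by (3, 4) -> error 5
print(aee(pred, gt))  # 5.0
```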
Compared with existing models, the proposed PIV-NetS-Plus achieves superior performance, surpassing current deep learning PIV models on most datasets. The results are shown in Table 2. Figure 5 visualizes the result of PIV-NetS-Plus on the Cylinder test dataset. Judging from the results, using each component independently already improves performance on all test datasets. However, convex upsampling brings larger gains on the complex datasets (JHTDB Channel, DNS Turbulence, and SQG), while the squeeze-and-excitation block gains more on the simpler datasets (Backstep and Cylinder), suggesting that the former component refines fine details while the latter captures the larger-scale structure. From this perspective, the functions of the two components are complementary and jointly boost the overall model performance.

Conclusion
A new PIV neural network is proposed, based on structural improvements and newly adopted components. Specifically, convex upsampling is adopted to handle micro-level details in the output, and a squeeze-and-excitation block is introduced to adjust the macro-level focus of the model and guide its overall optimization. In this way, the two components work synergistically to optimize the model's performance while preserving its learning capacity. Moreover, the model's representational potential is more fully utilized, resulting in impressive performance even on complex synthetic PIV data. Our work also provides a novel path for designing PIV optical flow estimation neural network models in future research.

Figure 5. Result of the Cylinder test dataset

Table 1. Synthetic PIV Dataset in [12]

In this work, our neural network model is trained with the Adam optimizer, with the learning rate set to 0.001. A reduce-on-plateau learning rate strategy is applied, reducing the learning rate to 1/5 of its value when the evaluation metric does not decrease for 10 epochs. We train for 200 epochs on the training datasets of all categories. The batch size is set to 8. The weights for the different predicted flows are set to [0.00125, 0.005, 0.01, 0.02, 0.08, 0.32]. The grey values of the PIV image pairs are normalized to [0, 1] before being input to the network.
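The reduce-on-plateau schedule described above can be sketched as a small stand-alone class (our own minimal illustration of the strategy, not the actual training code; in practice a framework scheduler such as PyTorch's would be used):

```python
class ReduceOnPlateau:
    """Minimal sketch of the reduce-on-plateau strategy described above:
    if the validation metric has not improved for `patience` epochs,
    multiply the learning rate by `factor` (here 1/5)."""
    def __init__(self, lr=0.001, factor=0.2, patience=10):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, metric):
        """Call once per epoch with the evaluation metric; returns current lr."""
        if metric < self.best:
            self.best, self.bad_epochs = metric, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

sched = ReduceOnPlateau()
lrs = [sched.step(1.0) for _ in range(11)]  # metric stuck at 1.0 for 11 epochs
print(lrs[0], lrs[-1])                      # lr drops by 1/5 after 10 flat epochs
```

The per-scale flow weights listed above play a separate role: the total training loss is the weighted sum of the endpoint errors of the six staged predictions, with the largest weight (0.32) on the finest-resolution output.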

Table 3 shows the results of the ablation experiments. In the table, C. stands for the model with convex upsampling, S.E. for the model with the squeeze-and-excitation block, w/o C. or S.E. for the model with neither, and C. + S.E. for the model with both.

Table 3. Averaged Endpoint Error for Models with Different Modifications (Unit: pixel/100)