A robust pairing method for two-pulse particle tracking velocimetry based on coherent point drift

Particle tracking velocimetry has reached a high level of maturity in time-resolved measurements since the introduction and development of the Shake-The-Box algorithm. The effectiveness of this approach lies, in part, in its ability to exploit the temporal coherence of particle trajectories to reject ghost particles while increasing the density of true particles. However, certain situations may prevent time-resolved measurements. In those cases, a two-pulse configuration is often the only option. This raises a challenge regarding the capacity to separate ghost from true particles due to the lack of long-term trajectories. This article proposes a new approach to solve this problem using the coherent point drift (CPD) method. This method identifies a spatially coherent deformation field that models the transformation between two correlated sets of points. In the context of particle tracking velocimetry, the imposed spatial coherence of this calculation is believed to act in the same way as the temporal coherence that made Shake-The-Box successful. The CPD is governed by three parameters whose optimal values have been evaluated in the present contribution. These values were found to be weakly sensitive to the characteristics of the flow under study, ensuring that the method is robust without further tuning of the parameters. The method is then compared with the two-pulse implementation of Shake-The-Box (2P-STB) available in DaVis 10.2. For this purpose, sets of realistic images were generated at two successive times for different configurations based on synthetically generated turbulent flows. The Iterative Particle Reconstruction in DaVis 10.2 was then used to extract the list of particles to be processed by CPD. The comparison shows a better recall with 2P-STB than with CPD, especially for large time intervals between frames, but an overall better rejection of ghost particles by CPD, which was the expected benefit of this method.


Introduction
Turbulent flows, characterized by their chaotic and complex nature, challenge all aspects of fluid mechanics research, including modeling, numerical simulation, and experimental characterization. The former two aspects nonetheless depend on the latter, since both modeling and simulation require precise measurements for verification. The range of techniques that can be utilized for such measurements is too vast to be summarized in this paper. Instead, this article focuses on particle tracking velocimetry (PTV), which has benefited from major improvements in the last decade [1,2] and has become the reference for time-resolved three-dimensional velocity measurements. These improvements partly concern particle detection in the images, starting from the volume self-calibration [3], which allows for accurate detection, for instance with the iterative particle reconstruction (IPR) [4,5], the use of stochastic reconstructions [6,7], or the introduction of advanced camera model corrections [8]. Arguably, the greatest improvement has been achieved with the Shake-The-Box method [9,10], which involves an iterative coupling between particle detection in the images and particle trajectory reconstruction in time. However, the original formulation of this method is only appropriate for time-resolved measurements, which are not feasible in air at moderate speeds due to the limited energy available from laser pulses at high repetition rates and the small diameter of the seeding particles.
Moderate- to high-speed conditions are prevalent in aerodynamic applications, and strategies have been devised to address them using dual-cavity lasers for two-pulse measurements by extending existing two-dimensional methods. The tomographic particle image velocimetry techniques (Tomo-PIV), which rely on algebraic volume reconstruction [11][12][13], are probably the most mature to date. The advantage of Tomo-PIV is that its results are calculated on a Cartesian grid, allowing for easy data processing and physical interpretation. Nonetheless, tomographic reconstruction is typically time intensive and can suffer from significant errors due to ghost particle reconstruction if the particle density is too high [14]. The spatial resolution is a further concern with Tomo-PIV, as the resulting velocity vectors suffer from the spatial averaging inherent to the correlation calculation.
These difficulties prompted the community to also develop PTV methods in an effort to offer superior non-time-resolved methods. Provided that the IPR is able to accurately localize particles in space from images, the real challenge of PTV lies in pairing particles between two successive instants to build tracks. Dabiri and Pecora [1] categorized and reviewed the numerous approaches that enable matching particles from one frame to the next. From this list of techniques, the authors of the present work have identified four main categories that encompass most of the existing techniques, although they may not be exhaustive. First, there are methods that use a predictor to estimate the displacement of the particles, either from Tomo-PIV [15] or from a correlation optimized for discrete input, such as in the two-pulse version of Shake-The-Box (2P-STB) [16]. Second, another class of methods matches particles based on the stability over time of the spatial patterns they form with neighboring particles. A recent example of this class of methods is introduced by Lin et al [17]. The third category is heterogeneous because it includes methods based on machine learning, and especially neural networks, trained to solve the pairing problem. The work of Zhou et al [18] shows that it is possible to incorporate complex physics into the models, with promising outcomes. The fourth category consists of exploiting the spatial coherence of the velocity field to optimally advect the particles, and may overlap with the third category. For example, works initiated by Labonté and later developed by Ohmi on the use of self-organizing maps [19][20][21] or genetic algorithms [22] make use of machine learning techniques with an underlying constraint on the smoothness of the displacement field.
The method proposed in the present contribution belongs to this last category, which, in a sense, can be seen as the spatial counterpart of time-resolved methods such as Shake-The-Box. The pairs are here identified by a constraint on the smoothness of the displacement field at a given time, rather than on the smoothness of a trajectory. More generally, the techniques in this class fall within the domain of point set registration, a well-established research area in the field of computer vision that Zhu et al [23] recently reviewed.
In the context of fluid mechanics, Stellmacher and Obermayer [24] applied a method based on deterministic annealing [25] to PTV. This method solves the particle pairing problem iteratively by computing the optimal particle displacement field. In this framework, though, the displacements are limited to translation, rotation, and shear. Myronenko and Song [26] later developed a new approach following the same paradigm, but for non-rigid deformations. This is the coherent point drift (CPD) algorithm, which continues to be extensively utilized, albeit with minor adjustments to enhance its performance. In addition to its ability to handle non-rigid deformations, the CPD was formulated to be robust to the presence of outliers in the input data, for instance ghost particles detected in the images in the context of PTV. The aim of this study is to assess the potential of this promising method under realistic conditions, including turbulence, localization errors, and ghost particles, and to compare its effectiveness with that of the commercial version of the 2P-STB implemented in DaVis 10.2.
The principle of the CPD and its adaptation to the present application are presented first, followed by an identification of the optimal controlling parameters of the method. A practical example utilizing synthetic images of particles in a turbulent flow is then described and serves as a basis for the evaluation of the performance of the method.

Overview of the method
The CPD algorithm is designed to align two point sets whose shapes are coherently related to each other. The two point sets consist of two lists of M and N spatial coordinates expressed in a reference frame of dimension D, noted Y ∈ R^{M×D} and X ∈ R^{N×D}. By convention, the M points with coordinates Y are the source points, and the N points with coordinates X are the target.
The goal of CPD is to find the transformation T(Y, θ) that maps the source points onto the target points such that the likelihood of correspondence between the points of the two sets is maximized. The sketch in figure 1 aims to illustrate this principle in an intuitive way, and includes the iterative process described below.
This transformation T(Y, θ) is defined by a linear combination of Gaussians supported by the source points Y, of variance β², which is an input parameter of the model, and weighted by the coefficients θ ∈ R^{M×D}, following:

T(Y, θ) = Y + Gθ,    (1)

where G ∈ R^{M×M} is a Gaussian kernel given by

G(i, j) = exp(−‖y_i − y_j‖² / (2β²)).    (2)

The coefficients θ are the parameters that have to be determined.
Myronenko et al [26,27] propose to solve this problem by maximizing the likelihood function that represents the quality of the match between the transformed source points T(Y, θ) and the target points X. For practical reasons, the optimization aims to minimize the negative log-likelihood function

E(θ, σ²) = −Σ_{n=1}^{N} log p(x_n) + (λ/2) tr(θᵀGθ),    (3)

where the parameter λ controls the trade-off between the data fitting and the spatial coherence of the deformation Gθ. The term p(x_n) is a Gaussian mixture density with centroids at the transformed positions T(Y, θ) and a uniform covariance σ², to which a uniform distribution weighted by w (0 ⩽ w < 1) is added to account for noise and outliers:

p(x_n) = w/N + (1 − w) Σ_{m=1}^{M} (1/M) exp(−‖x_n − T_m‖² / (2σ²)) / (2πσ²)^{D/2},    (4)

with T_m = y_m + G(m, ·)θ. The optimal value for w should, in principle, increase with the number of outliers. In this notation, G(m, ·) stands for the row m of the matrix G.
The negative log-likelihood is a function of two variables: the coefficients θ, and the covariance σ², which is related to the distance between the transformed source points and the nearest target points. The scheme developed by Myronenko et al [26,27] to solve this minimization problem for θ and σ² is an expectation-maximization (EM) algorithm. It is an iterative method that runs until a convergence criterion is met, meaning that a pair (θ, σ²) that minimizes E has been found. It can be initialized with θ = 0, and σ² as the mean square distance between all points of the two sets:

σ² = (1/(D M N)) Σ_{m=1}^{M} Σ_{n=1}^{N} ‖x_n − y_m‖².    (5)

Myronenko and Song [26] describe in detail the algebraic developments leading to the formulation of the E-step and the M-step as follows:
• E-step: fill the matrix P ∈ R^{M×N} with the normalized probability of a match between a point x_n and the transformed point T_m = y_m + G(m, ·)θ, given by

P(m, n) = exp(−‖x_n − T_m‖² / (2σ²)) / (Σ_{k=1}^{M} exp(−‖x_n − T_k‖² / (2σ²)) + c),    (6)

with

c = (2πσ²)^{D/2} (w/(1 − w)) (M/N).    (7)

• M-step: solve for θ

(G + λσ² diag(P1)^{−1}) θ = diag(P1)^{−1} P X − Y,    (8)

and update the value of σ² with

σ² = (1/(N_p D)) [tr(Xᵀ diag(Pᵀ1) X) − 2 tr((P X)ᵀ T) + tr(Tᵀ diag(P1) T)],    (9)

N_p = 1ᵀ P 1 being the sum of the elements of P, and diag(v) a diagonal matrix filled with the components of the vector v.
The vector 1 ∈ R^{n×1} contains n terms equal to one, n being either n = N or n = M.
The E-step and M-step are repeated until the difference between the old value of σ² and the updated one is smaller than a given threshold. After convergence, the result is a new set T that corresponds to the transformed source points Y. In the context of PTV, the point sets Y and X are built from the detection of the particles in the images captured by the cameras at two successive instants. Therefore, if the CPD has performed as expected, all the points in T and in X associated with the same physical particle should be found close to each other. Similarly, all points corresponding to either spurious particle detections or detections missed in one of the two frames should be found away from any point in the other set. With the CPD, the identification of ghosts and the particle pairing are performed simultaneously.
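For readers who prefer code to equations, the EM loop described above can be sketched in a few dozen lines. The following pure-Python implementation is a minimal illustration of the standard non-rigid CPD updates, not the authors' implementation; the default parameter values and the fixed iteration count are illustrative choices, not the optimized values identified later in this paper.

```python
import math

def gaussian_kernel(Y, beta):
    """G[i][j] = exp(-||y_i - y_j||^2 / (2 beta^2)), the coherence kernel (equation (2))."""
    M = len(Y)
    return [[math.exp(-math.dist(Y[i], Y[j]) ** 2 / (2.0 * beta ** 2))
             for j in range(M)] for i in range(M)]

def solve(A, B):
    """Gauss-Jordan elimination with partial pivoting; B has D columns."""
    n, d = len(A), len(B[0])
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [[M[i][n + j] / M[i][i] for j in range(d)] for i in range(n)]

def cpd(X, Y, beta=2.0, lam=2.0, w=0.1, iters=60):
    """Non-rigid CPD: returns the transformed source points T = Y + G theta."""
    D, M, N = len(Y[0]), len(Y), len(X)
    G = gaussian_kernel(Y, beta)
    theta = [[0.0] * D for _ in range(M)]
    # initial variance: mean square distance between the two sets (equation (5))
    sigma2 = sum(math.dist(x, y) ** 2 for x in X for y in Y) / (D * M * N)
    T = [row[:] for row in Y]
    for _ in range(iters):
        # E-step: P[m][n], probability that x_n corresponds to T_m (equations (6)-(7))
        c = (2 * math.pi * sigma2) ** (D / 2) * w / (1 - w) * M / N
        Q = [[math.exp(-math.dist(X[n], T[m]) ** 2 / (2 * sigma2))
              for n in range(N)] for m in range(M)]
        P = [[Q[m][n] / (sum(Q[k][n] for k in range(M)) + c)
              for n in range(N)] for m in range(M)]
        P1 = [max(sum(P[m]), 1e-12) for m in range(M)]             # row sums P 1
        Pt1 = [sum(P[m][n] for m in range(M)) for n in range(N)]   # column sums
        # M-step: (G + lam sigma2 diag(P1)^-1) theta = diag(P1)^-1 P X - Y  (equation (8))
        A = [[G[i][j] + (lam * sigma2 / P1[i] if i == j else 0.0)
              for j in range(M)] for i in range(M)]
        B = [[sum(P[m][n] * X[n][d] for n in range(N)) / P1[m] - Y[m][d]
              for d in range(D)] for m in range(M)]
        theta = solve(A, B)
        T = [[Y[m][d] + sum(G[m][k] * theta[k][d] for k in range(M))
              for d in range(D)] for m in range(M)]
        # variance update (equation (9)), floored to avoid division by zero
        Np = sum(P1)
        sigma2 = max((sum(Pt1[n] * sum(v * v for v in X[n]) for n in range(N))
                      - 2 * sum(P[m][n] * sum(X[n][d] * T[m][d] for d in range(D))
                                for m in range(M) for n in range(N))
                      + sum(P1[m] * sum(v * v for v in T[m]) for m in range(M)))
                     / (Np * D), 1e-10)
    return T
```

On a small coherently displaced point set, the returned T collapses onto the matched targets as σ² shrinks, which is exactly the behavior exploited for pairing.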
A new pair is considered found if the distance between a particle in X and its nearest neighbor in T is less than a given distance. If more than one particle is within the pairing range, the closest one is selected.
In the rare event that two particles in X are within pairing range of the same particle in T, this particle T_i would eventually be paired with two different particles x_j and x_k. A further step is performed to find such potentially spurious pairs and remove them from the list.
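The pairing rule above can be sketched as follows. The handling of the conflict case is an assumption here (the closer candidate is kept), since the text only states that potentially spurious pairs are identified and removed.

```python
import math

def pair_particles(T, X, r_max):
    """Pair each target point x with its nearest transformed source point
    within r_max; if two targets claim the same source point, only the
    closest pair is kept and the other is discarded as spurious."""
    candidates = []
    for j, x in enumerate(X):
        d, i = min((math.dist(x, t), i) for i, t in enumerate(T))
        if d <= r_max:
            candidates.append((d, i, j))
    pairs, used = {}, set()
    for d, i, j in sorted(candidates):   # closest candidate pairs first
        if i not in used:
            used.add(i)
            pairs[j] = i                 # target index -> source index
    return pairs
```

Points in X farther than r_max from every point in T are left unpaired, which is how ghosts and lost particles are flagged.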

Specific changes to the algorithm
The original CPD method has been used for a wide range of applications, all of which are associated with specific constraints. As a result, there is a long list of algorithms based on modified versions of CPD. In general, these modifications exploit the fact that the point sets have local properties that are invariant to the transformations they undergo [28][29][30]; others improve the robustness and the computational performance of the algorithm [31][32][33]. These changes imply a higher complexity of the algorithm, lead to marginal improvements in most cases, and have not been designed for the specific context of PTV. This context is particularly challenging for point registration algorithms because many of the sources of difficulty for these methods can occur in the same flow. For example, in a shear layer, there are regions of high turbulence intensity where the coherence of the particle-associated patterns is reduced between successive snapshots. There are also regions of lower turbulence, but with either small or large displacements, depending on the side of the shear layer. In addition, vortex centers may be devoid of particles, while saddle points may concentrate them, leading to a large local increase in the number of ghost particles and detection errors. An optimal version of CPD could certainly be derived for each condition, but the authors' intent in a first contact with CPD is to keep the method as simple as possible, and therefore to stick to the original algorithm.
Conversely, an advantage arises from the context of fluid mechanics. Indeed, the maximum displacement of the particles is generally well predicted from the knowledge of the main characteristics of the flow being studied. As a consequence, instead of using equation (5) to compute the initial value of σ², it can be directly set to a fraction of the squared maximum displacement. This initial value for σ² is always much smaller than that obtained from equation (5), which is found to significantly reduce the number of EM iterations until convergence, and therefore to speed up the registration. This reduced value for σ² implies that the matrix P obtained from equation (6) is mostly filled with values close to zero, with only a small number of values closer to one per row. As a second modification of the CPD algorithm, it was therefore decided to evaluate equation (6) only if ‖x_n − T_m‖ < ασ, with α ∈ R+*; the corresponding component of P is set to zero otherwise. After a series of tests, a value of α = 4 was determined to be a reasonable trade-off: it is large enough not to introduce any measurable effect on the convergence of the algorithm, while avoiding a large number of calculations, especially after a few iterations when σ becomes small. The CPD algorithm is described in pseudocode in algorithm 1, with changes from the original version highlighted in blue.
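The cut-off on equation (6) can be illustrated with the following sketch. It assumes the same normalization as the dense E-step; entries beyond ασ are treated as exactly zero and the corresponding exponentials are never evaluated.

```python
import math

def sparse_match_probabilities(T, X, sigma2, w, alpha=4.0):
    """E-step with the cut-off ||x_n - T_m|| < alpha * sigma: entries outside
    the radius are assumed to be zero and are skipped entirely."""
    M, N, D = len(T), len(X), len(T[0])
    r2 = (alpha ** 2) * sigma2
    c = (2 * math.pi * sigma2) ** (D / 2) * w / (1 - w) * M / N
    cols = [{} for _ in range(N)]
    for m, t in enumerate(T):
        for n, x in enumerate(X):
            d2 = sum((a - b) ** 2 for a, b in zip(x, t))
            if d2 < r2:                       # skip the exp() otherwise
                cols[n][m] = math.exp(-d2 / (2 * sigma2))
    # normalize each column over the entries that were actually evaluated
    return [{m: q / (sum(col.values()) + c) for m, q in col.items()}
            for col in cols]
```

The returned structure maps, for each target x_n, only the plausible source indices m to their match probability, which is where the computational saving comes from once σ is small.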

Application to large point-sets: use of sub-domains
In practical PTV applications, the number of particles is of the order of 10^4-10^5. Therefore, solving the linear system in equation (8) may take excessive computing resources. Although Myronenko and Song in their original contribution [26] proposed optimizations to reduce the computational complexity of the problem, by implementing the fast Gauss transform [34] and a low-rank approximation of the matrices to be inverted, solving large problems remains impractical. In the present study, it was decided to work around this limitation by splitting the point sets into sub-domains of the initial measured volume. The problem is thus divided into problems that are faster to solve and require less memory, as described by the sketch in figure 2. Nevertheless, this raises a concern: since particles enter and leave a sub-domain during the time interval between frames, there are regions at the borders of the sub-domains where particles cannot be paired. The spatial extension of these regions is directly related to the average displacement of the particles, and thus to the local flow velocity. If the sub-domains are too small, or the flow too fast, the number of unpaired particles will eventually become so large that the CPD will fail.
One solution to this problem is to create sub-domains that overlap by at least the average particle displacement between frames. This displacement can generally be estimated from the characteristics of the flow being studied. The sub-domains can also be stretched in the main direction of the flow to further overcome this difficulty.
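A possible sketch of the sub-domain construction with overlap is given below, assuming axis-aligned boxes; stretching in the flow direction then simply amounts to choosing fewer splits along that axis.

```python
def overlapping_subdomains(volume, n_split, overlap):
    """Split an axis-aligned volume into n_split[d] boxes per axis, each box
    enlarged by `overlap` on every side (clipped to the full volume)."""
    axes = []
    for (lo, hi), n in zip(volume, n_split):
        step = (hi - lo) / n
        axes.append([(max(lo, lo + i * step - overlap),
                      min(hi, lo + (i + 1) * step + overlap)) for i in range(n)])
    # Cartesian product of the per-axis intervals
    boxes = [[ax] for ax in axes[0]]
    for axis in axes[1:]:
        boxes = [b + [ax] for b in boxes for ax in axis]
    return boxes
```

With `overlap` set to the estimated mean inter-frame displacement, neighboring boxes share a band wide enough for every true pair to fall entirely inside at least one sub-domain.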
If the experiment is designed such that the mean displacement does not exceed two times the mean distance ⟨d⟩ between the particles, the edge size of a typical cubic sub-domain could be l_SD = 10⟨d⟩, including 2⟨d⟩ of overlap on each side. It would thus contain an average of N = 180 randomly spread particles, according to the relation between the mean inter-particle distance and the number density of uniformly distributed points. A linear system of this size is easily solved by any solver.

Algorithm 1. Modified Coherent Point Drift (pseudocode summary: for each sub-domain, iterate the E-step and M-step until convergence of σ²; find the new pairs from Y + Gθ and X; append the new pairs to the list; finally, clean the list of pairs from redundant and inconsistent entries).
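The relation behind this estimate of N is not reproduced in the text; for uniformly random points, a common estimate is ⟨d⟩ ≈ 0.554 n^{−1/3}, with n the number density, which gives roughly 170-180 particles for l_SD = 10⟨d⟩. The following Monte Carlo check of this sizing uses that assumed relation, not a formula taken from the paper.

```python
import math, random

def mean_nn_distance(points):
    """Mean nearest-neighbour distance of a point set (naive O(N^2) search)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def expected_count(l_sd, d_mean):
    """Particles expected in a cubic sub-domain of edge l_sd when uniformly
    random points have mean nearest-neighbour spacing d_mean, using the
    Poisson point process estimate <d> ~ 0.554 n^(-1/3)."""
    n = (0.554 / d_mean) ** 3            # number density
    return n * l_sd ** 3

# sub-domain of edge l_SD = 10<d>, with <d> = 1 in arbitrary units
random.seed(0)
N = round(expected_count(10.0, 1.0))     # about 170 particles
pts = [[10.0 * random.random() for _ in range(3)] for _ in range(N)]
```

Running `mean_nn_distance(pts)` on such a sample returns a value close to the prescribed unit spacing (slightly above it because of boundary effects), supporting the order of magnitude quoted in the text.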

Optimal operating conditions
The coherent point drift is mainly controlled by the three parameters β, λ, and w (see equations (2), (3) and (6)), and to some extent by the number of particles in the sub-domains. The purpose of this section is to assess the sensitivity of the pairing quality to the input parameters and to devise a set of values for these parameters that is robust in realistic experimental conditions. Realistic conditions must include false particle detections and triangulation errors to account for the image processing step of an experiment. They must also incorporate advection of the particles by the mean flow and spatial decoherence due to turbulence.
False particle detections can easily be simulated by adding an arbitrary number of randomly positioned particles. The triangulation errors can be modelled by altering the ground-truth particle positions with a Gaussian noise. The standard deviation of this Gaussian noise is set here to 2% of the average displacement of the particles, which would correspond to an error of 0.1 pixels for an average displacement of 5 pixels, consistent with the errors observed in former benchmarks [35].
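The two perturbations just described, 2% Gaussian localization noise and randomly placed ghosts, can be sketched as follows; the function name and the signature are illustrative, not taken from the authors' code.

```python
import random

def perturb_and_pollute(true_positions, mean_disp, ghost_ratio, bounds, rng):
    """Apply Gaussian localization noise with a standard deviation of 2% of
    the mean particle displacement, then append uniformly placed ghosts."""
    sigma = 0.02 * mean_disp
    noisy = [[c + rng.gauss(0.0, sigma) for c in p] for p in true_positions]
    n_ghosts = round(ghost_ratio * len(true_positions))
    ghosts = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(n_ghosts)]
    return noisy + ghosts
```

Passing the random generator explicitly keeps the synthetic test cases reproducible from one parameter sweep to the next.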
The advection of the particles by a turbulent flow requires either a base flow obtained from a direct numerical simulation of the Navier-Stokes equations, or a synthetic flow that has the same multi-scale properties as natural turbulence. The second option is preferred here for its flexibility, and because small scales of turbulence can be faithfully synthesized by an analytical turbulence generator.

Synthetic turbulence generator
The synthetic generator is implemented following the method proposed in Martinez-Sanchis et al [36]. The flow is generated by the sum of a finite number of spatial Fourier modes [37]. In this case, the number of modes is set to 600. The phase and the direction of the velocity fluctuations associated with each mode are random, but the norm of these fluctuations is imposed to follow a prescribed power spectral density for the turbulent kinetic energy. The spectrum is defined by the Kolmogorov scale l_η, an integral length scale L_u, the turbulent kinetic energy K, and the convection velocity U_c. Assuming isotropic turbulence, the turbulent kinetic energy is K = (3/2) u′², where u′ is the turbulent velocity in one direction.
In the following, we consider turbulent conditions that can typically be found in shear layers in which the velocity varies from 0 m s−1 to U = 5 m s−1 across a thickness δ_s = 5 × 10−2 m. Assuming that the integral length scale is L_u = 0.2 δ_s = 10−2 m, the Kolmogorov length scale l_η can be deduced from these characteristics knowing that

l_η = L_u Re_u^{−3/4}, with Re_u = u′L_u/ν.    (11)

In water at room temperature, with u′ = 0.15U = 0.75 m s−1 and ν = 1 × 10−6 m² s−1, relation (11) gives l_η ≈ 1.2 × 10−5 m. The corresponding spectrum is shown in figure 3.
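A minimal sketch of such a Fourier-mode generator is given below. It keeps only the structural ingredients named above, random phases and directions with the velocity of each mode orthogonal to its wavevector so the field is divergence-free, and omits the shaping of the mode amplitudes to the prescribed spectrum, which the actual generator [36, 37] performs.

```python
import math, random

def make_synthetic_field(n_modes, k_min, k_max, u_rms, rng):
    """Frozen synthetic velocity field: a sum of Fourier modes with random
    phases and directions, the velocity direction of each mode being
    orthogonal to its wavevector (divergence-free by construction)."""
    modes = []
    for _ in range(n_modes):
        k = [rng.gauss(0.0, 1.0) for _ in range(3)]          # random direction
        kn = math.sqrt(sum(c * c for c in k))
        kmag = rng.uniform(k_min, k_max)                     # random magnitude
        k = [c * kmag / kn for c in k]
        a = [rng.gauss(0.0, 1.0) for _ in range(3)]
        proj = sum(ai * ki for ai, ki in zip(a, k)) / kmag ** 2
        a = [ai - proj * ki for ai, ki in zip(a, k)]         # enforce a . k = 0
        an = math.sqrt(sum(c * c for c in a))
        a = [c * u_rms / (an * math.sqrt(n_modes)) for c in a]
        modes.append((k, a, rng.uniform(0.0, 2.0 * math.pi)))

    def u(x):
        v = [0.0, 0.0, 0.0]
        for k, a, phi in modes:
            cs = math.cos(sum(ki * xi for ki, xi in zip(k, x)) + phi)
            v = [vi + ai * cs for vi, ai in zip(v, a)]
        return v
    return u

sample_u = make_synthetic_field(50, 1.0, 10.0, 1.0, random.Random(2))
```

Because each mode's amplitude is orthogonal to its wavevector, the divergence of the field is analytically zero, which can be checked with a finite-difference probe at any point.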
For the purpose of testing the CPD, a list containing the coordinates of an arbitrary number of particles is created. These particles are uniformly randomly distributed in a volume larger than the desired measurement volume. The particles from this list are advected by the synthetically generated turbulent flow assuming a null relaxation time. The coordinates of each particle after advection are stored in a second list.
All particles from the two lists which are located within the measurement volume are extracted, and their coordinates are saved to build the two point sets X and Y. With this method, some particles enter the measurement volume and others leave it in a realistic way, so not all of them can be paired between X and Y. The list of possible pairs is also recorded to assess the result of the CPD. To further mimic a real experiment, an arbitrary number of randomly placed points can be added to each point set to account for ghost particles, which are inevitably present in real detections.
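The construction of the two point sets and of the ground-truth pair list can be sketched as follows; the names and the uniform ghost model are illustrative assumptions consistent with the description above.

```python
import random

def build_point_sets(seed_positions, advect, volume, ghost_ratio, rng):
    """Advect seeded particles over one inter-frame interval, keep only the
    positions inside the measurement volume for each frame, record the
    ground-truth (source, target) index pairs, then append ghosts."""
    def inside(p):
        return all(lo <= c <= hi for c, (lo, hi) in zip(p, volume))

    Y, X = [], []
    index_y, index_x = {}, {}
    for pid, p in enumerate(seed_positions):
        q = advect(p)
        if inside(p):
            index_y[pid] = len(Y)
            Y.append(list(p))
        if inside(q):
            index_x[pid] = len(X)
            X.append(list(q))
    # a particle is pairable only if it is inside the volume in both frames
    pairs = [(index_y[i], index_x[i]) for i in index_y if i in index_x]
    for point_set in (Y, X):
        for _ in range(round(ghost_ratio * len(point_set))):
            point_set.append([rng.uniform(lo, hi) for lo, hi in volume])
    return Y, X, pairs
```

Particles seen in only one frame remain in Y or X without a counterpart, reproducing the realistic in/out-flux at the volume boundaries.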

Robust set of parameters
Five extrinsic factors have been identified as having the potential to influence the optimal set of parameters for the CPD:
• δ* = ⟨δ⟩/⟨d⟩, the ratio of the mean displacement of the particles to the mean distance between the particles.
• d* = ⟨d⟩/l_η, the mean distance between the particles normalized by the Kolmogorov length scale. Small values of d* imply a greater spatial coherence of the particle trajectories, whereas randomness is enhanced by larger values of d*.
• I = u′/U, the turbulence intensity, which is another source of randomness in the trajectories.
• G* = N_ghost/N_true, the ratio of the number of ghost particles to the number of true particles.
• N = N_ghost + N_true, the total number of particles. This parameter must be considered in a special way, since N can be decomposed into N = N̄ + N′. The value of N̄ is the time-averaged value of N for a given sub-domain, and N′ is the random fluctuation due to the lack of homogeneity in particle concentration across a real flow and to the random nature of ghost particle detections. In this decomposition, N̄ is considered an intrinsic parameter, because for a given particle density it is directly set when defining the size of the sub-domains. In contrast, N′ is related to the flow properties and is therefore an extrinsic factor. For simplicity, only N will be considered, and it will be treated as either an intrinsic or an extrinsic factor, depending on the context.
The goal is to find a value for each of the intrinsic parameters λ, β, w and N that makes the CPD robust within the expected range of variations of the extrinsic variables. To achieve this goal, a limited number of test cases is devised by successively varying each extrinsic factor with respect to a base case whose settings are summarized in table 1. It is thereby assumed that the effects of these parameters are decoupled from each other. The base flow is made of water; it has a mean velocity U = 5 m s−1, the turbulence is convected with a velocity U_c = 2.5 m s−1, and the turbulence intensity is given in table 1.

The choice of a value for the intrinsic parameters results from a trade-off between the optimal values for each test case, also considering the robustness of the pairing quality to variations in that value. The pairing quality is determined by analyzing the numbers of true positives N_TP (a ground-truth pair is found), false positives N_FP (a pair is found, but it does not exist), and false negatives N_FN (a ground-truth pair is missed). The corresponding dependent variables are the error rate ϵ, or equivalently the precision ψ, and the ratio of particles found (recall) ζ, defined by

ψ = N_TP / (N_TP + N_FP) = 1 − ϵ,    ζ = N_TP / (N_TP + N_FN).

The duration of the process may also be used to assess the quality of the pairing. Each intrinsic parameter is determined by holding the others constant. An iterative approach was therefore considered to converge to an optimum. A set of values previously determined by trial and error was used for initialization. After this first iteration, a new set of values was determined. The coupling between parameters was found to be weak and their range of optimal performance large. Also, since the values used for initialization were reasonably close to optimal, the parameter values were already converged after the first iteration; they are given in table 1.
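These definitions translate directly into code; the conventions adopted for empty sets are an assumption made here for robustness.

```python
def pairing_scores(found_pairs, true_pairs):
    """Error rate, precision and recall of a pairing result, with pairs
    given as (source index, target index) tuples."""
    found, truth = set(found_pairs), set(true_pairs)
    n_tp = len(found & truth)          # ground-truth pairs recovered
    n_fp = len(found - truth)          # reported pairs that do not exist
    n_fn = len(truth - found)          # ground-truth pairs that were missed
    precision = n_tp / (n_tp + n_fp) if found else 1.0
    error_rate = 1.0 - precision
    recall = n_tp / (n_tp + n_fn) if truth else 1.0
    return error_rate, precision, recall
```

For instance, a result with two correct pairs, one spurious pair and one missed pair yields a precision and a recall of 2/3.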
The sensitivity of the pairing quality to these intrinsic parameters is assessed by analyzing figure 4, which shows the variations of ϵ and ζ against the intrinsic parameters for each test case, each corresponding to the variation of one extrinsic parameter from the base case. The structure of figures 4(a) and (b), described below, can be transposed to figures 4(c)-(f). Overall, figure 4 indicates that the least critical parameter regarding the quality of the pairing is λ (see figures 4(a) and (b)), because it can be set between 0.1 and 100 without significantly affecting the CPD process. There is nonetheless a weak trend of fewer detections and more errors with increasing λ for values above 10^−2. This is particularly visible for the error when there is a significant loss of spatial coherence of the trajectories, for large I or δ*. The error is also found to rapidly increase for λ < 1 if there are not enough particles. Based on these observations, the value λ = 1 is deemed a reasonable compromise.
The CPD also has little sensitivity to the number of particles N, provided it is larger than 100 and G* is 75% or less, as shown by figures 4(e) and (f). However, increasing the number of ghosts increases the minimum N required to keep the number of missed particles low, while the error rate remains stable even if the detection rate is not optimal. On the other hand, the pairing time per particle shown in figure 4(i) is found to scale approximately with N² in the present implementation, indicating that the CPD can be excessively time consuming for large values of N. Depending on the number of ghosts, the optimal value could lie between 200 and 400; here the value N = 300 is considered.
The parameter w is analysed in figures 4(g) and (h). The value of this parameter must remain strictly less than one (see equation (6)), and the results show that it must be increased as the number of ghosts grows, which is consistent with the role of this parameter. It is remarkable that in every configuration the pairing is optimal when w is close to one, with surprisingly no penalty on processing time (not presented here). A value of 0.98 is therefore retained.
The most critical parameter is β. Qualitatively, β controls the spatial coherence of the displacement field Gθ (see equation (2)). This parameter has the dimension of a length and is somehow connected to the scales of the turbulence. In the following, it is proposed to make this parameter dimensionless by normalizing it by the mean distance between the particles. Figures 4(c) and (d) show that β/⟨d⟩ should never be smaller than 20, because lower values give too much flexibility to the displacement field and therefore lead to an increasing number of false positives. This minimum value remains nearly constant over all the test cases, with only a slight tendency to increase with increasing d*. This suggests that this limit is robust in most experimental conditions. Conversely, if the value of β/⟨d⟩ is too large, the displacement field cannot account for the loss of coherence imposed by the turbulence, especially if the turbulence intensity I or d* is large, thus causing false negatives. An intermediate value of β/⟨d⟩ = 50 is chosen here, as it provides a significant margin with respect to the limit of 20 while limiting the number of false negatives.

Insight into the performance of the method
The performance of the CPD with the set of parameters defined above is now tested for a variety of synthetic flows, seedings, and time intervals between pulses. The base flow is the same as in the previous section. The ratio δ* of the mean displacement between two pulses to the mean distance between particles is varied from 0.4 to 6. This variation can be obtained by changing either the seeding density or the time interval between two pulses. These two ways of varying δ* are not expected to be equivalent. In fact, due to the self-similarity of the inertial range of the turbulent cascade, changing the distance between the particles is not expected to significantly change the random nature of the trajectories. On the contrary, increasing the length of the trajectories through longer delays between pulses should amplify the chaotic nature of the turbulence.
In a first set of test cases, the time between frames is held constant and equal to 0.1 ms (10 kHz), which corresponds to a displacement ⟨δ⟩ = 50 l_η (0.5 mm). The distance between particles ⟨d⟩ is set to range from 125 l_η (1.25 mm) down to 8 l_η. Conversely, in the second set, ⟨d⟩ is constant and equal to 50 l_η (0.5 mm), and δ* is modified by changing the time between frames from 0.04 ms to 0.6 ms, corresponding respectively to laser pulse rates of 25 kHz and 1.7 kHz.
Results are presented in figure 5, which shows both a map of the error ϵ and some contours of the recall ζ against G* up to 150% and δ* up to 3. First, as expected, the maximum achievable displacement decreases as the number of ghost particles increases. This is the result of the loss of coherence of the particle trajectories, which raises ambiguities about which particle, ghost or real, belongs to a trajectory. In all four cases, a maximum displacement δ* = 2 can be achieved with less than 1.5% error regardless of the turbulence intensity. For larger ratios of ghost particles, increasing the turbulence intensity from 5% to 15% has a stronger effect on the maximum achievable δ*. For example, if G* = 1, it is close to δ* = 1.5 for I = 5% and δ* = 1.0 for I = 15%, considering a maximum error rate of 1.5%. The ratio of found particles ζ is strongly affected by the turbulence intensity for any number of ghosts. Even for G* = 0, the maximum displacement such that 90% of the particle pairs are found is divided by more than two between I = 5% and I = 15%. For I = 15% and a given set of G* and δ*, ζ is also found to be smaller if δ* is set by the time between pulses rather than by the density.
Overall, figure 5 shows that the CPD performs optimally if the number of ghosts is smaller than or equal to the number of true particles (G* ⩽ 1), and if the displacement is of the order of the mean distance between particles. Outside of this range, the performance is affected by the characteristics of the turbulent flow, whereby more small-scale turbulence means more errors and fewer pairs identified.
In addition to the loss of the spatial coherence due to the turbulence, the CPD performance is also altered because more and more particles enter and leave the subdomains between the two frames as δ * increases.
This problem can be mitigated if there is a dominant direction of the bulk flow that can be estimated a priori. In this case, the sub-domains that were initially cubic can be stretched in the flow direction. The effect of this stretching can be analysed from the results presented in figure 6. For the comparison, the sub-domains are stretched by a factor of two in the direction of the flow, while their sizes are decreased in the orthogonal directions to keep the volume constant. For the sake of readability, only the contours for ϵ = 2% and ζ = 75% are displayed, for the same conditions as in figure 5. The beneficial effect of stretching the sub-domains is clear for the turbulence intensity I = 5% with less than 100% of ghosts, both in terms of precision and recall. For I = 15%, however, stretching only marginally affects the performance: positively for the precision and negatively for the recall. This indicates that at high turbulence levels, the loss of spatial coherence overwhelms the problems associated with particles entering and leaving the sub-domains in the failure of the CPD.

Comparison between CPD and 2P-STB based on synthetic experiment
In this contribution, pairing by means of CPD has so far only been tested on particle lists in which the ghosts were randomly distributed over the measurement volume instead of resulting from particle detection on camera images. This approach is well suited for a parameter study because it is well controlled and the generation of the data sets is time efficient. However, the random spreading of the ghost particles is not exactly what is observed in real experiments. In practice, ghost detections result from ambiguities in the particle detection process, and their positions are correlated with the positions of the true particles. Consequently, the ghost particles can, to some extent, be detected as particles that follow trajectories consistent with those of the true particles. If the flow is sufficiently smooth, or the time between the two frames is short enough, the CPD method may face challenges in differentiating between pairs of real and ghost particles.
A set of more realistic synthetic experiments, which incorporates particle detection into the pairing process, is therefore proposed to supplement the tests performed in section 3. The second objective of this series of synthetic experiments is to compare the performance of the CPD with that of the 2P-STB algorithm implemented in DaVis 10.2.

Synthetic experimental setup
The synthetic configuration aims to replicate the experimental setup reported by Acher et al [8]. It consists of four 12-bit, 2-megapixel (1600×1200) cameras arranged in a pyramidal configuration. Each camera is equipped with a 50 mm lens, with distortion given via a pinhole model, and is tilted using a Scheimpflug mount. The volume is illuminated by a laser beam that expands in one direction and is collimated and trimmed by a 10 mm slit in the other. The local laser irradiance is calculated assuming a Gaussian initial beam. Overall, the size of the measurement volume is approximately 300×240×20 mm³. The flow is seeded with polydisperse particles ranging in diameter from 25 µm to 35 µm, so that the brightness of individual particles varies locally by a factor of four. The brightness also depends on the local laser irradiance, which changes by a factor of 3 across the image due to beam divergence. The particles are then projected onto the cameras as Gaussians distorted by the optical arrangement, and are integrated over the square pixels of the sensor. Three seeding densities are used to obtain 0.05, 0.075 and 0.1 particle-per-pixel (ppp) in the images. In the resulting images, 95% of the particles have a normalized brightness between 0.08 and 0.71, which can occasionally lead to the saturation of a pixel due to overlapping particles. An example of the resulting images is provided in figure 8 for the three ppp.
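As an illustration of the projection step, the sketch below renders one particle as a Gaussian integrated exactly over square pixels by means of the error function, rather than sampled at pixel centers. It is a minimal stand-in, not the actual image generator used in the study: the function name, the σ value and the window size are assumptions.

```python
import numpy as np
from math import erf

def render_particle(img, xc, yc, amp, sigma=0.6, win=3):
    """Add one particle to `img` as a Gaussian of total energy `amp`.

    Each pixel receives the exact integral of the Gaussian over its
    unit square, computed from the error function, so sub-pixel
    positions (xc, yc) are handled consistently.
    """
    sq2 = sigma * 2.0 ** 0.5
    ny, nx = img.shape
    for j in range(max(0, int(yc) - win), min(ny, int(yc) + win + 1)):
        # Integral of the 1D Gaussian over the pixel row [j, j+1].
        fy = 0.5 * (erf((j + 1 - yc) / sq2) - erf((j - yc) / sq2))
        for i in range(max(0, int(xc) - win), min(nx, int(xc) + win + 1)):
            fx = 0.5 * (erf((i + 1 - xc) / sq2) - erf((i - xc) / sq2))
            img[j, i] += amp * fx * fy
    return img
```

Because the per-pixel values are exact integrals, the summed intensity over the window recovers the particle energy `amp` to within the truncated Gaussian tails.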
A snapshot of the velocity field used to advect the particles is given in figure 7. Velocity fields are randomly generated for each test case by the synthetic turbulence generator described in section 3.1, with the mean velocity set to 1 m s⁻¹, the integral length scale to L u = 50 mm, and the Kolmogorov length scale to l η = 0.05 mm. Two turbulence intensities, I = 5% and I = 15%, are considered, and the time interval between the two frames ranges from 0.5 ms to 3 ms. The resulting normalized displacements fall between 0.4 and 2.1 for 0.05 ppp.
Pairs of noise-free images are then generated under the aforementioned constraints, resulting in the samples presented in figure 8, saved as 16-bit png files. Additionally, images of a synthetic calibration plate placed parallel to the (x, y) plane (see figure 7 for the definition of the axes) are generated for various positions along the z direction over the domain.

Particle detection
The precise detection of particles is crucial for ensuring the quality of the pairing process. The comparison of particle pairing methods can therefore prove challenging if they are given different particle lists due to the use of different detection methods.
Although not explicitly stated, it can be inferred that the IPR used for particle detection in DaVis 10.2 is very similar, if not identical, to that used in the first pass of the 2P-STB. Using the IPR to obtain the list of particles to be processed by the CPD is therefore relevant for comparing the CPD and the 2P-STB. Working with DaVis 10.2 for detecting particles in both processes also allows the use of the same polynomial camera calibration models, self-calibrations, and optical transfer functions.
The parameters of the IPR have been kept at their default values; the most important ones are 4 inner and outer loops, a minimum intensity of 1000 counts (1.5% of the maximum) and the rejection of particles with an intensity weaker than 10% of the average intensity. Only the 'allowed triangulation error' remains as a free parameter because it directly affects the number of ghost detections. Maximum errors of 1 voxel, which is the default value, and 1.5 voxels are tested in the following. The voxels are 0.2 mm cubes, which is also the approximate size of the back projection of a pixel onto the median plane of the measurement domain.
The results of the particle detection are given in table 2 for the three ppp and the two allowed errors. Detected particles are classified as true if they are located within one voxel of a particle that actually belongs to the particle list projected onto the images. If multiple particles are found within this search volume, only the closest one is accepted as true. All other particles are classified as ghosts. Furthermore, the recall given in table 2 is the ratio of the number of true detected particles N true to the total number of particles that are projected onto the four cameras.
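This classification rule can be sketched as follows. The function below is an illustrative re-implementation of the description above (the name, the brute-force distance matrix and the greedy closest-first matching are our choices), not the code used in the study.

```python
import numpy as np

def classify_detections(detected, truth, voxel=0.2):
    """Label each detected particle as true (True) or ghost (False).

    A detection is true if it lies within one voxel of a ground-truth
    particle; each truth particle validates at most one detection, the
    closest one. All remaining detections are ghosts.
    """
    detected = np.asarray(detected, float)
    truth = np.asarray(truth, float)
    is_true = np.zeros(len(detected), dtype=bool)
    taken = np.zeros(len(truth), dtype=bool)
    # Full pairwise distance matrix; fine for small lists,
    # a KD-tree would be preferable at realistic particle counts.
    d = np.linalg.norm(detected[:, None, :] - truth[None, :, :], axis=2)
    # Accept the globally closest detection/truth pairs first.
    for k in np.argsort(d, axis=None):
        i, j = np.unravel_index(k, d.shape)
        if d[i, j] > voxel:
            break  # all remaining pairs are farther than one voxel
        if not is_true[i] and not taken[j]:
            is_true[i] = True
            taken[j] = True
    return is_true
```

The greedy closest-first order guarantees that when several detections compete for the same truth particle, only the nearest is accepted, matching the rule stated above.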
Table 2. Summary of the performance of the IPR implemented in DaVis 10.2 with default settings and a maximum allowed triangulation error of 1 and 1.5 voxels, for the particle densities considered in the study.

In the context of this study, which involves noise-free images that are only slightly distorted, varying the maximum allowed error has little effect on the recall of the true particles in the measurement volume. Approximately 88% of the particles are detected for 0.05 ppp, and closer to 73% for 0.1 ppp. Changing the maximum allowed error from 1 to 1.5 voxels however results in a threefold rise in the number of ghosts N ghost. In the worst case, that is 0.1 ppp and a 1.5 voxel maximum error, N ghost is 2.5 times larger than N true.

Particle pairing and post-processing
The six configurations outlined in section 4.1, comprising the three particle densities and the two turbulence intensities, are tested for the comparison of the pairing methods. For each configuration, different normalized displacements δ* are tested by changing the time interval between frames in increments of 0.5 ms, starting from 0.5 ms.
The CPD pairing is performed on the list of particles obtained during the IPR step, with the optimal settings summarized in table 1. The 2P-STB is run directly from the images because it already includes the IPR. Although it was observed that the latter method benefits slightly from being used iteratively, even in the Two-Pulse version, only a single pass of IPR+tracking is done, for the sake of a fair comparison with the CPD, which cannot interact with the DaVis 10.2 IPR. The impact of performing a single pass rather than multiple passes of 2P-STB remains marginal, with changes limited to a few percent, when compared to the disparities observed between the two approaches.
Three passes of median filtering are applied to the tracks obtained by the 2P-STB. This is the minimum number of passes recommended in the DaVis 10.2 documentation, but was found to be the best compromise between increasing precision and decreasing recall. In fact, more passes tend to level out the precision while slightly decreasing the recall, whereas reducing the number of passes has a negative effect on precision.
A post-processing filtering procedure also has to be designed for the results of the CPD, which suffer from a fairly high level of error compared to the expectation derived from figure 5. This higher error rate especially affects the configurations with 5% turbulence intensity and short intervals between frames. Errors arise because, in these conditions, well-correlated ghost particles form tracks consistent with the local flow properties. The filtering process relies on two indicators. Firstly, the relative change in particle intensity, or brightness, along a track is generally greater for false tracks than for true tracks. Secondly, the average distance between the source particle transformed with T(Y, θ) (see equation (1)) and the target particle it is paired with, later referred to as the pairing distance, is larger for false tracks than for true ones. Two thresholds are defined for these two indicators. If a track features at least one indicator that exceeds its threshold, it is removed from the list. The threshold for the pairing distance is determined by analyzing the 25% of tracks exhibiting the lowest relative change in intensity, as they are deemed to be mostly true. The threshold is defined as the value of the pairing distance such that 99% of the analyzed tracks are retained. The same procedure is applied to the change in intensity, this time considering the 25% of tracks exhibiting the lowest pairing distance, and keeping the 99% level. The impact of this filtering is noticeable in the results, but remains subtle.
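A minimal sketch of this two-indicator, cross-calibrated filter is given below, assuming the 25% fraction and 99% retention level quoted above. The function name and the use of per-indicator quantiles are our reading of the procedure, not the authors' implementation.

```python
import numpy as np

def filter_tracks(intensity_change, pairing_dist, frac=0.25, level=0.99):
    """Return a boolean mask of tracks to keep.

    Each indicator's threshold is calibrated on the quartile of tracks
    that looks most reliable according to the *other* indicator, then
    any track exceeding either threshold is rejected.
    """
    ic = np.asarray(intensity_change, float)
    pd = np.asarray(pairing_dist, float)
    k = max(1, int(frac * len(ic)))
    # Pairing-distance threshold from the 25% of tracks with the
    # lowest relative intensity change (deemed mostly true).
    low_ic = np.argsort(ic)[:k]
    thr_pd = np.quantile(pd[low_ic], level)
    # Intensity-change threshold from the 25% of tracks with the
    # lowest pairing distance.
    low_pd = np.argsort(pd)[:k]
    thr_ic = np.quantile(ic[low_pd], level)
    # Keep only tracks below both thresholds.
    return (pd <= thr_pd) & (ic <= thr_ic)
```

Calibrating each threshold on the subset selected by the other indicator avoids the circularity of thresholding an indicator on its own distribution.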

Discussion: comparison of the pairing methods
Four metrics are used to compare the CPD and 2P-STB methods. The first two are classically the precision ψ and the recall ζ, following the definitions in equation (12). More important than the recall is the relative spatial resolution R* of the measurement, that is, the ratio of the mean distance between the available tracks ⟨d⟩ to the mean distance between the correctly identified tracks ⟨d true⟩. A value of R* = 1 means that the maximum resolution is achieved, while values less than one indicate a loss of resolution due to missing pairs. For example, R* = 0.5 corresponds to a spatial resolution that is two times smaller than optimal, or equivalently, to a mean distance between two measurements that is two times larger. In a three-dimensional problem, R* can be related to the recall ζ by noting from equation (10) that the average distance between neighboring particles within a given volume is a function of the cube root of the number of particles in the same volume, leading to R* = ζ^(1/3). The last metric is the F1 score, which typically balances precision and recall, but is here defined as F1 = 2ψR*/(ψ + R*) so as to emphasize resolution rather than recall. A first overall comparison can be made on the basis of figure 9, which reports the metrics obtained for each case tested. The first observation is that both methods perform better at lower ppp, which is expected since the number of ghosts is also smaller at low particle densities. Overall, it is also observed that the CPD is superior to the 2P-STB in terms of precision, while the recall, and thus the spatial resolution, is better handled by the 2P-STB. There is no clear trend for the F1 score, any weakness of a method being mitigated by its strengths.
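Under the reading that R* = ζ^(1/3), from the cube-root scaling of the mean inter-particle distance in 3D, and that the F1 score substitutes R* for the recall in the usual harmonic mean, the four metrics can be computed as follows. This is a sketch of our interpretation (names and signature are ours), not the authors' code.

```python
def pairing_metrics(n_correct, n_found, n_available):
    """Precision, recall, relative resolution and resolution-based F1.

    n_correct:   number of correctly identified tracks
    n_found:     total number of tracks returned by the method
    n_available: maximum number of tracks available in the particle lists
    """
    psi = n_correct / n_found           # precision
    zeta = n_correct / n_available      # recall
    r_star = zeta ** (1.0 / 3.0)        # relative spatial resolution (3D)
    f1 = 2.0 * psi * r_star / (psi + r_star)  # F1 with R* in place of recall
    return psi, zeta, r_star, f1
```

Because of the cube root, R* degrades much more slowly than ζ: a recall of 80% still corresponds to about 93% of the optimal spatial resolution.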
The effect of increasing the turbulence intensity from 5% to 15% is found to be detrimental to the 2P-STB in terms of precision, while it is beneficial for the CPD. This outcome is explained by the faster decorrelation of the ghost particles with respect to the true ones, resulting in improved ghost rejection by the CPD. However, the recall decreases drastically for the CPD, while it is hardly affected for the 2P-STB, especially for longer intervals between frames. At 0.05 ppp, this decrease is not compensated by an increase in precision, which is already high, giving the 2P-STB an advantage in terms of F1 score at low ppp and high turbulence intensity.
Detailed views of the 0.05 ppp and 0.1 ppp configurations are presented in figures 10 and 11 respectively. The 0.075 ppp configurations were discarded from this analysis because their results fall right in between the other two. For each configuration and frame interval, the bar graphs show the number of detected tracks and the number of correct tracks for the 2P-STB and the CPD. In addition, the maximum number of tracks available from the list of particles detected by the IPR is plotted. Surprisingly, this number is close to the number of true particles summarized in table 2. One would have expected it to be significantly smaller, since building a track requires detecting both particles it is made of. With 96% of the particles being recalled by the IPR, a recall of 92% for the tracks could be achieved assuming each frame's particle detection is an independent process. For 0.05 ppp this would mean that 75 × 10³ tracks would be available, while 79 × 10³ tracks can actually be found by the CPD. This observation indicates that particles are missed consistently within a pair of images, possibly because a particle's intensity is low and it belongs to a dense cluster of brighter ones. The performance of the CPD in figure 10 is in agreement with the expectation outlined in section 3.3. Specifically, the recall, based on the available tracks from the IPR, is consistently high and stable for I = 5%, and drops below 75% for normalized displacements δ* greater than 1-1.5 for I = 15%. In accordance with the conclusion drawn from figure 9, the 2P-STB performs almost independently of the conditions for 0.05 ppp, although its precision remains smaller than that of the CPD in all cases. The same stability is observed in figure 11 for the 2P-STB when the particle density is raised to 0.1 ppp. However, the precision drops significantly, in particular when the number of ghosts is increased by raising the maximum allowed triangulation error to 1.5 voxels. To a lesser degree, the
CPD results also exhibit decreased precision, especially when the ghost rate is high and the turbulence intensity is low, which promotes the identification of tracks between ghost particles that are correlated with true particles.
Finally, figure 12 gives a qualitative comparison between the pairing methods for two configurations with a turbulence intensity I = 15% and a maximum error of 1.5 voxels. The comparison is made on the axial component of the velocity estimated on the (x, y) plane at constant z = 0 from the particle pairs, obtained either from the ground truth list, the CPD, or the 2P-STB. To facilitate direct comparison, the scattered velocity samples are then bi-linearly interpolated onto a regular grid with a spacing equal to half the mean distance between the particles.
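This gridding step can be approximated with SciPy's `griddata`; note that its 'linear' method performs piecewise-linear interpolation on a Delaunay triangulation, which is close to, but not strictly identical to, bilinear interpolation. The function name and the grid construction are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import griddata

def velocity_to_grid(xy, u, mean_dist):
    """Interpolate scattered velocity samples onto a regular grid.

    The grid spacing is set to half the mean inter-particle distance,
    as in the comparison described in the text. Points outside the
    convex hull of the samples are returned as NaN.
    """
    h = 0.5 * mean_dist
    x0, y0 = xy.min(axis=0)
    x1, y1 = xy.max(axis=0)
    X, Y = np.meshgrid(np.arange(x0, x1 + h, h),
                       np.arange(y0, y1 + h, h))
    U = griddata(xy, u, (X, Y), method='linear')
    return X, Y, U
```

Piecewise-linear interpolation reproduces a linear velocity field exactly inside the convex hull of the samples, which makes the sketch easy to sanity-check.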
In addition, the turbulent kinetic energy (tke) is calculated from the interpolated field and reported above the corresponding sub-figures. The contribution of the localization error to the tke was also estimated by comparing the positions of correctly detected particles to their nearest ground truth neighbors, resulting in values of 0.2 × 10⁻³ m² s⁻² for 0.05 ppp (figures 12(a)-(c)) and 0.4 × 10⁻³ m² s⁻² for 0.1 ppp (figures 12(d)-(f)), that is, of the order of one percent of the tke.
Two configurations are considered. The seeding density is set to 0.05 ppp and the displacement to δ* = 1.1 for the first configuration (figures 12(a)-(c)), such that both methods perform optimally, with F1 = 0.89 for the CPD and F1 = 0.90 for the 2P-STB based on the data reported in figure 9. The density is then increased to 0.1 ppp and δ* = 1.4 in the second configuration (figures 12(d)-(f)), such that the performance of both methods is significantly reduced, to F1 = 0.66 for the CPD and F1 = 0.60 for the 2P-STB. These two configurations can typically coexist within the same flow due, for instance, to local attractors formed by coherent Lagrangian structures, or simply to a lack of uniformity in the seeding operation.
Under favorable conditions (see figures 12(a)-(c)), the differences between the two pairing methods and the actual ground truth velocity field are barely visible. For the velocity fields reconstructed under harsher conditions (see figures 12(d)-(f)), the turbulent kinetic energy is measurably smaller for the reconstruction resulting from the CPD than for both the 2P-STB and the ground truth. This decrease in tke was expected and constitutes an intrinsic limit of the CPD method, since tracks that depart too much from the smooth transformation involved in the CPD process (see equation (1)) are rejected during the pairing process. On the contrary, these conditions cause the 2P-STB to return numerous false vectors which increase the granularity of the velocity field, but are not associated with an increase in the tke.

Conclusion
Reconstructing three-dimensional particle trajectories from a list of triangulated particles is a difficult task because it requires overcoming the challenges posed by ghost detections. Currently, the prevailing method to address this issue is the Shake-The-Box algorithm, which exploits the rapid temporal decoherence of the tracks associated with ghost particles to identify and remove them. This method, however, is not as effective in two-pulse configurations, which cannot be avoided when measuring high-speed flows, especially in air. This motivated the authors to address this concern by adapting the CPD method, which relies on the spatial coherence of the tracks and is therefore not limited to time-resolved problems.
The CPD is governed by four parameters, whose values the first part of the research documented in this article attempts to determine. The findings of this first investigation show that these parameters can be set to values around which changes of more than an order of magnitude do not significantly alter the performance of the particle pairing on a series of realistic test cases.
The method is then evaluated with the optimal parameters by comparing it to the 2P-STB algorithm implemented in the DaVis 10.2 software. For this purpose, DaVis' IPR was used to detect particles in a series of synthetic images generated as realistically as possible from a known list of particles advected in a turbulent flow. The outcomes of this second study indicate that the performance of 2P-STB and CPD is similar for flows with a low turbulence intensity of 5%, with an overall slightly lower recall and higher precision for CPD than for 2P-STB. This trend is largely reinforced for a turbulence intensity of 15%, with a noticeable decrease in the recall of the CPD while maintaining a high precision, and conversely a decreased precision and unchanged recall for the 2P-STB. This inverse behavior leads to comparable F1 scores for both methods in any configuration, where this metric represents a balance between precision and the spatial resolution of accurately reconstructed vectors.
Because the spatial resolution of the reconstructed fields varies only with the cube root of the number of particles, the authors would recommend that users of either 2P-STB or CPD keep the particle density at a moderate level, such as 0.05 ppp. Under these conditions, both methods are reliable, with minimal loss in resolution compared to a high seeding density of 0.1 ppp. With an average of 0.05 ppp, there may still be some tolerance for locally higher concentrations of seeding particles, although in these harsher conditions the results from 2P-STB may become less reliable than those from CPD.
The CPD method should now be implemented in a time-resolved framework, where its precision may result in a higher convergence rate than the time-resolved Shake-The-Box algorithm. However, this will depend on the ability to increase the CPD recall, to avoid trajectory breaks caused by missing particles.

Figure 1 .
Figure 1. Sketch of the working principle of the CPD. The two point sets X (•) and Y (•), made of real and ghost particles, are drawn in their initial states on the left. For iterations 1 and n, the two point sets are X and T(Y, θ) (•). The shaded area shows the region of influence of the particles in X with respect to the calculation of the likelihood p(x_n).

Figure 2 .
Figure 2. Sketch of the modified algorithm for a 2D dataset. The algorithm includes domain partitioning into overlapping subdomains, particle pairing within subdomains using CPD, and merging of pairs and filtering of inconsistent pairs.

Figure 3 .
Figure 3. Turbulent kinetic energy spectra imposed for the synthetic turbulence generator. The turbulence intensity is u′/U = 15%.
Figures 4(a) and (b) refer to the sensitivity analysis of the pairing error rate ϵ (figure 4(a)) and recall ζ (figure 4(b)) to the parameter λ. The analysis is further decomposed into five sub-figures. In each sub-figure, three values of one of the extrinsic parameters are tested, while the other extrinsic parameters are held constant and equal to the base case values given in table 1. The three values of the tested parameter are the base case value and two other values explicitly written in red and blue to the right of the corresponding sub-figure. For example, the sub-figures constituting the first row of figures 4(a) and (b) report the sensitivity for ghost rate values G* = 0% (blue line), 75% (dashed line) and 150% (red line).

Figure 4 .
Figure 4. Results of the parametric study to assess the robustness of the selected CPD intrinsic parameters through the error rate ϵ and the recall ζ. The CPD parameter λ is considered in (a) and (b), the parameter β/⟨d⟩ is analysed in (c) and (d), and the number of particles per sub-domain N in (e) and (f). For each case, three values are tested for the ghost rate G*, the ratio d* between ⟨d⟩ and lη, the turbulence intensity I, and N. The tested values are reported in red and blue on the right of each figure, and the results are displayed with respectively the same colors; the dashed line corresponds to the result for the base case value reported in table 1. The vertical line indicates the value selected as optimal for the corresponding intrinsic parameter. The effect of the parameter w on ϵ and ζ is shown in (g) and (h) for various G*, whose values are given in the legend of (h). The processing time per particle Tp is shown in (i) as a function of N for the base case configuration (dashed line); the dotted line shows the slope 2.

Figure 5 .
Figure 5. Maps of the pairing error ϵ, ranging from 0% to 3%, for varying amounts of ghost particles G* and normalized particle displacements δ*. The contours indicate the ratio of particles found ζ. The turbulence intensity is 5% in (a), (c) and 15% in (b), (d), and δ* is varied by adjusting either the time interval between the frames ((a), (b)) or the density of particles ((c), (d)).

Figure 6 .
Figure 6. Ratio of ghosts G* against the normalized displacement δ* such that the error is ϵ = 2% in (a), and the recall is ζ = 75% in (b). Dashed lines correspond to cubic subdomains, and solid lines to subdomains stretched by a factor of 2. Blue lines: I = 5%; red lines: I = 15%.

Figure 7 .
Figure 7. Definition of the realistic test case. Maps show slices of the theoretical flow components u th, v th, and w th at one given instant for I = 5%.

Figure 8 .
Figure 8. Samples of the negative images from one camera for the three densities, and 3D rendering of a sampled volume including pairs of particles and ghost particles (shaded) for two consecutive times in red and blue. The 3D views correspond to a 'straightforward' pairing case with δ* = 0.7 and G* = 61% (0.05 ppp, 1.5 voxel error from table 2), and a 'challenging' one with δ* = 1.4 and G* = 264% (0.1 ppp, 1.5 voxel error from table 2).

Figure 10 .
Figure 10. Comparison between the number of tracks obtained from STB (left bar) and CPD (right bar) for 0.05 ppp and various times between frames expressed as normalized displacements. The blue bar is the total number of detected tracks, and the green bar is the number of correct tracks. ■: maximum pairs existing in the volume. +: maximum pairs existing in the list of particles from the IPR.

Figure 11 .
Figure 11. Comparison between the number of tracks obtained from STB (left bar) and CPD (right bar) for 0.1 ppp and various times between frames expressed as normalized displacements. The blue bar is the total number of detected tracks, and the green bar is the number of correct tracks. ■: maximum pairs existing in the volume. +: maximum pairs existing in the list of particles from the IPR.

Table 1 .
Definition of the base case for the extrinsic parameters, and the set of robust intrinsic parameters for the coherent point drift.