Accuracy assessment of fringe projection profilometry and digital image correlation techniques for three-dimensional shape measurements

With ever-increasing demand for three-dimensional (3D) imaging and shape measurements in a variety of fields, measurement accuracy has become of vital importance to numerous scientific and engineering applications. This paper presents an experimental investigation into the accuracy comparison of two prevalent 3D imaging and shape measurement methods: fringe projection profilometry (FPP) and 3D digital image correlation (3D-DIC) techniques. A detailed description of their principles reveals their inherent similarities and fundamental differences. A measurement system composed of both techniques is employed in the study, and a test target with speckle checkerboard patterns on its surface is adopted to allow simultaneous FPP and 3D-DIC measurements. The evaluation puts emphasis on how the geometric angles between key hardware components affect the 3D measurement accuracy. Experiments show that the depth and height measurements of both techniques can reach sub-micron accuracy, and the relative accuracy of the 3D shape or position measurements can reach 1/600 000.


Introduction
In the past two decades, three-dimensional (3D) imaging and shape measurement technology has made remarkable progress and become a popular subject of interest in scientific research and engineering applications. The impact of 3D imaging and shape measurement can be seen in numerous fields, such as machine vision, medical practice, reverse engineering, quality assurance, biometric security, 3D printing, entertainment, and unmanned transportation [1][2][3][4][5][6][7][8][9][10]. Many key features characterize the performance of a 3D imaging and shape measurement technique, including speed, resolution, accuracy, reliability, cost, and application scenario. Although there are often trade-offs among these features, measurement accuracy is always an essential one in the applications of 3D imaging and shape measurement technology. The accuracy is related to several factors, including but not limited to the technical mechanism, system components and setup, field of view, system calibration, geometric and surface characteristics of the measured objects, and ambient illumination. Accordingly, building a 3D imaging and shape measurement system with the highest possible accuracy poses a difficult technical challenge.
Typical non-contact 3D imaging techniques for shape and deformation measurements include: the time-of-flight (TOF) method, interferometry method, laser scanning method, photogrammetry method, moiré method, structured-light or fringe projection method, stereo vision method, digital holography, and so on [11][12][13][14][15][16][17][18][19][20][21][22][23][24]. In many fields such as experimental mechanics and optics, the fringe projection profilometry (FPP) and the 3D digital image correlation (3D-DIC) methods are among the most established and widely used 3D imaging and shape measurement techniques owing to their robustness and accuracy [25][26][27][28][29][30][31][32]. The FPP technique, a kind of structured-light-based method, generally includes a projector, a camera, and a computer. The technique involves projecting active fringe patterns onto the target objects and capturing the distorted fringe patterns. The 3D shape information can then be extracted from the distorted fringe patterns. Unlike the FPP technique, the 3D-DIC technique is a stereo-vision-based method that heavily relies on detecting the stereo correspondence between two image sets captured by two cameras at different viewpoints, where the 3D coordinate information can be retrieved through using geometric triangulation upon the completion of camera calibration. It should be noted that the TOF method has recently become popular in certain applications including augmented reality, smartphones, autonomous vehicles, etc. Nevertheless, the TOF method faces the problem of relatively low resolution and low accuracy [33][34][35][36], and is thus not included in this work.
Over the years, numerous algorithms and schemes have been developed to enhance the performance of the FPP-based and DIC-based 3D imaging and shape measurements [37][38][39][40][41][42][43][44][45][46]. Many attempts have been made to improve the accuracy of the FPP-based systems, such as building a mathematical model to reduce the measurement error [47][48][49][50], increasing the calibration accuracy with a liquid crystal display screen [51], and using customized fringes to enhance the sensitivity of phase detection [52]. Plenty of approaches have meanwhile been proposed to enhance the DIC-based systems, such as utilizing a novel search scheme to improve the accuracy of the initial guess values [53] and using a multi-process parallel algorithm to increase the processing speed while maintaining the accuracy [54]. Recently, the integration of deep learning methods into stereo vision and fringe projection techniques [55][56][57][58][59][60][61] has drawn a surge of interest owing to the advancements in automatic end-to-end learning networks as well as high-speed parallel computing, although their common shortcoming is inferior accuracy. Despite these advances, an investigation that analyzes the relation of system geometry to measurement accuracy and compares the accuracy of the two techniques under identical measurement conditions is lacking. Such an investigation helps provide a guideline on achieving the best possible measurement accuracy in practice, which is the motivation of this paper. Figure 1 illustrates the arrangement of the projector and cameras used in the experiment, which allows both FPP and DIC measurements to be conducted simultaneously under the same geometric configurations.
The study helps answer a frequently asked question: does the FPP technique or the 3D-DIC technique yield higher accuracy in the 3D imaging and shape measurement?
The remainder of the paper is organized as follows: section 2 describes the principles of the adopted FPP and 3D-DIC techniques as well as the indispensable camera calibration; section 3 presents the experimental investigation into the measurement accuracy of the two techniques and a few experiments that demonstrate their capabilities; and the last section gives a summary with a brief discussion.

Camera calibration
In the human visual system, the brain combines two separate images into a 3D stereo picture by matching up similarities and adding in differences. This natural vision mechanism is commonly employed in the 3D machine vision field, and the 3D-DIC technique is one such 3D imaging method. By replacing one of the cameras with a projector, as illustrated in figure 2, the 3D-DIC imaging system becomes an FPP 3D imaging system. Because a projector is technically a reversed camera, the fundamental principles of the FPP and 3D-DIC techniques are similar: both are inherently based on binocular stereo imaging or triangulation imaging. Their primary difference lies in how image matching and the determination of 3D coordinates are facilitated. Since both techniques rely on cameras, camera calibration is described below.
Camera calibration describes the relation between the 3D world coordinates of a point and its corresponding location in the planar digital image. With a basic coordinate transformation, an arbitrary point $(x_w, y_w, z_w)$ in the world coordinate system can be transformed into a camera coordinate system as $(x_c, y_c, z_c)$ by using:
$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T \tag{1}$$
In the equation, $R$ and $T$, known as the camera extrinsic parameters, are the rotation and translation parameters. Next, in the imaging plane of the camera, the pixel location $(u, v)$ of the aforementioned point can be described with a pinhole model as:
$$u = \alpha x_{cn} + \gamma y_{cn} + u_0, \qquad v = \beta y_{cn} + v_0 \tag{2}$$
where $x_{cn} = x_c / z_c$; $y_{cn} = y_c / z_c$; $\alpha$ and $\beta$ are the horizontal and vertical distances in pixel units from the lens to the imaging plane, respectively; $\gamma$ is a skew factor; and $(u_0, v_0)$ are the coordinates of the principal point. The last five parameters are often termed the camera intrinsic parameters. Considering the lens distortion in practice, the actual pixel location $(\tilde{u}, \tilde{v})$ of the point in the captured digital image can be modeled from equation (2) as:
$$\tilde{u} = \alpha \tilde{x}_{cn} + \gamma \tilde{y}_{cn} + u_0, \qquad \tilde{v} = \beta \tilde{y}_{cn} + v_0 \tag{3}$$
where
$$\begin{aligned} \tilde{x}_{cn} &= \left(1 + a_0 r^2 + a_1 r^4 + a_2 r^6 + a_3 r^8 + a_4 r^{10}\right) x_{cn} + \left(s_0 + s_2 r^2\right) r^2 + \left(p_0 + p_2 r^2\right)\left(r^2 + 2 x_{cn}^2\right) \\ \tilde{y}_{cn} &= \left(1 + a_0 r^2 + a_1 r^4 + a_2 r^6 + a_3 r^8 + a_4 r^{10}\right) y_{cn} + \left(s_1 + s_3 r^2\right) r^2 + \left(p_1 + p_3 r^2\right)\left(r^2 + 2 y_{cn}^2\right) \end{aligned} \tag{4}$$
In equation (4), $r^2 = x_{cn}^2 + y_{cn}^2$; $(a_0, \ldots, a_4)$, $(s_0, \ldots, s_3)$, and $(p_0, \ldots, p_3)$ represent radial, prism, and tangential distortion coefficients, respectively. It is evident that equations (1)-(4) yield the relation between the 3D world coordinates of a point $(x_w, y_w, z_w)$ and its pixel location $(\tilde{u}, \tilde{v})$ in the captured image. By using a camera calibration target where the 3D world coordinates of the control points (such as the corner points of a checkerboard target) are known, the camera intrinsic and extrinsic parameters, as well as the lens distortion parameters, can be determined from a bundle-adjustment-based camera calibration process. Figure 3 shows four commonly used planar calibration targets. The relevant calibration algorithms can be found in [62][63][64][65][66]. A binocular imaging system contains two separate cameras. Thus, for a typical point $(x_w, y_w, z_w)$ in the world coordinate system, equation (1) yields the following equations:
$$z_c \begin{bmatrix} x_{cn} \\ y_{cn} \\ 1 \end{bmatrix} = R \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T, \qquad z'_c \begin{bmatrix} x'_{cn} \\ y'_{cn} \\ 1 \end{bmatrix} = R' \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} + T' \tag{5}$$
where the terms with a ′ symbol are associated with the second camera. In equation (5), the extrinsic parameters $R$, $T$, $R'$, and $T'$ are acquired in advance from the calibration of the two cameras; $x_{cn}$, $y_{cn}$, $x'_{cn}$, and $y'_{cn}$ can be obtained from the captured images using equations (3) and (4). Consequently, if the physical point corresponding to $(x_{cn}, y_{cn})$ in the first image is the same point corresponding to $(x'_{cn}, y'_{cn})$ in the second image, then equation (5) provides six equations in total for five unknowns: $x_w$, $y_w$, $z_w$, $z_c$, and $z'_c$. The system is overdetermined, so the desired 3D world coordinates $(x_w, y_w, z_w)$ can be solved. This explains why, in theory, a binocular imaging system can be employed for 3D imaging and shape measurements.
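The forward camera model of equations (1)-(4) can be sketched in a few lines. The sketch below is illustrative only: the function name `project_point`, the argument layout, and the numerical values in the usage lines are assumptions, and the pinhole step is assumed to take the usual form u = α x + γ y + u0, v = β y + v0.

```python
import numpy as np

def project_point(Xw, R, T, alpha, beta, gamma, u0, v0,
                  a=(0, 0, 0, 0, 0), s=(0, 0, 0, 0), p=(0, 0, 0, 0)):
    """Map a 3D world point to distorted pixel coordinates (hypothetical helper)."""
    Xw = np.asarray(Xw, dtype=float)
    # Equation (1): world coordinates -> camera coordinates
    xc, yc, zc = R @ Xw + T
    # Normalized image coordinates
    xcn, ycn = xc / zc, yc / zc
    # Equation (4): radial (a), prism (s) and tangential (p) distortion
    r2 = xcn**2 + ycn**2
    radial = 1 + sum(ak * r2**(k + 1) for k, ak in enumerate(a))
    xd = radial * xcn + (s[0] + s[2] * r2) * r2 + (p[0] + p[2] * r2) * (r2 + 2 * xcn**2)
    yd = radial * ycn + (s[1] + s[3] * r2) * r2 + (p[1] + p[3] * r2) * (r2 + 2 * ycn**2)
    # Equation (3): distorted normalized coordinates -> pixel coordinates
    u = alpha * xd + gamma * yd + u0
    v = beta * yd + v0
    return u, v

# Usage with made-up parameters: camera 1000 mm from the origin, no distortion
R = np.eye(3)
T = np.array([0.0, 0.0, 1000.0])
u, v = project_point([100.0, 50.0, 0.0], R, T,
                     alpha=2000.0, beta=2000.0, gamma=0.0, u0=1024.0, v0=768.0)
```

With all distortion coefficients set to zero the function reduces to the ideal pinhole model of equation (2), which makes it easy to check a calibration result by hand before the distortion terms are switched on.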
In practice, the FPP and 3D-DIC techniques adopt different approaches to facilitate the process of the 3D coordinates determination. The two techniques are elaborated as follows.

Fringe projection profilometry
The FPP setup is formed by replacing one of the two cameras in the binocular imaging system with a projector. A tremendous amount of work has been accomplished since the 1980s in the research and development of FPP-based techniques. As technology evolves at an ever-increasing pace, accuracy has become the most important feature of the FPP technique in countless applications. Among the various approaches to implementing the FPP measurement, a considerably reliable scheme involves projecting a set of phase-shifted sinusoidal fringe patterns from a projector onto the objects, where the surface depth or height information is naturally encoded into the camera-captured fringe patterns. The technique reconstructs the 3D shapes by determining the height or depth map from the phase distributions of the captured fringes. In general, the original fringes are straight, evenly spaced, and vertically (or horizontally) oriented. They are generated numerically using the following function [67,68]:
$$I_j(u, v) = I_0\left[1 + \cos\left(\phi + \delta_j\right)\right] = I_0\left[1 + \cos\left(\frac{2\pi f u}{W} + \delta_j\right)\right] \tag{6}$$
where $I$ is the pattern intensity at pixel coordinates $(u, v)$; the subscript $j$ denotes the $j$th phase-shifted image with $j = \{1, 2, \ldots, m\}$, and $m$ is the number of the phase-shift steps (e.g. $m = 4$); $I_0$ is a constant coefficient indicating the intensity modulation; $f$ is the number of fringes in the pattern image; $W$ is the width of the generated image; $\delta_j$ is the phase-shift amount of the $j$th image (e.g. $\delta_j = 2\pi (j-1)/m$ for equal steps); and $\phi$ is the fringe phase. Figure 4 shows four representative fringe patterns with frequencies of 1, 4, 20, and 60, respectively. The fringe phase at a pixel in the camera-captured images can be calculated by using a standard four-step phase-shifting algorithm as:
$$\phi^w(u, v) = \arctan\frac{I_4 - I_2}{I_1 - I_3} \tag{7}$$
Because the equation uses an arctangent function, the obtained phase value is wrapped in the range of 0 to 2π (denoted with a superscript w), and it must be unwrapped to obtain the true phase. However, phase unwrapping is often a difficult task for cases involving complex shapes and geometric discontinuities.
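The pattern generation and the four-step phase calculation described above can be illustrated with a short sketch. It assumes equal phase steps of 2π/m starting from zero, vertical fringes, and an 8-bit intensity scale; the function names `make_fringes` and `wrapped_phase_four_step` are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def make_fringes(width, height, f, m=4, I0=127.5):
    """Generate m phase-shifted vertical fringe patterns with f fringes across the width."""
    u = np.arange(width)
    phase = 2 * np.pi * f * u / width          # fringe phase along the u direction
    patterns = []
    for j in range(m):
        delta = 2 * np.pi * j / m              # assumed equal phase steps
        row = I0 * (1 + np.cos(phase + delta))
        patterns.append(np.tile(row, (height, 1)))
    return patterns

def wrapped_phase_four_step(I1, I2, I3, I4):
    """Four-step phase shifting: wrapped phase in the range 0 to 2*pi."""
    phi = np.arctan2(I4 - I2, I1 - I3)         # four-quadrant arctangent, (-pi, pi]
    return np.mod(phi, 2 * np.pi)

# Usage: synthesize four patterns with 4 fringes and recover their wrapped phase
pats = make_fringes(256, 4, f=4)
phi_w = wrapped_phase_four_step(*pats)
```

Using the four-quadrant `arctan2` rather than a plain arctangent of the ratio avoids a division by zero where I1 equals I3 and resolves the quadrant, so the result covers the full 2π range directly.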
In order to cope with this issue, a scheme of using multi-frequency fringe patterns is employed. The corresponding unwrapped phase can be calculated from:
$$\phi_i = \phi_i^w + \mathrm{INT}\left(\frac{\phi_{i-1}\,\dfrac{f_i}{f_{i-1}} - \phi_i^w}{2\pi}\right) 2\pi \tag{8}$$
where $i$ indicates the $i$th fringe-frequency pattern with $i = \{2, 3, \ldots, n\}$, and $n$ is the number of fringe frequencies; INT represents the function of rounding to the nearest integer; $f_i$ is the number of fringes in the $i$th projection pattern, with $f_n > f_{n-1} > \ldots > f_1 = 1$; and $\phi_1 = \phi_1^w$ is satisfied for $f_1 = 1$. The ratio between two adjacent fringe frequencies, $f_i / f_{i-1}$, is normally no larger than 5 to reduce the noise effect and ensure the reliability of the algorithm. A practical example is $n = 4$ with $f_4 = 100$, $f_3 = 20$, $f_2 = 4$, and $f_1 = 1$. The essential task of the FPP technique is to retrieve the depth or height map from the calculated phase distributions of the highest-frequency fringes. The governing equation for a generalized setup where the system components can be arbitrarily positioned [69,70] is:
$$z_w = \frac{1 + c_1 \phi + \sum_{k=1}^{14}\left(c_{2k} + c_{2k+1}\phi\right) m_k(u, v)}{d_0 + d_1 \phi + \sum_{k=1}^{14}\left(d_{2k} + d_{2k+1}\phi\right) m_k(u, v)} \tag{9}$$
where $z_w$ is the height or depth at the point corresponding to the pixel $(u, v)$ in the captured images, and it is also the z-coordinate of the point in the reference or world coordinate system; $\phi$ is the unwrapped phase of the highest-frequency fringe pattern at the same pixel; $m_k(u, v)$ denotes the monomials $u^p v^q$ with $1 \le p + q \le 4$; and $c_1$ to $c_{29}$ and $d_0$ to $d_{29}$ are constant coefficients associated with geometrical and other system parameters. The 59 coefficients can be determined by a calibration process using a few gage objects that have many points with $z_w$ precisely known. Recall that the camera calibration previously described yields the 3D coordinates of the control points on the calibration board with ultrahigh accuracy; these points can therefore serve as the gage points for determining the 59 coefficients. The cost function of the corresponding non-linear least-squares optimization [70,71] is:
$$S = \sum_{k}\sum_{l}\left(z_{w,kl} - Z_{kl}\right)^2 \tag{10}$$
where $z_{w,kl}$ is the value given by equation (9) and $Z_{kl}$ is the z-coordinate or height/depth information of the $l$th control point on the calibration board obtained at the $k$th calibration position.
The 59 coefficients can be easily determined by using the Levenberg-Marquardt algorithm or a similar non-linear least-squares algorithm.
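The multi-frequency unwrapping step described above can be sketched as follows. The function name `unwrap_multifrequency` and the synthetic test signal are assumptions made for illustration; the sketch presumes that the lowest frequency is 1, so that its wrapped phase is already the true phase.

```python
import numpy as np

def unwrap_multifrequency(wrapped, freqs):
    """Unwrap the highest-frequency phase using a ladder of lower frequencies.

    wrapped: list of wrapped phase maps (0 to 2*pi), ordered by increasing frequency
    freqs:   matching list of fringe counts, with freqs[0] == 1
    """
    phi = np.asarray(wrapped[0], dtype=float)       # f_1 = 1: no unwrapping needed
    for i in range(1, len(freqs)):
        w = np.asarray(wrapped[i], dtype=float)
        ratio = freqs[i] / freqs[i - 1]
        # Fringe order: scale the previous unwrapped phase and round to the nearest integer
        order = np.rint((phi * ratio - w) / (2 * np.pi))
        phi = w + 2 * np.pi * order
    return phi

# Usage on synthetic, noise-free data with the frequencies quoted in the text
t = np.linspace(0.01, 2 * np.pi - 0.01, 500)        # true phase of the f = 1 pattern
freqs = [1, 4, 20, 100]
wrapped = [np.mod(f * t, 2 * np.pi) for f in freqs]
phi_unwrapped = unwrap_multifrequency(wrapped, freqs)
```

The rounding step tolerates a phase error of up to π in the scaled lower-frequency phase, which is why keeping the ratio between adjacent frequencies small limits how much noise is amplified from one level to the next.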
After $z_w$ is determined, the remaining two coordinates $x_w$ and $y_w$ of the same point can be calculated from equations (1)-(4).
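One way to obtain $x_w$ and $y_w$ once $z_w$ is known is sketched below. This is a derivation from equation (1) under the assumption that $r_{mn}$ and $t_m$ denote the elements of $R$ and $T$ (a notation introduced here for illustration), with $x_{cn}$ and $y_{cn}$ recovered from the pixel location through equations (3) and (4); it is not necessarily the closed form used by the authors.

```latex
% Equation (1) written with x_c = z_c x_{cn} and y_c = z_c y_{cn}
\begin{aligned}
z_c x_{cn} &= r_{11} x_w + r_{12} y_w + r_{13} z_w + t_1, \\
z_c y_{cn} &= r_{21} x_w + r_{22} y_w + r_{23} z_w + t_2, \\
z_c        &= r_{31} x_w + r_{32} y_w + r_{33} z_w + t_3.
\end{aligned}

% Substituting the third row into the first two eliminates z_c and leaves
% a 2-by-2 linear system in x_w and y_w for the known z_w
\begin{bmatrix}
r_{11} - x_{cn} r_{31} & r_{12} - x_{cn} r_{32} \\
r_{21} - y_{cn} r_{31} & r_{22} - y_{cn} r_{32}
\end{bmatrix}
\begin{bmatrix} x_w \\ y_w \end{bmatrix}
=
\begin{bmatrix}
(x_{cn} r_{33} - r_{13})\, z_w + x_{cn} t_3 - t_1 \\
(y_{cn} r_{33} - r_{23})\, z_w + y_{cn} t_3 - t_2
\end{bmatrix}
```

The system has a unique solution whenever the 2-by-2 matrix is non-singular, which holds for ordinary viewing geometries where the camera is not looking along the reference plane.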

3D digital image correlation
The 3D-DIC technique is a stereo vision method that performs 3D imaging and shape measurements using two images captured by two separate cameras, typically one on the left side and the other on the right side. Recall from equation (5) that (x cn , y cn ) and (x ′ cn , y ′ cn ) must be associated with the same physical point. The DIC algorithm must therefore match the points in one image (the reference) to their corresponding points in the other image (the target) with subpixel accuracy.
For an arbitrary pixel $(\tilde{u}_0, \tilde{v}_0)$ in the reference image to be matched, a square subset region of $(2M + 1) \times (2M + 1)$ pixels with its center located at $(\tilde{u}_0, \tilde{v}_0)$ is selected in the DIC analysis as the reference subset, where $M$ is a positive integer. The corresponding subset in the target image (i.e. the target subset) should be a homography transformation of the reference subset because they are the same region captured by two cameras from separate positions and directions. It is noted that the usage of subsets not only makes pixel matching feasible but also helps reduce the noise effect. Denoting the disparity between the centers of the two matching subset patterns as $(\xi, \eta)$, the transformation function for the entire reference and target subsets can be expressed as [30]:
$$\begin{aligned} \tilde{u}'_{ij} &= \tilde{u}_0 + i + \xi + \xi_u i + \xi_v j + \xi_{uu} i^2 + \xi_{uv} ij + \xi_{vv} j^2 \\ \tilde{v}'_{ij} &= \tilde{v}_0 + j + \eta + \eta_u i + \eta_v j + \eta_{uu} i^2 + \eta_{uv} ij + \eta_{vv} j^2 \end{aligned} \tag{13}$$
where $i$ and $j$ range from $-M$ to $M$, and $\xi, \xi_u, \xi_v, \xi_{uu}, \xi_{uv}, \xi_{vv}, \eta, \eta_u, \eta_v, \eta_{uu}, \eta_{uv}, \eta_{vv}$ are the transformation parameters. These parameters can be determined by minimizing the least-squares-based correlation coefficient defined as [72]:
$$C = \sum_{i=-M}^{M}\sum_{j=-M}^{M}\left[a f(\tilde{u}_i, \tilde{v}_j)^2 + b f(\tilde{u}_i, \tilde{v}_j) + c - g(\tilde{u}'_{ij}, \tilde{v}'_{ij})\right]^2 \tag{14}$$
where $\tilde{u}_i = \tilde{u}_0 + i$ and $\tilde{v}_j = \tilde{v}_0 + j$; $a$ and $b$ are scale factors, $c$ is an offset of intensity, and $f(\tilde{u}_i, \tilde{v}_j)$ and $g(\tilde{u}'_{ij}, \tilde{v}'_{ij})$ indicate the intensity values at a pixel in the reference subset and the potential matching pixel in the target subset, respectively. The 15 unknowns, including 12 shape-transformation parameters and 3 intensity parameters ($a$, $b$ and $c$), can be solved by using an iterative algorithm such as the Gauss-Newton or Levenberg-Marquardt algorithm [73][74][75]. During the iteration process, an interpolation operation should be carried out to obtain the intensity values at subpixel locations in the target subset [76]. It is also noteworthy that the non-linear scale factor $a$ is introduced into the cost function to compensate for possible non-linear intensity variations between the captured images for accuracy-enhanced measurements.
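To better present the iteration and the derivation of the iterative equation, equation (14) can be rewritten in another form as:
$$C(p) = \sum_{i=-M}^{M}\sum_{j=-M}^{M} e_{ij}(p)^2, \qquad e_{ij}(p) = a f_{ij}^2 + b f_{ij} + c - g(\tilde{u}'_{ij}, \tilde{v}'_{ij}) \tag{15}$$
where $f_{ij}$ is the intensity at pixel $(i, j)$ of the reference subset, $g(\tilde{u}'_{ij}, \tilde{v}'_{ij})$ is the intensity at the potential matching location in the target subset, and $p = (\xi, \xi_u, \xi_v, \xi_{uu}, \xi_{uv}, \xi_{vv}, \eta, \eta_u, \eta_v, \eta_{uu}, \eta_{uv}, \eta_{vv}, a, b, c)^T$ collects the 15 unknowns. The best estimate of the mapping parameters is established by minimizing $C(p)$. This can be carried out iteratively by applying the Gauss-Newton algorithm to equation (15), which yields the governing equation as:
$$p^{(n+1)} = p^{(n)} - \left[\sum_{i}\sum_{j} J_{ij}^T J_{ij}\right]^{-1} \sum_{i}\sum_{j} J_{ij}^T\, e_{ij}\left(p^{(n)}\right) \tag{16}$$
where $n = 0, 1, 2, \ldots$ indicates the iteration step, and $J_{ij}$ is the Jacobian vector defined as [77]:
$$J_{ij} = \frac{\partial e_{ij}}{\partial p} = \left[-g_x,\ -g_x i,\ -g_x j,\ -g_x i^2,\ -g_x ij,\ -g_x j^2,\ -g_y,\ -g_y i,\ -g_y j,\ -g_y i^2,\ -g_y ij,\ -g_y j^2,\ f_{ij}^2,\ f_{ij},\ 1\right] \tag{17}$$
In equation (17), $g_x$ and $g_y$ are the intensity gradients of the target subset at location $(\tilde{u}'_{ij}, \tilde{v}'_{ij})$ in the x- and y-directions, respectively. In the iteration, the convergence tolerance can be set to $1 \times 10^{-5}$ for each element of $p$.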
The above iterative algorithm is capable of performing the image matching process with high accuracy at a very fast speed, provided that a reasonably good initial guess is available for the unknown transformation parameters, mainly the low-order terms ξ, η, ξ u , ξ v , η u , η v in equation (13). Such an initial guess can be obtained manually by selecting three pairs of matching points in the reference and target images, or automatically by a full-field scanning process when the shape change of the target subset with respect to the reference subset is small. A feature-based matching scheme may be employed to obtain the initial guess [78,79] if the previous two schemes are not applicable.
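The automatic full-field scanning option for the initial guess can be illustrated with an integer-pixel search. The sketch below is only one possible realization: it scores candidates with zero-normalized cross-correlation (ZNCC) rather than the least-squares criterion of equation (14), it estimates only the disparity terms ξ and η, and the function names and the synthetic images are assumptions.

```python
import numpy as np

def zncc(a, b):
    """Zero-normalized cross-correlation of two equally sized subsets."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def scan_initial_guess(ref, tgt, u0, v0, M, search):
    """Integer disparity (xi, eta) of the subset centered at column u0, row v0."""
    ref_sub = ref[v0 - M:v0 + M + 1, u0 - M:u0 + M + 1]
    best_score, best_xi, best_eta = -np.inf, 0, 0
    for eta in range(-search, search + 1):
        for xi in range(-search, search + 1):
            r, c = v0 + eta, u0 + xi
            # Skip candidate subsets that fall outside the target image
            if r - M < 0 or c - M < 0 or r + M + 1 > tgt.shape[0] or c + M + 1 > tgt.shape[1]:
                continue
            score = zncc(ref_sub, tgt[r - M:r + M + 1, c - M:c + M + 1])
            if score > best_score:
                best_score, best_xi, best_eta = score, xi, eta
    return best_xi, best_eta, best_score

# Usage: a random speckle-like image shifted by 3 rows and -5 columns
rng = np.random.default_rng(0)
ref_img = rng.random((80, 80))
tgt_img = np.roll(ref_img, shift=(3, -5), axis=(0, 1))
xi0, eta0, score0 = scan_initial_guess(ref_img, tgt_img, u0=40, v0=40, M=7, search=10)
```

An exhaustive scan of this kind costs on the order of (2·search + 1)² subset comparisons per point, so in practice it would typically be run for a few seed points only, with neighbouring points initialized from already converged results.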
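Upon the completion of image matching, a pixel $(\tilde{u}, \tilde{v})$ in the region of interest in one image is linked to a corresponding pixel $(\tilde{u}', \tilde{v}')$ in the other image. From equations (3) and (4), $(x_{cn}, y_{cn})$ and $(x'_{cn}, y'_{cn})$ can then be determined. Subsequently, by eliminating $z_c$ and $z'_c$, equation (5) yields
$$\begin{bmatrix} r_{11} - x_{cn} r_{31} & r_{12} - x_{cn} r_{32} & r_{13} - x_{cn} r_{33} \\ r_{21} - y_{cn} r_{31} & r_{22} - y_{cn} r_{32} & r_{23} - y_{cn} r_{33} \\ r'_{11} - x'_{cn} r'_{31} & r'_{12} - x'_{cn} r'_{32} & r'_{13} - x'_{cn} r'_{33} \\ r'_{21} - y'_{cn} r'_{31} & r'_{22} - y'_{cn} r'_{32} & r'_{23} - y'_{cn} r'_{33} \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} = \begin{bmatrix} x_{cn} t_3 - t_1 \\ y_{cn} t_3 - t_2 \\ x'_{cn} t'_3 - t'_1 \\ y'_{cn} t'_3 - t'_2 \end{bmatrix} \tag{18}$$
where $r_{mn}$ and $t_m$ denote the elements of $R$ and $T$, and the primed terms are those of the second camera. Equation (18) is an overdetermined equation system, and the desired $(x_w, y_w, z_w)$ can be acquired from its linear least-squares solution. That is, expressing the equation as $Ax = B$, the solution is calculated from
$$x = \left(A^T A\right)^{-1} A^T B \tag{19}$$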
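The linear least-squares triangulation described above can be sketched as follows. The function name `triangulate`, the camera poses, and the test point are hypothetical values chosen for illustration; each camera contributes two rows obtained by eliminating its depth from equation (5).

```python
import numpy as np

def triangulate(xcn, ycn, R, T, xcn2, ycn2, R2, T2):
    """Recover (x_w, y_w, z_w) from matched normalized coordinates in two cameras."""
    rows, rhs = [], []
    for x, y, Rm, Tv in ((xcn, ycn, R, T), (xcn2, ycn2, R2, T2)):
        # (row1 - x * row3) . X = x * t3 - t1, and likewise for y with row2
        rows.append(Rm[0] - x * Rm[2])
        rhs.append(x * Tv[2] - Tv[0])
        rows.append(Rm[1] - y * Rm[2])
        rhs.append(y * Tv[2] - Tv[1])
    A = np.array(rows)
    B = np.array(rhs)
    # Least-squares solution of the overdetermined system A x = B
    X, *_ = np.linalg.lstsq(A, B, rcond=None)
    return X

def normalized(Xw, R, T):
    """Ideal normalized image coordinates of a world point (no distortion)."""
    xc, yc, zc = R @ Xw + T
    return xc / zc, yc / zc

# Usage: two cameras about 1 m away, the second rotated by 25 degrees about the y-axis
theta = np.radians(25.0)
R1, T1 = np.eye(3), np.array([0.0, 0.0, 1000.0])
R2 = np.array([[np.cos(theta), 0.0, np.sin(theta)],
               [0.0, 1.0, 0.0],
               [-np.sin(theta), 0.0, np.cos(theta)]])
T2 = np.array([-400.0, 0.0, 1000.0])
X_true = np.array([30.0, -20.0, 15.0])
X_est = triangulate(*normalized(X_true, R1, T1), R1, T1,
                    *normalized(X_true, R2, T2), R2, T2)
```

`np.linalg.lstsq` is used here instead of forming the normal equations explicitly; both give the same least-squares solution, but the library routine is generally better conditioned numerically.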

Experiments
The FPP and 3D-DIC experiments have been implemented to investigate the variations of measurement accuracy induced by the geometric changes of hardware positions. The experiment system is composed of two EPIX SV9T001C cameras with a resolution of 2048 × 1536 pixels, an EPSON Powerlite 98 projector, and a desktop computer with an Intel Core i7-980 processor and 8GB RAM. The experiments use two planar calibration boards with 10 × 7 concentric-circle patterns on each, where the pattern spacings are 25.4 and 12.7 mm, respectively. The image capturing and analysis software is written in the C++ language, and the cameras and projector are synchronized by the software. The captured images are saved in the 8-bit bitmap format, which is uncompressed and lossless.
In order to achieve 3D shape measurements with high accuracy, a few measures have been taken for the experiments to isolate the noise sources. These measures include running the experiments in the basement lab, setting up the systems in an enclosure on an isolation optical table, and starting the experiment after thermal equilibrium is reached.

System geometry
The experiments adopt a typical distance of about 1 m from the cameras and projector to the target. Longer or shorter distances can be used, but they do not affect the goal of the accuracy comparison. The angle between the camera and the projector/camera positions in the experiment system plays a key role in affecting the measurement accuracy. In triangulation theory, the height or depth distinction can be more accurately detected if the baseline (i.e. the distance between the two cameras) increases. A longer baseline indicates a larger viewing angle between the cameras in the 3D-DIC technique or between the camera and the projector in the FPP technique. A larger angle leads to larger disparities of points in the images (here the projector is technically treated as a reversed camera for the FPP measurement), which theoretically helps enhance the measurement sensitivity and consequently accuracy. In practice, however, an increased angle causes larger affine transformation between the images, which may bring down the measurement accuracy. In addition, it often results in less overlapping regions to reconstruct the desired 3D shapes, and a large angle should be particularly avoided in the presence of excessive occlusions and shadows. On the other hand, a decrease in the angle may bring more difficulty in distinguishing the differences between images, which can further diminish the measurement accuracy.
In the experiments, the measurement accuracy of both techniques is investigated with different camera-camera or camera-projector angles ranging from 15° to 45° with an increment of 10°. Angles beyond this range are not considered because they are impractical in accurate real-world applications. In the system hardware setup, the two cameras are positioned symmetrically with respect to the reference plane (or the x-y plane of the world coordinate system), whereas the projector is located right below one of the cameras and oriented in the same direction, as previously illustrated in figure 1.
An appropriate gage object whose height or depth is precisely known with sub-micron or nano-scale accuracy is not available in this work. Instead, the specimen target is a cuboid with its front surface covered by a special pattern as shown in figure 5, where the white and speckle regions are for the simultaneous FPP and 3D-DIC measurements, respectively. Under the cuboid is a translation stage, which is driven by a differential adjuster and a piezoelectric actuator with a translation range of 0-25 µm at nanoscale accuracy. In addition, the front surface of the specimen target is positioned parallel with the reference plane and perpendicular to the motion direction of the translation stage. The adjustment and alignment operations are aided by laser light and reflection mirrors. The inevitable small misalignment in practice is negligible because the induced error is tiny. For instance, a 2° misalignment in the motion direction would result in an error of 0.006 µm for a movement of 10 µm. Furthermore, the relative and averaged position of the specimen surface is chosen as the physical quantity for accuracy assessment, and no additional noise-reduction process is applied to the techniques. It must be clarified that the 3D shape measurement accuracy is determined by the minimum depth change that can be detected; therefore, a flat surface driven by an ultrahigh-resolution translation stage is employed.
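The quoted misalignment figure appears consistent with a simple cosine-error model, in which only the component of the stage motion along the measurement direction is observed. The short check below assumes that model; the function name `cosine_error` is hypothetical.

```python
import math

def cosine_error(displacement_um, misalignment_deg):
    """Difference between the commanded displacement and its projection
    onto the measurement direction, assuming a pure angular misalignment."""
    return displacement_um * (1.0 - math.cos(math.radians(misalignment_deg)))

# A 2-degree misalignment over a 10 um movement
err_um = cosine_error(10.0, 2.0)
print(f"cosine error: {err_um:.4f} um")
```

Because the error grows with the square of the angle for small angles, halving the misalignment to 1° would reduce it by roughly a factor of four.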
The primary procedure of the experiments is as follows: (a) Capture images for the system calibration. The FPP technique requires capturing 10-20 sets of the calibration board images at different positions with the left camera (for simplicity, the FPP measurement is assumed to use the left camera hereafter). At each position, multi-frequency phase-shifted fringes are ...
Figure 5. Speckle checkerboard patterns on the specimen surface and the illustration of masks and z-coordinate maps. The z-coordinate map is not uniform due to a small misalignment between the specimen surface and the x-y reference plane of the world coordinates.
... of the world coordinate system, the positions are calculated by subtracting the initial position from each new position after translating the target. It can be seen from the tables that the 25° configuration yields the best results. The results also reveal that the largest error of 0.699 µm occurs at the 45° angle in the case of 15 µm displacement for the FPP measurements, and the largest error of 0.702 µm occurs at the 45° angle in the case of 10 µm displacement for the 3D-DIC measurements. It is noteworthy that the largest errors seem to occur at a random distance because of system uncertainties, but the root cause is unclear.
The relative accuracy is defined as the ratio of the out-of-plane measurement error to the in-plane width dimension. With the field of view being 415.0 mm wide in the experiments, the relative accuracy can be calculated as 0.000699 mm/415.0 mm ≈ 1/595 000 for the FPP technique and 0.000702 mm/415.0 mm ≈ 1/590 000 for the 3D-DIC technique. Overall, the measurement accuracy of both techniques is close to 1/600 000, indicating an ultrahigh accuracy. In the following experiments, the angle of 25° is adopted.
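The relative-accuracy arithmetic can be reproduced directly from the quoted errors and field-of-view width. The helper name `relative_accuracy_denominator` is an assumption; the inputs are the values stated in the text.

```python
def relative_accuracy_denominator(error_mm, fov_width_mm):
    """Return N such that the relative accuracy is 1/N."""
    return fov_width_mm / error_mm

# Largest errors reported for the two techniques, field of view 415.0 mm wide
fpp_n = relative_accuracy_denominator(0.000699, 415.0)
dic_n = relative_accuracy_denominator(0.000702, 415.0)
print(f"FPP:    1/{fpp_n:,.0f}")
print(f"3D-DIC: 1/{dic_n:,.0f}")
```

Both denominators come out slightly below 600 000, in line with the rounded figures quoted above.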

360-degree 3D image reconstruction
To demonstrate the measurement accuracy of the FPP and 3D-DIC techniques, the 2-in-1 experiment system has been utilized to reconstruct 360° 3D images of some objects. The rationale of this experiment is that, without high measurement accuracy, building accurate 360° 3D images from multiple views would be difficult because of error accumulation.
In the experiment, the projector is utilized to project speckle patterns on the objects of interest because they lack the required surface texture. The measurement system is first positioned to obtain the 3D image of an object from a fixed view, and the measurement is then repeated by rotating the object to cover the entire surface. In total, ten different 3D images are acquired and combined to form a complete 360° 3D image. Figure 6 shows the 3D images reconstructed from the two techniques, where the first image in each row is cropped from one of the original images captured for the corresponding technique and the remaining images are selected views of the reconstructed 360° 3D image. In the figure, the first two rows demonstrate representative results acquired by the FPP technique, and the following two rows show representative results obtained by the 3D-DIC technique. Because the true dimensions of those objects are unknown, visual assessment is employed here.
By successfully generating the 360° 3D images, the experiments help verify the capabilities of the FPP and 3D-DIC techniques in terms of measurement accuracy.

System resolution
Results presented in figure 6 show different resolutions for the FPP and 3D-DIC techniques. This difference in resolution cannot be revealed by the previous accuracy test because the specimen is planar. Therefore, an experiment on system resolution is conducted.
In this experiment, the field of view is reduced to about half of the previous experiments and a smaller calibration board with a grid distance of 12.7 mm is adopted. In the meantime, the camera-camera and camera-projector angles as well as the test objects remain unchanged. Figure 7 displays the comparison of the acquired 3D reconstruction results with the ones obtained in the previous experiments. It is evident from a visual comparison that higher resolution can substantially improve the performance of both techniques, especially the 3D-DIC one. The reason is that the higher resolution of the cameras gives more detail for the local regions, and the resolution of the 3D results is accordingly improved. Because the DIC algorithm highly depends on local information for disparity detection, the improvement is visually more distinct.

Multiple separate objects
The ever-broadening range of applications requires 3D imaging and shape measurement techniques to handle geometric discontinuities and to acquire 3D images of multiple separate objects in the field of view simultaneously. For this reason, a fourth experiment has been implemented to demonstrate such a capability. Representative results presented in figure 8 validate that both techniques can cope well with geometric discontinuities. Again, it is shown that the FPP technique outperforms the DIC technique in terms of local detail because of its pixel-by-pixel processing, as opposed to subset-based analysis.

Conclusion
This paper has presented an experimental investigation into the accuracy comparison of two popular techniques for the 3D imaging and 3D shape measurements: the FPP and 3D-DIC techniques. It is revealed that the fundamental principles of the two techniques are both based on binocular stereo vision, or more rigorously, triangulation imaging. Their main difference is that the FPP technique employs fringe patterns to facilitate the image correspondence detection, while the DIC technique uses speckle or texture patterns to carry out the image matching task. By using a measurement system composed of both the FPP and the 3D-DIC techniques, an evaluation on how the geometric angles between key hardware components affect the 3D measurement accuracy is accomplished for both methods. It turns out that the depth and height measurements of both techniques can reach sub-micron accuracy, and the relative accuracy of the 3D shape or position measurements can reach 1/600 000 for both techniques. The results indicate that the accuracy of depth measurement can be close to that of the classical optical interferometry techniques such as Twyman-Green interferometry and Fizeau interferometry [80].
It is important to point out that the acquired ultrahigh accuracy is based on detecting the average location changes of a flat plane, so the noise influence is quite low in the assessment. The investigation considers only the rigid-body translations because their ground-truth values can be assured. Moreover, since the accuracy of the 3D shape measurement is fundamentally the ability to detect the smallest depth or height change, it is technically reasonable and sufficient to study solely the translations.
For both techniques, the accuracy is substantially higher than the resolution. Specifically, the accuracy can reach sub-micron level and the typical resolution is at the scale of tens of microns. Consequently, the resolution will be dominant in actual applications where local details are often of interest. Because the FPP algorithm is based on pixel-by-pixel processing and the DIC algorithm is based on subset-matching analysis, the FPP measurement outperforms the 3D-DIC measurement in terms of resolution. Nevertheless, the resolution of cameras has been improving at a pace much faster than that of projectors; therefore, the resolution is not a practical problem in the applications of the 3D-DIC technique. The major problem with the 3D-DIC technique remains the analysis speed, though speed evaluation is another topic beyond the scope of this paper.