Propagation of uncertainty for an epipole-dependent model for convergent stereovision structure computation

An analytic model incorporating stereo epipoles is proposed for structure computation using a convergent stereovision setup. The developed model is predicated on the image parameters of both CCD camera sensors, together with two extrinsic parameters, namely the stereo baseline distance and the stereo projection angle of the scene point of interest. In the model, the points on the image planes are measured relative to the principal points, stereo epipoles are featured, and only focal length-normalized camera sensor coordinates are required for structure computation. The reconstruction model could be employed in active vision-based metrology in which the stereo imaging cameras are systematically rotated about their vertical axes relative to each other. The performance of the model is studied, and its accuracy is tested by comparing the 3-space coordinates it predicted to those obtained by a gold standard triangulation and to the ground truth results. In terms of execution speed the proposed reconstruction model exhibited a computation time of 0.6 ms, compared to 6.2 ms and 9.9 ms recorded for the direct linear transformation and gold standard triangulation algorithms respectively. The coordinate measurement uncertainties determined by experimental methods are subsequently compared with those obtained by a theoretical approach based on the analytic reconstruction model. Strong correlations were found to exist between the two sets of uncertainty values obtained.


Introduction
Stereovision is an imaging technique that allows the reconstruction of point coordinates in three-dimensional (3D) space based on images acquired from two cameras (Sankowski et al 2017). By detecting the same point in the corresponding frames, its coordinates (x, y, z) can be precisely determined.
Stereovision techniques aim to recover 3D information from two images of the same scene taken from different points of view (Samper et al 2013).
Stereovision finds application in several areas such as navigation of autonomous vehicles and mobile robots, parts inspection for quality assurance, 3D measurement, and tracking and identification of objects. More specialised applications include measuring the position of a mobile drilling unit used in the manufacture of semitrailer chassis (Samper et al 2013), 3D hand interaction in augmented reality (Lee et al 2008, Lee and Chun 2014), and measuring the relative velocity of a moving object (Murmu et al 2019). Interestingly, in the field of agriculture, simple stereovision systems have been designed for and demonstrated in the estimation of the size and weight of live animals (Menesatti et al 2014), the estimation of the spatial growth parameters of plants (Lati et al 2013) and the measurement of the height of field crops during cultivation (Kim et al 2021). Sasiadek and Walker (2019) also highlighted the relevance of stereovision to unmanned aerial vehicle (UAV) navigation and UAV aerial refuelling. In the field of metrology, stereovision is deployed in pose estimation and the measurement of the dimensions of objects (Di Leo et al 2011a, 2011b).
In terms of the relative orientations of the image sensors and principal axes of the stereo cameras, a stereovision system may be classified as a coplanar-parallel stereovision system (which has coplanar image planes and parallel principal axes) or a convergent stereovision system (which has non-coplanar image planes and non-parallel principal axes, but pairs of image plane principal axes are coplanar, e.g. the horizontal axis of image 1 is coplanar with the horizontal axis of image 2). The coplanar-parallel stereovision setup may be referred to as a rectilinear stereovision rig (Hartley 1999). While most measurement uncertainty studies in the stereovision literature have focused on coplanar-parallel stereovision, the convergent stereovision system has not received significant attention, particularly regarding scene reconstruction and measurement uncertainty that take the stereo epipoles into account. It is noteworthy that for a stereo camera system with non-coplanar image planes and non-parallel principal axes, the basic triangulation formulas applicable to the coplanar-parallel stereo camera setup are invalid for 3-space scene reconstruction. Such a convergent stereovision system with non-parallel principal axes (offering the benefit of a wider common field of view) is shown geometrically in figure 1, where the dashed lines indicate the principal axes of the cameras. The angles θ, α_1, α_2, ω_1 and ω_2 in figure 1 are defined in section 3 of this paper.
Obtaining 3-space coordinate information from a stereovision system essentially involves three steps: camera calibration, point-to-point correspondence matching and triangulation. Barnard and Fischler (1982) list the functional components of the computational stereo paradigm as image acquisition, camera modelling, feature acquisition, image matching, depth determination, and interpolation. In this work, the focus is on structure computation, which encompasses depth determination. Hartley and Zisserman (2003) presented a numerical method for computing the position of a point X̆ = (x, y, z) in 3-space given its image plane coordinates x̆_1 = (u_1, v_1) and x̆_2 = (u_2, v_2) in both camera sensors and the camera matrices P_1 and P_2 of both views. In the said method, the governing set of equations for 3D structure computation, by way of linear triangulation, is given (in compact form) by the homogeneous equation

AX = 0,    (1)

where A is the matrix whose rows are u_1 p_1^3T − p_1^1T, v_1 p_1^3T − p_1^2T, u_2 p_2^3T − p_2^1T and v_2 p_2^3T − p_2^2T, and p_i^jT is the jth row of the camera matrix P_i (i = 1, 2).
Equation (1) is widely employed in reconstruction problems in geometric computer vision and is applicable to both coplanar-parallel and convergent stereovision systems. It is linear in the components of X and can be solved by the method of least squares, essentially a numerical method in which singular value decomposition is employed.
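As an illustration, the following is a minimal sketch of such a linear triangulation, assuming NumPy and 3 × 4 camera matrices obtained from calibration; the function name and interface are illustrative rather than part of any published implementation.

```python
import numpy as np

def triangulate_linear(P1, P2, x1, x2):
    """Linear (DLT) triangulation: solve AX = 0 of equation (1) by SVD.

    P1, P2 : 3x4 camera matrices; x1, x2 : (u, v) image points in pixels.
    Returns the inhomogeneous 3-space point (x, y, z).
    """
    u1, v1 = x1
    u2, v2 = x2
    # Each view contributes two rows of the homogeneous system AX = 0.
    A = np.vstack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # dehomogenize
```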
A closed-form analytic reconstruction model has been developed for use in convergent stereovision metrology. It is widely cited in the stereovision literature for the estimation of the 3-space coordinates (x_c, y_c, z_c) of a world point relative to the left camera coordinate frame, and has been studied by Lin and Chen (2013), Yang et al (2018), Gao et al (2020), and Liu et al (2021). This model, however, cannot be employed in active convergent stereo-camera imaging, and it does not consider the epipolar geometry of the stereo-camera setup.
In this work, a unique method for structure computation of a convergent system using an analytic approach is presented. The method proposes a model (independent of the camera matrices after calibration) that requires not only the baseline of the stereo setup and the image plane coordinates of a 3-space point of interest, but also the epipoles and the stereo projection angle defined by the epipolar plane. The technique developed in this paper fundamentally follows the geometric approach adopted in developing the model studied by Lin and Chen (2013), Yang et al (2018), Gao et al (2020), and Liu et al (2021), except that the ensuing analysis assumes the optical centers to be behind the image planes, in accordance with the approach of many studies on stereovision. Also, in the method adopted in this study, the epipolar geometry of the stereovision rig was taken into consideration, and the developed model can be applied in active stereovision metrology.
Over the past few decades, several researchers in the machine vision community have made invaluable attempts at understanding and analysing the measurement accuracy of objects in 3D space using stereovision systems. Typical sources of error are the uncertainties related to the intrinsic and extrinsic parameters of the stereovision setup, and their effects on measurement accuracy have been widely studied. There are also uncertainties associated with the basic steps (image acquisition, camera calibration, segmentation, and correspondence) involved in stereovision measurement prior to reconstruction, and they cumulatively contribute to the inaccuracy of the reconstructed point (Barnard and Fischler 1982). Chen et al (2008) considered uncertainty propagation, under calibration and reconstruction, for a generalized stereovision system, taking into consideration the degrees of freedom associated with a typical camera matrix. However, in the input covariance matrix, the input quantities were regarded as random and uncorrelated, and synthetic data were used to demonstrate the performance of the developed uncertainty models.
Di Leo et al (2010, 2011b) carried out a comprehensive analysis of uncertainty propagation in stereo reconstruction, considering the errors introduced by the calibration and triangulation algorithms. For calibration, the direct linear transformation (DLT) was used, whereas a linear triangulation algorithm as presented by Hartley and Zisserman (2003) was adopted owing to its 'simplicity, numerical stability, and repeatability'. However, only 20 image pairs were used in the study of measurement uncertainty.
In this study, using the law of propagation of measurement uncertainty together with the proposed model for structure computation and 40 image pairs, theoretical and experimental uncertainty values are determined and compared for convergent stereovision metrology.In the theoretical uncertainty model, both full and diagonal input covariance matrices were considered to investigate the assumption of parameter independence.The variation of measurement accuracy with the parameters (linear and angular) that feature in the developed reconstruction model is also studied and demonstrated through graphical presentations.
Section 2 of this paper presents the theoretical fundamentals of stereovision. In section 3 the proposed analytic reconstruction model is developed, and the associated uncertainty expressions are illustrated in section 4. The experiments and results are discussed in section 5. This is followed by the limitations and the evaluation of the model in sections 6 and 7 respectively. The paper is concluded in section 8.

Stereovision fundamentals
In stereovision, two cameras are deployed to facilitate the determination of the 3D coordinates of a world point of interest by a process that essentially involves image acquisition, stereo calibration, and structure computation. The acquired image pairs are used for the calibration process, which yields the camera matrices containing the intrinsic and extrinsic parameters. With the camera matrices, the projection points of the world point on the image planes can be obtained.
Table 1 shows the equations that generally express the projection or mapping of an arbitrary 3-space world point to the image planes in a stereovision system. The world point has the absolute or inhomogeneous coordinates X̆_w = (x, y, z) with respect to the left camera and may be homogenized as X_W = (x, y, z, 1).
The parameters of the matrix equations are defined as follows: α_u1,2 and α_v1,2 are the focal lengths of the cameras in pixels in the horizontal and vertical directions; u_p1,2 and v_p1,2 are the horizontal and vertical pixel coordinates of the principal points on the image sensors; u_1,2 and v_1,2 are the horizontal and vertical pixel coordinates of the 2D point on the image sensors; s_1 and s_2 are the skew parameters of both cameras; K_1 and K_2 are the intrinsic matrices of the cameras; g and h are arbitrary homogeneous scale factors; R and C̆ respectively represent the rotation matrix and the inhomogeneous coordinates of the left camera center relative to the right camera frame.
If the position C̆ and orientation R_c^w of the camera coordinate frame relative to the world frame are known, and X̆_c is defined as the vector of the absolute coordinates of the 3-space world point relative to the camera frame, then, using the mapping equations of table 1, the reconstruction equation (3), which relates the measured image points to X̆_c through the intrinsic matrices and the relative pose (R, C̆), is obtained. Solving equation (3) numerically for X̆_c would be a computationally difficult and time-consuming task. In this paper an analytic model is developed to directly determine the X̆_c that satisfies equation (3). A numerical task is therefore solved with a geometric approach yielding an analytic model.
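By way of illustration, a candidate reconstruction X̆_c can be checked against the mapping equations by reprojection. The sketch below assumes NumPy and the standard central-projection forms x_1 ∝ K_1 X̆_c and x_2 ∝ K_2 R (X̆_c − C̆); these forms, and the function and variable names, are assumptions for illustration, not a statement of the paper's equation (3).

```python
import numpy as np

def reprojection_residual(X_c, x1, x2, K1, K2, R, C):
    """Pixel residuals of a candidate X_c under the assumed projection forms."""
    p1 = K1 @ X_c                 # camera 1: g x1 = K1 X_c (assumed form)
    p2 = K2 @ (R @ (X_c - C))     # camera 2: h x2 = K2 R (X_c - C) (assumed form)
    r1 = p1[:2] / p1[2] - np.asarray(x1)   # dehomogenize and compare with measurement
    r2 = p2[:2] / p2[2] - np.asarray(x2)
    return r1, r2
```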

Propagation of uncertainty
The Guide to the Expression of Uncertainty in Measurement (ISO/IEC 2008) defines measurement uncertainty as the 'parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand'. The parameter could be a standard deviation, which provides a quantitative measure of the concept of uncertainty. The uncertainty associated with a measured or numerically evaluated physical quantity, explicitly expressed as a function of some input quantities, depends on the uncertainties of those input quantities. The output uncertainty can be conveniently determined by the so-called law of propagation of error, which is widely used in stereovision metrology and has been extensively studied by Chen et al (2008), Di Leo et al (2010), Di Leo and Paolillo (2011a, 2011b), Lin and Chen (2013), Yang et al (2018), and Liu et al (2021).
Mathematically, for an arbitrary output random variable y dependent on, or functionally related to, n interdependent or correlated input variables x_1, ..., x_n such that y = f(x_1, ..., x_n), the uncertainty u_y associated with the output variable is defined as the positive square root of the variance u_y^2, which itself is given by

u_y^2 = Σ_i (∂f/∂x_i)^2 u^2(x_i) + 2 Σ_{i<j} (∂f/∂x_i)(∂f/∂x_j) u(x_i, x_j),    (4)

where u(x_i, x_j) = u(x_j, x_i) is the estimated covariance associated with x_i and x_j. Equation (4) can be written more compactly as

u_y^2 = J_x Λ_x J_x^T,    (5)

where Λ_x is the input covariance matrix and J_x is the input Jacobian matrix.

Table 1. Equations for 3D-to-2D point mapping in a typical stereo system under central projection. (Columns: camera 1, camera 2.)
If the input-output relational equations are in explicit multivariate form, the governing equation for the law of uncertainty propagation assumes the expression

Λ_q = J_p Λ_p J_p^T,    (6)

where Λ_q is the output covariance matrix, Λ_p is the input covariance matrix and J_p the input Jacobian matrix. The size and content of Λ_p and J_p are functions of the number and the algebraic expressions of the input quantities upon which the output variables (the x-, y- and z-coordinates) depend.
Equation (6) shows how the uncertainties of the input quantities p_i, taken to be equal to the standard deviations of the probability distributions of p_i, combine to give the uncertainty of the output variable q, if that uncertainty is represented by the standard deviation of the probability distribution of q. The covariance matrix for a multivariate probability distribution is a matrix of the variances and covariances of the variables: the variances constitute the diagonal elements, while the covariances make up the off-diagonal elements. In this work, the uncertainties of the input quantities are determined by finding the standard deviation of each of the quantities after 40 image acquisitions.
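A minimal sketch of equation (6), assuming NumPy; J_p and Λ_p would be formed as described in section 4.

```python
import numpy as np

def propagate_uncertainty(J_p, Lambda_p):
    """Output covariance of (x, y, z): Lambda_q = J_p Lambda_p J_p^T (equation (6))."""
    Lambda_q = J_p @ Lambda_p @ J_p.T
    # Standard uncertainties of the coordinates: square roots of the diagonal variances.
    return Lambda_q, np.sqrt(np.diag(Lambda_q))
```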

Analytic reconstruction model
The schematic configuration of the cameras of the stereo rig used in the development of the reconstruction model, together with the coordinate frames, is shown in figure 2. In this analysis it is assumed that the origin of the retinal image plane coordinates is at the principal point. The coordinate frame of the left camera is taken as the reference frame for any 3D world point, and the y-coordinate of the world point is measured relative to the reference epipolar plane perpendicular to the image planes of both cameras. For this study, the symbols used are defined in the list of symbols at the end of the paper; the camera sensor parameters of the stereovision system are ū, v̄, ū_e, ū_p and v̄_p, whereas the external parameters are b, θ, φ and ω. The horizontal coordinates of the epipole and image point are measured relative to the principal point x_0 of the image plane, and the image point varies as the position of the 3-space world point changes with respect to the camera frames.
To develop the model for 3D structure computation, there are four possible practical scenarios to consider, each of which is described by the position of the principal axes of both cameras relative to the epipolar plane defined by the baseline and the projection lines from the scene point to the optical centers.The four possible cases are shown in figure 3.
In practice, one of the four possible settings in figure 3 will be applicable in convergent stereo camera imaging. The relevant schematic is dictated either
• by the positions of the image points relative to the principal points on the respective image planes, or
• by the positions of the stereo principal axes relative to the geometric bounds of the reference epipolar plane.
Selecting the correct option among the four can therefore be automated using these criteria, as the sketch below illustrates. The implication is that the reconstruction model to be developed could be employed in active vision-based metrology in which the stereo imaging cameras can be systematically rotated about their vertical axes relative to each other. Such rotation would alter the relative positions of the principal axes, which are coplanar with the reference epipolar plane, thereby varying the stereo convergence angle and possibly extending the stereo field of view.
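The following sketch maps the stated criteria to the four configurations of figure 3 and the corresponding model equations (9), (11), (13) and (15) derived later in this section; the function is illustrative and equality cases are not handled.

```python
def select_model_case(u1, u2, up1, up2):
    """Map image-point/principal-point positions to the applicable configuration.

    Equality of an image point and a principal point is not handled here.
    """
    if up1 > u1 and up2 < u2:
        return 'figure 3(a): equation (9)'
    if up1 < u1 and up2 < u2:
        return 'figure 3(d): equation (11)'
    if up1 < u1 and up2 > u2:
        return 'figure 3(b): equation (13)'
    return 'figure 3(c): equation (15)'   # up1 > u1 and up2 > u2
```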
In the mathematical derivation that follows, given a known 3-space point X̆_w = (x_w, y_w, z_w) defined with respect to a world coordinate frame, a model is geometrically developed for the point X̆_c = (x_c, y_c, z_c) relative to the left camera coordinate frame, in terms of the point's retinal image plane coordinates and linear and angular parameters.

Both principal axes within epipolar plane
The development of the model commences with the geometric configuration represented in figure 3(a), which satisfies the conditions u_p1 > u_1 and u_p2 < u_2. From the trigonometry of the triangle formed by the baseline and the two projection lines, the angular quantities are expressed in terms of the image coordinates in equation (7). The 3D coordinates are then given by equation (8), each with sin θ in its denominator. Normalizing the image plane coordinates in equation (8) with respect to the focal lengths, such that ū = u/f and v̄ = v/f, equation (9) is obtained, with the auxiliary quantities defined in equation (10).

Left principal axis outside and right principal axis within epipolar plane
Similar analysis for the setting represented by figure 3(d), for which u_p1 < u_1 and u_p2 < u_2, yields equation (11), with the auxiliary quantities defined in equation (12). It should be noted in this case that the directions of the x- and y-axes of the left camera frame have been reversed, with the relative positions of all axes remaining the same. This ensures a right-handed coordinate frame is maintained and is responsible for the negative signs associated with the x- and y-coordinates.

Both principal axes outside epipolar plane
Consider the scenario where the principal axes of the cameras are outside of the reference epipolar plane, which is shown in figure 3(b); note that u_p1 < u_1 and u_p2 > u_2. To facilitate the mathematical analysis, the directions of the x- and y-axes of both frames have been changed while the relative positions of all axes remain the same, thereby maintaining right-handed coordinate frames. The law of cosines then leads to equation (13), with the auxiliary quantities defined in equation (14).

Left principal axis within and right principal axis outside epipolar plane
Again, similar analysis for the setting represented by figure 3(c), for which u_p1 > u_1 and u_p2 > u_2, yields equation (15), with the auxiliary quantities defined in equation (16). It can be observed that the three world spatial variables in the four sets of equations are expressible in terms of products of functions in which parameters from both sides of the stereo setup are interlinked. Though all three world coordinates are generally dependent on the baseline distance b, the stereo projection angle θ, the epipole ū_e2 and the image points ū_1 and ū_2, only the y-coordinate distinctively varies with v̄_1.
The developed analytic model is therefore represented by equations (9), (11), (13) and (15), each of which is applicable under certain conditions. The four equations are geometrically equivalent, and each is applicable provided the associated conditions are satisfied. Theoretically, the stereo projection angle can vary in the range 0 < θ < π. It is obvious that at θ = π (which corresponds to a setting where the 3-space point of interest is on the straight line that joins both camera centers) the coordinates would be numerically and geometrically undefined. The same applies for θ = 0, a scenario which can be practically approximated by a sufficiently short baseline distance with the 3-space point of interest considerably further away from the cameras. For very small values of θ, sin θ in the denominators of the model equations may simply be approximated by θ.
The input Jacobian matrix J_p is defined as the 3 × n matrix of first-order partial derivatives of the output coordinates with respect to the n input parameters, with (i, j) entry ∂q_i/∂p_j for q = (x_c, y_c, z_c), and the input covariance matrix Λ_p is given by the n × n matrix whose (i, j) entry is the covariance u(p_i, p_j), the diagonal entries being the variances u^2(p_i). By way of similar analysis, equations for J_p and Λ_p can be readily obtained for the three other cases of the model, represented by equation (11) (for which u_p1 < u_1 and u_p2 < u_2), equation (13) (for which u_p1 < u_1 and u_p2 > u_2) and equation (15) (for which u_p1 > u_1, u_p2 > u_2 and p = [β, θ, (ū_p1 − ū_1), (v̄_p1 − v̄_1)]).
In the foregoing presentation in this section, full covariance matrices Λ_p were used, indicating the covariances between all the input parameters. This is a departure from the partial covariance matrices, or diagonal approximations of covariance matrices, used by some researchers.
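Where closed-form partial derivatives are inconvenient, J_p can be approximated numerically; the diagonal approximation of Λ_p used for comparison later in the paper is also shown. This is a sketch assuming NumPy, with `model` standing for whichever of equations (9), (11), (13) or (15) applies.

```python
import numpy as np

def numerical_jacobian(model, p, eps=1e-6):
    """3xn Jacobian of model(p) -> (x, y, z) by central differences."""
    p = np.asarray(p, dtype=float)
    J = np.zeros((3, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (np.asarray(model(p + dp)) - np.asarray(model(p - dp))) / (2 * eps)
    return J

def diagonal_covariance(Lambda_p):
    """Keep only the input variances, zeroing the off-diagonal covariances."""
    return np.diag(np.diag(Lambda_p))
```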

Experimentation and results
The stereo rig used in this work is depicted in figure 4. It is composed of two identical Canon EOS 4000D cameras whose technical specifications are listed in table 2. Throughout the experiments, the cameras remained rigidly fixed relative to each other. The calibration object is mounted on a microstage with a movement range of 50 mm and a movement resolution of 0.01 mm.

Stereo rig calibration
The first step in the experimental procedure is camera calibration, which is required to extract the internal and external parameters of the cameras. These parameters may be extracted with well-known camera matrix decomposition methods (Hartley and Zisserman 2003). From the decomposition, the focal lengths, epipoles, principal image points and camera centers are obtained; the camera centers are used to find the baseline distance, and the convergence angle η is calculated from the extracted rotation matrices. This facilitates the determination of the image plane coordinates necessary for the implementation of the developed model. The calibration object, depicted in figure 5, has 96 well-defined corner coordinates (the dimension of each square being 20 mm by 20 mm). Each corner was measured on a Mitutoyo coordinate measuring machine (CMM). The object was initially positioned at the zero-millimetre mark of a micrometer stage, as figure 6 depicts, and a pair of images simultaneously captured for the calibration of the stereo rig.
Image points were detected by first performing line detection on the edges of the bright squares of the calibration object, using the Hough line transform from OpenCV. The line intersections were then calculated and taken as the corner points. This, to some extent, mitigates the effect of wear or damage on the corners of the squares. The contrast between the black background and the bright top surface ensures that only the points on the top surface of the squares are used.
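A hedged sketch of this corner-detection idea is given below, assuming OpenCV and NumPy; the Canny and Hough thresholds and the file name are illustrative, not the values used in this work.

```python
import cv2
import numpy as np

def line_intersection(l1, l2):
    """Intersect two Hough lines given in (rho, theta) form."""
    (r1, t1), (r2, t2) = l1, l2
    A = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    if abs(np.linalg.det(A)) < 1e-9:   # near-parallel lines: no stable intersection
        return None
    return np.linalg.solve(A, np.array([r1, r2]))

# 'calibration_left.png' and the thresholds below are illustrative only.
img = cv2.imread('calibration_left.png', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)                     # edges of the bright squares
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)  # (rho, theta) line candidates
corners = []
if lines is not None:
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            pt = line_intersection(lines[i][0], lines[j][0])
            if pt is not None:
                corners.append(pt)                  # candidate corner points
```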
A calibration algorithm from Hartley and Zisserman (2003) was used that simultaneously determines the distortion parameters according to their distortion model.

Theoretical considerations
The camera matrices P_1 and P_2 indicate that the stereo camera pair are finite projective cameras, since the left-hand 3 × 3 submatrices are characteristically non-singular and are each decomposable into an upper-triangular calibration matrix K and an orthogonal rotation matrix R as the product M = KR. The following theoretical concepts were considered during this experiment.
1. From the anatomy of the finite projective camera, generally represented in compact mathematical form as P = [M | p_4] (where M is the non-singular left-hand 3 × 3 submatrix and p_4 is the fourth column of the camera matrix P), the principal axis vector passing through the camera center and directed towards the front of the camera is given by v = det(M) m_3 (where m_3^T is the third row of M). With this information the convergence angle η of the stereo rig, the angle between the principal axes of the cameras, can be determined.
2. Theoretically, the principal point is determined as x_0 = M m_3. It may alternatively be extracted from the camera calibration matrix.
3. The baseline is the distance between C_1 and C_2, and since the stereo rig is mounted with finite projective cameras, C = (−M^{-1} p_4, 1)^T. Given that C represents the one-dimensional right null-space, or simply the right null-vector, of P, that is, PC = 0, C may alternatively be determined using the method of least squares.
4. The epipole on the right image plane is computed as e_2 = P_2 C_1. As e_2 represents the left null-vector of the fundamental matrix F, or the right null-vector of F^T, it can alternatively be determined from the equation e_2^T F = 0 or F^T e_2 = 0.
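The four constructions may be sketched as follows, assuming NumPy; the function names are illustrative.

```python
import numpy as np

def principal_axis(P):
    """Principal axis v = det(M) m3 of a finite projective camera P = [M | p4]."""
    M = P[:, :3]
    return np.linalg.det(M) * M[2]

def principal_point(P):
    """Principal point x0 = M m3, dehomogenized to pixel coordinates."""
    M = P[:, :3]
    x0 = M @ M[2]
    return x0[:2] / x0[2]

def camera_center(P):
    """Inhomogeneous camera center C = -M^{-1} p4."""
    M, p4 = P[:, :3], P[:, 3]
    return -np.linalg.solve(M, p4)

def right_epipole(P2, C1):
    """Epipole e2 = P2 C1, with the center C1 homogenized."""
    return P2 @ np.append(C1, 1.0)

def convergence_angle(P1, P2):
    """Angle eta between the two principal axes."""
    v1, v2 = principal_axis(P1), principal_axis(P2)
    cos_eta = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(np.clip(cos_eta, -1.0, 1.0))
```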

Experimental evaluation of developed model and its uncertainty
To evaluate the performance of the developed model and its associated uncertainty, some corner points on the calibration object were selected as candidate 3D scene points. The selected points were chosen because of their spatial spread across the field of view and depth of field of the stereo rig and their varying depths from the reference frame. Table 3 shows the five candidate points with the applicable model equations and conditions at every station. The model is evaluated by comparing the x-, y-, and z-coordinates (of the candidate points on the calibration object) predicted by the model with the triangulated values and the ground truth.
The calibration object has 96 well-defined calibration ground truth points. In order to increase the number of ground truth points, the object was subsequently moved to the 5-, 10-, 15-, 20-, 25-, 30-, 35-, 40-, 45- and 50-millimetre marks of the micrometer stage away from the calibrated stationary stereo camera pair, and new image pairs were simultaneously acquired at every station; a station corresponds to each of the stated millimetre marks of the micrometer stage. The calibration object was moved over 5-millimetre intervals to facilitate the study of the performance of the developed analytic model with distance. At each station, new ground truths and triangulated coordinates (for all the numbered corners on the calibration object) were obtained by the following steps:
1. After obtaining the camera matrices P_1 and P_2, the calibration object was moved to the 5-millimetre mark and triangulated coordinates obtained for all the defined corners using the DLT.
2. Defining the centroid c of the 96 corner points as c = (1/96) Σ_{i=1}^{96} (x_i, y_i, z_i), the centroids c_1 at the zero-millimetre mark and c_2 at the 5-millimetre mark were obtained using the defined ground truth GT_1 (the CMM-measured points) and the DLT-triangulated coordinates respectively.
3. The displacement vector is determined as d_1 = c_2 − c_1.
4. The ground truth for the calibration object at the 5-millimetre mark then becomes GT_2 = GT_1 + d_1.
Moving the object to the 10-millimetre mark, the foregoing steps are repeated to determine a new displacement vector d_2 and a new ground truth GT_3 = GT_2 + d_2. This cycle is repeated for the calibration object at the other stations of the micrometer stage, at intervals of 5 mm. For each displacement of the calibration object from the stereo rig, the calculated distance d moved is given as the norm of the displacement vector.
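A minimal sketch of this centroid-based ground truth propagation, assuming NumPy; the function name and interface are illustrative.

```python
import numpy as np

def next_ground_truth(GT_prev, pts_prev, pts_curr):
    """Shift the previous ground truth by the centroid displacement d = c2 - c1.

    GT_prev  : 96x3 ground truth at the previous station.
    pts_prev : 96x3 coordinates used for the previous centroid.
    pts_curr : 96x3 DLT-triangulated coordinates at the current station.
    """
    c1 = pts_prev.mean(axis=0)            # centroid at the previous station
    c2 = pts_curr.mean(axis=0)            # centroid at the current station
    d = c2 - c1                           # displacement vector
    return GT_prev + d, float(np.linalg.norm(d))   # new ground truth, distance moved
```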
For the study of stereovision measurement uncertainty, the calibration object pictured in figure 5 was initially positioned at the 20 mm mark of the micrometer stage scale, as figure 6 depicts, and a pair of images simultaneously captured for the calibration of the stereo rig. Moving the calibration object to the 25 mm mark (while the calibrated stereo rig remained stationary), 40 image pairs of the corner points were acquired. For the sake of clarity, the subsequent steps undertaken in the experimentation are described as follows:
1. The image points on both sensors were recorded and then undistorted using the model of Hartley and Zisserman (2003); the undistorted image points were used in all subsequent analysis. The angular parameters θ and β of the finite epipolar planes were calculated for each image pair acquisition. The x-, y-, and z-coordinates of the candidate world points for each image pair were determined using the applicable reconstruction model represented by equations (9), (11), (13) and (15), depending on the values of u_p1 and u_p2 relative to u_1 and u_2 respectively.
2. For each world point of interest, the input Jacobian matrix J_p and input covariance matrix Λ_p were computed to obtain the output covariance matrix Λ_q using equation (6), the law of propagation of error. For a particular world point of interest, J_p and Λ_p were determined using the appropriate equations, which are dictated by the values of u_p1 and u_p2 relative to u_1 and u_2 respectively; the expressions for J_p and Λ_p are stated in section 4. The values of the x-, y-, and z-coordinates and of the other parameters used in finding J_p are the averages of their respective values determined as described in the foregoing paragraph. It is important to state here that a full input covariance matrix Λ_p (accounting for the covariances between all the applicable system parameters) was determined for each world point using the data from the 40 image pairs. Finding Λ_q using the error propagation equation is, in the context of this work, regarded as a theoretical approach to calculating the measurement uncertainty of the 3-space coordinates.
3. Step 2 was repeated for each world point using a diagonal input covariance matrix Λ_p, obtained by setting the off-diagonal entries to zero. The essence was to study the effect on measurement uncertainty of ignoring the covariances between the input parameters.
4. For each of the 40 image pairs, triangulated coordinates for the world points of interest were obtained using the gold standard algorithm. With these triangulated values, an output covariance matrix Λ_q was determined using established statistical formulas for variance and covariance. Finding Λ_q using these formulas is, in the context of this work, regarded as an experimental approach to calculating the measurement uncertainty of the 3-space coordinates, as sketched below.
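A minimal sketch of the experimental computation of step 4, assuming NumPy: the 40 gold-standard triangulations of a world point form the sample from which Λ_q is estimated.

```python
import numpy as np

def experimental_covariance(X_samples):
    """X_samples: 40x3 array of gold-standard triangulations of one world point."""
    Lambda_q = np.cov(X_samples, rowvar=False)   # 3x3 sample covariance matrix
    return Lambda_q, np.sqrt(np.diag(Lambda_q))  # covariance and coordinate uncertainties
```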
The diagonal entries of the output covariance matrices Λ_q determined in steps 2, 3 and 4 represent the variances and thus a measure of the uncertainties of the spatial coordinates of the candidate world points. The value of the uncertainty is the positive square root of the variance.
By way of demonstrating that 40 image acquisitions for each corner point are satisfactory for the study of the uncertainty model presented in section 4, the convergence of the standard deviation of each of the parameters θ, β, (ū_1 − ū_p1), (v̄_1 − v̄_p1) and (ū_2 − ū_p2) is tested. Figure 7 shows the behavior of the curves for point 0. Except for the parameter (v̄_1 − v̄_p1), all indicate a satisfactory convergence of the standard deviations, which could be improved by taking a much larger number of images. Owing to practical limitations, however, this work settled for 40 image acquisitions.
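The convergence test can be sketched as a running sample standard deviation, assuming NumPy; the synthetic data below merely stand in for the 40 measured values.

```python
import numpy as np

def running_std(samples):
    """Sample standard deviation over the first k acquisitions, k = 2..N."""
    return [np.std(samples[:k], ddof=1) for k in range(2, len(samples) + 1)]

# Synthetic values standing in for the 40 measured values of theta (radians):
theta_samples = np.random.default_rng(0).normal(1.2, 0.01, 40)
print(running_std(theta_samples)[-5:])   # the tail should flatten as k approaches 40
```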

Results
The performance of the developed geometric model relative to the triangulated and ground truth results is shown in the graphs of figure 8 for the x-, y-, and z-coordinates of the candidate 3D points. For the sake of space, only point 0 is shown; the graphs for points 42, 60, 64 and 92 are quite similar. It can be readily seen that the 3-space coordinate values predicted by the model closely follow the triangulated values and the ground truth, with minimal or insignificant error, for all the candidate corner points.
In this paper the error for each coordinate is defined by considering the value dictated by the developed model relative to the triangulated value and to the ground truth result. For the triangulated value, the error may be expressed mathematically, for example for the y-coordinate, as δ_yDLT = y_model − y_DLT, and analogously for the x- and z-coordinates. The variation of the error with the displacement of the world points of interest from the stereo rig is shown in figure 9. For the x- and z-coordinates, the magnitude of the error for each candidate corner point on the calibration object is reasonably constant as the point moves further away from the reference camera frame. As expected, in terms of magnitude, corner point 92 (the nearest point to the left camera frame) has the least error while corner point 42 (the furthest point from the left camera frame) has the largest error for each displacement of the calibration object. The same can reasonably be said of the y-coordinates for some displacement points. While most of the error variations for the x- and z-coordinates follow predictably straight-line trends, the variations associated with the y-coordinates (bounded as −0.5 mm < δ_yDLT < 1 mm) are visibly irregular. This seemingly anomalous trend could be attributed to the observed deviations of the stereo setup from the fundamental geometric assumptions adopted prior to the mathematical derivation of the model. First, there was an offset between the vertical components of the cameras' principal points. Second, the vertical components of the principal point and the epipole of the left camera were out of horizontal alignment with each other. The pre-model assumptions are clearly stated in section 6, in which the limitations of the model are acknowledged.
Studying the model relative to the ground truth results, the error becomes δ_GT, the difference between the model-predicted coordinate and its ground truth value. Figure 10 shows the variation of this error with the displacement of the world points of interest from the stereo setup. It can be observed that the error variation is irregular, with a pattern of rises and falls for the three coordinates. Again, this seemingly unusual trend could be ascribed to the recorded deviations from the fundamental pre-model geometric conditions, such as the offset and misalignment stated in the previous paragraph. However, the error is reasonably bounded and can be said to vary within discernible limits (−1.5 mm < δ_GT < 2 mm) for any candidate world point.
The variation of the range error for a world point of interest relative to the reference camera frame is shown in figure 11. The graphs show seemingly undulating patterns; however, the corresponding regression lines indicate a general trend in which the error in the measured range increases with the distance of the world point from the stereo system.
Table 4 shows the model-based theoretical and triangulation-based experimental uncertainties for each of the five candidate world points under consideration. The uncertainties were extracted from the diagonal entries of the output covariance matrices determined as described in section 5.3.
Strong correlations can be seen between the theoretical and experimental values of the uncertainties, and the values are of the same order of magnitude for the x-, y-, and z-coordinates of all the candidate points. Under the theoretical approach using the law of uncertainty propagation, measurement uncertainties were first determined using full input covariance matrices Λ_p, which accounted for the covariances of all pairs of input parameters. The theoretical uncertainty values were then re-calculated for the world points using diagonal input covariance matrices Λ_p, obtained by setting the off-diagonal entries of the full input covariance matrices to zero. This was aimed at studying the effect of ignoring the covariances of the input parameters on the measurement uncertainty.
For each 3-space coordinate, the theoretical uncertainties (determined with full and with diagonal Λ_p) compare with each other as follows:
• The x-coordinates, except for point 60, yielded larger uncertainties when the input parameter covariances were ignored (although the values for point 92 did not change).
• For the y-coordinates, except for point 92, larger uncertainties were recorded when the input parameter covariances were ignored.
• The same can similarly be said of the z-coordinates, except for points 60 and 92.
On the balance of probabilities, therefore, it can be inferred that neglecting the covariances of the input parameters of the stereovision system, by using a diagonal input covariance matrix Λ_p in the theoretical method, gives slightly larger uncertainty values for the measured 3-space coordinates. The uncertainty model thus predicts smaller measurement uncertainties if all the system parameter covariances are utilized in implementing the law of error propagation. In most of the cases, the uncertainties obtained by the experimental approach (which used the gold standard triangulated coordinates as sampled data) are larger than the theoretical uncertainties (with either full or diagonal input covariance matrices) for the respective scene points and their 3-space coordinates. Taking the experimental uncertainty values as the baseline or reference, the theoretical uncertainty values closer to the experimental uncertainties for each of the 3-space coordinates are highlighted in gray in table 4. Out of the 15 comparisons, seven theoretical uncertainties (with full Λ_p) are closer to the experimental uncertainties while six theoretical uncertainties (with diagonal Λ_p) are closer. This implies that there is no absolute trend in which a particular set of theoretical values predominates in terms of proximity to the baseline values. In practical terms, and from a probability viewpoint, within the context of this study, it may be affirmed that it makes no significant difference whether full or diagonal input covariance matrices are used in the theoretical computation of measurement uncertainty.
As expected, the uncertainties for the x-, y-, and z-coordinates of point 42 have the largest values (for both the theoretical and experimental computations), since the point is the furthest from the reference coordinate frame of the stereo rig among all the candidate scene points under consideration. The further the scene point is from the stereo camera system, the larger the uncertainty in the measured value. The coverage factor of the uncertainties is 1, since they are equal to the standard deviations. Defining the error associated with each measurement as

ε = √(σ_x^2 + σ_y^2 + σ_z^2),    (21)

table 5 indicates the error values for each world point for the model-based theoretical and triangulation-based experimental computations of the output covariance matrices, the diagonal entries of which are the variances of the measured 3-space coordinates. The world points are listed in order of their distance from the origin of the reference coordinate frame.
It can be observed that the theoretical and experimental errors for each point are closely related and of the same order of magnitude. Except for point 92, it is noted that, in general, using the full input covariance matrix Λ_p in the theoretical approach to the determination of measurement error predicts relatively smaller error values compared to when the covariances of the input parameters are ignored. However, the results show that the theoretical error values are generally less than the experimental error values for the scene points. The theoretical values closer in magnitude to the experimental values are shown in gray.
As expected also, the error for point 42 is the largest (for both the theoretical and experimental computations), since the point is the furthest from the reference coordinate frame of the stereo camera rig among all the candidate scene points under consideration. The further the scene point is from the stereo camera system, the larger the error in the measured value. It should be noted that the conclusions drawn regarding tables 4 and 5 are predicated on the fact that only 40 image pairs were used in the experiment on stereovision measurement uncertainty.
It is important to acknowledge that the overall accuracy of the developed model can be affected by imperfections on the silver plates of the calibration object, which introduce error when capturing the pixel coordinates of the corner points.
Given that the uncertainties of the reference camera coordinates are embedded in Λ_q, generalized expressions for them may be obtained. From equation (6), the following may be written:

σ_x^2 = Σ_i Σ_j (∂x_c/∂p_i)(∂x_c/∂p_j) u(p_i, p_j),    (22)

with analogous expressions for σ_y^2 and σ_z^2. The expressions for the variances imply that there are correlations between every pair of the input parameters, such that the off-diagonal elements of Λ_p are non-zero. Using equation (22), graphical characteristics relating the accuracy of the model to the parameters of the stereo system can be obtained. The system parameters considered are the focal length f, the stereo projection angle θ, the angle between the baseline and the right projection line β, the normalized horizontal image coordinate on the left sensor (ū_1 − ū_p1), the normalized vertical image coordinate on the left sensor (v̄_1 − v̄_p1), and the normalized horizontal image coordinate on the right sensor (ū_2 − ū_p2). The accuracy-parameter relationships are shown in figure 12, which is representative of the candidate points on the calibration object. Since the focal length does not appear explicitly in the proposed model and in the uncertainty equations, the graph showing the variation of the accuracy of the model with focal length is obtained by applying the technique of distribution of error defined by equation (6) (which gives the definition of variance) to the developed analytic model, generally represented by equations (8) and (13), together with equation (21) (which defines the total measurement error). This method was adopted by Di Leo et al (2011b), Lin and Chen (2013), and Yang et al (2018), and may be extended to the angle and displacement parameters in the model.
Demonstration of the variation of measurement accuracy with focal length proceeds as follows. Generally, from equation (6), and ignoring the covariances between pairs of the input parameters that feature in the model, the variances of the measurements for the x-, y-, and z-coordinates may be expressed as sums of squared first-order sensitivities weighted by the input variances. Re-writing equations (11) and (13) with the normalized image coordinates expanded as u/f and v/f makes the focal lengths appear explicitly, from which the partial derivatives of the coordinates with respect to the focal lengths, and hence the variances, are obtained. Recalling equation (21), which states that ε = √(σ_x^2 + σ_y^2 + σ_z^2), the error-parameter curves can then be obtained for the left and right focal lengths. The non-linear characteristic for the variation in focal length (figure 12(a)) and the bathtub-shaped curves for the change in the angular parameters (figures 12(b) and (c)) are consistent with the graphical characteristics obtainable in the available literature (Yang et al 2018, Gao et al 2020), which attests to the validity of the proposed model. As the focal length increases, the accuracy increases asymptotically towards a near-zero error value. For the angle parameters, the error is a minimum at 90°. The accuracy is at its poorest as the stereo projection angle approaches 0° (corresponding to when the depth of the 3D scene point from the stereovision rig is infinitely large) and 180°; these, however, do not represent practical stereo settings. Scene points with θ and β values in the practical range of 60°-80° yield optimally low measurement errors; this result agrees with the findings of Liu et al (2021). In respect of the normalized image coordinates, the graphs (figures 12(d)-(f)) are equally non-linear, with the non-linearity stronger for the left camera sensor than for the right image plane. For both cameras, however, the error asymptotically collapses to a non-zero value as the normalized image point coordinate relative to the principal point increases.

Observed limitations of the developed model
The following observations were recorded in the performance of the developed geometric model.
1. Studying the effect of a change in the stereo baseline distance on the 3-space measurement accuracy and uncertainty for the vision-based system (as was done for the other system parameters using the law of error propagation) is not mathematically feasible with the developed epipole-featured model. This arises because the baseline parameter vanishes when the first-order derivative of the model is taken with respect to the baseline.
2. Although both cameras are identical, and efforts were made to ensure they are stationed on platforms that are roughly equal in height (there is a calculated height difference, based on the camera centers extracted from the decomposed camera matrices, of 0.67 mm between the camera centers), an offset of 28 pixels, or 0.12 mm, between the vertical components of the cameras' principal points was recorded (v_p1 = 1730.52 pixels and v_p2 = 1702.76 pixels). This could potentially affect the performance of the developed model which, from first principles, assumed that the principal points of both image planes are horizontally aligned with each other on the reference epipolar plane.
3. Again, the vertical components of the principal point (v_p1 = 1730.52 pixels) and the epipole (v_e1 = 1797.94 pixels) of the left camera were discovered to be out of horizontal alignment with each other, the opposite of the condition that the developed model equally assumed ab initio. This, too, could undermine the performance of the model in structure computation for the y-coordinate of any scene point of interest.
4. Finally, having assumed a skew factor s = 0 in the analytic model, the non-zero skew of about 27 pixels recorded on the left image sensor after calibration can equally influence the calculated coordinate values of a particular world point of interest.
Despite all these limitations, however, it can be stated that the model
1. gives representative results comparable to the optimal triangulation algorithm, and
2. is representative of many real stereovision systems, and can therefore be used to predict the performance of such systems.

Evaluation of the proposed analytic model and its uncertainty
Relative to the model demonstrated for structure computation in vision-based metrology by Lin and Chen (2013), Yang et al (2018), Hu et al (2020), Gao et al (2020), and Liu et al (2021), the analytic model developed in this paper is unique and compares as follows:
1. It takes care of the dynamic situation where the cameras can rotate about their vertical axes relative to each other, hence the four sets of equations that represent the four possible positions of the principal axes relative to the geometric bounds of the reference epipolar plane.
2. The proposed model brings the epipoles of the stereo setup into significance, as it incorporates the horizontal coordinate of the epipole of the right camera in the four equations.
3. Except for the baseline distance and the focal length, the parameters in terms of which the spatial coordinates are defined are all image plane coordinates. This is because mathematical expressions were derived for all the applicable angular parameters θ, β, φ, α, ω, thereby avoiding any propagable uncertainty or error associated with the measurement of angular variables.
Compared to the numerical method of reconstruction by Hartley and Zisserman (2003), the analytic model proposed in this paper has the benefit of reduced computational complexity, as it does not require knowledge of the system rotation matrix or Euler angles, or of the entries of the two 3 × 4 camera matrices, for 3D structure computation after calibration. To simplify the model, the intrinsic parameters are normalized with respect to the focal lengths of the cameras. The model is unique in the sense that the points on the image planes are measured relative to the principal points, and only normalized camera sensor coordinates are required for structure computation.
By way of further demonstrating the merits of the proposed reconstruction model, its computation time was compared to those of the DLT and gold standard triangulation algorithms of Hartley and Zisserman (2003), using the pyMultiCam module in a Python environment. For the comparison, point 0 on the calibration object was chosen. The proposed reconstruction model exhibited a computation time of 0.6 ms, compared to 6.2 ms and 9.9 ms recorded for the DLT and gold standard triangulation algorithms respectively.
In the study and comparison of the uncertainties of the coordinates predicted by the proposed model, the theoretical approach and the experimental procedure of Di Leo and Paolillo (2011a) were adopted. While their work used 20 image pairs, this paper made use of 40 image acquisitions. Just as there was a strong agreement between the theoretical and experimental results in their work, so there was a strong correlation between the theoretical and experimental uncertainties obtained in this work for the candidate world points.

Conclusions
An epipole-dependent analytic model has been developed, using a geometric approach, for the structure computation of a world point with a convergent stereovision setup. In terms of accuracy the model was found to be sufficiently reliable. Relative to the numerical results obtained by the gold standard triangulation approach, the error performance of the model for the x- and z-coordinates of the chosen experimental candidate corner points proved satisfactory, while the y-coordinates gave rise-and-fall trends. The errors (relative to the ground truth results) associated with the x-, y-, and z-coordinates were observed to follow similar rise-and-fall patterns within discernible bounds as the range of the world point from the reference coordinate frame increases. The validity of the proposed model is underscored by the graphical characteristics representing the variation of the accuracy of the model with focal length and stereo projection angle, which are consistent with what is obtainable in the open literature. In static mode, the model has the benefit of being simple and offering reduced computational complexity, as the image plane parameters are normalized with respect to the focal lengths of the cameras, and it does not require knowledge of rotation angles or of the entries of the two 3 × 4 camera matrices for structure computation after calibration. Again, it brings the epipoles of the stereo camera rig into significance. In terms of execution speed, the proposed reconstruction model exhibited a computation time of 0.6 ms, compared to 6.2 ms and 9.9 ms recorded for the DLT and gold standard triangulation algorithms respectively. The reconstruction model developed in this study could be employed in active vision-based metrology, in which the stereo imaging cameras are systematically rotated about their vertical axes relative to each other. Such rotation would alter the relative positions of the principal axes on the reference epipolar plane, thereby varying the stereo convergence angle. In this case, the model can be integrated with the online calibration process and the updating of the epipolar geometry involved in active stereo camera imaging. Using the model, an algorithm could be written to reliably determine the 3-space coordinates and accuracy of any scene point within the field of view and depth of field of a calibrated stereovision system. The model, though, is not without limitations, which are ascribable to the deviations of the stereo system from the fundamental geometric assumptions adopted prior to the development of the model.

Figure 1. Geometry of a typical convergent stereovision system. The parameter definitions are given in section 3 of this paper.

Figure 2. Geometric configuration of a convergent stereovision rig for the development and performance evaluation of the model.

Figure 3. Four possible positions of the principal axes relative to the epipolar plane: (a) both principal axes within; (b) both principal axes outside; (c) left principal axis within and right principal axis outside; (d) left principal axis outside and right principal axis within.

Figure 6. Calibration object at the zero-millimetre mark of the micrometer stage before the 5-millimetre interval movements away from the stereo camera pair.

Figure 7. Convergence of the standard deviations of the normalised input parameters.

Figure 8. Comparison of the developed model with the triangulated values and ground truth for (a) x-coordinates, (b) y-coordinates, (c) z-coordinates.

Figure 9. Variation of coordinate difference (model relative to triangulation) with world point displacement from the stereo rig.

Figure 10. Variation of coordinate difference (model relative to ground truth) with world point displacement from the stereo rig.

Figure 11. Variation of range error with distance from the reference camera frame.

Figure 12. Variation of the accuracy of the proposed model with (a) focal length f, (b) stereo projection angle θ, (c) angle between baseline and right projection line β, (d) normalized horizontal image coordinate on the left sensor (ū_1 − ū_p1), (e) normalized vertical image coordinate on the left sensor (v̄_1 − v̄_p1) and (f) normalized horizontal image coordinate on the right sensor (ū_2 − ū_p2).
List of symbols:
f  Focal length
e  Epipole
u  Horizontal coordinate of image point
u_e  Horizontal coordinate of epipole
u_p  Horizontal coordinate of principal point
ū  Normalized horizontal coordinate of image point
ū_e  Normalized horizontal coordinate of epipole
ū_p  Normalized horizontal coordinate of principal point
v  Vertical coordinate of image point
v_p  Vertical coordinate of principal point
v̄  Normalized vertical coordinate of image point
v̄_p  Normalized vertical coordinate of principal point
α  Angle between principal (or optical) axis and baseline
ψ  Angle between camera 1 projection line and baseline
β  Angle between camera 2 projection line and baseline
η  Angle between principal axes of cameras (termed the convergence angle)
θ  Angle between projection lines of a 3D scene point (termed the stereo projection angle)
π  Reference epipolar plane
φ  Vertical angle between projection line and reference epipolar plane (termed the vertical projection angle)
ω  Horizontal angle between projection line and principal (or optical) axis (termed the horizontal projection angle)

Table 3. Candidate points of interest on the calibration object for model evaluation and uncertainty evaluation.

Table 4. Uncertainties for the candidate points on the 3D object at the 25-millimetre micrometer stage station (the uncertainty at a point of interest is obtained after 40 image acquisitions).

Table 5. Errors for the candidate points on the 3D object at the 25-millimetre micrometer stage station (the error at a point of interest is obtained after 40 image acquisitions).