Theoretical accuracy assessment of model-based photogrammetric approach for pose estimation of cylindrical elements

Gorka Kortaberria; Eneko Gomez-Acedo; Jorge Molina; Alberto Tellaeche; Rikardo Minguez

doi:10.1088/1361-6501/ab0b7d

1. Introduction

1.1. The context and limitations

The recovery of 3D geometric information from 2D images is a fundamental problem in computer vision and photogrammetry. Recent reviews and advances regarding this problem are presented in Luhmann et al (2013, 2006, 2016), Crc (2015) and Heipke et al (2016). Most approaches that combine multiple perspectives and features are based on the collinearity equations and on solving a correspondence problem. Although these strategies are valid for most cases, they are sometimes not feasible. One of their main limitations is the preparation of the scene when the geometric elements to be measured are large (>5 m), far away (>20 m) or hardly accessible. Harsh environments, such as those where the elements to be measured are thermally emitting surfaces, are also challenging for traditional solutions because artificial targets cannot be placed on the surface. Occlusion in multi-view approaches is another limiting factor for target-based photogrammetry, where line of sight is not guaranteed. Moreover, measurements in solar plant environments suffer from poor object illumination and signalization problems, as well as from possible image recording difficulties due to air convection over hot surfaces. To tackle these drawbacks, model-based approaches are advisable because they avoid scene preparation and estimate the pose of the element of interest by means of geometric fitting. These kinds of solutions rely on high-quality detection of the contour points that define the element or elements to be measured. Provided that this detection is good, high accuracies for 3D element verification, pose estimation or tracking can be obtained when the camera network ensures suitable convergence of the image rays. One limitation is the need for approximate values of the element to be measured and for a previously solved camera network calibration (Fraser 2013, Fraser and Stamatopoulos 2014, Summan et al 2015, Luhmann et al 2016).
The method can be used to determine the spatial pose of several geometric elements, such as lines, cylinders and circles.

1.2. Background

Pose estimation of geometric elements is a common requirement in 2D and 3D vision-based applications, and different methods have been developed and studied to tackle this problem from different points of view. The methods found in the literature (see figure 1) include pose estimation from known points (Oberkampf et al 1996, Triggs 1999, Zhi and Tang 2002, Xu and Liu 2013), pose estimation using shape-based 3D matching (Osada et al 2001, Rosenhahn et al 2006, Teck et al 2010, Yang et al 2016), pose estimation using surface-based 3D matching (Rabbani and Van Den Heuvel 2005, Su and Bethel 2010, Liu et al 2013, Paláncz et al 2016, Figueiredo et al 2017) and pose estimation based on model-based approaches. Pose estimation is required in motion applications as well as for measuring, bin-picking and alignment tasks. The selection of the most suitable approach must consider multiple factors directly related to the required measuring specifications. In this research the model-based approach has been selected because the accuracy, large working distance, round surface form, harsh measuring environment and high number of measuring perspectives are demanding requirements that other approaches do not meet.

**Figure 1.** Review of cylinder pose estimation methods.

This paper presents not only a novel photogrammetric simulation approach, but also a theoretical simulation chain for designing a camera network that guarantees that the measuring accuracy of the employed approach remains within the application tolerance.

Previous research on both aspects is therefore reviewed to provide a general overview of the field, the approaches followed in each case and the results obtained.

To evaluate whether a photogrammetric procedure is fit for purpose, simulation-based approaches that permit model testing and reliable quantification of the output results are required. Usually known as the camera network design problem, this topic has been studied in depth following two main approaches: one uses mathematical models (Dunn 2007, Alsadik et al 2012, Dall'Asta et al 2015, El-hamrawy et al 2016, Tushev and Sukhovilov 2017) and the other uses synthetic data, following the 'design by simulation' concept (Olague and Mohr 2002, Becker et al 2011, Piatti and Lerma 2013).

Focusing on the second approach, which is similar to the simulation procedure described in this paper, Piatti and Lerma (2013) developed a virtual simulator that underpins the design of a photogrammetric measurement based on 3D scenes, Becker et al (2011) present free ray-tracing software to build a virtual close-range photogrammetric sensor and simulate 3D scenes, and Buffa et al (2016) present a simulation study for the dimensional characterization of an antenna that combines different tools.

Regarding simulation packages (software, libraries) for photogrammetric network design and optimization based on nominal or synthetic data (images or geometrical information), few options are available. The main ones are the combination of the Spatial Analyzer© and VSTARS© inspection tools, academic software such as Phox© (Luhmann 2016) for photogrammetric design and parametric mathematical understanding, photogrammetric libraries integrated in the computer vision Matlab© toolbox (Tushev and Sukhovilov 2017) and open-source libraries (MICMAC©, APERO©, SFM©, GRAPHOS©) for dense point cloud reconstruction.

The above-mentioned simulation alternatives and industrial photogrammetric solutions make it possible to measure not only the spatial position of artificial targets or fixtures, but also the location of some natural features such as holes, lines (Gruen and Li 1991), pipes (Veldhuis and Vosselman 1998, Zhang et al 2017) or spheres in 3D. However, to achieve this, these alternatives need to triangulate the features from multiple perspectives with determined incidence angles. Therefore, the accuracy is reduced in most cases and the correspondence problem must be solved. A common approach is to use epipolar geometry to enhance stereo image matching strategies.

The previously mentioned approaches do not consider the cases where the photogrammetric problem is solved by minimizing the distance between the image rays and the spatial geometric element to be fitted and measured. Instead, they estimate the pose of a geometric element by adjusting the 3D coordinates of the targets that define the object. Therefore, there is a lack of knowledge about ray-projection-based photogrammetric approaches of this kind. A general overview of these 3D tracking methods is given in Doignon (2007), where limitations and challenges are also mentioned.

In the literature, there are few references regarding photogrammetric approaches where the correspondence problem (Hilton 2005, Fraser et al 2010) is avoided. Most of the existing references are theoretical studies about the modelling and the problems derived from the geometrical determination of 3D elements such as lines, circles or cylinders. A survey of estimation approaches for these geometric elements is presented in Doignon (2007). However, compared to the other geometries, the literature regarding the pose estimation of cylindrical objects with constant radius is somewhat sparse.

Moreover, although other geometrical features have been studied in more detail, the references on cylinder pose estimation fall into two main classes, based on single-view or multi-view approaches. In neither case is the obtained relative accuracy studied for large-scale applications.

For both single-view approaches (Shiu and Huang 1991b, Ferri et al 1993, Puech et al 1997, Penman and Alwesh 2006, Doignon and De Mathelin 2007, Liu and Hu 2014) and multi-view ones (Houqin and Jianbo 2008, Becke 2015, Teney and Piater 2014, Becke and Schlegl 2015, Zhu et al 2015, Zhang et al 2017), the cylinder pose can be estimated with several approaches or with a combination of them. Depending on the image data taken into consideration, either only the cylinder orientation in three degrees of freedom (dof) or the five dof pose can be established employing the contour data corresponding to the cylinder's circular borders. Combining different contour data enables a more robust pose estimation that merges orientation and position results.

On the one hand, there are different approaches for monocular camera methods such as probabilistic ones (Hanekr et al 1999), basic or more complex models depending on the dof number for the cylinder's pose (Huang et al 1996, Doignon and De Mathelin 2007), models considering different image data from shape matching outputs (Shiu and Huang 1991a) or cases where the geometrical dimensions of the element are known a priori (Huang et al 1996, Puech et al 1997, Renaud et al 2005, Liu and Hu 2014).

Apart from this, 3D circle pose estimation is solved in Shiu and Huang (1991a) and Andresen and Yu (1994) with the same approach, where the orientation is also established by ellipse modelling and fitting.

On the other hand, there are multiple view approaches considering more complex and robust camera networks both for cylinder pose and 3D line detection.

For example, software based on a canonical representation for stereo or multiple views is presented in Navab and Appel (2006); preliminary theoretical modelling for pose estimation based on contour data and single- or multi-view approaches is given in Becke (2015); a semi-automatic linear feature extraction algorithm based on LSB-Snakes for estimating 3D elements from multiple views is described in Gruen and Li (1991); and the reconstruction of straight and curved pipes from digital images without corresponding points is addressed in Veldhuis and Vosselman (1998) and Zhang et al (2017). A least-squares method for locating a linear object using its multiple parallel projections has also been reported.

As a summary of this literature compilation and analysis, the following list indicates the main difficulties and drawbacks that must be studied in detail so as to overcome them:

In most cases, image noise is not taken into account in the simulation, although this is advisable for reliable model characterization.
The models are not applied for large-scale applications or case studies where the errors are amplified due to projection effects.
Measuring solution validation against validated sources or tools is missing.
Suitable feature point extraction is not usually available.
Tools for accurate synthetic data creation (geometrical or image data) are not available.

1.3. The application

The research aims to enable an accuracy and robustness simulation of a model-based photogrammetric approach for cylinder pose estimation when the element to be measured can hardly be measured with artificial targets (Knyaz 1998, Joon Ahn and Rauh 2001, Shortis et al 2004, Wijenayake et al 2014, Guo et al 2016). Our approach to robust cylinder pose estimation thus differs from traditional approaches in that it avoids the correspondence problem among images and multiple view perspectives.

The application is focused on thermal concentration applications (see figure 2), where closed-loop positioning of the moving receivers is required for high energy efficiency. Traditional photogrammetric approaches (Shortis and Johnston 1996, Pappa et al 2001, Pottler et al 2005, Shortis et al 2008, Stynes and Ihas 2012, King et al 2013, Hafez et al 2017) are not suitable because of the surface curvature and the impossibility of placing distributed targets on the element due to the high working temperatures and solar flux concentrations. Another disadvantage of traditional photogrammetric approaches is the high incidence angles that planar targets on a curved surface entail, which introduce inaccuracies in the target center measurement. In addition, the approach described in this article aims to develop more flexible photogrammetric solutions for 3D scenes than those offered by commercial solutions. Moreover, model-based methods are not integrated in this industrial software, which means that this modelling must be carried out and its performance tested in order to quantify the scope and design the most suitable measuring network for the real scene requirements.

1.4. Main objectives of the research

The objectives of the research are to develop, implement and validate a model-based photogrammetric simulation tool in order to assess the six dof positioning accuracy of a cylinder for a specific camera network and synthetic image data. It is not an optimization procedure but a test and accuracy assessment approach for designing this photogrammetric method. Therefore, the main error sources are studied, and the most suitable camera network is designed and established by means of the developed simulation tools. The results of the research will make it possible to quantify and determine the accuracy and suitability of this photogrammetric approach for large parts and working distances in harsh environments. Model testing will also reveal the limitations of this photogrammetric approach, making it possible to assess its pros and cons against traditional approaches.

2. Materials and methods

2.1. Description of the simulation method

The overall method comprises several tools: an implementation of a model (cost function) that enables the estimation of the five dof positioning of a cylinder based on contour points, and a Monte Carlo approach for uncertainty assessment. The rotation of the cylinder around its axis is not controlled by the model. The model is fed with image data points extracted from synthetic images. The camera network (extrinsic orientation), the 3D scene and the imaging parameters are established, and the synthetic images are generated for each spatial pose of the cylinder. For each cylinder pose, three images are generated, one for each of three camera views. These images contain grey values ranging from 0 to 255 and simulate a 3507 × 2480 pixel camera with a 6 mm principal distance lens. The pixel size is 2.8387 µm, which is the scaling factor used to convert pixel points to metric image points in mm. Afterwards, these images are processed with an image processing algorithm, and contour data points are obtained for each camera and pose with sub-pixel edge extraction methods. Finally, these data points, the camera network and an a priori geometry describing the 3D scene are imported into the photogrammetric model implemented in an engineering simulation environment. For each image combination and fixed camera network, the model estimates the real pose of the cylinder by minimizing the tangential distances between the light rays projected from the image planes and the 3D geometry.

To obtain more realistic output results, the model has been extended with a Monte Carlo simulation approach. Once the contour points are extracted from the images, these pixel points are perturbed with randomly distributed errors to evaluate their effect on the model's output. This process is repeated n times and the model output is stored for statistical distribution analysis. The overall simulation workflow is presented in figure 3.

**Figure 3.** Simulation consecutive stages and overall process workflow.
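
The Monte Carlo stage of this workflow can be sketched as follows. This is a minimal illustration and not the authors' implementation: the Gaussian noise model, the noise level `sigma_px`, the number of runs and the placeholder `estimate` callable (standing in for the photogrammetric model) are all assumptions.

```python
import random
import statistics


def monte_carlo(contour_px, estimate, sigma_px=0.1, n_runs=1000, seed=0):
    """Perturb contour points n_runs times and summarise the model output.

    contour_px : list of (row, col) pixel coordinates without noise
    estimate   : hypothetical callable mapping noisy points to one scalar
                 output (a stand-in for one pose parameter of the model)
    sigma_px   : assumed standard deviation of Gaussian image noise (pixels)
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        noisy = [(r + rng.gauss(0.0, sigma_px), c + rng.gauss(0.0, sigma_px))
                 for r, c in contour_px]
        outputs.append(estimate(noisy))
    return statistics.mean(outputs), statistics.stdev(outputs)


def mean_column(points):
    """Toy stand-in model: mean column coordinate of the contour points."""
    return sum(c for _, c in points) / len(points)


# Ideal vertical edge at column 1200 sampled on 100 rows (illustrative data).
edge = [(float(r), 1200.0) for r in range(100)]
mc_mean, mc_std = monte_carlo(edge, mean_column)
```

In a real run, `estimate` would be replaced by the pose solver and the returned spread would be collected for each pose parameter.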

2.2. Implementation and modelling

2.2.1. Synthetic data generation.

To generate synthetic images, the 3D scene is first created at a radius of 20 m from the camera positions (see figure 4). In this case, the 3D geometry comprises a cylinder of 8 m length and 250 mm diameter. This element is positioned and oriented with respect to a reference system to simulate different poses, for instance a vertical orientation and centered position (Pose 1) or fully oblique ones (Poses 2 and 3). After the poses are established, the camera views are added in the same reference system. These views (three cameras) are fixed and require several parameters to be specified, such as the position, the pointing orientation, the principal distance and the image sensor dimensions. The field of view depends on the principal distance, the lens aperture and the selected camera model, and therefore on the sensor dimensions. Real illumination aspects affecting image generation, as well as optical lens distortion, are not considered in this step because the design environment is not prepared for this purpose. However, imaging imperfections will be considered in the Monte Carlo simulation process as image noise error sources.
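
As a rough worked example of this imaging geometry, the object-space footprint of one pixel and the field of view can be estimated from the values quoted in this paper (2.8387 µm pixel size, 3507 × 2480 pixels, 6 mm principal distance, 20 m working distance). The sketch assumes an ideal pinhole camera without distortion, and the function names are illustrative only.

```python
import math

PIXEL_SIZE_MM = 2.8387e-3       # pixel size quoted in the text (2.8387 um)
PRINCIPAL_DISTANCE_MM = 6.0     # principal distance quoted in the text
WORKING_DISTANCE_MM = 20000.0   # 20 m camera-to-scene distance


def pixel_footprint_mm(pixel_mm, c_mm, distance_mm):
    """Object-space size of one pixel for an ideal pinhole camera."""
    return pixel_mm * distance_mm / c_mm


def field_of_view_deg(n_pixels, pixel_mm, c_mm):
    """Full field of view along a sensor side that holds n_pixels pixels."""
    half_side_mm = 0.5 * n_pixels * pixel_mm
    return 2.0 * math.degrees(math.atan(half_side_mm / c_mm))


footprint = pixel_footprint_mm(PIXEL_SIZE_MM, PRINCIPAL_DISTANCE_MM,
                               WORKING_DISTANCE_MM)
fov_h = field_of_view_deg(3507, PIXEL_SIZE_MM, PRINCIPAL_DISTANCE_MM)
fov_v = field_of_view_deg(2480, PIXEL_SIZE_MM, PRINCIPAL_DISTANCE_MM)
```

Under these assumptions one pixel covers roughly 9.5 mm at 20 m, so the 250 mm diameter cylinder spans only about 26 pixels across its width.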

In the real application, object signalization limitations and image distortion effects will appear and directly affect the image data quality and therefore the performance of the model. Image distortion is a systematic optical error source that can be modelled, characterized by offline or self-calibration of the intrinsic parameters (Fraser 2013, Luhmann et al 2016) and compensated, whereas object detection and contour point extraction are demanding in high solar flux concentration applications (Lee et al 2013, Ruelas et al 2017). To minimize imaging errors and optimize the image acquisition and processing steps, the cameras are expected to work in the infrared spectrum with dynamic range fitting, which highlights the contrast between the cylinder contour points and the background. In this manner, the shape-based matching process is simplified and optimized, as the images contain less but more useful information, which speeds up the image processing tasks.

Returning to the image generation tool, once the geometric parameters are set (see figure 5), different cameras can be selected and the synthetic images are rendered (full grey level) and generated.

2.2.2. Image data processing.

The employed image processing approach is based on shape matching methods that make it possible to find and locate the objects of interest in an image. To this end, a suitable image pattern is required to define objects that are represented and recognized by their shape. There are multiple ways to determine or describe the shape of an object. In this paper, the shape is extracted by selecting all points whose gradient exceeds a certain threshold. Typically, these points correspond to the contours of the object. The main image processing steps for the previously generated synthetic images are described in the following workflow:

1. Image importation.
2. Image pre-processing: a binarization with a suitable threshold is applied to the greyscale image to obtain a black and white image.
3. Shape-based matching (Harun and Sulaiman 2011): this is the most challenging phase of the overall image processing procedure because the following tasks depend on its output. The shape-based matching algorithm tries to find the corresponding shape of an element by comparing the image data against a trained pattern. It does not use the grey values of pixels and their neighborhood as a template but describes the model by the shapes of contours. This process is divided into two main steps. First, the shape-based pattern is defined in a sample image and trained so that the image pattern becomes as robust as possible against effects such as neighborhood points, contrast, noise, occlusions or even perspective changes (rotation, scale variation, etc). Second, this trained pattern is applied as an operator to an image where the object to be detected appears, and the object and its region of interest (ROI) are determined accordingly.
4. Edge data point extraction: once the ROI of the object is restricted, edge extraction operators are applied to define the contour points of the shape (see figure 6). To speed up the sub-pixel edge extraction, it is advisable to apply it only to a reduced ROI. As accuracy is required for the vision-based application, sub-pixel edge extraction operators are used (see figure 5). The main steps for image data extraction are the creation of sub-pixel contours by means of edge operators such as threshold_sub_pix, and the selection of relevant points, where some of the segments are deleted and others are combined to define the edge of interest. Usually a primitive fitting step follows these steps, but in this case it is not necessary.
5. File creation for each image with contour points: finally, an ASCII file is created for each image in which the contour points are described by rows and columns in pixel units. These points do not include image noise or imaging parameter uncertainties.
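
To illustrate the idea behind the sub-pixel edge extraction in step 4, the sketch below locates the threshold crossings along a single grey-value profile by linear interpolation. It is a simplified stand-in and not the operator used by the authors; the function name, the threshold value and the sample profile are assumptions.

```python
def subpixel_crossings(profile, threshold):
    """Return sub-pixel positions where a 1D grey-value profile crosses threshold."""
    crossings = []
    for k in range(len(profile) - 1):
        g0, g1 = profile[k], profile[k + 1]
        # A crossing lies between samples k and k + 1 when they straddle
        # the threshold; its position is found by linear interpolation.
        if (g0 - threshold) * (g1 - threshold) < 0:
            crossings.append(k + (threshold - g0) / (g1 - g0))
    return crossings


# Hypothetical image row: dark background, bright cylinder, dark background.
row = [10, 10, 12, 90, 240, 250, 248, 130, 20, 10]
edges = subpixel_crossings(row, threshold=128)
```

For this row the two contour positions fall between pixels 3 and 4 and between pixels 7 and 8, which is the kind of fractional coordinate that would then be written to the contour point file.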

**Figure 6.** Sub-pixel edge location (red points) and line fitting (in green).

2.2.3. Modelling.

The core of this research is a photogrammetric model based on light ray projection from the image planes to 3D space. Each image point and the projection center of its camera define a light ray that is tangent to the cylinder whose pose is to be found. Establishing the distances between the light rays and the geometric element and minimizing them makes it possible to estimate the element's parameters.

The proposed methodology is not based on traditional approaches that employ artificial targets to measure a specific geometry. Usually, the 3D positions of these targets are obtained and a least-squares algorithm is then applied to estimate the element that best fits these 3D points. This common approach requires the correspondence of the points to be ensured so that they can be triangulated, but this requirement cannot always be met. When the element to be measured is, for example, a cylinder, this approach is not the most suitable one, and another approach is necessary to overcome this limitation. The developed model for estimating the five dof positioning of a cylinder is an alternative approach based on the best-fit adjustment of a cylinder to measured 2D contour points in multiple images (see figure 7).

**Figure 7.** Edge method for cylinder pose estimation. (a) Multi-view 3D representation of the model (b) projection model from image plane to 3D space.

On the one hand, the inputs that feed this model are the contour points of each image, a camera network definition around those points, the image parameters (principal distance and image scaling factor), the image dimensions (pixel number) and an a priori cylinder pose (position, orientation) and dimensions (radius). On the other hand, the output values are the real position and orientation (pose) of the cylinder relative to a fixed coordinate system. The pose of the cylinder is defined by two points, P1 and P2, on the axis (see figure 10) to make the simulation results easier to interpret.

The model's performance is studied by means of inverse problem approaches that minimize the tangential distances between the light rays and the cylinder to be calculated. These rays are created for each image from each image point (Z = 0) and the projection center (common to all points of an image). The light rays are fitted to 3D lines defined in a common reference system, shown in figure 3. To estimate the tangential distances for each ray and image, a 3D intersection function is employed. This function estimates the distance between each image ray (3D line) and the cylinder axis, and from it the tangential distance (see figure 8). The cylinder's pose is obtained by iteratively minimizing these distances.

**Figure 8.** Photogrammetric model representation for several cylinder poses (in blue) and light rays (in yellow) pointing towards the primitive. (a) Oblique pose. (b) Vertical pose.

In the following lines, a detailed mathematical description of the model is presented and the data flow is explained.

The starting point for the model is the image data of the contour points extracted from the synthetic images. For a specific cylinder pose, three files with image data are imported into the developed model. Each file corresponds to a camera and to the extrinsic orientation of this camera with respect to the absolute reference system. In addition, approximate values of the cylinder pose are required as input data.

These data in pixels (rc) are transformed and scaled (m) to metric data (mm), taking into account the horizontal and vertical dimensions of the image (dimHor and dimVert) and a rotation (α = 90°) between coordinate systems. The resulting data xy are in camera local coordinates. The applied plane transformation is

$$x = m \cdot \left( R \cdot rc + T \right) \tag{1}$$

$$\begin{bmatrix} x \\ y \end{bmatrix} = m \cdot \left( \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \cdot \begin{bmatrix} r \\ c \end{bmatrix} + \begin{bmatrix} \mathrm{dimHor}/2 \\ \mathrm{dimVert}/2 \end{bmatrix} \right). \tag{2}$$
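
A minimal sketch of the plane transformation in equation (2) is given below. It applies the equation exactly as printed, including the sign of the translation term, which depends on the image coordinate convention and is not verified here. The function name and the sample pixel are illustrative assumptions; the scaling factor and image dimensions are the values quoted in this paper.

```python
import math


def pixel_to_metric(r, c, m, dim_hor, dim_vert, alpha_deg=90.0):
    """Equation (2) as printed: [x, y] = m * (R(alpha) [r, c] + [dimHor/2, dimVert/2])."""
    a = math.radians(alpha_deg)
    x = m * (math.cos(a) * r - math.sin(a) * c + dim_hor / 2.0)
    y = m * (math.sin(a) * r + math.cos(a) * c + dim_vert / 2.0)
    return x, y


M_MM_PER_PX = 2.8387e-3  # scaling factor quoted in the text (pixel size in mm)

# Hypothetical contour point at row 100, column 200.
x_mm, y_mm = pixel_to_metric(r=100.0, c=200.0, m=M_MM_PER_PX,
                             dim_hor=3507, dim_vert=2480)
```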

Once each contour point has been transformed for each image plane, a third coordinate with the value of the focal distance (f) is added to xy. Thus, x'y'z' coordinates are available in local coordinates for each image point.

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} x \\ y \\ f \end{bmatrix}. \tag{3}$$

After this transformation, another spatial transformation is required for each point to convert the local x'y'z' coordinates to spatial XYZ coordinates in the main reference system. The parameters R (rotation matrix) and X₀ (translation vector) define the extrinsic orientation and translation of the image plane (projection center) with respect to the absolute reference system. These parameters are assumed to be known from a previous calibration step.

$$X = X_0 + R^{-1} \cdot x' \tag{4}$$

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix} + \begin{bmatrix} r_{11} & r_{21} & r_{31} \\ r_{12} & r_{22} & r_{32} \\ r_{13} & r_{23} & r_{33} \end{bmatrix} \cdot \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}. \tag{5}$$
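
Equations (3) to (5) can be sketched with plain lists as follows. The sketch uses the fact that the inverse of a rotation matrix is its transpose, which is how equation (5) is written. The helper names and the example rotation (90° about the Z axis) are illustrative assumptions.

```python
def transpose(mat):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*mat)]


def mat_vec(mat, vec):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def image_point_to_world(x, y, f, rotation, x0):
    """Equations (3)-(5): X = X0 + R^T [x, y, f]."""
    local = [x, y, f]                              # equation (3)
    rotated = mat_vec(transpose(rotation), local)  # R^-1 = R^T for a rotation
    return [p + q for p, q in zip(x0, rotated)]    # equation (5)


# Hypothetical extrinsic orientation: rotation of 90 degrees about Z.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
X0 = [10.0, 20.0, 30.0]
world_point = image_point_to_world(1.0, 2.0, 6.0, R, X0)
```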

With each transformed image point (X_iY_iZ_i) and the projection center point (X₀Y₀Z₀) of its corresponding camera, 3D lines are created (see figure 6). A straight line (X_li) between each transformed image point P_i (X_i, Y_i, Z_i) and P₀ (X₀, Y₀, Z₀) in parametric form is defined by:

$$X_{li} = X_i + t \cdot \left( X_0 - X_i \right) \tag{6}$$

$$\begin{bmatrix} X_{li} \\ Y_{li} \\ Z_{li} \end{bmatrix} = \begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix} + t \cdot \begin{bmatrix} X_0 - X_i \\ Y_0 - Y_i \\ Z_0 - Z_i \end{bmatrix} = \begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix} + t \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix}. \tag{7}$$

Here P_i (X_i, Y_i, Z_i) is any image point corresponding to the line and the direction cosines are

$$\frac{X_0 - X_i}{d} = a \tag{8}$$

$$\frac{Y_0 - Y_i}{d} = b \tag{9}$$

$$\frac{Z_0 - Z_i}{d} = c, \tag{10}$$

$$\text{where } d = \sqrt{\left( X_0 - X_i \right)^2 + \left( Y_0 - Y_i \right)^2 + \left( Z_0 - Z_i \right)^2}. \tag{11}$$
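
The ray construction of equations (6) to (11) can be sketched as below. The function names and the sample points are illustrative assumptions.

```python
import math


def ray_direction_cosines(p_i, p_0):
    """Direction cosines (a, b, c) of the ray from image point P_i to P_0."""
    d = math.dist(p_0, p_i)  # equation (11)
    if d == 0.0:
        raise ValueError("image point and projection center coincide")
    return tuple((q - p) / d for p, q in zip(p_i, p_0))  # equations (8)-(10)


def point_on_ray(p_i, p_0, t):
    """Point of the parametric line X_li = X_i + t (X_0 - X_i), equation (6)."""
    return tuple(p + t * (q - p) for p, q in zip(p_i, p_0))


# Hypothetical transformed image point and projection center.
P_i = (0.0, 0.0, 0.0)
P_0 = (3.0, 4.0, 12.0)
cosines = ray_direction_cosines(P_i, P_0)
end_point = point_on_ray(P_i, P_0, 1.0)
```

By construction the squared direction cosines sum to one, and t = 1 returns the projection center.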

The next step is to estimate the intersection point between each line (X_li) and the approximate axis of the cylinder defined in the initialization. Two lines in 3D space intersect only if they lie in a common plane. Otherwise, the lines are skew and the shortest distance between them is determined. In that case the intersection point is defined as the midpoint of the segment that is perpendicular to both lines and whose length is this shortest distance.
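
A minimal sketch of this pseudo-intersection of two skew lines is shown below. It uses the standard closest-point solution for two lines and is not taken from the authors' code; the function names, the tolerance and the example lines are assumptions.

```python
import math


def dot(u, v):
    """Scalar product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def closest_points(p1, d1, p2, d2, eps=1e-12):
    """Closest points of lines p1 + s*d1 and p2 + t*d2, their midpoint and gap."""
    w = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # zero when the lines are parallel
    if denom < eps:
        raise ValueError("lines are parallel, no unique closest points")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [p + s * k for p, k in zip(p1, d1)]
    q2 = [p + t * k for p, k in zip(p2, d2)]
    midpoint = [(u + v) / 2.0 for u, v in zip(q1, q2)]
    return q1, q2, midpoint, math.dist(q1, q2)


# Hypothetical skew lines: the X axis and a line parallel to Y at height 5.
q_ray, q_axis, mid, gap = closest_points((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                                         (0.0, -3.0, 5.0), (0.0, 1.0, 0.0))
```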

With each estimated intersection point, a distance d_i between this point and the cylinder axis is determined. These distance values (cost-function) for all image points are afterwards minimized to adjust the parameters of the cylinder by means of iterative Best-fit methods.

The distance d_i between the two spatial lines defined by each transformed point P_i (X_i, Y_i, Z_i) and a characteristic point P_C (X_c, Y_c, Z_c) of the cylinder axis, with their respective direction cosines n_i (a_i, b_i, c_i) and n_c (a_c, b_c, c_c), is determined by

$$d_i = \frac{\pm \begin{vmatrix} X_i - X_c & Y_i - Y_c & Z_i - Z_c \\ a_i & b_i & c_i \\ a_c & b_c & c_c \end{vmatrix}}{\sqrt{a^2 + b^2 + c^2}} \tag{12}$$

where

$$a = \begin{vmatrix} a_i & b_i \\ a_c & b_c \end{vmatrix} \quad b = \begin{vmatrix} b_i & c_i \\ b_c & c_c \end{vmatrix} \quad c = \begin{vmatrix} c_i & a_i \\ c_c & a_c \end{vmatrix}. \tag{13}$$
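
Equations (12) and (13) can be sketched as follows, using the fact that the 3 × 3 determinant equals the scalar triple product (P_i − P_C) · (n_i × n_c) and that the denominator is the norm of n_i × n_c. The sketch returns the unsigned distance; the function names, the tolerance and the example lines are assumptions.

```python
import math


def cross(u, v):
    """Vector product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def line_to_line_distance(p_i, n_i, p_c, n_c, eps=1e-12):
    """Unsigned distance between a ray (p_i, n_i) and the axis (p_c, n_c)."""
    normal = cross(n_i, n_c)                   # components as in equation (13)
    norm = math.sqrt(dot(normal, normal))
    if norm < eps:
        raise ValueError("ray is parallel to the cylinder axis")
    diff = [a - b for a, b in zip(p_i, p_c)]
    return abs(dot(diff, normal)) / norm       # equation (12)


# Hypothetical ray along X through the origin, axis along Y through (0, -3, 5).
d_i = line_to_line_distance((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                            (0.0, -3.0, 5.0), (0.0, 1.0, 0.0))
```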

The known cylinder radius (r) is subtracted from each estimated radial distance. The residual distance values between the image rays (3D lines) and the cylinder axis for all image points (i) and cameras (j) are minimized according to:

$$\sum_j \sum_i \left( d_{ij} - r \right)^2 = \min. \tag{14}$$
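
The objective can be sketched as below. It reads the text above as a sum of squared residuals d_ij − r, since the radius is said to be subtracted from each radial distance; this reading, the function name and the sample distances are assumptions.

```python
def cylinder_cost(distances_by_camera, radius):
    """Sum over cameras j and image points i of (d_ij - r)^2."""
    return sum((d - radius) ** 2
               for camera_distances in distances_by_camera
               for d in camera_distances)


# Hypothetical ray-to-axis distances (m) for two cameras, cylinder radius 0.125 m.
distances = [[0.125, 0.126], [0.124]]
cost = cylinder_cost(distances, radius=0.125)
```

A perfectly tangent ray contributes zero, so the cost vanishes only when every ray grazes the cylinder surface.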

In order to solve this minimization problem, a linearization of the cost function is required for each parameter of the cylinder's model and each image point data which enables to construct the jacobian matrix (Kosmopoulos 2011, Afzal et al 2016, Hansen and Sutherland 2018) and estimate the corrections of model's parameters in each iteration. The convergence for the solver is obtained when the correction values (ΔP_c and Δn_c) of the element's geometric parameters are below a suitable threshold value.

The linearized equations take the form

$\begin{align} \displaystyle & \frac{\partial d_{ij}}{\partial X_{c}}\cdot \Delta X_{c}+\frac{\partial d_{ij}}{\partial Y_{c}}\cdot \Delta Y_{c}+\frac{\partial d_{ij}}{\partial Z_{c}}\cdot \Delta Z_{c} \\ & \quad +\frac{\partial d_{ij}}{\partial a_{c}}\cdot \Delta a_{c}+\frac{\partial d_{ij}}{\partial b_{c}}\cdot \Delta b_{c}+\frac{\partial d_{ij}}{\partial c_{c}}\cdot \Delta c_{c}=-\left(d_{ij}-r\right). \end{align} \tag{15}$

The iterative method employed for the minimization is not described in detail, as it is a common topic in the literature on inverse problems (Gao et al 2016, Nowak 2017, Ramm 2018) and is beyond the scope of this research.
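
Although the solver itself is not detailed, one row of the linearized system in equation (15) can be illustrated as follows. This is a sketch under stated assumptions, not the authors' code: the partial derivatives are approximated by central differences instead of analytic expressions, the residual is taken as the radial distance minus the known radius, and the names `axis_distance` and `linearized_row` are hypothetical.

```python
import math

def axis_distance(p_i, n_i, p_c, n_c):
    """Unsigned distance between an image ray and the cylinder axis."""
    d = [p_i[k] - p_c[k] for k in range(3)]
    # Cross product n_i x n_c (its components are the minors of equation (13)).
    cx = n_i[1] * n_c[2] - n_i[2] * n_c[1]
    cy = n_i[2] * n_c[0] - n_i[0] * n_c[2]
    cz = n_i[0] * n_c[1] - n_i[1] * n_c[0]
    norm = math.sqrt(cx * cx + cy * cy + cz * cz)
    return abs(d[0] * cx + d[1] * cy + d[2] * cz) / norm

def linearized_row(p_i, n_i, p_c, n_c, r, h=1e-6):
    """Return (jacobian_row, rhs) for one image ray.

    jacobian_row holds central-difference estimates of the partial
    derivatives of d with respect to (X_c, Y_c, Z_c, a_c, b_c, c_c);
    rhs is the negative residual -(d - r).
    """
    params = list(p_c) + list(n_c)
    row = []
    for k in range(6):
        plus, minus = params[:], params[:]
        plus[k] += h
        minus[k] -= h
        d_plus = axis_distance(p_i, n_i, plus[:3], plus[3:])
        d_minus = axis_distance(p_i, n_i, minus[:3], minus[3:])
        row.append((d_plus - d_minus) / (2.0 * h))
    rhs = -(axis_distance(p_i, n_i, p_c, n_c) - r)
    return row, rhs

# Ray along X through the origin, axis along Y through (0, 0, 5), radius 4.
row, rhs = linearized_row((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                          (0.0, 0.0, 5.0), (0.0, 1.0, 0.0), r=4.0)
```

Stacking one such row per image point and camera gives the Jacobian matrix. Note that the six parameters are not independent (the axis point can slide along the axis and the direction vector can be rescaled without changing the line), so a practical solver would need a constraint or some form of regularization, which this sketch does not include.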

2.3. Validation and testing

Validating the implemented model is important because it ensures that the results obtained are accurate and meaningful, and therefore that the conclusions drawn from them are trustworthy. For this purpose, a validated trigonometric approach was employed to check the intermediate data and results of the developed model. Once the model was approved, further tests were carried out to understand its scope concerning error sources and possible deviations between the theoretical model and the real scene. Moreover, a Monte Carlo simulation approach was established to estimate cylinder pose uncertainties and determine the model's output distribution.

2.3.1. Validation of the developed model.

The implementation of the developed model has been checked against an inspection tool called Spatial Analyzer© (hereafter SA) using the same cylinder pose and camera network definition. SA is a 3D inspection software package that makes it possible to reproduce the described photogrammetric approaches (see figure 9) by pure trigonometry, which permits the 2D data validation and accuracy comparison of section 3.1 without the image generation and data processing steps. To this end, the SA interface is used to replicate the developed model in terms of camera network, cylinder pose definition and image ray composition from the image planes towards the tangential 3D points on the cylinder. All data are referenced to the main world coordinate system.

**Figure 9.** Definition of validation 3D pose and reference data processing with SA tool.

The procedure and workflow employed for 3D scene definition and reference data extraction are as follows:

1. A cylinder is placed in a specific pose (position and orientation).
2. Local coordinate systems are defined corresponding to the external orientation of each camera.
3. Image planes are created in these coordinate systems.
4. A projection center is defined for each image plane.
5. A plane is estimated that is tangential to the cylinder surface and passes through each projection center (3×).
6. Tangential points on the cylinder surface are estimated by projecting the cylinder axis onto the previously created tangential planes.
7. 3D lines are created from each point and its corresponding projection center.
8. The intersection of these lines with the image planes defines the image points.

Thus, applying this methodology, reference data are obtained against which the implemented model can be checked. These data include 3D tangential points and image points in both the local and global coordinate systems. The scaling factor employed to convert pixel points into image coordinates (metric values) is 2.8387.
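
The geometric core of the workflow (tangent planes through a projection center, their contact points on the cylinder, and the intersection of the resulting rays with an image plane) can be sketched in a few lines. This is an illustrative reconstruction under assumptions, not an SA script: the function names are hypothetical, the contact points are computed in closed form in the cross-section through the foot of the projection center on the axis, and any other point on a contact line is obtained by shifting along the axis direction.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _unit(a):
    length = math.sqrt(_dot(a, a))
    return [x / length for x in a]

def tangent_points(axis_point, axis_dir, radius, proj_center):
    """Contact points of the two planes through proj_center tangent to the cylinder."""
    n = _unit(axis_dir)
    # Foot of the projection center on the cylinder axis.
    t = _dot(_sub(proj_center, axis_point), n)
    foot = [axis_point[k] + t * n[k] for k in range(3)]
    radial = _sub(proj_center, foot)
    dist = math.sqrt(_dot(radial, radial))
    if dist <= radius:
        raise ValueError("projection center lies on or inside the cylinder")
    u = [x / dist for x in radial]   # unit vector from axis towards the camera
    v = _cross(n, u)                 # completes the cross-section basis
    theta = math.acos(radius / dist) # angle between u and each contact point
    return [[foot[k] + radius * (math.cos(theta) * u[k] + s * math.sin(theta) * v[k])
             for k in range(3)] for s in (1.0, -1.0)]

def ray_plane_intersection(proj_center, point, plane_point, plane_normal):
    """Intersect the line through proj_center and point with a plane."""
    direction = _sub(point, proj_center)
    denom = _dot(direction, plane_normal)
    if denom == 0.0:
        raise ValueError("ray is parallel to the image plane")
    s = _dot(_sub(plane_point, proj_center), plane_normal) / denom
    return [proj_center[k] + s * direction[k] for k in range(3)]

# Unit-radius cylinder along Z, projection center at (2, 0, 0), plane x = 1.5.
pts = tangent_points([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], 1.0, [2.0, 0.0, 0.0])
img = ray_plane_intersection([2.0, 0.0, 0.0], pts[0], [1.5, 0.0, 0.0], [1.0, 0.0, 0.0])
```

In this toy configuration the plane x = 1.5 merely stands in for an image plane; in the described workflow each image plane is defined in the local coordinate system of its camera.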

2.3.2. Test for validation and comparison of results.

The following sections describe the tests and the targeted results for each test, considering several aspects and parameters of the model. A symmetric camera network is studied, which guarantees data redundancy and proper visualization of the cylinder from different points of view. Asymmetric camera networks are not included in this paper, although asymmetry does appear in the distribution of 3D tangential points around the cylinder. First, image data points obtained from synthetic images are compared with SA data. Then, the accuracy of the developed model is analyzed based on image data and its inaccuracies, taking theoretically known pose values as a priori data. Finally, the model's robustness is tested with a range of deviations in the a priori pose data to understand how strongly the model depends on them.

2.3.2.1. Comparison of synthetic image data and geometric data.

Based on reference data obtained from the SA tool, a comparison among data types was carried out to verify and understand the differences between synthetic transformed image data (see equation (3)) applied to photogrammetric models and pure trigonometric precise data. In this manner, the effect of image to metric data transformation can be assessed and quantified.

This comparison can be established at two different stages of the model, each related to a different data type. For instance, 2D image points can be compared with their corresponding 3D tangential points, but in this paper only 2D data are compared. The comparison is both qualitative and quantitative and covers image data (see equation (2)) obtained with several cameras in their local coordinates (2D metric data). In the following section, 3D data resulting from the geometrical fitting are assessed to analyze their accuracy.

2.3.2.2. Accuracy of cylinder pose estimation model.

After comparing the developed model against the reference 2D data and validating its performance, the accuracy of the model parameters is assessed. This analysis is carried out on three poses of the cylinder for the different case studies, together with their respective synthetic data obtained from the procedure described in the process workflow (see figure 2). The aim of this test is to estimate pointing errors of the model by comparing the estimated cylinder parameters with the nominal values. Instead of comparing the obtained position and orientation values with their nominals, two extreme points (P₁, P₂) of the cylinder (see figure 10) have been defined for each pose simulation case, which makes the results easier to interpret. Considering that the extrinsic orientation of the camera network is precisely known and fixed, residuals are taken as model accuracy indicators.

**Figure 10.** Cylinder pose definition based on two spatial points (P1, P2).

Apart from mean positioning errors between estimated and nominal cylinder poses, XYZ coordinate uncertainties have been calculated for each cylinder point (P1 and P2) in order to understand the reliability of these results. This has been implemented by means of a Monte Carlo simulation, which allows image data and principal distance variations to be added to the model. In this manner, the theoretical data are brought closer to a realistic scenario. The simulation parameter values employed are the following: image noise of 0.5 pixels and principal distance uncertainty of ±10 pixels. The convergence threshold for this uncertainty assessment has been set to 1 mm (average of residual tangential distances).
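
The structure of such a Monte Carlo run can be sketched as follows. This is only an outline under explicit assumptions: the image noise is taken as zero-mean Gaussian with a standard deviation of 0.5 pixels, the principal distance is perturbed uniformly within ±10 pixels, the spread is summarized by the sample standard deviation, and `toy_estimator` is a hypothetical stand-in for the actual pose estimation model, whose distributions and coverage factor are not stated in the text.

```python
import random
import statistics

def monte_carlo_pose(estimate_pose, image_points_px, principal_distance_px,
                     n_trials=1000, noise_sigma_px=0.5, c_halfwidth_px=10.0,
                     seed=0):
    """Propagate image noise and principal distance variation through a model.

    estimate_pose(points_px, c_px) must return a tuple of output coordinates.
    Returns per-coordinate means and sample standard deviations.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    samples = []
    for _ in range(n_trials):
        noisy_points = [(x + rng.gauss(0.0, noise_sigma_px),
                         y + rng.gauss(0.0, noise_sigma_px))
                        for x, y in image_points_px]
        c = principal_distance_px + rng.uniform(-c_halfwidth_px, c_halfwidth_px)
        samples.append(estimate_pose(noisy_points, c))
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return means, stdevs

def toy_estimator(points_px, c_px, depth_mm=20000.0):
    """Hypothetical stand-in: back-project the point centroid to a fixed depth."""
    mean_x = statistics.fmean([p[0] for p in points_px])
    mean_y = statistics.fmean([p[1] for p in points_px])
    return (mean_x * depth_mm / c_px, mean_y * depth_mm / c_px)

means, stdevs = monte_carlo_pose(toy_estimator,
                                 [(100.0, 50.0), (120.0, 60.0)],
                                 principal_distance_px=3000.0,
                                 n_trials=200)
```

Replacing `toy_estimator` with the cylinder fitting model would yield the distribution of each coordinate of P1 and P2, from which an uncertainty figure can be derived.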

Moreover, cylinder dimensions (length and diameter) have been changed for each cylinder pose to check whether this variation affects the model's accuracy. Lengths of 4, 8 and 16 metres and diameters of 250, 500 and 1000 mm have been tested as representative case studies.

2.3.2.3. Robustness of cylinder pose estimation model.

Besides testing the accuracy of the model, its robustness is also tested. One of the requirements of this photogrammetric model is that approximated pose values of the cylinder are necessary to initialize it. Considering that real applications do not assure that the real pose and the nominal one are close to each other, this effect has been simulated and quantified to determine the model's robustness.

In this case, the testing procedure consists of introducing initial errors for known poses and camera network to check the robustness and performance of the model. Errors ranging from 1 mm to 1000 mm are applied to the a priori cylinder parameters (points P₁ and P₂, as in accuracy testing) and the model's response is studied for a 1 mm convergence threshold (the maximum allowed value of the parameter corrections). These errors are applied separately to each position coordinate (XYZ) and then combined for the coordinates of both points (P₁ and P₂). This test has been carried out for a cylinder 8 m long and 250 mm in diameter.

Another interesting characteristic from the point of view of the model's robustness is the minimum amount of image data required and its distribution along the contour lines of the cylinder. This aspect was also analysed when establishing the image data quality requirements. The relationship between the model's input data and output data (pose estimation) was studied for the following cases and compared against full 2D data for pose 2:

Continuous data taking into consideration only the contour points of one side (see case study B in table 3)
Reduction of continuous image data points from 25% to 75% (see case studies C, D and E in table 3)
One third of the image data for each camera view, without correspondence for the same cylinder area (see case study F in table 3)
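
A data reduction of the kind used in case studies C, D and E could be sketched as below. The text does not state how the points were filtered, so two assumptions are made here and should be read as such: the percentage denotes the share of points removed, and the remaining points are kept evenly spaced along the contour with both end points preserved. The function name `thin_contour_points` is hypothetical.

```python
def thin_contour_points(points, reduction):
    """Keep an evenly spaced subset of contour points.

    reduction is the fraction of points to drop (for example 0.25, 0.5, 0.75).
    The first and last points are always retained.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    n = len(points)
    n_keep = max(2, round(n * (1.0 - reduction)))
    if n_keep >= n:
        return list(points)
    # Integer arithmetic gives distinct, monotonically increasing indices.
    return [points[(k * (n - 1)) // (n_keep - 1)] for k in range(n_keep)]

contour = list(range(100))  # placeholder for 100 ordered contour points
subsets = {r: thin_contour_points(contour, r) for r in (0.25, 0.5, 0.75)}
```

Even spacing keeps the points distributed along the whole contour, which is the property the comparison with case study F is meant to probe.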

3. Results

The following results show the capabilities of the developed model in terms of accuracy and robustness, which are critical aspects for its characterization for future implementations in real applications.

3.1. Comparison of synthetic image data and geometric data

In order to validate and compare the quality of the image data obtained from synthetic images against reference data (SA), figure 11 presents a qualitative comparison of image points on the left side, together with a detailed zoom on the right side. The represented case study corresponds to an oblique pose (pose 3) for a cylinder 8 m long and 250 mm in diameter. The comparison shows that the data obtained from 3D and transformed into 2D with SA are continuous and straight for contour points (green points), while the synthetic data present some imperfections. For the synthetic data transformed to metric data (see equation (2)), the distribution of the points is not straight, and even equidistant deviations from the reference data are found (see the zoom on the right side of figure 11). Thus, the synthetic data are not perfect and present some deviations resulting from the rendering step and the edge point extraction approaches.

In order to analyze the deviation range numerically, table 1 shows the form error (FE) of each image data type and the 2D distance between fitted lines for an oblique pose and three camera views. While the form error for SA reprojected image data is 0, the values for the synthetic data are not negligible. These deviations are the main error sources conditioning the 3D pose estimation accuracy analyzed in section 3.2.

Table 1. Quality analysis and differences between reference image data and synthetic data for an oblique pose.

Camera ID	Synthetic FE (mm)	2D distance (mm)
1	0.10	0.05
2	0.10	0.05
3	0.06	0.14

In summary, it can be stated that the synthetic image data generation process is correctly implemented in the model and the procedure is accurate up to 0.14 mm for image coordinates corresponding to an oblique pose, which is the most critical case.

Although perfection cannot be guaranteed, synthetic data accuracy could be improved by using better rendering methods in the image generation step. This error therefore needs to be taken into account in these simulation procedures. It could be reduced by processing the input image data before feeding the model and fitting them as accurate 2D contour lines, but this approach would also introduce a systematic error in the 3D distances estimated with the model.

3.2. Accuracy analysis of the model

The accuracy of the model has been established for all mentioned case studies (see section 2.3.2.2) and cylinder poses. Based on synthetic images, the image processing step, added imaging error sources and the previously described photogrammetric model (see section 2.2.3), the poses and uncertainties of the cylinder are estimated. In order to facilitate the comparison between obtained results and the nominal values, the pose is defined by means of two points (P1 and P2) that lie on the cylinder's axis and determine its length. This comparison is shown in table 2, where positioning errors (E_xyz) between nominal and measured points are presented together with their uncertainties (U_xyz). For all the error values the mean differences are below 20 mm for working distances of 20 000 mm in radius (camera location), with uncertainties of roughly ±5 mm (at most ±5.77 mm in table 2). Therefore, the relative accuracy lies between 1/1000 and 1/20 000 for all case studies. Accuracy depends on the cylinder's pose and on the image data quality discussed in section 3.1, where deviations of tens of microns correspond to millimetres in 3D space due to projection effects. Moreover, imaging error sources such as image noise and principal distance uncertainty add more error variability to image points. The more vertical the pose, the better the results (about ten times better for pose 1), because the contour points of the cylinder are more accurately defined and detected. Although other rendering approaches were also employed for synthetic image generation, the results are similar, so the accuracy of the model is limited by this error source.
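
As a quick check of the quoted bounds, and assuming that relative accuracy is understood as the ratio of positioning error to the 20 000 mm working distance (the text does not define it explicitly):

$\displaystyle \frac{20\ \text{mm}}{20\,000\ \text{mm}}=\frac{1}{1000},\qquad \frac{1\ \text{mm}}{20\,000\ \text{mm}}=\frac{1}{20\,000}$

Under this reading, the upper bound corresponds to the 20 mm error limit and the lower bound to positioning errors of about 1 mm, which is the order of magnitude of the pose 1 values in table 2.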

Table 2. Nominal and measured point comparison for all cylinder and pose case studies (Errors and uncertainties in mm).

POSE ID	Point ID	CS 1^a		CS 2		CS 3		CS 4		CS 5
POSE ID	Point ID	E_xyz	U_xyz	E_xyz	U_xyz	E_xyz	U_xyz	E_xyz	U_xyz	E_xyz	U_xyz
1	P1	0.28	0.26	2.93	0.26	0.59	0.25	0.60	0.25	0.05	0.09
1	P2	1.27	0.34	1.42	0.35	0.71	0.30	1.58	0.43	0.77	0.23
2	P1	10.43	4.57	7.53	4.09	12.47	4.29	11.08	4.38	9.27	3.48
2	P2	13.11	5.57	9.35	4.61	14.92	5.10	12.21	4.65	13.47	5.38
3	P1	8.57	4.63	7.68	3.51	6.52	2.98	6.43	4.03	9.81	3.50
3	P2	11.77	5.77	9.92	4.87	8.56	3.22	8.69	4.77	17.83	4.80

^aCase studies: 1 (L = 8000 mm and D = 250 mm), 2 (L = 8000 mm and D = 500 mm), 3 (L = 8000 mm and D = 1000 mm), 4 (L = 4000 mm and D = 250 mm), 5 (L = 16 000 mm and D = 250 mm).

Besides, figure 12 presents a histogram for each pose, showing the residuals of the distances between the cylinder axis and the estimated 3D tangential points (see equation (12)) for case study 1. As this figure shows, the residuals are not symmetric, and the differences tend to form groups, which means that systematic errors are conditioning the results. This tendency is probably related to the deviations and the distribution of the image data points employed.

Although the range of the distance residual values for each pose reaches tens of mm, the average values are similar to the absolute deviations of the points P1 and P2 reported in table 2.

3.3. Robustness analysis of the model

One limitation of the photogrammetric model is the need for a priori data close to the cylinder's pose. However, this requirement is not always met with enough accuracy; it is therefore important to establish the robustness of the model in this respect. To this end, the model has been tested with a priori pose data with known XYZ deviations for points P1 and P2, and the fitting result was assessed. The expected result is the nominal pose, with these initial deviations corrected. After testing the model for oblique and vertical poses for all case studies, the following can be stated:

For XY position deviations, the residuals of points that define the cylinder pose are less than 1 mm for large working distances.
For Z deviations, the model converges but the result is not robust because a reference along the axis is missing.
Robustness does not depend on the cylinder pose; deviations are similar in both cases.
The model depends on a priori pose knowledge, but the real position can deviate by as much as a few metres from the assumed known values.

In order to analyze the effect of the image data on the cylinder pose fitting result, several cases were assessed by changing both the number of points and their distribution along the contour of the cylinder. As an example of this evaluation, table 3 presents the results for the cases listed in section 2.3.2.3 for pose 2 and the cylinder dimensions of CS1. The table lists the coordinates of points P1 and P2 together with their differences with respect to case study A, which is taken as the reference.

Table 3. Spatial positioning comparison for different 2D image data sets. (A) Estimated values with both contour sides. (B) Estimated values with one contour side. (C) Estimated values with filtered contour points 25%. (D) Estimated values with filtered contour points 50%. (E) Estimated values with filtered contour points 75%. (F) Estimated values for non-corresponding areas of the cylinder.

Case study ID	Point ID	Estimated coordinates (mm)			Differences (mm)
Case study ID	Point ID	X	Y	Z	ΔX	ΔY	ΔZ
A	P1	−11 025.34	6355.53	−12 727.92	—	—	—
A	P2	−6122.95	3529.23	−7072.95	—	—	—
B	P1	−11 018.88	6354.09	−12 727.92	6.46	1.44	0
B	P2	−6126.19	3535.51	−7062.7	3.24	6.28	10.25
C	P1	−11 020.50	6353.43	−12 727.92	4.84	2.1	0
C	P2	−6124.22	3535.31	−7063.59	1.27	6.08	9.36
D	P1	−11 019.48	6353.86	−12 727.92	5.86	1.67	0
D	P2	−6124.72	3535.17	−7062.57	1.77	5.94	10.38
E	P1	−11 019.92	6353.95	−12 727.92	5.42	1.58	0
E	P2	−6124.16	3535.12	−7063.49	1.21	5.89	9.46
F	P1	−11 009.31	6357.94	−12 727.92	16.03	2.41	0
F	P2	−6139.51	3538.72	−7041.35	16.56	9.49	31.6

For all the cases presented, the differences with respect to case study A do not exceed about 10 mm (the largest being 10.38 mm in Z for case study D), except for the case where the image points of the camera views do not correspond to the same cylinder area (see case study F). For this exception, the deviations in both the XY and Z directions are higher because the image data point distribution is worse conditioned.

4. Discussion and conclusions

Results highlight that the theoretical model is suitable for close-range photogrammetric pose estimation and sensitive to a priori known data and image data quality. Therefore, it could be used for design purposes in order to fulfill measuring system requirements and select a suitable and affordable approach. In this research a symmetric camera network has been employed, as it is a rather realistic configuration for real implementation. Other effects such as receiver deformation, camera network instabilities or computational efficiency are not taken into account in this research. However, real scene image data quality and disturbances due to high energy concentration effects, as well as imaging parameter uncertainties, are considered in the simulation process. The real implementation of the simulated photogrammetric approach will consider intrinsic and extrinsic calibration methods, referencing methods and cylinder pose estimation testing against certified measuring procedures. Thus, experimental tests will be carried out once the overall procedure is established and every tool is prepared. The obtained results will be correlated with the model's performance.

With these preliminary results, it can be concluded that the model is accurate, and that this simulation procedure yields cylinder pose values with relatively low deviations, so that suitability for tracking application requirements can be assessed. The relative accuracy of the model falls between 1/1000 and 1/20 000 with uncertainties of about ±5 mm for all case studies and poses. In the future, further synthetic data generation procedures and tools will be studied to speed up the simulation procedure, whose steps are time-consuming.

The model depends on initial a priori pose values that are not always known or provided, so it would be preferable to make the model more independent in this respect. Nevertheless, in some applications this does not constitute a limitation. For example, in a positioning control loop, the position preceding the one to be estimated is always known and is very similar to it.

The robustness of the model is strong for the cylinder's axis orientation but weaker for spatial position determination. In order to improve this, and therefore the response of the model, a model-based 3D circle estimation approach is required so that it can be combined with the procedure presented in this paper. Higher model performance for accurate five-degree-of-freedom (dof) pose estimation could be achieved with a combination of both approaches.

Regarding the number and distribution of image data points, it has been demonstrated that the model is robust when the image data are well distributed along the cylinder. Besides, in order to reduce the computation time of the model, heavy data filtering is possible while maintaining the same accuracy level.

Acknowledgments

This research is supported by the Basque business development agency and MOSAIC project funded by the European Union's Horizon 2020 research and innovation programme.

Author contributions

EG-A designed the 3D scene poses in SW© and created the synthetic images, JM processed the images and created the data files, GK developed the photogrammetric model and simulated its capabilities, AT and RM reviewed the overall concept, GK also wrote the paper and all co-authors checked it considering their contribution and expertise.

Conflicts of interest

The authors declare no conflict of interest.
