
Multi-view 3D scene reconstruction using ant colony optimization techniques


Published 17 October 2012 © 2012 IOP Publishing Ltd
Citation: Dimitrios Chrysostomou et al 2012 Meas. Sci. Technol. 23 114002. DOI: 10.1088/0957-0233/23/11/114002


Abstract

This paper presents a new method performing high-quality 3D object reconstruction of complex shapes derived from multiple, calibrated photographs of the same scene. The novelty of this research is found in two basic elements, namely: (i) a novel voxel dissimilarity measure, which accommodates the elimination of the lighting variations of the models and (ii) the use of an ant colony approach for further refinement of the final 3D models. The proposed reconstruction procedure employs a volumetric method based on a novel projection test for the production of a visual hull. While the presented algorithm shares certain aspects with the space carving algorithm, it is, nevertheless, first enhanced with the lightness compensating image comparison method, and then refined using ant colony optimization. The algorithm is fast, computationally simple and results in accurate representations of the input scenes. In addition, compared to previous publications, the particular nature of the proposed algorithm allows accurate 3D volumetric measurements under demanding lighting environmental conditions, due to the fact that it can cope with unevenly lit scenes, a consequence of the characteristics of the voxel dissimilarity measure applied. Moreover, the intelligent behavior of the ant colony framework provides the opportunity to formulate the process as a combinatorial optimization problem, which can then be solved by means of a colony of cooperating artificial ants, yielding very promising results. The method is validated with several real datasets, along with qualitative comparisons with other state-of-the-art 3D reconstruction techniques, following the Middlebury benchmark.


1. Introduction

Contemporary complex mechatronic systems are often required to deal with objects in three-dimensional (3D) space and, therefore, a 3D representation of these objects is preferable to raw sensorial inputs, especially in cases where further processing is demanded. Moreover, the accurate 3D model of an object can be used in a plethora of applications involving volumetric measurements, such as virtual reality [1], CAD, CAM [2] and quality control [3], to name a few. Additionally, 3D image processing techniques have been developed using optical integral-imaging systems [4, 5]. The use of optical processing methodologies has led to remarkable developments in applications such as face recognition [6], pattern recognition [7] and 3D displays [8]. Among the multitude of existing methods for 3D structural reconstruction, one can distinguish approaches ranging from stereo vision-based to multi-view volumetric ones. In stereo vision [9], the aim is to use two (or more, in the case of multi-view stereo methods) images of the same scene, in order to infer the depth of the depicted objects. The procedure of associating the available information, known as the correspondence problem, first evaluates possible matches between the images and then applies an optimization step, either local or global, to establish the most probable matches [10]. Having computed the depth of the scene, a corresponding point cloud can be computed, constituting a 3D structure representation of the scene under consideration. Solving the correspondence problem is an inherently demanding task and has received much attention recently [11–14].

In multi-view volumetric methods, such as space carving and shape from silhouettes, the main concept involves representing the area of the scene in the vicinity of an object by a grid of voxels and gradually dissolving this 3D space by carving away successive layers of voxels of high discrepancy. Early space carving approaches, such as voxel coloring [15] and its generalized form [16], determined whether to remove a voxel using its visibility properties. In this set of methods, every voxel is visited in a visibility-compatible order, so that only voxels already checked are permitted to occlude those under consideration. The 3D space is divided by a sweeping plane moving away from the camera and no voxel within the plane is allowed to occlude any other. Broadhurst et al [17] suggest a probabilistic framework, where each voxel is assigned a probability of existence using a Gaussian model and classified using the Bayes theorem, by comparing the likelihoods of existence for the voxel. Consequently, when a voxel is initially examined, its visibility in every input image is uniquely determined. Zeng et al [18] extend previous carving methods to non-Lambertian objects from an arbitrary set of calibrated images using reflectance models. Franco and Boyer [19] and Guan et al [20] have also proposed probabilistic methods for multi-view silhouette fusion in the context of model-free tracking. However, since the majority of these methods are based on background subtraction, they require special environmental conditions and are not directly applicable to the problem of 3D reconstruction from real-world image sequences. On the other hand, computing the 3D shape using silhouette-based methods can provide useful initial solutions, with silhouettes emerging as the dominant image feature, especially for sparsely textured objects.
Although silhouette-based approaches are not capable of retrieving surface concavities in great detail, they exhibit important advantages and are often preferred over other more complicated approaches. First, these methods enjoy significant stability and efficiency, which allows them to operate under challenging imaging conditions. Second, silhouette extraction seems to be the only feasible alternative for recovering textureless or homogeneous objects. Third, they usually do not require exact visibility estimation. This constitutes a great advantage over multi-view or photometric stereo and shading techniques, where visibility reasoning raises major difficulties. Most silhouette-based reconstruction methods aim at approximating the visual hull of the imaged object, i.e. the maximal shape that yields the same silhouettes as the actual object for all views [21]. The earliest attempts use polyhedral [22] or volumetric representations [23], or local surface models [24]. Some recent approaches apply a fusion of silhouette constraints and photo-consistency, either by combining them into a single cost function [25], or by using silhouette points to constrain the computed surface based on stereo information [26, 27].

Complications with space carving and shape-from-silhouettes methods arise when images are segmented incorrectly: a voxel that is erroneously removed early on can emerge as an artificial hole in the final 3D model, absent from the real input photographs. A common approach to this issue is to apply a global threshold for light variations around the scene, followed by a photo-consistency metric. Even so, a single threshold makes optimal results difficult to achieve, owing to noise and quantization errors. Moreover, the simultaneous use of silhouettes and photo-consistency may bias the reconstruction in close proximity to the visual hull. Thus, Vogiatzis et al [28] proposed a two-phase approach, where the visual hull is first computed and then refined in a second phase using photo-consistency, while Kolev and Cremers [29] introduced a combination of multi-view and fusion methods on convex domains.

In this paper, the classical problem of inferring a dense 3D volumetric reconstruction of an object from a collection of calibrated views along a common global coordinate system is considered. The proposed method shares certain aspects with the space carving and the shape from silhouettes methods. In similar fashion to [28] and [29], the first step in the proposed method is the application of a straightforward silhouette segmentation method based on image thresholding, leading to a visual hull separating the foreground voxels from those in the background. As a second step, a metric used to prevent erroneous removal of voxels is derived from the compensation of the luminosity changes among voxels and is calculated within the HSL color space. The most notable novelty of our proposed algorithm lies in the computational simulation of the behavior rules of ant colonies as an optimization technique for further refinement of the final 3D models. While other researchers avail themselves of several fusion techniques for choosing the appropriate voxels and surface elements to refine the final output, this paper shows the potential of fusing silhouette-based and nature-inspired methods to provide more accurate 3D models.

The remainder of this paper is outlined as follows: in section 2, the luminosity-compensated, voxel dissimilarity measure is explained and described. Subsequently, in section 3, the optimization framework, based on the intelligent behavior of cooperating artificial ants, is presented. A detailed description of all stages of the algorithm is presented in section 4. Results of experiments are presented and validated against state-of-the-art implementations in section 5, while this paper concludes with some final notes and suggestions for future work in section 6.

2. Lightness compensating image comparison

In multiple-view vision methods, the problem of finding correspondences becomes increasingly difficult to resolve as the number of available views increases. This is mostly due to specular reflections and the general change in ambient illumination when the scene is observed from significantly different viewpoints. As a result, the need for robust, lightness-invariant pixel dissimilarity measures is evident in multi-view vision. The dissimilarity measure chosen here is defined and calculated within the HSL color space, which can be represented by a double cone. H stands for hue and signifies the human impression of the colors depicted. Each color is represented by an angular value ranging between 0° and 360° (0° for red, 120° for green and 240° for blue). S stands for saturation and quantifies how vivid or gray the particular color is. Its value ranges from 0 for gray to 1 for fully saturated (pure) colors. The L channel of the HSL color space stands for luminosity, determining the intensity of a specific color. It ranges from 0 for completely dark colors (black) to 1 for fully illuminated colors (white). Thus, in the HSL color space, lightness is distinguished from the other characteristics of color. This implies that a given color will, theoretically, result in the same values of hue and saturation regardless of the environment's illumination conditions. Ignoring the luminosity channel will inevitably lead to loss of information and, as a result, to slightly inferior results under ideal lighting, although it will still provide robustness against real, non-ideal and non-uniform lighting conditions [13]. The omission of the vertical (L) axis from the color space representation leads to a 2D circular disk, defined only by H and S. In this reduced color space, each color $P_j$ can be represented as a planar vector with its initial point at the disk's center.
As a consequence, it can be described as a polar vector or, equivalently, as a complex number with modulus equal to $S_j$ and argument equal to $H_j$. Thus, a color in the new luminosity-ignoring color space representation can be described as

$P_j = S_j\,e^{iH_j} = S_j(\cos H_j + i \sin H_j)$   (1)

Based on this color description, a luminosity-compensated dissimilarity measure (LCDM) has been proposed in [13], according to which the dissimilarity of two colors P1 and P2 can be found in the reduced HS color space as the modulus of the difference of the two complex numbers:

$\mathrm{LCDM}(P_1, P_2) = \left|P_1 - P_2\right| = \left|S_1\,e^{iH_1} - S_2\,e^{iH_2}\right|$   (2)

Equation (2) is the mathematical formulation of the LCDM dissimilarity measure, which takes into consideration all the chromatic information available except luminosity. In contrast to other popular dissimilarity measures, such as absolute differences or squared differences, the LCDM shows robust behavior against viewpoint-dependent chromatic differentiations. The LCDM exhibits practical attributes that render it an ideal measure for the stereo correspondence problem [14]. In particular, compared to the other dissimilarity measures against which it was assessed, the LCDM has been shown to exhibit significant robustness against the changes that illumination differentiations introduce to the RGB values of pixels. Our aim is to take advantage of the behavior that the LCDM has exhibited in the case of binocular vision and extend it to the multi-view domain. Consequently, the LCDM has been adopted as the voxel dissimilarity measure in the algorithm presented herewith.
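As a rough illustration, the LCDM of equation (2) can be computed with Python's standard colorsys module; the function below is our own sketch, not the authors' implementation.

```python
import colorsys
import math

def lcdm(rgb1, rgb2):
    """Luminosity-compensated dissimilarity between two RGB colors.

    Each color is mapped to HSL, the L channel is dropped, and the
    remaining (H, S) pair is treated as a complex number with modulus S
    and argument H (equation (1)); the dissimilarity is the modulus of
    the difference of the two complex numbers (equation (2)).
    """
    def to_hs_complex(rgb):
        r, g, b = (c / 255.0 for c in rgb)
        h, l, s = colorsys.rgb_to_hls(r, g, b)  # h normalized to [0, 1)
        angle = 2.0 * math.pi * h
        return complex(s * math.cos(angle), s * math.sin(angle))

    return abs(to_hs_complex(rgb1) - to_hs_complex(rgb2))
```

For instance, a bright and a dark red differ only in luminosity, so their LCDM is zero, while red and green remain maximally dissimilar.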

3. Ant colonies

In the approach discussed in this paper, the search activities are centered around so-called ants, i.e. agents with very simple basic capabilities which, to some extent, mimic the behavior of real ants, with a view to advancing research in multi-view reconstruction. The earliest work on ant-based techniques, which initiated this line of research, is ant colony optimization (ACO) [30]. This heuristic has been used successfully to solve a wide variety of problems, such as the traveling salesman problem [31], image retrieval [32, 33], classification [34, 35] and data flow in communication networks. The obvious question arising from the use of ACO in these interdisciplinary applications is: how do ant algorithms work? Ant algorithms are essentially colonies of cooperative agents designed to solve a particular problem. These algorithms are stochastic by nature, which helps them avoid entrapment in local minima and provide very good solutions close to the optimal one [36]. More specifically, one of the problems studied by entomologists was to understand how almost blind animals like ants could manage to establish shortest-route paths from their colony to feeding sources and back [30]. Real ants are in some ways very unsophisticated insects. Their memory is known to be very limited and they exhibit individual behavior that appears to have a large random component. However, acting collectively, ants collaborate to achieve a variety of complicated tasks with great reliability and consistency, such as finding the shortest pathway from their nest to a food source among a set of alternative paths. The most vital feature of this collaboration is that only local information is required. Ants can exchange information in two different ways: indirect communication, called stigmergy, and direct communication.
Stigmergy is biologically realized through pheromones, special secretory chemicals, characterized by their evaporation rate, that individual ants deposit as a trail when they move. Specifically, it was found that the medium used to communicate information among individuals regarding paths, and used to decide where to go, consists of pheromone trails. A moving ant lays some pheromone (in varying quantities) on the ground, thus marking the path by a trail of this substance. While an isolated ant moves essentially at random, an ant encountering a previously laid trail can detect it and decide, with high probability, to follow it, thus reinforcing the trail with its own pheromone. The collective behavior that emerges is a form of autocatalysis: the more ants follow a specific trail, the more attractive that trail becomes to be followed [31, 37]. The process is thus characterized by a positive feedback loop, in which the probability with which an ant chooses a path increases with the number of ants that previously chose the same path.

Through the famous double-bridge experiment, Goss et al [38] provided a probabilistic model for this type of foraging behavior of ants. More specifically, because ants can detect pheromone, when choosing their way they tend to choose paths marked by strong pheromone concentrations. In ACO algorithms, an ant moves from point i to point j with probability

$p_{i,j} = \dfrac{\tau_{i,j}^{\alpha}\,\eta_{i,j}^{\beta}}{\sum_{k}\tau_{i,k}^{\alpha}\,\eta_{i,k}^{\beta}}$   (3)

where $\tau_{i,j}$ and $\eta_{i,j}$ are the pheromone value and the heuristic value associated with an available solution route, respectively. Furthermore, α and β are positive real parameters, the values of which determine the relative importance of pheromone versus heuristic information. During their food search, all ants deposit a small quantity of a specific pheromone type on the ground. As soon as an ant discovers a food source, it evaluates the quantity and the quality of the food and carries some back to its nest. During the return trip, every ant with food leaves on the ground a different type of pheromone of specific quantity, according to the quality and quantity of the food found. In ACO algorithms, pheromone is updated according to

$\tau_{i,j} \leftarrow (1 - n)\,\tau_{i,j} + \Delta\tau_{i,j}$   (4)

where $\tau_{i,j}$ is the amount of pheromone at a given position $(i, j)$, $n$ is the pheromone evaporation rate and $\Delta\tau_{i,j}$ is the amount of pheromone deposited, typically given by

$\Delta\tau_{i,j} = \sum_{k} \Delta\tau_{i,j}^{k}, \qquad \Delta\tau_{i,j}^{k} = \begin{cases} 1/L_k & \text{if ant } k \text{ traversed edge } (i,j) \\ 0 & \text{otherwise} \end{cases}$   (5)

where $L_k$ is the cost of the $k$th tour of an ant (typically measured as its length). Finally, the created pheromone trails guide other ants to the food source. Consider, for example, the experimental setting shown in figure 1, where a real ant colony manages to find the shortest route from its nest to the food, based on the different quantities of pheromone deposited along alternative routes. The ants move along the path from food source F to the nest N. At point B, all ants walking to the nest must decide whether to continue their path from point C or from point H (figure 1(a)). A higher quantity of pheromone on the path through point C gives an ant a stronger motivation and thus a higher probability to follow this path. As no pheromone was deposited previously at point B, the first ant reaching point B has the same probability of going through point C or point H. The first ant following path BCD will reach point D earlier than the first ant following path BHD, due to its shorter length. The result is that an ant returning from N to D will find a stronger trail on path DCB, caused by the half of all the ants that by chance followed path DCBF and by the already arrived ones coming via BCD; it will therefore prefer path DCB to path DHB. Consequently, the number of ants following path BCD will increase with time, unlike the number following path BHD. This causes the quantity of pheromone on the shorter path to grow faster than on the longer one, so the probability that any single ant chooses the shorter path becomes ever greater.
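The double-bridge dynamics described above can be reproduced with a minimal sketch of equations (3)–(5); all names and parameter values below are our own choices, not the paper's.

```python
import random

ALPHA, BETA = 1.0, 2.0  # relative weight of pheromone vs. heuristic information
RHO = 0.1               # pheromone evaporation rate ("n" in the text)

def choose_path(tau, eta):
    """Equation (3): pick a path index with probability proportional
    to tau**ALPHA * eta**BETA."""
    weights = [t ** ALPHA * e ** BETA for t, e in zip(tau, eta)]
    r = random.uniform(0.0, sum(weights))
    for i, w in enumerate(weights):
        r -= w
        if r <= 0.0:
            return i
    return len(weights) - 1

def simulate(lengths, n_ants=200, seed=0):
    """Send n_ants one by one over alternative paths of the given
    lengths, updating pheromone with equations (4) and (5)."""
    random.seed(seed)
    tau = [1.0] * len(lengths)        # initial pheromone on each path
    eta = [1.0 / L for L in lengths]  # heuristic value: inverse length
    counts = [0] * len(lengths)
    for _ in range(n_ants):
        k = choose_path(tau, eta)
        counts[k] += 1
        tau = [(1.0 - RHO) * t for t in tau]  # evaporation, eq. (4)
        tau[k] += 1.0 / lengths[k]            # deposit 1/L_k, eq. (5)
    return counts
```

Run on a double bridge with branch lengths 1 and 2, the colony concentrates on the shorter branch, mirroring the behavior in figure 1(c).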

Inspired by this stochastic behavior of real ants, ant algorithms are software tools that coordinate updates to a common memory, analogous to the pheromone trails of real ants. When a number of these simple artificial agents cooperate on the basis of such memory updating, they are able to build good solutions to hard combinatorial optimization problems. Accordingly, ACO algorithms exploit a colony of artificial ants, i.e. cooperative agents, designed to solve combinatorial optimization problems. However, as mentioned before, artificial ant agents possess properties that differentiate them from real ants and have thus given rise to various ant algorithm-based systems [37, 39]. Along with these unique features that enhance the capabilities of the artificial agents, there are further governing parameters, such as the optimum number of ants, the pheromone decay rate and the constants that make the solution converge to the experimental results.

Figure 1.

Figure 1. An example of a real ant colony. (a) An ant follows path BHD by chance. (b) Both paths are followed with the same probability. (c) A larger number of ants follows the shorter path.


4. Algorithm description

The proposed algorithm consists of several stages and these are depicted using a flowchart in figure 2. A detailed description of the different stages is given below:

Figure 2.

Figure 2. The flowchart of the proposed algorithm.


4.1. Silhouette extraction

Initially, the scene space is discretized into an array of voxels. A straightforward image thresholding approach is then used, producing the first black-and-white silhouette of each source photograph. Given this introductory 2D approximation of the shape of the model, a voting scheme similar to the one proposed by Yemez and Schmitt [40] is used. Once the projection of a voxel is known, an intersection test between the projection and the estimated silhouette is performed. Voxels related to the image foreground are retained and labeled IN, those related to the background are rejected and labeled OUT, while those lying on the contour of the silhouette are labeled ON.
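The thresholding and projection test can be sketched as follows; the `project` callables standing in for the calibrated camera models, and the 4-neighbourhood contour check, are our assumptions, not details given in the paper.

```python
import numpy as np

def extract_silhouette(image, threshold=10):
    """Threshold a grayscale image into a binary silhouette mask:
    pixels at or above `threshold` count as foreground."""
    return image >= threshold

def label_voxel(center, silhouettes, projections):
    """Classify one voxel against every view as IN, ON or OUT.

    `projections` maps a 3D point to integer pixel coordinates (u, v).
    A voxel projecting inside every silhouette is IN, outside any
    silhouette is OUT, and on a silhouette boundary is ON.
    """
    on_contour = False
    for mask, project in zip(silhouettes, projections):
        u, v = project(center)
        if not (0 <= v < mask.shape[0] and 0 <= u < mask.shape[1]):
            return "OUT"
        if not mask[v, u]:
            return "OUT"
        # A foreground pixel with a background 4-neighbour lies on
        # the silhouette contour.
        neighbours = [(v - 1, u), (v + 1, u), (v, u - 1), (v, u + 1)]
        if any(0 <= nv < mask.shape[0] and 0 <= nu < mask.shape[1]
               and not mask[nv, nu] for nv, nu in neighbours):
            on_contour = True
    return "ON" if on_contour else "IN"
```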

4.2. Visual hull construction

As has already been mentioned, difficulties in applying space carving algorithms arise when images are segmented incorrectly, since a voxel that is erroneously removed cannot be retrieved later on. In practice, this means that artificial holes of no relevance to the real input model can emerge in the final 3D model. At this step, any voxel labeled either as foreground (IN) or as contour (ON) contributes to the resulting silhouettes. These are subsequently used as input to carve the remaining 3D space by intersecting the cones generated by back-projecting the object silhouettes of all the photographs. The final maximal shape is consistent with the object's silhouettes as seen from any viewpoint in a given region and constitutes the visual hull. However, the major drawback of this algorithm is its inability to cope with concavities in the examined objects. Since a proper performance criterion would be based on the ability of a reconstruction model to deal with difficult angles and complex views, this drawback has led to the employment of the sophisticated voxel dissimilarity measure described in the next step.
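A brute-force sketch of the silhouette-cone intersection, assuming hypothetical per-view projection functions in place of the calibrated cameras:

```python
import numpy as np

def visual_hull(grid_shape, silhouettes, projections):
    """Carve a boolean voxel grid down to the visual hull: a voxel
    survives only if it projects inside the silhouette of every view.
    `projections` maps a voxel index to pixel coordinates (u, v)."""
    hull = np.ones(grid_shape, dtype=bool)
    for mask, project in zip(silhouettes, projections):
        for idx in np.ndindex(grid_shape):
            if not hull[idx]:
                continue  # already carved by an earlier view
            u, v = project(idx)
            inside = (0 <= v < mask.shape[0] and
                      0 <= u < mask.shape[1] and mask[v, u])
            if not inside:
                hull[idx] = False
    return hull
```

Iterating over every voxel for every view is O(views × voxels); the paper's 480³ grid would in practice call for an octree or GPU variant, but the intersection logic is the same.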

4.3. Luminosity-compensated dissimilarity measure

In order to decide whether a voxel should remain inside the carved 3D volume, the variance of its color samples has to be computed. A large variance indicates samples unlikely to originate from the same surface point and, thus, the particular voxel should be carved away. As the viewpoints of the scene can vary significantly, shadowing and other phenomena can alter the perceived color of the same surface point. Thus, a luminosity-compensating similarity (or, equivalently, dissimilarity) measure for the remaining voxels is needed. The chosen dissimilarity measure is defined and calculated within the HSL color space. As each element of this particular color space can be processed separately, the dissimilarities appearing in voxels due to major lightness contrasts can be managed as described in section 2. The illumination alterations introduced to the RGB values of the pixels across both working datasets can thereby be disregarded, yielding a significant increase in robustness. Preserving every piece of chromatic information except luminosity causes many voxels previously labeled ON to be discarded, as most of the lightness disparity of the model appears on the surface points near its contour, while the remaining ones, with lower lightness disparity, are re-labeled IN and become part of the final 3D model. Figure 4 shows the results after applying the LCDM on the visual hull model; numerous details survive the voxel carving, e.g., the gaps between the steps at the front, as well as the details on the columns of the temple.
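As an illustration of this carving criterion, the sketch below measures the spread of a voxel's per-view color samples in the luminosity-free HS plane; the threshold value and all function names are our assumptions.

```python
import colorsys
import math

def hs_complex(rgb):
    """Map an RGB triple to the luminosity-free HS plane, encoded as
    a complex number with modulus S and argument H (equation (1))."""
    r, g, b = (c / 255.0 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    angle = 2.0 * math.pi * h
    return complex(s * math.cos(angle), s * math.sin(angle))

def photo_consistent(samples, threshold=0.2):
    """A voxel passes if the spread of its per-view color samples,
    measured as the largest LCDM-style distance from the mean in the
    HS plane, stays below `threshold` (value assumed)."""
    zs = [hs_complex(c) for c in samples]
    mean = sum(zs) / len(zs)
    return max(abs(z - mean) for z in zs) < threshold
```

Two reds of different brightness pass the test, while a red/green pair fails, which is exactly the lightness-invariant behavior the carving step relies on.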

4.4. Ant colony optimization

In order to improve the model resulting from the application of the LCDM, an optimization framework based on the intelligent behavior of cooperating artificial ants is used for the first time in such an application. The simulation of ant colonies makes use of the behavior rules observed in nature to design cooperation strategies and local stigmergetic communication. The combination of these two features of artificial ant colonies allows our implementation to discard any additional lightness variations and further refine the reconstructed model. As mentioned in section 3, ants decide their travel path from one point to another using the probability $p_{i,j}$. At this point, every voxel is represented by its center position in 3D $(x, y, z)$ and labeled either as a foreground or a contour voxel. The ant agents travel through all images resulting from the previous steps, trying to find the best paths, i.e. those that include corresponding voxels carrying the same label across all views. Thereby, the amount of pheromone is increased on every voxel of the 3D model that is visible from all views and assigned the same label. In this way, all voxels with large amounts of pheromone come to constitute the final model. The adoption of the ant colony simulation has led to remarkable results on both examined datasets, while the models produced remain unaffected by the lack of strong texture, a significant improvement over other methods found in the literature. In the next section, our experimental results are presented, showing the proposed method's performance as compared with some state-of-the-art implementations.
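The paper leaves the implementation details of this stage open; the toy sketch below reflects one possible reading, in which pheromone accumulates on voxels whose labels agree across all views while evaporation suppresses inconsistent ones. All names and constants are hypothetical.

```python
def refine(labels_per_view, n_passes=100, rho=0.1):
    """Toy pheromone-based refinement: in each pass of the colony,
    pheromone on every voxel first evaporates (cf. equation (4)); a
    deposit is then made on voxels whose IN/ON label is identical in
    all views. Voxels whose pheromone stays above the initial level
    survive.

    labels_per_view: one list of "IN"/"ON"/"OUT" labels per view,
    all of the same length (one entry per voxel).
    """
    n_voxels = len(labels_per_view[0])
    tau = [1.0] * n_voxels
    for _ in range(n_passes):
        for i in range(n_voxels):
            tau[i] *= (1.0 - rho)  # evaporation
            labels = {view[i] for view in labels_per_view}
            if len(labels) == 1 and labels != {"OUT"}:
                tau[i] += 1.0      # consistent foreground/contour voxel
    return [i for i, t in enumerate(tau) if t > 1.0]
```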

5. Experimental results

In order to quantify the improvements contributed by our approach, a number of standard real-world objects were examined. A qualitative benchmark was implemented using the Middlebury dataset, introduced by Seitz et al [41], which provides ground truth models of two objects: the Temple of the Dioskouroi and a stegosaurus, recorded by three different camera setups. The objects were illuminated by multiple light sources and captured with a camera mounted on a calibrated spherical gantry. Images with cast shadows, where the camera or the gantry was in front of a light source, were removed from the datasets. The full dataset consists of 312 views for the temple and 363 for the dino; the ring counterpart uses 47 camera views for the temple and 48 for the dino, on a ring around the objects, while the sparse version uses 16 sparse views on a ring around both objects. All images for all six setups were captured at 640 × 480 pixel resolution. Figure 3 depicts sample views of the two objects and the spherical gantry used for image capturing. As the respective images of both objects were acquired in a controlled indoor environment, the final silhouettes were resolved by thresholding the intensity values of background pixels: pixels with a lightness value of at least 10 were accepted as foreground and kept their place in each object. The temple object is a 159.6 mm tall plaster replica of an ancient Sicilian temple. It is quite diffuse and contains numerous geometric structures and textures that are challenging to reproduce. Deep concavities are present, especially at the back of the object, as well as in the steps at the front. Figure 4 depicts the evolution of the object during successive steps of algorithm execution. The dino object is an 87.1 mm tall, white and strongly diffuse plaster dinosaur model.
Despite its great surface smoothness and weak texture, the dinosaur figurine possesses deeper concavities, especially between the legs of the model. Figures 5 and 6 show a qualitative analysis for both datasets using all camera setups. The availability of the ground truth allows a proper analysis, and it is clear that after processing all the image data, most of the uncertainties are eliminated. The LCDM module together with the ant colony refinement results in a smoother surface. Concavities, normally the cause of challenging complications, are now easily identified and reconstructed in remarkable detail as the number of images available for processing increases.

Figure 3.

Figure 3. Sample views of the two objects used for evaluation. On the top left the temple and on the top right the dino. In the camera setup, the positions of the camera views are presented for the full, ring and sparse datasets.

Figure 4.

Figure 4. Evolution of the temple object using the full camera setup while using all 312 views. From top to bottom: the visual hulls produced as an initial approximation of the properties of the object, the final object after LCDM measure and ant colony refinement and the object as a textured model.

Figure 5.

Figure 5. Qualitative performance of the proposed approach along the six different camera setups of temple and dino datasets. From left to right: the original photograph, results using the sparse, ring and full setup and, finally, the ground truth of the model acquired by a laser scanner.

Figure 6.

Figure 6. Qualitative performance of the proposed approach along the six different camera setups of the datasets from another view. In a similar fashion to that in the previous figure, from left to right, for both rows: the original photograph, results using the sparse, ring and full setup and, finally, the ground truth of the model acquired by a laser scanner.


Every element of the proposed method has been implemented in Matlab on a PC equipped with a Core2 Quad processor clocked at 2.83 GHz per core and 8 GB of DDR3 RAM. As the algorithm uses voxels to register an initial bounding box around the object, the memory capacity of the computer is crucial. A starting volume of 480³ voxels was used as a bounding box for all three camera setups of both datasets. Table 1 shows the voxels remaining after the initial carving for each counterpart of the datasets, along with the runtime needed to execute the algorithm from silhouette extraction to the final textured model. A qualitative comparison with several methods found in the literature is depicted in figure 7. Moreover, in figure 8, a sample of zoomed portions of the reconstructed models is shown for a better appreciation of the results. The full camera setup is used in all cases, taking advantage of all available images in the dataset. The proposed algorithm shows notable outcomes with competitive performance. Furthermore, the runtimes needed for computing the final 3D models used in figure 7 are presented in table 2, showing that our algorithm is quite promising, considering the implementation details.

Figure 7.

Figure 7. Qualitative comparison of the currently proposed approach with results from other researchers. The full camera setup is used in all cases. From left to right for both rows: the currently proposed method, results from Jancosek et al [42], Guillemaut and Hilton [43], Esteban and Schmitt [26] and Goesele et al [44].

Figure 8.

Figure 8. Zoomed portions of the reconstructed models from the currently proposed method and ones in the literature.


Table 1. The number of voxels used for each camera setup and the respective runtime of the algorithm.

Dataset Cameras Total grid voxels Visual hull voxels Runtime (H:M:S)
Temple full         312  480³  232³  06:15:12
Temple ring          47  480³  216³  03:05:15
Temple sparse ring   16  480³  208³  00:45:35
Dino full           363  480³  216³  06:20:25
Dino ring            46  480³  208³  02:45:15
Dino sparse ring     16  480³  200³  00:35:25

Table 2. Runtimes for all methods shown in the comparison in the format hour:minute:second (H:M:S).

Dataset Proposed method Jancosek et al Guillemaut and Hilton Hernández and Schmitt Goesele et al
Temple full 06:15:12 00:43:25 02:35:11 01:20:00 226:40:00
Temple ring 03:05:15 00:17:44 00:35:24 02:00:00 034:00:00
Temple sparse ring 00:45:35 00:06:01 00:23:21 02:10:00 011:26:48
Dino full 06:20:25 02:13:17 06:12:08 06:16:00 318:28:00
Dino ring 02:45:15 00:37:32 00:41:25 02:06:00 041:56:00
Dino sparse ring 00:35:25 00:23:16 00:25:36 01:46:00 004:03:12

6. Conclusions

A method that performs 3D object reconstruction based on multiple views of the same scene has been presented. The method is based on a space carving algorithm equipped with a lighting-compensating dissimilarity measure and refined by an artificial ant colony. The dissimilarity measure employed exhibits robust behavior against lightness variations, a very important consideration for multi-view algorithms, while the use of ant colonies as an optimization method endows the algorithm with further robustness without sacrificing total runtime. The dissimilarity measure is defined and calculated within the HSL color space; as each element of this particular color space can be processed separately, dissimilarities appearing in voxels due to major lightness differentiations can be managed. In order to improve the model resulting from the application of the LCDM, an optimization framework based on the intelligent behavior of cooperating artificial ants is utilized. The simulation of ant colonies makes use of the behavior rules observed in nature to design cooperation strategies and local stigmergetic communication. The combination of these two features of artificial ant colonies allows our implementation to discard any additional lightness variations and further refine the reconstructed model, achieving highly detailed reconstructions. In addition, such a combination of algorithms is very amenable to parallel implementation and can be further accelerated on programmable graphics processing units, potentially resulting in great speedups over conventional optimization techniques. Contemporary solutions, such as CUDA, provide an efficient way of realizing such implementations. The proposed technique is applicable to all applications involving 3D measurements, e.g. in manufacturing, including tasks such as painting, manipulation and milling, as well as to standard robotic and machine vision applications.

Acknowledgments

The authors would like to express their gratitude to the reviewers of this paper for their valuable remarks.
