Comparative study of sampling strategies for sparse photon multispectral lidar imaging: towards mosaic filter arrays

In this paper, we investigate the recovery of range and spectral profiles associated with remote three-dimensional scenes sensed via single-photon multispectral lidar (MSL). We consider two different spatial/spectral sampling strategies and compare their performance for a similar overall number of detected photons. For a regular spatial grid of pixels, the first strategy consists of sampling all the spatial locations of the grid for each of the L wavelengths. The second strategy is consistent with the use of mosaic filter-based arrays and consists of acquiring only one wavelength (out of L) per spatial location. Despite the reduction of spectral content observed in each location, the second strategy has clear potential advantages for fast multispectral imaging using only a single frame read out. We propose a fully automated computational method, adapted for each of the two sampling strategies in order to recover the target range profile, as well as the reflectivity profiles associated with the different wavelengths. These strategies were also assessed with high ambient background. The performance of the two sampling strategies is illustrated using a single-photon MSL system with L = 4 wavelengths (473, 532, 589 and 640 nm). The results presented demonstrate that although the first strategy usually provides more accurate results, the second strategy does not exhibit a significant performance degradation, particularly for sparse photon data (down to 1 photon per pixel on average). These results suggest a way forward for the integration of single-photon detector arrays with mosaic filters for use in a range of emerging photon-starved two-dimensional and three-dimensional imaging applications.


Introduction
In recent years, single-photon timing has emerged as a candidate technology for high-resolution three-dimensional profiling [1], and the performance of the approach has been demonstrated in a number of field trials [2][3][4]. Time-correlated single-photon counting (TCSPC) is a statistical sampling technique which records the arrival time of detected photons with respect to the emitted laser pulse or absolute time. The timing resolution of the system is limited by the overall system response, which is typically limited by the variance in detector rise-time, resulting in a timing error that can often be measured in 10s picoseconds (ps), considerably less than conventional analogue optical detection approaches. In addition, the high sensitivity of single-photon detectors can mean detection over longer ranges and/or the use of lower power laser sources. However, a potential drawback of singlephoton counting is that the integration times required for accurate depth measurement can be too long, although advances in ps resolution single-photon detector arrays in both Si-CMOS [5,6] and InGaAs/InP [7,8] have shown excellent potential for depth profiling with rapid data acquisition. Although most single-photon avalanche diode (SPAD) arrays with TCSPC functionality have typically been available in a 32×32 pixel format or similar, developments in stacked CMOS offer the potential of much larger format SPAD arrays with ps timing [9,10]. Such large format SPAD arrays may provide the ideal focal plane array platform for the form of mosaic filter described below. Even with singledetector scanning systems, significant reductions in acquisition time have been demonstrated by application of advanced computational imaging approaches, such as first-photon [11] or other methods dedicated to single-photon images [12,13] which have allowed depth images to be reconstructed with very few photon returns, typically in the range of 1 photon per pixel on average.
Single-photon approaches have been used for demonstrations of multi-spectral depth imaging for target identification [14] and for the measurement of the physiological parameters of foliage [15]. In both cases, the depth information for each wavelength was required, as was consideration of the relative returns as a function of distance. In this manuscript, we have scanned a colourful target using multiple wavelengths in order to reconstruct images with depth and colour information. Using red, green and blue wavelengths, it is possible to reconstruct RGB-depth images of the scene. It is important to note that RGB-depth images can also be reconstructed by combining a colour camera with a singlewavelength single-photon lidar system. However, such an approach can present several drawbacks, in particular for applications where the scene is reconstructed from extremely low photon counts, as in our work: (i) estimating target reflectivity without timing consideration generally makes the discrimination between source and ambient illumination photons challenging and makes the reconstruction less robust to illumination conditions; (ii) data registration issues may arise from the use of two or more imaging systems; and (iii) using a single wavelength for ranging can provide poor depth profile (surfaces with poorly reflecting colour not being detected) if the scene of interest contains regions of low reflectivity at that wavelength. By spreading the emitted photons across multiple wavelengths, we can expect better detection of the various objects of the scene.
In recent years, the implementation of mosaic filters, or multispectral filter arrays, to produce simultaneous multispectral images on a single frame read-out has been pursued by several groups [16,17]. Spatial arrangements of a small set of L filter types were designed with each individual filter aligned to a known single detector pixel. This means that images can be reconstructed for each of the L wavelengths after the simultaneous measurement at all wavelengths on an individual focal plane array. We have simulated this arrangement using our optical scanner to move to individual pixels in order to map out pre-determined and randomised masks for each of four distinct wavelengths. By using a predetermined scanning sequence to form a different mask set for each wavelength, we can the investigate the full image reconstruction at each wavelength. Furthermore, we examined the reconstruction of full-colour image at the signal level of approximately one photon per pixel, on average. These investigations suggest a way forward for mosaic filter technology to be used in conjunction with individual SPAD arrays in next generation single-photon multispectral lidar systems, using signal levels in the sparse photon regime.

Experimental setup
The time-of-flight system used for these measurements was based on the TCSPC technique. This technique measures the time difference between the outgoing laser pulse and a photon detection event associated with that laser pulse. In these measurements, the detector used was a SPAD. Typically, over a number of laser pulses, individual photon events are recorded, which can be used to determine the time-of-flight to the target. A schematic of the system used for these measurements is shown in figure 1, and a schematic depicting the optical layout of the transceiver unit is shown in figure 2.
The pulsed illumination used in this experiment was provided by a supercontinuum laser source (SuperK EXTREME EXW-12, NKT, Denmark) in conjunction with an acousto-optic tunable filter. The spectrally tunable source was fibre-coupled to a custom-built transceiver unit. The average optical power levels supplied to the transceiver for the four illumination wavelengths used in these measurements were in the range 0.68-1.75 mW. These power levels were too high for a target at the 1.8m stand-off distance used for Schematic layout of the system. The spectrally tunable supercontinuum laser system provides a pulsed illumination to the transceiver unit and an electrical synchronous start signal to the TCSPC module. The stop signal was provided by the electrically gated SPAD detector which was fibre-coupled to the receive channel of the transceiver unit. The transceiver unit contained an optical scanning mechanism which scanned the target scene for each of the four selected wavelengths using the approaches described in the manuscript.
these measurements, and appropriate attenuation was used to reduce the average optical power at the target to approximately 300 nW. For the measurements reported here, the laser output had a repetition rate of 19.5 MHz and was coupled in to the transceiver unit via a m 5 m diameter core photonic crystal, polarisation maintaining fibre (FD7-PM, NKT Photonics, Denmark). The optical configuration of the transceiver unit was broadly similar to the set-up used in [18] with the optical components chosen to optimise the system for a wavelength range of -400 700 nm. The transceiver system was monostatic, meaning that the transmit and the receive channels share a common beam path resulting in a parallaxfree system which allowed for operation over a wide range of target distances. A polarising beam splitting cube (PBS-251, Thorlabs) with a wavelength range of -420 680 nm was used to de-multiplex the return signal from the common channel. Two scanning galvanometer mirrors (GM1 and GM2 in figure 2) were used in order to steer the beam over the XY plane (which was nominally perpendicular to the incident beam). A Canon objective lens (OBJ in figure 2) with an effective focal length of 100 mm and the aperture set at f/8 was used to focus the transmitted light on to the target. Light scattered from the target was then collected by this lens and coupled back in to the receive channel. The focused spot size was approximately m 200 m. A thin-junction Si-SPAD (PDM series, Micro Photonic Devices), with an operating range from 400 to 900 nm, was used to detect this return signal. The return signal was coupled to the single-photon detector using a multimode optical fibre ( m 50 m diameter core). The detector had a timing jitter of approximately 50 ps full width at half maximum (FWHM) and an active area of m 50 m diameter.
One of the disadvantages of using a monostatic system in conjunction with a detector with single-photon sensitivity is that the optical components within the system can produce significant, (yet predictable) back reflections. Hence, we used an electrical gating approach (gate duration 30ns), so that the detector was intentionally de-activated at the expected return time of the unwanted back-reflections from the optical components in the common transmit/receive channel. Electronic gating of the detector was provided by a pulse pattern generator (81134A Keysight Agilent). Failure to use such detector gating can result in incorrect depth measurements and, more commonly, saturation of the detector or possible pulse pile-up effects associated with very high levels of detected photon returns [19]. The relative lack of afterpulsing effects in Si-SPAD detectors, compared to InGaAs/InP SPAD detectors [20] for example, means that use of detector gating will not significantly alter the background level of the signal. Hence, the signal-to-noise ratio of the return signal will not be affected by detector gating since the gate will always encompass the temporal envelope of both the return signal and background.
During the measurement scans, each detection event was time-tagged by the TCSPC module (HydraHarp 400, Pico-Quant GmbH, Germany) resulting in an independent time-offlight measurement for each individual event. A synchronous electrical signal from the laser source provided both a start trigger to the TCSPC module and a trigger to the electrical gating of the Si-SPAD detector. The stop trigger was provided by the detector and each measurement was stored, with respect to the synchronisation signal, in the form of timing histograms for each pixel. In the case of these measurements, a histogram with a timing bin size of 2 ps was constructed for each pixel. The measurements reported here were performed in a laboratory with the target placed at a stand-off distance of 1.8 m from the transceiver unit. The target, as shown in figure 3, was a small, colourful Lego minifigure attached to a matt grey backboard.
Four illumination wavelengths of 473, 532, 589 and 640 nm were used to perform these measurements. These wavelengths were chosen in order to cover the visible range of the spectrum and for later colour image reconstruction. The average optical power for each wavelength was adjusted so that the number of photon returns for each of the four wavelengths was the same (to within 2%) when using a Spectralon panel. The Spectralon panel (SRT-99-050 Spectralon Diffuse Reflectance Target, Labsphere) with 99% reflectivity for the four wavelengths considered was selected to provide a calibrated Lambertian scatterer operational across the wavelength range used in these experiments.
In order to obtain the temporal instrument responses of the system, the panel was placed at a stand-off distance of 1.8 m from the transceiver unit, nominally in the same plane as the target, and a 100 s single-point acquisition was made for each of the four wavelengths. This resulted in waveforms constructed from more than 1000 photons whose peak-tobackground ratio was larger than 1000. Due to the short target distance, the optical power was very low with approximately 300nW average power being used in these measurements. The overall timing jitter of the system ranged between 50 and 120ps FWHM, depending on the operational wavelength, as shown in figure 4. The timing jitter includes the laser pulse duration, detector response, and jitter contributions from other components (e.g. timing modules, cabling, etc). The variation in timing jitter seen at the four different wavelengths used is due mainly to changes in laser pulse duration. As shown in figure 4, there is a difference in arrival time of the pulses at the different wavelengths. This time delay is greater than one nanosecond between the shortest and longest wavelengths used in this experiment. The reason for this spectrallydependent delay is chromatic pulse dispersion of the white light continuum pulse within the optical fibre lasing medium.
Additionally, an estimation of background, and relative illumination over the field of view, was taken by scanning an area of the Spectralon target corresponding to the extent of the target area, using a per-pixel acquisition time of 10 ms. These reference measurements were performed in dark conditions.
A pixel format of 200×200 was used over a scanned area of approximately46 46 mm at the target plane in each of the measurements reported here. This pixel-to-pixel pitch of approximately m 230 m was slightly larger than the spot size of the focussed illuminating beam (approximately m 200 m). As mentioned in the Introduction, two sampling methods were used for these measurements of the regular spatial grid of pixels: • Full frame: this consisted of sampling all of the spatial locations in the200 200 pixel grid with each of the four wavelengths, using a separate scan for each wavelength. The scans were taken using a combined per-pixel acquisition time of 40ms (10ms per wavelength). • Simulated mosaic filter: the second strategy, which mimics the use of mosaic filter-based arrays, consisted of sampling each spatial location in the200 200 pixel grid with only one wavelength (out of four). This meant that the scan for each wavelength acquired data for 25% of the pixels, and a per-pixel acquisition time of 40 ms was used.
In order to simulate measurements that would be obtained when using an actual multispectral mosaic filter on a focal plane array sensor, four binary masks were created. These masks were used to denote the set of pixels that was to be sensed at a specific wavelength, with all other pixels fully blocked. More specifically, for each pixel of the 200×200 grid, the wavelength to be acquired was chosen by drawing from a uniform distribution defined on ¼ { } 1, , 4 (each wavelength presenting a probability of 1/4 to be selected). Thus, the 200×200 pixel mask associated with each of the four wavelengths had 25% of its pixels active. These masks were loaded into the scanning control software to indicate the set of XY spatial locations to be sensed for each wavelength. The data acquisition was achieved by sampling sequentially the four wavelengths, with 25% of the spatial locations being consecutively sampled for each wavelength before changing the illumination wavelength. An example of simulated multispectral mosaic filters (20×20 pixels) is depicted in figure 5 for illustrative purposes. Note that the resulting simulated multispectral mosaic filter does not account for potential spectral cross-talk since real mosaic filters used for wavelength selection are likely to have some level of light transmission at the other three wavelengths. In this work, we assume ideal multispectral mosaic filters and such cross-talk issues will be the subject of future research.

Observation model
This section introduces the statistical model associated with multispectral lidar (MSL) returns for a single-surface reflecting object which is used for 3D scene reconstruction. We consider a 4D array Y of lidar waveforms of dimensioń´Ń N L T row col , where N row and N col represent the number of rows and columns of the regular spatial sampling grid (in the transverse plane), L is the total number of spectral bands or wavelengths used to reconstruct the scene and T is the number of temporal (corresponding to range) bins. Let , , ,: , , ,1 , , , be the lidar waveform obtained in the pixel ( ) i j , using the ℓth wavelength. The element y i j ℓ t , , , is the photon count within the tth bin of the ℓth spectral band considered (assuming that a waveform is actually observed for this wavelength and spatial location). For each pixel, the set of observed wavelengths in denoted , . Pixels/wavelengths that are not observed satisfy = " . Due to the design of the experiments, for each pixel, the detected photons result from two main contributions: (1) from direct path reflections of the photons originally emitted by the laser sources onto the surface of the object of interest or (2) dark photon counts and ambient illumination (assumed to be stationary in time but potentially non-stationary spatially). Moreover, we assume that the laser beam (for each pixel) encounters a single surface which is assumed to be locally orthogonal to the beam direction (such that the width of the target response for each wavelength does not change). This is typically the case for short to mid-range (up to dozens of metres) depth imaging where the divergence of the laser source(s) can be neglected. Let d i j , be the position of an object surface at a given range from the sensor, whose mean spectral signature (observed at L wavelengths) is denoted as l . According to [12,21], each photon count y i j ℓ t , , , is assumed to be drawn from the following Poisson distribution where (·) g ℓ is the instrument response, evaluated at discrete time positions as discussed in section 3.2.2 and whose shape differ between wavelength channels. In equation (1), t i j , is the characteristic time-of-flight of photons emitted by a pulsed laser source and reaching the detector after being reflected by a target at range d i j , (d i j , and t i j , are linearly related in  free-space propagation). Moreover, the instrumental responses { (·)} g ℓ are assumed to be known, as occurs when they can be accurately estimated during imaging system calibration.
, , represent (positive) wavelength-dependent background illumination levels. Note that for the setup described in section 2, for which no spectral filter is used (in contrast to actual mosaic filter-based systems), the wavelength dependency does not apply as the system simultaneously captures the whole background illumination spectrum in each pixel. However, for generalisation purposes, in this work we assume that can change across the acquired spectral channels. It is important to recall that, in this work, we consider applications where the observed objects consist of a single visible surface per pixel. We do not consider cases where the photons can penetrate through objects (e.g. semi-transparent materials for which we could infer the internal composition) or be reflected from multiple surfaces. This assumption allows the estimation of the target spectral responses to be reduced to a two spatial dimensions problem. The problem addressed is to estimate the range of the target (for all the image pixels) and estimate the target spectral responses assuming that the background illumination is unknown. The next section introduces the Bayesian model for this problem. , 2 when it is assumed that the detected photon counts/noise realisations, conditioned on their mean in all channels/ spectral bands, are conditionally independent. Note that in equation (2),  l (· ) f ; denotes the probability mass function of the Poisson distribution with mean λ. Considering that the noise realisations in the different pixels are also conditionally independent, the joint likelihood can be expressed as and T is a matrix gathering the target ranges.

Prior distributions. Each target position is a discrete variable defined on
In this paper, we set 301, 300 min max and the temporal resolution of the grid is set to the resolution of the single-photon detection (i.e. 2 ps in section 4). As in [12], to account for the spatial correlations between depths of neighbouring pixels, we propose to use a Markov random field (MRF) as a prior distribution for t i j , given its neighbours ( ) is the neighbourhood of the pixel ( ) i j , and , . More precisely, we propose the following discrete MRF where   0 is a parameter tuning the amount of correlation between pixels,  ( ) G is a normalisation (or partition) constant and where f (·) is an arbitrary cost function modelling correlation between neighbours. In this work we propose to use the following cost function which corresponds to a total-variation (TV) regularisation [22,23] promoting piecewise constant depth image. Moreover, the higher the value of ò, the more correlated the ranges of neighbouring pixels. Several neighbourhood structures can be employed to define ( ) i j , ; here, a four pixel structure (1 order neighbourhood) will be considered in the rest of the paper for the MRFs used.
In a similar fashion to the depth parameters, we use TVbased priors for the target spectral signatures and background levels in L and B, respectively. More precisely, we assume that the elements of L and B take value in arbitrarily defined finite sets of discrete values (typically ( ) 0, 1 for the target reflectivity) and define the following prior models where L ℓ (resp. B ℓ ) stands for the target reflectivities (resp. background illumination) associated with the ℓth spectral channel. Note that these prior models rely on regularisation parameters ( g { } ℓ and n { } ℓ ) which control the amount of spatial correlation between parameters of adjacent pixels. This strategy allows us to account for spatial correlations between reflectivity parameters and between illumination levels to improve estimation performance. It also allows the consideration of an efficient algorithm able to automatically adjust the amount a spatial correlation pixels (as will be discussed in below).

Estimation strategy
Simultaneously estimating L B , and T from the observed waveforms Y is challenging mainly due to the multimodal nature of (3) (in particular with respect to T). In a similar fashion to [11,[24][25][26], we simplify the problem by estimating L B, and T sequentially, using assumptions which can often be satisfied in practice. The three estimation steps are detailed below.
3.3.1. Background estimation. First, we assume that we have access to a set of lidar waveforms which do not contain any target return and which are used to estimate the background illumination. Such data can be obtained for instance by recording calibration measurements with the laser source(s) being switched off. This however requires the ambient illumination to remain constant between the calibration measurements and the actual measurements. Instead, here we consider a fraction of the original data for which we do not expect any target to be present. A total of T=1100 consecutive bins for each pixel and wavelength are considered here. In that case, equation (2) reduces to i j , and the background levels can be inferred from the posterior distribution Here we resort to an adaptive Markov chain Monte Carlo (MCMC) similar to that proposed in [12,27] to compute the marginal maximum a posteriori (MMAP) estimators of the background levels given by where n approximates the marginal maximum likelihood estimator of n (the interested reader is invited to consult [12,27] for further details about the sampling strategy adopted here). As mentioned above, we used a simulationbased algorithm here in order to automatically adjust the regularisation parameters n. However, it is important to mention that is log-concave and that B could be inferred via standard maximum a posteriori (MAP) estimation using state-of-the-art convex optimisation techniques (provided that n is properly tuned). Note that the estimation of B (or L) from sparsely sampled data can be seen as an inpainting or matrix completion problem. However, due to the Poisson likelihood in (3) and the reduced number of detected photons considered in this work, inpainting methods relying on Gaussian noise assumption (e.g. [28,29]) might not be well adapted here and more specific algorithms (e.g. [30,31]) may be required.
, is the (discretised) integral of the intrumental response associated with an object with unitary reflectivity and located at t i j , from the detector. In this work, we assume that , i.e. that the integral of the instrumental response does not depend on the distance of the target. This is typically the case in practice when the admissible set of target ranges are far enough from the extreme bins of the recorded histograms, i.e. when the peaks associated with target returns are not truncated. In such cases, equation (11) becomes for each observed pixel , , , , , , , , which does not depend on the distance of the target. We can thus define the following new posterior distribution for L ℓ (conditioned on the previously estimated background levels which is exploited to estimate each L ℓ . In a similar fashion to the background levels, we resort to an adaptive MCMC method to compute the MMAP estimators of the target intensities, conditioned on the estimated background levels, and given by where ĝ ℓ approximates the marginal maximum likelihood estimator of g ℓ . Again, it is worth noting that is log-concave and that L could also be inferred via standard MAP estimation using state-of-the-art convex optimisation techniques (provided that g ℓ is properly tuned). In contrast to the distributions (9) and (13) considered to estimate the background levels and target intensities, is generally highly multimodal and thus cannot be maximised (w.r.t. T) efficiently using convex optimisation tools. Again, we use an efficient MCMC method to estimate simultaneously T (via MMAP estimation) and ò (via marginal maximum likelihood estimation).
The resulting algorithm, which is estimated sequentially B, L and T, together with the associated regularisation parameters is thus fully automated and does not require practitioners to adjust critical parameters. In addition to its computational efficiency and robustness (w.r.t. convergence issues), the proposed method can also be used to provide a posteriori measures of uncertainty associated with each estimation step. The interested reader is invited to consult [26] for examples of the use of such measures.

Results
In this section, we evaluate the performance of the generic algorithm proposed in section 3 for ranging and reflectivity estimation from MSL data using data acquired by the imaging system detailed in section 2.

Without ambient illumination
To investigate the effect of the sampling strategy on the recovered depths and reflectivities, we first consider measurements performed under dark conditions, with a negligible contribution from background ambient illumination. The ground truth depths and reflectivity parameters have been obtained from complete 200×200 pixel scans (also acquired with negligible ambient illumination) for each wavelength and long per-pixel acquisition times (40 ms per pixel and per wavelength, leading to approximately 40 000 photons per pixel on average for each spectral band) to reduce uncertainty caused by the Poisson noise statistics. As discussed in section 2, the optical power for each wavelength has been adjusted in order to detect on average the same number of photons for each wavelength within a given period when imaging the reference target (Spectralon panel). To allow a fair comparison between the two sampling methods, we artificially reduced the per-pixel acquisition time such that for both scenarios, the same number of photons per pixel is detected, on average. In other words, we compare the depth and intensity imaging performance of the two sampling strategies (and associated estimation methods) using X photons, either spread across four wavelengths (higher noise but more wavelengths) or associated with one of the four wavelengths (less noise but a single waveform). Figure 7 illustrates the degradation of the target reflectivity estimation as the average number of detected photons reduces, when considering four full scans (all the pixels are observed for each wavelength). This figure shows that even with almost 1 photon per pixel (i.e. 0.25 photons per wavelength), it is still possible to identify the main regions of the coloured target and to discriminate the target from the backplane. Figure 8 compares the estimated intensities obtained using the full scans and the measurements acquired using the simulated mosaic filter approach. Note that the reference RGB profile is barely distinguishable from the top left image and is thus not included. For high photon counts, both acquisition modes provide detailed coloured images, although the full scan enables finer details to be recovered. This can be explained by the mosaic sampling scene only acquiring 25% of the pixels for each wavelength. The missing information is only partially compensated by the spatial (TV) regularisation used when estimating the target reflectivities. As the number of detected photons reduces, both methods tend to provide similar results as the amount of uncertainty induced by the lack of photons in the observed pixels becomes more significant than the uncertainty, due to the amount of pixels that are not observed.
The performance of the two imaging strategies in terms of reflectivity estimation are compared quantitatively using the reflectivity absolute error (RAE) defined by where l i j ℓ , , (resp. l i j ℓ , , ) is the actual (resp. estimated) target reflectivity in the ℓth band of the pixel ( ) i j , . The top subplot of figure 9 compares the mean RAEs (±standard deviation) obtained with the two methods as the average number of detected photons decreases. As observed in figures 7 and 8, the performance of both methods degrades (higher mean RAEs and larger variances) as the data quality degrades. Yet, with almost 1 photon per pixel, the mean RAE remains below 0.1 for both methods. Note that for longer acquisition times (i.e. higher photon counts), the full-scan approach outperforms the other strategy which suffers from the reduced number of observed pixels (25%) for each wavelength. This difference arises mainly from the relatively small spectral  details of the minifigurine which are more difficult to recover when reducing the number of observed pixels. Generally, fullscan approaches will provide better estimated spectral profiles as they do not rely on interpolation schemes. However, the performance degradation will reduce when the scene is only composed of large spectrally homogeneous regions (compared to the resolution of the array) and when additional sources of uncertainty are no longer negligible (e.g. a reduced number of photons as in the top subplot of figure 9 or presence of ambient illumination as will be seen in section 4.2). Figure 10 compares the estimated range profiles obtained when reducing the number of detected photons or equivalently the per pixel acquisition time. Note that the reference depth profile (constructed from from approximately 4×40 000 photons per pixel on average) is barely distinguishable from the top left image and is thus not included.
This figure illustrates the general degradation of the ranging performance as the data quality degrades, especially at the boundaries between the Lego minifigure and the backplane.
The ranging performance of the sampling strategies is quantitatively measured using the depth AE (DAE) defined by where t i j , (resp.t i j , ) is the actual (resp. estimated) range associated with the target in the pixel ( ) i j , . To assess the ranging performance of the two methods, we compare the cumulative density functions (CDFs) of the DAE, which are less sensitive than the mean and standard deviations to large range errors, as can occur at the boundary between the target and the backplane due the relative large range difference between the two objects. The DAE CDFs obtained using the two methods are shown in figure 11. With more than 1100 photons per pixel on average, the DAEs obtained by the two methods is smaller than 1 mm for more than 95% of the pixels. In contrast to the estimated intensities, the range profiles obtained by the two methods with large photon counts are relatively similar (although the full-scan approach provides slightly better results). This can be explained if a single band is observed in each pixel, the overall number of detected photons in each pixel is generally large enough to provide sufficiently accurate range estimates. The full-scan approach is generally more robust since all the wavelengths are observed in each pixel. Using a single wavelength per pixel generally increases the probability of weak target returns (in particular if the target presents a low reflectivity for this wavelength) while the full-scan approach only requires at least one of the L reflectivity parameters to be high in order to obtain an accurate depth estimate. Moreover, the ranging performance also depends on the shape of the instrumental Figure 9. Mean RAEs obtained using full scans (red) and the mosaic filter approach (blue) as a function of the average 'useful' per-pixel photon counts (photons originally emitted by the laser source, and not events associated with background). The error bars represent the±one standard deviation intervals. The top graph corresponds to RAE obtained as a function of average photon count per pixel without ambient illumination, and the bottom graph with ambient illumination. Figure 10. Depth profiles (in mm) estimated from the full scans (top) and the mosaic filter approach (bottom) as a function of the average 'useful' per-pixel photon counts (photons originally emitted by the laser source, and not events associated with background). The reference range is arbitrarily set to be the closest target range within the field of view (i.e, the closest point on the boxerʼs gloves). Figure 11. Empirical cumulative density functions (cdfs) of the DAE obtained using full scans (red curves) and the mosaic filter approach (blue curves) without ambient illumination. In the title of each subfigure, the first value corresponds to the average per-pixel photon count obtained with the full scan, and the second refers to average per-pixel photon count using the mosaic filter approach. responses considered. As can be seen in figure 4, the peak of the IRF at 473 nm is broader than that at 640 nm. If a single wavelength per pixel is used, the ranging performance is thus better at 473 nm (assuming the target presents the same reflectivity at both wavelengths). Consequently, spreading the energy in each pixel across multiple wavelengths generally improves the ranging performance.
For completeness, figure 12 depicts examples of RGBdepth representation of the reconstructed target using the two sampling strategies compared in this work.

With ambient illumination
In this section, we investigate the impact of significant ambient illumination on the performance of the two sampling strategies. The experiments discussed in the previous section have been repeated with ambient illumination provided by a standard 25 W bulb table-lamp, while keeping all the other experimental parameters constant (optical power, acquisition time). Figure 13 compares estimated RGB profiles obtained via the two sampling strategies, with and without ambient illumination. This figure shows that the reflectivity estimation is not significantly affected by the ambient illumination, thus illustrating the reliability of the proposed algorithm for realworld ranging applications, such as daylight field trials in outdoor environments. The right-hand side column of figure 13 depicts the estimated ambient illumination levels at 473 nm, which are in good agreement with the experimental observation conditions (the external lamp was located on the left-hand side of the field of view as shown in figure 13). Note that since no spectral filter was used, the estimated background intensity profiles correspond to integrated intensities of broad-band photons (wide spectrum bulb lamp) and that the background levels estimated for the different scans (i.e. for different illumination wavelengths) are similar. However, if filters were used, it would be possible to reconstruct different background profiles, associated with different parts of the background illumination spectrum.
Figures 9 (bottom) and 14 compare quantitatively the reflectivity estimation and ranging performance associated with the two sampling strategies in the presence of ambient illumination. In terms of intensity estimation, figure 9 shows that the difference between the two sampling strategies   Empirical cumulative density functions (cdfs) of the DAE obtained using full scans (red curves) and the mosaic filter approaches (blue curves) with ambient illumination. In the title of each sub-figure, the first value corresponds to the average per-pixel photon count obtained with the full scan and the second value corresponds to the average per-pixel photon count obtained using the mosaic filter approach. reduces, especially for high photon counts. This can be explained by the uncertainty induced by the presence of additional sources that becomes more significant than the uncertainty induced by the degradation of the spatial resolution. Similarly, the difference in terms of ranging performance reduces, although the full-scan approach still provides more reliable range estimates for the reasons mentioned above. With less than 10 useful photons per pixel on average, the two methods provide very similar results, thus making future mosaic filters particularly attractive for fast imaging while maintaining good ranging performance and reliable reflectivity estimates.

Conclusion
In this paper, we have compared the ranging and colour estimation performance of two sampling strategies for the analysis of 3D scenes using a four-wavelength MSL system, namely a full scan approach and a mosaic filter approach. The results obtained show that the lack of spectral information induced by the acquisition of a single wavelength per pixel can be tempered to some extent by incorporating spatial correlation (through spatial regularisation) when recovering the depth and reflectivity profiles. In particular, the two strategies present similar performance when the overall number of detected photons reduces (or equivalently a reduction in overall acquisition time). For longer acquisition times, the mosaic filter approach generally performs slightly worse than the full scan approach using the TV-based spatial regularisations used in this work. However, this observation should be mitigated by the fact the performance of the mosaic filter approach could be further improved using other spatial regularisation approaches. Moreover, the performance degradation of the mosaic filter approach reduces in the presence of ambient illumination sources, which introduce more uncertainty than that associated with the reduced number of observed pixels. Hence, in photon-starved measurements, where the number of detected photons per pixel is less than 1, our results show that that the mosaic filter approach is likely to introduce very similar errors to the full scan approach. As discussed in section 2, due to the design of our simulated multispectral mosaic filter, the recorded data are not corrupted by potential cross-talk between different wavelengths, and this should be considered in later work. Nonetheless, we have demonstrated the potential for the mosaic filter approach for 2D and 3D imaging in the sparse photon regime. Future work includes the consideration of scenarios where multiple surfaces can be visible in a given pixel (e.g. as may arise for longer range applications that include scenes with foliage, transparent media, camouflage, etc), which would lead to multiple peaks in the recorded histograms.