Comparing Multiple Extended Object Tracking with Point Based Multi Object Tracking for LiDAR in a Maritime Context

In this paper, we compare a recently developed multiple extended object tracking method with a point-based tracker for maritime applications by evaluating the methods on both real and simulated LiDAR data. Being able to track other vessels is a key part of maritime situational awareness, and multi-object tracking is well suited to this task. Traditionally, multi-object tracking uses the point approximation to simplify the target tracking problem, meaning that an object is assumed to generate only a single measurement. With recent advances in sensor technology, this approximation is no longer valid. Multiple extended object tracking instead makes use of methods that estimate the extent of a target using all available measurements. However, the target models used in extended object tracking are by necessity more complex. We compare these two approaches on real and simulated LiDAR data, which was gathered by tracking smaller vessels. We find that the extended object method performs better on simulated data and on experimental data with comparatively little wake clutter. However, on experimental data with more wake clutter, the point-based method outperforms the extended object method since it is less sensitive to the disturbance caused by wake clutter. We conclude that multiple extended object tracking methods have the potential to be more accurate for LiDAR data, but since the models are more complex, measurement sources need to be modeled correctly.


Introduction
Situational awareness is a key property of an autonomous system since autonomous operation requires an understanding of the surrounding environment, of which other vessels are a key component. To estimate the state of other vessels based on a set of measurements, a key challenge is to determine which vessel generated which measurement. Handling this uncertainty in measurement origin is the key challenge in multi-object tracking [1]. With recent advances in sensor technology, such as LiDAR and radars with higher resolution, a single target can generate several measurements. This enables algorithms that can determine the size and shape of a vessel, known as the target extent. This is of particular importance in constrained environments, such as inland shipping.
However, multi-object tracking has traditionally used the point approximation to simplify the target tracking problem. It consists of two assumptions, the first being that an object is assumed to generate only a single point measurement and the second that each measurement is assumed to be generated by only a single object. This approximation does not allow an estimation of the target extent. In addition, in order not to violate the first assumption when using high-resolution sensors, a clustering method is needed such that several measurements are combined to form a single target detection. This could introduce further errors by violating the second assumption [2]. Multiple extended object tracking instead makes use of extended object tracking methods, where the extent of a target is estimated using all available measurements, enabling estimation of the extent of a target in addition to the kinematic states. However, this makes the target models more complex [3]. There are several different target models to choose from, but one model, the Gaussian process (GP) model, is particularly suited to LiDAR data since it can model measurements that originate from the contour of a target. It is also able to model complex shapes [4].
Both regular multi-object tracking and multiple extended object tracking methods have been used in a maritime context [5,6,7], but they have not been compared directly. In this paper, we compare the two approaches in a scenario where extended object tracking is relevant, using LiDAR data. The primary research question is whether the tracking performance can be improved by explicitly modeling the target extent, instead of simplifying the data by using clustering. For comparison, we use a newly developed Poisson Multi Bernoulli Mixture (PMBM) tracker based on the Gaussian process target model [5], which can estimate an arbitrary ship shape, and the Joint Integrated Probabilistic Data Association (JIPDA), which has been used in several other works for tracking ships [6,7]. To compare the ability of both methods to estimate the extent, we equip the JIPDA with a simple method for estimating the target extent, based on the measurement cluster associated with a target.

Background
We first give a brief overview of multi-object tracking and the relation between the JIPDA and the PMBM filter. Then we give a brief overview of extended object tracking.

Multi-object tracking theory
Multi-object tracking methods are generally probabilistic, and the joint probabilistic data association (JPDA) is arguably the simplest example. The JIPDA adds the ability to model the existence probability of a target along with the target state. Both of these have been heavily used in the multi-object tracking community [1,6,7].
The current theory of multi-object tracking is based on Random Finite Sets. The state-of-the-art is the PMBM filter, which is the direct solution of a multi-object Bayes filter [8]. It tracks both the set of undetected targets via a Poisson point process (PPP) and the set of detected targets via a multi-Bernoulli mixture (MBM). The multi-Bernoulli mixture can represent multiple data association hypotheses through the weights of the components of the mixture. The PMBM framework has also been used in extended object tracking to derive a multiple extended object filter [9].
It has been shown that the JIPDA is a form of the PMBM provided certain assumptions are made [6]. The first assumption is that the Poisson component representing unknown targets is considered stationary in the JIPDA. The second is that the JIPDA performs mixture reduction to reduce the number of data association hypotheses to a single hypothesis, which is a weighted representation of all the feasible data association hypotheses. The JIPDA can therefore be considered a Poisson multi-Bernoulli (PMB) filter. In contrast, the PMBM filter can theoretically maintain and track all possible hypotheses. However, in practice mixture components with low weight are pruned, such that only the most likely data association hypotheses are retained to limit the exponential growth of hypotheses.
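The mixture reduction step described above can be illustrated with Gaussian moment matching, which collapses a weighted set of hypothesis-conditioned estimates into one Gaussian. The sketch below is a generic illustration, assuming Gaussian track densities; the function name `moment_match` and its arguments are hypothetical and are not taken from the JIPDA implementation in [6].

```python
import numpy as np

def moment_match(weights, means, covs):
    """Collapse a Gaussian mixture into a single Gaussian with the same
    mean and covariance (illustrative helper, not the code from [6])."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalise the association weights
    means = np.asarray(means, dtype=float)   # shape (n, d)
    covs = np.asarray(covs, dtype=float)     # shape (n, d, d)
    mean = np.einsum("i,ij->j", w, means)
    diffs = means - mean
    # Spread-of-means term: accounts for disagreement between hypotheses.
    spread = np.einsum("i,ij,ik->jk", w, diffs, diffs)
    cov = np.einsum("i,ijk->jk", w, covs) + spread
    return mean, cov

# Two equally likely hypotheses for a one-dimensional state.
merged_mean, merged_cov = moment_match(
    [0.5, 0.5], [[0.0], [2.0]], [[[1.0]], [[1.0]]]
)
```

The merged covariance is larger than either component covariance whenever the hypotheses disagree, which is how a single-hypothesis filter carries association uncertainty forward.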

Extended Object Tracking
Extended object tracking methods allow targets to generate more than one measurement, which makes it possible to estimate the extent of the target in addition to the kinematic parameters. This requires a model for the extent and how it relates to the measurements. An example of such a model is the GP model, which defines the extent as an arbitrary closed shape. This is achieved by defining a radial function for the shape, where the radii are modeled by the GP. This model was first presented in [4], where a state space model was defined with the following components: $x^c_k$ is the position of the centroid of the target from which the extent is defined, $\phi_k$ is the target heading, and $x^*_k$ represents any additional kinematic states of the target. In the original paper, these are the velocity in each direction in 2D and the angular velocity $\dot{\phi}_k$. $x^f$ is the representation of the extent of the target; in the GP model, this is a vector that specifies the radius of the shape at a set of equidistant angles. We use this same state space model in this paper.
For this state space, we can define a measurement model in which $z_k$, $h_k(x_k)$ and $R_k$ are all augmented vectors or matrices with the size corresponding to the number of measurements generated by one scan of the target. For a single measurement, the measurement model is expressed in terms of $z^l_k$, the measurement $l$ at time $k$, $\theta^l_k$, the corresponding angle of the origin of the measurement on the target contour, and $f$, a function describing the radius for each angle. In the GP model, the specific form of this function is given by Gaussian process regression. $\theta^l_k$ can be expressed both in a global frame, $\theta^l_k(G)$, and in the local target frame, $\theta^l_k(L)$. The detailed derivations are available in the original paper [4], but it should be noted that the measurement model has two important properties. It has a non-linear dependence on $x^c_k$ and $\phi_k$, and it is an implicit equation due to the dependence of $\theta^l_k$ on the measurement $z^l_k$. This implies a need for non-linear filtering techniques.
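For reference, the model described in this section can be written out as follows. This is a sketch reconstructed from the definitions in the text and from the usual form of the GP model; the orientation vector $p(\cdot)$ and the noise terms $e_k$, $e^l_k$ are notation assumed here, and [4] remains the authoritative source.

```latex
% State vector: centroid, heading, additional kinematic states, extent
x_k = \begin{bmatrix} (x^c_k)^\top & \phi_k & (x^*_k)^\top & (x^f)^\top \end{bmatrix}^\top

% Augmented measurement model for one scan of the target
z_k = h_k(x_k) + e_k, \qquad e_k \sim \mathcal{N}(0, R_k)

% Single measurement l at time k (assumed form, following [4])
z^l_k = x^c_k + p\big(\theta^l_k(G)\big)\, f\big(\theta^l_k(L)\big) + e^l_k,
\qquad p(\theta) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}

% Angle of the measurement origin in the global and local frames
\theta^l_k(G) = \angle\big(z^l_k - x^c_k\big), \qquad
\theta^l_k(L) = \theta^l_k(G) - \phi_k
```

Written this way, both properties noted above are visible: $x^c_k$ and $\phi_k$ enter through the angle and the radial function, and $z^l_k$ appears on both sides of the single-measurement equation.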

Extended Object PMBM
The extended object tracker used is a PMBM tracker using a GP target model (henceforth referred to as the GP-PMBM tracker) and it was previously presented in [5]. The state space of a target consists of the kinematic state and the extent state, provided by the GP model. Furthermore, the state space also contains parameters of a gamma distribution, which models the Poisson rate of the expected number of measurements for each target. To handle the nonlinearity of the GP model, an iterated extended Kalman filter (IEKF) is used to improve the linearization, and a special criterion is used to initialize the optimization to make it more robust. The birth density is defined using a mixture representation of a PPP intensity, where the mixture components are spread along the edge of a circle to provide full coverage of the surveillance area. Gating is used to reduce the number of possible measurement-to-target associations and the stochastic optimization method [10] is used to find the most likely subset of those associations.
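To make the role of the IEKF concrete, the following is a minimal sketch of a generic iterated EKF measurement update in its Gauss-Newton form. It is not the GP-PMBM implementation from [5]: the initialization criterion mentioned above is not reproduced, and the names `iekf_update`, `h`, `H_jac` and `tol` are hypothetical. The default `max_iter=50` mirrors the iteration limit used in this paper.

```python
import numpy as np

def iekf_update(x_prior, P_prior, z, h, H_jac, R, max_iter=50, tol=1e-8):
    """Generic iterated EKF update (illustrative sketch).

    h(x) is the measurement function and H_jac(x) its Jacobian; both are
    supplied by the caller. The measurement is re-linearised around the
    current iterate until the state stops changing.
    """
    def gain(x):
        H = H_jac(x)
        S = H @ P_prior @ H.T + R                 # innovation covariance
        K = np.linalg.solve(S, H @ P_prior).T     # P H^T S^{-1} (S, P symmetric)
        return H, K

    x = np.array(x_prior, dtype=float)
    for _ in range(max_iter):
        H, K = gain(x)
        # Gauss-Newton step expressed relative to the prior mean.
        x_new = x_prior + K @ (z - h(x) - H @ (x_prior - x))
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    H, K = gain(x)
    P = (np.eye(len(x)) - K @ H) @ P_prior
    return x, P

# Linear sanity check: for linear h the IEKF reduces to the Kalman update.
H_lin = np.array([[1.0, 0.0]])
x_post, P_post = iekf_update(
    np.zeros(2), np.eye(2), np.array([2.0]),
    lambda x: H_lin @ x, lambda x: H_lin, np.array([[1.0]]),
)
```

Each pass through the loop costs roughly one Kalman update, which is why the iteration count directly affects run time.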

JIPDA
The JIPDA implementation used is the same as the one that was used in [6]. This implementation also includes an interacting multiple-model component and a visibility component. However, neither of these components is used in order to make the comparison fair, since the extended object tracker does not have these components. The motion model used is a constant velocity (CV) model with low process noise. Similarly to what is done in the GP-PMBM, gating is used for all current targets as a first step to reduce the number of data association hypotheses.
After that, tracks that share measurements are clustered and processed together. If the number of measurements or targets is low, the hypotheses are enumerated manually. If the number is higher, Murty's algorithm is used to find the most likely subset of associations. To initialize new tracks, measurements that are not gated are used as the initial point of a new track. To adapt the JIPDA to high-resolution LiDAR data, Euclidean clustering is used as a pre-processing step to merge measurements that are close to each other, so that the assumption that a target generates a single measurement can be enforced. The clustering is done so that the distance between each measurement in a cluster is no more than 5 meters, and uses the fclusterdata method from the SciPy library [11]. To generate an extent estimate, we keep track of the most likely measurement cluster for each target, i.e., the measurement associated with the highest weight prior to mixture reduction for each timestep. Then, as a post-processing step, we generate a convex hull from the measurements in that cluster, using the MATLAB function convhull [12]. This convex hull is then used as an estimate of the extent of the target.
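The clustering and extent steps above can be sketched in a few lines. The sketch makes several assumptions that are not stated in the text: complete linkage is chosen so that no two points in a cluster are more than 5 m apart, the cluster centroid is taken as the merged point measurement, and `scipy.spatial.ConvexHull` stands in for the MATLAB `convhull` call so that the example stays in one language. The helper names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata
from scipy.spatial import ConvexHull

def cluster_measurements(points, max_dist=5.0):
    """Label 2D LiDAR returns so that points within a cluster are at most
    max_dist apart (complete linkage is an assumption of this sketch)."""
    points = np.asarray(points, dtype=float)
    if len(points) < 2:
        return np.ones(len(points), dtype=int)   # nothing to merge
    return fclusterdata(points, t=max_dist, criterion="distance",
                        method="complete", metric="euclidean")

def cluster_centroids(points, labels):
    """One merged point measurement per cluster (assumed to be the mean)."""
    return {int(lab): points[labels == lab].mean(axis=0)
            for lab in np.unique(labels)}

def hull_extent(cluster_points):
    """Convex hull vertices and enclosed area of one measurement cluster.
    Needs at least three non-collinear points."""
    hull = ConvexHull(cluster_points)
    return cluster_points[hull.vertices], hull.volume  # in 2D, volume is the area

# A small boat-sized cluster and a second, distant cluster.
points = np.array([[0, 0], [2, 0], [2, 1], [0, 1], [1, 0.5],
                   [50, 50], [51, 50], [51, 51]], dtype=float)
labels = cluster_measurements(points)
centroids = cluster_centroids(points, labels)
vertices, area = hull_extent(points[labels == labels[0]])
```

Note that with a 5 m limit on the within-cluster distance, a vessel longer than 5 m can be split into several clusters, so the threshold interacts with the target size.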

Simulation Study
In this section, we present the results from a Monte Carlo simulation study comparing the performance of the two methods.

Simulation Scenario
The scenario consists of four ships approaching from the edge of the surveillance area and traversing it to the opposite end while performing a turning maneuver. See Fig. 1 for a detailed view. The scenario lasts for 250 timesteps and vessels appear two at a time at the edge of the surveillance area at timesteps 20 and 40. The target vessels are 6 m long and 3 m wide, and they have a pointed bow where the full width is achieved 2 m behind it. The measurements are generated by simulating a LiDAR with a maximum range of 100 m, an angular resolution of 0.25° and a modeled radial accuracy of 0.1 m. Measurements are only generated if they hit a vessel and only one measurement is generated per angle. This simulates occlusion since a ship that is behind another ship from the perspective of the sensor will not generate any measurements. In addition, clutter is generated using a PPP with rate $\lambda_c = 20$ and a uniform spatial distribution.
The results are averaged over 100 Monte Carlo simulation runs.
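As an illustration of the clutter model, the sketch below draws one scan of clutter from a Poisson point process with rate $\lambda_c = 20$. It assumes that the surveillance area is a disc whose radius equals the 100 m sensor range and that "uniform" means uniform over that area; neither detail is stated explicitly, and the function name `sample_clutter` is hypothetical.

```python
import numpy as np

def sample_clutter(rng, rate=20.0, max_range=100.0):
    """Draw one scan of uniformly distributed clutter points on a disc."""
    n = rng.poisson(rate)                          # number of clutter points this scan
    # sqrt of a uniform variate gives a radius that is uniform over the area.
    r = max_range * np.sqrt(rng.uniform(size=n))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

rng = np.random.default_rng(0)
clutter_scans = [sample_clutter(rng) for _ in range(2000)]
```

Sampling the radius directly from a uniform distribution would instead concentrate clutter near the sensor, which is a common mistake when simulating range-bearing sensors.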

Parameters
The PMBM parameters are chosen as follows: probability of detection $P_D = 0.9$, probability of survival $P_S = 0.99$, and clutter rate $\lambda_c = 20$. The gating probability, which determines the size of the gate used to reduce the number of measurement-to-target associations, is set to $P_G = 0.99$. The pruning parameters are 0.01 for the existence probability, 0.01 for PPP mixture components, and 0.01 for multi-Bernoulli mixture components. A target is assumed to exist if the existence probability is higher than $r = 0.5$. Both target models use $\sigma_c = 0.2$ m as the noise parameter for the CV model, and the GP model uses $\sigma_\phi = 0.1$ rad/s as noise for the constant angular velocity model. Furthermore, the GP model uses $\sigma_r = 0.3$ m for the measurement noise.
The window length used in the gamma prediction step is 20. For the GP target model, we use 9 test angles to parametrize the extent and the hyperparameters are $\sigma_f = 0.5$ m, $\sigma_r = 0.5$ m, $\sigma_n = 0.001$ m, $l = \pi/4$ and the forgetting factor $\alpha = 0.01$. The maximum number of IEKF iterations is 50. The prior values of the gamma distribution are $\alpha_0 = 1000$ and $\beta_0 = 100$. The covariances of the birth density are inflated to ensure coverage of the whole circle: the positional component is 20 m, the velocity components are 3 m/s, and for the GP model the heading component is $\pi$ and the angular velocity is $\pi/4$ rad/s. In the case of the extent, for the GP model the prior is given by the covariance function. For the JIPDA, the measurement noise strength is $\sigma_r = 3$ m and the constant birth intensity is $10^{-6}$ m$^{-2}$. The initial covariance of the velocity is 10 m/s. The thresholds for confirming a new track and terminating an old one are 0.999 and 0.001, respectively.
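To show how the hyperparameters define the extent prior, the sketch below evaluates a periodic squared-exponential covariance function at 9 equidistant test angles. The kernel form is assumed to be the one commonly associated with the GP model of [4]; the exact kernel used in [5], the placement of the test angles, and the name `periodic_kernel` are assumptions of this example.

```python
import numpy as np

def periodic_kernel(theta_a, theta_b, sigma_f=0.5, sigma_r=0.5, length=np.pi / 4):
    """Periodic squared-exponential covariance between two sets of angles
    (assumed form; sigma_r models uncertainty in the mean radius)."""
    d = np.subtract.outer(theta_a, theta_b)       # pairwise angle differences
    return sigma_f**2 * np.exp(-2.0 * np.sin(d / 2.0)**2 / length**2) + sigma_r**2

# 9 equidistant test angles on [0, 2*pi), as an assumed parametrization.
test_angles = np.linspace(0.0, 2.0 * np.pi, 9, endpoint=False)
K_prior = periodic_kernel(test_angles, test_angles)  # prior covariance of the radii
```

Because the kernel depends only on $\sin^2$ of half the angle difference, it is periodic with period $2\pi$, so the prior treats the contour as a closed curve.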

Performance Evaluation
To compare the performance of the trackers, the generalized optimal sub-pattern assignment metric (GOSPA) [13] is used. It provides a metric for the performance of a multi-object tracking algorithm by incorporating localization error, missed targets, and false targets into a single metric. However, due to the different state spaces of the different target models, the distance measure is only comparable between the shared states, namely the position and velocity. Therefore, these states are used to calculate the localization error. The GOSPA cut-off $c$ and power $p$ were set to 10 and 2, respectively. To compare the extent estimates of the target models, we use the process of associating estimates to targets to calculate an intersection-over-union (IoU) metric, which is a common metric to compare methods for extent estimation [4,14].
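For readers unfamiliar with GOSPA, the following is a minimal sketch of the metric for the common choice $\alpha = 2$, using the cut-off $c = 10$ and power $p = 2$ from above. It assumes Euclidean distance on whatever state components are passed in, and relies on `scipy.optimize.linear_sum_assignment` for the assignment; the function name `gospa` and the returned decomposition are choices made for this example rather than the evaluation code used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def gospa(truth, estimates, c=10.0, p=2):
    """GOSPA with alpha = 2 for arrays of shape (n, d) and (m, d).

    Returns the total metric, the localization cost (to the power p),
    and the numbers of missed and false targets.
    """
    truth = np.asarray(truth, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    n, m = len(truth), len(estimates)
    loc, n_matched = 0.0, 0
    if n > 0 and m > 0:
        dist = cdist(truth, estimates)
        # Truncating at c makes a far pair cost the same as one miss plus one false.
        rows, cols = linear_sum_assignment(np.minimum(dist, c) ** p)
        matched = dist[rows, cols] < c
        loc = float(np.sum(dist[rows, cols][matched] ** p))
        n_matched = int(matched.sum())
    missed, false = n - n_matched, m - n_matched
    total = (loc + c**p / 2.0 * (missed + false)) ** (1.0 / p)
    return total, loc, missed, false

truth = np.array([[0.0, 0.0], [10.0, 0.0]])
estimates = np.array([[0.0, 1.0], [50.0, 50.0], [100.0, 100.0]])
total, loc, missed, false = gospa(truth, estimates)
```

In this example one target is matched with a 1 m error, one target is missed, and two estimates are false, so the penalty terms dominate the total.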

Results
The metrics are presented in Table 1. We can note that the results are relatively close in terms of the overall GOSPA score, but that the contribution from different sources varies between the two methods. The GP-PMBM tracker shows superior performance with regard to the localization error, and consequently also the IoU metric. The most likely explanation for this is that the clustered measurements are centered on the contour of the ship, which does not correspond to the centroid. The GP-PMBM can account for this disparity due to the assumption that measurements are generated by the object contour, whereas the JIPDA cannot. On the other hand, the JIPDA has an overall better performance in the birth process, with a lower false alarm rate. This indicates that the GP-PMBM tracker struggles in an environment with a larger amount of clutter measurements, especially with the termination of false tracks. Fig. 2 shows the evolution in time. We can see that the JIPDA, in particular, struggles after timestep 150, indicating an inability to cope with the occlusions and maneuvers that occur when the targets meet in the simulation.

Experimental Test Data
In this section, we present results from real LiDAR data gathered from tests in the Trondheim canal, which utilized the two autonomous ferry platforms milliAmpere and milliAmpere2 [15].

Test scenario
We present two separate scenarios, one with a single vessel performing maneuvers in front of the sensor in the canal (see Fig. 3a) and one with two vessels traveling in separate directions in the canal and passing each other (see Fig. 3b). These scenarios were also used in [5]. The data from the first scenario were gathered using milliAmpere2, which is equipped with two Ouster OS1 32 LiDARs. The point clouds from the two LiDARs were combined and the returns from land and static obstacles along the canal were filtered out using manual land masking. The target vessel was a small motorboat that was 2.8 m long and 1.6 m wide. Furthermore, the point cloud was transformed to 2D by only retaining the point closest to the sensor in each angular resolution sector. The second scenario was originally published in [7] (as scenario 13). The target vessels were two different motorized vessels that were both 7 m long and 3 m wide; see the original paper for details. Note that for this scenario the ground truth data gathered was only positional GPS data without heading.
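The 2D transformation described above can be sketched as an azimuth binning step that keeps the nearest return per sector. The sector width `sector_deg`, the use of horizontal range as the measure of closeness, and the function name `project_to_2d` are assumptions of this example; the actual sector width used for the Ouster data is not given here.

```python
import numpy as np

def project_to_2d(points_xyz, sector_deg=1.0):
    """Keep, for every azimuth sector, the return closest to the sensor.

    points_xyz has shape (n, 3) in the sensor frame; the result has
    shape (k, 2) with one point per occupied sector.
    """
    pts = np.asarray(points_xyz, dtype=float).reshape(-1, 3)
    xy = pts[:, :2]
    rng = np.hypot(xy[:, 0], xy[:, 1])            # horizontal range (assumption)
    az = np.arctan2(xy[:, 1], xy[:, 0])           # azimuth in [-pi, pi]
    n_sectors = int(round(360.0 / sector_deg))
    sector = np.floor((az + np.pi) / (2.0 * np.pi) * n_sectors).astype(int) % n_sectors
    # Sort by sector, then by range, and keep the first entry of each sector.
    order = np.lexsort((rng, sector))
    sector_sorted = sector[order]
    first = np.ones(len(order), dtype=bool)
    first[1:] = sector_sorted[1:] != sector_sorted[:-1]
    return xy[order[first]]

deg = np.pi / 180.0
cloud = np.array([
    [5 * np.cos(10.3 * deg), 5 * np.sin(10.3 * deg), 0.2],    # nearest in its sector
    [8 * np.cos(10.6 * deg), 8 * np.sin(10.6 * deg), 1.0],    # same sector, farther
    [3 * np.cos(100.5 * deg), 3 * np.sin(100.5 * deg), 0.0],  # different sector
])
scan_2d = project_to_2d(cloud)
```

Keeping only the nearest return mimics a single-layer scanner, so returns from structure behind the first hit in a sector are discarded.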

Parameters
Most of the parameter values are the same as those in the simulation study. However, for the GP-PMBM tracker, some parameters are modified. The range used to define the birth density is reduced to 40 m and 60 m respectively, due to the observed range at which the LiDARs were able to detect the target vessels. For the second scenario, $\alpha_0$, the initial parameter for the gamma distribution governing the expected number of measurements, is set to 500 to account for the lower sensor resolution. The extent priors are set such that the length and width of the prior are roughly equivalent to the target vessels, but the same prior is used to represent both ships in the second scenario. In addition, some tweaks are made to attempt to mitigate some observed effects that are not modeled. To account for wake clutter, the clutter rate is increased to $\lambda_c = 60$ and $\lambda_c = 100$ for the first and second scenario respectively, while the gating probability $P_G$ is set to 0.95 for the same reason. Finally, to account for errors related to sway affecting the pitch of the LiDAR sensor, the measurement noise strength $\sigma_r$ is set to 0.5 m. The JIPDA uses the same parameters as in the simulation study.

Performance Evaluation
We use the same metrics that were used in the simulation study, with the ground truth data gathered used to calculate the metrics. For the first scenario, ground truth was measured by using a dual antenna inertial navigation system, and the extent of the vessel was measured manually with the LiDAR data to compare the estimated extent with the ground truth. For the second scenario, because only positional data was available, the heading was inferred from the velocity vector, which is a significant source of error for the calculation of the IoU metric.

Results
The first scenario contains only a single vessel, performing some complex maneuvers. The metrics in Table 2 show superior performance for the GP-PMBM in both the GOSPA score and the IoU metric. This is due to the difference in localization error, verifying the simulation results. Looking at the evolution of the run over time in Fig. 4, we can see that the JIPDA is not coping as well with the maneuvers. This causes peaks in the localization error around timesteps 700 and 900, which is when the maneuvers happen. The second scenario is more complex, as it contains two targets, with one target being occluded by the other. The metrics are given in Table 2 and the evolution over time is shown in Fig. 5. Here the result is the opposite and the GP-PMBM has the worse GOSPA score, with a higher localization error. This is primarily due to the disruptive effect of the wake clutter on the GP model, which was highlighted in the original paper presenting the GP-PMBM tracker [5].
The JIPDA is less sensitive to wake clutter because the wake measurements are masked by the use of clustering. Judging by the peak around timestep 1200, it is nevertheless affected. Moreover, the relatively low value of the IoU metric for both methods compared to the other cases shows that the JIPDA's superior performance is due more to the failure of the GP-PMBM method, and that both methods have potential for improvement.

Computational time
Since the algorithms are implemented in two different programming languages, MATLAB for the GP-PMBM tracker and Python for the JIPDA, a direct comparison between the computational time of the two algorithms is not possible. Nevertheless, the JIPDA was one to two orders of magnitude faster than the GP-PMBM. For the simulated data, it was approximately 10 times faster and for the experimental data, it was 100 times faster. These numbers are as expected and there are many reasons for this. One reason is the use of clustering in the JIPDA, which reduces the number of measurements processed by the filter, in some cases significantly. Another reason is the use of the Gauss-Newton optimization in the GP-PMBM, which in effect means that the Kalman filter update step will be run several times. An additional factor that influences computational time is the number of association hypotheses that are found and retained in the PMBM filter, which in this case depends on the parameters of the stochastic optimization algorithm and the pruning parameters. However, if computational time is an issue, both algorithms could probably be significantly sped up. For instance, multi-object tracking is a highly parallel problem, and parallelization could be utilized to speed up both methods.

Conclusion
This paper has presented a comparison between a point-based method (JIPDA) and an extended object method (GP-PMBM) on maritime data. The results show that the GP-PMBM method has superior performance on simulated data and some real data, as shown by the IoU metric and the GOSPA score. This shows that there is a gain in performance to be had by utilizing extended object tracking methods. However, on real data with a lot of wake clutter and poorer resolution, this gain disappears and the JIPDA shows superior performance. This is because the JIPDA is not as affected by the clutter, since the clustering can partially mask it. However, the relatively low value of the IoU metric for both methods shows that neither method handles the wake clutter well. In conclusion, the extended object tracking method has the potential to provide more accurate tracking of other vessels, if all the measurement sources can be modeled correctly. If the sensors pick up measurements from sources that are not properly modeled, such as wake clutter, a point-based method along with clustering is more robust. Therefore, in a situational awareness system for an autonomous inland vessel, a point-based method might be preferable until all measurement sources can be reliably modeled. An avenue for further research is therefore to look at improving both methods such that they are better able to deal with wake clutter, particularly for the GP-PMBM method, since it seems to

Figure 1: Visualization of the simulation scenario, along with the extent and measurements for a single timestep

Figure 2: The evolution throughout the simulation run for selected metrics as an average over the Monte Carlo runs.

Figure 3: Visualization of the test scenarios, along with the extent and measurements for three different timesteps. Note the measurements generated by the wake, as well as the occlusion in the second scenario.

Figure 4: The evolution throughout the single target test run for selected metrics.

Figure 5: The evolution throughout the multi-target test run for selected metrics.

Table 1: Mean value of metrics for the simulated scenario

Table 2: Mean value of metrics for the real LiDAR data