A multi-sensor indoor tracking system for autonomous marine model-scale vehicles

Attitude estimation is a popular topic in marine engineering and robotics; the position and orientation of a vehicle are required as feedback by several control algorithms to improve autonomous navigation capabilities, such as dynamic positioning, track keeping, and autodocking. Typically, position and heading angles are provided by the Global Positioning System (GPS) and a compass. During development and testing, experiments are usually performed in a controlled environment, such as an indoor test tank. However, GPS can be unreliable there due to non-negligible model-scale errors or the absence of line-of-sight with the satellites. This article presents an experimental tracking system setup suitable for indoor testing facilities. In particular, the paper presents a tracking system based on a GigE camera and ArUco marker detection and a LiDAR-based tracking system relying on unsupervised machine learning techniques. The MQTT broker-based publish/subscribe message-queuing protocol allows real-time data communication and sharing. The proposed system was developed, installed, and tested in the COMPASS laboratory (University of Genoa). The outcomes of the two tracking systems have been compared. Finally, an accuracy analysis was performed by comparing the results to the ground truth in purpose-built experiments. The proposed approach can estimate the degrees of freedom of a self-propelled model-scale vessel in an indoor testing facility without requiring active or powered markers and share the acquired information with multiple entities in real time at a high frame rate.


Introduction
Developing, testing, and validating guidance and control algorithms is a significant topic in marine engineering and marine robotics. Dynamic positioning [1], track keeping [2], target chasing [3], and auto docking/undocking [4] are challenging tasks whose achievement increases safety and vessel capabilities. Such algorithms, generally implemented at the guidance and control module level, require as feedback an effective attitude estimation from the navigation module, which has to provide a reliable evaluation of the vessel's degrees of freedom (DOF). In full-scale applications, position and heading are provided by the Global Positioning System (GPS), Inertial Measurement Units (IMU), and a gyrocompass. However, before moving to full-scale sea trials, developing and testing an algorithm to solve the maneuvering problem requires model-scale indoor testing, where the traditional sensor suite is unusable [5]. The need to provide guidance and control modules with a surrogate for navigation feedback data in a controlled environment for model-scale testing drove the conception, development, and installation of the proposed system. Moreover, using a multi-sensor system allows the performance of each sensor to be evaluated. Finally, an external DOF estimation system limits the number of sensors needed onboard and the associated power supplies, since some model-scale vessels may have limited payload [6].
Figure 1 shows the general pipeline of the proposed multi-sensor indoor tracking system. In particular, the sensing layer is entrusted to a 3D spinning LiDAR and a monochromatic GigE camera, scaling down to model scale a possible coastal setup consisting of a land radar and GPS. The data from each sensor are processed independently for detection, combining unsupervised learning for LiDAR data processing and supervised learning for image processing, the latter relying on suitable fiducial markers [7]. The position and heading angle of the model-scale target are estimated, and the output is made available via the MQTT broker-based publish/subscribe message-queuing protocol, achieving real-time data sharing with all possible subscribers (e.g., guidance and control algorithms). The following sections describe the two pipeline branches (camera and LiDAR detection), the experimental setup, and the data-sharing system. The results provided by the system during two maneuvers of increasing difficulty, performed by remotely controlling a model-scale vessel, are then shown. Moreover, a purpose-built experiment comparing the system outcome to the ground truth values is illustrated, allowing the position estimation errors of the system's two branches to be assessed.

Camera detection
This section presents the approach to camera data processing. In particular, the whole process is achieved using OpenCV libraries [8] and the original ArUco marker series [9]. Placing two ArUco markers on the model-scale vessel allows for a straightforward detection process; then, the position of the detected markers is estimated relying on camera calibration parameters and the pin-hole camera model, obtaining the position and the heading angle of the model-scale vessel in the time domain.

Pin-hole camera model and calibration
Mapping 3D points onto a 2D image plane, and the reverse operation, is widely documented in the literature. Several camera models are available. For the case study, the pin-hole camera model has been adopted [10]. Model calibration using a dedicated checkerboard based on ArUco markers (known as a ChArUco board) has been carried out to obtain the camera's intrinsic matrix (or camera matrix). However, the camera matrix obtained directly from such a model does not account for lens distortion. For this reason, the performed calibration also includes the radial and tangential distortion [11]. The calibration process thus yields both the intrinsic parameters and the distortion coefficients.
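As an illustration of how the intrinsic parameters and distortion coefficients act together, the following minimal sketch projects a point expressed in the camera frame to pixel coordinates. It assumes the radial/tangential model in its common five-coefficient form (k1, k2, p1, p2, k3); the function name and the numerical values in the usage lines are hypothetical and are not the calibration results of the case study.

```python
def project_point(point_cam, fx, fy, cx, cy, dist=(0.0, 0.0, 0.0, 0.0, 0.0)):
    """Project a 3D point in the camera frame to pixel coordinates (u, v).

    fx, fy, cx, cy are the entries of the intrinsic (camera) matrix.
    dist = (k1, k2, p1, p2, k3): radial (k) and tangential (p) coefficients.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")

    # Ideal pin-hole projection onto the normalised image plane.
    xn, yn = x / z, y / z

    # Radial and tangential distortion applied to the normalised coordinates.
    k1, k2, p1, p2, k3 = dist
    r2 = xn * xn + yn * yn
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = xn * radial + 2.0 * p1 * xn * yn + p2 * (r2 + 2.0 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2.0 * yn * yn) + 2.0 * p2 * xn * yn

    # Intrinsic matrix: scale by the focal lengths, shift by the principal point.
    return fx * xd + cx, fy * yd + cy


# Hypothetical intrinsics for a 1920 x 1200 image (illustrative values only).
ideal = project_point((0.5, 0.0, 5.0), 1000.0, 1000.0, 960.0, 600.0)
barrel = project_point((0.5, 0.0, 5.0), 1000.0, 1000.0, 960.0, 600.0,
                       dist=(-0.1, 0.0, 0.0, 0.0, 0.0))
print(ideal, barrel)
```

With a negative k1 (barrel distortion), the projected point moves towards the image centre, which is the effect the calibration has to compensate for.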

2D to 3D: ArUco marker detection and pose estimation
An ArUco detector algorithm is used to detect the markers placed on the model-scale vessel. In particular, the dedicated OpenCV function makes marker corners and IDs available. Then, exploiting the calibration parameters, a pose estimation of the marker is performed; for each marker, a rotation matrix and a translation vector are computed, allowing for the passage from the 2D image plane to the 3D reference system of the camera.

3D to 3D: pose estimation in world reference frame
The points in the camera reference system need to be referred to the world reference system (or test-tank RS) to make the output suitable for further applications. Since the installation circumstances do not allow a precise estimation of the position and pose angles of the camera, the pose computation problem is solved by relying on the Perspective-n-Point (PnP) procedure [12]. Markers at known positions in the world reference system have been framed by the camera so that both the world coordinates and the pixel coordinates of a set of points are available. Then, the PnP computation procedure has been used to obtain the rotation matrix $R_{wc}$ and translation vector $t_{wc}$ that relate the points in the world system $X_w$ to the same points expressed in the camera reference system $X_c$. In particular:
$$ X_c = R_{wc} X_w + t_{wc} $$
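Since the tracking output is needed in the world reference system, the relation has to be applied in the reverse direction. Assuming it takes the usual PnP form, in which $R_{wc}$ and $t_{wc}$ map world coordinates to camera coordinates, and recalling that the inverse of a rotation matrix is its transpose, the world coordinates of a point measured in the camera frame follow as:

```latex
X_c = R_{wc} X_w + t_{wc}
\quad\Longrightarrow\quad
X_w = R_{wc}^{\top} \left( X_c - t_{wc} \right)
```

In this reading, the camera position in the world frame is obtained by setting $X_c = 0$, i.e., $-R_{wc}^{\top} t_{wc}$.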

Real-time ID tracking
A well-known problem in the literature regarding tracking algorithms is the track ID association. This step is crucial for multi-target applications to robustly track a target and eventually predict its motion based on the time history thus reconstructed.
As far as the case study is concerned, the presence of the ArUco markers avoids the need for a sophisticated tracking algorithm and thus reduces the computational load. ArUco marker detection associates each marker with the corresponding ID of the series it belongs to. It is, therefore, straightforward to associate the target with the track as long as different markers are placed on different objects.

LiDAR detection
LiDARs are becoming popular in various fields of application, often employed as perceptual systems for situational awareness of autonomous vehicles or environmental surveillance purposes. For the case study, the sensor is used as a fixed acquisition system to reconstruct the test tank scenario and detect targets within the point cloud. In particular, the real-time parsed point cloud is analyzed using an unsupervised machine learning clustering technique to obtain the position of the target objects, i.e., the model-scale vessel.

LiDAR pose estimation
The point cloud provided by the LiDAR is expressed in the sensor reference system. Once the sensor has been accurately installed on the tank side, the rotation matrix $R_{lw}$ and translation vector $t_{lw}$ are derived to express the point cloud in the world reference system. In particular:
$$ W_i^{T} = R_{lw} L_i^{T} + t_{lw}, \quad i = 1, \dots, n $$
where $L$ is the n-by-3 matrix of the three spatial coordinates for the n points acquired by the LiDAR, expressed in the sensor reference system; thus, $L_i^{T}$ represents the column vector of the three spatial coordinates of the i-th point. $W$ is the n-by-3 matrix of the three spatial coordinates for the points expressed in the world reference system; thus, $W_i^{T}$ represents the column vector of the three spatial coordinates of the i-th point.
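The change of reference system can be sketched point by point as follows. This is a minimal illustration that assumes the rigid transformation is applied as a rotation followed by a translation; the helper names and the example mounting (a pure yaw rotation and an arbitrary offset) are hypothetical and do not reproduce the installation of the case study.

```python
import math


def rotation_z(yaw_rad):
    """Rotation matrix for a rotation of yaw_rad about the vertical (z) axis."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def lidar_to_world(points, R_lw, t_lw):
    """Express sensor-frame points in the world frame: W_i = R_lw * L_i + t_lw."""
    world = []
    for p in points:
        world.append(tuple(
            sum(R_lw[row][col] * p[col] for col in range(3)) + t_lw[row]
            for row in range(3)
        ))
    return world


# Hypothetical mounting: sensor yawed by 90 degrees and offset from the origin.
cloud_sensor = [(1.0, 0.0, 0.0), (2.0, 0.5, -0.3)]
cloud_world = lidar_to_world(cloud_sensor, rotation_z(math.pi / 2), (1.0, 2.0, 0.5))
print(cloud_world)
```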

Noise filtering and point-cloud cutting
The LiDAR sensor provides a three-dimensional point cloud that reconstructs the surrounding environment. In the case study, the sensor provides an oversized point cloud, i.e., it detects points outside the domain of interest. In particular, the region of interest is limited to the area within the boundaries of the test tank, while points acquired outside it belong to objects of no interest (e.g., walls or furniture) and unnecessarily increase the amount of data to be processed. For these reasons, the cloud points exceeding the spatial limits of the test pool are filtered out. Moreover, additional noise filters based on point position and intensity were applied to compensate for acquisition errors caused by the water, as presented in [13,14].
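A minimal sketch of the cutting step is given below. It assumes that the cloud is already expressed in the world reference system and that the region of interest is an axis-aligned box; the intensity threshold is only a placeholder for the noise filters of [13,14], whose actual criteria are not reproduced here, and all names and numerical limits are hypothetical.

```python
def crop_to_tank(points, x_lim, y_lim, z_lim, min_intensity=0.0):
    """Keep only the points inside the box and above an intensity threshold.

    points: iterable of (x, y, z, intensity) tuples in the world frame.
    x_lim, y_lim, z_lim: (min, max) bounds of the region of interest in metres.
    min_intensity: hypothetical threshold standing in for the noise filters.
    """
    kept = []
    for x, y, z, intensity in points:
        inside = (x_lim[0] <= x <= x_lim[1]
                  and y_lim[0] <= y <= y_lim[1]
                  and z_lim[0] <= z <= z_lim[1])
        if inside and intensity >= min_intensity:
            kept.append((x, y, z, intensity))
    return kept


# Example with a 4.88 m x 4.88 m footprint and illustrative vertical limits.
cloud = [(1.0, 1.0, 0.1, 50), (6.0, 1.0, 0.1, 50), (2.0, 2.0, 0.0, 1)]
print(crop_to_tank(cloud, (0.0, 4.88), (0.0, 4.88), (-0.2, 1.0), min_intensity=5))
```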

Target detection
Many methods are presented in the literature to detect targets based on point cloud data. Contrary to image-based target detection, where supervised machine learning is used extensively, unsupervised machine learning techniques are generally used in point cloud data processing, including LiDAR data. In particular, point cloud clustering techniques aim to identify groups of points that share common patterns [15]. Many clustering algorithms have been developed over the years, differing mainly in the grouping criterion; for the case study, the DBSCAN clustering algorithm [16] has been selected due to the high accuracy of its results. The clustering analysis associates each point with a cluster, dividing the point cloud into groups, where each group represents a target.
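The density-based grouping criterion of DBSCAN can be illustrated with the following compact, pure-Python sketch. It is a didactic brute-force version (quadratic in the number of points) rather than the implementation used in the case study, and the parameter values in the usage lines are arbitrary.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or NOISE.

    eps: neighbourhood radius; min_pts: minimum number of points (the point
    itself included) within eps for a point to be considered a core point.
    """
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Two compact groups and one isolated return.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (3.0, 3.0), (3.1, 3.0), (3.0, 3.1), (3.1, 3.1),
         (10.0, 10.0)]
print(dbscan(cloud, eps=0.2, min_pts=3))
```

A practical consequence of this criterion is that the number of targets does not have to be fixed in advance, and isolated returns are labelled as noise instead of being forced into a cluster.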

Principal Component Analysis
The association of a bounding box with the groups of points identified by the clustering analysis is a crucial operation to extract useful information. A Principal Component Analysis (PCA) is conducted on each group of points to identify the directions of greatest variance. The principal component directions are then used to orient a rectangular bounding box. Finally, the center of the bounding box, rather than the centroid of the point cloud cluster, is taken as the target position. This technique improves the position estimation since it mitigates the shift of the centroid towards the surfaces first hit by the LiDAR beams [13].
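The difference between the cluster centroid and the center of the PCA-oriented bounding box can be sketched in the horizontal plane as follows. The example assumes a 2D projection of the cluster and uses the closed-form principal direction of a 2-by-2 covariance matrix; the function name and the synthetic cluster are hypothetical.

```python
import math


def oriented_bbox_center(points):
    """Return (center_x, center_y, angle_rad) of the PCA-aligned bounding box.

    points: list of (x, y) tuples; angle_rad is the direction of the first
    principal component with respect to the x-axis.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n

    # Direction of largest variance for a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(angle), math.sin(angle)

    # Point coordinates along the two principal axes, relative to the centroid.
    u = [(p[0] - mx) * c + (p[1] - my) * s for p in points]
    v = [-(p[0] - mx) * s + (p[1] - my) * c for p in points]

    # Box center in the principal frame, then rotated back to the input frame.
    uc = (min(u) + max(u)) / 2.0
    vc = (min(v) + max(v)) / 2.0
    return mx + uc * c - vc * s, my + uc * s + vc * c, angle


# Synthetic cluster: the side facing the sensor (y = 0) is densely sampled,
# while the far side (y = 0.5) contributes only two returns.
cluster = [(k / 10, 0.0) for k in range(11)] + [(0.0, 0.5), (1.0, 0.5)]
centroid_y = sum(p[1] for p in cluster) / len(cluster)
print(centroid_y, oriented_bbox_center(cluster))
```

In this synthetic case the centroid is pulled towards the densely sampled side, whereas the box center stays midway between the two sides, which is the mitigation effect described above.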

LiDAR centroids real-time tracking
The ID-track association problem is generally entrusted to tracking systems based on predictive filters. For the case study, since the marker's unique identification code is available at the output of the camera processing, there is no need to resort to such filters, and the association process is straightforward. Because of the system's modular structure, the possibility of using an independent tracking module for LiDAR, based on the Global Nearest Neighbours approach [17,18], was also envisaged.
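To give an idea of what a Global Nearest Neighbours association involves, the sketch below assigns the centroids detected in the current scan to the existing tracks by minimising the total distance over all one-to-one assignments. It is a brute-force illustration suitable only for a handful of targets and is not the module envisaged in the case study; practical implementations typically add gating and an efficient assignment solver. All names are hypothetical.

```python
import math
from itertools import permutations


def associate_gnn(tracks, detections):
    """Return {track_id: detection_index} minimising the summed distance.

    tracks: dict mapping a track ID to its last known (x, y) position.
    detections: list of (x, y) centroids from the current scan.
    """
    ids = list(tracks)
    if len(detections) < len(ids):
        raise ValueError("this sketch needs at least one detection per track")

    best, best_cost = None, math.inf
    # Enumerate every one-to-one assignment of detections to tracks.
    for perm in permutations(range(len(detections)), len(ids)):
        cost = sum(math.dist(tracks[i], detections[j]) for i, j in zip(ids, perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return dict(zip(ids, best))


tracks = {"vessel_a": (0.0, 0.0), "vessel_b": (1.0, 0.0)}
detections = [(0.9, 0.1), (0.1, 0.0)]
print(associate_gnn(tracks, detections))
```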

Experimental set-up
The presented work focuses on the setup, experimental validation, and use of the proposed DOF estimation system. For this reason, this section describes the experimental setup, the sensors used, and their positioning in detail. In particular, the experimental setup is presented in Figure 2, where the positions of the sensors, the connections, and the reference system are illustrated. The test environment consists of a 4.88 m x 4.88 m tank framed by the sensor suite; data are collected by a dedicated workstation.

LiDAR specifications and setup
The 3D mechanical LiDAR sensor used for the experimental campaign is a HESAI Pandar XT-32 (Figure 3), which provides 32 equally spaced infrared laser beams with an operating range of 120 m; the vertical field of view (FOV) is 32° with a resolution of 1°, while the horizontal FOV is 360° with a tunable resolution (depending on the rotation speed), which is set to 0.36° for the case study. As a trade-off between the stability of the mounting and the ability to capture the test tank adequately, the LiDAR was mounted on the edge of the tank, in a fixed known position, using a dedicated support produced with additive manufacturing techniques. The connection is achieved by Ethernet cable and the UDP data protocol, while the power supply has a dedicated channel.
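The stated resolutions give a rough idea of the point density available for detection. As a back-of-the-envelope estimate, assuming that every beam yields one return per azimuth step and taking the tank side length (4.88 m) as a representative range $r$:

```latex
N_{az} = \frac{360^\circ}{0.36^\circ} = 1000, \qquad
N_{pts} = 32 \times N_{az} = 32\,000 \ \text{points per revolution}

\Delta s \approx r \, \Delta\theta
         = 4.88\,\mathrm{m} \times 0.36^\circ \times \frac{\pi}{180^\circ}
         \approx 0.031\,\mathrm{m}
```

Under these assumptions, adjacent returns of the same beam on a target at the far side of the tank are about 3 cm apart in the horizontal direction, and closer targets are sampled more densely.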

Camera specifications and setup
For capturing images, a monochromatic camera is placed above the test tank, exploiting the structure of the hosting building. In particular, a Basler ace GigE Camera acA1920-40gm has been selected; it is equipped with the Sony IMX249 CMOS sensor and an 8 mm lens, providing 1920 x 1200 pixel grey-scale images at 42 fps. Data transmission and power supply are achieved using Power over Ethernet (PoE) via a dedicated powered network card in the workstation.

Self-propelled model-scale target
The target chosen for the experimental campaign is a teleoperated, self-propelled model-scale vessel, SWAMP (Shallow Water Autonomous Multipurpose Platform) [19], available at the COMPASS lab in UNIGE facilities. It is a modular, double-ended catamaran platform with soft hulls, driven by an innovative system composed of 4 pump-jets designed and tested by the CNR-INM laboratory [20]; hydrodynamic tests were performed in the towing tank of the DITEN Hydrodynamic Laboratories (University of Genoa). Figure 5 shows a perspective rendering of the vessel. For the experimental trials, the SWAMP-class vessel was equipped with two original-series 200 mm x 200 mm ArUco markers on the top of the payload deck, as shown in Figure 6, and teleoperated from a remote control station to perform any desired maneuver. In particular, markers 1 and 10 uniquely identify the stern and bow of the vessel, respectively, and, with proper positioning, the midpoint of the segment joining the centroids of the two markers represents the vessel's center.
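Given the world-frame centroids of the stern and bow markers, the vessel's center and heading can be obtained with elementary geometry, as in the following minimal sketch. The heading convention (angle of the stern-to-bow vector measured counterclockwise from the x-axis of the tank reference frame) is an assumption made for illustration, as are the function name and the example coordinates.

```python
import math


def vessel_pose(stern_xy, bow_xy):
    """Return (x, y, heading_rad) from the stern and bow marker centroids.

    The center is the midpoint of the segment joining the two centroids; the
    heading is the direction of the stern-to-bow vector, measured
    counterclockwise from the x-axis (assumed convention).
    """
    cx = (stern_xy[0] + bow_xy[0]) / 2.0
    cy = (stern_xy[1] + bow_xy[1]) / 2.0
    heading = math.atan2(bow_xy[1] - stern_xy[1], bow_xy[0] - stern_xy[0])
    return cx, cy, heading


# Stern marker (ID 1) and bow marker (ID 10) at illustrative positions.
x, y, psi = vessel_pose(stern_xy=(1.0, 1.0), bow_xy=(1.0, 2.0))
print(x, y, math.degrees(psi))
```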

Communication middleware
Sharing sensor data with multiple entities is a crucial part of the setup. Those entities work as nodes or clients of an IoT system and might need some or all of the measured data for different purposes, such as logging, processing, filtering, and control. For instance, the dynamic positioning system of an autonomous vessel model needs pose estimation as a feedback signal to control the actuators. For this reason, the communication middleware needs to feature scalability, modularity, and platform and language agnosticism. Typically, IoT systems rely on a publish-subscribe data distribution model, where some network entities, called publishers, produce data and others, called subscribers, receive it. Publishers and subscribers operate independently and are unaware of each other. MQTT (Message Queuing Telemetry Transport) [21] is a lightweight publish-subscribe messaging protocol for efficient communication in IoT systems. MQTT is broker-based since the data distribution is achieved by a broker that receives all messages from the publishers and then routes the messages to the subscribers. MQTT's main features are high efficiency and support for low-power devices in IoT applications, such as the testing setup described in this paper; moreover, it is a consolidated and well-established technology.
The open-source MQTT broker Mosquitto [22] was adopted for the presented case study. Broker-based communication protocols, however, present some reliability issues due to the presence of a single point of failure: if the broker fails, the whole data communication system fails. Even if this can be considered a minor issue in an indoor testing framework, it can be a limiting factor when operating in outdoor contexts where high autonomy is required. For this reason, as well as for possible performance improvements, broker-less protocols, such as DDS [23,24] or ZeroMQ [25,26], are currently under evaluation as alternatives.
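The decoupling between publishers and subscribers, and the central role of the broker, can be illustrated with the toy in-process sketch below. It is not an MQTT client and does not implement the MQTT wire protocol, wildcards, or quality-of-service levels; it only mimics topic-based routing. The topic name and the JSON payload layout are hypothetical and are not those used in the case study.

```python
import json
from collections import defaultdict


class ToyBroker:
    """In-process stand-in for a broker: routes payloads by exact topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # The publisher does not know who, if anyone, receives the message.
        for callback in list(self._subscribers.get(topic, [])):
            callback(topic, payload)


def encode_pose(source, x, y, heading_deg, stamp):
    """Serialise one pose estimate as a JSON string (hypothetical layout)."""
    return json.dumps({"source": source, "x": x, "y": y,
                       "heading_deg": heading_deg, "t": stamp})


broker = ToyBroker()
received = []

# A controller and a logger subscribe independently to the same topic.
broker.subscribe("tank/vessel/pose", lambda topic, msg: received.append(json.loads(msg)))
broker.subscribe("tank/vessel/pose", lambda topic, msg: print("log:", msg))

# The camera branch publishes without any knowledge of the subscribers.
broker.publish("tank/vessel/pose", encode_pose("camera", 1.2, 3.4, 90.0, 0.024))
```

The sketch also makes the single point of failure visible: every message passes through the broker object, so nothing is delivered if it is unavailable.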

Results
An extensive experimental campaign was conducted to test the performance of the proposed system. For brevity, only the results of two relevant acquisitions of the SWAMP-class teleoperated model-scale self-propelled USV equipped with ArUco markers are reported. In particular, the first set of figures (Figures 7, 8, 9) represents a turning maneuver following a U-shaped trajectory, while an almost chaotic motion with considerable changes of direction is presented in the second set of figures (Figures 10, 11, 12). Figures 7 and 10 show the trajectory of the model-scale vessel in the test tank reference frame; in particular, blue markers identify the centroid of the bounding box obtained from LiDAR data, while green and red markers identify the centroids of the bow and stern ArUco markers, respectively. Figures 8 and 11 show the distribution of the difference between the two estimates of the plane coordinates of the vessel centroid, calculated by subtracting the coordinates estimated with LiDAR from those estimated with the camera; Figures 9 and 12 present the same position difference in a box plot, where the five representative parameters of the distribution are evaluated, considering the presence of outliers (plotted in green) due to acquisition errors.
The error distribution is roughly centered on zero; however, a non-negligible spread of the position difference is present. The points acquired by LiDAR necessarily lie close to the surfaces of first impact along the line-of-sight between the sensor and the acquired target, giving rise to shifts in the centroid evaluation.
Finally, a tailored experiment was conducted to compare the results obtained to a ground truth value. In particular, a target was kept at a constant distance of 3.95 m along the x-axis of the tank reference frame and left free to move along the y-axis. The target was then acquired and tracked with the perception system, comparing the LiDAR and camera results with the available ground truth value. Results are reported in Figures 13, 14, 15. In particular, Figure 13 shows the trajectory of the centroids evaluated with the camera (red markers) and with the LiDAR (blue markers). Figure 14 shows the error against the ground truth value on the x coordinates; finally, the same results are presented in a box plot in Figure 15.
The analysis of the obtained results indicates that, for objects of this specific size and shape, the camera proves to be a more dependable sensor for precise position estimation. The superior performance of the camera can be primarily attributed to the intrinsic calculation approach of the LiDAR-based method, where the position is taken at the center of the bounding box constructed from the point cloud data. Consequently, this calculated position tends to be shifted towards the surfaces of initial impact, namely those closest to the sensor, as the laser radiation cannot propagate further. However, it is essential to highlight that, in the case of smaller targets, the positional shift experienced by the LiDAR-based method reduces significantly, resulting in enhanced accuracy. This advantage arises because the LiDAR-based procedure relies on a physical phenomenon, specifically the measurement of the time elapsed between the emission and reception of a laser pulse, rather than on a mathematical model, as the camera-based approach does.
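The comparison between the two branches can be summarised numerically as in the following sketch, which computes the camera-minus-LiDAR difference for one coordinate and the five parameters shown in a box plot. The 1.5 x IQR rule for flagging outliers is a common convention assumed here, since the rule used for the figures is not stated; the sample values are synthetic and do not reproduce the measured data.

```python
import statistics


def box_plot_summary(camera, lidar):
    """Five-number summary of the camera-minus-LiDAR coordinate difference.

    camera, lidar: equally long sequences of the same coordinate (metres),
    sampled at matching time instants.
    """
    diff = [c - l for c, l in zip(camera, lidar)]
    q1, median, q3 = statistics.quantiles(diff, n=4, method="inclusive")

    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [d for d in diff if low_fence <= d <= high_fence]
    outliers = [d for d in diff if not low_fence <= d <= high_fence]

    return {"min": min(inliers), "q1": q1, "median": median,
            "q3": q3, "max": max(inliers), "outliers": outliers}


# Synthetic samples; the last camera value mimics an acquisition error.
camera_x = [1.00, 1.02, 1.01, 1.03, 1.02, 1.50]
lidar_x = [1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
print(box_plot_summary(camera_x, lidar_x))
```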

Conclusions
In this paper, the authors presented a multi-sensor attitude estimation system for model-scale vessels that overcomes the issues of a traditional proprioceptive sensor suite (GPS, compass, IMU) in indoor testing applications due to model-scale errors or line-of-sight limitations. In particular, a multi-sensor tracking system based on a GigE camera, ArUco marker detection, and unsupervised learning LiDAR detection has been developed, installed, and widely tested. The tracking systems were integrated using the MQTT broker-based publish/subscribe protocol, enabling real-time data sharing among multiple entities. Results demonstrate the capability to estimate the degrees of freedom of a self-propelled model-scale vessel in the indoor testing facility without needing active or powered markers. Moreover, the information is acquired and shared at a high frame rate, providing a valuable tool for achieving DOF estimation feedback in the test tank. Finally, the results achieved provide a quantitative evaluation of the accuracy of the centroid estimates relying on LiDAR and camera detection.

Figure 2. Test tank experimental set-up