Current status and analysis of the development of SLAM technology applied to mobile robots

Mobile robots are tasked with jobs that assist or replace human work. With the development of technology, mobile robots have had a significant impact on human productivity and daily life. For a mobile robot, the core technology is awareness of its own location. Therefore, this paper analyzes the current state of development of simultaneous localization and mapping (SLAM) technology applied to mobile robots. First, the overall process and key technologies are explained in detail. Subsequently, the classical algorithms developed in SLAM technology to date are classified according to the characteristics of their information-sensing devices, with a more detailed description and comparative analysis of the algorithms. Finally, this paper analyzes the current development status of SLAM and the remaining problems in terms of the key performance indicators of SLAM technology, and interprets the future development trends of SLAM technology applied to mobile robots.


Introduction
Humans have long hoped to create robots with certain functions, and even a degree of intelligence, that can take over part of human work. In the course of technological development, the autonomy, flexibility, and wider application space offered by mobile robots have led researchers to continuously study and improve robot mobility in order to enhance the performance of mobile robots. In particular, during the years of the COVID-19 outbreak, most industries stagnated because of the epidemic, while mobile robots developed rapidly against this trend because they can replace humans in contactless work.
Technology has evolved to the point where extended functions and applications of mobile robots in various fields have been widely developed, such as self-driving cars, bipedal robots, drone operations, and intelligent floor-sweeping robots. In the same way that human beings rely on their senses to take in information from the outside world and then decide how to move based on the external environment and their own position, robots can make decisions and generate actions only if they have correct knowledge of the external environment and their own position. Therefore, simultaneous localization and mapping (SLAM) technology is an important technical support for intelligent robots to complete complex tasks. SLAM, the problem of a robot simultaneously localizing itself and constructing a map, was introduced at the 1986 ICRA conference in San Francisco, USA. It addresses the problem of building a map of the environment and estimating the robot's own motion while the robot moves without an a priori map and with an unknown position. Three stages can be distinguished in the development of SLAM: the classical age, the algorithmic-analysis age, and the robust-perception age [1].
This paper summarizes the existing algorithms for SLAM technology applied to mobile robots and analyzes future development trends. First, the basic algorithm structure and technical features of SLAM are introduced in Section 2. Then, in Section 3, SLAM techniques are classified based on the characteristics of their information-sensing devices, and SLAM approaches are introduced and comparatively analyzed from three different aspects. Finally, Section 4 summarizes the preceding discussion and outlines the current status of new research in SLAM technology in terms of both sensors and algorithms.

The basic process of SLAM
The basic process of SLAM is mainly divided into five components. Figure 1 depicts the overall flow of the system.

Figure 1. The basic process of SLAM.
As shown in figure 1, multiple sensors are first used to acquire source data. Next, the sensor data are initially processed by the front-end odometry module. The back-end optimization module is then used to reduce the cumulative error produced by the front-end odometry calculation. Finally, the map is constructed from the data sensed in real time during robot motion and from the results of the computation. A loopback detection module is added between front-end odometry and back-end optimization for further error detection and correction. The five main components of the process are described below.
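For readers who prefer code to block diagrams, the following minimal Python sketch mirrors the five modules of figure 1. All of the module bodies are hypothetical stand-ins introduced only for illustration; they do not correspond to any specific SLAM system.

```python
import numpy as np

def front_end(prev_scan, curr_scan):
    """Estimate the relative motion between two consecutive frames (stub)."""
    return np.zeros(3)                               # (dx, dy, dtheta) placeholder

def detect_loop(past_poses, curr_pose):
    """Return the index of a previously visited pose, or None (stub)."""
    return None

def back_end_optimize(poses, constraints):
    """Refine the whole trajectory given odometry and loop constraints (stub)."""
    return poses

scans = [np.random.rand(100, 2) for _ in range(5)]   # 1. sensor data (fake scans)
poses, constraints, stored_map = [np.zeros(3)], [], []
for prev_scan, curr_scan in zip(scans, scans[1:]):
    delta = front_end(prev_scan, curr_scan)          # 2. front-end odometry
    poses.append(poses[-1] + delta)
    loop_idx = detect_loop(poses[:-1], poses[-1])    # 4. loopback detection
    if loop_idx is not None:
        constraints.append((loop_idx, len(poses) - 1))
        poses = back_end_optimize(poses, constraints)  # 3. back-end optimization
    stored_map.append(curr_scan)                     # 5. map construction (store scans)
```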

Sensor data
Sensor data are the data read by the sensors, which usually include LiDAR, cameras, and an IMU. Depending on the sensors, SLAM methods can also be roughly divided into vision-based methods and LiDAR-based methods, which are discussed in detail later in this paper. For visual SLAM (V-SLAM), most vision sensors are cameras. According to their working principles, there are several types of cameras, such as monocular cameras, stereo (binocular) cameras, and depth cameras, in addition to cameras with other characteristics, such as fisheye cameras. The data for V-SLAM are usually 2D images, and a depth camera additionally supplies depth information to provide 3D data.
Lidar SLAM senses the environment by emitting multiple laser beams from the lidar and, like depth-camera-based V-SLAM, directly acquires point cloud data of the environment, from which the direction and distance of obstacles are measured. However, apart from the depth camera, the other camera sensors obtain only two-dimensional images. To obtain point cloud information, the camera must keep moving so that feature points can be extracted and matched across views, after which triangulation is used to measure the distance to obstacles.
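As an illustration of the triangulation step mentioned above, the following Python sketch uses OpenCV to recover 3D points from two views; the intrinsics, relative pose, and 3D points are arbitrary assumed values, not data from any cited system.

```python
import numpy as np
import cv2

K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0,   0.0,   1.0]])                    # assumed pinhole intrinsics
R = cv2.Rodrigues(np.array([0.0, 0.05, 0.0]))[0]       # assumed rotation between views
t = np.array([[0.2], [0.0], [0.0]])                    # assumed translation (baseline)

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])      # projection matrix, view 1
P2 = K @ np.hstack([R, t])                             # projection matrix, view 2

X_true = np.array([[0.5, -0.2, 4.0],
                   [-0.3, 0.1, 6.0]])                  # assumed 3D points (view-1 frame)

def project(P, X):
    x = P @ np.hstack([X, np.ones((len(X), 1))]).T     # homogeneous projection
    return x[:2] / x[2]                                # 2 x N pixel coordinates

pts1, pts2 = project(P1, X_true), project(P2, X_true)  # matched feature locations
X_hom = cv2.triangulatePoints(P1, P2, pts1, pts2)      # 4 x N homogeneous points
print((X_hom[:3] / X_hom[3]).T)                        # recovers X_true up to numerics
```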

Front-end odometry
The core function of front-end odometry is preliminary mapping and positioning: the data collected by the robot are preliminarily processed to compute the relative motion of the sensor between adjacent time steps and thus solve for the motion trajectory. In SLAM there are two types of sensor data, laser data and vision data, and the difference between the two data types leads to differences in front-end processing methods.
In V-SLAM, front-end processing is roughly divided into two types: the feature point method and the direct method. The feature point method constructs sparse point cloud maps; it extracts sparse feature points in the image as key points and completes inter-frame matching using descriptors. The poses are then solved using epipolar geometry, PnP, or ICP algorithms, according to whether the constraints between the feature point data are 2D-2D, 2D-3D, or 3D-3D. The direct method constructs dense or semi-dense maps and borrows the idea of optical flow tracking: under the assumption of constant photometry, the robot's pose is solved with the optimization objective of minimizing the photometric error.
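A minimal sketch of the feature point front end for the 2D-2D case is shown below, using OpenCV's ORB features and the essential matrix; the image file names and camera intrinsics are placeholders that would need to be replaced with real data.

```python
import numpy as np
import cv2

# Placeholder input: two consecutive grayscale frames and assumed intrinsics.
img1 = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])

orb = cv2.ORB_create(1000)                        # FAST keypoints + rotated BRIEF
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2D-2D constraint: essential matrix from the epipolar geometry, then R, t.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print("relative rotation:\n", R, "\nunit translation:\n", t)
```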
Although the front end of Lidar SLAM solves the same core problem as that of V-SLAM, the inter-frame matching algorithm used in the front end directly determines the performance of Lidar SLAM, so its inter-frame matching methods have been classified in more detail. Lidar SLAM methods also share common ground with V-SLAM methods, such as feature point matching methods, and optimal matching methods that model the laser data matching problem as a nonlinear least-squares optimization problem, in which minimizing a given target error function helps limit error accumulation. In addition, there are ICP algorithms and their variants, CSM methods, and so on. Table 1 provides a summary and comparison of front-end methods.
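The following is a compact sketch of the classical point-to-point ICP alignment often used for inter-frame laser matching. It uses the closed-form SVD-based update per iteration; as noted above, some methods instead model the matching problem as a nonlinear least-squares optimization.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(source, target, iters=30):
    """Align a 2D source scan to a target scan; returns rotation R and translation t."""
    R, t = np.eye(2), np.zeros(2)
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iters):
        _, idx = tree.query(src)                  # nearest-neighbour correspondences
        matched = target[idx]
        mu_s, mu_m = src.mean(0), matched.mean(0)
        W = (src - mu_s).T @ (matched - mu_m)     # cross-covariance of centred sets
        U, _, Vt = np.linalg.svd(W)
        if np.linalg.det(Vt.T @ U.T) < 0:         # guard against reflections
            Vt[-1] *= -1
        R_step = Vt.T @ U.T                       # closed-form optimal rotation
        t_step = mu_m - R_step @ mu_s
        src = src @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step    # accumulate the transform
    return R, t

# Toy usage: the target is the source rotated by 5 degrees and shifted slightly.
source = np.random.rand(200, 2)
a = np.deg2rad(5)
R_true = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
target = source @ R_true.T + np.array([0.10, 0.05])
print(icp_2d(source, target))
```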

Back-end optimization
As the robot moves, the environment changes and time passes, so the errors in the front-end solution accumulate, causing the final map and the estimated positions to deviate too far from reality. Therefore, the role of back-end optimization is to refine the initial front-end results and obtain an optimal solution to the problem caused by noise. The error is kept within a small range in order to obtain better pose estimates and to generate consistent trajectories and maps. The common back-end processing methods are the filter-based back-end and the nonlinear-optimization-based back-end.
Filter-based back-end. It is assumed that the states are Markovian, i.e., the state at the current moment is related only to the state at the previous moment. The core idea can be described uniformly by a Bayesian filtering model, which continuously iterates and updates the state quantities. The common filtering methods mainly include the Kalman filter (KF), the particle filter (PF), the extended Kalman filter (EKF), and so on.
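As a concrete illustration of the filter-based back end, the sketch below implements one textbook EKF predict/update cycle for a planar pose with a range-bearing observation of a known landmark; the models and noise values are generic assumptions, not those of any particular system.

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, landmark, Q, Rn, dt=1.0):
    """One EKF cycle for a planar pose (x, y, theta): odometry prediction,
    then an update from a range-bearing observation of a known landmark."""
    x, y, th = mu
    v, w = u                                             # linear / angular velocity
    # --- prediction with the motion model ---
    mu_bar = np.array([x + v*dt*np.cos(th), y + v*dt*np.sin(th), th + w*dt])
    F = np.array([[1, 0, -v*dt*np.sin(th)],
                  [0, 1,  v*dt*np.cos(th)],
                  [0, 0,  1]])                           # motion Jacobian
    Sigma_bar = F @ Sigma @ F.T + Q
    # --- update with the range-bearing measurement ---
    dx, dy = landmark - mu_bar[:2]
    q = dx**2 + dy**2
    z_hat = np.array([np.sqrt(q), np.arctan2(dy, dx) - mu_bar[2]])
    H = np.array([[-dx/np.sqrt(q), -dy/np.sqrt(q), 0.0],
                  [ dy/q,          -dx/q,         -1.0]])  # observation Jacobian
    K = Sigma_bar @ H.T @ np.linalg.inv(H @ Sigma_bar @ H.T + Rn)
    mu_new = mu_bar + K @ (z - z_hat)
    Sigma_new = (np.eye(3) - K @ H) @ Sigma_bar
    return mu_new, Sigma_new

mu, Sigma = ekf_step(np.zeros(3), 0.1*np.eye(3), (1.0, 0.1),
                     np.array([2.2, 0.4]), np.array([2.0, 1.0]),
                     0.01*np.eye(3), 0.05*np.eye(2))
```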
Nonlinear-optimization-based back-end. This approach searches along the gradient-descent direction for the values of the optimization variables that minimize a cost function in which the variables are nonlinearly related; the most important instance in SLAM technology is the graph optimization method. Graph optimization treats the variables to be optimized, i.e., the robot's poses, as nodes and the spatial and inter-frame constraints as edges, and then uses gradient descent to minimize the error accumulated by the robot during mapping. Factor graph optimization constructs a factor graph composed of variable nodes, which represent the optimized variables, and factor nodes, which represent the factors; the maximum a posteriori probability is the product of many factors. Therefore, optimizing the factor graph means adjusting the values of the variables to maximize the product of the factors.
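The toy example below illustrates the graph optimization idea: poses are the variables, relative measurements (including one loop closure) are the edges, and a least-squares solver adjusts the poses to minimize the total edge error. One-dimensional poses and invented measurements are used purely to keep the sketch short.

```python
import numpy as np
from scipy.optimize import least_squares

# Edges: (i, j, measured displacement x_j - x_i); the last edge is a loop closure.
edges = [(0, 1, 1.10), (1, 2, 1.00), (2, 3, 0.90), (3, 4, 1.05), (0, 4, 4.00)]

def residuals(x):
    x = np.concatenate([[0.0], x])            # fix the first pose to remove gauge freedom
    return [(x[j] - x[i]) - meas for i, j, meas in edges]

x0 = np.cumsum([1.10, 1.00, 0.90, 1.05])      # initial guess from raw odometry
result = least_squares(residuals, x0)
print(np.concatenate([[0.0], result.x]))      # optimized trajectory
```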
When dealing with state estimation problems, filter-based methods are usually incremental: they hold an estimate of the current state and update it when new data arrive. In contrast, nonlinear optimization methods such as the bundle adjustment (BA) algorithm, which is widely used today, are usually batch methods that accumulate data and process them in batches to achieve optimization on a larger scale. The batch method is considered more effective than the incremental method, but its real-time performance is limited, so a trade-off is made and the sliding-window method was born. Table 2 provides a summary and comparison of back-end methods.

Loopback detection
If the robot returns to a point it has passed before, the computed results for the two visits should logically be the same, so errors can be detected and corrected using this condition. In Lidar SLAM, some methods such as Gmapping do not include loopback detection, while other Lidar SLAM methods can be classified, according to data type, into Scan-to-Scan detection, Scan-to-subMap detection, subMap-to-subMap detection, and other closed-loop detection methods. The most popular method is still Scan-to-subMap detection, which uses the branch-and-bound method to accelerate matching and combines it with Lazy Decision to ensure matching accuracy, as in Google's Cartographer.
The more popular loopback detection method in existing V-SLAM systems is Bag of Words (BoW), used, for example, in ORB-SLAM and VINS-Mono. In addition, there are random ferns methods, which compress and encode each camera frame and efficiently evaluate the similarity between different frames, and deep-learning-based retrieval methods, which emerged with the rapid development of deep learning [2].
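The following conceptual sketch shows the bag-of-words idea behind such loopback detection: local descriptors are quantized into visual words, each frame becomes a normalized word histogram, and frames with very similar histograms become loop candidates. Real systems such as ORB-SLAM use a pre-trained binary vocabulary (DBoW2); the k-means vocabulary and random descriptors here are only illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = [rng.normal(size=(300, 32)) for _ in range(9)]            # fake per-frame descriptors
frames.append(frames[0] + rng.normal(scale=0.01, size=(300, 32)))  # frame 9 revisits frame 0

vocab = KMeans(n_clusters=50, n_init=10, random_state=0).fit(np.vstack(frames))

def bow_vector(descriptors, k=50):
    words = vocab.predict(descriptors)                   # quantize into visual words
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / np.linalg.norm(hist)

vectors = [bow_vector(f) for f in frames]
query = vectors[-1]
scores = [float(v @ query) for v in vectors[:-1]]        # cosine similarity to past frames
print("best loop candidate:", int(np.argmax(scores)))    # expected: frame 0
```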

Mapping
Many different types of maps can be built depending on the actual application and the required functions of the robot. The maps constructed in SLAM can be divided into three main categories: scale (metric) maps, topological maps, and semantic maps.
A scale map is built to be consistent with the scale of the actual environment. Semantic maps can be understood as scale maps with added labels, placing more emphasis on the relationships between objects in the map. Topological maps, on the other hand, usually represent the connectivity information of a map, including whether places are connected and how far apart they are. Because topological maps represent map information concisely, they are more suitable for larger-scale maps.
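A tiny illustration of the difference between a scale map and a topological map is given below; the grid resolution, room names, and distances are invented for the example.

```python
import numpy as np

# Scale (metric) map: an occupancy grid whose cells correspond to real-world distances.
occupancy_grid = np.zeros((100, 100))       # e.g. 100 x 100 cells at 5 cm per cell
occupancy_grid[20:80, 50] = 1.0             # a wall placed in metric coordinates

# Topological map: only connectivity between places and the distance between them.
topological_map = {
    "kitchen": {"hallway": 3.2},
    "hallway": {"kitchen": 3.2, "office": 5.0},
    "office":  {"hallway": 5.0},
}
```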

Vision-based SLAM
Mono-SLAM based on extended Kalman filter.
The Mono-SLAM algorithm is able to quickly recover the 3D trajectory of a monocular camera in real time in an unknown scene, and it was the first successful application of the structure-from-motion (SFM) approach in SLAM. The primary step of this approach is to build a probabilistic, sparse yet persistent map. Its main contributions are active mapping and measurement, the use of a general motion model for smooth camera motion, and the proposed approach for estimating feature orientation and initializing monocular features [3].

Fast-SLAM based on particle filtering and Kalman filtering.
Whereas traditional SLAM solutions estimate the posterior probability of the robot's current pose and the map, the Fast-SLAM algorithm has the advantage of solving for the robot's full path and the map. Fast-SLAM is a localization and map-building algorithm based on particle filtering. The SLAM problem is the simultaneous estimation of the robot pose and the map given sensor data, but the construction of the map and the solution of the robot pose are mutually dependent. To address this, the Fast-SLAM algorithm adopts the Rao-Blackwellized particle filter (RBPF) method, which splits the SLAM problem into two subproblems: the robot localization problem and the problem of building a map given known robot poses. The RBPF method represents the posterior probability of some variables with a collection of particles, while a Gaussian or other parametric probability density represents all the remaining variables [4].
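The structural sketch below shows the RBPF idea behind Fast-SLAM: each particle carries its own pose hypothesis and its own map, weights are updated from measurements, and particles are resampled. The motion and measurement models are placeholders, not those of the original algorithm.

```python
import copy
import numpy as np

rng = np.random.default_rng(1)

class Particle:
    def __init__(self):
        self.pose = np.zeros(3)        # (x, y, theta) hypothesis for this particle
        self.landmarks = {}            # per-particle map, e.g. landmark id -> estimate
        self.weight = 1.0

def motion_update(p, u):
    p.pose = p.pose + u + rng.normal(scale=0.02, size=3)   # sample the motion model

def measurement_weight(p, z):
    return float(np.exp(-0.5 * np.sum(z**2)))              # placeholder likelihood

particles = [Particle() for _ in range(100)]
u, z = np.array([0.5, 0.0, 0.01]), np.array([0.1, -0.05])
for p in particles:
    motion_update(p, u)                                     # localization part (sampled)
    p.weight *= measurement_weight(p, z)                    # map part conditions on the pose

w = np.array([p.weight for p in particles]); w /= w.sum()
idx = rng.choice(len(particles), size=len(particles), p=w)  # importance resampling
particles = [copy.deepcopy(particles[i]) for i in idx]
```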

ORB-SLAM based on keyframe feature point method.
ORB-SLAM was proposed by Mur-Artal et al. in 2015. Compared with the two threads of PTAM, it uses three parallel threads: tracking, local mapping, and closed-loop detection. The tracking thread completes map initialization, feature matching, and model-based pose estimation. The local mapping part proposes an "easy in, strict out" keyframe selection strategy and optimizes the constructed local maps. Finally, closed-loop detection uses the bag-of-words model to determine whether the current scene has been visited before and performs an optimization of the global map, which works very well for application scenarios that require global consistency. ORB-SLAM uses the feature point method in the front end, and its ORB features consist of two parts: key points and descriptors, where the key points are the modified FAST corners (Oriented FAST) and the descriptors are the binary Rotated BRIEF descriptors. The ORB operator adds orientation information to FAST corners and thus improves rotation invariance. The ORB-SLAM algorithm has a time bottleneck in extracting and matching feature points, but it is robust to rotation, blur, and illumination changes [5].

LSD-SLAM based on semi-dense mapping.
LSD-SLAM, proposed by Engel et al. in 2014, implements CPU-level monocular semi-dense map reconstruction and employs a novel filtering scheme that accounts for the uncertainty of triangulation in monocular depth estimation; it uses the direct method in the front end. LSD-SLAM is less robust to fast camera motion and exposure changes, and in 2016 Engel et al. proposed another monocular sparse direct-method visual odometry, DSO (Direct Sparse Odometry). To address the drawback that the direct method is susceptible to lighting interference, DSO uses a photometric calibration model that dynamically estimates the photometric parameters during optimization, making the algorithm more robust to exposure changes. DSO achieves higher accuracy, stability, and real-time performance than the LSD algorithm, and it drops the loopback detection part because its odometry is already sufficiently accurate [6].
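To make the photometric error used by direct methods such as LSD-SLAM and DSO concrete, the toy computation below compares pixel intensities between a reference frame and a second frame under a candidate warp. A pure 2D pixel shift is used as the "warp" only for simplicity; real direct methods warp pixels through the estimated depth and the 6-DoF camera pose.

```python
import numpy as np

rng = np.random.default_rng(2)
frame1 = rng.random((480, 640))
frame2 = np.roll(frame1, shift=(2, 3), axis=(0, 1))        # frame 2 = frame 1 shifted by (2, 3)

pixels = rng.integers(low=10, high=470, size=(100, 2))     # sampled pixel locations (row, col)

def photometric_error(shift, ref, cur, pts):
    """Sum of squared intensity differences under a candidate integer pixel shift."""
    warped = pts + np.array(shift)
    r = ref[pts[:, 0], pts[:, 1]] - cur[warped[:, 0], warped[:, 1]]
    return 0.5 * np.sum(r**2)

print(photometric_error((0, 0), frame1, frame2, pixels))   # wrong warp: large error
print(photometric_error((2, 3), frame1, frame2, pixels))   # correct warp: error is zero
```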

Lidar-based SLAM
Gmapping
Giorgio Grisetti et al. published Gmapping in 2007, a SLAM algorithm based on 2D LiDAR that uses the RBPF method to build 2D grid maps; it can also be regarded as an improvement on the shortcomings of the particle-filter-based Fast-SLAM method [7]. The Fast-SLAM method has two major problems: the memory explosion caused by needing more particles to obtain a good estimate when the environment is large or the odometry error is large, and particle depletion and loss of particle diversity caused by resampling, both of which make Fast-SLAM impractical. Gmapping offers two solutions aimed at reducing the number of particles. The first is to use maximum likelihood estimation to find the optimal pose parameters of the matched particles directly as the poses of new particles, and the second is to restrict the proposal distribution to a narrow effective region using the latest scan and then sample from the proposal distribution normally. With less computation in small scenes, improved map accuracy, and a reduced requirement on lidar scanning frequency, Gmapping can create indoor environment maps in real time. However, it still occupies a lot of memory when mapping large scenes.
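The selective resampling used by Gmapping to limit particle depletion can be sketched as follows: particles are resampled only when the effective sample size falls below a threshold. The weights and the N/2 threshold are illustrative values.

```python
import numpy as np

def n_eff(weights):
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w**2)                     # effective number of particles

weights = np.array([0.50, 0.20, 0.10, 0.10, 0.05, 0.05])
N = len(weights)
if n_eff(weights) < N / 2:                        # resample only when particles degenerate
    idx = np.random.choice(N, size=N, p=weights / weights.sum())
    weights = np.full(N, 1.0 / N)                 # reset weights after resampling
    print("resampled:", idx)
else:
    print("resampling skipped, N_eff =", n_eff(weights))
```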

Cartographer
Cartographer is a graph-optimization-based SLAM system introduced by Google [8]. It supports both 2D and 3D Lidar SLAM, can be used across platforms, and supports multiple sensor configurations such as lidar, IMU, odometry, GPS, and landmarks. It is one of the most widely used Lidar SLAM algorithms.

LOAM scheme based on feature point matching algorithm.
The Lidar Odometry and Mapping in Real-time (LOAM) family occupies a pivotal position in Lidar SLAM [9]. The first LOAM method was proposed in 2014 by Ji Zhang et al. For the extraction of point cloud features, LOAM does not use the more computationally intensive traditional descriptors such as feature vectors and spin images; instead it computes the curvature of a series of consecutive points on a scan line to extract edge points and planar points. In addition, the LOAM method incorporates both scan-to-scan and scan-to-map matching, combining the advantages of the two for localization and map building respectively. At the same time, after defining how the distance residuals are computed during matching, a nonlinear iterative optimization is used instead of the traditional SVD decomposition. However, the original LOAM has no IMU assistance and no loopback detection, so drift is inevitable. Along this line of thought, many algorithms such as LeGO-LOAM, LINS, LIO-Mapping, and LIO-SAM have been developed.
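The curvature-based feature selection of LOAM can be sketched as follows: each point's smoothness is computed from its neighbours along the scan line, and the sharpest and smoothest points are selected as edge and planar features respectively. The neighbourhood size and selection counts here are illustrative assumptions.

```python
import numpy as np

def loam_curvature(scan, k=5):
    """scan: (N, 3) points ordered along one scan line; returns a smoothness value per point."""
    c = np.full(len(scan), np.nan)
    for i in range(k, len(scan) - k):
        neighbours = np.vstack([scan[i - k:i], scan[i + 1:i + 1 + k]])
        diff = (neighbours - scan[i]).sum(axis=0)
        c[i] = np.linalg.norm(diff) / (2 * k * np.linalg.norm(scan[i]) + 1e-9)
    return c

scan = np.random.rand(360, 3) * 10.0              # fake scan line
c = loam_curvature(scan)
valid = np.where(~np.isnan(c))[0]
edge_points = valid[np.argsort(-c[valid])[:20]]   # largest curvature -> edge features
planar_points = valid[np.argsort(c[valid])[:20]]  # smallest curvature -> planar features
```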

SLAM algorithm combined with IMU
A single sensor cannot be applied to all scenarios: vision sensors are strongly affected by ambient light and texture, while laser sensors are generally used indoors because of the limited lidar detection range. Therefore, in order to improve the accuracy and robustness of SLAM, more and more researchers have studied cooperative sensing with multi-source sensors. An IMU is a sensor consisting of an accelerometer and a gyroscope; its relative displacement data are highly accurate over short periods, but the error grows as the travelled distance increases. Therefore, combining an IMU with traditional SLAM technology can improve localization accuracy and extend the applicable range, achieving the desired localization performance.
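The following one-dimensional example illustrates why raw IMU data are reliable only over short horizons: a small constant accelerometer bias, integrated twice, produces a position error that grows roughly quadratically with time. The bias and noise values are assumed for illustration.

```python
import numpy as np

dt, T = 0.01, 60.0
t = np.arange(0.0, T, dt)
true_acc = np.zeros_like(t)                       # the robot is actually at rest
bias = 0.02                                       # assumed accelerometer bias, m/s^2
meas_acc = true_acc + bias + np.random.normal(0.0, 0.05, size=t.shape)

vel = np.cumsum(meas_acc) * dt                    # first integration  -> velocity
pos = np.cumsum(vel) * dt                         # second integration -> position
print(f"position error after {T:.0f} s: {pos[-1]:.2f} m")   # ~0.5 * bias * T^2 = 36 m
```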

SLAM combining vision and IMU.
Many scholars have combined vision sensors with IMUs to optimize SLAM, among which VINS-Mono and MSCKF are two of the better-known algorithms [10], [11]. VINS-Mono and VINS-Mobile are monocular visual-inertial SLAM schemes proposed by HKUST in 2017 and later published in the IEEE Transactions on Robotics. VINS-Mono, a real-time SLAM framework based on a monocular visual-inertial system, is a classic in the fusion of vision and IMU, with positioning accuracy comparable to OKVIS and with better initialization and closed-loop detection than OKVIS. Later, VINS-Fusion was released, which supports multiple visual-inertial sensor types on top of VINS-Mono.
The Multi-State Constraint Kalman Filter (MSCKF) is a filter-based VIO algorithm proposed by Mourikis et al. in 2007 to improve visual odometry accuracy. By combining IMU and visual data within the EKF framework, MSCKF addresses the dimension explosion problem of EKF-SLAM. MSCKF is more resilient than a plain VO approach because it can cope with more aggressive motion, temporarily missing texture, and so on. Compared with optimization-based VIO algorithms such as VINS, MSCKF achieves comparable accuracy while being faster, so it can run on embedded platforms with limited computational resources.

SLAM combining lidar and IMU.
LIO-Mapping was published at ICRA 2019. Similar to VINS-Mono, the LIO-Mapping back end implements a tight coupling of multiple sensors based on a factor graph in an optimization framework. From an engineering point of view, this method is a combination of LOAM and VINS-Mono [12].

Analysis and summary of the current situation
This paper has introduced the importance of SLAM technology in robotics and has provided a comprehensive classification and summary, covering the history of SLAM development, the basic process, and specific algorithms. The basic process of SLAM mainly includes sensor-based information collection, preliminary mapping and positioning based on front-end odometry, back-end error optimization, loopback detection, and map construction. There are two core components in SLAM technology: the extrapolation of trajectories performed by the front-end odometry, and the global optimization and loopback detection performed in the back end; many algorithms innovate on these two parts. SLAM algorithms can be divided into two families, Lidar SLAM and Visual SLAM, according to the sensor data used. The difference in the generated data means that, although the technical core of front-end processing in both types of SLAM is the derivation of the trajectory, each has specific methods suited to its own data type. After the data are processed in the front end, there is no great difference between the two types of SLAM in the specific methods of back-end processing, but different algorithmic ideas and computational loads have also been derived. Therefore, this paper has broadly summarized the basic methods of this part for Lidar SLAM and Visual SLAM respectively. In addition, the classical algorithms of Lidar SLAM and Visual SLAM have been introduced in more detail according to their data types, complementing the summary of the basic processes in the previous sections.
Combining the analysis of the available information, as the demand for robot applications continues to increase and application scenarios become more refined, traditional single-sensor SLAM technology is being combined with other heterogeneous sensor sources, thereby improving the intelligence of robots. The authors believe that multi-source sensor co-perception will be a major development direction in the future.
The performance indicators of SLAM can be summarized into three categories: robustness, accuracy, and efficiency. In the development of SLAM so far, in addition to research on the accuracy and diversity of sensors at the hardware level, which brings more accurate and diverse data, most effort has gone into the study of algorithms. Besides the classical SLAM algorithms that already perform well on standard datasets, many recent algorithms target these three performance metrics in more diverse ways. For example, DM-VIO, GVINS, EDPLVO, and the CT-ICP algorithm, which is ranked first on the KITTI dataset, have been studied for computational efficiency, degraded environments, or mapping and localization accuracy, in order to address the application to ground mobile robots more comprehensively [13][14][15][16]. Several remaining problems and development directions are discussed below.

Pose estimation in degradation scenarios
Degradation of localization is mainly caused by a reduction of constraints: in open areas or areas with a single dominant direction, such as open squares or long tunnels, it is difficult to find enough feature points for pose estimation. The sensors collect less data or highly redundant information, so some useful information is lost and feature extraction becomes difficult. This problem affects the robustness of the mobile robot and determines whether it can be deployed in more complex environments to expand its capabilities.

Feature learning combined with deep learning
The most widely used recognition algorithms in computer vision today are deep learning algorithms, which achieve higher recognition accuracy than traditional recognition methods. In addition, deep learning can help SLAM incorporate semantic information, improving the service capability of robots and the intelligence of human-robot interaction.

Real-time performance on processors with low computing power
Reducing power consumption has always been a very important aspect of SLAM in engineering implementations and has a large impact on the real-time performance and portability of mobile robots. Computing devices that can easily be carried by mobile robots are usually small and have low computing power, which reduces part of the cost, but low computing power in turn leads to insufficient real-time performance. Therefore, a trade-off between the two is still needed, and no solution currently achieves both.

SLAM based on other sensor data
Different sensors are usually selected for different robot application scenarios, and the current engineering mainstream is still the combination of vision sensors and an IMU. Although the characteristics of the two sensors complement each other to a certain extent and greatly improve the performance of SLAM algorithms, the problems of texture, lighting, and so on have not been completely solved, and data fusion also has its own problems. Therefore, the diversity and accuracy of sensors, as well as the fusion of multi-sensor data, still need to be studied in depth. For example, depth, light-field, and event cameras, magnetic field sensors, and thermal sensors can provide richer sensor data for SLAM algorithms.
