Research advances in SLAM for dynamic environments

Simultaneous Localization and Mapping (SLAM) has long been a popular topic in the computer vision community, as it seeks to estimate the location of agents and use sensors to perceive their surroundings in order to construct maps and perform navigation. Most SLAM algorithms assume that objects in the environment are static or slowly moving, whereas SLAM in a dynamic world remains an open question. When a 3D point cloud map is constructed by sequentially accumulating scanning data, dynamic objects typically leave undesirable traces in the map. These traces act as phantom impediments, hindering the positioning and navigation performance of mobile vehicles. In this work, we review the most recent advances in SLAM for dynamic environments. Specifically, this paper first examines the three most significant negative effects that a dynamic environment has on SLAM. Then, solutions are presented for objects of various dynamic degrees, covering the design concept, basic framework, advantages, and disadvantages of each. We conclude by discussing the open research difficulties in dynamic SLAM and its projected development path.


Introduction
Simultaneous Localization and Mapping (SLAM) is a prominent topic in the computer vision community, since it seeks to estimate the location of agents and use sensors to perceive their surroundings in order to construct maps and perform navigation. At present, SLAM is widely employed in robotics, unmanned aerial vehicles, autonomous driving, and route planning, and therefore has significant academic and practical relevance.
Classic SLAM frameworks consist primarily of sensor data reading, visual odometry, back-end optimization, loop closing, and mapping. Reading sensor information mainly covers the acquisition and preprocessing of camera image data; if the robot is so equipped, it also includes reading and synchronising information such as wheel encoders and inertial sensors. The task of visual odometry, commonly known as the front end, is to estimate the camera's motion between successive images and the appearance of the local map. The back end receives the camera poses measured by the visual odometry at various times, together with loop-closure detection information, and optimises them to produce a globally consistent trajectory and map; because it sits behind the visual odometry, it is also referred to as the rear end. Loop-closure detection determines whether the robot has returned to a previously visited position; if a loop is found, the information is passed to the back end for processing. The map is then built from the estimated trajectory. The majority of existing visual SLAM algorithms assume that the objects in the environment are static or slowly moving, which imposes stringent limitations on the application environment and restricts the applicability of visual SLAM systems in real scenes. When moving objects are present, such as walking people, the system receives inaccurate observations, lowering its accuracy and robustness. How to improve the performance of SLAM systems in dynamic environments remains an unresolved question.
Because scanning data represents snapshots of the surrounding environment, scans of a dynamic scene inevitably contain depictions of dynamic objects. Existing SLAM algorithms typically divide the dynamic SLAM problem into scan-to-scan matching and scan-to-map refinement. This paper first analyses the drawbacks of a dynamic environment for SLAM from three perspectives: front-end registration, mapping, and localization against an a priori map, with a focus on the two stages described above. Second, based on the dynamic degree of the objects in the scene, we describe the available representative solutions in depth. Finally, we summarise the obstacles that various dynamic objects pose to SLAM and propose potential future solutions.

Effect analysis of dynamic SLAM
The numerous SLAM frameworks, methods, and parameter choices are mostly usable only in specific situations, and in a dynamic setting the robustness of a SLAM architecture is severely tested. Before we can address the "dynamic environment SLAM problem", we must first understand: what negative effects does a dynamic environment have on SLAM that require "fixing"? The undesirable effects of dynamic objects on a SLAM system fall primarily into three levels.
(1) Scan-to-scan matching. Every point cloud registration method [1] (point-to-point, point-to-feature, point-to-grid, NDT) is based on the static-world assumption. In theory, dynamic points degrade the precision of registration; in practice, a significant fraction of dynamic points results in a measurable loss of trajectory accuracy. At this level, dynamic points can only be detected and removed in real time, before or during registration [2]. As for identification, the conventional approach is to discard correspondences that are too distant during the registration iterations, while the increasingly popular approach is to recognise dynamic objects directly from the point cloud using deep learning.
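As a toy illustration of the conventional route, the sketch below runs one association step of a point-to-point alignment and simply gates out correspondences whose distance exceeds a threshold before estimating the motion. The function name, threshold, and data are illustrative, not taken from any cited system.

```python
import numpy as np

def gated_translation_step(src, dst, max_dist=0.5):
    """One association step of a point-to-point alignment: drop
    correspondences farther than max_dist (treated as likely dynamic)
    before estimating the translation from the survivors."""
    # Brute-force nearest-neighbour association for clarity.
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    nn = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(src)), nn])
    keep = dist <= max_dist          # gate out far (dynamic) matches
    # Closed-form translation from the remaining static matches.
    t = (dst[nn[keep]] - src[keep]).mean(axis=0)
    return t, keep

# Three static points shifted by (0.1, 0) plus one fast-moving outlier.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
dst = np.array([[0.1, 0.0], [1.1, 0.0], [0.1, 1.0], [8.0, 8.0]])
t, keep = gated_translation_step(src, dst)
print(t, keep)   # translation recovered from the static matches only
```

A real system would iterate this step and estimate a full rigid transform, but the principle is the same: the gated correspondences never contaminate the motion estimate.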
(2) Scan-to-map refinement. Even if the interference of dynamic objects with registration is limited and does not affect trajectory accuracy, the final map will be full of "ghosts" left by a large number of dynamic objects (as depicted in Fig. 1), which harms subsequent map-based positioning and feasible-area (path) planning. Eliminating this interference during SLAM mapping and filtering the dynamic objects out of the SLAM map are therefore crucial to overcoming the effects of a dynamic environment. (3) Localization against an a priori map (the localization problem in a dynamic environment). This level concerns autonomous operation or automatic driving realized by matching the current observation against a previously built map. It is a problem that must be solved, whether in robotics or in the capital-intensive autonomous driving industry: an unmanned sweeper working in a shopping mall, or an autonomous vehicle driving on an urban road, will suffer severe dynamic interference that inevitably affects positioning.

Existing solutions for dynamic SLAM
Nowadays, improving robustness in various environments and adaptive parameter adjustment are the main research directions for SLAM in dynamic environments. According to their "dynamic degree", the objects in an environment can be divided into four categories: (1) High dynamic objects: objects moving in real time, such as pedestrians, vehicles, and running pets.
(2) Low dynamic objects: objects that stay still only for a short time, such as people standing on the roadside talking briefly.
(3) Semi-static objects: objects that do not move within one SLAM cycle but do not stay put forever, such as vehicles in a parking lot, stacked materials, temporary sheds, temporary fences, and temporary stages in a mall.
(4) Static objects: objects that never move, such as buildings, roads, curbs, and traffic signal poles. Apart from static objects, the three other classes differ in their dynamic properties, and so do the coping mechanisms at the scan-to-scan matching and scan-to-map refinement stages: online real-time filtering for high dynamic objects; post-processing filtering, applied after a SLAM run is complete, for low dynamic objects; and lifelong SLAM (or long-term SLAM) for semi-static objects. Each of these modes is compatible with the modes above it. We believe the following four sorts of techniques can be employed to overcome the challenges at the scan-to-scan matching and scan-to-map refinement levels. Below, we discuss the primary challenges of dealing with dynamic objects using current mainstream methods and list classic articles for readers interested in pursuing further research in the relevant subjects.

Scan-to-scan match.
High dynamic objects affect this stage. Such objects can be filtered out in real time, before or during front-end registration, by traditional or learning-based methods, ensuring that the final registration result is based on a static point cloud. The key difficulty is that the filtering must be very fast, ideally taking milliseconds or being negligible. Traditional methods usually separate dynamic point clouds by inter-frame comparison, so their accuracy and runtime are often unsatisfactory. Learning-based methods train an end-to-end deep network that identifies dynamic points in the preprocessing phase, so by the time the front end runs, the point cloud is already clean. The problem with learning-based methods, however, is that they can only identify the types of dynamic objects they were trained on and are powerless against anything else; they also require GPUs, which increases cost.
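As a minimal sketch of the learning-based route, suppose a pretrained segmentation network has already labelled every point; the front end then only has to drop points whose label belongs to a trained dynamic class before registration. The label set and helper below are hypothetical stand-ins for a real network's output, and they also illustrate the stated limitation: an untrained dynamic class simply passes through.

```python
import numpy as np

# Illustrative label set: only classes the (hypothetical) network was
# trained on can ever be removed.
DYNAMIC_CLASSES = {"person", "car", "cyclist"}

def strip_dynamic(points, labels):
    """Drop points labelled as a trained dynamic class, so that
    registration only ever sees the remaining (presumed static) cloud.
    In a real system the labels would come from a segmentation network;
    here they are given directly."""
    keep = np.array([lbl not in DYNAMIC_CLASSES for lbl in labels])
    return points[keep]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
lbl = ["building", "person", "road"]
print(strip_dynamic(pts, lbl))   # the "person" point is removed
```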
Part 1: Chenglong Qian proposed RF-LIO [3], a dynamic SLAM framework based on LIO-SAM, in 2022. The front-end LIO is responsible for rough registration (providing initial values), and the back end performs scan-to-submap fine registration. During the fine registration iterations, dynamic points in the submap are continually detected and removed based on the initial values and multi-resolution depth maps, so that the final fine registration is performed against a "static submap".
Part 2: At ICRA 2021, the renowned ASL laboratory of the Swiss Federal Institute of Technology (ETH Zurich) released an article [4] on dynamic-object-aware LiDAR SLAM trained on automatically generated data. The authors note that most previous techniques recognise actual motion or identify dynamic objects by their appearance: for instance, one can assume that the dynamic elements of a city scene are pedestrians, cyclists, and cars, and then use a deep-learning-based semantic segmentation method to find these object types in a point cloud. Such approaches, however, rely on manually labelled training data and are limited to the available data sets. Instead, the authors perform real-time 3D dynamic object detection with deep learning (a 3D-MiniNet network), with training data generated automatically by a novel occupancy-grid-based pipeline they propose, and then feed the filtered point cloud to LOAM for conventional laser SLAM. The complete pipeline is shown in Figure 2.

Scan-to-map refinement
For high dynamic objects, filtering can also be applied at the scan-to-map refinement stage. With synchronous filtering during the SLAM process, the real-time requirement is relatively relaxed: the work can proceed at leisure in the back end, as long as the keyframes inserted into the map are processed in a timely manner. The advantage of this strategy is that dynamic object filtering and SLAM run synchronously, with no extra time cost; if the required filtering rate is not very high, this is a practical method. After all, users do not want to wait a while after mapping to see the map (in contrast to the post-processing methods below). For example, David Yoon's work [5], which in 2019 proposed a LiDAR-based, mapless online detection scheme for dynamic objects, takes one frame before and one after each query scan as reference frames. A potential dynamic point is checked against the following reference frame: if it is passed through by a laser beam of that frame, it is confirmed as a dynamic point. Confirmed dynamic points are then used as seed points for clustering in the query scan to obtain dynamic clusters. "Mapless" means that only two reference frames are needed, rather than building local maps or submaps.
Low dynamic objects can also be handled during the scan-to-map refinement stage. Here they are filtered by post-processing after SLAM is complete, so information from all frames in the entire SLAM cycle can be consulted. This makes it easier to compare objects that are only temporarily stationary, and it helps with high dynamic objects as well. The post-processing technique can filter dynamic objects more precisely and completely; the price is that the map is only available after an additional wait. When the requirements on filtering quality are stringent, post-processing is the optimal choice: in contrast to the real-time option, it prioritises the precision and sufficiency of dynamic point cloud filtering and should eliminate everything without exception. Since the approach is not time-sensitive, it can use all frames of the whole SLAM cycle as reference information to locate dynamic points. Common post-processing methods for dynamic object filtering can be categorised as segmentation-based, ray-casting-based, or visibility-based [6].

Segmentation based methods.
Segmentation-based methods are usually built on clustering. For example, Litomisky et al. [7] distinguish dynamic clusters from static ones based on the viewpoint feature histogram (VFH) under a given perspective, while Yin et al. [8] argue that points with large matching errors in the registration of adjacent frames are likely to be dynamic. Among segmentation-based methods, deep-learning-based semantic segmentation deserves special mention: it directly labels which points belong to dynamic objects, so the mapping algorithm only needs to discard those points, which is simple and crude. However, a deep learning method can only segment the dynamic categories it has been trained on, and can do nothing about other dynamic objects.
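A toy version of the cluster-level idea (in the spirit of, but not taken from, [8]) could vote per cluster on the frame-to-frame matching residuals: if most points of a cluster moved, flag the whole cluster as dynamic. Cluster ids are assumed to come from a separate Euclidean clustering step; all names and thresholds are illustrative.

```python
import numpy as np

def flag_dynamic_clusters(cluster_ids, residuals, thresh=0.3):
    """Per-cluster vote: if most points of a precomputed cluster have a
    large frame-to-frame matching residual, flag the whole cluster as
    dynamic, so a partially matched pedestrian is removed entirely."""
    dynamic = np.zeros(len(cluster_ids), dtype=bool)
    for cid in np.unique(cluster_ids):
        mask = cluster_ids == cid
        if (residuals[mask] > thresh).mean() > 0.5:   # majority vote
            dynamic[mask] = True
    return dynamic

# Cluster 0 barely moved (a wall); cluster 1 moved a lot (a pedestrian).
ids = np.array([0, 0, 0, 1, 1])
res = np.array([0.05, 0.10, 0.40, 1.20, 0.90])
print(flag_dynamic_clusters(ids, res))
```

Voting at cluster level rather than per point is what makes the removal clean: isolated noisy residuals in a static cluster are outvoted, while a genuinely moving cluster is removed whole.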

Ray casting based methods.
Ray-casting-based methods are typically implemented in combination with occupancy grids. The basic principle: for the grid cell hit by a laser point, increment its hit count; for each cell crossed by the laser beam, increment its miss count; then compute the occupancy probability of each cell from its hits and misses. If the occupancy probability falls below a threshold, all points in that cell are erased. This exploits the fact that a dynamic point occupies a cell only temporarily, after which the cell is mostly "missed". The disadvantage is that the method consumes considerable computing resources. In practice, Cartographer applies this strategy; judging by the actual effect, it can broadly filter out dynamic points, but the filtering is not accurate enough and static points are often erased by mistake. In 2018, Johannes Schauer proposed a method [9] to delete dynamic objects from 3D point cloud data by traversing a voxel occupancy grid. The article proposes a grid-based (ray-casting-based) dynamic filtering method that judges whether a cell is dynamic from its hits and misses, and puts forward a series of strategies and tricks to avoid the common problems of such methods, such as excessive runtime and false or missed removals, which is of general significance for grid-based methods. The refinement is visualized in Fig. 3.
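The hit/miss bookkeeping described above can be sketched as follows. The 2-D grid, coarse sampled ray traversal, resolution, and thresholds are simplifications for illustration; a real system would use a 3-D voxel grid and exact ray traversal.

```python
import numpy as np
from collections import defaultdict

CELL = 0.5  # grid resolution in metres (illustrative)

def cell_of(p):
    """Map a 2-D point to its grid cell index."""
    return (int(np.floor(p[0] / CELL)), int(np.floor(p[1] / CELL)))

def integrate_scan(hits, misses, origin, endpoints, step=0.1):
    """Ray-cast each beam: cells along the beam collect misses, the
    endpoint cell collects a hit (coarse sampled traversal)."""
    for end in endpoints:
        ray = end - origin
        length = np.linalg.norm(ray)
        for i in range(max(1, int(length / step))):
            c = cell_of(origin + ray * (i * step / length))
            if c != cell_of(end):      # do not "miss" the endpoint cell
                misses[c] += 1
        hits[cell_of(end)] += 1

def static_points(points, hits, misses, p_min=0.5):
    """Keep only points whose cell's occupancy probability is high."""
    out = [p for p in points
           if hits[cell_of(p)] >= p_min * max(hits[cell_of(p)] + misses[cell_of(p)], 1)]
    return np.array(out)

hits, misses = defaultdict(int), defaultdict(int)
origin = np.array([0.0, 0.0])
# Scan 1: a passing object at (1, 0) returns the beam early.
integrate_scan(hits, misses, origin, np.array([[1.0, 0.0]]))
# Scans 2-5: the object is gone; the beam reaches the wall at (2, 0).
for _ in range(4):
    integrate_scan(hits, misses, origin, np.array([[2.0, 0.0]]))
points = np.array([[1.0, 0.0], [2.0, 0.0]])
print(static_points(points, hits, misses))   # only the wall point survives
```

The passing object's cell is hit once but then crossed by twenty later beams, so its occupancy probability collapses and its point is erased, while the wall cell keeps a probability of one.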

Visibility based methods.
The basic assumption of visibility-based methods is that if the optical path of a laser point passes through another laser point, the latter is a dynamic point. The assumption is perfectly logical, but two problems arise in practice. First, static points are erased by mistake when the incidence angle approaches 90 degrees: as shown in Figure 4, the visibility assumption is ambiguous with respect to the incidence angle, and the longer the range measured to the ground, the more blurred the incidence angle of a point becomes. Once the angular error, ranging error, and beam-spot effects of a real laser point cloud are taken into account, this mis-removal becomes more serious. Second, there is the occlusion problem: some large dynamic objects completely block the LiDAR's line of sight, so the LiDAR never sees the static objects behind them, which means those dynamic points will never be passed through by new laser beams and can never be filtered out. In 2021, Hyungtae Lim [6] proposed a new static map construction method called ERASOR. Although it does not belong to the three categories above, it is still instructive. Its basic framework is shown in Figure 5.
Its basic idea is to carry out conventional SLAM mapping in step 1 to obtain a point cloud map. Step 2 divides the query scan and the submap near it into sector bins at the same locations. Steps 3 and 4 compare the bins of the query scan with the bins of the submap and, based on differences in the point cloud distribution within each bin, remove the dynamic points from the submap bins. Its theory, however, relies on strict assumptions: all dynamic objects must be located on the ground plane, with heights between -1 and 3 metres. Compared with other schemes, this method removes most dynamic points while preserving static ones, and it has significant advantages in running speed.
Semi-static objects can also be handled at the scan-to-map refinement stage. Here, the lifelong SLAM (or long-term SLAM) approach can be used to determine whether such objects have changed between successive SLAM rounds and to update the environment map accordingly. In practice, the core technology of lifelong SLAM consists of two components: (1) track association, which addresses the spatial association of SLAM tracks across multiple rounds; and (2) map fusion, which merges the maps of many tracks and involves both the filtering of dynamic objects and the updating of semi-static ones. Consequently, the filtering techniques for high and low dynamic objects can be used as components of lifelong SLAM. Lifelong SLAM can be fully implemented during the localization operation phase, and for industrial deployment we believe it must be: SLAM mapping serves the aim of positioning, so it is fair and sensible to employ lifelong SLAM to update the map while positioning. The core issue of lifelong mapping goes well beyond dynamic/semi-static object filtering: in the lifelong process, dynamic/semi-static object filtering is only a component of map fusion between distinct sessions, and map fusion is only a component of lifelong mapping. Nonetheless, this gives a general picture of how dynamic/semi-static objects are filtered in lifelong mapping.
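Returning to ERASOR's bin-wise comparison, a heavily simplified version of its scan-ratio test might look like the following. The height arrays stand in for the pseudo-occupancy ERASOR computes per sector bin, and the ground-retention step is omitted; names and the threshold are illustrative.

```python
import numpy as np

def scan_ratio_dynamic_bins(map_heights, scan_heights, low=0.2):
    """Simplified scan-ratio test in the spirit of ERASOR: for each
    sector bin, compare the occupied height seen in the query scan with
    the height recorded in the submap.  A small ratio means the volume
    the map claims is occupied is now empty, so the map points in that
    bin are dynamic candidates."""
    map_heights = np.asarray(map_heights, dtype=float)
    scan_heights = np.asarray(scan_heights, dtype=float)
    ratio = scan_heights / np.maximum(map_heights, 1e-6)
    return (map_heights > 0) & (ratio < low)

# Bin 0: a pedestrian trace in the map, gone in the scan -> dynamic.
# Bin 1: empty in both scans.  Bin 2: a wall present in both -> static.
print(scan_ratio_dynamic_bins([1.7, 0.0, 2.5], [0.0, 0.0, 2.5]))
```

Because the comparison is a per-bin scalar test rather than per-point ray casting, this style of check is cheap, which matches the speed advantage noted above.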
In 2022, Giseop Kim proposed LT-mapper [10], a novel modular LiDAR-based architecture for long-term (lifelong) mapping. LT-mapper is the first open modular framework to provide LiDAR-based lifelong mapping in urban environments with complex topography. It emphasises map creation and maintenance and lacks a specialised positioning function. Ideally, this architecture could become the central engine of various applications for urban map administration and spatial comprehension, enabling the maintenance of real-time maps and meta-maps.
In 2021, Zhao Ming's team proposed a universal framework [11] for lifelong localization in changing environments, developed on top of Google Cartographer. The paper points out that the environment of most scenarios is always changing, and pre-built maps that ignore these changes quickly become outdated. The authors therefore propose a general framework for lifelong localization and mapping: by using multiple localization sessions and map-updating strategies, the method tracks scene changes and keeps the map up to date. The core of the work is: 1. how to judge which old submaps should be removed, and how to retain the graph connectivity information of a removed subgraph; 2. how to maintain a sparse and effective pose graph; 3. the structural design. Although the system is 2D, its techniques for lifelong processing are generic.
At the mapping level, the processing methods for high dynamic, low dynamic, and semi-static objects are upward compatible. Compared with common single-session SLAM, lifelong SLAM is more systematic work; above all, once lifelong SLAM is realized, mapping and positioning are no longer separated, which we consider a very valuable direction. As mentioned above, if the data preprocessing stage has already fully handled dynamic objects (as in the learning-based methods), the mapping level may not need extra effort on dynamic filtering.

Conclusion
Simultaneous Localization and Mapping has long been a hotspot in the computer vision community; it aims to estimate the location of agents and use sensors to perceive their surroundings in order to build maps and achieve navigation. In this paper, we have reviewed recent research on dynamic SLAM. Specifically, the article first analysed the three main adverse effects that a dynamic environment brings to SLAM. Solutions were then introduced for different dynamic objects according to their dynamic degree, covering the design idea, basic framework, advantages, and disadvantages of each. Finally, we summarised the open research problems in the field of dynamic SLAM and discussed its future development directions.

Figure 1 .
Figure 1. Visualization of ghosts in the map.

Figure 2 .
Figure 2. The full pipeline of the generic dynamic-object-aware LiDAR SLAM.

Figure 3 .
Figure 3. Illustration of the refinement effect: after non-static points (magenta, left) are identified, they are removed without artefacts (right).

Figure 4 .
Figure 4. Limitations caused by the ambiguity of the incidence angle.

Figure 5 .
Figure 5. The basic framework proposed by ERASOR.