Ground Interaction Models for Increased Autonomy of Planetary Exploration Systems

Future planetary exploration robots need increased autonomy to improve mission safety and efficiency. The presented concept achieves this by learning and using ground interaction models, which allow a more precise modelling of a robot's mobility on different terrains. The idea is that a precise prediction of the expected performance will, on the one hand, allow an early detection of changed conditions and, on the other hand, enable a system to react to them appropriately. By classifying the traversed terrain, a robot gains the possibility to replan its path or to change its locomotion behavior to eventually optimize mission success. The paper provides details on how the ground interaction models are trained, how the required data is collected, and how they are embedded into a physical simulator for online use on the system.


Introduction
Planetary exploration of celestial bodies, such as the Moon and Mars, using robotic systems has been common practice for decades. However, to reach scientifically interesting places such as craters and sub-surface lava tubes, robots need high mobility capabilities to traverse sandy and rugged terrain, to avoid or overcome obstacles, and to cope with different slopes. Manifold locomotion concepts exist, e.g., wheeled rovers, legged walking systems, or hybrids that use legged wheels or wheeled legs. They all have their pros and cons, and thus are better or less suited for different terrains. During the exploration of unknown areas, regardless of the type of locomotion, the robots need to be able to autonomously detect the traversed terrain and to identify whether it is harmful or not. To guarantee mission success, an intelligent system should autonomously navigate around risky environments or, if possible, adapt its locomotion pattern accordingly.
The presented approach to predicting a robot's performance on the actual terrain is to learn and use a meaningful model of robot-ground interaction, the so-called Ground Interaction Model (GIM). It allows, on the one hand, the realistic simulation of any robot movement on different surfaces by predicting the mobility characteristics as precisely as possible in advance. On the other hand, having a digital twin of the real counterpart allows a continuous comparison with the actually measured sensor values in order to detect anomalous behavior. Since the real conditions (gravity, soil composition) cannot be exactly reproduced on Earth, the robots need to be able to learn the GIMs online during their traversal from their own sensor readings.

To implement the use of GIMs, the following methodology is applied (Figure 1):
(i) Big data evaluation of different mobile systems on several terrain types: For a generic approach to learning ground interaction models, six morphologically different robotic systems developed for planetary exploration are used. By collecting performance data on plain ground, loose soil, rubble, and lava floor, a quantitative comparison of varying walking and roving systems of variable sizes and locomotion principles is done, and various GIMs are learnt.
(ii) Application of machine learning methods to generate GIMs: Machine learning approaches are used to learn efficient models that predict the ground interaction of the respective wheel or leg contacts and thus improve the prediction of the overall robot behavior. For collecting data, the default locomotion behavior is used, but the usage of specific probing behaviors that maximize the information gain is also learnt and analyzed.
(iii) GIM integration in a physics simulator: The learned GIMs are then integrated into a real-time physics simulator to improve the simulation of the holistic robot behavior by reducing the simulation reality gap.
(iv) GIM usage to increase robot autonomy: Using GIMs in an internal simulation online on the system to exactly predict the nominal robot performance and to compare this to the actual sensor readings makes it possible to continuously determine soil properties and to detect non-nominal conditions, which can then be handled through path planning or behavioral adaptation.
The following sections describe the individual steps in more detail. At the end, a summary is provided and an outlook on future activities is given.

Experiment Design and Data Collection

Experimental Setup
In order to collect a wide range of data, six robots with different locomotion styles are used (Figure 2). In the following, each robot is briefly described; more detailed information about each robot can be found on the website1:
• Coyote III: A light-weight scout rover with four star wheels and skid steering.
• Artemis: A rover with six tweels and a triple-bogie suspension that allows omni-directional mobility on unstructured terrains.
• SherpaTT: A rover with four wheels and an active suspension system which allows versatile driving and walking locomotion modes.
• Crex: A light-weight torque-controlled six-legged walking robot.
• Mantis: A walking robot with six extremities that can walk in a four-, five-, or six-legged posture.
• Charlie: A walking robot with four legs and adaptable multi-point contact feet.
To expose the pros and cons of the different locomotion concepts, each of the six robots drives or walks over a test track composed of two solid and two loose materials. A simple plain wooden floor and a rough lava stone terrain with sharp edges are selected as solid surface types. The loose surface types are represented by a gravel field and fine sand of < 1 mm granularity.
The four materials are lined up (Figure 3a), where each field covers an area of 3 m × 3 m, leaving space for the biggest robot used to perform various maneuvers on the test track. The test track is set up indoors in the multi-functional hall at DFKI, where a motion tracking system covers the entire track to collect additional reference data during the experiments. In order to enable repeatable experiments, a leveling mechanism for the fine sand is being developed. Besides flattening the surface, it also loosens the grains to maintain a typical sinking characteristic.

Data Storage Infrastructure
To unify the processing of data from the different robot systems, we transfer the log data recorded by each robot into a database architecture. At the core of the data storage infrastructure (Figure 3b) is a relational time series database (R&Ts DB) based on Timescale2. It is used to store time series data such as sensor data streams, which make up the majority of the data to be stored. The underlying database layout is designed to capture additional relational data in order to simplify the work with sensor data streams. This includes, for example, metadata describing the particular robotic system or information about the experiments performed, including the parameters and the results of the experiments' evaluations. To support data processing for machine learning, the database also allows the acquisition of annotations to mark specific time ranges, e.g., to mark the occurrence of certain phenomena for data set assembly.

Relational databases, however, are not particularly well suited for storing some types of data, such as image, video, or point cloud data. Even though most database systems provide a binary data type for this case (e.g., BLOB, Binary Large Object), we use a dual storage infrastructure instead. For this purpose, an additional file server is operated, on which the data is stored in the respective established file formats and referenced in the R&Ts DB. The R&Ts DB then only contains timestamps, associated metadata, and a reference to the respective files in the Object DB. In the presented architecture, established technologies are used, each of which provides its own interfaces and API for accessing the contained data.
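The dual-storage pattern can be sketched as follows. All table and column names are illustrative assumptions, and SQLite stands in for the TimescaleDB backend; the point is only how binary data stays on the file server while the relational database keeps a timestamped reference:

```python
import sqlite3  # stand-in for the TimescaleDB-based R&Ts DB described above

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor_stream (ts REAL NOT NULL, robot TEXT, channel TEXT, value REAL);
CREATE TABLE blob_ref     (ts REAL NOT NULL, robot TEXT, kind TEXT, file_path TEXT);
CREATE TABLE annotation   (t_start REAL, t_end REAL, label TEXT);
""")

# Plain time series data lives directly in the relational database ...
conn.execute("INSERT INTO sensor_stream VALUES (0.01, 'CoyoteIII', 'wheel_torque_fl', 1.23)")
# ... while point clouds etc. stay on the file server and are only referenced.
conn.execute("INSERT INTO blob_ref VALUES (0.01, 'CoyoteIII', 'pointcloud', '/fileserver/run42/cloud_0001.pcd')")
# Annotations mark time ranges, e.g. for data set assembly.
conn.execute("INSERT INTO annotation VALUES (0.0, 5.0, 'sand_entry')")

# Retrieving all point cloud files within an annotated time range:
rows = conn.execute("SELECT file_path FROM blob_ref WHERE ts BETWEEN 0.0 AND 5.0").fetchall()
print(rows)  # [('/fileserver/run42/cloud_0001.pcd',)]
```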

Extending Physical Simulator with Ground Interaction Models
Physical simulators for robotic systems are most often based on rigid body simulations. This is also true for the robot simulation MARS3, which is used in the scope of this work. MARS is based on the Open Dynamics Engine (ODE), a rigid body simulation library. While the simulation concept is designed for hard contacts, it is possible to simulate soft contacts through parameters that are originally designed to handle simulation errors occurring due to discretized simulation steps. The two main parameters that define the softness of a contact are the constraint force mixing parameter (cfm) and the error reduction parameter (erp). Both can be calculated from default spring-damper parameters in relation to the simulation step size.
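The relation between a spring-damper contact and the two softness parameters follows the standard ODE conversion; a minimal sketch with illustrative values:

```python
def soft_contact_params(kp, kd, h):
    """Convert spring stiffness kp and damping kd of a soft contact into
    ODE's error reduction (erp) and constraint force mixing (cfm)
    parameters for a simulation step size h (standard ODE relations)."""
    erp = h * kp / (h * kp + kd)
    cfm = 1.0 / (h * kp + kd)
    return erp, cfm

# Example: a moderately stiff contact at a 10 ms simulation step.
erp, cfm = soft_contact_params(kp=1e5, kd=1e3, h=0.01)
print(erp, cfm)  # 0.5 0.0005
```

A stiffer spring (larger kp) drives erp toward 1 and cfm toward 0, i.e., toward a hard contact, which matches the intuition that the soft-contact behavior emerges from small erp/non-zero cfm values.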
To integrate the GIMs into the physical simulator, two methods can be used. In the first method, the GIMs completely take over the contact force calculation [9]. However, using this approach to predict the contact force directly would eliminate the contact handling from the constraint-based physics calculation of each simulation step and could introduce instability into the contact handling. An alternative is to let the GIMs calculate the cfm, erp, and friction parameters for every simulation step depending on the current state [12]. In this approach, the GIM processes the current contact depth and contact load, given by the simulation, together with a variance parameter that could be randomly generated and stored in a grid-based overlay or a specific texture of the surface geometry. Together with the variance parameter, the footprint depth also has to be stored in the overlay. As long as the calculated contact depth with the surface is below the footprint depth, the contact is ignored by the simulation. The variance parameter can be interpolated from the parameters of the neighboring cells depending on where exactly the contact is within a cell. Figure 4a depicts the grid-based contact management in simulation.
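A hypothetical sketch of that overlay logic, with grid resolution and cell contents assumed (boundary handling omitted for brevity): contacts shallower than the stored footprint depth are ignored, otherwise the variance is interpolated between neighboring cells.

```python
import numpy as np

CELL_SIZE = 0.05                              # m, assumed grid resolution
footprint = np.zeros((64, 64))                # footprint depth left by earlier contacts
variance = np.random.default_rng(0).uniform(0.0, 1.0, (64, 64))  # per-cell variance

def contact_query(x, y, depth):
    """Return None if the contact is shallower than the stored footprint depth
    (the simulation ignores it), else the bilinearly interpolated variance."""
    i, j = int(x / CELL_SIZE), int(y / CELL_SIZE)
    if depth <= footprint[i, j]:
        return None                           # still inside an old footprint
    # Interpolate the variance from the neighboring cells depending on
    # where exactly the contact lies within the cell.
    fx, fy = x / CELL_SIZE - i, y / CELL_SIZE - j
    return ((1 - fx) * (1 - fy) * variance[i, j]
            + fx * (1 - fy) * variance[i + 1, j]
            + (1 - fx) * fy * variance[i, j + 1]
            + fx * fy * variance[i + 1, j + 1])
```

The returned variance would then be one of the inputs the GIM uses, together with contact depth and load, to compute cfm, erp, and friction for that step.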
The outputs of a GIM are the cfm and erp parameters and a friction coefficient, while from the simulation perspective the GIM itself can be a black box. The corresponding layout is shown in Figure 4b. In [16], the authors developed a contact model for granular soil, based on a convex wrench space learned from data, and used it to predict whether a foothold in this soil sticks or slips. This depends on penetration depth and orientation, which we also consider as inputs to the GIM. Furthermore, the simulation needs a GIM manager that selects the correct GIM for a combination of robot model and surface type or parameters. This information is stored with the robot model to keep the simulation as generic as possible.

Learning Ground Interaction Models
Learning GIMs is essential to raise the autonomy level of the robots during planetary exploration missions. From the experimental data collection with six different robot systems (Section 2.2), three learning modules are adopted as presented in Figure 3b: ground classification (Section 5), optimization of GIMs (Section 4.1), and learning of probing behaviors (Section 4.2).

Optimization of Ground Interaction Models
There are several possibilities to optimize the simulated counterpart of a real robot. Parameter optimization based on matching the robot trajectories in the simulation and the real-world environment is proposed in [14]. As proposed in [7], a simplified way to improve a GIM is the direct estimation of friction parameter distributions. [12] proposes a neural network to predict simulation parameters, where the weights are optimized to match simulated and real footprint state variances.
As part of learning a walking policy with Reinforcement Learning (RL), [8] uses supervised action transformation functions (learned from a set of real-world trajectories) that are passed to the simulator with the intention of recreating the same transition state that would occur on the real robot. In [10], the action transformation function is itself treated as a policy, and thus, the optimization of the simulation is formulated as an RL problem. To mimic the dynamics of the real world, Imitation Learning from Observation (IfO) is proposed in [4] to learn a modified action by only observing the state trajectories from reality.

Figure 5: Overview of the integration of the machine learning modules (green): the real-time simulation on board (blue) is extended by the learned GIMs (orange). The GIMs' parameters are optimized to reduce the simulation reality gap and enable continuous behavior learning by the determination of soil properties and anomaly detection. During the robot's locomotion, the robot's sensor data is recorded and the performance features are extracted to supply the learning modules.
We propose adopting the IfO approach for improving GIM parameters, possibly including action transformation, by imitating the observations of performance features collected from the real robot's ground interaction. The output parameters of the GIMs (simulation parameters) are used to improve the simulation's prediction of performance features. This prediction is compared to the robot's performance collected during the experiments, and the difference is fed to an RL algorithm that minimizes it. The goal is to find the contact parameters of the simulated surface that best reproduce the overall robot behavior. In the simulator environment, the GIM as policy π_ϕ(a_t|s_t) is improved by a deep RL algorithm during N interactions as the expected return is maximized, i.e., the simulation reality gap is minimized. In each step, given a current state s_t (e.g., contact depth, contact load, surface variance) in the simulator, the actions are the simulation parameters (cfm, erp, and friction parameters) that produce an effect in the simulator environment s_{t+1}. As a reward signal, the error between an observed state from the set of real-world states τ_real and the simulated data is used.
However, we do not propose minimizing the error between trajectories directly, because reactive locomotion behaviors might cause different references on joint level. Instead, we aim to minimize the gap in high-level features, i.e., performance features as presented in [5], such as velocity, undesired vibrations, or energy consumption.
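The optimization loop could be sketched as follows. Everything here is an illustrative stub: the simulator rollout, the performance features, and a random-perturbation search standing in for the deep RL policy update; only the structure (simulate, compare high-level features, shrink the gap) reflects the approach described above.

```python
import random

def simulate_features(params):
    """Stub for a simulator rollout: maps GIM outputs (cfm, erp, friction)
    to predicted performance features (here a toy mapping)."""
    cfm, erp, mu = params
    return [10 * cfm, erp + mu]            # placeholder performance features

real_features = [0.005, 1.2]               # features extracted from real trials (toy values)

def gap(params):
    """Simulation-reality gap: squared error between simulated and real features."""
    sim = simulate_features(params)
    return sum((s - r) ** 2 for s, r in zip(sim, real_features))

# Random-perturbation hill climbing as a stand-in for the deep RL update.
rng = random.Random(0)
best = [0.01, 0.5, 0.6]                    # initial cfm, erp, friction
for _ in range(2000):
    cand = [max(0.0, p + rng.gauss(0, 0.01)) for p in best]
    if gap(cand) < gap(best):
        best = cand
```

The negative gap plays the role of the reward signal: parameters that make the simulated performance features match the real ones are reinforced.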

Ground Probing Behaviors
Standard locomotion controllers aim to generalize well over terrain variability. However, while any ground interaction yields observations useful to improve an initial GIM, some behaviors may yield more suitable interaction data than others. In [6], for instance, two tactile motions are designed for walking robots with the purpose of extracting relevant features for modeling ground interactions. Fixed probing behaviors can also be used to directly improve terrain classification [11] and to assist navigation, even by active tactile sensing of vertical surfaces [3]. These probing behaviors are executed by interrupting the regular locomotion. However, probing can naturally blend into locomotion, as in [15], where careful force-adjusted probing behaviors are performed by a front leg of a hexapod to classify footholds as collapsible or non-collapsible based on the interaction result.
We design probing behaviors to:
• reduce the simulation reality gap by exploring interactions that are not yet modeled accurately,
• reduce class-specific misclassification rates,
• integrate into locomotion behaviors, allowing parallel locomotion and probing.
For this purpose, we compare random movements, regular locomotion, and learned behaviors. The training of a policy that generates a probing behavior could be realized by maximizing the disagreement among potential candidate models, i.e., the ground interaction model currently used and an alternative. Behaviors that maximize the disagreement should allow obtaining data from real trials that can be used for model selection and fine-tuning.
A simple procedure to both optimize the parameters of a selected GIM and learn probing behaviors follows the estimation-exploration approach [1,2]. In this approach, an ensemble of simulation models is trained to match body part masses, sensor time lags, or the robot morphology. The differences between these candidate models' outputs should be higher when a behavior explores interactions that are not yet modelled accurately, so that the candidate models predict different outcomes.
Furthermore, a step-wise disagreement measure as in [13] is used in the exploration phase to find the most suitable probing behaviors with RL.Finally, by initializing the policy to a locomotion controller and optimizing its ability to elicit model disagreement, we expect to find behaviors that remain functional for locomotion while also generating informative trial data, e.g., to avoid potentially dangerous misclassifications of the terrain.
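The disagreement idea can be illustrated with a toy sketch, in which two stub candidate models agree near already-explored states and diverge elsewhere; the models, state dynamics, and behaviors are all hypothetical placeholders:

```python
import random

def gim_a(state):                  # candidate ground interaction model 1 (stub)
    return 0.4 * state

def gim_b(state):                  # candidate model 2: agrees near state 0, diverges beyond
    return 0.4 * state + 0.2 * state ** 2

def disagreement(behavior, steps=50):
    """Step-wise disagreement of the candidate models along a rollout:
    behaviors that reach states where the models differ score higher."""
    total, state = 0.0, 0.0
    rng = random.Random(1)
    for _ in range(steps):
        state += behavior(rng)     # the probing action changes the (toy) state
        total += abs(gim_a(state) - gim_b(state))
    return total / steps

gentle = lambda rng: rng.uniform(-0.05, 0.05)  # stays near well-modeled states
bold = lambda rng: rng.uniform(0.0, 0.2)       # pushes into the unmodeled regime
print(disagreement(bold) > disagreement(gentle))  # True
```

A policy rewarded with this disagreement measure would be driven toward exactly those interactions where the candidate models have not converged, which is the informative data sought for model selection and fine-tuning.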

Increasing Autonomy through Ground Interaction Models
Having a precise simulation of the expected robot performance for different ground types allows the adaptation of the robot behavior in the following ways. The first approach uses the simulation to continuously predict the sensor values for a system while it traverses a known surface. As long as the prediction fits the real sensor values, the assumption concerning the surface parameters/type is valid and the system can continue its operation. As soon as the prediction error exceeds a defined threshold, the system can stop and analyze its recorded data. Either the recorded sensor values can be reproduced in simulation with another known surface model, or the system has reached an unknown surface. The former describes the use case where the simulation is used to identify the surface type via known surface interaction models. The detected context change can then be used to trigger a path replanning or to adapt the locomotion behavior [5]. In the case of reaching unknown terrain, we perform an online optimization of the interaction model, starting from the closest prediction, to learn a new GIM. Another integration of the simulation into the navigation and planning architecture is the possibility to utilize it as a precise cost function for a planned trajectory by simulating the performance of the robot along the path.
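The two steps, anomaly detection against the running prediction and surface identification by replaying the data against known GIMs, can be sketched as follows; function names, the threshold, and the toy sensor traces are illustrative assumptions:

```python
def detect_context_change(predicted, measured, threshold=0.5):
    """Flag a context change when the per-sample error between the on-board
    simulation's prediction and the real sensor values exceeds a threshold."""
    return any(abs(p - m) > threshold for p, m in zip(predicted, measured))

def identify_surface(measured, gim_predictions):
    """Replay the recorded data against the predictions of known GIMs and
    pick the surface whose prediction fits best; a poor best fit would
    indicate genuinely unknown terrain."""
    def err(pred):
        return sum((p - m) ** 2 for p, m in zip(pred, measured))
    return min(gim_predictions, key=lambda surface: err(gim_predictions[surface]))

# Toy example: the rover leaves wood and enters sand.
measured = [1.0, 1.0, 1.8, 1.9]
wood = [1.0, 1.0, 1.0, 1.0]                  # prediction of the "wood" GIM
sand = [1.0, 1.1, 1.8, 1.9]                  # prediction of the "sand" GIM
print(detect_context_change(wood, measured))                     # True
print(identify_surface(measured, {"wood": wood, "sand": sand}))  # sand
```

The detected context change would then trigger path replanning or a locomotion adaptation, as described above.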

Conclusion and Outlook
This paper provides an overview of the concept of ground interaction models, i.e., how they are integrated into a physical simulator to increase its accuracy, the data-driven methods to optimize them, the required data acquisition, and their application to increase a robot's autonomy. The concept is currently being implemented and verified with the first system. The extension to the other target systems will follow, as well as the in-depth analysis of the obtained results.

Figure 1: Concept drawing of the generation and usage of ground interaction models

Figure 2: Utilized robots for the data collection

Figure 3: Data collection and management: (a) CAD model of the test track including the leveling mechanism; (b) the data storage infrastructure.

Figure 4: GIM integration into a physical simulator: (a) illustration of the grid-based footprint contact handling in simulation, where each cell stores a footprint depth and a variance parameter; (b) definition of the inputs and outputs of a GIM. The mapping of inputs to outputs can be done via any method while the black box is integrated into the simulation update step; the outputs of a GIM define the simulation contact behavior.