Active robotic search for victims using ensemble deep learning techniques

In recent years, legged quadruped robots have proved to be a valuable support to humans in dealing with search and rescue operations. These robots can move with great ability in complex terrains, unstructured environments or regions with many obstacles. This work employs the quadruped robot A1 Rescue Tasks UPM Robot (ARTU-R) by Unitree, equipped with an RGB-D camera and a lidar, to perform victim searches in post-disaster scenarios. Exploration is done not by following a pre-planned path (as common methods) but by prioritising the areas most likely to harbour victims. To accomplish that task, both indirect search and next best view techniques have been used. When ARTU-R gets inside an unstructured and unknown environment, it selects the next exploration point from a series of candidates. This operation is performed by comparing, for each candidate, the distance to reach it, the unexplored space around it and the probability of a victim being in its vicinity. This probability value is obtained using a Random Forest, which processes the information provided by a convolutional neural network. Unlike other AI techniques, random forests are not black box models; humans can understand their decision-making processes. The system, once integrated, achieves speeds comparable to other state-of-the-art algorithms in terms of exploration, but concerning victim detection, the tests show that the resulting smart exploration generates logical paths—from a human point of view—and that ARTU-R tends to move first to the regions where victims are present.


Introduction
As the 2021 edition has said of the Ecological Threat Report, the number of natural disasters in the world has shifted, since 2000, from 300 to 450 every year, doubling the level reached in the 1980s [1].Disasters can be, in consequence, considered a very present issue and are set to remain.
Robotics could be very important in search and rescue (SAR) missions.The speed and efficiency of their management, the ability to search in inaccessible human locations and the advantage of not exposing humans to risks make them useful in accomplishing these tasks [2,3].In addition, the increase in the usage of field robots in collaboration with humans and the development of very accurate techniques of Computer Vision have played a major role in the grounding of Robotics in that sort of task.
Although victim recognition is a crucial aspect of working with all of them, it is a problem that has been addressed [9][10][11].However, the problem of achieving effective path planning remains open.Since humans are affected by disasters, rescue missions are time-critical applications.Therefore, it is interesting that the search strategies increase the robot's ability to find the victims as soon as possible.
Different solutions have been proposed, whether for unmanned aerial vehicles (UAVs) [12,13], submarines [14] or terrestrial robots [15].However, in these existing approaches, there is a common limitation: they either do not incorporate visual information from the environment when planning their route [12,16], or if they do, they do so using deep learning techniques [17].This means that active search-concept defined in [18] as 'on a large-scale environment [. ..] find objects using mainly visual sensing'-is rarely conducted.When it is employed, it often lacks the capability to provide a comprehensible explanation of the system's behaviour to external operators.
In the domain of quadruped robots used for rescue tasks, additionally, no work has been found in which a search for victims is carried out.The existing literature so far has predominantly focused on aspects such as the design of these quadruped robots [19], their teleoperation [20], terrain reconstruction [21,22], or, as previously mentioned, victim identification [23].
The main contribution of this research is developing, using a quadruped robot, an autonomous rescue system able to perform an active exploration in a transparent way, not using deep learning techniques for determining the different exploration points.It integrates several subsystems, including three convolutional neural network (CNN) detectors, a SLAM module, a path planner, and a movement controller, to create a comprehensive exploration algorithm.This algorithm combines visual and spatial information to select what will be the next exploration point, with the aim of finding all the victims as soon as possible.
To fulfil this requirement, two main techniques have been used.On one hand, we have employed indirect search (IS) strategies, focused on the objective of promptly locating victims without directly targeting them but looking for related objects and rooms.On the other hand, next best view (NBV) algorithms have been utilised to determine, at each step, the subsequent exploration point, as no pre-established path planning is performed.
Although both of them can be done employing black-box models, our algorithm interprets camera and sensor data using a random forest.This approach enables a step-by-step description of the system's decision-making process, making it possible to identify the variables that have the most significant impact on decision-making.This methodology emulates the decision-making process of a human, providing real-time insights into the system's reasoning.Operators can easily track elements like the upcoming scanning point and the detection status of victims, enhancing their comprehension of how the system operates.
The work is structured as follows: after analysing the most relevant works in literature related to IS and NBV, section 3 introduces the different components of the system, and section 4 presents the tests they have undergone.Finally, section 5 summarises the most interesting findings of the work.

Related work
The design of the system for rescuing people in cluttered and post-disaster environments presented here mix IS strategies and NBV algorithms.

IS problem
Unlike traditional planning problems, which aim to reach a goal efficiently, IS problems aim to find an object in a specific-known or unknown-environment.Classical cost-maps, which represent, for each point of the space, the distance to the goal, are in this task replaced by probabilistic roadmaps where each point has a probability associated with it that the object is in it [24].
As stated in [25], this problem is NP-Complete.In consequence, the optimal solution can only be reached approximately.Moreover, the dimensionality of the task is proportional to the number of sensor parameters that can be controlled (such as sensor position and orientation), which means that each decision taken must lead to a maximum probability of detection with the minimum number of sensor actions [26].
During the past decade, solutions based on semantic search have been implemented, such as Dora the Explorer, an ontology-driven robot [27], or the works of [26,28,29], in which relations between objects are exploited.This line, founded on ideas like 'is more probable to find a remote control near a TV screen than a microwave' , has continued to be exploited successfully [30].In [31], the environment is divided into rooms and their category is considered.Visual odometry has been employed in [32].
The development of reinforcement learning techniques has led to their application to the IS task.In [33] is proven that this formulation leads to better results when a high uncertainty exists.Nevertheless, a vast quantity of data and training time is needed, and its performance decreases in environments with many objects [34].Solutions based on both classical techniques [35,36] and deep reinforcement learning [37,38] are available.

NBV techniques
NBV task goal is to move the robot, in each step, to the point that maximises the probability of finding the desired object in the following iteration.Although it is implicitly considered part of the IS problem when reinforcement learning techniques are used, it is necessary to consider it explicitly when working with other algorithms.
Initially addressed by [39], the problem has been reformulated in many ways, adding different constraints [40,41].As noted by [42], achieving a solution needs an equilibrium between exploration and exploitation.
Two possible approaches have been studied.A first option is to choose the NBV at the border of the explored space.This is the line followed by [43], which estimates for each boundary cell an information gain value following a Markov decision process, and [16], which classifies the different boundary points into six categories following a decision tree.
On the other side, other works select the NBV after evaluating an objective function on a list of randomly chosen candidates in the neighbourhood of the present point.References [44][45][46] use a fitness function which takes into account the potential information gain (measuring the free space around the candidate) and the cost of getting to it from the current location.In the objective function of [47], free space, and information gain (measured image entropy) are taken into account and a penalty is added to points already visited.

TASAR approach
The TASAR (Team of Advanced Search And Rescue Robots) project proposes the use of ground SAR Robots for Humanitarian Assistance and Disaster Relief [48].One of them is ARTU-R, shown in figure 1, a quadruped with a high ability to move in unstructured environments [49,50].
IS and NBV techniques are used here without resorting to deep learning, which enhances operators' comprehension of the system and its decision-making processes.Information from cameras and sensors is processed using a random forest, a machine learning tool that, much like human reasoning, makes sequential decisions and prioritises certain factors over others.The importance of the different factors influencing decision making is known.
Although those advantages, it is not habitual to use random forests in rescue tasks.In recent years, the most significant contribution in this regard is [51], which uses different machine learning techniques-including random forests-to estimate the probability of maritime SAR operations and proves its validity and robustness when modelling unknown environments.
However, the present work goes a step further and random forests are used as part of the SAR process and receive information not only related to the map, but also images of the environment.
Exploration is done room by room: when ARTU-R reaches the SAR environment, first looks for doors and locates them on the map.These doors are all passed through, one by one, during the search process, and no room is abandoned until finishing conditions are met.
In each room, the robot uses images captured by an RGB-D camera looking for victims from selected interest points, chosen following NBV strategies based on evaluating a fitness function.Unlike what has been done until now, not only the distance to the candidate and its exploration potential (the volume of the unknown region around him) are considered, but also the semantic information extracted from the images.
This information is processed using a CNN and a random forest, which, based on the objects detected by the network, returns a probability that there is a victim in the vicinity.The great advantage of using random forests is that their reasoning process is very similar to what a human would do in such a situation, making it easy to understand the logic of the decisions made by the system.
In the same way, an operator interface has been developed.It is possible to check it in real-time information such as the mapped region, the evaluated candidates, and the next exploration point or to access ARTU-R's camera.
The tests of each particular step of the process gave accurate results.After the integration of all of them had been done, the final system was proven to be fast detecting humans and followed paths that can be considered logical from a human point of view.

Test environments
Although environments cannot be directly considered part of the work, testing scenarios have been considered a main part of the developing process due to the necessity of evaluating system performance in real scenarios.Two environments have been used for this.Firstly, a fictitious rescue scenario was created on a generic Robotics test ship following standardised rescue environment guidelines.Secondly, tests were carried out in a controlled open environment, i.e. using a real scenario-an office and laboratory floor-in a situation where the passage of people and the occupation of the space was limited.Facilities of both of them are located in the Centro de Automática y Robótica (CAR) at Universidad Poliécnica de Madrid (UPM).

National Institute of Standards and Technology (NIST) orange reconstructed environments
The standardised environment has been made to achieve the requirements of a NIST orange environment [52].NIST standards are used in Rescue Robotics completions and research to make possible comparisons of results.
Three levels are defined in these standards: yellow, orange and red, with the orange level being between the other two in terms of the complexity of the scenario.This level is characterised by a floor made of different materials, with ramps or stairs, which require a certain degree of agility in manoeuvring.In terms of planning, it is necessary for the robot to be able to fully map the environment, as it can be reconfigured in real-time, simulating collapses, opening or closing doors and blinds, etc. Due to the limitations of the robot's own lidar, and this work being a proof of concept, environments with more than one floor have never been addressed.However, a future line of interest may be to incorporate an advanced SLAM module into the system to manage several floors.
Two scenarios following orange standard were set up inside a test area of 49 m 2 (7.6 m × 6.5 m).They can be seen in figure 2. Both simulate a house: they have a little entrance with a door, behind which there is access to different rooms with some tables and other everyday objects and several pieces of debris on the floor, such as a semi-transparent pallets, walls, etc.Some human mannequins were placed in the role of victims.Not all the obstacles hide humans.This aims to study the system's behaviour with false positives and whether the exploration order is natural and efficient (e.g. if it looks first on the nearest obstacles).
While the scenario in figure 2(a) aims to demonstrate the validity of the system and its algorithms in a generic post-disaster environment, the objective of the second is to recreate an environment with floors containing obstacles, ramps or other debris that make it difficult for a conventional robot, i.e. a wheeled robot, to pass.

Open controlled environment
The main floor of the CAR building, which consists of a series of offices and information technology (IT) laboratories, with ground plan dimensions of 20 m × 35 m, has been chosen as an open controlled environment.Figure 3 shows a blueprint of the facilities.
Unlike the previous environment, this is not a place where damages as a result of natural disasters can be seen; however, it has been chosen since the system has been designed to work in dangerous situations, such as gas leakages where no material damages have been produced.The main floor tests also allow us to study ARTU-R's behaviour in much more crowded environments in which a vast variety of objects (cables, CPUs, robots, chairs, tables, showcases. ..) are present and have to be avoided.

System architecture
In this section, the main components and their interaction are described.A master-slave architecture has been followed, in which the SAR robot is autonomously controlled by a central computer (Core i7, 10th GEN, Nvidia 1006TiGTX) in the role of the master.During the victim detection phase, it processes the information provided by the robot sensors and returns a path to the new point to explore.Physically, the rescue system used for the autonomous detection of victims is integrated by ARTU-R (figure 1).It is twelve degrees of freedom quadruped robot.The usage of quadruped-legged robots has been proved to be a good solution for SAR tasks [53][54][55], but only very simple victim search algorithms have been developed on them.In this work, a victim search and exploration algorithm that takes full advantage of its sensors and generates a route that can be explained from a human point of view is introduced.
ARTU-R is equipped with 3 on-board computers, 12 actuators of a maximum torque of 35 Nm and a set of sensors, such as an inertial measurement unit, pressure sensors on its legs, an RGB-D camera (Intel RealSense Depth Camera D435i) and a 2D-Lidar (Slamtec Mapper Pro M2M2 360 • ), which allow it to map the environment and give real-time 1920 × 1080 px images.Details of camera and lidar can be found in tables 1 and 2.Although ARTU-R can reach, according to its datasheet, 3.3 m s −1 (11.88 km h −1 ), the maximum speed has been limited to 0.5 m s −1 because the manufacturers themselves recommend not to work at maximum speed in order to prevent the robot from falling when moving in environments with obstacles.In addition, the low speeds ensure that the images ARTU-R makes are not blurred or shaky, which would make it difficult to identify objects in them.
Both subsystems have been connected by a 5G Wireless Network through ROS protocols.A latency of 10 ms has been used.To allow external operators to follow the development of the missions, information interfaces have been developed using RViz.
The master comprises a series of subsystems structured as presented in figure 4. Three different detectors have been set up: a first one to determine the category of the explored room, a second one to detect objects and humans and a third one in charge of detecting doors.All of them consist of a CNN-only the weights associated with the net change between the detectors.The information coming from them is introduced in the trained random forest, which returns a probability of existing a victim in the region captured by the picture.
To define the next exploration point using an NBV strategy, this probability is combined with the Map Information obtained from ARTU-R's sensors.Specifically, a fitness function evaluates different point candidates and chooses the best one, taking into account the free space around them, the distance to reach them (both given by the map) and the probability of existing a victim in its vicinity.
Once the next exploration point is selected, the robot moves towards it.Velocity commands are calculated by the master by using a PID controller.If the point to be reached is considered too far away, a Planner module comes into operation.In addition to sending speed commands, the Planner reads the robot's speed in real-time and re-plans when it detects that the robot is slowing down (i.e. it is close to an unexpected obstacle).
The three blocks of the system are along the rest of this section.The integration among them will be addressed consequently.Datasets and algorithmic implementations are available in [56].

Intelligent detection system 3.3.1. Room detection
The first step in implementing the system was to train a detector to identify which type of room humans were being searched.As introduced in [31], this determination can help to improve success rates when doing indirect searching.
To perform this task, it was first necessary to search for a dataset that had such a sufficiently large number of images of the different rooms in which to carry out the exploration.It was found that [57], a house room dataset with 1158 images, met these requirements.No data augmentation was considered necessary.The data, however, have to be labelled.There were five possible room types: bathroom, bedroom, dining room, kitchen and living room.At first, the possibility of having room types more closely linked to the professional sphere, such as offices, computer rooms, workshops or meeting rooms, was missing.However, although it was considered to combine the dataset with another one that had some of these elements, it was seen, as will be discussed later, that the results, with the reduced spectrum of options, timed out to be quite good.
For the identification of rooms, the YOLOv5 CNN [58] was used, in its small size, as it was considered a relatively simple task to tackle.Unlike other models, YOLO can initially place different bounding boxes over an object and decide on the last step, based on probability scores, which to take.In the same way, this last version of YOLO has been prepared to work with partially occluded objects.Figure 5 presents a conceptual approach to its architecture.
As shown, the network is divided into three subnetworks.The first one, called Backbone, acts as a feature extractor.It combines convolutional and CSP layers, presented in [59].The second step, called Neck, acts as a feature aggregator, using a spatial pyramid pooling and a path aggregation network.Finally, the Head processes that information and predicts the bounding boxes and the related probabilities, which are the net outputs.
For this case, the net was trained on 80% of the data and 20% of the image bank was reserved for testing.20 epochs were set.To see how the system worked, the training process was divided into three stages: one of high approximation and two of refinement, in which a 96.99% mAP was achieved.a precision of 92.66% and a recall of 93.2%.Training curves are presented in figure 6.Although these results might indicate the existence of some overfitting, section 4.1 will demonstrate how the network can function adequately with images extracted from other contexts.
A sample of some data from the training process can be seen in figure 7. The confusion matrix, displayed in figure 8, demonstrates that the accurate detection of rooms is achieved in a minimum of 88% of instances.Notably, the predominant sources of confusion arise from the misclassification of kitchens as dining rooms and vice versa, manifesting in 7% and 8% of cases, respectively.This phenomenon can be attributed to the presence of kitchens with warm colours and furnished with chairs and dining tables, as it can be observed in figure 9.With the exception of the dining room classification, false positives, i.e. instances where parts of the environment are incorrectly categorised as one of these room types, occur in fewer than 30% of cases.These outcomes can be considered acceptable.As elaborated in section 4.1, they will not be a problem during the integration of the system.This is because detections with probabilities below a specified threshold will not be taken into account.Consequently, false positives, similar to the corridor illustrated in figure 10, will either disappear or remain at a very low level.
Once trained, when used, the model returns a bi-dimensional array with as many rows as objects detected and which contains, in the columns, the bounding box coordinates and the detection probability.As shown in algorithm 1, the useful information is inserted in a one-dimensional array, detRooms, which host, at element i, the probability that the room is of type i.The correspondence between vector indexes and room types is presented in table 3.

Victims detection based on IS and random forest
The next detector to be configured was the object detector.No training was done for this subsystem: the YOLO network itself has, from the moment of downloading, a file with weights already trained for the detection of different elements, such as persons, kitchen utensils, means of transport or everyday objects.As it has been done for the Room Detector, the results coming from the CNN have been processed and inserted into a similar vector, as the algorithm 2 shows.Since the system's objective is detecting persons using IS, the problem posed here was 'what everyday objects and items could be correlated with a person' in a post-disaster environment.What, in some situations, seems obvious-the remote control is expected to be near a television-was not so in this case, as there are many activities that a person can be doing at any given time of the day.
Therefore, the following assumption was made: it was assumed that, after a disaster-such as an earthquake, for example-the people's position tends to be the same as the previous position before the disaster.Consequently, intelligence systems do not need to be trained with post-disaster data but only with everyday images.Even though this premise may at first glance be considered unrealistic, works such as [60,61] point out that common-sense responses are largely overlooked due to the excitement produced by natural phenomena.Moreover, contrary to popular belief, good behaviour during a disaster does not necessarily mean a lot of moving [62].In this line, New Zealand's government guides for acting before, during and after earthquakes [63] recommend that citizens not move far but stay where they are and adopt a drop-cover-hold position.
The chosen dataset for the task was Common Objects in Context (COCO), [64], developed by Microsoft with 330 000 images on various aspects of everyday life.
With it, a database relating to the presence of humans and objects was created.The two presented detectors were applied to each image, and the probability associated with each object or room was stored in the database.If any human was detected in the picture, a tag was added to that database entry.Using the terminology of algorithms 1 and 2, the inputs of the random forest are the vectors detRooms and detObjs and the output, the probability of appearance of a human in that region.
Although the usage of neural networks was also considered, as it is a system designed to work collaboratively with humans, decision trees were chosen.This technique has the advantage that it models how a person makes decisions much better.Indeed, the probability of a victim is in a given region is determined by successively asking whether a series of objects, ranked from highest to lowest order of importance, are located in that region.It is true that, compared to simple decision trees, random forests are less intuitive but have the ability to perform better with large datasets-with less overfitting-and are able, in this work, to achieve similar results to the ones offered by a neural network.
Model hyperparameters were optimized following a cross-validation search process.Their final values can be found in table 4. In the same way, to increase the model's generalisation ability, variables with significance rates under 1% were removed from it, reducing the number of variables to 13. 70% of mAP precision and 40% of MSE loss were reached.Although those results can be considered low compared to typical Machine Learning accuracy rates, their validity, once the whole system is integrated, will be discussed in the 4 section.Figure 11 shows the first levels of one of the trees (in a total of 13).As previously mentioned, obtaining the significance level of the different variables is possible.These values explain how much the loss function would increase if this explanatory variable were absent in the model.Figure 12 shows significances for the final trained model.

Door detection
The third component required to enable initial exploration was the doors' detector.When ARTU-R arrives in a post-disaster environment, it starts searching for doors, goes to the nearest door and, once through the door, explores the room.Once explored, it should look for new doors in the same room and walk.If no door is found in the room, the robot leaves the room and goes to other doors found in the previous search.
Detecting doors is also done with a CNN, using the dataset [65].It has 1213 images in which room doors, shelf doors (including drawers), fridge doors and door handles of all types of doors have been labelled.For this purpose, the network was trained for 5 epochs.
Evolution of loss can be seen in figure 13.As can be noticed, once certain peaks are reached, the training is asymptotically stagnant, and the improvements introduced by the new epochs are negligible.
Figure 14 illustrates the confusion matrix, demonstrating that doors are correctly identified 50% of the time.In the remaining instances, they are classified as background, implying that they are not assigned a specific category.While this outcome may be viewed as suboptimal, it concurs with the results reported by  the dataset's authors [65].Additionally, it is important to highlight that the probability of any environmental element being misclassified as a door is minimal, given that false positives are as low as 15%.
Examples of correct detections of difficult situations-a door with little contrast and a partially-occluded door-are depicted in figure 15.
In figures 16(a) and (b), two instances of false positives are observed.These occurrences, as previously mentioned, appear to be unusual situations, and notably, both images feature windows being misclassified as doors, both with low probability detection.The inclusion of windows in the dataset could potentially mitigate this issue.
Figure 16(c) presents a scenario in which two room doors are erroneously classified as refrigerator doors.The most significant issue encountered by the system-the non-detection of certain doors-is exemplified in  figure 16(d).Instances similar to this one, where the lower portion of the door is obscured from view, are infrequent in scenarios like the one presented in this paper, primarily due to the robot's height.
Once the doors have been identified in the image, the system can locate them.This process is achieved by estimating the relative angular position of the door concerning the robot and comparing the estimate with the obstacles on the map identified with the lidar.Although the CNN only achieves a detection rate of 50%, because each door is photographed, on average, twice, resulting in a high probability of detection.Consequently, ARTU-R stores the positions of all the doors in the environment with errors in the order of centimetres, as it will be commented in the 4.1 section.A pseudocode of the whole process of door detection is shown in algorithm 3.

Map Information
Information from the environment is captured using ARTU-R's embedded lidar sensor.A SLAM module is integrated into ARTU-R's sensor processing architecture and returns a cell map.The map is sent to the master using the /map ROS topic.It consists of a bi-dimensional array in which each element represents the state of one of the cells.A resolution of 0.1 m has been selected according to the size of ARTU-R.The state of the cell is expressed in occupancy probability, from 0 (free) to 1 (occupied).
For the NBV selection process, cells with values lower than 0.5 are considered free.Non-explored cells take the value of −1.The array size is increased during exploration if new regions are detected.The information on obstacles is also shown in RViz interface because it was considered useful for operators, as figure 22 illustrates.

Control and planning
The control of ARTU-R is achieved by sending linear and angular velocity commands to it.Although the robot can be considered holonomic, it has been preferred, for the sake of simplicity, to command only by the two aforementioned components of velocity.
Planning and control are done in three different steps.If only a rotation is required, as when ARTU-R has to take the pictures to evaluate the different NBV candidates, angular velocity commands are sent directly from the master.Their values are adjusted by using a PID control with saturations.The values of the PID parameters can be seen in table 5.
If a translation is required, but to a distance lower than 1.5 m, a similar PID controller is implemented, which parameters can also be found in table 5. What the system does in that situation is to firstly orientate ARTU-R to the goal point and, subsequently, command a straight-line movement till reaches it.
Both linear and angular speed PID controllers are limited to adapt them to the physical characteristics of the robot.For safety reasons, speeds higher than 0.3 m s −1 and 0.6 rad s −1 are not allowed.In the same way, a point or orientation is considered reached when errors lower than 0.3 m and 0.1 rad are measured.Figure 17 presents the block diagram of the controller.
In further points, however, planning strategies are needed, although they are more time-consuming because the controller does not guarantee obstacle avoidance.The planner operates from the master, establishing a path between ARTU-R's present position and the desired goal by traversing only cells categorised as free on the map.Once it has been established, velocity commands are sent to the robot.If obstacles are detected, the costmap is updated and the path is locally replanned.

System integration 3.4.1. NBV selection based on image-laser fusion sensor
The three developed subsystems have been integrated in order to finally obtain the complete system, whose execution flow is shown in figure 18 and algorithm 4. Table 6 presents the main configuration parameters of the complete algorithm.
As it was explained when arriving at the initial point-the entrance hall of a house, the foyer of an office building, etc-, the system first looks for doors taking an image and applying the appropriate detector to it.If any door is found, ARTU-R goes to the nearest one and explores the room behind it.If none is found, it rotates the angle specified in the parameter doorTol (which is defined in table 6 and repeats the process until a door is found or a complete turnaround has been made.If this happens, it goes to a random point and searches for new doors.If this situation is repeated a number of times nPossDoors, the scenario is considered as one single room (the entrance hall is not only a hall) and it is explored.
When room exploration finishes, ARTU-R starts the same process of door searching inside the room.If more doors are found, the robot will traverse them.If not, it will exit the room and go to the second door of the ones detected in the previous step.
The process of room exploration, on the other hand, starts by defining room type.This step will be done only once until a new room is explored.After that, the system traverses the space moving, at each time, to the NBV point.
When, at step k, robot is at position x k , it randomly generate a number nPoints of candidates x j k+1 (where j = 1, . . ., nPoints), located at a distance less than dist from x k .Cells corresponding to candidates must be categorised as 'free' by the SLAM tool which reads the information coming from the laser.All of them receive a fitness value according to equation ( 1)  The values of ω = (ω 1 , ω 2 , ω 3 ) have been determined experimentally and, for this case, are ω = (0.1; 2; −0.7).Equation (1) consists of three terms: • Explorable volume around the candidate-V(x j k+1 ): measured as the percentage of free cells that lie in a square of dimension (2 • candRadius) × (2 • candRadius) around x j k+1 .This is a value to maximise because the clearer the surroundings of a point are, the better the chances of detection by the robot's camera.The measurement is made in percentage and not in a total number of points as this will inevitably be lower at the edges of the map.
• Information gain-I(x j k+1 ): in order to measure it, the robot is oriented towards x j k+1 , a picture is taken and the different probabilities of having detected an object together with the probabilities associated to the current room are introduced in the trained random forest.The value of I(x j k+1 ) is, in consequence, the one returned by the intelligence.
• Distance to point-d(x k , x j k+1 ): Euclidean distance measured in norm-2.Associated coefficient ω 3 is negative to prioritise the exploration of closest points.
Equation ( 1) is similar in its external form to [47].However, while the latter considers free space and potential information gain and penalises visited points, the model presented here considers, for example, [44] and [46], the distance to the candidate.A pseudocode implementation of the function can be found in algorithm 5.
The victim identification process is executed simultaneously to object identification for calculating information gain.This is because object detector is also able to identify humans.If damaged people are found, an arrow pointing to them is placed on the interface map.

Unit tests 4.1.1. Rooms classification
As previously mentioned, the room detector achieved 97% mAP precision over training data.To ensure that such a high accuracy was not due to overfitting, experiments on real images were done using pictures of the different training scenarios.
The qualitative results of the tests-examples of detection can be seen in figure 19-showed, for each class, an accuracy similar to that achieved in training.An empirical probability threshold of 30% was established: when the detection probability is lower than this value, no detection is considered.

Victims detection
Human detection is done simultaneously with object detection to obtain a fitness function value.In that task, the system's success rates-working even with mannequins-are around 100 percent, as expected, thanks to the power of CNNs.The system can detect people in all positions: sitting on a chair, squatting, lying down.As figure 20 shows, it is enough to identify a small part of the body-an arm, a leg, the head-for a bounding box to appear, categorising it as a human.The figure also shows that the detector can succeed even with blurred or poorly focused images.
Victim detection-and in general, detection systems based on RGB images-tends to fail in poorly lit or excessively dark scenarios.In the illuminance conditions in which the test have taken place-between 400 and 1000 lx-no problems related to victim recognition have been appreciated.To ensure detection in less illuminated spaces, one potential solution could involve equipping the robot with a head-mounted spotlight, capable of generating ample illumination for the captured images.
When victims are identified in the pictures, an arrow pointing to them is placed on the map, which is visible to the system operators, as is shown in figure 21.The position of the origin of the arrow and its orientation are the ones that ARTU-R was at the moment of taking the picture.Although victims are not directly placed on the map-no semantic SLAM nor similar techniques have been used-, the high human recognition rates make it probable to detect the same victim from different points, which means that several arrows will point to her.Interviews with people outside the world of Robotics showed that this solution allows us to identify the places in which victims are located correctly.

Door detection
The evaluation of the door detection process took place in a real environment, in the building used for the validation process.The different tests consisted in leaving the robot to search for doors in this environment, placing them on the map and then checking-the scaled plan of the site, presented in figure 3 was available-whether the placement given by the system coincided with the real one.To do this, different initial points were defined to assess ARTU-R's ability to identify doors in different contexts, such as long distances, the corners of the image, those at dead angles, etc.
Figure 22 shows the accuracy of the subsystem for door detection.In average, a door detection rate of 72% is achieved.The system also has a false positive rate (objects that are not doors are marked as doors) of 6%. and placed with errors of less than 0.92 m.This is clearly one of the major sources of uncertainty in the work presented here.This uncertainty can be attributed to two primary sources of error.Firstly, it pertains to the precision of the Door Detector, as previously discussed.Secondly, it arises from the methodology employed for gate localisation on the map.This approach relies on estimating the gate's angle relative to the robot's position, using a two-step process.Initially, it approximates this angle from a planar photograph and subsequently aligns it with an obstacle within the lidar-generated map.To address the second source of error, refining angle estimations by accounting for image deformation during projection could be a viable solution.

System validation
Once the performance of each subsystem has been verified individually, tests in the real environments described in section 3.1 took place.The regions were prepared to reconstruct a post-disaster area and mannequins of victims were hidden in them.A mission took place in each of them.For each one, empirical results have been recorded, to compare the performance of the algorithm here proposed with the methodologies available in the literature.More concretely, the size of the unexplored and explored areas have been tracked regularly during the mission, and, for this last one, the size of the space classified as 'free' or 'occupied' has also been recorded.During the tests, the time in which new victims were discovered was also noted.

Test in a NIST orange reconstructed environment
Validation was firstly conducted in the environments presented in section 3.1.1.As specified, two types of orange scenarios were designed, with different difficulty levels.A video of the missions carried out in both of them is available in A.1.
The path followed by ARTU-R during the fist mission is shown in figure 23.As expected, the first thing ARTU-R did was search for doors.Because only one was found during the first series of camera snapshots, it headed straight for that door (P1).After entering in the room (P2), it went behind the desk (P3) because a big unexplored area had been detected and a table had been identified and taken into account, by the random forest, as objects with a high correlation to human presence.
The victim placed in that position (V1) was correctly detected and ARTU-R decided to cross to the second room (P4).No exploration around the pallet was done because its transparent nature made it possible to directly check if it harboured any victims behind him.
Exploration of the second room started just after having crossed the door (P5).The first selected NBV was one located near the table and the grey panel (P6), in which another victim (V2) was located and detected by ARTU-R.After that, the robot moved to the background region (P7).
Although no victim was present, it seems natural, even for people, to go there, because it was relatively close to the previous position and there was a lack of visibility.The last explored point was the bottom left corner of the room (P8), where the last remaining human (V3) was found.In terms of time, the whole test took, for exploring an area of 49 m 2 (7.6 m × 6.5 m), 4 min and 15 s (255 s), which makes an exploration speed of 0.192 m 2 s −1 .Statistics related to this test can be found in figure 24.
The necessity of finding an equilibrium between exploration and exploitation to solve the problem is clearly displayed in the graph.It is very easy to see that when ARTU-R enters a new room (when the door is crossed), the room is rapidly mapped, achieving exploration speeds greater than 0.5 m 2 s −1 .Once this process concludes, the system starts moving inside the room looking for victims.In both rooms, victims have been found relatively early, which shows the importance of using the information provided by the camera in the process, and the good performance of the intelligence system.
Figure 25 presents the map of the environment reconstructed by ARTU-R at different steps of the mission.As previously commented, it is important to note that all the hidden victims were correctly located during the test.
The test was repeated 10 times.Durations of the mission ranged from 212 to 308 s.The route taken by the robot, while not precisely identical, closely resembles the route shown in figure 23, always exploring from very close points.It is worth noting, however, that in 3 of the 10 repetitions, the second room's exploration sequence was P8-P9-P6-P7.However, this can be considered logical as, although there were no objects correlated with the presence of humans near P8, there was, upon entering the room, a substantial amount of unexplored space around it.
The test environment layout for the second battery of tests is depicted in figure 26.It contains two ramps at the entrance.One is an ascending ramp with an initial step height of 14 cm, a length of 87.5 cm and a slope of 11.8 • .The other is a descending ramp with a slope of 12.4 • .Additionally, a beam, which is an insurmountable obstacle in the form of a step for medium-sized wheeled robots, has been placed between points P6 and P7.The passages between P3 and P4 and P4 and P5 are narrow and require a robot that is not too wide and has relatively good manoeuvrability.
Path followed by ARTU-R explores sequentially the scene.Once the ramp had been overcome (P2) it looked for objects of interest (P3) and found a table.The area behind it was explored (P4) and a victim (V1) was found.The robot continues its moving up to the next table (P5-P6), in where another victim was found (V2).Finally, the beam was traversed (P7) and the last victim (V3), found.No possibility of finding that last victim would have been possible without access to the latter region.
The test was repeated 5 times.The victim search sequence was the same in all of them.In average, they took a time, measured from P0 to the victim's location V3, of 5 min and 37 s (337 s).As expected, the system decided to start exploring the region behind the nearest door, corresponding to the one delimiting the computer room.The path ARTU-R follows to reach the door is represented in red (P1-P3).After crossing it, ARTU-R rapidly explored the room: a few steps were enough to cover the entire region and detect the human placed there.
After that, an exploration of the corridor, which path is represented in blue in figure 27(a)(P3-P7), was done.ARTU-R's behaviour during that part of the mission was similar to that of a human would have: it walked down forward, discarded to enter in the open rooms because visibility at the entrance was enough, and returned once the end was reached.
The third part of the mission is reproduced in figure 27(b).The system looked for victims in the lab located behind the door (P9).Although its many objects can difficult the exploration, as figure 28 shows, ARTU-R was able to replan its paths to avoid collisions, keeping the objective to reach the selected NBV.When the room was explored, no other doors were detected, and the mission was considered completed (P10-P11).
The complete test took a time of 18 min and 30 s (1110 s) to explore an area of 225 m 2 , which gives an exploration speed of 0.203 m 2 s −1 .
As previously stated, the testing scenario was relatively uncomplicated.Our goal here was not to evaluate the system's effectiveness in the aftermath of significant disasters like earthquakes, but rather in situations such as gas leaks, which do not lead to structural damage.Consequently, this particular test was conducted only once.

Efficiency analysis of the proposed method
The different experiments that have taken place prove the validity of the system.Its behaviour can be considered logical and understandable to human operators in such a situation, but compared to rescues carried out by humans, it implies a consequent reduction in risk and a greater ability to penetrate hazardous or debris spaces.As it has been presented, all the victims have been identified during the missions, which proves the reliability of the system.It can also be said that the system tends to prioritise, during its exploration process, the areas in which victims are more likely present.
In order to study the performance of the algorithm presented here, a comparative analysis has been carried out against other methods presented in the literature.Each of these works has been tested in a different environment and with robots of different characteristics.In addition, it should be noted that these methodologies are not exclusively designed for rescue missions but can also be applied to simpler exploration tasks.Consequently, the results do not allow for definitive and absolute conclusions to be drawn.However, the main objective is not to establish a precise quantitative evaluation but rather to engage in a qualitative comparison of the performance of the methodology presented here with that of some existing methods that share similar characteristics.
Two criteria guided the selection process for comparative works.Firstly, the chosen works needed to be designed for ground robots, with no restrictions on robot typology-thus encompassing both quadruped and wheeled robots-.Secondly, the selected scenarios were required to closely resemble the orange NIST environment employed in this study.Specifically, the preference was for real-world scenarios, as opposed to simulated ones, with a similar spatial footprint (about 50 m 2 ) and objects distributed throughout the area, although in moderate quantities and always allowing robot navigation between them.
Because this last requirement has been difficult to fulfil, and not all test scenarios meet it due to the absence of major obstacles in some of them, from the two tests which took place in the NIST environment presented in this work, the first one, with less obstacles, has been chosen for the comparisons.
For both Reinforcement Learning and NBV techniques, an analysis of the exploration speed, measured, as done in the present work, in m 2 s −1 , has been conducted.The data have been converted using the information provided in the different papers.Because not all are directly related to SAR tasks, no statistics related to victim detection are directly presented.The results can be seen in table 7.
The methodology presented in this work, which estimates NBV using the fitness function of equation ( 1), is able to reach a speed, making the average of the two evaluated scenarios of 0.198 m 2 s −1 .
Of the different authors who implement Reinforcement Learning techniques to solve search and exploration problems, [35] work has been chosen.It uses a classical Reinforcement Learning approach, comparing its set of developed algorithms in different scenarios.The best solution took, on average, 1194.4 s to find all the humans present in a 20 × 20 m 2 post-disaster environment, which means an exploration speed of 0.34 m 2 s −1 .The mission takes place in a closed room, with a similar number of objects of different sizes (tables, chairs, armchairs. ..) present inside.The only noticeable difference is that the walls are perhaps more densely packed with obstacles than in the scenario in this article.
These values clearly outperform the ones reached by this work (only at a 99% confidence level they can be considered similar).Nevertheless, two important differences can be found between this model and the one here presented.First of all, experiments were carried out in a simulated environment, in which the difficulties associated with the real world, such as rough terrains or areas with small obstacles, are difficult to model.Secondly, [35] uses a pioneer wheeled robot, which can presumably reach higher speeds than ARTU-R, although with a lower capacity to access more unstructured terrains.
In [40], a generic wheeled robot is used to explore different simulated and real-world environments.As noted for the previous work, results show that the exploration speeds of simulation environments are clearly higher than the ones reached in real-world environments.For comparisons with the work of this article, the first real-world scenario was chosen because it was the most similar in terms of size.Although it has a larger number of rooms than the orange NIST scenario, there are no objects in the rooms that make it difficult to pass through.
Exploration speeds of 0.2 m 2 s −1 are achieved, very similar values to the ones here presented.These results, however, are reached without the necessity of image processing (it is not included in its algorithm), so no time is wasted on the task.However, the methodology of the presented work achieves similar speeds with more exploration richness.
The work of [47] uses a mobile robot equipped with a manipulator's arm to inspect different areas.Only outside scenarios are considered, but all of them have a similar object density to the one of this work.A fitness function is used to determine, at each step, the selected NBV.7200 s are needed to map the 80% of a 20 × 20 m 2 area, which is a speed of 0.108 m 2 s −1 .Manipulation of the arm can be estimated as a little slower than ARTU-R's movement but with not much difference.
In addition, it is important to note that [47] comments that its methodology clearly offers betters results than the AEP algorithm, proposed in [45] for UAV exploration.Although is very difficult to compare terrestrial and aerial robots in terms of time, the outperforming of this work over [47] and, in consequence, over AEP, hints that it will be better than similar or clearly inferior techniques, such as the one proposed in [44] and in [46].

Experimental assessment in accordance with human criteria
In this section, a test involving twenty individuals was conducted to analyse the routes and priority zones they would propose for exploring the environment depicted in figure 23.The primary objective of this test was to assess the degree of alignment between the robot's exploration path and the preferences and decisions that individuals would make in a similar situation.This process involved a scientific evaluation of the correspondence between the robot's actions and the choices that individuals would make.
Figure 29 illustrates the exploration routes defined by four of the twenty participants who participated in the evaluation.All the test results from the participants are available in A.2. Participants were instructed to define an exploration route based on their criteria as 'rescue personnel' .They were shown the initial scenario and the robot-generated map of the environment, and they were asked to chart the exploration route, starting from the robot's initial position (P0 as referenced in figure 23).
The analysis comprises two key components, each with a distinct purpose in the overall evaluation.These components comprehensively assess navigation strategies and significant environmental points, enhancing our understanding of the exploration process and its alignment with evaluation objectives.The first component involves a detailed examination of how participants or the autonomous robot navigate and interact with the environment, facilitating an understanding of the diverse approaches employed for efficient exploration.The second component focuses on specific areas of interest explored during the evaluation, seeking patterns in prioritisation, visitation frequency, rationale, and commonalities among participants.
Starting from point P0, the most evident scenario involves moving through the door located at P1, a situation contemplated in all evaluation cases.In the subsequent stage, a table is centrally positioned, which could potentially hide a victim on its rear side.95% of the test participants explored this table (circumventing it) before proceeding to the next stage.85% of these participants followed a counterclockwise route, similar to the path taken by ARTU-R.Towards the rear of this area, translucent objects such as a pallet offer a direct line of sight, facilitating the continuation of exploration.
Upon crossing through point P4 to the next stage, the environment expands, and a decision must be made to explore either to the right or left.80% of the participants charted an exploration route from left to right, similar to ARTU-R's approach.In 95% of the cases, the points of interest P7-P8 are considered, as they may conceal hidden victims.In 50% of the cases, the exploration concludes in proximity to point P9, near the last victim, similar to ARTU-R's behaviour.
The routes described facilitate a thorough investigation of the environment, with a focus on critical zones where victims may be concealed.In the majority of cases, these routes closely match the path taken by ARTU-R (in terms of sequence and points explored), as shown in figure 23.This supports the initial premise of this subsection of the work, which aims to copy human search behaviour using a robot and artificial intelligence algorithms.

Conclusions
Using a quadruped robot, with great stability and the ability to inspect hard-to-reach areas, allows nimble exploring in post-disaster scenarios.Its integrated sensors can map unknown areas, making it possible to interact safely with unknown environments.
All of these subsystems have been integrated into a main exploration algorithm, which combines state-of-the-art strategies in IS-because the victims are, in most cases, not directly visible-and NBV selection-to perform an exploration as optimised as possible-.Integration has been done following a waterfall approach, in which every subsystem acts only when needed.This allows a better understanding of the process for the operators and a more straightforward development process.
Consequently, all the components have been tested independently, achieving good results.This modular approach, in addition to allowing for more rigorous development and testing, would allow for easy implementation of new functionalities in the future or for adapting some parts to new developments in computer vision and artificial intelligence.
The accuracy of all the neural networks (when the training process is finished) goes over 90%, except in the case of the one in charge of the door detection, in which the results are similar to the one obtained at the state-of-the-art.They are not a significant obstacle because they allow correctly locating the doors on the map.
The random forest, for its part, can predict the presence of humans 70% of the time correctly.That means that the system can first explore the regions of the environment with more probability of hosting victims 40% better than simply performing a blind search.In addition, it is important to note that the procedure used is not simply a black box but that it is possible to follow and understand, to some extent, the cause of the decisions taken by the system.Indeed, it has been possible to extract the most significant variables from the random forest and, from the operator interface, the NBV chosen in each iteration and the different candidates can be consulted at any time.
The system, once it has been integrated, continues showing excellent performance and functionality.Three tests were conducted.The first two were located in an NIST Orange environment, while the other occurred in a real work centre.In all of them, the system could identify all the human victims inside, following a path prioritised similarly to humans.These regions are truly relevant for harbouring victims.In terms of exploration, the speeds achieved seems to be very similar, or even better although no direct comparison has been done, to the ones reached in other works of the state-of-art, even in which no time is wasted processing images.Because the movement speed is similar, but more information is considered when defining the exploration route, it is clear that the system presented here attains higher efficiencies, which is crucial when rescuing victims.
All ARTU-R's movement can be explained from a human point of view based on the three components of its objective function: unexplored volume around the candidate, potential information gain and distance to it.Moreover, it has been demonstrated that the system's behaviour can not only be understood from a human perspective but also resembles human behaviour.A survey was conducted, and the results revealed that when individuals explored one of the NIST Orange environments, their decisions closely aligned with those made by ARTU-R during the validation test.
This behaviour shows the power of human-robot cooperation in SAR tasks and opens the door to extensions in this action line and integration of this system with other components made by the group.More specifically, the usage of swarms of mobile robots, each one with different characteristics-even if some of them are partially or tele-operated-can perform more specialised and efficient rescue missions and avoids the presence of humans in the regions of risk.

Figure 1 .
Figure 1.ARTU-R quadruped robot.An RGB-D camera and a lidar sensor can be appreciated in the picture.

Figure 2 .
Figure 2. Orange environments reconstructed for the test missions.

Figure 3 .
Figure 3. Drawing of the CAR's main floor, used for the test in the open controlled environment.

Figure 6 .
Figure 6.Evolution of the mAP and the class loss along neural network training.

Figure 7 .
Figure 7. Results of the validation of the rooms detection model.

Figure 8 .
Figure 8. Confusion matrix of the rooms' detector.

Figure 9 .
Figure 9. Examples of confusion between classes of the room's detector.

Figure 10 .Algorithm 1 .
Figure 10.Example of a false positive of the room's detector.

Figure 11 .
Figure 11.First three levels of one of the trees of the random forest.

Figure 12 .
Figure 12.Most significant variables of the random forest.The 0.1 value assigned to the 'dining table' feature means that the presence of this variable in the model causes the model to reduce its error by 10%.

Figure 13 .
Figure 13.Evolution of the class loss for the training and validation sets along the training.

Figure 14 .
Figure 14.Confusion matrix of the doors' detector.

Figure 15 .
Figure 15.Examples of accurate identifications achieved by the doors' detector.

Figure 16 .
Figure 16.Examples of incorrect identifications achieved by the doors' detector.

Figure 18 .
Figure 18.Descriptive synthesis of the complete algorithm implemented.

Figure 19 .
Figure 19.Different room detections with their associated probabilities.

Figure 20 .
Figure 20.Bounding boxes of different humans identified along the tests.

Figure 21 .
Figure 21.Interface showing a human detected at 3 different steps.Blue arrows are placed by the system when a human is detected.They are placed at the position where the robot took the photo in which the detection was made with the orientation it had at that moment.The green dot indicates the next point to visit by ARTU-R.

Figure 22 .
Figure 22.Door detection and location in real buildings.The doors are placed as green dots in the map.The points identified by ARTUR as doors are marked with red dots.The four doors of that region have been correctly detected.

Figure 23 .
Figure 23.Path followed by the robot during the first test rescue mission.

Figure 24 .
Figure 24.Progress of the exploration process during the mission in the orange environment.

Figure 25 .
Figure 25.2D map of the robot at different steps of the exploration process.

Figure 26 .
Figure 26.Path followed by the robot during the test rescue mission in the second orange scenario and evolution of the map generated by the SLAM module.The scene includes several areas with destructured floors.

Figure 27 .
Figure 27.Path followed by ARTU-R during the test mission in an open controlled environment.

Figure 28 .
Figure 28.Laboratory with obstacles of the second mission.

4. 2 . 2 .
Test in an open controlled environment A second validation process has been done in the real environment of section 3.1.1.The test rescue mission started at the P0 point of figures 27(a) and (b), which corresponds to the entrance of the building.Door detection process yields the detection of three doors, the open door of figure 27(a), which gives entrance to a computer room, the corridor of the same figure, and the door of figure 27(b), that gives access to a small and plenty of obstacles lab.

Figure 29 .
Figure 29.Exploration routes defined by four random participants within the scenario shown in figure 23.

Table 1 .
Technical specifications of the RGB-D Camera.

Table 2 .
Technical specifications of the Lidar.

Table 3 .
Vector indexes of detRooms associated to each room type.

Table 5 .
Parameters of the controllers and limitators.
Figure 17.Block diagram of robot movement controller.Algorithm 4. Complete algorithm.SearchForDoors() for each room do picture ← Take a picture detRoom ← DetectRoom(picture) while room is not explored do NBV ← SelectNBV(detRoom) Move to NBV end SearchForDoors() end

Table 6 .
Main parameters of the algorithm.
Parameter Description Value nPoints Number of candidates to NBV to generate at each step 3 dist Maximum possible distance between the candidates and ARTU-R's current position 5 m candRadius Distance around a NBV's candidate in which to evaluate the state of the cells 2 m doorTol Angle to rotate if no door is found 60 • nPossDoors Number of attempts before considering the environment as one-room environment 15

Table 7 .
Comparison of the exploration speed of different methods from the literature.The results obtained in this work are marked in bold.