Game-theoretic learning for the coordination of drone teams in autonomous cooperative inspection

Without the need for an on-board pilot, drones are designed to accomplish dull, dangerous and dirty missions. However, if a mission exhibits a large operative area and/or several objectives, it may entail poor performance when executed by a single drone. Drone teams may overcome this issue by acting as mobile sensor networks for proximal sensing. In such networks, cooperative autonomy is a key enabling behaviour for achieving resilient and cost-efficient systems. This work implements cooperative autonomous behaviour in the form of a dynamic and decentralized mission planner for a multi-drone inspection mission. The proposed design exploits multi-agent task allocation, distributed route planning and game theory for the assignment of inspection tasks and for the computation of optimal routes in reasonable time frames and with limited communication. In detail, it applies the learning-in-games framework for coordination within the inspection team by studying some ad-hoc variants of best response and of log-linear learning. Moreover, this work presents some numerical results of model-in-the-loop tests comparing the learning-in-games approaches.


Introduction
Recent advances in the technology for drone teams are being exploited in a wide range of new applications, revealing several market opportunities but also raising significant challenges, one of which is related to the mission effectiveness of standalone drones. Indeed, drones were usually designed to accomplish the D-cube (dull, dangerous and dirty) envelope [1], but if a mission exhibits a large operative area and/or several objectives, it may entail poor performance when executed by a single drone. Multi-drone missions may overcome this issue by engaging several drones with some common objectives and by establishing a form of multi-drone collaboration to increase mission effectiveness. An instance of such collaboration is represented by drone teams, namely networked sets of drones with a shared goal, in which all members are assigned specialized tasks to accomplish the reference goal [2]. Drone teams may be designed as a mobile sensor network [3]. In this case, the drones act as mobile sensor nodes to closely perceive several targets (which may change over time), maintaining an up-to-date picture of the situation and balancing the load of the sensing tasks within the team. Relevant operational scenarios for drone teams include detection and reconnaissance, target recognition, target tracking, infrastructure patrolling, environmental monitoring, formation flight, etc.
Cooperative autonomy is a key enabling behaviour for a resilient and cost-efficient drone team within mobile sensing. Indeed, this cooperative behaviour may ensure: (i) real-time reconfiguration and retargeting of the overall team, based on the evolution of the operating scenario; (ii) a reduction of the reliance on human operators to contain costs; (iii) faster reactions to environment changes with respect to human response. In detail, a Cooperative Autonomous System (CAS) is a system engineered as a collection of separate decision-making entities, which shall accomplish common goals. The environment of a CAS is generally characterized by one or both of the following features: (i) information distribution, i.e., there is no single decision-making entity that has access to all the information collected and generated in the system; (ii) complexity, i.e., cooperative autonomous contexts usually exhibit decision problems with an inherent computational complexity such that centralized approaches are infeasible. Such features favour a decentralized decision architecture for a CAS, wherein there is no central decision point for all the system components, but a set of local autonomous decision-makers. These interact without high-level mediators and are guided by some form of global rule that leads them to converge towards a consensus. The local autonomous decision-makers are named agents and the overall system is a Multi-Agent System (MAS).
This work investigates cooperative autonomy for multi-drone inspection missions. The inspection is for feared-event detection in a given Region of Interest (RoI). This application falls within the generic domain of autonomous multi-robot systems, swarm robotic behaviours and multi-robot patrolling, for which different solutions are already available, as described in recent state-of-the-art reviews [4], [5]. In detail, reference [4] provides a comprehensive overview of industrial projects and products about swarm robotics, highlighting that many of them rely on centralized control and do not actually apply distributed decision making and real swarming behaviours. Moreover, multi-drone inspection missions have some features in common with multi-robot search and rescue applications, for which a central coordinator is used in most of the approaches [5]. Instead, reference [6] points out that most approaches for multi-robot patrolling focus on offline solutions, defining patrolling strategies according to static characteristics of the environment. On the contrary, effective multi-robot patrolling requires online (dynamic) coordination to deal with the uncertainties and the dynamics of the actual mission environment [6]. This is even more evident in the specific case of multi-drone planning algorithms, which usually operate in large and complex planning spaces and shall adjust trajectories online in real time [7]. For this purpose, game theory is worthy of further exploration as a mathematical tool for the autonomous decision-making behaviour of multi-drone systems in several autonomous applications, e.g., detection and reconnaissance, target recognition, etc. [7].
This work proposes a CAS in the form of a dynamic and decentralized mission planner for a multi-drone inspection mission. The design of the planner aims at attaining an autonomous and cooperative area inspection, to allow for simultaneous inspection and for an efficient and resilient management of inspection operations in large regions. The proposed design exploits game theory for the assignment of inspection tasks and for the computation of optimal routes in reasonable time frames and with limited communication, laying the foundation for autonomous real-time reactions to environmental changes. We build on some of our previous works, which applied Markov games and the Distributed Stochastic Algorithm for multi-drone systems in persistent surveillance applications [8], [9]. This work introduces some changes by applying the learning-in-games framework for the coordination within the inspection team. For the learning design, some ad-hoc variants of best response and of log-linear learning are proposed as possible learning algorithms [10]. Moreover, the work presents some numerical results to validate the effectiveness of the proposed approach. Also, a proof of concept is reported as the implementation of the mission planner in an Agent-Based Modelling and Simulation (ABMS) environment to allow for model-in-the-loop testing.

Solution
This section describes the proposed solution. The mission planning problem is formally stated as a stochastic constrained optimization of the team's tasks and routes. A game-theoretic model and the related coordination mechanism are addressed to solve the problem according to a decentralized setting.

Formal model of the problem
The multi-drone inspection problem may be applied in several operational scenarios that are relevant for drone teams. For example, the proposed inspection model may represent: (i) the detection of reference threats in infrastructure-patrolling or environmental-monitoring scenarios; (ii) the detection step in detection-and-reconnaissance applications. Thus, the model specifies the mission that a drone team shall accomplish to detect (i.e., to find traces of) a reference feared event in a RoI. Instead, the model does not consider the subsequent operations of the team in case the event is detected. In detail, the proposed model of the drone-team mission prescribes visiting a number of geographically distributed targets in a priority-based setting. Thus, the mission may be modelled as a set of inspection tasks (one for each target), whose priorities are possibly based on a risk map. The targets are associated to inspection cells, the size of which depends on the reference altitude of the drones and on some features of their payloads, such as the field of view and the related sensor footprint [8]. Every inspection cell is associated to an inspection waypoint, which is the central point of the cell and conventionally represents the ground projection of the waypoint to be reached for the inspection of the cell. In the following, the terms target, cell and inspection waypoint are used interchangeably. Moreover, we assume: (i) a homogeneous drone team; (ii) the reference altitude is the same for all the drones in the team. As a consequence, the spatial target distribution is the same for the whole drone team. Figure 1 shows the sensor footprint of a drone and the target distribution in the RoI. The reference problem may generally be stated as an optimization problem to generate a joint inspection plan that optimally satisfies the "required visitation times" of the cells. In more detail, the joint inspection plan refers to the assignment of inspection tasks, and of the related inspection paths
and times to reach targets, for every drone in the team. Clearly, this problem is computationally challenging, especially if we consider the necessity to replan in real time to ensure a resilient behaviour. A possible solution to this issue is the adoption of a hierarchical decomposition and of the multi-level optimization principle [11]. Instead of solving the original and monolithic planning problem, we split it into smaller sub-problems, which are arranged in a hierarchical fashion. In our case, a temporal hierarchical decomposition is adopted by splitting the multi-drone inspection planning problem into: a mission planning problem, which works over the whole temporal horizon of the mission and aims at an optimal scheduling (assignment and ordering) of the targets and at the generation of an optimal high-level (with a coarse degree of detail) trajectory for each drone; and a flight planning problem, which works over a limited temporal horizon and consists of the actual flight planning (the real trajectory) of the drones amongst the navigation points scheduled by means of the mission planning. This work deals with the mission planning problem, which may be divided into: goalpoint planning, that is the scheduling of inspection tasks in the team; and high-level trajectory planning, that is the definition of the routes to accomplish the goalpoint plan. Table 1 reports the problem attributes and their definition.
Note that a mission plan includes mission waypoints and cross waypoints, where the former coincide with the inspection waypoints, whereas the latter are intermediate waypoints (e.g., to avoid obstacles). We assume that: (i) each drone is modelled as a point mass; (ii) the altitude is the same for all the drones, as mentioned above; (iii) cruise speeds are constant (thus they are not used as a decision variable for the solution) and are the same for all the drones in the team; (iv) without loss of generality, the mission occurs in a free-space environment, thus the mission plan coincides with the goalpoint plan.
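As a minimal illustration of the cell and waypoint construction described above, the following sketch discretizes a rectangular RoI into square cells whose side equals the sensor footprint and places one waypoint at each cell centre. The square-footprint assumption and all names are ours, not the paper's.

```python
# Sketch: discretize a rectangular RoI into inspection cells, with one
# waypoint at each cell centre. The cell size is assumed equal to the
# (square) sensor footprint at the reference altitude.

def inspection_waypoints(roi_width, roi_height, footprint_side, altitude):
    """Return the list of (x, y, z) inspection waypoints covering the RoI."""
    waypoints = []
    nx = int(roi_width // footprint_side)   # cells along x
    ny = int(roi_height // footprint_side)  # cells along y
    for i in range(nx):
        for j in range(ny):
            # waypoint at the centre of cell (i, j)
            x = (i + 0.5) * footprint_side
            y = (j + 0.5) * footprint_side
            waypoints.append((x, y, altitude))
    return waypoints

wps = inspection_waypoints(roi_width=100.0, roi_height=60.0,
                           footprint_side=20.0, altitude=50.0)
print(len(wps))  # 5 x 3 = 15 cells
```

Because the team is homogeneous and flies at a common reference altitude, a single such grid serves every drone, which is exactly why the spatial target distribution is unique for the whole team.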
Table 1 also introduces the concept of inspection performance by means of the Expected Cost of Ignorance (ECI) [12], which is an operator based on information utility. Given a mission plan $\mathcal{M}$, the ECI of a cell $c$ is a weighted time-based cost related to: the time $t_{\mathcal{M},c}$ at which $c$ is inspected according to $\mathcal{M}$; the occurrence probability $p_{E,c}(t)$ and the occurrence cost $c_{E,c}(t)$ of the feared event in the interval $(0, t_{\mathcal{M},c})$. The functions $p_{E,c}(t)$ and $c_{E,c}(t)$ specify the priority of targets and model the mission features. Lastly, the joint mission planning problem may be defined as the following optimization problem:
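A plausible form of this problem, consistent with the ECI definition above, is sketched below; the product form of probability and cost and the stated constraints are our assumptions, not necessarily the paper's exact formulation.

```latex
% Hedged sketch of problem (1): the product form of p_{E,c} and c_{E,c}
% and the constraint set are assumptions.
\begin{equation}
  \min_{\mathcal{M}} \sum_{c \in \mathcal{C}} \mathrm{ECI}_c(\mathcal{M}),
  \qquad
  \mathrm{ECI}_c(\mathcal{M}) =
    p_{E,c}\bigl(t_{\mathcal{M},c}\bigr)\,
    c_{E,c}\bigl(t_{\mathcal{M},c}\bigr),
\end{equation}
subject to every cell $c \in \mathcal{C}$ being assigned to exactly one
drone and to the feasibility of the resulting routes.
```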

Game-theoretic model
Figure 2 shows the architecture of the proposed solution, which provides a dynamic and decentralized implementation of the inspection mission planner: each drone is equipped with its own planner, interacting with the others in the team by means of a cooperative Flying Ad-Hoc Network (FANET).
The mission planning of the drone team is structured as a MAS, where the agents are the local mission planners of the drones. For such a MAS, the main challenge is the design of the coordination mechanism that enables the agents to achieve the team objective in equation (1). In general, a coordination mechanism shall ensure that the individual decisions of the agents result in good joint decisions for the group [13], avoiding destructive and unhelpful interactions in the system, and maximizing global effectiveness by exploiting any positive interactions. In our case, we have adopted a competitive paradigm for the coordination of the inspection mission planners, which prescribes antagonistic agents, i.e., agents with their own goals. It is clear that drones cannot be real opponents since they need to cooperate for the minimization of the ECI. Thus, we address an artificial competition, in which the goals of the single agents are somehow aligned with the global objective. Moreover, a competitive MAS fits better with the scalability and resiliency features that are required for drone-based inspections.

The stochastic perturbation introduced by the custom binary log-linear learning (BLLL) does not bring a real improvement, despite its random exploration of Nash equilibria and the related computational burden. This may be interpreted as a feature of the custom best response for: (i) inherently avoiding unhelpful interactions (i.e., target collisions) amongst the plans of different agents; (ii) giving rise to a sequence of joint actions leading to a Nash equilibrium with a lower ECI. With respect to the state-of-the-art works, these preliminary results confirm the feasibility of a dynamic and fully-decentralized coordination of multi-drone inspection by means of game-theoretic learning. The main limitations are related to the asynchrony of the learning process, which may slow down the convergence to a Nash equilibrium or may introduce local optima.
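The two learning rules compared above can be sketched on a toy target-assignment game. The cost model, the one-target-per-agent action set and all parameters below are simplifying assumptions for exposition, not the paper's actual planner.

```python
import math
import random

# Toy illustration of asynchronous best response vs. binary log-linear
# learning (BLLL) on a small target-assignment game. All names, the
# ECI-like cost model and the parameters are assumptions for exposition.

random.seed(1)

TARGETS = [(0, 0), (5, 0), (0, 5), (5, 5)]  # target (cell) positions
AGENTS = [(1, 1), (4, 4)]                   # drone positions
PRIORITY = [1.0, 2.0, 1.5, 0.5]             # e.g. risk-map weights

def global_cost(assignment):
    """ECI-like team cost: each covered target pays priority * distance of
    the closest agent assigned to it; uncovered targets pay a penalty of
    10 per unit priority, so duplicate assignments waste coverage."""
    cost = 0.0
    for t, pos in enumerate(TARGETS):
        dists = [math.dist(pos, AGENTS[a])
                 for a, choice in enumerate(assignment) if choice == t]
        cost += PRIORITY[t] * (min(dists) if dists else 10.0)
    return cost

def best_response(assignment, agent):
    """The revising agent switches to the target minimizing the team cost
    (local and global objectives are aligned, as in a potential game)."""
    return min(range(len(TARGETS)),
               key=lambda t: global_cost(assignment[:agent] + [t]
                                         + assignment[agent + 1:]))

def blll_step(assignment, agent, tau=0.1):
    """BLLL: compare the current action with one random trial action and
    accept the trial with a Boltzmann probability (temperature tau)."""
    alt = assignment[:]
    alt[agent] = random.randrange(len(TARGETS))
    delta = global_cost(alt) - global_cost(assignment)
    p_alt = 1.0 / (1.0 + math.exp(min(delta / tau, 700.0)))  # clamp: no overflow
    return alt if random.random() < p_alt else assignment

# Asynchronous play: one randomly chosen agent revises per iteration.
assignment = [0, 0]
for _ in range(50):
    agent = random.randrange(len(AGENTS))
    assignment[agent] = best_response(assignment, agent)
cost_br = round(global_cost(assignment), 2)
print(cost_br)  # Nash equilibrium cost of this instance: 29.43

# BLLL perturbs the equilibrium stochastically; at a low temperature it
# stays near the best-response outcome with high probability.
for _ in range(100):
    assignment = blll_step(assignment, random.randrange(len(AGENTS)), tau=0.02)
print(round(global_cost(assignment), 2))
```

Here the global cost itself acts as the potential function, so best response provably converges to a pure Nash equilibrium, while the BLLL temperature trades random exploration of equilibria against stability, mirroring the comparison reported above.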

Conclusion
This work has designed some custom learning-in-games approaches for the coordination of drone teams in inspection missions. Further improvements may be achieved by implementing ad-hoc variants of synchronous learning rules. For example, future approaches may exploit the opportunity provided by BLLL to relax both the asynchrony of the learning and the completeness of the action set [17].

Figure 1. Sensor footprints of drones (left) and spatial distribution of targets in the RoI (right).

Figure 4. Computed inspection plans for each learning rule in a specific scenario with 200 waypoints.