Enhancement of multirotor UAV conceptual design through Machine Learning algorithms

Designing an efficient and optimized multirotor UAV requires laborious trade-off analyses involving numerous design variables and mission requirement parameters, especially during the early conceptual design phase. The large number of unknown parameters and the associated design effort often lead to non-optimal designs for the sake of time efficiency. This work presents the implementation of a machine learning (ML) framework to assist and expedite the conceptual design phase of multirotor UAVs. The framework utilizes information from a comprehensive database of commercial lightweight multirotor UAVs. The database contains an extensive collection of crucial sizing parameters, performance metrics, and features associated with foldability and indoor guidance (e.g., obstacle avoidance sensors). These attributes specifically pertain to multirotor UAVs weighing less than 2 kg, which exhibit diverse design and performance characteristics. The proposed ML framework employs multiple regression models (e.g., k-nearest neighbors regression, multi-layer perceptron regression) to predict the sizing parameters during a multirotor UAV's conceptual design phase. This enables designers to make quick, informed decisions while also significantly reducing computational time and effort. Finally, the ML framework's predictive capability is validated by comparing the predicted values with real-world data from an "unseen" test dataset.


Introduction
In recent years, the utilization of multirotor Unmanned Aerial Vehicles (UAVs) has witnessed a remarkable surge across various industries and applications [1]. These versatile aerial platforms have transcended their initial roles in military and surveillance operations to become indispensable tools in sectors as diverse as agriculture, cinematography, infrastructure inspection, and emergency response [2]. Their ability to hover, maneuver in tight spaces, and capture high-resolution imagery has revolutionized data collection and remote sensing. As a result, the demand for tailored multirotor UAVs has escalated significantly, prompting the need for more efficient conceptual design approaches.
In the realm of UAVs, the design process for fixed-wing aircraft has reached a mature stage, marked by a systematic approach beginning with tailored user requirements aligned with mission objectives. Established methodologies, often rooted in historical trends from manned aviation, leverage semi-empirical correlations between requirements and design parameters [3]. Extensive efforts have been invested in extracting similar tools for UAVs, bridging the gap between conventional aircraft design and unmanned systems [4]. In stark contrast, the design landscape for multirotor UAVs remains in its infancy, relying heavily on individual designer expertise and utilizing low-fidelity tools coupled with manufacturer data for components. While tools like ecalc and Drive Calculator [4] expedite the comparison of various components, they often overlook installed performance factors, leading to complexities arising from the vast permutations.
Recent strides in the conceptual design of multirotor UAVs have seen a focused exploration of specific components. For instance, Biczyski et al. [5] introduced a methodology that optimizes motor-propeller combinations, factoring in estimated flight duration. L. de Angelis et al. [6] delved into the energy system during hover, emphasizing pivotal rotor features affecting flight endurance. Meanwhile, holistic approaches to design have emerged, integrating correlations and trends. Delbecq et al. [7] proposed techniques utilizing scaling laws and similarity models to expedite multirotor UAV conceptual design, a sentiment echoed by M. Budinger et al. [8] in the preliminary design phase. A notable innovation comes from Yuyao et al. [9], who employed regression tools for installed performance. Their study, involving a database of 16 drones, explored the relationships between components and overall maximum takeoff weight (MTOW).
The core focus of this study lies in harnessing the power of machine learning (ML) methodologies applied to existing multirotor UAV platforms. The target is to utilize the predictive capabilities of ML, turning machine learning algorithms into tools that enable swift and accurate predictions of crucial design parameters during the conceptual design phase. By incorporating these predictive models into the design loop, designers can make well-informed decisions with minimal input, improving the efficiency and precision of the multirotor UAV conceptual design process.

Data Acquisition
The initial step in any ML endeavor involves data acquisition for model training. In this study, a custom database was created to meet specific drone analysis criteria.

Database population
In curating the database, several stringent criteria were applied. Firstly, only drones within the defined weight limit of less than 2 kg were chosen. Secondly, the focus was on drones designed for professional applications, excluding toy-grade variants, ensuring relevance to practical uses like aerial photography, surveillance, and mapping. The database was diversified, encompassing various brands, models, and configurations with all available equipment, with data sourced directly from manufacturers. Importantly, the selected data align with the research objectives of the machine learning process. The database is publicly available via a GitHub repository (footnote 1).

Database parameters
The design parameters for each drone were systematically categorized into geometrical/layout, performance, mission-related, and equipment aspects, constituting a total of 26 distinct metrics. These encompass crucial details such as maximum take-off weight, dimensions, and specialized applications. Comprehensive specifics of these metrics are outlined in Table 1.

Descriptive statistics
The dataset comprises information from 87 diverse drones and has a small fraction of missing data, approximately 5%. These gaps primarily stem from incomplete manufacturer data. Due to the variety of documented categories, the sampled parameters exhibit values of drastically different scales. For example, the propeller diameter ranges from 2.25" to 18.5", while the battery capacity ranges from 450 mAh to 7500 mAh. The latter is two orders of magnitude larger than the propeller diameter. This poses the risk that the machine learning algorithms implicitly give the battery capacity more weight, and therefore treat it as more important, simply because of its larger numerical values. The treatment of this issue is discussed in Section 3.6.

Methodology
The following sections outline the research methodology and the procedures employed; the overall workflow is shown in Figure 1.

Machine Learning
Machine learning systems make predictions by learning patterns from data without being explicitly programmed, a technique pivotal in this study. Supervised learning links features to labels [10]. Through training on the feature matrix (X_train, the independent variables) and the corresponding labels (y_train, the dependent variables), algorithms minimize the divergence between the predicted outcomes (ŷ) and the actual labels (y_train) using specialized loss functions. The iterative optimization of internal parameters (θ) through techniques like gradient descent refines the models, enabling them to generalize to unseen data (X_test) and predict vital design parameters.
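The loss-minimization idea above can be illustrated with a minimal sketch: gradient descent fitting a one-feature linear model by minimizing the mean squared error. The function name `fit_linear_gd`, the learning rate, the step count, and the toy data are all assumptions made for illustration; they are not the models or settings used in this study.

```python
def fit_linear_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # internal parameters (theta), initialised at zero
    n = len(xs)
    for _ in range(steps):
        # Errors of the current prediction y_hat = w*x + b
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(errors^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training data generated from y = 2x + 1 (illustrative only)
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [2.0 * x + 1.0 for x in x_train]
w, b = fit_linear_gd(x_train, y_train)
print(f"w = {w:.4f}, b = {b:.4f}")
```

On this noise-free toy data the fitted parameters approach the generating slope and intercept; real regressors such as the MLP apply the same principle to many more parameters.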
The research uses regression methods to swiftly estimate multirotor UAV design parameters during the conceptual phase. The precision of this estimation relies on the performance of four distinct models: k-nearest Neighbors, Random Forest, Support Vector Regression, and Multi-layer Perceptron (MLP) regression [10,11]. The evaluation of these models includes key metrics such as the Mean Absolute Error (MAE), the Mean Squared Error (MSE), and the coefficient of determination R² [12], providing insight into their predictive capabilities. This analysis forms the basis of the methodology and supports the accuracy and reliability of the results.
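For reference, the three evaluation metrics named above can be written out directly from their standard definitions. The sketch below uses plain Python and made-up values for `y_true` and `y_pred`; it is illustrative only and does not reproduce any result from this study.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average squared prediction error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, chosen only to make the arithmetic easy to follow
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

Lower MAE and MSE are better, while R² approaches 1 for a perfect fit.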

Data Leakage Precautions
Guarding against the pervasive threat of data leakage, a phenomenon prevalent in numerous machine learning projects [13], was crucial in this study. Specifically, diligent attention was paid to ensuring a clear and early separation between the training and test datasets, addressing the fundamental challenges posed by unseparated data. Illegitimate feature usage during training was also carefully avoided. In the subsequent section, the division of the dataset into distinct training and test subsets is explained, followed by a detailed exploration of the selection of model features in Section 3.4.

Creating the Training and Test Sets
The dataset was divided into a 20% test subset and an 80% training subset due to the limited data available, a strategy to ensure a balance between model evaluation and training robustness. The training phase, crucial for model development, utilized the 80% training subset, allowing the model to learn patterns within the data. The remaining 20%, held out for validation, served as a critical checkpoint, enabling an assessment of the model's generalization and performance on unseen data. This division and validation process ensured a comprehensive evaluation of the model's predictive capabilities while guarding against overfitting, thereby enhancing the reliability of the results.
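A minimal sketch of such an 80/20 split, using scikit-learn's `train_test_split`, is shown below. The feature matrix and target are random placeholders with 87 rows (matching the number of drones in the database), and the `random_state` value is an arbitrary assumption; the actual features and seed used in the study are not stated here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_drones = 87                         # size of the database described above
X = rng.normal(size=(n_drones, 4))    # placeholder feature matrix
y = rng.normal(size=n_drones)         # placeholder target (e.g. MTOW)

# Split once, early, so that no later preprocessing step sees the test rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

Performing the split before any scaling or imputation keeps test-set statistics out of the training pipeline, in line with the data leakage precautions described earlier.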

Feature Engineering
In addition to the features initially recorded in the database, new parameters were derived from existing ones, a common strategy to enrich datasets with valuable information [14]. These newly generated features, chosen for their relevance in the UAV design domain or their significance as performance and design indicators, enhance the depth of this analysis. The study introduces several new parameters:
• Battery Energy [Wh]: Calculated from the nominal voltage of LiPo/Li-Ion batteries (assumed to be 3.7 V) and the reported battery capacity.
• Disk Loading (weight per unit propeller disk area): Derived from the propeller area (another constructed parameter) and the maximum takeoff weight; this metric is widely used in rotary aircraft analysis.
• Volume (cubic length units): Total volume of the quadrotor UAV, calculated from its length, width, and height.
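The derived features listed above can be sketched as follows. Several details are assumptions made for illustration and are labeled as such in the code: the optional `n_cells` multiplier, the use of the total disk area of all rotors, the default of four rotors, and the SI units chosen for disk loading and volume.

```python
import math

NOMINAL_CELL_VOLTAGE = 3.7  # V, nominal LiPo/Li-Ion voltage assumed in the text


def battery_energy_wh(capacity_mah, n_cells=1):
    """Battery energy in Wh from capacity in mAh.

    n_cells is a hypothetical parameter; with the default of 1 the result is
    simply 3.7 V times the reported capacity.
    """
    return NOMINAL_CELL_VOLTAGE * n_cells * capacity_mah / 1000.0


def disk_loading(mtow_kg, prop_diameter_in, n_rotors=4):
    """MTOW divided by total propeller disk area (kg/m^2, assumed units)."""
    radius_m = prop_diameter_in * 0.0254 / 2.0   # inches to metres
    total_disk_area = n_rotors * math.pi * radius_m ** 2
    return mtow_kg / total_disk_area


def bounding_volume(length, width, height):
    """Bounding-box volume in the cube of whatever length unit is supplied."""
    return length * width * height


# Hypothetical drone, not taken from the database
print(battery_energy_wh(5000))          # energy of a 5000 mAh pack
print(disk_loading(1.0, 10.0))          # 1 kg MTOW, 10" propellers
print(bounding_volume(0.3, 0.3, 0.1))   # 0.3 m x 0.3 m x 0.1 m
```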

Identifying correlations
The next step in the process was to identify relationships between the variables to be regressed. To achieve that, two methods, and therefore two indicators, were utilized: Spearman's correlation coefficient and the Mutual Information between the variables.

Mutual Information
Mutual Information (M.I.) was the first and most important criterion used to assess the relationships between variables. The M.I. of two random variables measures the mutual dependence between them. More specifically, it quantifies the "amount of information" obtained about one random variable by observing the other. It was observed that certain "areas of interest" were formed, primarily around the performance and geometrical variables. Consequently, these relationships were investigated further. On the other hand, categorical variables generally correlated poorly with the other variables.
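Both indicators can be computed with standard library calls, as in the sketch below, which uses `scipy.stats.spearmanr` and scikit-learn's `mutual_info_regression`. The data are synthetic: one feature with a monotonic, non-linear relation to the target and one unrelated feature. The variable names and the cubic relation are assumptions chosen only to show how the two indicators behave.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 87
related = rng.uniform(0.0, 1.0, size=n)    # feature that drives the target
unrelated = rng.uniform(0.0, 1.0, size=n)  # feature independent of the target
target = related ** 3                      # monotonic but non-linear relation

# Spearman's coefficient captures monotonic (rank-based) association
rho_related, _ = spearmanr(related, target)
rho_unrelated, _ = spearmanr(unrelated, target)

# Mutual information captures general statistical dependence
X = np.column_stack([related, unrelated])
mi = mutual_info_regression(X, target, random_state=0)

print(f"Spearman: related={rho_related:.2f}, unrelated={rho_unrelated:.2f}")
print(f"Mutual information: related={mi[0]:.2f}, unrelated={mi[1]:.2f}")
```

Features that score well on both indicators are natural candidates for model inputs, while features scoring near zero on both contribute little.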

Models Selection
Given the wealth of features, the research demanded a strategic focus on specific models and parameter combinations. Among the numerous variables, two crucial parameters profoundly impacting multirotor UAV design were identified: the Maximum Take-off Weight (MTOW) and the total available Battery Energy. These parameters, extracted through analysis of correlations and mutual information, emerged as pivotal variables shaping the UAV's conceptual design. The selected features for training the ML models are summarized in Table 2, with each row representing an independent feature.
Table 2. The two models that were selected to train the ML algorithms.

Preparing Data for Regression Models
In the final preprocessing stage, the challenge of differing feature ranges, described in Section 2.3, was addressed. Standardization to zero mean and unit standard deviation was applied so that no feature dominates purely because of its scale; this was done for all algorithms except Random Forest, which is insensitive to feature scaling. Additionally, the approximately 5% of missing data was handled using the iterative imputation technique of the Scikit-learn library, yielding a complete dataset for subsequent analysis.
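A minimal sketch of this preprocessing stage is given below, assuming scikit-learn's `IterativeImputer` followed by `StandardScaler` in a single pipeline. The two synthetic features (a propeller diameter and a battery capacity on very different scales), the positions of the missing entries, and the `random_state` are illustrative assumptions rather than the study's actual data or settings.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn and must be enabled first
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic training features on very different scales (illustrative only)
prop_diameter_in = rng.uniform(2.25, 18.5, size=69)
capacity_mah = 400.0 * prop_diameter_in + rng.normal(0.0, 200.0, size=69)
X_train = np.column_stack([prop_diameter_in, capacity_mah])

# Knock out roughly 5% of the entries to mimic incomplete manufacturer data
X_train[[3, 20, 41], 0] = np.nan
X_train[[7, 12, 55, 60], 1] = np.nan

# Impute first, then standardise; fit on the training subset only
preprocess = make_pipeline(IterativeImputer(random_state=0), StandardScaler())
X_train_prepared = preprocess.fit_transform(X_train)

# Unseen rows are transformed with the statistics learned from training data
X_test = np.array([[10.0, np.nan], [5.0, 2200.0]])
X_test_prepared = preprocess.transform(X_test)
print(X_train_prepared.mean(axis=0), X_train_prepared.std(axis=0))
```

Fitting the imputer and scaler on the training subset alone, and only transforming the test subset, keeps the preprocessing consistent with the data leakage precautions.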

Results
For the ML algorithms, basic tuning was performed using a grid search technique which led to the hyperparameters shown in Table 3.
Table 3. Hyper-parameters for the 4 ML algorithms for the 2 distinct models
In Table 4 the resulting metrics for each of the 2 models are displayed. The validation and the calculation of the metrics are always performed on the test subset (unseen data). Good scores on this subset therefore indicate that an ML algorithm can generalize well and capture the complex relationships in unseen data. Figures 4 and 5, for the MTOW and Battery Energy models respectively, compare on the left the predicted values for the test dataset (X-axis) with the corresponding real unseen values (Y-axis). On the right, the residuals of the prediction (actual value - predicted value) are plotted against the real values. In an ideal scenario, the data points in the predicted-versus-real plot would align precisely along the Y = X line, indicating a flawless match. In the residuals-versus-real plot, the residuals would lie on the Y = 0 line, which indicates a perfect prediction. The residual plot is a way to reveal any biased pattern of the ML algorithms. Specifically, it is important to ensure that there are no repeating patterns such as "U" shapes, and that the dispersion of data points remains random around the Y = 0 line. Notably, the kNN algorithm stands out in the MTOW model, with an R² of 0.98. For the Battery Energy model, the RFR algorithm gives the best results, achieving an R² of 0.89. These high R² scores indicate that the models capture the relationships within the data accurately and have robust predictive capabilities.
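The grid search and test-set evaluation described above can be sketched as follows for a k-nearest neighbors regressor. The synthetic data, the hyperparameter grid, the 5-fold cross-validation, and the seeds are all assumptions made for illustration; the grid actually searched and the resulting values of Table 3 are not reproduced here.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(87, 3))  # placeholder features
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=87)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled independently
pipeline = make_pipeline(StandardScaler(), KNeighborsRegressor())
param_grid = {  # hypothetical grid
    "kneighborsregressor__n_neighbors": [1, 3, 5, 7],
    "kneighborsregressor__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)  # tuning uses the training subset only

# Final metrics and residuals are computed on the unseen test subset
y_pred = search.predict(X_test)
residuals = y_test - y_pred  # actual value - predicted value
scores = {
    "MAE": mean_absolute_error(y_test, y_pred),
    "MSE": mean_squared_error(y_test, y_pred),
    "R2": r2_score(y_test, y_pred),
}
print(search.best_params_, scores)
```

Plotting `residuals` against `y_test` gives the residual plot discussed above; a random scatter around zero suggests the absence of systematic bias.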

Conclusions
The potential of machine learning (ML) applications has been explored for rapid and accurate estimation of the design parameters of multirotor UAVs during the conceptual design phase. A database was constructed using data from UAV manufacturers, and new features were created to add more layers of information. After rigorous preprocessing of the data, taking into consideration the risks of data leakage, a framework of 4 ML algorithms was developed. Maximum take-off weight and battery energy were chosen as the two dependent variables (Y variables), since they are deemed to be among the most critical characteristics that drive the design of a multirotor UAV. The performance of the ML framework shows that kNN and RFR predict the target parameters better than the other algorithms, with R² values of 0.98 and 0.89 on the test subset, respectively. This underscores the potential of ML algorithms as tools for designers, providing accurate predictions even in scenarios with constrained knowledge.
The workflow encompasses three pivotal components: data processing (exploratory data analysis and pre-processing), model construction, and model validation. Notably, this research utilizes the Python programming language, leveraging open-source libraries such as Pandas, Scikit-Learn, Matplotlib, and NumPy for implementation.

Figure 1. Workflow followed from data preprocessing to the validation of ML algorithms

Figure 4a-4b. Real values versus the predicted values (4a) and residual values (real - predicted) versus the real values (4b) for the kNN algorithm and MTOW model

Figure 5a-5b. Real values versus the predicted values (5a) and residual values (real - predicted) versus the real values (5b) for the RFR algorithm and Battery Energy model

Table 1. All the parameters that were documented during the data acquisition phase

Table 4. Metrics of predictions on the test subset for the ML algorithms. a: smaller value is better. b: bigger value is better.