Predicting wildfire ignition causes in Southern France using eXplainable Artificial Intelligence (XAI) methods

The percentage of wildfires ignited by an undetermined origin is substantial in Europe and Mediterranean France. Forest fire experts have recognized the significance of fires with an unknown ignition source, since documentation and research of fire causes are important for creating appropriate fire policies and prevention strategies. The use of machine learning in wildfire science has increased considerably, driven by the increasing availability of large and high-quality datasets. However, the absence of comprehensive fire-cause data hinders the utility of existing fire databases. This study trains and applies a machine-learning-based model to classify the cause of fire ignition based on several environmental and anthropogenic features in Southern France using an eXplainable Artificial Intelligence framework. The results demonstrate that the source of unknown-caused wildfires can be predicted at various levels of accuracy: natural fires have the highest accuracy (F1-score 0.87) compared to human-caused fires such as accidental (F1-score 0.74) and arson (F1-score 0.64). Factors related to spatiotemporal properties as well as topographic characteristics are the most important features in determining the classification of unknown-caused fires for the specific area.


Introduction
In Europe, approximately 50% of all fires were of unknown origin based on data reported from 19 European countries in the European Forest Fire Database from 1999 to 2016 (De Rigo et al 2017). In Mediterranean France, almost 70% of all fires between 1973 and 2020 were recorded without a cause of ignition according to the forest fire database for the Mediterranean area (Prométhée.com). Many experts in the field of fire management in Europe have acknowledged the importance of fires classified as having an undetermined origin (Tedim et al 2022), since the lack of information regarding fire causes makes it difficult for fire managers to determine the most suitable course of action to prevent similar incidents in the future. French fire experts, in particular, have identified fires of unknown origin as being of paramount importance (Tedim et al 2022) among the various categories of fires (natural, accident, negligence, deliberate, and rekindle) in the harmonized classification scheme of fire causes in Europe. In southeastern France, Ganteaume and Guerra (2018) highlighted the fact that large areas are burned by fires of undetermined sources, and they argue for enhanced quality and quantity of investigations into fire ignition causes in order to improve the accuracy of fire databases. Fire ignition patterns can vary significantly both temporally and spatially depending on the cause of ignition (Curt et al 2016) and can be affected by a wide range of environmental and anthropogenic drivers (Syphard et al 2008, Catry et al 2009, Syphard and Keeley 2015). As such, documentation and research of fire causes and their spatiotemporal patterns are essential for establishing meaningful fire policies (Rodrigues et al 2014), since a better understanding of these patterns can improve the efficacy of fire prevention strategies (Oliveira et al 2012). However, the absence of comprehensive fire-cause data hampers the utility of these databases.
As in other areas of study, the use of machine learning (ML) methodologies in wildfire science has seen a marked increase in popularity in recent years (Jain et al 2020, Bot and Borges 2022). While ML models have demonstrated great effectiveness at identifying complex patterns in large datasets, some are considered 'black boxes' because it can be difficult to understand how the model arrives at its predictions or how certain patterns were identified (Loyola-Gonzalez 2019). This lack of interpretability can be a barrier to adoption, as it may be difficult for stakeholders to trust such models without understanding how the algorithm reaches its inferences. In recent years, eXplainable Artificial Intelligence (XAI), also referred to as interpretable machine learning, has emerged as an approach that employs various techniques and strategies to enhance the interpretability, transparency, and explainability of ML models and their decision-making processes, with the ultimate goal of fostering trust and accountability in the model's output. In the context of wildfire science, the application of XAI has been explored by only two recent studies to address wildfire occurrence and size (Al-Bashiti and Naser 2022, Cilli et al 2022).
Research on fire ignition causes is fairly limited and the causes themselves remain poorly understood, but some studies have demonstrated that arson fires can potentially be predicted both spatially and temporally (Gonzalez-Olabarria et al 2012, Penman et al 2013). The objective of this study is to develop an ML-based model that can classify the ignition source of fires that have been recorded without a known cause in France. Furthermore, this study aims to evaluate the significance and the effect of various environmental and anthropogenic factors in determining the classification of different fire sources using XAI methods.

Study area
The study area comprises 15 administrative divisions (departments) in the south of France, with a total area of just over 80 000 km² (table 1, figure 1). This region is considered the most fire-prone in France and is where most of the burned area is recorded, despite decreasing trends in recent decades (Bountzouklis et al 2022). Environmental characteristics and landscapes vary significantly, with both mountainous and coastal zones contained in the study area; the highest altitudes and steepest slopes are found in the northeastern parts where the French Alps are located (e.g. Hautes-Alpes, Alpes-de-Haute-Provence), whereas in the southern portions topography is low-lying and relatively flat (e.g. Bouches-du-Rhône, Hérault). Population density is influenced by topography: the highest concentrations are located in areas with low altitudes and gentle slopes, especially in the southeastern Mediterranean coastal and near-coastal zones (e.g. Bouches-du-Rhône, Alpes-Maritimes). The French Alps and the island of Corsica are largely covered by forests and semi-natural areas, whereas the largest agricultural areas are concentrated mainly in the center of the study area.

Fire database
The current study was based on 'Prométhée', the official forest fire database for the Mediterranean area in France. This database documents fires from 1973 onwards and contains information for each fire such as burned area, ignition source (known/unknown), time, date, and location within a 2 × 2 km grid. Similar to the harmonized European classification scheme on ignition causes, 'Prométhée' includes five major fire ignition sources: (i) accidental (e.g. power lines, vehicles), (ii) arson (e.g. pyromania, conflict), (iii) private negligence (e.g. cigarette butts, leisure), (iv) professional negligence (e.g. industry, agriculture) and (v) lightning. The total number of fires considered in our study is 48 038; these were recorded from 1997 to 2020. Fire records prior to 1997 were excluded from this study since the classification of fire origin in those records is considered less reliable. The dataset of records starting in 1997 is fairly balanced with regard to the number of fires of known and unknown sources, as approximately 60% have a known cause of ignition. Within the known causes (n = 27 620), frequency varies considerably; arson is the most frequent (38.4%), followed by private negligence (26.7%), professional negligence (17.2%), accidental (10.1%) and finally lightning (7.6%) (figure 2).

Fire frequency & burned area according to cause
After unknown causes, arson fires are the most numerous and account for the greatest annual burned area in most years (figures 3(a) and (b)). They are followed by private negligence, which, although the second most frequent fire source, does not cause a proportionate extent of burned area. Despite similar numbers of accidental and lightning fires, the annual percentage of area burned by accidental fires is often significantly greater than that burned by lightning fires and occasionally greater than that of the other causes. Lastly, although the percentage of area burned by fires of unknown origin is substantial in most years, frequently second after arson, it fluctuates widely from 5% to 49% depending on the year.
Table 2 describes the land cover, topographic, anthropogenic, and spatiotemporal variables that were used as features to predict the fire ignition source. The contextual geographic information of the selected factors was processed for each 2 × 2 km grid cell, initially in ArcGIS Pro v2.9 and subsequently with the Python packages pandas (McKinney 2010) and NumPy (Harris et al 2020), which were used to preprocess the data for the classification scheme (e.g. replacing missing values, one-hot encoding, etc); seaborn (Waskom 2021) was used for visualization.

Fire cause classification based on random forests (RFs)
ML methodologies learn and adapt through experience, and the size and quality of the input data play a critical role in determining the overall effectiveness of the model. RF (see e.g. Breiman 2001) is a supervised ML algorithm used for both classification and regression that is well established in many disciplines and has grown substantially in popularity in the field of wildfire science over the last decade (Jain et al 2020). RF is based on decision trees (Breiman et al 2017), where each decision tree is a series of If-Then-Else sequences with several branches connected by decision nodes and finally by leaf nodes that determine a value or category, such as the label of a classification task (figure 4). Furthermore, a fundamental characteristic of RF is that a random subset of features is used at each node of each decision tree, resulting in several individually trained and largely uncorrelated decision trees; these are finally merged into a larger ensemble model to limit overfitting and produce more accurate predictions. The processing chain of RF (classification, accuracy score, confusion matrix, hyperparameter tuning, etc) was carried out using the implementation of the algorithm in the Python module scikit-learn (Pedregosa et al 2011) (figure 5). To address the unbalanced number of samples between classes, the Synthetic Minority Oversampling Technique (SMOTE) (Chawla et al 2002) was used, as implemented in the Python package imbalanced-learn (Lemaitre et al 2017). SMOTE is a common method to produce synthetic data for a minority class (e.g. lightning ignitions): for a given minority sample, it randomly selects one of its k nearest neighbors and uses it to generate a new, randomly perturbed but similar sample. To train the classifier, 70% of the dataset was used, while the remaining 30% was used for testing the accuracy in predicting the cause of a fire.
The synthetic samples created using SMOTE were used only during the training phase and not for the validation of the model. To fine-tune the algorithm hyperparameters, such as the number of trees, the maximum number of features considered for splitting a node, the maximum number of levels in each decision tree, etc, the scikit-learn Random Search Cross Validation method was used; this allowed us to evaluate and narrow down a wide range of values for each hyperparameter. Subsequently, the Grid Search with Cross Validation method was used to examine different combinations of specific values for the selected hyperparameters.
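The interpolation step at the core of SMOTE can be illustrated with a minimal sketch in plain Python. This is not the imbalanced-learn implementation used in the study; the function name `smote_sample`, the two features, and all sample values are hypothetical and chosen only for illustration.

```python
import math
import random

def smote_sample(minority, k=5, n_new=10, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples (e.g. lightning fires) described by
# two made-up features: (elevation in m, secondary road density).
lightning = [(1200.0, 0.20), (1350.0, 0.10), (1100.0, 0.30), (1500.0, 0.15)]
new_points = smote_sample(lightning, k=2, n_new=5)
```

Because every synthetic point is an interpolation between two real minority samples, it stays within the range of the observed feature values. In a workflow like the one described above, such points would be added to the training split only, so that the test split contains real fires exclusively.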
To evaluate the model, accuracy, precision, recall and F1-score were used; these are commonly used evaluation metrics for classifiers in the field of ML and are calculated from the number of instances classified as true positives (TP), true negatives (TN), false positives (FP) or false negatives (FN). Accuracy is determined by dividing the number of correct predictions by the total number of instances. Precision specifies how many of the instances the classifier predicted as positive are actually positive, while recall shows what fraction of the positive instances in the dataset were correctly identified by the classifier. F1-score, the harmonic mean of precision and recall, serves as a comprehensive measure of both. An F1-score of 1 denotes optimal performance, with both precision and recall being maximized. Conversely, a score of 0 represents the worst possible outcome, with both precision and recall being minimal. A score of 0.5, which corresponds to random guessing in a balanced binary problem, indicates suboptimal performance, whereas scores above 0.5 are generally considered to indicate good performance.
To identify which features drive the classification and to understand the contribution of each one, the SHapley Additive exPlanations (SHAP) (Lundberg and Lee 2017) method was used. SHAP is an approach based on game theory that explains ML model outputs by breaking down each prediction into contributions from each feature value. These contributions are combined and help us understand the overall importance of each feature value in the final prediction. SHAP values can be visualized using various plots, such as a summary plot, that display not only the strength of the impact a certain feature has but also the direction of the impact.
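The game-theoretic idea behind SHAP can be made concrete with a small sketch that computes exact Shapley values by enumerating all feature coalitions. It does not use the SHAP library and does not reproduce the study's model: the scoring function `toy_score`, its numbers, and the feature names are hypothetical stand-ins.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values: for each feature, the weighted average of its
    marginal contribution over all coalitions of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_f = value_fn(set(coalition) | {f})
                without_f = value_fn(set(coalition))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi

# Hypothetical score for the lightning class, given the set of features
# whose values are "known" to the model (all numbers are invented).
def toy_score(known):
    score = 0.10                       # baseline with no features known
    if "summer" in known:
        score += 0.30
    if "high_elevation" in known:
        score += 0.20
    if "summer" in known and "high_elevation" in known:
        score += 0.10                  # interaction shared by both features
    if "near_road" in known:
        score -= 0.05
    return score

features = ["summer", "high_elevation", "near_road"]
phi = shapley_values(toy_score, features)
```

In this toy case the contributions sum to the difference between the full prediction and the baseline (0.55), with the interaction term split equally between the two interacting features. A negative value, as for `near_road`, corresponds to a feature that pushes the prediction away from the class, which is the direction information shown in a summary plot.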

Results
As elaborated below, the results derived from the RF model are presented through classification metrics and a confusion matrix, subsequently followed by the description of which features drive the classification and how they influence it.

Fire ignition cause classification
The overall accuracy of the multiclass RF classification scheme reaches about 70% (69.8%). Detailed results per ignition cause are presented in table 3 and evaluated using (i) F1-score, (ii) precision and (iii) recall. Concerning the accidental class, the model displays the second highest F1-score (0.77) and a moderate discrepancy between precision (0.81) and recall (0.74). This indicates that most instances predicted as accidental are correct, but that the model misses a larger share of the instances that actually belong to that class. Regarding the arson class, the model shows a lower F1-score of 0.64 and is not very accurate in terms of precision (0.60), meaning that it may predict some instances as arson that actually belong to a different class. However, the model performs better when it comes to identifying the instances that belong to the arson class (recall score 0.69). The lightning class displays the best classification metrics overall (F1-score of 0.88). The precision score (0.85) is somewhat lower than the recall score (0.91), suggesting that lightning fires are easier for the model to identify and are not confused with another class. In contrast, the model performs the worst for the private negligence class, with an F1-score of 0.55. In this class, the precision score (0.59) is higher than the recall score (0.52), which suggests that the classifier's positive predictions are relatively reliable but that it misses a larger proportion of the actual positive samples. Finally, the professional negligence class exhibits relatively low but balanced scores between precision (0.67) and recall (0.63).
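As a quick consistency check, the F1-score can be recomputed as the harmonic mean of the precision and recall values quoted above. The sketch below uses only those quoted values; the helper name `f1_score` is ours, and the value for professional negligence is derived here rather than reported in the text.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per ignition cause, as quoted in the text.
reported = {
    "accidental": (0.81, 0.74),
    "arson": (0.60, 0.69),
    "lightning": (0.85, 0.91),
    "private negligence": (0.59, 0.52),
    "professional negligence": (0.67, 0.63),
}

f1 = {cause: round(f1_score(p, r), 2) for cause, (p, r) in reported.items()}
for cause, score in f1.items():
    print(f"{cause}: F1 = {score:.2f}")
```

The recomputed values match the F1-scores given in this section for the accidental (0.77), arson (0.64), lightning (0.88) and private negligence (0.55) classes; small differences could still arise because the quoted precision and recall are themselves rounded.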
The confusion matrix (figure 6) provides additional information with regard to the performance of the classification of ignition causes. More specifically, accidental fires are most frequently misclassified as arson. There is a high number (n = 159) of arson fires that are wrongly classified as private negligence and, similarly, there are 266 private negligence fires that are misclassified as arson. This could mean that there are similarities between the causes of these fires, or that the model may not have enough information to accurately distinguish between these classes. As the most accurately predicted cause, lightning displays low misclassification numbers, which are distributed evenly among the other classes. In contrast, private negligence, which is a major negative contributor to the overall classification accuracy, shares its errors primarily between the professional negligence and arson classes. Finally, professional negligence fires are also often confused with either arson or private negligence fires.
Figure 7 illustrates the computed feature importance of the RF model for all classes, expressed through mean SHAP values that represent the average impact of a feature on the model output across all the instances in the dataset. Overall, feature importance values vary significantly by both feature type and ignition cause. The features summer, elevation and afternoon form a group that stands out clearly from the rest, followed by a second cluster of slightly less impactful features such as spring, geographic coordinates, BA <0.1 ha and secondary road density.
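The aggregation of per-instance SHAP values into a single importance score per feature can be sketched as follows. We assume here that the average is taken over absolute SHAP values, which is the usual convention for such bar plots but is not stated explicitly in the text; the function name and all numbers are hypothetical.

```python
def global_importance(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across instances."""
    n = len(shap_rows)
    means = [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda t: t[1], reverse=True)

# Hypothetical SHAP values for one class: one row per fire, one column
# per feature (values are invented for illustration).
names = ["summer", "elevation", "afternoon"]
rows = [
    [0.20, -0.10, 0.05],
    [-0.30, 0.15, 0.00],
    [0.10, -0.20, 0.10],
]
ranking = global_importance(rows, names)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling out, so a feature that pushes some fires towards a class and others away from it still registers as important.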

Feature importance and effect
In the context of accidental fires, several features demonstrate comparable significance, with afternoon, elevation, summer, and primary road density being slightly more salient than other variables. Similarly, the relevance of features for arson fires is widely distributed, with spatiotemporal characteristics such as summer, night, and location being the most prominent factors. Regarding fires caused by lightning, summer and elevation are by far the most impactful variables, followed by secondary road density. In the case of private negligence, summer exhibits the highest level of importance, although it is not substantially more important than other variables such as afternoon, spring, and secondary road density. Finally, with respect to professional negligence, summer represents the most influential factor by a significant margin, with only burned area (BA) size (<0.1 ha) showing discernible differences from other variables.
Figure 8 depicts the most influential (n = 10) features for each class of the model in descending order. Furthermore, the impact of each feature on the ignition cause is also illustrated through the positive or negative SHAP values. These values indicate whether an instance is more or less likely to belong to a particular class depending on the magnitude of the feature values. For example, instances with lower elevation values are more likely to be classified as an accidental or arson ignition, and less likely to be categorized as a lightning fire. Similarly, a fire that occurred during the summer is more likely to belong to the arson or lightning class, but less likely to be classified into the other categories.

Discussion
The performance of the RF classifier varies considerably between natural and human-induced fires. Lightning fires were classified with the highest accuracy since ignition dynamics for these fires are significantly different from those of human-caused fires. As reported by Curt et al (2016), lightning fires tend to have small burned areas and occur on steep, densely vegetated, mountainous slopes with low anthropogenic presence; seasonality also plays a significant role in the incidence of those fires (summer). This particular profile, which aligns with the interpretation of feature effects through the SHAP values, enables the classifier to distinguish lightning from other causes more clearly. In contrast to natural fires, human-caused ignitions are multi-faceted and more complex to model. Accidental fires are the least difficult human-induced events to classify in our model, potentially because such fires are more closely associated with infrastructure elements such as power lines and railways than other forms of anthropogenic causes. The most challenging cause to classify is private negligence, which is most often misclassified as arson and vice versa. Both arson and private negligence fires often occur in similar contexts, specifically the wildland-urban interface. The similarity in environmental contexts and conditions between these types of fires may make it difficult to distinguish between the two causes. However, this may also reflect a problem of reliability in the fire databases (Ganteaume and Guerra 2018): in order to reduce the number of unknown-caused fires, the cause is either speculated or attributed without much physical evidence to support it. Professional negligence fires are also confused, but to a lesser extent, with private negligence. Both causes share common characteristics, as they tend to burn small/medium areas and occur mainly outside of the summer season (Curt et al 2016), which is reflected in the significance and impact those features hold in the SHAP framework.
Socioeconomic data used in our model pertain to a single year only. While this approach may have its advantages, such as simplifying data collection and analysis, it can also undervalue the importance of socioeconomic features by not capturing their temporal fluctuations, especially considering that most fires in France, as well as in the Euro-Mediterranean region (95%), are caused by humans. Factors such as population density, unemployment rate, etc represent dynamic phenomena that can change considerably over time, in contrast to static variables such as topography or even to other dynamic variables such as land cover. The addition of geographic coordinates in our workflow not only partly tackles spatial non-stationarity, as the decision trees of the model in a way incorporate geographic space during their creation, but also enhances the results, which is in accordance with other works that use ML algorithms for applications of a spatial nature (Hengl et al 2018). Spatial approaches to ML algorithms, such as Geographic Random Forests (Georganos et al 2021) and Geographically Weighted Neural Networks (Hagenauer and Helbich 2022), would be advantageous for such applications considering the significance of spatial location and its strong links with different fire ignition causes.
As a first attempt, the current study used only the first-level causes (5 categories) available from the hierarchical structure of the 'Prométhée' fire database, which also includes second-level (15 categories) and third-level (31 categories) causes. Applying a similar procedure to selected sub-level data could improve the understanding of ignition sources and their performance within the classification scheme. However, this would increase the complexity of the model and may negatively impact overall accuracy. Finally, the inclusion of fuel type characteristics and fire-weather variables can potentially strengthen and facilitate the distinction between different fire causes; for instance, arson fires burn larger areas (Jappiot 2013, Syphard and Keeley 2015), and this may indicate that these fires are set under more favorable weather conditions.
This model is not intended for operational use or as a substitute for conventional field investigation methods, as it cannot provide physical evidence for the proper deduction of the cause of a forest fire. Instead, it is intended as a method to analyze large-scale fire databases that contain a moderate percentage of unknown-caused fires. The ideal share of known causes would be neither too low, as insufficient data would result in a restricted training dataset, nor too high, as that would render the model less useful. Despite the limitations in identifying causes of unknown ignitions, the results can help to facilitate targeted prevention efforts (Oliveira et al 2012). Moreover, the benefits of harmonized classification systems, such as the one proposed by the European Commission, are emphasized. By using such schemes, ML models can significantly benefit from increased harmonized data availability, provided that the data's reliability stays at an adequate level. This allows for the combination of historical national fire databases, leading to the development of larger databases with enhanced modeling potential. Other standardized georeferenced data initiatives in Europe (e.g. the European INSPIRE Directive, the European Geodata Infrastructure, etc), which aim to establish a common framework for the management and sharing of geospatial data across Europe, are going in this direction.
ML algorithms have become increasingly popular in fire science (Jain et al 2020). These algorithms can help identify complex relationships between various factors that contribute to fire occurrences. However, the success of these algorithms relies heavily on the availability of large, high-quality datasets. As fire science continues to advance, access to larger and more comprehensive datasets is becoming increasingly common. This includes georeferenced explanatory feature data which provides important contextual information that can be used to better understand the underlying causes of fires. As these datasets continue to grow in size and quality, ML algorithms will become even more powerful tools for analyzing fire occurrences and fire causes.

Conclusion
In this study we train and apply a model to classify fire ignition causes based on several environmental and anthropogenic features using an XAI framework. The results suggest that the source of unknown-caused fires can be identified at various levels of accuracy depending on the nature of the forest fire (e.g. F1-score lightning 0.87, accidental 0.74, arson 0.64). Spatiotemporal characteristics, including geographic location, season and time of day, as well as topographic factors like elevation, are the most important features in determining the classification of unknown-caused fires for the specific area and fire regime studied here. The role of spatial non-stationarity is highlighted by the importance it holds in our processing framework; it should be addressed by implementing models that use spatial approaches to ML algorithms, which are expected to have increased accuracy over the original ones. The increasing availability of large, high-quality datasets is an important factor driving the growth of ML algorithms in wildfire science and will likely play a critical role in advancing our understanding of fire causes in the coming years.

Data availability statements
The data that support the findings of this study are openly available at the following URL/DOI: 10.6084/m9.figshare.22028015 (Bountzouklis 2023).