Improvement of human-induced wildfire occurrence modeling from a spatial variation of anthropogenic ignition factor in the CLM5

Wildfire is an essential form of natural disturbance in the Earth system, and it remains challenging for current numerical models to accurately reproduce the spatiotemporal distributions of wildfire occurrence. One possible deficiency lies in the parameterization of anthropogenic impacts on wildfire occurrence. This study develops an approach to advance human-induced wildfire modeling by calibrating the parameter of human ignition count (HIC) in the fire module of the Community Land Model version 5 (CLM5). The model source code is modified to allow HIC to vary from grid to grid. Sensitivity experiments with different grid-uniform HIC values are conducted to quantify the model biases, using satellite-based observations as the reference. The theoretically optimal HIC for each grid is then obtained by linearly rescaling the HIC based on the model biases in the sensitivity tests. The model evaluation takes place in southwest China, a region with complex terrain and land use/land cover. Introducing grid-scale HIC significantly reduces the model bias in the climatology of wildfire occurrence: the pattern correlation coefficient increases from 0.57 to 0.78, and the root mean square error (RMSE) decreases from 0.58 to 0.18. The correlation coefficient of the annual sums of wildfire occurrences increases from 0.69 to 0.77, and the RMSE decreases from 560.8 to 146.4. A global-scale test verifies that the approach can be extended to multiple regions with a reasonable scale of population density and economy.


Introduction
Wildfire is an important natural disturbance for the Earth system (Bowman et al 2009) because of its substantial impacts on land cover, water and heat fluxes, and terrestrial biogeochemical cycles at both regional and global scales (Bond et …). Wildfire studies have long been supported by numerical models via multiple pathways. Some fire models are developed through statistical approaches, in which algorithms quantify the likelihood, hazards, and vulnerability of wildfires (Canadian Forestry Service 1984, Preisler et al 2004, Finney et al 2011, Miller and Ager 2013, Xi et al 2019). The key to statistical wildfire modeling is to build a relationship between wildfires and atmospheric/land background conditions, including but not limited to temperature, wind speed, atmospheric and soil humidity, and land cover/land use types. Some algorithms include socioeconomic properties to account for anthropogenic impacts on the spatiotemporal distributions of wildfires (Mercer and Prestemon 2005, Syphard et al 2008, Vilar et al 2010). Machine learning techniques have also contributed significantly, owing to their exceptional ability to identify the dominant influencing factors of wildfires at regional scales (Jaafari et al 2019, Jain et al 2020, Shao et al 2022). The advantage of statistical fire modeling is its ease of application in routine wildfire risk management. However, model uncertainty can grow large because the statistical framework lacks a full physical representation of wildfire processes and cannot capture the dynamic interactions with the atmospheric and land background.
Process-based wildfire models are developed along with land surface models as embedded components (Hanson et al 2000). Land surface models provide a dynamic background of near-surface atmospheric and soil conditions, on which a full-physics framework simulates the entire wildfire process, including ignition, spread, suppression, and impacts on the environment (Li et al 2013, Haas et al 2022). The first wildfire parameterization module was developed within a dynamic global vegetation model (DGVM; Lenihan et al 1998). The parameterization schemes for anthropogenic wildfires, in particular, are based on socioeconomic factors, which can enlarge model uncertainties compared with the modeling of naturally-caused wildfires (Thompson and Calkin 2011). These biases result largely from the complex factors contributing to wildfire behavior, which motivates detailed refinement and calibration of the parameters specific to anthropogenic wildfires.
This study addresses this issue by modifying and calibrating the parameterization of human-induced wildfire ignitions in the fire module of a land surface model. The modified model with the calibrated parameter is tested over a region in southwest China characterized by complex land cover, high population density, and frequent wildfire activity during the dry season. The model evaluation uses satellite-based wildfire observations as the reference to examine the performance of the modified model. Global simulations are also conducted to identify the regions around the globe where this model modification approach is applicable.

Study area
This study conducts simulations at both regional and global scales. The regional-scale test takes place in Yunnan, a low-latitude province of southwest China. The land cover types in this area include boreal forests, shrubs, and grasslands (Pan et al 2022). The climate in Yunnan is characterized by distinct dry and wet seasons. Wildfire risk is high during the dry season due to the lack of precipitation and the extensive forest cover (Chen et al 2014, Cao et al 2017). The population density in this area is high, and the local government has implemented sophisticated wildfire prevention policies (Yi et al 2017, Hayes 2021).

Satellite fire observations
This study relies on satellite-based wildfire observations obtained through the Fire Information for Resource Management System (FIRMS), which distributes active fire products from both MODIS and VIIRS. Of the two, this study chooses the MODIS-based FIRMS data for its long temporal coverage from 2001 to 2019, although its 1 km resolution is coarser than the 375 m resolution of the VIIRS-based product (Li et al 2018). Fire spots with a confidence level <60% or a fire radiative power (FRP) <20 MW are omitted to distinguish wildfire occurrences from interferences such as cloud reflections and sun glint. Fire spots less than 1 km apart are treated as a single fire spot, and only one of them is kept in the analysis. Likewise, fire spots within 1 km of each other at two consecutive observation times are regarded as the same fire spot. Fire spots that persist for more than two consecutive weeks without any change in location are discarded, since they are likely industrial facilities that emit infrared radiation constantly. Because CLM5 outputs fire occurrence as a gridded field in units of counts/km²/s, the observations are regridded onto the model grid by counting the number of observed fire spots within the boundary of each grid cell, allowing a consistent comparison between model and observation. Wildfire distributions in this study are quantified as fire counts per grid cell per year, obtained by multiplying the model output by the grid-cell area and accumulating over time. Note that on the regular latitude-longitude model grid, the grid-cell area is smaller at high latitudes than at low latitudes.
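The screening and regridding steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the processing code used in this study: the `FireSpot` record and all function names are hypothetical, distances use the haversine formula on a spherical Earth, near-duplicate spots are merged greedily in input order (the text does not specify which spot of a cluster is kept), and the temporal de-duplication and persistent-source removal are not implemented.

```python
import math
from dataclasses import dataclass


@dataclass
class FireSpot:
    """Hypothetical record for one FIRMS detection."""
    lat: float         # degrees north
    lon: float         # degrees east
    confidence: float  # detection confidence, percent
    frp: float         # fire radiative power, MW


EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a: FireSpot, b: FireSpot) -> float:
    """Great-circle distance between two detections, in km."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def filter_spots(spots, min_conf=60.0, min_frp=20.0):
    """Drop detections with confidence < 60 % or FRP < 20 MW."""
    return [s for s in spots if s.confidence >= min_conf and s.frp >= min_frp]


def merge_nearby(spots, radius_km=1.0):
    """Greedy spatial de-duplication: keep a spot only if no kept spot is within radius_km."""
    kept = []
    for s in spots:
        if all(haversine_km(s, k) >= radius_km for k in kept):
            kept.append(s)
    return kept


def grid_counts(spots, lat0, lon0, dres, nlat, nlon):
    """Count spots per cell of a regular lat/lon grid whose south-west corner is (lat0, lon0)."""
    counts = [[0] * nlon for _ in range(nlat)]
    for s in spots:
        i = math.floor((s.lat - lat0) / dres)
        j = math.floor((s.lon - lon0) / dres)
        if 0 <= i < nlat and 0 <= j < nlon:  # ignore detections outside the domain
            counts[i][j] += 1
    return counts
```

Because the merge is greedy, the result can depend on the order of the input spots; sorting them first (for example by FRP) is one way to make that choice explicit.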

The land surface model and fire module
The model modification is based on the fire module developed by Li et al (2013) and embedded in the Community Land Model version 5 (CLM5; Lawrence et al 2019). It is a process-based scheme that models different types of wildfires and their corresponding impacts on trace gas and aerosol emissions.
Regarding wildfire ignition, a parameterization calculates the number of ignition sources N_i as the sum of natural and anthropogenic ignition sources, before fuel availability, combustibility, and wildfire suppression are considered:

$$N_i = (I_n + I_a)\,A_g \qquad (1)$$

in which I_n and I_a are the numbers of natural and anthropogenic ignitions, respectively (counts/km²/s), and A_g is the grid-cell area (km²). The number of naturally-caused ignitions is a function of the land and atmospheric background conditions (lightning frequency), while human-caused ignitions are parameterized by a parameter called 'human ignition count' (HIC hereafter) as

$$I_a = \frac{\alpha\, D_p}{n} \qquad (2)$$

in which α is HIC, with a default value of 0.01 (count per person per month) that is uniform for all grid points, D_p is the population density (person/km²), and n is the number of seconds in a month. As the equation shows, human-induced ignitions increase monotonically with population density. In Li et al (2012), the default value of α is 3.89 × 10⁻³, in accordance with the CLM-DGVM; it has since been updated to 0.01 as CLM has developed into its newest version. Note that HIC applies only to ordinary wildfires. Other types of wildfires, such as peat fires, cropland fires, and deforestation fires in tropical closed forests (tropical forests with canopy cover >40%; Li and Lawrence 2017), are handled by separate parameterization algorithms that do not output a fire count. For cropland fires specifically, because post-harvest burning is currently prohibited in China (Yang et al 2020), the agricultural fire module was turned off for the regional test but kept on for the global test. Grid cells with extensive cropland cover (area weight >30%) are omitted from the global test because they could lead to large model biases. For deforestation fires, the analyses skip grid cells where tropical closed forests exceed 60% in area weight for the same reason.
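The ignition-source calculation described above (N_i as the sum of natural and anthropogenic ignitions times the grid-cell area, with anthropogenic ignitions given by α·D_p/n), together with the conversion from model rates to annual counts per grid cell, can be illustrated with a short Python sketch. The function names are hypothetical, and a 30-day month for n, a 365-day year, and a spherical Earth with radius 6371 km are assumptions of this example rather than CLM5 settings. N_i counts potential ignition sources before fuel availability, combustibility, and suppression are applied, so it is not the simulated fire count.

```python
import math

EARTH_RADIUS_KM = 6371.0          # assumed spherical Earth radius
SECONDS_PER_MONTH = 30 * 86400    # assumed 30-day month for n
SECONDS_PER_YEAR = 365 * 86400    # assumed 365-day year


def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area A_g of a regular lat/lon cell; it shrinks toward the poles roughly as cos(latitude)."""
    return (EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg)
            * (math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))))


def anthropogenic_ignitions(alpha, pop_density, n=SECONDS_PER_MONTH):
    """I_a = alpha * D_p / n in counts/km^2/s (alpha: count/person/month, D_p: person/km^2)."""
    return alpha * pop_density / n


def ignition_sources(i_nat, i_anth, area_km2):
    """N_i = (I_n + I_a) * A_g, in counts/s for the whole grid cell."""
    return (i_nat + i_anth) * area_km2


# Hypothetical example: default alpha = 0.01, 100 persons/km^2,
# a 0.125-degree cell at 25 N, and no natural ignitions.
a_g = cell_area_km2(25.0, 25.125, 0.125)
i_a = anthropogenic_ignitions(0.01, 100.0)
per_year = ignition_sources(0.0, i_a, a_g) * SECONDS_PER_YEAR  # potential ignitions per cell per year
```

With these assumed inputs, each km² receives 0.01 × 100 = 1 potential anthropogenic ignition per month, and the cell of roughly 175 km² accumulates a little over 2000 potential ignition sources per year.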

Model configurations for regional and global tests
For the regional test, model runs are configured with a grid spacing of 0.125°. The atmospheric forcing data are derived from the ERA5-Land reanalysis product with a grid spacing of 0.1° (Muñoz-Sabater 2021), which is accessible through the Copernicus Climate Data Store (https://cds.climate.copernicus.eu/). The high resolution helps to capture strong gradients in meteorological variables and land surface conditions, favoring modeling over complex terrain (Abatzoglou and Brown 2012). The forcing data include surface air temperature, precipitation, wind speed, surface pressure, relative humidity, and downward solar radiation from 1981 to 2019. The ERA5-Land variables have been regridded onto the CLM5 simulation domain. Spin-up simulations are driven by the GSWP3 forcing data (Compo et al 2011) for 1901-1980 … The experimental design, including the spin-up simulation and the prescribed HIC values for each sensitivity case, is the same as for the regional tests. The surface datasets including the calibrated HIC are provided in the supplemental material along with the revised model source code.

Grid-scale HIC and its calibration
HIC is originally a grid-uniform parameter prescribing the number of fire ignitions per capita. In reality, however, the frequency of human ignition can vary from place to place due to socioeconomic factors such as local economic activities, wildfire suppression policies, and education levels. Allowing HIC to vary geographically helps to represent the spatial distribution of these factors' impacts. This study introduces a grid-dependent HIC by converting it into a two-dimensional parameter with longitude/latitude information. The grid-dependent HIC is stored along with other background variables in the surface dataset, and the I/O module of CLM5 is modified correspondingly to read the HIC value for each grid cell from the surface dataset. Given the freedom of grid-scale variation in HIC, a brute-force calibration could require an unaffordable number of test cases and computational resources. To avoid redundant test simulations, a calibration approach is designed that determines the appropriate HIC value for each grid cell by linearly regressing the model biases of that cell across all sensitivity runs. The model bias is calculated as the difference between the simulated and observed multi-year (2001-2019) annual means of fire counts,

$$\Delta N_{\mathrm{fire}} = N_{\mathrm{fire}}^{\mathrm{model}} - N_{\mathrm{fire}}^{\mathrm{obs}} \qquad (3)$$

in which N_fire^model and N_fire^obs are the simulated and observed annual-mean fire occurrences, respectively (unit: count/year). An F-test at the 95% significance level examines whether the model bias can be fitted by a linear function of HIC. If so, changing HIC to its 'best' value is likely to reduce the model bias significantly. More details on the HIC calibration are included in the supplemental material.
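A per-grid sketch of this calibration in Python is given below. It assumes that the 'best' HIC is the value at which the fitted linear bias crosses zero, and that the F-test concerns the slope of a simple least-squares fit of ΔN_fire (Eq. (3)) on HIC; for a single predictor this F-test is equivalent to the t-test on the slope. The function name, the status labels, and the choice to leave non-significant grids uncalibrated (returning None) are illustrative assumptions; the exact procedure is described in the supplemental material. Negative solutions are clipped to zero, as done in this study.

```python
import numpy as np
from scipy import stats


def calibrate_hic(hic_values, biases, alpha_level=0.05):
    """Return (calibrated_hic, status) for one grid cell (hypothetical helper).

    hic_values: grid-uniform HIC used in each sensitivity run
    biases:     Delta N_fire = N_model - N_obs from each run (count/year)
    """
    x = np.asarray(hic_values, dtype=float)
    y = np.asarray(biases, dtype=float)
    m = x.size
    if m < 3:
        raise ValueError("at least three sensitivity runs are needed for the F-test")

    slope, intercept = np.polyfit(x, y, 1)        # least-squares line: bias = slope * HIC + intercept
    fitted = slope * x + intercept
    ss_reg = np.sum((fitted - y.mean()) ** 2)     # explained sum of squares, 1 degree of freedom
    ss_res = np.sum((y - fitted) ** 2)            # residual sum of squares, m - 2 degrees of freedom

    if ss_res == 0.0:                             # perfect linear fit
        p_value = 0.0 if ss_reg > 0.0 else 1.0
    else:
        f_stat = ss_reg / (ss_res / (m - 2))
        p_value = stats.f.sf(f_stat, 1, m - 2)    # upper-tail probability of F(1, m - 2)

    if p_value >= alpha_level or slope == 0.0:
        return None, "not significant"            # assumption: keep the existing HIC

    hic_star = -intercept / slope                 # HIC at which the fitted bias is zero
    if hic_star < 0.0:
        return 0.0, "clipped to zero"             # avoid unphysical negative ignition rates
    return float(hic_star), "calibrated"
```

Only one fit per grid cell is needed, so the cost of calibration is set by the small number of grid-uniform sensitivity runs rather than by the number of grid cells.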

Regional scale model evaluation
The FIRMS data show high spatial heterogeneity of wildfires (figure S2(a)). In general, more wildfire occurrences are found in the south. Regions with frequent wildfires are typically associated with extensive forest cover and a warm climate (Fan et al 2011). This pattern resembles the scattered fire occurrences reported for the same region by Ye et al (2017) and the prescribed wildfire risk (Li et al 2017). Note that wildfires in this area are typically characterized by a large number of ignitions but a small burned area per ignition (Chen et al 2017). Fire occurrence is therefore a more suitable variable than burned area for model evaluation in this area.
In general, the model simulates a higher wildfire frequency than observed throughout the study area. The model also fails to simulate the fire occurrence in the southeast. A cluster of grid cells in the south has no simulated wildfires, because the plant functional type of these cells is tropical closed forest, for which wildfire occurrence is not calculated. The pattern correlation coefficient (PCC) between the control simulation and the observation within the study domain is 0.57 (p < 0.01), with a root mean square error (RMSE) of 0.58.
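The PCC and RMSE used in this evaluation can be computed with a short NumPy function. This is a sketch under assumptions: it uses a centred Pearson correlation across grid cells that are valid in both fields, treats masked cells (for example tropical closed forest) as NaN, and works on raw values, since the text does not state whether the fields were normalised or transformed before the RMSE was computed. The function name is hypothetical. The same function applies unchanged to the annual time series of domain-total fire counts.

```python
import numpy as np


def pattern_stats(model, obs):
    """Return (PCC, RMSE) between two fields over cells valid in both.

    Works for 2-D maps (flattened) and for 1-D time series alike.
    """
    m = np.asarray(model, dtype=float).ravel()
    o = np.asarray(obs, dtype=float).ravel()
    valid = np.isfinite(m) & np.isfinite(o)   # skip masked (NaN) cells
    m, o = m[valid], o[valid]
    pcc = np.corrcoef(m, o)[0, 1]             # centred Pearson correlation across cells
    rmse = np.sqrt(np.mean((m - o) ** 2))
    return pcc, rmse
```

A centred correlation measures only the agreement of spatial (or temporal) patterns, while the RMSE also penalises the systematic overestimation, which is why both metrics are reported.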
The observations indicate an average of 434 wildfires per year over the study domain, with a maximum of 678 and a minimum of 168 (figure 1(a)). This result has an interannual variability comparable to that of the Wildfire Atlas of China (WFAC), although their spatial patterns can hardly be compared due to the 2° resolution of the WFAC data (Fang et al 2021). The comparison of time series further confirms that the simulated annual wildfire occurrences in the control simulation are dramatically higher, exceeding 1000 in some years (figure 1(a)). Both the observed and simulated fire occurrences peak in 2010, when this region suffered a severe drought. Despite the large model biases in annual fire counts, the interannual variability is similar between observation and simulation: the two time series have a linear correlation coefficient of 0.69 (p < 0.01; figure 1(a)). The only time-varying variable involved in the anthropogenic wildfire parameterization is fuel combustibility, which is a function of atmospheric and soil moisture conditions. Since CLM5 captures the interannual variability of wildfires reasonably well, the large bias in fire counts can largely be attributed to the magnitude of the modeled anthropogenic ignitions rather than to their temporal modulation. It is therefore necessary to calibrate the parameterization of anthropogenic wildfires to improve the model performance over southwest China.
The HIC calibration reveals a statistically significant linear relationship between HIC and the model biases for most grid points in the study area (figure 1(b)), although negative HICs are suggested in the central and southwestern parts of the study domain. HIC is set to zero for these grids to avoid unphysical results, even though this may not remove all the biases. In contrast, the calibrated HIC is relatively high (0.1-1) in the east, suggesting a stronger-than-default anthropogenic impact on wildfire ignitions. Note that the frequency of naturally-caused wildfires is likely underestimated for these grids, so the calibrated HIC may overly adjust the frequency of anthropogenic wildfires to compensate for the model bias in naturally-caused wildfires. In this case, the bias in the total number of wildfires is reduced, while the model fails to retrieve a reasonable ratio of natural to anthropogenic wildfire counts. Since natural and anthropogenic wildfires share the same algorithms for calculating wildfire spread and impacts, the HIC calibration is regarded as a success as long as the model bias in fire occurrence is significantly reduced.
Importing the calibrated grid-scale HIC reduces the difference between the observed and modeled spatial distributions of annual fire counts, with model biases decreasing for a majority of grid points. The PCC increases from 0.57 to 0.78 (p < 0.01), and the RMSE decreases from 0.58 to 0.18 (figure S2(c)). Improvements are also apparent in the temporal variations: the RMSE of the annual fire counts decreases from 560.8 to 146.4, and the linear correlation coefficient between the two time series increases from 0.69 to 0.77 (p < 0.01; figure 1(a)). The linear correlations between the time series remain statistically significant both before and after the model modification.
Both before and after the modification, grids with calibrated HIC > 0 have smaller model biases than those with calibrated HIC = 0 (figures 2(a)-(d)). Even so, the model biases for grids with HIC = 0 still decrease by as much as, if not more than, those for grids with HIC > 0, making the overall reduction in model bias substantial after modification (figure 2(e)).

Global-scale model evaluation
The regional test has shown that calibrated grid-scale HIC can improve wildfire occurrence modeling in a region with complex terrain/land cover and a large population. A similar approach is then applied worldwide to identify where the model modification is applicable on a global scale. The global-scale calibration suggests statistically significant positive HICs in clusters of regions, including northeastern and eastern China, South and Southeast Asia, western continental Europe (including Scandinavia), and the northwestern and eastern US (figure 3(b)). Specifically, the calibrated HICs for the northeastern US and Western Europe are typically between 0.0001 and 0.01, while the southeastern US and southeastern China have relatively high HIC values (0.01-0.1). These regions are generally forested and characterized by large population densities and reasonable scales of economic activity. In regions such as the Sahara Desert, northern Siberia, and Greenland, negligible numbers of wildfires are observed, so the HIC calibration does not make a difference in wildfire occurrence. Some regions in India, central Europe, and the Midwest US that have extensive cropland cover have been omitted from the analysis. Zero HIC values are calibrated for the central and southern parts of Africa, as well as for a large part of South America and Western Australia. For these regions, the population density is so sparse that adjusting HIC makes little difference to the number of human-caused ignitions (Smith 2017).
Figure 3. (a) The annually-averaged fire count derived from the FIRMS data. The observational result is interpolated onto the CLM grid, and grid points with an area weight of tropical closed forests >60% or cropland >30% are removed from the analysis. (b) The spatial distribution of the calibrated HIC on a global scale. Black dots denote grid points where the linear rescaling of HIC is significant at the 95% level. Grey dots denote grid points where the linear regression is statistically significant but the calibrated HIC has been adjusted to zero to prevent unphysical model results.
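The screening of grid cells described in the model setup and in the caption of figure 3 can be expressed as a boolean mask. In this sketch the area weights are assumed to be percentages, the function names are hypothetical, and the cropland criterion is applied only in the global test, following the text; it is an illustration, not the analysis code of this study.

```python
import numpy as np


def analysis_mask(pct_tropical_closed_forest, pct_cropland, global_test=True):
    """Return True for grid cells retained in the analysis.

    Cells with >60 % tropical closed forest are always dropped (deforestation fires);
    cells with >30 % cropland are dropped only in the global test (cropland fires).
    """
    forest = np.asarray(pct_tropical_closed_forest, dtype=float)
    crop = np.asarray(pct_cropland, dtype=float)
    keep = forest <= 60.0
    if global_test:
        keep = keep & (crop <= 30.0)
    return keep


def apply_mask(field, keep):
    """Replace excluded cells with NaN so that later statistics skip them."""
    return np.where(keep, np.asarray(field, dtype=float), np.nan)
```

Setting excluded cells to NaN, rather than to zero, keeps them out of the bias statistics instead of counting them as cells without fire.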
On the other hand, the model biases fit linearly to HIC with statistical significance over sparsely distributed grids in Siberia and the northern part of North America (Alaska and northern Canada), where wildfires are known to occur in the boreal forests and Arctic tundra during summer (Abatzoglou and Kolden 2011, Hu et al 2015, Kharuk and Ponomarev 2017). The population density in these regions is so low that wildfires are mostly naturally caused, so calibrating HIC does not make a significant difference. Studies have revealed that lightning is the major cause of wildfires in the boreal forest and tundra over Siberia and Alaska (Ivanova et al 2010, Kasischke et al 2010). Li et al (2013) also indicated that CLM performs relatively poorly in modeling wildfires in the Arctic, which is probably due in part to the contribution of peat fires (van der Werf et al 2010).
On the global scale, the biases almost disappear at grids where the calibrated positive HIC is imported (grids with black dots in figure 3(b); figures 4(a) and (b)). Among these grids, some in the middle and southern parts of Africa, in South America along the boundary of the Amazon, and in the Greater Mekong River watershed show less reduction in model bias after modification (figure 4). These regions are adjacent to large tropical forests or croplands. The FIRMS data inevitably include deforestation and/or agricultural fires, which are omitted from the model results, and this could explain these biases. Further improvement in these grids relies on sub-grid-scale evaluation and calibration, which warrants a separate study.
Among the grids that pass the significance test but have calibrated HIC = 0 (originally negative; grids with grey dots in figure 3(b)), only a few grids scattered across South America and Central Africa show reduced model bias after modification. For most of these grids, the bias is higher than 10 fires per year and barely decreases after modification (figures 4(c) and (d)).

Discussion
In both the regional and global tests, some grids with calibrated HIC = 0 have positive biases before modification but negative biases afterwards, suggesting that the HIC calibration overly decreases the fire counts. It is speculated that the relationship between fire counts and HIC may be non-linear for these grids, which contradicts the linear form assumed in the current parameterization. Some studies have debated the validity of assuming that human-induced wildfire ignition is directly proportional to population density, as in CLM5 and other land surface models (Moritz et al 2012, Fusco et al 2016). Without developing new algorithms, the model evaluation indicates that there are certain regions in which the modification approach of this study is not applicable. This feature of the global test differs from the regional test, possibly because the natural and anthropogenic causes of wildfires are diverse and inconsistent across regions around the globe. The global test shows a model bias higher than 10 fires per year in southwest China even after modification, whereas the regional test removes a majority of the model biases. The inconsistency could result from the coarser resolution (2°) of both the atmospheric forcing and the model grid in the global test. Neither the atmospheric forcing nor the population density data has a sub-grid tiling structure like the land use/cover types in CLM5, so small-scale variations in the atmospheric background and population density due to complex terrain are likely smoothed out at coarse resolution, potentially enlarging the model biases in some regions. A model configuration with as high a spatial resolution as practical is therefore recommended for this approach.
The model modification in this study is based on the calibration of a human impact factor, which explains why it is more effective in regions with relatively high population density and a reasonable scale of economy, while regions with sparse populations benefit little from this approach. In CLM5, naturally-caused wildfires are addressed by a parameterization of cloud-to-ground lightning (Li et al 2012). A separate study modifying and calibrating the parameterization of natural wildfire ignitions could improve the model performance in remote areas such as Siberia and northern Canada. Further advances in capturing the monthly to interannual variation of wildfires also rely on improvements to the natural wildfire parameterization scheme. Furthermore, calibrating both parameters could help to disentangle natural and anthropogenic fire events within the total fire counts. Such calibration, supported by new datasets on lightning frequency, could be pursued in separate work and is beyond the scope of this study.
The spatial distribution of HIC in this study is obtained from the model evaluation based on observations from 2001 to 2019. In theory, the HIC calibration performs best for simulations covering the same period, although it can also significantly reduce the model bias for other years (see supplemental material). More importantly, the spatially-varying HIC represents the geographic distribution of human impact on wildfire ignitions. It could therefore serve as a scenario for simulations projecting potential changes in the spatiotemporal variation of wildfire occurrence under a future climate.
On the global scale, the parameters in the fire module were already calibrated during the development phase (Li et al 2013) against the 1997-2004 burned area data from the Global Fire Emissions Database version 3 (GFED3; Giglio et al 2006), which is also post-processed from MODIS data (Liu et al 2010, Lasslop and Kloster 2017, Tang et al 2021). This study uses the latest observations with longer temporal coverage to provide a complementary evaluation/calibration from the perspective of fire occurrence. Focusing on fire occurrence rather than burned area places more emphasis on individual wildfire events/ignitions, especially small fires, which were not included until the latest version of the GFED data (GFED4.1s; Randerson et al 2017). This study is based on CLM5 for its wide range of applications and large user community, as well as its capability of coupling with other components (atmosphere, ocean, river, etc). The results of this study give the Earth system model greater potential for projecting the spatiotemporal distributions of wildfires and their corresponding impacts. With a grid-scale prescription of HIC, scenarios of wildfire occurrence could also be designed within the Earth system model to examine their climatic impacts at regional and global scales.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://firms.modaps.eosdis.nasa.gov/download/.