SIMBAD: a simplified emission-concentration model for a computationally efficient assessment of energy policies impact on air quality

The reconstruction of airborne pollutant concentration fields based on emission reduction scenarios is a complex task. Simulations with chemistry and transport models (CTMs) are computationally expensive and not suited for iterative optimisation that could require the evaluation of a great number of scenarios. To address this, data-driven surrogate models have been used to approximate the relationship between emissions and ambient air concentrations. In this study, a different approach is presented. SIMBAD is a simplified model based on CAMx (Comprehensive Air Quality Model with Extensions) and the Decoupled Direct Method (DDM) algorithm that estimates concentration changes due to variations in emission fields. SIMBAD was validated by comparing PM10 and NO2 concentrations with CAMx simulations implementing the same emission variations. The testing scenarios involved different emission reductions and precursor species to assess SIMBAD’s performance in both simple and complex cases. The model’s performance in reproducing the non-linear nature of atmospheric processes was satisfactory, showing an average Root Mean Square Error (RMSE) always lower than 0.2 μg/m3 and a normalised bias below 2%. Slightly lower accuracy was found in more complex scenarios involving multiple pollutants and sectors modified simultaneously. Overall, SIMBAD proved to be an efficient and accurate tool for evaluating the impacts of energy policies on air quality, providing valuable insights for policymakers and researchers alike.


Introduction
Quantitatively assessing the impact of anthropogenic activity scenarios on air quality requires an accurate description of the resulting change in the atmospheric concentration of the main pollutants over the area under investigation and for the period of interest. This can be achieved by applying mathematical models that describe the complex and non-linear dynamics of the transport, diffusion and chemical transformation of pollutants emitted by human activities. However, those models are based on numerical codes that require considerable computational resources, so that only a very limited number of emission scenarios can be assessed within a reasonable timeframe. Therefore, even if accurate, this approach does not allow the exploration of all possible emission reduction scenarios. The need to investigate a very large set of emission scenarios may arise, for example, when identifying the most effective solution fulfilling a specific environmental objective, such as the one that yields the greatest concentration reduction for each emission unit removed or the one that most reduces the overall exposure to one or more pollutants. In these cases, which can be traced back to the solution of an optimisation problem [1], mathematical tools capable of examining an extremely large number of alternatives very quickly are required.
Over time, several fast tools have been developed, the so-called simplified models or metamodels [2] [3], in which much simpler mathematical relationships between input variables (e.g., emissions) and output variables (e.g., concentrations) are established. Such models can be based on parametric formulations that relate input and output variables through algebraic relationships, generally of order no higher than two (e.g., GAINS [4]). In other cases, the simplified model is based, in more general terms, on machine learning-type models [2] using regression techniques (e.g., [5], derived from the CHIMERE model) or neural networks (RIAT+ and MAQ, [6] [7]). In most cases, these models contain a set of parameters whose values, specific to each case study, are estimated by minimising the error between the results of the simplified model and a set of reference simulations performed with the full model.
In this study SIMBAD (SIMplified emission-concentration model BAsed on DDM) is presented. SIMBAD is based on a different approach compared to state-of-the-art simplified models. It derives from CAMx (Comprehensive Air Quality Model with Extensions) [8], a multi-scale photochemical modelling system for gas and particulate air pollution, and the Decoupled Direct Method (DDM) [9]. DDM is an algorithm developed for model sensitivity evaluation that computes first-order derivatives of concentration fields with respect to multiple input parameters, evaluated for each grid cell at every time step of the CTM simulation. Sensitivity coefficients represent the change relative to the base case due to a unit variation of the input fields. The final concentration can then be computed as a Taylor expansion limited to first-order terms. At this stage the simplified model has been implemented for the analysis of four emission sectors: residential heating (split into biomass and "non-biomass" fuels), road transport and thermoelectric production. To assess the accuracy of SIMBAD, its results were compared with those produced by simulations with the CAMx reference model driven by the same emission scenarios. The article is organised as follows: the model formalization is presented in Section 2, followed by Section 3, where the modelling set up and the implementation of the validation scenarios are reported. Validation results for PM and NO2 are described in Section 4, and the conclusions, including further development of the simplified model, are drawn in Section 5.

Model formulation
The simplified model was defined by exploiting the principle expressed by the Taylor series, i.e., that the variation of the value of a generic analytical function around a reference point can be reconstructed, to a certain approximation, by means of an algebraic polynomial. In general terms, we can state that, in the complete three-dimensional model, the values assumed by the concentration field $C(x, y, z, t)$ depend implicitly on the values assumed by the multiple parameters enclosed within the various terms of the chemistry and transport equation, e.g., wind speed and direction, chemical reactivity coefficients, emission rates, etc. [10]. The variability (or sensitivity) of the concentration field $C$ with respect to a generic parameter $\lambda$ included in the model equations can always be described through the Taylor series of sensitivity coefficients around a reference value $\lambda_0$. In the case of first-order approximation they are defined as

$$S^{(1)} = \left. \frac{\partial C}{\partial \lambda} \right|_{\lambda = \lambda_0}$$

where $S^{(1)}$ represents the rate of change in concentration $C$ due to the change in parameter $\lambda$, calculated at $\lambda_0$, corresponding to the "base" or "reference" configuration. Generalising the formulation to a vector $\boldsymbol{\lambda}$ of $N$ parameters $(\lambda_1, \lambda_2, \lambda_3, \dots, \lambda_N)$, we can express the change in concentration $C$ in response to a change in the parameter vector $\boldsymbol{\lambda}$ with the following expression:

$$C(\boldsymbol{\lambda}) = C(\boldsymbol{\lambda}_0) + \sum_{i=1}^{N} S_i^{(1)} \, (\lambda_i - \lambda_{0,i}) + \text{higher-order terms} \tag{1}$$

The CAMx model includes a special algorithm, called the Decoupled Direct Method (DDM, [9]), which allows a generic simulation ("base case") to be associated with the quantitative evaluation of first-order sensitivity coefficients with respect to certain input parameters (e.g. the emissions of a certain activity sector or geographical region). These coefficients are evaluated for each cell $(x, y, z)$ of the simulation domain and for each time step $t$. The dependence on the generic parameter $\lambda_i$ is introduced with the following formalisation:

$$\tilde{P}(\mathbf{x}, t) = P(\mathbf{x}, t) + \lambda_i \, p_i(\mathbf{x}, t)$$

where $P$ represents a generic model input field, relative to all species of interest, in the unperturbed simulation (e.g. the emissions from all emission sectors), $\tilde{P}$ is the corresponding perturbed field and $p_i$ represents the perturbation function associated with parameter $\lambda_i$ (e.g. $P$ itself, or the emissions of the road transport sector only). Based on this formalisation, DDM computes the first-order sensitivity coefficients $S_{j,i}^{(1)}(\mathbf{x}, t)$, which quantify the concentration variation of species $j$ in cell $(x, y, z)$ at time $t$ due to the perturbation $p_i$. According to this formulation, the unperturbed situation corresponds to $\lambda_i = 0$, so equation (1), limited to terms of order one, can be written as:

$$C_j(\mathbf{x}, t, \lambda_i) = C_j(\mathbf{x}, t, \lambda_i = 0) + \lambda_i \, S_{j,i}^{(1)}(\mathbf{x}, t) \tag{2}$$

In equation (2), $S_{j,i}^{(1)}$ is the first-order sensitivity coefficient calculated by DDM. Once the coefficient $S_{j,i}^{(1)}$ has been estimated, it is possible to calculate the concentration field of a pollutant species as a variation with respect to the unperturbed case ($\lambda_i = 0$) by means of a simple linear expression. Going back to the above example and assuming that $p_i$ represents road transport emissions, the effect of a 30% reduction in transport emissions on the concentrations of species $j$ can be estimated by setting $\lambda_i = -0.3$. If we generalise equation (2) to the case of multiple emission sources, be they categories or regions, we obtain, leaving the dependence on $\mathbf{x}$, $t$ and $j$ implicit:

$$C(\boldsymbol{\lambda}) = C(\mathbf{0}) + \sum_{i=1}^{N} \lambda_i \, S_i^{(1)}(p_i) \tag{3}$$

where $N$ is the number of sources considered for the perturbation; $C(\boldsymbol{\lambda})$ denotes the concentration calculated with an emission change of $\lambda_i p_i$ for each source considered; $C(\mathbf{0})$ denotes the concentration in the unperturbed simulation; $p_i$ denotes the maximum perturbation for source $i$; $S_i^{(1)}(p_i)$ denotes the values of the sensitivity coefficients for emission sector or region $i$ calculated by DDM in a simulation with total emissions equal to the base case and maximum perturbations for each emission source equal to $p_i$. For example, for $\lambda_i = 1$ we obtain the concentration determined by an increase in emissions from source $i$ equal to $p_i$; similarly, for $\lambda_i = -1$ we obtain the concentration determined by a reduction in emissions from source $i$ equal to $p_i$.

Several tests were performed to evaluate the response of the simplified model to different perturbations, and a more robust formulation was identified, where the concentration fields relative to a certain scenario are calculated as a weighted average between two estimates, obtained through simulations in which the same perturbation is applied to two distinct emission configurations, corresponding to the most probable extremes of variability of the emission scenario. The first one (labelled as 100) corresponds to an input configuration where all emissions are kept at their base-case values; the second one (labelled as 50) to a configuration where the emissions of all investigated sectors are reduced by 50%. The resulting formulation of the model is given in equation (4), in which for each sector the absolute value of the perturbation is set for both simulations equal to 50% of the total emissions of the base case, i.e. equal to the difference between the emissions used for the two input setups (100% and 50% respectively). The weighted average mentioned above is calculated between the final concentrations corresponding to a certain reduction level $\alpha$, computed with the Taylor polynomial starting respectively from the concentrations $C^{(100)}$ obtained from the base configuration and the concentrations $C^{(50)}$ from the configuration with emissions reduced to 50%. The variables $S^{(100)}(50)$ and $S^{(50)}(50)$ represent the sensitivity coefficients calculated by DDM in the two simulations, corresponding to a 50% change in emissions. The relationship between the reduction level $\alpha$ and the value of the perturbation parameter $\lambda$ and the formulation of the factors $w^{(100)}$ and $w^{(50)}$ used for the weighted average are given below. The model formulation is also generalised so that independent variations can be considered for each emitted species (precursors) in addition to each emission sector. Since the sensitivity coefficients considered are first-order ones, the analytical relationship is still linear.

• $C_j^{(100)}$: annual average concentration of species $j$ obtained in the simulation with reference emissions (base case)
• $C_j^{(50)}$: annual average concentration of species $j$ obtained in the simulation with a 50% reduction of emissions with respect to the base case
• $S_{j,p,s}^{(100)}(50)$: sensitivity of the concentration of pollutant $j$ to a 50% perturbation in precursor $p$ from sector $s$, calculated in the simulation with reference emissions (base case)
• $S_{j,p,s}^{(50)}(50)$: sensitivity of the concentration of pollutant $j$ to a 50% perturbation in precursor $p$ from sector $s$, calculated in the simulation with a 50% emission reduction
• $\alpha_{p,s}$: emission variation coefficient for precursor $p$ and sector $s$ [%] (e.g., 100% for base case; 0% for source removal; 150% for emission increment by 50%)
• $w_{p,s}^{(50)} = 0$ for $\alpha_{p,s} > 100\%$
• $w_{p,s}^{(50)} = 1$ for $\alpha_{p,s} < 50\%$
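The first-order estimate and the blending of the two Taylor estimates described above can be sketched for a single grid cell as follows. This is a minimal illustration under stated assumptions, not the SIMBAD implementation: all function and variable names are hypothetical; the relation between the emission level and the perturbation parameter is derived here from the statement that the perturbation equals 50% of base-case emissions in both simulations; the linear ramp of the weight between 50% and 100% is an assumption, since only its two limiting values are stated; and a single emission level is applied to all sources, whereas the model allows a separate coefficient and weight for each precursor and sector.

```python
def taylor_estimate(c_ref, sensitivities, lambdas):
    """First-order estimate for one grid cell: c_ref + sum_i lambda_i * S_i."""
    if len(sensitivities) != len(lambdas):
        raise ValueError("one lambda per sensitivity coefficient is required")
    return c_ref + sum(lam * s for lam, s in zip(lambdas, sensitivities))


def lambda_from_alpha(alpha, reference_level, perturbation=50.0):
    """Perturbation parameter for an emission level alpha [% of base case].

    Derived from alpha = reference_level + lambda * perturbation
    (assumption of this sketch).
    """
    return (alpha - reference_level) / perturbation


def weight_50(alpha):
    """Weight of the estimate built on the 50% configuration.

    The limits (1 below 50%, 0 above 100%) come from the text; the linear
    ramp in between is an assumption of this sketch.
    """
    if alpha <= 50.0:
        return 1.0
    if alpha >= 100.0:
        return 0.0
    return (100.0 - alpha) / 50.0


def simbad_cell(alpha, c100, c50, s100, s50):
    """Blend the two Taylor estimates for a uniform emission level alpha.

    c100, c50: concentrations of the base and 50% simulations.
    s100, s50: per-source sensitivities to a 50% perturbation, from the
               base and 50% simulations respectively.
    """
    lam100 = [lambda_from_alpha(alpha, 100.0)] * len(s100)
    lam50 = [lambda_from_alpha(alpha, 50.0)] * len(s50)
    est100 = taylor_estimate(c100, s100, lam100)
    est50 = taylor_estimate(c50, s50, lam50)
    w50 = weight_50(alpha)
    return (1.0 - w50) * est100 + w50 * est50


# Illustrative numbers only (not taken from the study): two sources,
# all emissions set to 70% of the base case.
example = simbad_cell(70.0, c100=20.0, c50=14.0, s100=[3.0, 2.0], s50=[4.0, 3.0])
```

With these choices the blended estimate returns the base-case concentration for an emission level of 100% and the concentration of the 50% simulation for a level of 50%, so the two reference simulations are reproduced exactly at the ends of the interval.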

Modelling chain set up
The modelling chain used for the estimation of sensitivity coefficients and for the subsequent validation of the simplified model is shown in Figure 1. CAMx v7.1 (Comprehensive Air Quality Model with Extensions) is an Eulerian chemistry and transport model able to produce high-resolution simulations of three-dimensional concentration and deposition fields of the main air pollutants (O3, NO2, NOx, PM10, PM2.5, CO, SO2) [8]. It can work on complex orography, dealing with transport, diffusion, dry and wet deposition phenomena. The CAMx model set up is summarized in Table 1. The estimation of the sensitivity coefficients required the execution of two CAMx simulations with the DDM module [9], one in the basic configuration (no reduction in total emissions) and one with a 50% reduction in the emissions of the residential heating and road transport sectors. In both cases, the emission perturbation was 50%.
The emission dataset is based on the Italian and European emission inventories [11] [12], processed using the SMOKE v3.5 (Sparse Matrix Operator Kernel Emissions) model. The SMOKE model spatially and temporally disaggregates the emission inventory, as needed by the CAMx model. The biogenic emission contributions in the domain are estimated using the MEGAN model (Model of Emissions of Gases and Aerosols from Nature [13]), which needs meteorological and land use information. Finally, SeaSalt v3.1 provides the sea spray emissions.
Meteorological fields were computed using the WRF model [14], a prognostic meteorological model suitable for applications on spatial scales ranging from a few meters up to thousands of kilometres. The large-scale fields used for initializing the weather simulations and defining their boundary conditions come from the ERA5 global reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF).
The initial and boundary conditions for the dispersion model come from continental-scale forecast simulations of the CHIMERE model acquired from INERIS Prev'Air service [15].An example of the annual mean concentration fields produced by CAMx for the base case with no emission variation is shown in Figure 2.

Validation scenarios
The simplified emission-concentration model SIMBAD aims at a rapid reconstruction of pollutant concentrations driven by variations in precursor emissions, mimicking the CTM behaviour. The reduction in computational time is achieved through a significant simplification of the modelling approach: from a three-dimensional dynamic model to a linear analytical formulation. It is therefore essential to validate SIMBAD concentration fields against CAMx simulation results driven by the same emission reduction scenario. SIMBAD performance is validated using 8 scenarios that vary, in different ways, the emissions of both precursors (primary particulate matter, volatile organic compounds, nitrogen oxides, sulphur dioxide and ammonia) and sectors (thermoelectric production, biomass residential heating, non-biomass residential heating, and road transport). All emission variations refer only to the grid cells belonging to the Italian territory.
The validation scenarios are reported in Table 2. The first two scenarios aim at evaluating SIMBAD performance when the emission reduction is outside the 0-50% range. In the first case a 70% reduction of all emissions in all sectors is introduced, while in the second case a 20% increase in emissions is tested. The additional 6 scenarios apply reductions of 30% in specific sectors (road transport, biomass and non-biomass heating), both for all pollutants and for some individual precursors: nitrogen oxides (NOx) in road transport and PM10 in biomass heating. SCE70_02R_NOBIOM assesses a 30% reduction in emissions of all precursors in the non-biomass heating sector (mainly natural gas), which impacts mostly NO2 concentrations, while PM variations are negligible, since PM from residential heating depends primarily on emissions from biomass combustion (Figure 2). Conversely, the SCE70_02R_BIOM_PM10 scenario investigates only the impact of reducing primary particulate matter in the biomass residential heating sector: it causes a negligible variation in NO2 concentrations while, as expected, PM2.5 concentrations drop with peaks of 30%, as shown in Figure 3, because particulate matter caused by biomass burning is mainly of primary origin.
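The way a scenario assigns an emission level to each combination of sector and precursor can be sketched as a simple lookup table. This is an illustrative encoding only: the sector and precursor keys and the helper `make_scenario` are hypothetical, and the content of the four named scenarios is inferred from the descriptions in the text (a 30% reduction, i.e. an emission level of 70%), not copied from Table 2.

```python
# Hypothetical labels for the sectors and precursors listed in the text.
SECTORS = ("thermoelectric", "heating_biomass", "heating_non_biomass", "road_transport")
PRECURSORS = ("PM10", "VOC", "NOx", "SO2", "NH3")  # "PM10" = primary particulate matter


def make_scenario(level, sectors=SECTORS, precursors=PRECURSORS):
    """Return {(sector, precursor): emission level in % of the base case}.

    Pairs that are not modified stay at 100% (base case).
    """
    alpha = {(s, p): 100.0 for s in SECTORS for p in PRECURSORS}
    for s in sectors:
        for p in precursors:
            if (s, p) not in alpha:
                raise KeyError(f"unknown sector/precursor pair: {(s, p)}")
            alpha[(s, p)] = level
    return alpha


# Scenario contents as inferred from the text (assumption of this sketch).
scenarios = {
    "SCE70_07T": make_scenario(70.0, sectors=("road_transport",)),
    "SCE70_07T_NOX": make_scenario(70.0, sectors=("road_transport",), precursors=("NOx",)),
    "SCE70_02R_NOBIOM": make_scenario(70.0, sectors=("heating_non_biomass",)),
    "SCE70_02R_BIOM_PM10": make_scenario(70.0, sectors=("heating_biomass",), precursors=("PM10",)),
}

# Number of (sector, precursor) pairs changed by each scenario.
modified = {name: sum(1 for v in alpha.values() if v != 100.0)
            for name, alpha in scenarios.items()}
```

Keeping one coefficient per sector and precursor mirrors the generalised model formulation, in which independent variations can be applied to each emitted species in each sector.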

Validation of PM2.5
Particulate matter (PM) is a pollutant of great interest for air quality planning because it is complex, consisting of several components with different characteristics, and because its presence in the atmosphere is caused both by direct emissions (primary particulate matter) and by the chemical transformation of gaseous precursors (secondary particulate matter). PM is, together with nitrogen dioxide, one of the most critical pollutants in Italy, especially in the Po Valley. The spatial distribution of the annual average PM concentration over the domain shows peaks in large urban areas and diffuse high concentrations in the Po Valley, where emission density, orography and the peculiar meteorological conditions can limit pollutant dispersion (see Figure 2). The performance indicators reported in Table 3 show that SIMBAD generally performs well in reproducing the particulate matter concentrations calculated by CAMx. Indicators are slightly worse for the first scenario: examining the scatterplot and the maps reported in Figure 5 and Figure 6, we can see that the relatively higher bias occurs over the Po Valley, where SIMBAD appears not to fully reconstruct nonlinear dynamics and estimates slightly higher concentration values than CAMx. This could also be due to the choice of the sectors investigated, which in this case exclude agriculture and livestock. The latter are responsible for 94.3% of total emissions of ammonia [18], an important precursor of PM, which in this scenario is reduced only in the thermoelectric, residential and road transport sectors. This implies that the sensitivity coefficients used for this scenario ($S_{j,p,s}^{(50)}(50)$) were computed with a simulation characterized by lower NOx emissions, but still high ammonia emissions. In particular, the result obtained seems to indicate that in the scenario under consideration, characterized by a strong reduction in NOx emissions (up to 70%) but without significant changes in ammonia emissions, the actual efficiency of PM reduction (in terms of secondary inorganic compounds, nitrate and ammonium) is higher than what SIMBAD computes. On the other hand, especially in the Po Valley urban contexts, sensitivity coefficients for NOx emissions computed in base case conditions are related to a chemical situation where there is an excess of ammonium, therefore NOx variations have less impact on PM concentrations. Qualitatively similar behaviour is also observed in other scenarios involving a reduction in NOx emissions, especially in the road sector (Figure 7 and Figure 8). The ability of SIMBAD to reconstruct the primary PM fraction, meaning particulate matter directly emitted into the atmosphere, is confirmed by the SCE70_02R_BIOM_PM10 scenario (Figure 9), where only the primary PM10 of the biomass heating sector is reduced. The concentrations are well reconstructed: the absolute difference between the average concentrations estimated by CAMx and SIMBAD is always less than 0.04 µg/m3 of PM2.5, corresponding to a relative bias always less than 3%.
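Indicators of the kind used in this comparison (bias, RMSE and normalised bias between SIMBAD and CAMx fields) can be computed as in the following sketch. The exact definitions adopted in Table 3 are not given in the text, so the formulas below are common choices assumed for illustration; the function name and the sample values are hypothetical.

```python
import math


def performance_indicators(simbad, camx):
    """Compare two flattened concentration fields, cell by cell.

    Returns the mean bias (SIMBAD minus CAMx), the root mean square error
    and the mean bias normalised by the CAMx mean, in percent.
    These definitions are assumptions of this sketch.
    """
    if len(simbad) != len(camx) or len(camx) == 0:
        raise ValueError("fields must be non-empty and of equal length")
    n = len(camx)
    diffs = [s - c for s, c in zip(simbad, camx)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    camx_mean = sum(camx) / n
    nmb_percent = 100.0 * bias / camx_mean
    return {"bias": bias, "rmse": rmse, "nmb_percent": nmb_percent}


# Illustrative values only (not taken from the study), in µg/m3.
indicators = performance_indicators([10.2, 15.1, 20.3], [10.0, 15.0, 20.0])
```

A positive bias in this convention means that the simplified model estimates higher concentrations than the reference model, which is the behaviour described above for the Po Valley in the first scenario.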

Validation of NO2
Nitrogen dioxide (NO2) is mainly a secondary pollutant, which depends on the emissions of nitrogen oxides (NOx = NO + NO2) produced by combustion processes, such as heating systems, vehicle engines and industrial combustion. In Italy, the spatial pattern of NO2 concentrations is strongly linked to road traffic, which is responsible for 46% of national NOx emissions, especially in the most populated areas (see Figure 2). As can be seen from the indicators reported in Table 4, the simplified model reproduces very satisfactorily the average annual concentrations calculated by CAMx, with a normalized average error over the domain of less than 2.5%. Figure 10 reports the scatterplot of scenario SCE70_07T, showing how accurately SIMBAD reconstructs CAMx concentrations, with only a slight overestimation. It is interesting to analyse the scenarios that affect transport alone, the main emitter of nitrogen oxides. In scenarios SCE70_07T and SCE70_07T_NOX, very low errors are observed, at most 2%, especially in highly urbanized areas with high emission density, such as large cities and the Po Valley (Figure 11 and Figure 12). As expected, the SCE70_02R_BIOM_PM10 scenario, which applies a reduction in PM10 emissions alone from the residential biomass combustion sector, does not affect the estimated NO2 concentrations, which are correctly reconstructed by the simplified model.

Conclusions
In this study a novel approach for the formalization of a simplified emission-concentration model is presented. It is based on the Decoupled Direct Method implemented in the chemical transport model CAMx, which computes sensitivity coefficients representing the change relative to the base case due to a variation of the input fields. In this approach, the final concentration can be computed as a Taylor expansion limited to first-order terms. The simplified model estimates the concentration fields for a defined emission scenario by computing the weighted average between two estimates, both based on a Taylor expansion but referred, respectively, to a 50% and a 100% emission scenario.
Currently SIMBAD can investigate four emission sectors: thermoelectric generation, residential heating (split into biomass and "non-biomass" fuels), and road transport. In this study SIMBAD has been validated in the reconstruction of particulate matter (PM) and nitrogen dioxide (NO2) concentrations, comparing the annual average concentration fields with CAMx simulations driven by the same emission reductions. Validation results show a generally good accuracy of SIMBAD in reproducing CAMx concentration fields for both primary and secondary PM. Indeed, these results were expected for emission scenarios where the emission-concentration process is mainly linear: for example, in SCE70_02R_BIOM_PM10 only PM10 emissions in the biomass residential heating sector were reduced. In this case the variation in PM concentrations is related mainly to the non-reactive fraction, therefore a linear behaviour was expected. Similar conclusions can be drawn for the NO2 concentrations estimated by SIMBAD in the road transport emission scenarios: the relationship between variations in NO2 concentrations and NOx emissions is almost linear, meaning that the concentration reduction can be estimated with high accuracy by the model.
By contrast, in scenarios involving all precursor emissions (NOx, VOC, NH3, primary PM and SO2), non-linearities related to chemistry can play a more relevant role. In these cases SIMBAD is not fully able to capture such phenomena. However, these validation scenarios show a small SIMBAD error, confirming the robustness of the implemented approach but also suggesting that the analysed emission sectors are characterized by limited non-linear effects.
Starting from such positive findings, further development of the SIMBAD model will first concern a validation process applied specifically to the secondary PM components (nitrates, sulphates and ammonium), to quantify the capability of the model to capture concentration changes of these species and to understand how much the model underestimates the reduction in total particulate matter concentration when all the precursors are reduced, minimizing the impacts on secondary PM. Furthermore, sensitivity coefficients are produced by CAMx on an hourly basis, hence SIMBAD can work at hourly or daily resolution, which represents a relevant added value compared to other simplified emission-concentration models. A validation on a daily basis can help in understanding whether seasonal phenomena are well captured and whether the simplified model is also suitable for estimating short-term indicators (e.g. daily exceedances).
Moreover, SIMBAD can be expanded to include more emission sectors (e.g. industry and agriculture) and other pollutants. Agriculture is an essential sector for ammonia emissions, an important precursor of PM. The impact of NH3 on PM concentrations is highly non-linear, affecting the formation of secondary inorganic PM components, therefore the analysis of agriculture emissions can be a challenging test for the model.

Figure 1. Modelling chain used for the estimation of sensitivity coefficients and for the implementation of the validation scenarios.

Figure 11. Annual average NO2 concentration maps (ppbV) of SIMBAD mean error (left) and normalized mean error (right) for scenario SCE70_07T.

Figure 12. Annual average NO2 concentration maps (ppbV) of SIMBAD mean error (left) and normalized mean error (right) for scenario SCE70_07T_NOX.
$C_j$: annual average concentration of species $j$ resulting from emission changes in $N_p$ precursors and $N_s$ sectors.

Table 3. SIMBAD model performance indicators of PM2.5 concentrations for all validation scenarios.

Table 4. SIMBAD model performance indicators of NO2 concentrations for all validation scenarios.