Deep learning techniques applied to super-resolution chemistry transport modeling for operational uses

Air quality modeling tools are widely used to assess air pollution mitigation and monitoring strategies. While neural networks (NN) have mostly been developed from observations to derive statistical models at stations, Eulerian chemistry transport models (CTMs) have mainly been devoted to air quality predictions over large areas and to the evaluation of emission reduction strategies. In this study, we investigate deep learning architectures to create a metamodel of the process-oriented CTM CHIMERE and significantly reduce the computing times required for super-resolution simulations. The key point is the selection of input variables and the way they are implemented in the NN. We perform a quantitative evaluation of the proposed approaches on a real case study. The best NN architecture performs very well in predicting the pollutant concentrations observed at stations with respect to the raw super-resolution CHIMERE simulation, with a correlation coefficient above 0.95. When compared to observations, the best NN even outperforms the raw high-resolution simulation. Currently the model is designed for air quality forecasting; it requires improvement before it can support the definition of air quality management strategies.


Introduction
Currently, about 55% of the world's population lives in urbanized areas, and this share is expected to increase to 68% by 2050 [1]. As mentioned in the most recent European Environment Agency annual report, around 25% of the European urban population is exposed to air quality exceeding the European Union air quality standards, and air pollution is the leading preventable risk factor for premature death in Europe, being responsible for 400,000 deaths each year [2]. The simulation of air quality in urban areas remains a major challenge, in particular to assess the exposure of citizens to pollutants, identify the sources of pollution and adapt the strategies to lower the air pollutant concentrations. Operational modeling tools are expected to be more robust and computationally efficient to quickly simulate air quality and propose adequate measures to monitor, curb and control air pollution. At the regional scale, operational forecasting systems mainly rely on deterministic models that perform simulations with average resolutions of 10 km in the case of a European domain and 3 km to 1 km for a regional to an urban domain. A chemistry transport model like CHIMERE [3] is suitable to work at such resolutions. This type of model is used in well-known platforms such as the COPERNICUS ensemble forecast [4], the French national forecast PREV'AIR [5], or the regional forecasts of air quality monitoring associations in France such as Airparif for the Paris region [6].
Machine learning techniques have also emerged as relevant solutions to forecast air quality at stations, using observations (meteorology, concentrations, emissions, landcover, etc.) as predictor variables [7][8][9][10][11][12][13][14][15]. Recently, deep learning schemes based on neural networks (NN) [16] have become more and more popular with increasing computing power and training data availability. Novel approaches using wavelet artificial neural network techniques have recently been used for short-term air quality forecasting [17]. They now appear relevant and usable in the air quality community. These studies show the need to evaluate: (i) the influence of the length of training data on the overall NN model performance, (ii) the significance of the selected predictors and model structure on the complexity and overall NN model performance, (iii) the links between the selected data normalization scheme and the transfer function used, and (iv) the influence of the initialization schemes for the weights, biases and other training parameters on the overall NN model performance.
Some very recent works propose to embed NN techniques within CTMs, the objective being to get the best of both the state-of-the-art physics on board CTMs and NN approaches [18,19]. For instance, in [18], neural network schemes were used and evaluated to emulate process-oriented modeling outcomes: the authors designed a simple recurrent 3-layer NN to reproduce daily mean concentrations of some pollutants over Europe as simulated by the Community Multiscale Air Quality model (CMAQ). They found that the trained NN can estimate air pollutant concentrations several orders of magnitude faster than the original model and with reasonably small errors. Convolutional Neural Networks (CNN) have also recently been used for bias corrections to improve a 7-day air quality forecast issued from a chemistry transport model [20]. Other recent works showed the possible use of NN as drop-in chemical solvers with orders-of-magnitude performance gains [21,22], though error propagation over long time periods needs to be addressed.
Following these previous works, this study proposes a novel approach to design a NN-based emulator of high-resolution air pollution simulations, which are highly time-consuming. The authors of [23] showed that a very simple regression with an adequate selection of input variables can relevantly emulate a CTM and mimic its behavior with identical performances. However, this preliminary study was based on daily datasets over a long period. Here, we introduce a new CTM-NN strategy and base the training on a shorter period with hourly data. Compared with previous works, the novelty of our approach lies in the input data selection and the designed NN architectures. The use of an hourly dataset is also new: it allows a shorter training period and provides a better representation of the evolution of dispersion conditions. We report a quantitative evaluation on a representative test case, which demonstrates the relevance of the proposed CTM-NN approach. The evaluation covers PM 10 , PM 2.5 and NO 2 ; ozone is not addressed at this stage and will be in a follow-up study by introducing more physics in the NN approach. We further discuss possible applications and improvements for operational uses.

The original principles
In a previous study [23], a metamodel of the chemistry transport model CHIMERE was built to downscale a low-resolution simulation (0.5° × 0.25°) to a higher resolution of 0.1° × 0.05°. Running air quality models is highly computationally demanding mainly because of the CFL (Courant-Friedrichs-Lewy) condition. This condition is mandatory for convergence when partial differential equations are solved by finite differencing. For a grid box, the CFL number relates the mean wind speed $v_{x_i}$, the time step $\Delta t$ and the cell size $\Delta x_i$ in each of the three directions $i$:

$$\mathrm{CFL} = \max_i \left( \frac{v_{x_i}\,\Delta t}{\Delta x_i} \right) \le 1. \quad (1)$$

The higher the wind speed, the smaller the time step must be. Since the wind speed varies every minute, the best way to optimize the numerical cost of a simulation is to adapt the time step.
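The adaptive time-stepping implied by the CFL condition can be sketched as follows; the safety factor and the example wind speeds and cell sizes are illustrative choices, not values taken from CHIMERE:

```python
import numpy as np

def max_cfl_timestep(wind_speed, dx, safety=0.9):
    """Largest stable time step (s) satisfying the CFL condition
    max_i(v_i * dt / dx_i) <= 1 for advection on a regular grid.

    wind_speed : wind-speed components (m/s), one per direction
    dx         : cell sizes (m), one per direction
    safety     : fraction of the theoretical limit actually used
    """
    wind_speed = np.abs(np.asarray(wind_speed, dtype=float))
    dx = np.asarray(dx, dtype=float)
    # dt must satisfy v_i * dt / dx_i <= 1 in every direction
    dt_limit = np.min(dx / np.maximum(wind_speed, 1e-12))
    return safety * dt_limit

# Example: 3 km horizontal cells, a 30 m vertical cell, 15 m/s horizontal wind
dt = max_cfl_timestep([15.0, 10.0, 0.5], [3000.0, 3000.0, 30.0])
```

Here the vertical direction (small cell, weak wind) sets the limit; stronger winds directly shrink the admissible time step, hence the interest in bypassing this constraint altogether.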
The first version of this methodology was based on a training over a 6-month period, applied over the next 6 months. This simple and straightforward methodology provided means to assess the extent to which the proposed approach is able to capture the main spatial and temporal patterns of the main pollutants' daily concentrations. The gain in computing time was very impressive because the costly step including all physical and chemical processes was bypassed. The performances assessed against observations were close to those obtained with the raw high-resolution CHIMERE simulations. In the following, uppercase letters refer to variables at low/coarse resolution (LR) and lowercase to the high-resolution simulation (HR). The basis of the methodology is described hereafter.
In [23,24], a given high-resolution grid cell was assumed to behave as a 'city'. Based on atmospheric diffusion theory, a simple methodology has been proposed to evaluate urban increments of concentrations due to a city [25]. Thus, under neutral atmospheric conditions, the vertical diffusion of a non-reactive pollutant from a continuous point source, assuming Gaussian dispersion in a box-model approach, is described in a general form by

$$\sigma_z^2 = \frac{2\,k\,x}{u}, \quad (2)$$

where $\sigma_z^2$ is the variance of the vertical diffusion after a distance $x$ from the source, $k$ the eddy diffusivity and $u$ the wind speed.
As described in [23,24], a generalization of this equation has been hypothesized to evaluate the concentration difference Δc between a fine-grid and a coarse-grid simulation of a primary pollutant concentration influenced by low-level sources of primary pollutants, which can finally be expressed as

$$\Delta c = c - C = \alpha\,\frac{e\sqrt{d} - E\sqrt{D}}{\sqrt{2\,k\,u}} + \beta, \quad (3)$$

where:
• c and C (μg m−3) are the concentrations at the high resolution (HR) and the low resolution (LR), respectively. All concentrations are interpolated from the coarse grid to the fine grid.
• k is the vertical mixing coefficient (m 2 s −1 ) over the fine mesh.
• u is the 10 m horizontal wind speed (m s −1 ) over the fine mesh.
• δX, δY, δx, δy are the coarse longitude, latitude and the fine longitude, latitude increments of the meshes (in degrees), respectively. In this study they are constant, but they are retained as inputs because they can vary for other domains.
• d and D are the characteristic lengths respectively derived from the previous parameters for the fine and coarse meshes.
• e and E (μg m−2 s−1) are defined as the high and coarse resolution low-level emission fluxes interpolated over the fine grid. The sum of primary PM 2.5 and PM 10 emissions is considered, while for NO and NO 2 the total NOx emissions are taken into account. The emissions of the first two levels (approximately below 30 m) are considered.
• α and β are regression coefficients. They indirectly embed missing geographical, physical and chemical processes, partly lost during the simplification. It is noteworthy that β does not have the same meaning here as in [24,25]: it represents a residual value issued from the regression method and is expected to be close to 0.
An illustration of this concept is displayed in figure 1. As explained in the next sections, the previous equations identify the relevant variables selected as inputs for the NN.
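As an illustration of how the increment is driven by the covariates, the sketch below evaluates Δc assuming the box-model form obtained by combining equation (2) with a box-model concentration estimate c ≈ e·d/(u·σ_z(d)); this exact functional form is an assumption for illustration, and the values of α and β are placeholders that would come from the regression:

```python
import numpy as np

def delta_c(e, E, d, D, k, u, alpha=1.0, beta=0.0):
    """Concentration increment (ug/m3) between fine and coarse grids,
    assuming the Gaussian box-model form derived from sigma_z^2 = 2kx/u:
    c ~ e*d / (u * sigma_z(d)) = e*sqrt(d) / sqrt(2ku), hence
    delta_c = alpha * (e*sqrt(d) - E*sqrt(D)) / sqrt(2*k*u) + beta.

    alpha and beta are regression coefficients (illustrative defaults)."""
    return alpha * (e * np.sqrt(d) - E * np.sqrt(D)) / np.sqrt(2.0 * k * u) + beta

# Fine-grid emission flux twice the coarse one, over a 10x smaller length scale
dc = delta_c(e=2e-3, E=1e-3, d=3000.0, D=30000.0, k=10.0, u=3.0)
```

The sign of Δc thus depends on whether the local fine-grid emission term e√d exceeds the interpolated coarse-grid term E√D, while strong mixing (large k·u) damps the increment.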

CHIMERE outputs as inputs for the neural networks
For this new study, the CHIMERE configuration used to create the input data for our neural network strategies is briefly described here; the reader can refer to the usual reference CHIMERE publications [3,[26][27][28] for details. Particulate matter includes primary particulate matter, secondary inorganic species such as nitrate, sulfate and ammonium, and organic aerosol resulting from the oxidation of anthropogenic and biogenic precursors. Gas-particle partitioning of the condensable oxidation products [28] is also taken into account. Biogenic emissions are computed with the model MEGAN version 2.1 [29]. Sea-salt and mineral dust emissions from desert and agricultural areas are also implemented with recent parameterizations. Particle sizes range from 10 nm to 40 μm over 10 bins.
The WRF simulation (WRF version 3.7.1) used for the meteorology is nudged with NCEP (National Centers for Environmental Prediction) final analyses from GFS (Global Forecast System) meteorological fields at 1° × 1°. The 6-hourly time resolution version is used for the initial conditions and domain boundaries of the coarsest model (ds083.2 dataset [30]).
For this study, the set-up and domains are exactly the same as in a previous work aiming at simulating air quality at fine resolution over the French Alps [31], which provided CTM outputs for the meteorology and concentration fields. Two domains are defined: (i) EUR01, covering a large part of Western Europe, and (ii) ALP0033, covering the whole French Alps; their characteristics are detailed in table 1. The Alps domain (ALP0033) encompasses an area from the Lyon municipality in the west to Lake Leman in the north and the Piemonte region in Italy in the east, with a resolution of about 3 km (figure 2). Grenoble and the Arve valley from Geneva to Chamonix are known air pollution hot spots in France due to their location in deep valleys with frequent stagnant cold meteorological conditions in wintertime. The HR simulation is performed over ALP0033 and the LR simulation over EUR01 and interpolated over ALP0033, so that the ALP0033 mesh is the working grid for the neural networks. The simulation was performed from 2013-11-15 01:00 UTC to 2013-12-21 00:00 UTC with a spin-up period from the beginning of November to ensure a good initialization. Therefore 864 hours are available to train (432 hours) and evaluate (432 hours) the various neural network strategies. The available observation dataset for the model evaluation is reported in [31] and includes only rural and urban background stations (figure 2).

NN-based architecture in this study
The basic concept of our approach is presented in figure 3. We aim to replace the high-resolution simulation HR by a neural-network-based model to save computing time, so that we could produce quick scenario analyses or air quality predictions. The key idea is to exploit the low resolution in the inputs of the NN to convey information on chemistry and long-range transport processes that are of major importance for air pollution issues.
In [23], the downscaling from coarse/low (LR) to high (HR) resolution is performed by N pixel-wise linear regressions of the increment Δc based on equation (3), where N denotes the number of grid cells in the HR target ALP0033. In this work, we consider an extension of this preliminary work with more sophisticated deep-learning-based models. We aim to train N location-specific independent super-resolution (SR) operators Φ_i such that ĉ_i = Φ_i(C_i, Λ_i), where c_i and C_i respectively denote the high and coarse resolution concentrations (the latter being interpolated over the fine mesh ALP0033) in grid cell/pixel i, and Λ_i = (d, D, e, E, k, u)_i denotes, as used in equation (3), the set of additional covariates at the same location over the high-resolution mesh ALP0033. The first natural generalization of [23] is to use independent pixel-wise multilayer perceptrons (MLP) as SR operator Φ (see figure 4) instead of a linear regression. Here, we use an MLP architecture with two hidden layers of 16 neurons. Rather than the aggregated features used in the linear regression, we feed the MLP with all the potential covariates, namely C, k, u, d, D, e and E, and let the training extract the relevant features through the weight parameters of the first two hidden layers of the MLP. This choice provides more flexibility. In this configuration, the number of parameters for each submodel Φ_i is 289. In the end, on the global 69 × 102 ALP0033 domain, the total number of parameters is 2,033,982. The test case here differs from [23], where a 12-month period was used for the training and validation and only daily values were exploited. In this new study a shorter period of 36 days is used, and the training and validation processes are performed on an hourly basis.
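A minimal sketch of one such pixel-wise MLP is given below, in NumPy rather than Keras to keep the example self-contained; the seven covariates and the two-hidden-layer structure follow the text, while the layer widths, ReLU activations and random weights are illustrative placeholders for what training would produce:

```python
import numpy as np

def init_mlp(n_in=7, hidden=(16, 16), seed=0):
    """Random weights for one pixel-wise MLP mapping the 7 covariates
    (C, k, u, d, D, e, E) through two hidden layers to a scalar c_hat.
    Weight values are learned in practice; random here for illustration."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, 1]
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, layers):
    """Forward pass: ReLU hidden activations, linear scalar output."""
    h = np.asarray(x, dtype=float)
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)   # hidden layers with ReLU
    W, b = layers[-1]
    return float(h @ W + b)              # linear output: c_hat (ug/m3)

layers = init_mlp()
# One pixel's covariates: C, k, u, d, D, e, E (illustrative values)
c_hat = mlp_forward([12.0, 5.0, 2.0, 0.05, 0.25, 1e-3, 8e-4], layers)
```

On the full domain, one such set of weights would be held per grid cell, which is what makes the per-pixel formulation parameter-heavy compared to the convolutional alternatives below.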
Because the working frame is a two-dimensional field discretized on a regular grid, additional options may be envisaged for the SR operator Φ by considering convolutional neural networks (CNN) [16]. The underlying idea is to exploit the potential spatial relationships within local neighborhoods to ease the learning of Φ, which then becomes a global SR operator rather than an aggregation of N independent pixel-wise SR operators. This framework can be extended to spatio-temporal CNN, even though we believe that for this super-resolution task a spatial formulation of the problem is sufficient. It remains an interesting option, though, if the same work has to be carried out on data with high missing-data rates, such as remote sensing and/or in situ datasets [32].
Here, a simple CNN architecture is first considered in which the inputs are the same covariates used in the MLP, stacked with the coarse resolution C as supplementary channels. The CNN architecture comprises a first hidden layer with 128 Conv2D 3 × 3 filters + ReLU activation (Rectified Linear Unit) with batch normalization, followed by a second hidden layer with 64 Conv2D 5 × 5 filters + ReLU activation. The final layer maps the outputs of the second hidden layer to the required HR resolution through a single linear Conv2D 3 × 3 filter, see figure 5. The total number of parameters for this CNN is 84,737.
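The building block of this architecture, a multi-channel Conv2D filter scanning local neighborhoods of the stacked covariate channels, can be illustrated with a naive NumPy implementation (Keras is used in practice; the single averaging kernel below is purely illustrative):

```python
import numpy as np

def conv2d(channels, kernel, bias=0.0):
    """Naive valid-mode 2D convolution (cross-correlation, as in Conv2D)
    of stacked input channels with one multi-channel kernel.

    channels : array (C, H, W)  -- e.g. C stacked covariate maps
    kernel   : array (C, kh, kw)
    """
    C, H, W = channels.shape
    _, kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted sum over the local (kh x kw) neighborhood, all channels
            out[i, j] = np.sum(channels[:, i:i + kh, j:j + kw] * kernel) + bias
    return out

# A 3x3 averaging kernel over one channel smooths local neighborhoods
x = np.arange(25, dtype=float).reshape(1, 5, 5)
k = np.full((1, 3, 3), 1.0 / 9.0)
y = conv2d(x, k)   # shape (3, 3); y[0, 0] is the mean of the top-left 3x3 patch
```

A Conv2D layer simply holds many such kernels (128 in the first hidden layer here), so each output channel is one learned neighborhood feature of the inputs.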
We also draw from the learning-based super-resolution literature to evaluate a more sophisticated CNN architecture. Here, we consider the Deep Residual Channel Attention Networks (RCAN) architecture [33], which is among the state-of-the-art deep learning models for the super-resolution of natural images. In this setup, so-called residual-in-residual (RIR) structures are the elementary building blocks of the deep neural network architecture. RIR blocks are typically made up of several residual groups (RG) with long skip connections, while each residual group itself contains a predefined number of residual cells with short skip connections. The underlying idea is that RIR blocks allow low-frequency information to be bypassed through multiple skip connections, thus allowing the main network to focus on learning high-frequency information. The RCAN architecture also involves a channel-attention mechanism [34,35] to rescale channel-wise features through interdependencies among channels. The total number of parameters in our RCAN configuration is 299,393.
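The channel-attention mechanism can be sketched in a squeeze-and-excitation style (a simplified, assumed variant of the RCAN attention block; the weights and the reduction ratio below are placeholders):

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel attention, as used in RCAN:
    global-average-pool each channel, pass the result through a small
    bottleneck (ReLU then sigmoid), and rescale the feature channels.

    features : array (C, H, W)
    w1 : array (C, C//r)   bottleneck down-projection (r = reduction ratio)
    w2 : array (C//r, C)   up-projection back to one gate per channel
    """
    pooled = features.mean(axis=(1, 2))            # squeeze: (C,)
    z = np.maximum(pooled @ w1, 0.0)               # excitation, ReLU
    scale = 1.0 / (1.0 + np.exp(-(z @ w2)))        # sigmoid gates in (0, 1)
    return features * scale[:, None, None]         # channel-wise rescaling

C, r = 8, 4
feats = np.ones((C, 6, 6))
w1 = np.zeros((C, C // r)); w2 = np.zeros((C // r, C))
out = channel_attention(feats, w1, w2)   # zero weights -> sigmoid(0) = 0.5 gates
```

Each channel is thus amplified or damped according to a learned function of all channels' global statistics, which is how interdependencies among channels are exploited.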
The so-called back-propagation strategy used to calculate the fitting coefficients of the NN is the essence of neural network training. It consists in fine-tuning the weights of a neural net based on the error rate (i.e. loss) obtained in the previous epoch (i.e. iteration). Proper tuning of the weights ensures lower error rates, making the model reliable by increasing its generalization. Regarding the training phase, we use for all architectures the root mean squared error between the 'true' high resolution (c_i) and the output of the neural network (ĉ_i) as training loss function:

$$\mathcal{L} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(c_i - \hat{c}_i\right)^2}.$$

It is noteworthy that it could be interesting in future works to assess whether other loss functions are of interest, for instance to constrain the SR operator Φ to behave better during pollution episodes. Regarding the training strategy, we use an Adam optimizer with a batch size of 4 over 100 epochs on a Microsoft Azure Virtual Machine (VM) powered by an NVIDIA Tesla K80 with 12 GiB of GPU memory. The training time of a single MLP is only 15 seconds, but without any parallelization the whole training on the 7038 grid cells of the ALP0033 domain takes about 26 hours. The same number of epochs is used for MLP, CNN and RCAN, though the RMSE loss function stabilizes between 50 and 100 cycles depending on the architecture. No major overfitting has been observed for this number of epochs, though this is the case if the number of cycles is increased up to 200. The CNN-based architectures provide faster training, with only 50 minutes for the basic CNN and around 6 hours for RCAN. In the end, while the training time can significantly differ according to the NN architecture, applying the trained networks to new datasets only takes a few seconds. Recall that a full computation with a CTM at high resolution can take several hours depending on the resolution and the domain size. All NN models are implemented using the Keras framework [36].
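The RMSE training loss can be written directly as below (a minimal sketch; the commented Keras calls indicate roughly how this maps onto the training setup described above, and are not the exact code used in the study):

```python
import numpy as np

def rmse_loss(c_true, c_pred):
    """Training loss: root mean squared error between the 'true' high
    resolution c_i and the network output c_hat_i."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    return float(np.sqrt(np.mean((c_true - c_pred) ** 2)))

# In Keras, the training setup described in the text corresponds roughly to:
#   model.compile(optimizer="adam", loss=rmse_in_backend_ops)
#   model.fit(x_train, y_train, batch_size=4, epochs=100)
```

Because RMSE averages errors over all cells and hours, it favors typical conditions; alternative losses weighting high-concentration episodes more heavily are the natural candidates evoked in the text.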

Results
In this section, we report the results obtained for NO 2 and PM 2.5 with the three NN-based super-resolution strategies. We use the first half of the dataset for training (432 hours ranging from 2013-11-15 01:00 UTC to 2013-12-03 00:00 UTC) and the other half for validation, ending on 2013-12-21 00:00 UTC. Complementary results are provided in appendix A (figures A1, A2, A3 and A4) for PM 10 , and the evaluation on an hourly basis in tables B1, B2 and B3 of appendix B. Figure C5 of appendix C provides an evaluation on Δc for each NN. The definition of the evaluation metrics is provided in appendix D.
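The chronological split described above can be sketched as follows (the array here is a zero-filled placeholder with the ALP0033 grid dimensions, not actual model output):

```python
import numpy as np

def chronological_split(hourly_fields, n_train=432):
    """Split an hourly dataset (T, H, W) into training and validation
    halves along time, as done in the study (first 432 h / last 432 h).
    A chronological rather than random split avoids leaking future
    meteorology into the training set."""
    return hourly_fields[:n_train], hourly_fields[n_train:]

data = np.zeros((864, 69, 102))   # 36 days x 24 h on the ALP0033 grid
train, valid = chronological_split(data)
```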
Typical patterns of high concentrations are observed along the road traffic network and in urbanized areas with the original high resolution (HR). PM 2.5 concentration maps are smoother since PM emissions are more spread over rural areas, particularly due to wood burning and long-range transport of such species. These fine-scale patterns are well reproduced by the NN architectures, while the coarse resolution acts like a smoother of both NO 2 and PM 2.5 pollution levels in these areas of interest.
Because the PM 2.5 increment of concentrations between the high and coarse resolutions is less easily explained by the high resolution emission covariates, the SRNN applied to PM 2.5 coarse resolution is slightly less efficient in comparison to the results obtained for NO 2 . It is especially noticeable in the southeast quarter of the domain where the convolutional-based SRNN NO 2 downscaling is very similar to the high resolution. Meanwhile, it is not straightforward to say if the modifications of the PM 2.5 coarse resolution proposed by the SRNNs really bring the simulation closer to the high resolution in this area of the ALP0033 domain.
While the gain provided by the super-resolution solutions presented here is obvious when looking at the maps, we present in figures 7(a) and 7(b) error statistics between the HR simulations and the various neural networks, namely the normalized root mean square error (nRMSE, solid lines) and the correlation (dashed lines). These statistics are displayed on an hourly basis in order to identify which NN architecture behaves best along the validation period. Each grid point provides an hourly HR 'truth' and a SR modeled output. The CNN and RCAN architectures seem to increase the already significant gain of the MLP architecture. In terms of correlation, RCAN is slightly better than the basic CNN architecture, but the NO 2 variability of the latter (see the CNN-based standard deviation on the Taylor diagram provided in figure 8(a)) is closer to the HR truth. Correlations are usually higher than 0.95 for NO 2 concentrations for the three NN architectures. The hourly-frequency results (appendix B) are similar, again with the best performances for RCAN. It is noteworthy that even the evaluation on Δc for each NN (appendix C) shows the ability of the NN to mimic the HR model, with a correlation up to 0.95 for NO 2 with the RCAN approach. The statistics related to PM 2.5 concentrations all indicate that RCAN (see figure 8(b)) is the best super-resolution approach among the three architectures evaluated. For this specific pollutant, it is also interesting to note that RCAN is the only super-resolution architecture able to deal with the abrupt change of performance of the coarse resolution in the last hours of December 18 (see figure 7(b)): even if the nRMSE and correlation with HR become worse than earlier in the validation period, they remain the best trade-off, while the MLP and, to a lesser extent, the CNN do not capture this singularity in the validation period. The same conclusions hold for PM 10 (see appendix A).
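Assuming conventional definitions for these two metrics (the exact formulations used in the study are given in appendix D; the mean-normalization of the nRMSE below is one common convention and therefore an assumption), a minimal implementation reads:

```python
import numpy as np

def nrmse(ref, pred):
    """Normalized RMSE between the HR reference and a SR output,
    normalized here by the mean of the reference (one common choice)."""
    ref = np.ravel(np.asarray(ref, dtype=float))
    pred = np.ravel(np.asarray(pred, dtype=float))
    return float(np.sqrt(np.mean((pred - ref) ** 2)) / np.mean(ref))

def correlation(ref, pred):
    """Pearson correlation coefficient between reference and prediction."""
    return float(np.corrcoef(np.ravel(ref), np.ravel(pred))[0, 1])
```

Applied per hour over all grid points, these two functions reproduce the kind of time series shown in figures 7(a) and 7(b).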
Regarding this specific issue, the PM 2.5 maps on 2013-12-18 18:00 UTC are also provided in figure 9 as complementary information. The impact of the emission covariates is clearly overestimated in this situation, with low PM 2.5 pollution levels along the roads. This issue directly relates to the learning problem of optimizing the RMSE loss function on average over the training period. It might be more efficient to consider other loss functions and training datasets to address the issue of learning how the HR behaves in specific conditions for operational applications.
As in [23], it is interesting to compare the HR, LR and the three NN simulation outputs with real observation data. For the observations available in the ALP0033 domain along the validation period, we provide in figures 10 and 11 additional Taylor diagrams comparing HR, LR and the three SRNN to daily-averaged rural and urban observations. Complementary statistics (average bias, RMSE and correlation) are also given in tables 10(c) and 11(c), as well as the original hourly statistics in appendix B: they lead to the same conclusions, except that the MLP architecture may improve in terms of RMSE and correlation at the expense of a degraded variability. Regarding NO 2 , because the high resolution behaves relatively poorly on the rural observations, with an average correlation lower than 0.25, it is not surprising to see the SRNNs doing the same. For urban areas, we report a greater similarity to the HR ground truth for all NN models, with an average correlation of 0.75 for the SR-MLP model and slightly lower performances for the SR-CNN and SR-RCAN ones (0.70 for the basic CNN and 0.71 for RCAN). The poor concordance between rural observations and the HR ground truth suggests the importance of using data assimilation in high-resolution CTMs. On this particular issue, our SRNN models may be used as fast surrogate simulations in a model-based ensemble data assimilation framework [37][38][39]. They can also be extended to a fully NN-based data assimilation scheme, where the end-to-end learning strategy consists in feeding a neural network with both the coarse resolution (with the HR covariates) and the observations, the target being the anomaly between the observations and the coarse resolution [32,40]. These formulations also have the advantage of addressing interpolation, reconstruction and forecasting issues where only a combination of LR and HR covariates, possibly irregularly sampled, is available.
Performances on PM 2.5 concentrations for the HR simulation at rural stations are better than for NO 2 , with an average correlation of 0.70 and an RMSE of 5.62 μg m−3; for this pollutant, the CNN-based architectures are even closer to the observations than HR, with an improvement of the correlation up to 0.82 with the basic CNN and a similar RMSE with RCAN. Even at urban sites, RCAN behaves better than HR, with similar correlations but lower RMSE and biases. This supports the use of such a super-resolution approach as a surrogate model in operational applications.

Discussion
The best NN architecture, RCAN, is able to reproduce the behavior of the raw HR simulation with satisfactory performances. For some pollutants the NN model even provides better results, probably by smoothing some aberrant values calculated by the CTM during very stable situations that lead to unrealistic concentration peaks. With minor improvements on ozone chemistry and the use of observational data to constrain the system, similarly to MOS (Model Output Statistics) techniques or using a CNN approach for bias corrections [20], our approach can be quickly deployed for air quality forecasting. Once the training is performed, the forecasting chain could deliver a forecast in a few seconds instead of hours. It should then be further investigated whether a generic NN-based model is sufficient or whether it has to be adapted to specific meteorological conditions. In the latter case, a training strategy must be investigated, probably with a moving 15-day or monthly update of the learning process, taking for instance the last 15 to 30 days. This would have the advantage of training the NN with similar meteorological conditions.
More interestingly, expectations lie in the field of air quality modeling for policy making and impact assessment. These NN approaches can be complementary to statistical analyses embedded in surrogate models like the Screening for High Emission Reduction Potential on Air (SHERPA) [41,42], developed to support the design of air quality plans in the context of the EU Air Quality Directive [43] by the member states. The approach proposed in SHERPA is based on cell-per-cell relationships linking the concentration at a grid cell i to the emissions in the surrounding cells. It builds on the concept of Geographically Weighted Regression (GWR), as used in [44], or local modeling approaches [45], a family of approaches that uses 'bell-shaped' kernel functions to establish weighted, local regressions between input and output variables. SHERPA is designed to evaluate the impact of an emission reduction for a given activity sector and area on a selected location through the mathematical representation of Source-Receptor Relationships (SRR). The SHERPA model works so far on yearly and seasonally averaged concentrations. Working over large time-averaged periods smooths the results and limits the impact of non-linearities induced by complex high-frequency phenomena and interactions between chemical species. SHERPA requires a minimum number of simulations with targeted emission reduction scenarios, and the goal is to increase its resolution for a better representation of the local scale. Definitively, our approach paves the way for producing fast scenario simulations at high resolution to feed this type of model, but care must be taken to ensure that a minimum of physics is embedded to deal with non-linearities, as previously mentioned. The cases of ozone but also of the formation of secondary particles like ammonium nitrate can for instance be highlighted.
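The 'bell-shaped' kernel weighting at the core of GWR can be sketched as a Gaussian kernel; the bandwidth value below is illustrative, not a SHERPA parameter:

```python
import numpy as np

def gwr_weights(distances, bandwidth):
    """'Bell-shaped' (Gaussian) kernel weights used in geographically
    weighted regression: source cells near the receptor cell get weights
    close to 1, distant cells decay smoothly toward 0."""
    d = np.asarray(distances, dtype=float)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

# Weights of source cells at 0, 10 and 20 km from a receptor, 10 km bandwidth
w = gwr_weights([0.0, 10.0, 20.0], bandwidth=10.0)
```

These weights are what turns a global emission-to-concentration regression into a local one, each receptor cell seeing mostly its neighborhood.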
Our approach could also inspire new developments in SHERPA-like models, benefiting from more recent developments in machine learning techniques for image processing and analysis. Moreover, our developments can easily be adapted to any other CTM outputs.
Future works may focus on how to integrate physical constraints in the neural network to improve these first encouraging results, obtained on a very complex area with steep slopes enhancing local effects. This approach also deserves to be tested over a larger domain and at a finer resolution, with very different chemical and meteorological regimes. This is especially relevant for O 3 : in [23], a special treatment for such a secondary pollutant is proposed, based on the two main equations of ozone chemistry involving NOx and ozone. Even though the correlations for NO 2 against observations are better for the RCAN architecture than for CHIMERE, the discrepancies in terms of bias may be a consequence of the local interactions with ozone that are not considered in our methodology. This type of physical process can be implemented in an efficient NN-based scheme as a way of enforcing consistency between the super-resolution outputs for NO 2 , NO, O 3 and probably the volatile organic compounds. This directly relates to one of the main branches of physically-guided neural networks, which aims at designing specific NN architectures to embed the physics in the modeling system. Another option would be to keep the CNN and attention-based architectures proposed in the paper while adding additional constraints on the physics in the loss function: this is an active field of research known as 'physics-informed neural networks' [46][47][48]. Finally, the use of satellite data like aerosol optical depth together with ground observations as input data for a NN makes it possible to build an adequate model to predict super-resolution PM 2.5 concentrations, as reported over Beijing [49,50]. Introducing observational data in our approach is another way to improve a forecasting system based on a metamodel.
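A physics-constrained loss of the kind evoked above could, for instance, combine the RMSE data term with a penalty on the residual of a physical relation evaluated on the network outputs. This is a hypothetical sketch of the general idea, not the formulation of [46][47][48]; the residual and the weighting hyperparameter lam are assumptions:

```python
import numpy as np

def physics_informed_loss(c_true, c_pred, residual, lam=0.1):
    """Sketch of a physics-informed loss: data-fit RMSE plus a penalty
    on the residual of a physical constraint (e.g. a photostationary
    relation between NO, NO2 and O3) evaluated on the network outputs.
    'lam' is a hypothetical weighting hyperparameter."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    data_term = np.sqrt(np.mean((c_true - c_pred) ** 2))
    physics_term = np.mean(np.asarray(residual, dtype=float) ** 2)
    return float(data_term + lam * physics_term)
```

Minimizing such a loss would push the super-resolution outputs for NO 2 , NO and O 3 toward mutual chemical consistency while still fitting the HR target.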

Conclusion
In this study, we have developed and evaluated the relevance of neural-network super-resolution approaches to downscale coarse chemistry transport modeling simulations, focusing on three criteria air pollutants: NO 2 , PM 2.5 and PM 10 . These learning-based techniques take advantage of (i) fast coarse simulation outputs from the CHIMERE CTM, embedding complex mathematical representations of physics and chemistry and particularly the long-range transport of pollutants, and (ii) more local features which are retrieved by the neural network approaches. The reported quantitative and qualitative evaluation against both the high-resolution reference simulation and real observation datasets supports the relevance of the neural-network-based downscaling for the operational monitoring and forecasting of air quality. Future directions at short and medium terms are identified to make this kind of technique handle non-linearities related to secondary pollutants, interactions between species and emission reduction strategies.
Finally, let us recall that using neural networks is complementary to developing ever more complex physical models. They are good instruments to simplify complex models for operational uses (by catching the main patterns), while leaving open the possibility to develop more and more sophisticated deterministic models representing the 'real world'.