First estimation of hourly full-coverage ground-level ozone from Fengyun-4A satellite using machine learning

Ground-level ozone (O3), known for its adverse impacts on human health and crop production, has drawn significant attention from governments and the public. To address the limitations of sparse and unevenly distributed ground-level O3 observations, this study proposes a machine learning method for hourly full-coverage ground-level O3 estimation. Meteorological data from the National Centers for Environmental Prediction global forecasting system, satellite data from Fengyun-4A (FY-4A) and the Ozone Monitoring Instrument, an emission inventory from the Multi-resolution Emission Inventory for China, and other auxiliary data are used as input variables, while ground-based O3 observations serve as the response variable. The method is applied on a monthly basis across China for the year 2022, producing an hourly full-coverage high-resolution (4 km) ground-level O3 estimate, termed ML-derived-O3. Cross-validation results demonstrate the robustness of ML-derived-O3, with a coefficient of determination (R²) of 0.96 (0.91) for sample-based (site-based) evaluations and a root-mean-square error (RMSE) of 9.22 (13.65) µg m−3. However, the date-based evaluation is less satisfactory because the training data are imbalanced, a consequence of the pronounced daily variations in ground-level O3 concentrations. Nevertheless, the seasonal and hourly ML-derived-O3 exhibits high prediction accuracy, with R² values surpassing 0.95 and RMSE remaining below 7.5 µg m−3. This study is the first successful attempt to obtain hourly full-coverage ground-level O3 data across China. The diurnal variation of ML-derived-O3 is highly consistent with ground-based observations on both clear and cloudy days, and it effectively captures ground-level O3 pollution exposure events. This estimation method will be used to establish a long-term ground-level O3 dataset with high spatial and temporal resolution, which will be valuable for air pollution monitoring and environmental health research.


Introduction
Ozone (O3), a crucial atmospheric constituent, is important for both the Earth and human beings. Approximately 90% of ozone resides in the stratosphere, where it absorbs solar ultraviolet (UV) radiation, thereby preventing UV damage to the Earth's ecosystems (Bernhard et al 2023). However, ground-level O3, located near the Earth's surface, poses a significant threat to human health, contributing to asthma, cardiovascular diseases, respiratory tract infections, and more (McConnell et al 2002, Jerrett et al 2009, Liang et al 2019). Additionally, ground-level O3 adversely affects crop yield (Rai and Agrawal 2012, Lin et al 2018). The increasing severity of ground-level O3 pollution during summer has drawn significant attention from both governmental and public spheres (Wang et al 2017, Maji et al 2019, Lu et al 2020). Accurate and effective observations are urgently needed to understand the influence of O3 on the atmosphere, environment and climate.
Ground-based observation networks provide an effective means of measuring ground-level O3 with high accuracy and temporal resolution. Although numerous environmental monitoring stations have been established across China since 2013, the scarcity and irregular distribution of these stations limit the representativeness of the observation data. To overcome the limitation of discrete ground-based stations, satellite-based remote sensing techniques have been employed (van der A et al 2010), using instruments such as the Total Ozone Mapping Spectrometer (TOMS) (Krueger 1989), Ozone Monitoring Instrument (OMI) (Levelt et al 2006), Scanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) (Bovensmann et al 1999), Global Ozone Monitoring Experiment (GOME/GOME-2) (Callies et al 2000), and Tropospheric Monitoring Instrument (TROPOMI) (Veefkind et al 2012). These sensors primarily provide measurements of total column ozone, with only a subset capable of retrieving ozone profiles at different vertical levels (Liu et al 2010c, Lamsal et al 2021).
Obtaining ground-level ozone data from satellites poses a significant challenge, as ground-level O3 accounts for only a small proportion of total ozone.
Although estimating ground-level O3 from satellites is challenging, researchers have made numerous attempts to obtain wide and continuous distributions of ground-level O3. Three main methodologies have been employed: chemical simulation models, statistical models and artificial intelligence. Chemical models, such as CMAQ, WRF-Chem, and GEOS-Chem, simulate ground-level O3 by considering atmospheric chemical reactions and air pollution transport (Liu et al 2010a, Sicard et al 2020, Fu et al 2022). The accuracy of the estimated ground-level O3 is notably influenced by the precision of the atmospheric chemical reaction simulation. Statistical models establish quantitative relationships between ground-level O3 and potential influential factors (Adam-Poupart et al 2014, Kerckhoffs et al 2015, Huang et al 2017, Zhang et al 2018, 2020). Various approaches, such as Kriging, Land-Use Regression (LUR), and Combined Bayesian Maximum Entropy-LUR, have been applied to obtain ground-level O3 in Quebec, Canada (Adam-Poupart et al 2014). Moreover, an LUR model for O3 has been developed at a national fine spatial scale (Kerckhoffs et al 2015). Huang et al (2017) utilized the LUR model for ground-level O3 retrieval in Nanjing, China. Additionally, a second-order regression model based on NO2 and CH2O column densities from satellites has been applied to estimate daily ground-level O3 over the eastern U.S. (Zhang et al 2018). Furthermore, Zhang et al (2020) estimated monthly ground-level O3 in eastern China using the geographically weighted regression (GWR) method. While these schemes are suitable for small regions such as urban scales, they encounter challenges in solving multi-parameter and nonlinear problems over large research areas.
With advancements in computer technology and the availability of multiple datasets, machine learning (ML) has rapidly emerged as a powerful tool for ground-level O3 estimation in recent years. For instance, Wang et al (2022a) estimated 10 km-resolution daily maximum 8 hour average (MDA8) ground-level O3 in California using a random forest ML model, with a cross-validated coefficient of determination (R²) of 0.84. Similarly, Liu et al (2022) developed a cluster-enhanced ensemble ML method to generate a global monthly ground-level O3 dataset from 2003 to 2019 with 0.5° spatial resolution. Wei et al (2022) employed an extended ensemble learning approach based on the space-time extremely randomized trees model to estimate a full-coverage 10 km-resolution daily MDA8 ground-level O3 dataset covering China from 2013 to 2020, achieving an R² of 0.87 (0.80) for out-of-sample (out-of-station) cases and a root-mean-square error (RMSE) of 17.10 (21.10) µg m−3. Guo et al (2022) used various ML methods to estimate daily O3 at 1 km × 1 km resolution across China from 2018 to 2020. Random forest performed best, with the highest validation R² of 0.86 and the lowest RMSE of 18.39 µg m−3, followed by support vector machine, backpropagation neural network, and multiple linear regression.
However, all of these studies relied on polar-orbiting satellite products, so they could only provide daily or monthly O3 datasets and were unable to capture the diurnal variation of ground-level O3. Ground-level O3 is not only influenced by atmospheric composition but is also strongly correlated with meteorological variables such as surface solar radiation and temperature. Geostationary satellites, capable of monitoring the Earth at high temporal resolution, offer unique advantages for atmospheric observation. A product with high spatio-temporal resolution and wide coverage would benefit both scientific research and operational applications. A previous study generated hourly ground-level O3 over China from Himawari-8 using a self-adaptive geospatially local model (Wang et al 2022b). However, that approach is applicable only under low cloud fraction conditions and does not provide full coverage across China. In this study, a new scheme is proposed to estimate hourly full-coverage ground-level O3 over China using an ML method.
Section 2 describes the data used in the study, and section 3 presents the methodology employed for estimating ground-level O3. Section 4 discusses the performance of the estimated ground-level O3, including the full-coverage diurnal variation of the ground-level O3 distribution across China, which has not been illustrated in previous studies. Finally, section 5 presents the discussion and conclusion of the research.

Datasets
This section presents details about the datasets utilized in the study, and table 1 provides a summary of these datasets.

Surface observation
Hourly ground-level O3 observations across mainland China have been available since 2013, provided by the China National Environmental Monitoring Center (CNEMC). The data can be accessed from the center's official website (www.cnemc.cn). The ambient O3 concentration was measured using the UV fluorescence method (Chen et al 2021), and the data were collected in accordance with the technical specification HJ 818-2018. The monitoring stations are distributed throughout mainland China, with their number increasing from approximately 900 in 2013 to around 1700 in 2022. For this study, the data from 2022 were used, and the locations of the in-situ stations are illustrated in figure 1.

Meteorological data
Ground-level O3, a secondary air pollutant, is closely influenced by meteorological conditions (Chen et al 2020). To support the potential application of ground-level O3 estimation in near real-time operations, meteorological data from the National Centers for Environmental Prediction (NCEP) global forecasting system (GFS) were used in this study instead of reanalysis datasets. The GFS is a weather forecast model that provides numerous atmospheric and radiation variables (National Weather Service and National Centers for Environmental Prediction 2003).
Various meteorological factors that potentially influence the formation of ground-level O3 were selected for estimation. These factors include surface pressure (SP), temperature at 2 m (T2M), relative humidity (RH), total precipitation (TP), precipitation rate (PR), planetary boundary layer height (PBLH), 10 m u-wind (U10) and v-wind (V10) components, total cloud cover (TCC), downward surface shortwave radiation (DSWRF), and ozone mixing ratio (OMIX). The GFS data with a spatial resolution of 0.25° × 0.25° and a temporal resolution of 1 h were chosen for the study.

Emission inventory
The three main precursors of ground-level O3 from direct emissions are nitrogen oxides (NOx), volatile organic compounds (VOCs), and carbon monoxide (CO). These precursors are provided by the MIX Asian emission inventory for the year 2010, which is the fundamental data for the Multi-resolution Emission Inventory for China (MEIC; Li et al 2017a, 2017b), and are referred to as MixNOx, MixNMVOC, and MixCO in this study. The MEIC model-based emission data have been extensively used in scientific research and operational work.

Satellite datasets
Two crucial meteorological parameters for surface ozone formation are radiation intensity and surface temperature. Although meteorological factors from GFS have been used, they are derived from a numerical model rather than from real observations. Because reflectance in the visible spectral bands is possibly associated with solar radiation, and brightness temperature (BT) is possibly associated with surface temperature, near real-time satellite observations were also considered in this study. Fengyun-4A (FY-4A), China's second-generation geostationary meteorological satellite, was launched successfully on 11 December 2016. Positioned at 104.7° E, it carries the Advanced Geosynchronous Radiation Imager (AGRI) as a key payload. The AGRI has 14 spectral bands ranging from 0.47 µm in the visible (VIS) to 13.8 µm in the infrared (IR), with different spatial resolutions (1 km at nadir in VIS, 2 km in the near-infrared (NIR), and 4 km in IR) (Yang et al 2016). To obtain full-coverage estimates and to avoid the opposing behavior of VIS reflectance under clear and cloudy conditions, only BTs from the IR bands are used as predictive variables. Details of the bands are listed in table 2.
In addition to the historical emission inventory, satellite products of atmospheric composition were also considered. NO2, one of the O3 precursors, has a significant impact on ground-level O3 (Liu et al 2010b). The latest versions of the OMI/Aura total column O3 and tropospheric NO2 products were included in the ground-level O3 estimation. Only retrievals with a cloud fraction of less than 30% were used. To address the gaps caused by this cloud-fraction limit, the total column O3 (dO3_CF03) and tropospheric NO2 (dNO2_CF03) data obtained on the same day of the year from 2005 to 2021 were averaged. This compensates for missing products in cloud-covered areas and satisfies the requirement for continuous estimation. The approach also allows near real-time estimation of hourly ground-level O3 in future work, despite the time delay in obtaining OMI products from the satellite.

To assess the importance of each variable in the ground-level O3 estimation, the feature scores of the predictors were calculated by evaluating the variance gain of each feature across all trees, with instances sorted by their gradients. This process involves forming instance subsets A and B and splitting the data to optimize variance gain (Ke et al 2017, Tang et al 2020). The resulting scores are presented in figure 3. These importance scores indicate the usefulness of each variable during model construction. A higher score suggests that the predictor plays a greater role in the key decisions made by the decision trees. Among the main predictors for ground-level O3 estimation, T2M is the most important feature, accounting for about 31% of the total importance. BT11 observed by the satellite also provides temperature information, accounting for 4.4%. Other meteorological factors also have an important impact on the ground-level O3 estimation, especially RH, PBLH, PR, DSWRF, SP, and V10 (importance > 3%). The influence of O3 precursors and emissions (i.e. MixCO, MixNMVOC, and MixNOx) cannot be ignored (importance > 1%).
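The cloud-screened, multi-year averaging of the OMI products described earlier in this section can be illustrated with a minimal sketch. It assumes that each grid cell holds a list of (date, value, cloud fraction) records; the function name `doy_climatology`, the record layout and the sample numbers are hypothetical and are not taken from the study. Leap years are handled naively here (the day-of-year index shifts by one after 29 February), which the actual processing may treat differently.

```python
from collections import defaultdict
from datetime import date


def doy_climatology(records, max_cloud_fraction=0.3,
                    first_year=2005, last_year=2021):
    """Average cloud-screened values per day of year for one grid cell.

    records: iterable of (date, value, cloud_fraction) tuples.
    Returns {day_of_year: mean_value}; days without any valid retrieval
    in the averaging period are simply absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value, cloud_fraction in records:
        if not first_year <= day.year <= last_year:
            continue  # outside the multi-year averaging period
        if cloud_fraction >= max_cloud_fraction:
            continue  # keep only retrievals with cloud fraction < 30%
        doy = day.timetuple().tm_yday
        sums[doy] += value
        counts[doy] += 1
    return {doy: sums[doy] / counts[doy] for doy in sums}


# Made-up total column O3 values for a single grid cell.
records = [
    (date(2005, 1, 1), 280.0, 0.10),
    (date(2006, 1, 1), 300.0, 0.20),
    (date(2007, 1, 1), 999.0, 0.80),  # cloudy retrieval, excluded
    (date(2022, 1, 1), 500.0, 0.00),  # outside 2005-2021, excluded
    (date(2005, 1, 2), 290.0, 0.05),
]
climatology = doy_climatology(records)
print(climatology)
```

Because the climatology depends only on past years, a value is available for every day of the target year regardless of cloud cover or product latency, which is the property the text relies on for continuous estimation.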

Validation method
In this study, the 10-fold cross-validation (CV) method, the most popular validation method, was employed to assess the performance of the estimation method and to check for over-fitting issues (Ma et al …). In the sample-based 10-fold CV method, all data samples were randomly divided into 10 groups; 9 groups were used for model training, and the remaining group was used for model validation. This process was repeated 10 times so that all samples were used and each group served as the test set exactly once. To evaluate the performance of the estimation models, R², RMSE, bias and ordinary least squares regression were chosen as evaluation metrics. The performance metrics obtained from each iteration were averaged to provide an overall assessment of the model's performance. CV helps to identify over-fitting by assessing how well the model generalizes to unseen data. In the site-based 10-fold CV method, all monitoring stations were randomly divided into 10 groups, so that all samples from a given station fall in the same group. In the date-based 10-fold CV method, the dates of all matched data were randomly divided into 10 groups. The subsequent steps were the same as in the sample-based 10-fold CV method.
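The three splitting schemes differ only in the unit that is assigned to a fold: an individual sample, a monitoring station, or a date. The following sketch shows one way to express this with a single helper; the names `assign_folds` and `iter_splits`, the synthetic sample tuples and the fixed random seed are illustrative assumptions, not the study's implementation.

```python
import random


def assign_folds(group_keys, n_folds=10, seed=0):
    """Return one fold index per sample.

    Samples that share a group key (e.g. the same station or the same
    date) always receive the same fold index.
    """
    groups = sorted(set(group_keys))
    random.Random(seed).shuffle(groups)
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    return [fold_of_group[k] for k in group_keys]


def iter_splits(folds, n_folds=10):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    for k in range(n_folds):
        train_idx = [i for i, f in enumerate(folds) if f != k]
        test_idx = [i for i, f in enumerate(folds) if f == k]
        yield train_idx, test_idx


# Synthetic matched samples: (station id, date, hour).
samples = [(f"S{s:02d}", f"2022-07-{d:02d}", h)
           for s in range(20) for d in range(1, 11) for h in range(3)]

# Sample-based: every sample is its own group.
sample_folds = assign_folds(range(len(samples)))
# Site-based: all samples of a station stay together.
site_folds = assign_folds([site for site, _, _ in samples])
# Date-based: all samples of a date stay together.
date_folds = assign_folds([day for _, day, _ in samples])

for train_idx, test_idx in iter_splits(site_folds):
    held_out = sorted({samples[i][0] for i in test_idx})
    print(len(train_idx), len(test_idx), held_out)
```

In the site-based and date-based schemes the model never sees the held-out stations or dates during training, which is why these two evaluations are stricter tests of spatial and temporal generalization than the sample-based one.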

Cross validation results
To assess the overall performance of the developed model at the national scale, a comprehensive evaluation of the ground-level O3 estimation was conducted. The sample-based, site-based and date-based evaluation results for 2022 are illustrated in figure 4. Among the three validation approaches, the sample-based validation shows the best performance, followed by site-based and date-based validation. This can be attributed to the fact that sample-based evaluation ensures a more balanced distribution between the training and validation datasets. Although the site-based and date-based validations were also conducted using CV techniques, the uneven distribution of monitoring sites and the pronounced daily variations still result in imbalances between the training and validation datasets. This is particularly challenging for ground-level O3, whose concentration varies strongly from day to day, making it difficult to achieve a well-balanced division of the training groups. The performance of estimation models in similar previous studies is given in table 4. Overall, our model works well when the training datasets are relatively balanced.

An example of pollution episodes
The CV results show that the estimation model works well for the matched datasets. Full-coverage hourly ground-level ozone for mainland China from 1 January to 31 December 2022 was then generated using the proposed estimation model. Figure 5 illustrates an example of full-coverage ground-level O3 estimation in mainland China at 02:00 UTC and 08:00 UTC from 3 to 6 October 2022, alongside the corresponding ground-based O3 maps. The spatial distribution of ground-level O3 derived by ML (ML-derived-O3) exhibits spatial diversity and shows a strong similarity to the ground-based O3. This method offers significant advantages in spatial coverage compared to the sparse and uneven distribution of surface monitoring stations. The ML-derived-O3 provides ground-level O3 information for any location, particularly in areas where ground-based measurements are lacking. In addition, clear diurnal variations are observed in both ML-derived-O3 and ground-based observations. The estimation of hourly ground-level O3 is of great significance; however, previous studies have not provided such comprehensive, full-coverage and high temporal resolution ground-level O3 data. To further verify the performance of ML-derived-O3, a pollution episode is examined. The temporal variation of ML-derived-O3, as depicted in the time series plot of ground-level O3 in figure 6, closely aligns with ground-based observations. Taking the Hongkou station in Shanghai as an example, ground-level O3 was high on 3 October 2022 and decreased from 4 October. From the FY-4A image (figure 5), it can be observed that 3 October was a clear day while 4-6 October were cloudy in Shanghai. On 3 October, when the sky was clear, the ground-level O3 concentration increased as the sun rose, exhibiting a distinct daily variation. In contrast, during the period of thick cloud cover from 4 to 6 October, the ground-level O3 concentration remained relatively stable. This difference may be attributed to the effect of solar radiation and temperature on ozone production through photochemical reactions. The spatial distributions of ML-derived-O3 and ground-based observations both indicate higher ground-level O3 concentrations in cloud-free areas than in cloud-covered areas at the same time. Similar temporal patterns are also observed in other locations.
A heavy O3 pollution event is closely related to favorable meteorological conditions (e.g. air temperature and solar radiation intensity) and high surface emissions of its precursor gases. Generally, the emission of ozone precursors does not change significantly on consecutive days, indicating that the variation in ground-level O3 is primarily influenced by meteorological conditions. The hourly T2M, DSWRF and cloud fraction on 3 October and 6 October are illustrated in the supplementary materials. On a clear day, T2M and DSWRF increase markedly after sunrise, which leads to more O3 being produced via photochemical reactions. Ground-level O3 then decreases after sunset. On a cloudy day, T2M and DSWRF remain low all day, which is unfavorable for O3 production.

Seasonal and hourly validation results
The performance of ML-derived-O3, as demonstrated in the example cases, is highly satisfactory. To assess the accuracy of ML-derived-O3 on a broader scale, a comprehensive evaluation of the ground-level O3 estimation was conducted. Ground-level O3 exhibits pronounced seasonal and hourly variation due to its close relationship with precursors and meteorological variables (Seinfeld and Pandis 1998). The density scatter plots of the validation results for the four seasons are presented in figure 7.
The ML-derived-O3 demonstrates excellent performance across all seasons. The R² values exceed 0.96 for all seasons, and the RMSE values are below 7.5 µg m−3. The lowest R² is found in winter, while the highest RMSE is found in summer. The larger RMSE might be due to the high level of ground-level O3 during summer.
One of the notable advantages of this study is the provision of full-coverage and hourly ground-level O3 estimation. The performance of the hourly estimated ground-level ozone is depicted in figure 8. For nearly all hours, the R² values surpass 0.96, and the RMSE values remain below 7.5 µg m−3. The performance of ML-derived-O3 at 00:00 UTC is slightly poorer than at other hours, with an R² value of 0.95. Nevertheless, the validation results demonstrate that ML-derived-O3 is reliable for each hour. However, the slope of the regression line in the validation results is less than 1, indicating a slight underestimation of hourly ML-derived-O3 at high concentrations. This discrepancy could be attributed to the limited number of data samples available under highly polluted conditions.
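The evaluation metrics used throughout this section (R², RMSE, bias, and the slope and intercept of the ordinary least squares line y = a * x + b) can be computed as in the sketch below. This is an illustration only: R² is taken here as the squared Pearson correlation between observed and estimated values, which is an assumption because the exact definition used in the study is not stated, and the sample values are made up.

```python
import math


def evaluate(observed, estimated):
    """Return R2, RMSE, bias, and the OLS slope/intercept of estimated vs observed."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    sxx = sum((x - mean_obs) ** 2 for x in observed)
    syy = sum((y - mean_est) ** 2 for y in estimated)
    sxy = sum((x - mean_obs) * (y - mean_est)
              for x, y in zip(observed, estimated))
    slope = sxy / sxx                       # a in y = a * x + b
    intercept = mean_est - slope * mean_obs  # b in y = a * x + b
    rmse = math.sqrt(sum((y - x) ** 2
                         for x, y in zip(observed, estimated)) / n)
    return {
        "r2": sxy ** 2 / (sxx * syy),  # squared Pearson correlation (assumed)
        "rmse": rmse,
        "bias": mean_est - mean_obs,   # mean(estimated) - mean(observed)
        "slope": slope,
        "intercept": intercept,
    }


# Made-up concentrations (µg m-3) lying exactly on y = 0.9 * x + 6.
observed = [20.0, 60.0, 100.0, 140.0, 180.0]
estimated = [24.0, 60.0, 96.0, 132.0, 168.0]
metrics = evaluate(observed, estimated)
print(metrics)
```

In this toy case the points are perfectly correlated, yet the slope of 0.9 means that the highest observed value (180) is estimated as 168 while the lowest (20) is estimated as 24. This is the pattern a slope below 1 indicates: high concentrations are underestimated even when R² is high.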

Validation results for individual ground-based monitoring station
The performance of ML-derived-O3 in different areas was evaluated by examining the accuracy at each monitoring station, as illustrated in figure 9. Unlike the previous study using Himawari-8 (Wang et al 2022b), the spatial distribution of sample counts in this study is relatively even because both clear and cloudy days are included, which increases the sample counts in southern China. At the individual site scale, the majority of stations exhibit high accuracy and low uncertainty. Overall, 96% of stations have an R² value larger than 0.9, and 85.6% of stations have an R² value larger than 0.95. In terms of error, 64.4% of stations have an RMSE value less than 5 µg m−3, and 87.3% of stations have an RMSE value less than 10 µg m−3. Stations situated along coastlines and near large water bodies tend to perform worse, potentially due to the influence of mixed pixels, in which land and water are blended within a single pixel. Treating these mixed pixels as a single homogeneous target introduces significant errors, as the observation parameters differ greatly between land and water.
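Station-level statistics of the kind reported above (the share of stations whose RMSE falls below a threshold) can be derived from matched observation and estimate pairs as sketched here. The station identifiers and values are invented for illustration, and the helper names `station_rmse` and `share_below` are hypothetical.

```python
import math
from collections import defaultdict


def station_rmse(records):
    """records: iterable of (station_id, observed, estimated) tuples."""
    squared_error = defaultdict(float)
    counts = defaultdict(int)
    for station_id, observed, estimated in records:
        squared_error[station_id] += (estimated - observed) ** 2
        counts[station_id] += 1
    return {sid: math.sqrt(squared_error[sid] / counts[sid])
            for sid in squared_error}


def share_below(values, threshold):
    """Percentage of values strictly below the threshold."""
    values = list(values)
    return 100.0 * sum(v < threshold for v in values) / len(values)


# Invented matched pairs (µg m-3) for four stations.
records = [
    ("A", 50.0, 53.0), ("A", 80.0, 76.0),
    ("B", 40.0, 52.0), ("B", 90.0, 78.0),
    ("C", 60.0, 61.0), ("C", 70.0, 69.0),
    ("D", 100.0, 108.0), ("D", 30.0, 24.0),
]
rmse_by_station = station_rmse(records)
share_5 = share_below(rmse_by_station.values(), 5.0)
share_10 = share_below(rmse_by_station.values(), 10.0)
print(rmse_by_station, share_5, share_10)
```

The same grouping applies to per-station R²; only the per-group statistic changes.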

Spatio-temporal ground-level O3 variation
This study produced full-coverage hourly ground-level ozone for China in 2022 at 0.04° spatial resolution, overcoming the uneven distribution of surface stations and the problem of missing data caused by clouds. Unlike previous studies, this study produced hourly rather than daily ground-level ozone estimates for the entire country.
Figure 10 displays the seasonal average of ML-derived-O3 in 2022 across China. Ground-level ozone concentrations exhibit significant seasonal variation, with the highest concentrations in summer, followed by spring. This phenomenon is most pronounced in the North China Plain and the Huang-Huai Area, likely due to favorable meteorological conditions and high surface emissions of precursor gases (Wang et al 2017). In winter, ground-level ozone concentrations are the lowest for much of China. It is noteworthy that southern China exhibits higher ground-level ozone concentrations in autumn than in summer, which differs from other regions in China. One possible reason for this difference is the frequent cloud cover over southern China during the summer. Previous studies have reported similar conclusions regarding the seasonal changes of ground-level ozone (Wei et al 2022). Figure 11 illustrates the annual mean hourly ground-level ozone concentrations from 00:00 UTC to 11:00 UTC. A significant diurnal variation in the spatial pattern of ground-level ozone is observed. Before 03:00 UTC, the ground-level ozone concentrations increase slowly and are relatively consistent across China. Between 04:00 UTC and 09:00 UTC, a clear upward trend in ground-level ozone concentrations is observed for the entire country, especially in the North China Plain, Yangtze Delta, Pearl River Delta and Northwestern China. After 10:00 UTC, the ground-level ozone concentrations decrease markedly. Wang et al (2022b) also reported similar diurnal variations of ground-level ozone. This trend is very likely attributable to favorable light and temperature conditions. The spatial distribution reveals that the high-value zones are predominantly located in economically developed and densely populated areas that emit large amounts of precursor gases. In addition, high-value zones are also found in arid and semi-arid regions in western China, where solar radiation is abundant for photochemical reactions.
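Seasonal and hourly averages such as those shown in figures 10 and 11 amount to grouping the hourly estimates by season or by hour of day. The sketch below assumes meteorological seasons (December to February as winter, and so on), which is an assumption since the season definition is not stated in the text, and it uses made-up values for a single location. It also converts UTC to China Standard Time (UTC+8) to help interpret the hours quoted above.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # China Standard Time is UTC+8

# Assumed meteorological seasons; not specified in the text.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def group_mean(records, key):
    """Average values grouped by key(timestamp); records are (timestamp, value)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in records:
        k = key(timestamp)
        sums[k] += value
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Made-up hourly estimates (µg m-3) at one location, timestamps in UTC.
records = [
    (datetime(2022, 1, 15, 2, tzinfo=timezone.utc), 40.0),
    (datetime(2022, 1, 15, 8, tzinfo=timezone.utc), 60.0),
    (datetime(2022, 7, 15, 2, tzinfo=timezone.utc), 90.0),
    (datetime(2022, 7, 15, 8, tzinfo=timezone.utc), 150.0),
]
seasonal_mean = group_mean(records, key=lambda t: SEASON_OF_MONTH[t.month])
hourly_mean = group_mean(records, key=lambda t: t.hour)

# 02:00 UTC and 08:00 UTC expressed in China Standard Time.
local_hours = [datetime(2022, 10, 3, h, tzinfo=timezone.utc).astimezone(CST).hour
               for h in (2, 8)]
print(seasonal_mean, hourly_mean, local_hours)
```

With this conversion, 02:00 UTC and 08:00 UTC correspond to 10:00 and 16:00 China Standard Time, so the 04:00 to 09:00 UTC rise described above falls between local noon and late afternoon.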

Discussion and conclusion
To address the limitations of sparse and uneven ground-level O3 observations, an ML method was applied to develop hourly full-coverage ground-level ozone estimates for China. Various input variables, including meteorological data (e.g. DSWRF, T2M, RH, TP, PBLH, etc) from NCEP GFS, satellite data from FY-4A and OMI, the emission inventory from MEIC, and other auxiliary data (e.g. POP, DEM), were used in conjunction with ground-based observations. The accuracy of the ML-derived-O3 estimates was evaluated using the 10-fold CV approach. The overall validation results demonstrated excellent performance, with CV-R² values of 0.96 for sample-based validation, 0.91 for site-based validation, and 0.78 for date-based validation. The corresponding RMSE values were 9.22 µg m−3 (sample-based), 13.65 µg m−3 (site-based) and 21.88 µg m−3 (date-based). The uneven distribution of ground observation stations and the pronounced daily variation of ground-level ozone may have affected the balance between the training and validation datasets, resulting in lower performance for site-based and date-based validation than for sample-based validation. When the estimation model was applied to the whole-year data for 2022, the R² values for all seasons and for the hourly estimates were larger than 0.95, and the RMSE values were less than 7.5 µg m−3. The results indicate that the ML-derived-O3 estimates are more accurate than those of previous studies, with higher R² values and lower RMSE values.
Significantly, this study is the first to provide hourly full-coverage ground-level O3 estimates over China, enabling the analysis of daily patterns and diurnal variations of ground-level ozone. The observed significant diurnal variations in the spatial pattern of ground-level ozone align with surface observations. When the sky is clear, the ground-level O3 concentration increases as the sun rises, resulting in pronounced daily variations. When the sky is covered by thick clouds, the ground-level O3 concentration remains relatively stable. The ground-level ozone concentrations exhibit a clear upward trend with increasing solar energy and temperature throughout the day. In future work, the proposed method will be used to generate long-term hourly full-coverage ML-derived-O3, aiding the understanding of long-term variations in ground-level ozone concentrations across China.

Figure 1. Locations of in-situ ground-level ozone monitoring stations across mainland China. The fill color represents elevation.

Figure 2. Flow chart of the retrieval method.

Figure 3. Sorted F score of each feature in the ground-level O3 estimation during model construction.

The proposed model exhibits a high level of accuracy in estimating ground-level ozone. The CV-R² values are 0.96 for sample-based validation, 0.91 for site-based validation and 0.78 for date-based validation. The RMSE values are small: 9.22 µg m−3 for sample-based validation, 13.65 µg m−3 for site-based validation, and 21.88 µg m−3 for date-based validation. As an estimation model was established for each month, the evaluation results for the 12 months of 2022 are listed in table 3.

Figure 4. Scatter plots of the model fitting and cross-validation results in 2022: (a) sample-based; (b) site-based; (c) date-based. The dot color represents the count of data points. The black dashed line is the 1:1 line. N is the number of samples; R² is the coefficient of determination; RMSE is the root-mean-square error; y = a * x + b is the linear regression formula.

Figure 7. Density scatter plots of the ML-derived-O3 validation results for the four seasons in 2022.

Figure 8. Density scatter plots of the ML-derived-O3 validation results for each hour in 2022.

… O3 to train the model. The resulting ML-derived-O3 provided hourly ground-level ozone estimates with full coverage at a spatial resolution of 4 km for China in 2022, including both clear and cloudy conditions. The spatial distributions of ML-derived-O3 and ground-based observations indicate higher ground-level O3 concentrations in cloud-free areas than in cloud-covered areas at the same time.

Table 1. Description of the datasets used in this study.

… anthropogenic emissions, was also considered. The Gridded Population of the World (GPW) dataset for 2020, developed by the Center for International Earth Science Information Network (CIESIN) in collaboration with NASA SEDAC, was used.

…actions or relationship models (Zhan et al 2018, Ma et al 2021, Wei et al 2021a). Ensemble learning, an important method in the field of ML, utilizes weak classifiers that exhibit greater effectiveness and stronger generalization performance compared to …

Table 2. Specifications for AGRI on board FY-4A.

… tree growth strategy with depth restriction, etc. Many researchers use this model to predict and analyse atmospheric pollutants, including PM2.5 (Zhong et al 2021, Wei et al 2021b, Ma et al 2022), NO2 (Kang et al 2021), O3 (Kang et al 2021), and SO2 (Xu et al 2023). In comparison to eXtreme Gradient Boosting (XGBoost), another gradient boosting decision tree (GBDT) implementation, LightGBM demonstrates superior computational speed and lower memory consumption. Because of these advantages, LightGBM was chosen in this study (Ke et al 2017).

Retrieval method
Ground-level O3 is a secondary air pollutant produced through complex photochemical reactions influenced by emissions, meteorological factors, and land characteristics (Bloomer et al 2009, Lee et al 2014, Wang et al 2017, Li et al 2020).
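To make the gradient boosting decision tree (GBDT) idea behind LightGBM concrete, the following is a deliberately simplified sketch in plain Python: depth-1 trees (stumps) are fitted one after another to the residuals of a squared-error loss. It is not LightGBM and not the study's model. LightGBM adds, among other things, histogram-based split finding, leaf-wise tree growth with a depth limit, and gradient-based sampling (Ke et al 2017). All function names, hyperparameters and data values below are made up for illustration.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_gbdt(X, y, n_trees=200, learning_rate=0.1):
    """Boost stumps on the residuals (negative gradients of squared loss)."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * predict_stump(stump, row)
                       for pi, row in zip(predictions, X)]
    return base, learning_rate, stumps


def predict_gbdt(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Made-up predictors [temperature in deg C, relative humidity in %]
# and made-up O3 concentrations (µg m-3).
X = [[10, 80], [15, 70], [20, 60], [25, 50], [30, 40], [35, 30]]
y = [30.0, 45.0, 60.0, 90.0, 120.0, 150.0]

model = fit_gbdt(X, y)
mean_y = sum(y) / len(y)
mse_baseline = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_model = sum((yi - predict_gbdt(model, row)) ** 2
                for row, yi in zip(X, y)) / len(y)
print(mse_baseline, mse_model)
```

Each round corrects part of the error left by the previous rounds, so the training error of the boosted model falls well below that of the constant baseline. Training error alone says nothing about generalization, which is why held-out evaluation such as cross-validation is needed.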

Table 3. Statistics of cross validation per month.

Table 4. Comparison of models from previous studies in estimating O3 concentrations in China.
Note: The unit for the RMSE and mean absolute error (MAE) is µg m−3.