Estimations of ambient fine particle and ozone level at a suburban site of Beijing in winter

Estimates of PM2.5 and O3 in suburban areas are important for assessing exposure risk and for epidemiological studies of air pollution where large-scale, long-term measurement networks are absent. To this end, our study develops a flexible approach to predict levels of PM2.5 and O3 at a suburban site of Beijing using multilayer perceptron (MLP) neural network analysis with gaseous air pollutants (CO, SO2, NO, and NO2) and meteorological parameters (wind direction, wind speed, temperature, pressure and humidity) as inputs. Daily ambient levels of PM2.5, O3, PM10, CO, SO2, NO, and NO2 were calculated from hourly data collected from January 20 to March 10 in each year from 2016 to 2020 at a suburban site of Beijing. Measured ambient levels of PM2.5 and O3 were compared with the PM2.5 and O3 estimates produced by the MLP neural network analysis with limited input variables. Overall, the MLP neural network analysis explained 97% of the measured PM2.5 mass and 82% of the measured O3 level, with R2 values of 0.983 and 0.905, respectively. This approach could be helpful for reconstructing historical PM2.5 and O3 levels in suburban areas.


Introduction
Air pollution is recognized by the World Health Organization (WHO) as the largest environmental risk to human health for populations in areas where levels of air pollutants exceed the limits (D'Antoni et al 2019). Numerous studies have documented the negative health impacts of air pollution by showing associations between exposure to air pollution and the incidence of cardiovascular disease, autism spectrum disorder in children and leukemia, as well as lung cancer mortality (Lin et al). Ambient fine particulate matter (<2.5 microns in aerodynamic diameter; PM2.5) is mainly composed of inorganic and organic components emitted from emission sources (Liu et al 2014b, Liu et al 2016, Liu et al 2019). Meanwhile, gaseous pollutants including NO, NO2, SO2 and CO are co-emitted from the sources of PM2.5 and react with O3 to form secondary chemicals (Luo et al 2015, Tang et al 2020). Meteorological parameters such as wind and temperature have been found to be the main factors in the spread of PM2.5 pollution (Cheng et al 2016, Sun et al 2020). Because of the negative impacts of ground-level O3 and PM2.5, large-scale and long-term measurements of O3 and PM2.5 have been established in countries around the world.
National Ambient Air Quality Standards (NAQQS) for limits of O3 and PM2.5 were issued in 2013. Several advanced methods have recently been developed to predict levels of PM2.5 and O3. For example, Eeftens et al (2012) used a land-use model with inputs including traffic intensity, population, and land use to estimate PM2.5 mass in 20 European study areas. The model explained 35%-94% of the PM2.5 mass collected at 20 sites in European areas. Wang et al (2016) combined land-use regression (LUR) with chemical transport modeling to calculate the spatiotemporal variability of O3 and PM2.5 levels from 2000 to 2008 in the Los Angeles Basin. Huang et al (2021) established a hybrid spatiotemporal model that incorporates satellite, chemical transport model, geographic, and meteorological data to reconstruct daily PM2.5 concentrations in China from 2013 to 2019 at 1 km×1 km grid cells. Xiao et al (2018) employed a set of machine learning approaches, including random forest, generalized additive model, and extreme gradient boosting, with long-term satellite aerosol data as inputs to estimate daily PM2.5 in different areas of China.
Beijing, the capital of China, has experienced serious air pollution with high levels of O3 and PM2.5 in winter in recent years (Liu et al 2014a, Liu et al 2016, Miao et al 2018). Since the NAQQS was launched in 2013, the Beijing Municipal Environmental Monitoring Center has established 34 monitoring sites across Beijing to report hourly and daily air quality data including O3, PM2.5, PM10, NO2, SO2, and CO. These sites are mostly located in urban areas of Beijing (figure S1(A) (available online at stacks.iop.org/ERC/3/081008/mmedia)) and are unevenly distributed geographically across the city. Thus, it is necessary to develop supplementary methods to characterize spatiotemporal levels of O3 and PM2.5 across Beijing, especially in suburban areas. As discussed above, machine learning approaches can generate high-precision estimates of O3 and PM2.5 given appropriate input variables.
To fill this gap, our study aims to estimate levels of O3 and PM2.5 in suburban areas of Beijing using multilayer perceptron (MLP) neural network analysis with ambient gaseous pollutant data as inputs. The study collected daily air quality and meteorological parameters including PM2.5, O3, PM10, CO, SO2, NO, NO2, wind direction, wind speed, temperature, pressure and humidity from January 20 to March 10 in each year from 2016 to 2020 at a suburban site of Beijing. In total, ∼3000 field data were included in the MLP neural network analysis to generate high-precision estimates of O3 and PM2.5 in suburban areas of Beijing. The estimation approach from our study can help local policy-makers reconstruct unbiased historical O3 and PM2.5 levels for exposure assessment and epidemiological studies of air pollution.

Sampling site
The sampling site is situated at the Milu Park Ecological Research Center (figure S1(A), 39.78N, 116.47E), in a suburban area near the south 5th Ring Road of Beijing. Automated instruments were set up on the roof of a building in the Park, 15 m above ground. The sampling location is surrounded by artificial wetlands with plenty of vegetation and trees. There are no industrial facilities near the site, and the only road nearby is a two-way road about 900 meters to the northwest. There are 34 air quality monitoring sites affiliated with the Beijing Municipal Environmental Monitoring Center (http://bjmemc.com.cn/) in Beijing (figure S1(A)). Two of these monitoring sites, Jiugong, Daxing District (39.90N, 116.40E) and Yizhuang Development Zone (39.80N, 116.51E), are close to our sampling site (39.78N, 116.47E). The two sites are located in the downtown areas of Daxing District, ∼6 km away from our sampling site (figure S1(B)). In this study, we compare the ambient air quality data collected at these two sites (Jiugong, Daxing District and Yizhuang Development Zone) affiliated with the Beijing Municipal Environmental Monitoring Center with those obtained at our sampling site (Milu Park Ecological Research Center).

Data collection
At the sampling site, hourly concentration data of gaseous pollutants including NO, NO2, SO2, CO and O3 were collected with a Thermo Scientific NO-NO2-NOx Analyzer (Model 42i), SO2 Analyzer (Model 43i), CO Analyzer (Model 48i) and O3 Analyzer (Model 49i), respectively (Liang et al 2019, Bekbulat et al 2021). The flow rates for the NO-NO2-NOx Analyzer, SO2 Analyzer, CO Analyzer and O3 Analyzer were kept at 25 ml min−1, with detection limits ranging from 10 to 50 ng m−3. Mass concentrations of PM10 and PM2.5 were measured by a Thermo Scientific Tapered Element Oscillating Microbalance (TEOM) 1405 Ambient Particulate Monitor with a Filter Dynamics Measurement System (FDMS). Ambient particles were collected at a flow rate of 16.7 l min−1 on a glass-fiber filter tape with PM10 or PM2.5 cyclone inlets. These data were recorded automatically by the instrument, and the flow rate was calibrated monthly under normal conditions by the technicians. Furthermore, hourly meteorological data including wind direction, wind speed, temperature, pressure and humidity were collected using a WXT520 meteorological measuring instrument (Vaisala).
Hourly data including NO, NO2, SO2, CO, O3, PM10, PM2.5, wind direction, wind speed, temperature, pressure and humidity during the period from January 20 to March 10, 2020 (lockdown period, n=1224) were selected and compared with those obtained during the same period in the previous four years (January 20 to March 10, 2016, n=1211; January 20 to March 10, 2017, n=1200; January 20 to March 10, 2018, n=863; January 20 to March 10, 2019, n=1200) to assess the trends in pollutants over these years. Data for some sampling periods (i.e., ∼7% of the total samples) were absent because of rainfall.
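The aggregation of hourly readings into the daily values used in this study can be sketched as follows. This is a minimal illustration only: the function name `daily_means`, the use of `None` to mark a missing hour, and the sample values are assumptions made for the example, and the text does not state how days with missing hours were actually handled.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def daily_means(hourly_records):
    """Average hourly values into daily means, skipping missing hours.

    hourly_records: iterable of (timestamp, value) pairs, where value is
    None for an hour without a valid reading (a hypothetical convention).
    Returns {date: (mean_value, number_of_valid_hours)}.
    """
    by_day = defaultdict(list)
    for timestamp, value in hourly_records:
        if value is not None:
            by_day[timestamp.date()].append(value)
    return {day: (mean(vals), len(vals)) for day, vals in sorted(by_day.items())}

# Illustrative values only, not measurements from the study.
records = [
    (datetime(2020, 1, 20, 0), 30.0),
    (datetime(2020, 1, 20, 1), None),  # missing hour, e.g. during rainfall
    (datetime(2020, 1, 20, 2), 50.0),
    (datetime(2020, 1, 21, 0), 12.0),
]
result = daily_means(records)
```

Returning the number of valid hours alongside each mean makes it possible to discard days with too few readings, should such a completeness criterion be needed.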
The ambient hourly data (NO2, SO2, CO, O3, PM10 and PM2.5) at the two monitoring sites, Jiugong, Daxing District (39.90N, 116.40E) and Yizhuang Development Zone (39.80N, 116.51E), were collected from the website of the Beijing Municipal Environmental Monitoring Center (http://bjmemc.com.cn/) for January 20 to March 10 in each year from 2016 to 2020. Data for some sampling periods (i.e., ∼5% of the total samples) were absent because of rainfall. Meteorological data at the two sites, including wind direction, wind speed, temperature, pressure and humidity, were not available through the website of the Beijing Municipal Environmental Monitoring Center (http://bjmemc.com.cn/).

Multilayer perceptron (MLP) neural network analysis
MLP analysis is a feed-forward machine learning method designed to predict one or more output variables from input variables (Desai and Shah 2021). The MLP model consists of three main components: (1) the input layer, (2) the hidden layer and (3) the output layer (figure S2). In general, the MLP model passes input data through interconnected layers of artificial neurons to produce a set of output data. The neural network is then trained through a back-propagation process (Simões Hoffmann et al 2020). The hidden layer of the MLP model has demonstrated its predictive applicability for ambient air quality outputs when the inputs have non-linear impacts (Desai and Shah 2021).
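The three-layer structure described above can be illustrated with a minimal forward pass. This is a sketch under stated assumptions rather than the configuration used in the study: the function name `mlp_forward`, the single hidden layer with a TanH activation, the linear (identity) output unit and the toy weights are all chosen for illustration only.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> TanH hidden layer -> linear output.

    x        : list of input values
    w_hidden : one list of input weights per hidden neuron
    b_hidden : one bias (activation constant) per hidden neuron
    w_out    : one output weight per hidden neuron
    b_out    : bias of the single output unit (assumed linear)
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Toy network with two inputs and two hidden neurons (hypothetical weights).
prediction = mlp_forward(
    x=[0.5, -1.0],
    w_hidden=[[1.0, -1.0], [0.5, 0.5]],
    b_hidden=[0.0, 0.1],
    w_out=[2.0, 3.0],
    b_out=0.25,
)
```

Training by back-propagation would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` to reduce the prediction error; only the forward direction is shown here.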
As a first step, the data for the input and output variables are standardized to eliminate potential bias resulting from differences in variance range and dimension within the dataset (Arias del Campo et al 2021). Each variable is standardized using the z-score method, which subtracts the mean observed value and then divides by the standard deviation (Li et al 2020, Arias del Campo et al 2021). Then, nine independent variables (NO, NO2, SO2, CO, wind direction, wind speed, temperature, pressure and humidity) are fed from the input layer to the hidden layer to predict PM2.5 as the dependent output variable. Seven independent variables (NO, NO2, wind direction, wind speed, temperature, pressure and humidity) are fed from the input layer to the hidden layer to predict O3 as the dependent output variable. The response of hidden neuron j can be written as $h_j = H\left(\sum_i W_{i,j}\,x_i + W_{o,j}\right)$, where $x_i$ is the i-th standardized input, $W_{i,j}$ is the weight of the connection between input i and hidden neuron j, and $W_{o,j}$ is an activation constant for neuron j. The activation function H is often non-linear. The hidden layer transfers its response to the output layer through the activation function H. The sigmoid and hyperbolic tangent (TanH) functions were tested in this study because they are appropriate for continuous dependent variables (Borlaza et al 2021). The scaled conjugate gradient and stochastic gradient descent optimization algorithms were applied to obtain the optimal weights for both the input and output layers (Arias del Campo et al 2021).
The dataset was divided into a training set (70% of the dataset) and a testing set (30% of the dataset). For each run, the training set is used to train the MLP, while the testing set is used to independently monitor errors during the training step. During training, the MLP is continually developed and verified until the weights between the nodes predict the outcome accurately (i.e., with the minimum possible error). Stopping rules are used to terminate the training of the MLP, and so prevent the model from overfitting, if any of the following occurs: (1) there is no decrease in prediction error, (2) the maximum training time (30 min) is reached, (3) the minimum relative change (0.0001) in the training error is found or (4) the minimum relative change in the training error ratio (0.001) is observed. A maximum of 1000 data passes (epochs) is kept in memory before the training step is completed. Further, the results are verified in the testing step to check the performance of the MLP model by comparing its forecasts with the data points in the testing set. The MLP neural network analysis was carried out with IBM SPSS Statistics for Windows, version 20 (IBM Corp., Armonk, NY, USA).
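The preprocessing and stopping logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of the SPSS procedure: the function names, the use of the population standard deviation, the fixed random seed and the simplified stopping test (covering only rules (1) and (3) and the epoch cap) are choices made for this example.

```python
import random
from statistics import mean, pstdev

def zscore(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD is an assumption; sample SD is equally plausible
    return [(v - mu) / sigma for v in values]

def split_70_30(rows, seed=0):
    """Shuffle rows reproducibly and split them into 70% training and 30% testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def should_stop(prev_error, curr_error, epoch, min_rel_change=1e-4, max_epochs=1000):
    """Return True when training should end.

    Stops if the epoch cap is reached, if the error did not decrease,
    or if the relative decrease in error is below min_rel_change.
    """
    if epoch >= max_epochs:
        return True
    if curr_error >= prev_error:
        return True
    return (prev_error - curr_error) / prev_error < min_rel_change

# Illustrative use with made-up numbers.
standardized = zscore([1.0, 2.0, 3.0])
train, test = split_70_30(range(10))
```

In a real training loop, `should_stop` would be evaluated once per epoch on the error of the monitored set, alongside a wall-clock check for the maximum training time.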
Daily SO2 levels in all samples at the three sampling sites were lower than the 24-hour limit value for SO2 (150 μg m−3) (figure 1(D)). At Milu Park, the mean level of SO2 during the sampling period in 2020 was 3.3 μg m−3 (n=51), lower than that observed in 2016 (11.9 μg m−3, n=51), 2017 (26.0 μg m−3, n=50), 2018 (9.8 μg m−3, n=40) and 2019 (6.9 μg m−3, n=50). Mean levels of SO2 at the other two sampling sites (Jiugong and Yizhuang Development Zone) were in the range of 4-30 μg m−3, comparable with those observed at Milu Park.
A higher mean level of NOx (i.e., NO+NO2) was observed during the sampling period in 2020 (table 1). Although NO data were absent at the other two sampling sites (Jiugong and Yizhuang Development Zone), the mean levels of NO2 there were similar to those observed at Milu Park, ranging from 30 to 60 μg m−3.
At Milu Park, the mean daily levels of CO were 1.3, 1.6, 3.5, 6.5 and 0.9 mg m−3 during the sampling periods in 2016, 2017, 2018, 2019 and 2020, respectively (figure 1(F)). Extremely high daily CO concentrations were observed during the sampling period in 2019, when all samples (n=50) exceeded the 24 h limit value for CO (4 mg m−3). The mean level of CO then decreased to 0.9 mg m−3 during the sampling period in 2020. In contrast, the mean levels of CO were in the range of 0.8-1.3 mg m−3 at the other two sampling sites (table 1), where no increases in mean daily CO levels were found across the five sampling periods.
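The comparison of daily means against the 24-hour limit values quoted above can be expressed as a short check. The sketch below is illustrative only: the dictionary `DAILY_LIMITS`, the function `exceedance_fraction` and the sample values are hypothetical names and numbers, while the two limit values are those given in the text.

```python
# 24-hour limit values quoted in the text (SO2 in ug/m3, CO in mg/m3).
DAILY_LIMITS = {"SO2": 150.0, "CO": 4.0}

def exceedance_fraction(daily_values, pollutant):
    """Return the fraction of daily means that exceed the 24-hour limit."""
    limit = DAILY_LIMITS[pollutant]
    exceeding = sum(1 for value in daily_values if value > limit)
    return exceeding / len(daily_values)

# Made-up daily CO means in mg/m3, not measurements from the study.
co_fraction = exceedance_fraction([3.0, 4.5, 6.5, 0.9], "CO")
```

A fraction of 1.0 would correspond to the situation reported for CO in 2019, when every daily sample exceeded the limit.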

Meteorological parameters
Daily meteorological parameters including temperature, pressure, humidity, wind speed and wind direction are summarized in table S2. Average daily pressure was similar across the five sampling periods, ranging from 1015 hPa to 1026 hPa. Mean daily temperatures varied from −0.8°C to 2.6°C. The mean humidity, based on both hourly and daily data, during the sampling periods in 2016-2019 was in the range of 28%-32%, lower than in 2020 (∼59%). The mean wind speed during the sampling period in 2020 was ∼2.5 m s−1, greater than in the other four years (1.4-1.7 m s−1) (figure S4). The higher wind speed in 2020 could be attributed to a dominant westerly wind, whereas the lower wind speeds in the other four years (2016-2019) were associated with northerly or northwesterly winds (table S2).

Performance of MLP neural network analysis
Given the interactions of PM2.5 mass with NO, NO2, SO2 and CO, as well as with wind direction, wind speed, temperature, pressure and humidity (Tang et al 2020), the MLP neural network analysis performed well in predicting PM2.5 mass for the dataset obtained at Milu Park (figure 2). In modeling PM2.5 mass, the activation function in the hidden layer was TanH, and the area under the receiver operating characteristic (ROC) curve (0.9) was higher than the goodness-of-fit threshold for the model (0.7). Although the MLP method did not fully capture some peaks of the measured PM2.5 (figure 2), the relationship between PM2.5 modeled by the MLP neural network analysis (with inputs of NO, NO2, SO2, CO, wind direction, wind speed, temperature, pressure and humidity) and measured PM2.5 was estimated to be y=0.97x (R2=0.983, p<0.001) (figure 3(A)). With nonlinear interactions observed between O3 and NO and NO2, as well as wind direction, wind speed, temperature, pressure and humidity, the relationship between modeled O3 (with inputs of NO, NO2, wind direction, wind speed, temperature, pressure and humidity) and measured O3 was estimated to be y=0.82x (R2=0.905, p<0.001) (figure 3(B)). The activation function in the hidden layer for modeling O3 concentration was also TanH, and the area under the ROC curve (0.8) was greater than the goodness-of-fit threshold for the model (0.7).
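Relationships of the form y=0.97x correspond to a least-squares line forced through the origin. The sketch below shows one way such a slope and an R2 value could be computed. It is an illustration under assumptions: the function names are hypothetical, the sample values are made up, and R2 is taken here as the squared Pearson correlation, which may differ from the definition used by the statistical software in the study.

```python
def fit_through_origin(measured, modeled):
    """Least-squares slope for modeled = slope * measured (no intercept)."""
    sxy = sum(x * y for x, y in zip(measured, modeled))
    sxx = sum(x * x for x in measured)
    return sxy / sxx

def r_squared(measured, modeled):
    """Squared Pearson correlation between measured and modeled values."""
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(modeled) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, modeled))
    var_x = sum((x - mean_x) ** 2 for x in measured)
    var_y = sum((y - mean_y) ** 2 for y in modeled)
    return cov * cov / (var_x * var_y)

# Made-up values in which the model recovers exactly 97% of each measurement.
measured = [10.0, 20.0, 40.0]
modeled = [0.97 * value for value in measured]
slope = fit_through_origin(measured, modeled)
r2 = r_squared(measured, modeled)
```

A slope below 1 indicates that the model underestimates the measured values on average, which is how the 97% and 82% figures in the text can be read.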

Discussion
In response to the outbreak of the coronavirus disease 2019 (COVID-19) pandemic, the Beijing government implemented restrictions on socio-economic activities, including stay-at-home orders and limits on non-essential travel, starting in January 2020 (Hua et al 2021, Shi et al 2021). Consequently, emissions associated with business activity and ... Increases in mean levels of O3 at the three sampling sites were observed in 2020 relative to the same period in 2016-2019, which is consistent with several prior findings that enhanced O3 levels occurred in Beijing and the surrounding areas during COVID-19 lockdown periods. Meanwhile, the mean levels of NO and NO2 were observed to increase during the COVID-19 lockdown period. The enhancement of O3 levels observed at the three sampling sites did not result from reduced NO2 levels, which differs from the previous findings of Li and colleagues obtained with the Goddard Earth Observing System Chemical Transport Model. The enhanced levels of O3 and NOx observed at our sampling sites may be ascribed to regional transport.
Multiple-layer neural network methods have proven superior for estimating PM2.5 and O3, especially with limited input variables (Hung et al 2020, Liao et al 2021). However, an inaccurate selection of input variables can result in prediction errors of 30%-60% (Lin et al 2017). To select input variables for modeling PM2.5 mass, we tested different combinations of variables, namely air pollutants (NO, NO2, SO2, and CO) with and without meteorological parameters. Including the meteorological parameters alongside the air pollutants improved the prediction precision of the PM2.5 estimates by 30% compared with using the air pollutants alone. In modeling the O3 level, by contrast, using all four air pollutants (NO, NO2, SO2, and CO) together with meteorological parameters reduced the prediction precision by 15% relative to using only NO and NO2 together with meteorological parameters. The optimized MLP neural network analysis achieved good agreement between predicted and measured levels of PM2.5 and O3, with R2 values of 0.983 and 0.905, explaining on average 97% of measured PM2.5 and 82% of measured O3, respectively. We also predicted PM2.5 and O3 at the other two sampling sites (Jiugong, Daxing District and Yizhuang Development Zone) using air quality data obtained from the Beijing Municipal Environmental Monitoring Center and meteorological parameters obtained from the Milu Park sampling site. The prediction precision for PM2.5 and O3 at these two sites (∼50% prediction error) was not acceptable. A possible cause is the lack of meteorological measurements made at these sites themselves. Our results suggest that MLP neural network analysis provides flexibility in predicting PM2.5 and O3 with limited input variables.
This method could be applied to reconstruct PM2.5 and O3 datasets at suburban sites of Beijing for past periods when PM2.5 and O3 measurement networks were absent but data for other air pollutants (NO, NO2, SO2, and CO) were available.

Conclusions
Hourly and daily levels of seven air pollutants (PM2.5, O3, PM10, CO, SO2, NO, and NO2) and meteorological parameters (wind direction, wind speed, temperature, pressure and humidity) were measured from January 20 to March 10 in each year from 2016 to 2020 at a suburban site in Beijing. The mean levels of three pollutants, PM10, CO and SO2, were lower from January 20 to March 10, 2020 (COVID-19 lockdown period) than in the previous four years (2016-2019) owing to lockdown restrictions on socioeconomic activities. Although the contributions of primary source emissions decreased in urban areas during the COVID-19 lockdown period, PM2.5 mass exhibited little variation across the five sampling periods. This result suggests that secondary formation processes dominate the fine particle mass at the sampling sites. Increases in levels of O3 and NOx were observed from January 20 to March 10, 2020, relative to the four other sampling periods, owing to the impacts of regional sources. The MLP neural network analysis yielded relatively high-precision estimates of PM2.5 and O3 with limited input variables, explaining 97% of measured PM2.5 mass and 82% of the measured O3 level, respectively.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).

Author contributions
W L and Q L conceived and designed the experiments; W L and Y Z performed the experiments; W L, Y Z and Q L analyzed the data; W L and Q L wrote the paper. All authors have read and agreed to the published version of the manuscript.