Letter The following article is Open access

Long-term satellite-based estimates of air quality and premature mortality in Equatorial Asia through deep neural networks


Published 8 October 2020 © 2020 The Author(s). Published by IOP Publishing Ltd
Citation N Bruni Zani et al 2020 Environ. Res. Lett. 15 104088 DOI 10.1088/1748-9326/abb733



Abstract

Atmospheric pollution by particulate matter (PM) is a major concern because of its deleterious effects on human health and climate. Over the past 50 years, Equatorial Asia has experienced significant land-use change and urbanization, which have contributed to more intense and frequent extreme PM concentrations associated with increased anthropogenic and wildfire emissions. Recent advances in remote sensing instrumentation and retrieval protocols have enabled effective monitoring of PM from space in near real time with almost global coverage. In this study, long-term satellite-based observations of key chemical and physical parameters, integrated with ground-based concentrations of PM with aerodynamic diameter <10 μm (PM10) measured at 52 stations, are used to develop a machine learning approach for continuous PM10 monitoring. Because PM atmospheric pollution, like most environmental processes, is highly non-linear and influenced by numerous variables, machine learning approaches are well suited to the task. Herein, deep neural networks are developed and tested over different temporal scales and used to map PM10 over Equatorial Asia during the period 2005–2015. The proposed model captures both PM10 seasonal variability and the occurrence of extreme episodes, which are found to impact air quality on the regional scale. The modeled annual mean fine PM (PM2.5) concentrations are used to estimate long-term premature mortality. This study indicates that the region is experiencing increasing mortality rates related to long-term exposure to PM2.5, with 150 000 (108 000–193 000) premature deaths in 2005 and 204 000 (145 000–260 000) in 2015. This is mostly due to worsening air quality and high population growth in urban areas, although the contribution of years of intense wildfires is also significant.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Atmospheric particles (i.e. aerosols) have been shown to be responsible for increased morbidity and premature mortality (Cohen et al 2017, Burnett et al 2018), particularly in developing countries, where extreme aerosol episodes are more frequent and intense than in high-income countries (WHO 2018). In recent decades, many countries worldwide have experienced rapid development, with fast economic growth, industrialization and urbanization (Muntean 2018, United Nations 2018), which have led to increased primary emissions and enhanced secondary formation of aerosols in the atmosphere.

To improve understanding of atmospheric pollution impacts and inform policymakers on effective mitigation strategies, there is a strong need to assess aerosols' properties at high spatio-temporal resolution. This includes information on the distribution of particulate matter (PM) with aerodynamic diameter smaller than 10 μm and 2.5 μm (PM10 and PM2.5, respectively). Large uncertainties in the estimates of PM environmental and societal impacts arise from the incomplete understanding of the key controls dictating the spatio-temporal variability of degraded air quality conditions and extreme aerosol events. Multiple factors, including climate variability, meteorological conditions and land-use change, potentially play different and changing roles in increased occurrence of extreme pollution episodes globally (Jacob and Winner 2009, Fiore et al 2012, Hong et al 2019, Turnock et al 2020).

Data from monitoring networks are frequently used to produce localized assessments, but their spatio-temporal coverage degrades dramatically in developing countries, where air quality is generally worse and most in need of monitoring. Advances in satellite-borne instrumentation and data retrieval protocols now allow identification of pollution sources and detailed monitoring of atmospheric properties, and thus of air quality conditions, with almost global and near real-time coverage. Multiple mathematical approaches have been proposed to infer ground-level PM concentrations from satellite retrievals of column-integrated aerosol optical depth (AOD). Simple statistical linear models (LMs) have demonstrated high potential for global mapping (Donkelaar et al 2010, Reid et al 2012), and more sophisticated proxies have also been successfully proposed to predict ultrafine particle concentrations (Kulmala et al 2011, Crippa et al 2013) and to account for aerosol dynamics (Sullivan et al 2016, Crippa et al 2017). However, LMs show limited skill in predicting the spatio-temporal distribution of PM10 and cannot capture the mechanisms involved in aerosol dynamics, chemistry and transport, which are characterized by strong non-linearity and interactions between variables (Seinfeld and Pandis 2016). Machine learning approaches offer a way to overcome these limitations, given their high predictive skill (Grgurić et al 2014, Li et al 2017, Chen et al 2018, Di et al 2019, Shtein et al 2019) and low computational expense compared to the widely used Earth System Models (Huntingford et al 2019, Reichstein et al 2019). Specifically, artificial neural networks have been shown to be one of the most effective and least demanding tools for predicting the spatio-temporal distribution of both gaseous pollutants and atmospheric PM (Feng et al 2019, Lautenschlager et al 2020), especially over areas with sparse monitoring sites (Alimissis et al 2018).

The present study develops a novel and general Deep Learning approach for spatially and temporally continuous air pollution mapping based on a suite of satellite retrievals. Whereas previous studies mainly rely on AOD, meteorological and land-use data to infer ground-level PM (Ma et al 2014, Wei et al 2019), this machine learning application aims to account for chemical processes connected to primary and secondary aerosol formation/evolution and for different emission sources by relying exclusively on satellite-retrieved data. Moreover, the presented application targets an entire decade (2005–2015), with the aim of capturing and quantifying spatio-temporal patterns and long-term trends in PM10 and PM2.5 concentrations and their epidemiological impacts. Specifically, this work aims to (i) investigate the predictive skill of a set of satellite-based proxies in reproducing ground-level PM10 through deep neural networks (DNNs), (ii) explore the predicted seasonal and inter-annual changes in air quality over the period 2005–2015 in response to variable atmospheric composition, land use and emissions, and (iii) analyze the chronic health impacts due to PM2.5 exposure. The focus is on Equatorial Asia, a tropical region particularly sensitive to changes in climate that in recent decades has experienced significant urbanization and land-use/land-cover change (Field et al 2009, Gaveau et al 2014). Recent studies have shown that haze episodes and extreme air pollution concentrations have become more frequent due to both increased local/urban emissions and transboundary pollution (Aouizerats et al 2015, Lee et al 2018, Hansen et al 2019, Alifa et al 2020). Equatorial Asia is also currently one of the most densely populated regions in the world, so the need to improve air quality and reduce harmful impacts on human health is particularly pressing.

2. Data

2.1. PM10 observations

Long-term observations of PM10 concentrations from a network comprising 52 ground-level monitoring stations across Peninsular Malaysia and Malaysian Borneo are analyzed (figure 1). The sites have been active during the period 1997–2015 and monitored PM10 concentration through beta attenuation or tapered element oscillating microbalance instruments, as part of the continuous air quality monitoring program of Malaysia. Measurements have been standardized using universal calibration approaches. In this work, daily mean PM10 values are used to investigate air pollution variability at multiple time scales, including monthly, seasonal, annual, and inter-annual.

Figure 1. Analyzed region and location of the 52 ground-based stations monitoring PM10 (blue triangles). The color shading indicates satellite-retrieved tropospheric column NO2 [mmol m−2] averaged during 2005–2015.

2.2. Satellite observations of aerosols, atmospheric trace gases and land use

Satellite retrievals of aerosol properties, trace gases and land use are used to develop a satellite-based proxy able to capture the variability of ground-level PM10. Key features of the analyzed satellite retrievals are summarized in table S1. Specifically, our proxy is based on:

  • Aerosol optical depth (AOD) data from MODIS (Moderate-resolution Imaging Spectroradiometer) Collection 6 deployed onboard the NASA Terra and Aqua satellites. Level 2 (L2) daily AOD (at the λ = 550 nm wavelength) at 1 km × 1 km resolution from the multi-angle implementation of atmospheric correction (MAIAC) algorithm (Lyapustin et al 2018) is used. MAIAC is chosen because it offers wider spatial coverage and higher retrieval accuracy than other algorithms applied in neighboring regions (Mhawish et al 2019). AOD is chosen as a proxy for suspended aerosols in the atmosphere, including fine solid PM. Over land, 95.5% of the AOD data used have the highest quality in the MAIAC product.

  • Column water vapor (CWV), retrieved as daily, 1 km × 1 km resolution data from MODIS on Terra and Aqua, and corrected through MAIAC. CWV is used as an indicator of suspended liquid particles/droplets, which affect AOD measurements by absorbing part of the radiation detected by MODIS.
  • Normalized difference vegetation index (NDVI), retrieved as monthly Level 3 (L3) 1 km × 1 km resolution quantity from MODIS onboard Aqua. NDVI denotes the vegetation surface coverage and is used as a proxy of natural emissions of PM precursors (e.g. volatile organic compounds (VOCs), mainly isoprene), as well as an important absorber of both atmospheric PM10 and its gaseous precursors (Nowak et al 2014, 2018).
  • Carbon monoxide (CO), tropospheric amount derived from the Measurements of Pollution in the Troposphere (MOPITT) sensor onboard Terra (Deeter et al 2017), as gridded L3 (Version 8) monthly averages at 1° × 1° latitude × longitude resolution. CO is taken into account as an indicator of primary PM emissions derived from both anthropogenic (e.g. traffic) and natural (e.g. wildfires) combustion processes.
  • Urban fraction (UF), from the Consensus Landcover dataset (Tuanmu and Jetz 2014), as a single satellite image with 30-arc-second spatial resolution (∼1 km at the equator). UF is included because it is expected to be positively associated with anthropogenic emissions of both primary PM10 and gaseous precursors of secondary aerosol.

Tropospheric amounts of trace gases and ultra-violet (UV) irradiance, measured by the ozone monitoring instrument (OMI) onboard NASA's Aura spacecraft, are also considered in this study because they are key to the formation of both inorganic and organic secondary atmospheric PM:

  • Nitrogen dioxide (NO2), daily tropospheric column, cloud-screened at 30%, with 0.25° × 0.25° latitude × longitude resolution from the OMNO2d (V3) L3 product (Duncan et al 2018).
  • Sulfur dioxide (SO2), daily L3 column amount within the planetary boundary layer (OMSO2e, V3) at 0.25° × 0.25° resolution (Krotkov et al 2008).
  • Formaldehyde (HCHO), daily L3 weighted mean global V3 (OMHCHOd) HCHO column amount, gridded at 0.1° × 0.1° resolution. HCHO is included, as in Sullivan et al (2016), as a proxy for the availability of secondary organic aerosol precursors (e.g. VOCs).
  • Ultra-violet irradiance (UV), daily gridded L2 retrieval (OMUVBG, V3) at 0.25° × 0.25° resolution, measured at λ = 310 nm. UV is considered as the main energy source of the photochemical reactions that lead to secondary aerosol formation.

2.3. Population data

Population data are retrieved from the Socioeconomic Data and Applications Center (SEDAC) census archived in the NASA Earth Observing System Data and Information System. Population counts, available at 1 km × 1 km resolution for the years 2005, 2010 and 2015, are upscaled to the reference grid by summing all the cells falling inside each 0.25° × 0.25° square unit. For each grid cell i, a linear regression across the 3 available years is used to estimate the population in the intermediate years, which accounts for the different demographic growth rates across the region.
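The two population steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `upscale_counts`, `fit_line` and `population_for_year` are hypothetical, the grid is a plain nested list, and the aggregation factor stands in for the number of 1 km cells per 0.25° cell edge.

```python
def upscale_counts(fine, factor):
    """Sum non-overlapping factor x factor blocks of a 2-D list of counts."""
    rows, cols = len(fine), len(fine[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block_sum = sum(
                fine[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            )
            coarse_row.append(block_sum)
        coarse.append(coarse_row)
    return coarse


def fit_line(years, values):
    """Ordinary least-squares slope and intercept for one grid cell."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def population_for_year(year, years, values):
    """Population of one cell in an intermediate year from the fitted line."""
    slope, intercept = fit_line(years, values)
    return slope * year + intercept


# Hypothetical cell with counts for the three census years.
print(population_for_year(2007, [2005, 2010, 2015], [100.0, 110.0, 120.0]))
```

Summing (rather than averaging) preserves total population when moving to the coarser grid, which is what a count variable requires.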

3. Methods

3.1. Data pre-processing

To test model skill in predicting ground-level PM10, satellite data are extracted by averaging the values of the pixels falling within a 20 km radius around each measuring station. The choice of radius derives from our sensitivity analysis, which identifies the minimum radius that maximizes the correlation between daily PM10 and AOD while retaining an appreciable number of non-missing data over the entire period 2005–2015. AOD averages are computed only if at least 5% of the pixels inside the 20 km radius are non-missing.
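A minimal sketch of this extraction step follows. It assumes pixels are given as (latitude, longitude, value) tuples with `None` marking missing data and uses the haversine great-circle distance; the function names and the data layout are illustrative assumptions, not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def station_average(pixels, station_lat, station_lon,
                    radius_km=20.0, min_valid_fraction=0.05):
    """Mean of valid pixels within radius_km of a station.

    Returns None when no pixel falls inside the radius or when fewer than
    min_valid_fraction of the pixels inside it are non-missing.
    """
    inside = [
        value for lat, lon, value in pixels
        if haversine_km(station_lat, station_lon, lat, lon) <= radius_km
    ]
    valid = [value for value in inside if value is not None]
    if not valid or len(valid) < min_valid_fraction * len(inside):
        return None
    return sum(valid) / len(valid)


# Hypothetical AOD pixels around a station at (3.0 N, 101.0 E).
pixels = [(3.0, 101.0, 0.4), (3.05, 101.0, 0.6), (3.0, 101.1, None), (4.0, 101.0, 9.9)]
print(station_average(pixels, 3.0, 101.0))
```

The last pixel lies roughly 111 km away and is ignored, so only the two valid nearby values contribute to the mean.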

An autocorrelation analysis is performed at each site to quantify the actual scales of PM10 temporal variability. The PM10 autocovariance function displays an exponential decay (as also shown by Alifa et al 2020), and the mean autocovariance among sites reaches the value of 1/e (∼0.37) at a lag of 7 d (figure S1 (https://stacks.iop.org/ERL/15/104088/mmedia)). We therefore smooth daily PM10 concentrations with a 7-d moving average, which does not discard significant temporal variability. This moving average is applied only when at least 3 of the 7 values are non-missing. An analogous moving average is applied to daily values of AOD, CWV, NO2, SO2, HCHO and UV. Monthly satellite retrievals of CO and NDVI are replicated on a daily basis for each month, and a 7-d moving average is applied to smooth the transition between consecutive months.
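The gap-tolerant smoothing can be sketched as below. The text does not say whether the window is centered or trailing, so a centered window that is simply truncated at the ends of the series is assumed here; `moving_average` is a hypothetical helper and `None` marks a missing day.

```python
def moving_average(series, window=7, min_valid=3):
    """Centered moving average that tolerates missing values (None).

    A smoothed value is returned only when at least min_valid entries in
    the window are non-missing; otherwise the output stays None.
    """
    half = window // 2
    smoothed = []
    for t in range(len(series)):
        start = max(0, t - half)
        stop = min(len(series), t + half + 1)
        valid = [v for v in series[start:stop] if v is not None]
        if len(valid) >= min_valid:
            smoothed.append(sum(valid) / len(valid))
        else:
            smoothed.append(None)
    return smoothed


# Hypothetical daily PM10 values with gaps.
daily = [10.0, None, 20.0, None, 30.0, None, None]
print(moving_average(daily)[3])
```

In this example the window around the fourth day holds exactly three valid values, so the average is computed; with one fewer it would remain missing.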

Satellite retrievals are homogenized to the reference OMI 0.25° × 0.25° (latitude × longitude) grid when predicting PM10 maps over Equatorial Asia. As CO is available at 1° × 1° resolution, each CO grid cell is divided into 16 sub-cells, each carrying the value of the original cell, to match the reference grid.
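The replication of each 1° × 1° CO cell into 16 sub-cells amounts to repeating every value four times along each axis. A minimal sketch with a hypothetical helper `replicate_grid` operating on nested lists:

```python
def replicate_grid(coarse, factor=4):
    """Split each coarse cell into factor x factor sub-cells of equal value."""
    fine = []
    for row in coarse:
        # Repeat each value `factor` times along the longitude axis...
        fine_row = [value for value in row for _ in range(factor)]
        # ...then repeat the whole row `factor` times along the latitude axis.
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


co_1deg = [[80.0, 95.0]]          # hypothetical CO values on a 1 x 2 coarse grid
co_025deg = replicate_grid(co_1deg)
print(len(co_025deg), len(co_025deg[0]))
```

No interpolation is performed, so the fine grid carries no more information than the coarse one; it only aligns CO with the other predictors.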

3.2. Deep neural networks

DNNs are powerful non-parametric approaches to explain highly non-linear relationships between input and output (Goodfellow et al 2016) and hence appropriate to explain atmospheric chemistry processes in the Earth system (Reichstein et al 2019). Here DNNs are trained using satellite data extracted with a 20 km averaging radius around each station and aggregated with a 7-d moving average. Based on our sensitivity analysis on model performance (figure S2), we define our DNN to comprise two consecutive hidden layers of ten and nine nodes, respectively. Given the 9 inputs and 1 output, the total number of model parameters (weights and biases) is 209 (figure S3). DNNs are trained on a randomly extracted 80% subset of all available data; validation is then performed on the remaining 20%. Only time steps at which all nine input variables are non-missing are used. One hundred trials are performed to remove the dependence of model performance on any individual random sampling of the training data. The overall performance of the DNN is evaluated with the Pearson (r) and Spearman (ρ) correlation coefficients between observed and modeled values. The model bias and error are quantified on a seasonal basis using the normalized mean bias factor (NMBF) and the normalized mean absolute error factor (NMAEF) (Yu et al 2006), defined as:
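The parameter count quoted above can be checked by summing weights and biases layer by layer for a fully connected 9-10-9-1 network, and the repeated 80/20 sampling can be sketched with the standard library. The helper names are hypothetical and the split shown is a plain random shuffle, which is an assumption about how the subsets were drawn.

```python
import random


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


def random_split(n_samples, train_fraction=0.8, seed=None):
    """Return shuffled (train, validation) index lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


# 9 inputs -> 10 nodes -> 9 nodes -> 1 output
print(count_parameters([9, 10, 9, 1]))  # 100 + 99 + 10 = 209

# One hundred independent splits, one per training trial.
splits = [random_split(1000, seed=trial) for trial in range(100)]
```

Averaging the validation metrics over such trials reduces the influence of any single lucky or unlucky partition.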

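$$\mathrm{NMBF} = \begin{cases} \dfrac{\sum_{i=1}^{n} (m_i - o_i)}{\sum_{i=1}^{n} o_i} = \dfrac{\bar m}{\bar o} - 1, & \bar m \geq \bar o \\[2ex] \dfrac{\sum_{i=1}^{n} (m_i - o_i)}{\sum_{i=1}^{n} m_i} = 1 - \dfrac{\bar o}{\bar m}, & \bar m < \bar o \end{cases} \qquad (1)$$

$$\mathrm{NMAEF} = \begin{cases} \dfrac{\sum_{i=1}^{n} |m_i - o_i|}{\sum_{i=1}^{n} o_i}, & \bar m \geq \bar o \\[2ex] \dfrac{\sum_{i=1}^{n} |m_i - o_i|}{\sum_{i=1}^{n} m_i}, & \bar m < \bar o \end{cases} \qquad (2)$$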

where mi represents the estimated PM10 and oi the observed one, $\bar m$ and $\bar o$ are their respective means, and n is the number of samples in the entire dataset. DNN predictive skill is also compared against the performance of an LM with the same input and output variables. Moreover, as seasonal phenomena (mainly monsoons and wildfires) are present in the analyzed area, model evaluation is also performed over distinct seasons: winter, spring, summer and fall (DJF, MAM, JJA and SON, respectively).
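For readers who want to compute these metrics, the following sketch implements NMBF and NMAEF in the piecewise form given by Yu et al (2006), where the normalization switches between observed and modeled totals depending on which mean is larger. The function names are hypothetical and the inputs are assumed to be positive concentrations of equal length.

```python
def nmbf(model, obs):
    """Normalized mean bias factor (Yu et al 2006), symmetric about zero."""
    sum_m, sum_o = sum(model), sum(obs)
    bias = sum_m - sum_o
    # Comparing sums is equivalent to comparing means for equal-length inputs.
    return bias / sum_o if sum_m >= sum_o else bias / sum_m


def nmaef(model, obs):
    """Normalized mean absolute error factor (Yu et al 2006)."""
    sum_m, sum_o = sum(model), sum(obs)
    abs_error = sum(abs(m - o) for m, o in zip(model, obs))
    return abs_error / sum_o if sum_m >= sum_o else abs_error / sum_m


# Hypothetical PM10 values (ug m-3): the model underestimates on average.
observed = [50.0, 40.0, 60.0]
modeled = [45.0, 42.0, 51.0]
print(nmbf(modeled, observed), nmaef(modeled, observed))
```

A negative NMBF indicates underestimation and a positive one overestimation, with the same magnitude for the same multiplicative factor in either direction.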

To predict annual mean PM10 spatial fields, other DNNs are trained on monthly aggregated data. PM10 patterns are thus predicted from the monthly aggregated satellite variables homogenized to the reference grid at 0.25° × 0.25° resolution. Due to the presence of several missing values in satellite-retrieved CO, the aforementioned monthly based DNNs are integrated with an additional DNN trained by excluding this variable and used on grid cells where CO is missing. Monthly PM10 maps at 0.25° × 0.25° resolution are finally averaged on a yearly basis, to obtain annual maps during 2005–2015.

A sensitivity analysis performed on a set of meteorological parameters obtained from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2, (Gelaro et al 2017)) indicates that meteorological variables do not enhance DNN performance on a monthly basis (see table S2), while they are more relevant over shorter time scales and when more data are available (table S3). Thus when aiming to predict annual mean PM10 fields, meteorological data are not incorporated into DNN.

3.3. Premature mortality estimates

We apply the global exposure mortality models (GEMMs) (Burnett et al 2018) to estimate the mortality burden associated with the satellite-derived PM10 concentrations. As GEMMs require PM2.5 data, we estimate the PM2.5/PM10 ratio from simulations of the Weather Research and Forecasting model with Chemistry (WRF-Chem) presented in (Crippa et al 2016) for September, October and November 2015. The annual domain average PM2.5/PM10, retrieved by assuming September and October to be wildfire months and November being representative of the other 10 'ordinary' months, is estimated by averaging all the on-land grid cells. A value of 0.622 ± 0.036 is found, consistent with (Amil et al 2016).

The relative risk (RR), which is the probability of a fatal outcome from a specific disease due to PM2.5 chronic exposure divided by the risk of the same outcome in the case of no-exposure, is calculated for each grid cell i and year, following (Burnett et al 2018):

$$\mathrm{RR}_i = \exp\left\{ \frac{\theta \, \ln\left(1 + \tilde z_i / \alpha\right)}{1 + \exp\left[-\left(\tilde z_i - \mu\right)/\nu\right]} \right\} \qquad (3)$$

where θ, α, μ, ν are age- and disease-specific parameters and ${\tilde z_i}$ is the maximum between zero and the difference between the PM2.5 concentration in grid cell i and the no-observed-effect concentration (2.4 μg m−3). RR is calculated by referring to 5-year age groups >25 years old (i.e. 25–30, 31–35, 36–40... 80 plus) and to four chronic diseases: chronic obstructive pulmonary disease (COPD), lung cancer (LC), ischemic heart disease (IHD) and stroke (S).
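A minimal sketch of the relative-risk calculation, assuming the GEMM functional form published by Burnett et al (2018), RR = exp{θ ln(1 + z/α) / (1 + exp(−(z − μ)/ν))}. The parameter values used in the example are placeholders chosen only for illustration; they are not the age- and disease-specific values from Burnett et al.

```python
import math

NO_EFFECT_PM25 = 2.4  # ug m-3, no-observed-effect concentration from the text


def gemm_relative_risk(pm25, theta, alpha, mu, nu):
    """Relative risk for one grid cell, age group and disease."""
    z = max(0.0, pm25 - NO_EFFECT_PM25)
    logistic = 1.0 / (1.0 + math.exp(-(z - mu) / nu))
    return math.exp(theta * math.log(1.0 + z / alpha) * logistic)


# Placeholder parameters (assumed, not from Burnett et al 2018).
params = dict(theta=0.3, alpha=2.0, mu=10.0, nu=20.0)
for concentration in (2.0, 20.0, 40.0):
    print(concentration, gemm_relative_risk(concentration, **params))
```

At or below the no-observed-effect concentration the function returns exactly 1, and it increases monotonically with concentration for positive θ.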

The premature deaths in each year and grid cell i are computed as:

$$\Delta M_i = P_i \, B \, \frac{\mathrm{RR}_i - 1}{\mathrm{RR}_i} \qquad (4)$$

where Pi is the total population living in cell i for a specific year and B the yearly baseline mortality for a specific chronic disease, derived from the Global Burden of Disease (Naghavi et al 2017). The total population living in the region in figure 1 is considered for the health impact assessment. The uncertainty of (i) the PM2.5/PM10 ratio, (ii) the GEMM parameter (θ) and (iii) B is propagated into the estimates of yearly premature deaths with a Monte Carlo approach based on 10 000 simulations, assuming a Gaussian distribution for each of the parameters, as in Giani et al (2020). Estimation of the uncertainty associated with DNN parameters remains an open question in computer science (Goodfellow et al 2016), thus the uncertainty from PM10 predictions cannot be included in our assessment.
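The uncertainty propagation can be sketched as follows for a single grid cell, age group and disease. Several things here are assumptions made for illustration: deaths are computed with the common attributable-fraction form P · B · (RR − 1)/RR, the relative risk uses the GEMM functional form of Burnett et al (2018), and all numerical inputs except the PM2.5/PM10 ratio (0.622 ± 0.036, from the text) are hypothetical placeholders.

```python
import math
import random


def relative_risk(z, theta, alpha, mu, nu):
    """GEMM-style relative risk for an excess concentration z >= 0."""
    return math.exp(theta * math.log(1.0 + z / alpha) / (1.0 + math.exp(-(z - mu) / nu)))


def attributable_deaths(population, baseline_rate, rr):
    """Deaths attributable to exposure: P * B * (RR - 1) / RR."""
    return population * baseline_rate * (rr - 1.0) / rr


def monte_carlo_deaths(pm10, population, ratio, theta, baseline, shape,
                       n_draws=10000, seed=0):
    """Propagate Gaussian uncertainty in ratio, theta and baseline rate.

    ratio, theta and baseline are (mean, std) tuples; shape is (alpha, mu, nu).
    Returns the 2.5th, 50th and 97.5th percentiles of the simulated deaths.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        pm25 = pm10 * rng.gauss(*ratio)
        z = max(0.0, pm25 - 2.4)
        rr = relative_risk(z, rng.gauss(*theta), *shape)
        draws.append(attributable_deaths(population, rng.gauss(*baseline), rr))
    draws.sort()
    last = n_draws - 1
    return draws[int(0.025 * last)], draws[int(0.5 * last)], draws[int(0.975 * last)]


low, median, high = monte_carlo_deaths(
    pm10=55.0,                      # hypothetical annual mean PM10, ug m-3
    population=250000,              # hypothetical cell population
    ratio=(0.622, 0.036),           # PM2.5/PM10 ratio from the text
    theta=(0.3, 0.03),              # placeholder GEMM theta and its std
    baseline=(0.001, 0.0001),       # placeholder baseline mortality rate
    shape=(2.0, 10.0, 20.0),        # placeholder alpha, mu, nu
)
print(low, median, high)
```

In a full assessment the same draws would be summed over cells, age groups and diseases before taking percentiles, so that the interval refers to the regional total.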

4. Results

4.1. Sources of seasonal and inter-annual variability of air quality in Equatorial Asia

During 2005–2015, significant seasonal patterns are observed in ground-measured PM10, with highest concentrations during summer and lowest in winter (figure S4). The mean PM10 averaged across all stations is higher during Jun–Aug (50.90 ± 23.63 µg m−3) and lower during Dec–Feb (45.63 ± 15.71 µg m−3). This is related to the occurrence of monsoon cycles characterized by a dry period during May–September (Southwest Monsoon) and a wet period during November–March (Northeast Monsoon). Higher PM10 levels are likely during the Southwest Monsoon, as humidity is lower, rainfall is less frequent (Tan et al 2015) and widespread wildfires may occur, particularly during drought years enhanced by intense El-Niño Southern Oscillation (ENSO) conditions (Marlier et al 2012, Field et al 2016). PM10 peaks appear in fall 2006 and 2015 (figure S4), due to the dry conditions brought by ENSO and associated wildfires. During these years, central Sumatra and southern Borneo were hotspots of intense and widespread wildfires, which degraded air quality in surrounding areas (Crippa et al 2016, Field et al 2016). In addition to the PM10 seasonal variability due to climate and meteorological conditions, satellite retrievals display a strong seasonality that may also explain the PM10 variability. AOD and CO show a strong seasonality, with highest values over southern Borneo and central Sumatra during fall (figures S5 and S6), as a result of the Southwest Monsoon and possible wildfire occurrence. A seasonal pattern is also noticeable in SO2, which highlights the impact of volcanic emissions (Carn et al 2017), especially over Java island and during winter (figure S7). Conversely, no clear seasonal pattern is observed for NO2 and HCHO (figures S8 and S9), which nevertheless clearly identify key anthropogenic sources.
Given the significant seasonal variability of the predictors, our DNNs are developed on a seasonal basis to capture seasonal phenomena key to explain PM10 variability and ultimately improve satellite-based PM10 predictions.

4.2. Model evaluation

The evaluation of seasonal DNN performance and its comparison to LM, as described in section 3.2, are presented here. Significantly higher r and ρ are found for DNN compared to LM (table 1), which indicates that DNNs perform better in predicting ground-level PM10 values. This is likely due to the presence of non-linear mechanisms and interactions between variables, typical of environmental pollution systems, which are not captured by the LM. The accuracy of DNN predictions is also higher, as both model bias (NMBF) and absolute error (NMAEF) are significantly closer to 0. Moreover, while LM underestimates PM10 observations in every season, DNN presents a negative NMBF in summer and fall, when PM10 peaks generally occur, and a slight overestimation in winter and spring. The NMBF nevertheless remains modest and lower than in prior chemical transport model simulations performed over the area (Gao et al 2014, Crippa et al 2016).

Table 1. DNN and LM performance on 7-d moving average PM10 data. Statistical metrics include correlation coefficients (r and ρ) as well as NMBF and NMAEF (see definitions in Methods) computed on validation data (i.e. 20% of the sample size n).

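| Season | Model | r | ρ | NMBF | NMAEF | n |
|---|---|---|---|---|---|---|
| Winter | DNN | 0.694 | 0.628 | 0.0001 | 0.1891 | 11 775 |
| | LM | 0.453 | 0.372 | −0.0438 | 0.2442 | |
| Spring | DNN | 0.643 | 0.544 | 0.0007 | 0.2018 | 15 830 |
| | LM | 0.488 | 0.379 | −0.0421 | 0.2369 | |
| Summer | DNN | 0.744 | 0.628 | −0.0025 | 0.2208 | 15 845 |
| | LM | 0.598 | 0.553 | −0.0482 | 0.2589 | |
| Fall | DNN | 0.777 | 0.656 | −0.0004 | 0.2018 | 8335 |
| | LM | 0.684 | 0.565 | −0.0403 | 0.2420 | |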

The seasonal variability of model performance reflects the complexity of the relationship between the dependent variable (measured PM10) and the independent variables (satellite retrievals). DNN skill on relatively low PM10 (∼40 µg m−3) remains similar among seasons; however, the occurrence of higher PM10 in summer and fall favors an improved fit, with the model being able to estimate a wider range of values (figure S10). Such model behavior suggests the presence of a baseline PM10 level that cannot be fully explained by the predictors included in the model. Some satellite retrievals are also subject to higher uncertainty when detecting low levels of trace gases, so other predictors, such as meteorological variables, may be included to provide additional information on PM10 variability.

DNN seasonal performance is generally higher when input data are aggregated on a monthly basis. The monthly averaging removes some short-term variability, while still capturing seasonal patterns, and produces a non-negligible increase in the correlation coefficients r and ρ compared to DNN trained on 7-d moving averages (table 2 and figure S11). In this case, training is performed on all the available data, as the monthly temporal aggregation reduces the sample size by an order of magnitude and precludes generating random training and validation samples that are representative of most of the data variability.

Table 2. DNN statistical metrics for model evaluation. Comparison between DNN trained on data aggregated with 7-d moving averages (MA) and monthly means (MM). Statistical metrics of model performance are computed using the entire sample size n, unlike the results shown in table 1, which refer to the validation on 20% of all available data.

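| Season | Model | r | ρ | NMBF | NMAEF | n |
|---|---|---|---|---|---|---|
| Winter | MA | 0.731 | 0.666 | −0.0004 | 0.1779 | 11 775 |
| | MM | 0.712 | 0.668 | −0.0004 | 0.1594 | 1112 |
| Spring | MA | 0.694 | 0.578 | 0.0002 | 0.1921 | 15 830 |
| | MM | 0.724 | 0.679 | −0.0009 | 0.1491 | 1393 |
| Summer | MA | 0.786 | 0.644 | −8.625 × 10−5 | 0.2087 | 15 845 |
| | MM | 0.805 | 0.759 | 0.0009 | 0.1474 | 1466 |
| Fall | MA | 0.825 | 0.690 | −0.0003 | 0.1873 | 8335 |
| | MM | 0.905 | 0.741 | −0.0009 | 0.1423 | 1081 |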

4.3. Spatial variability in annual PM10 and PM2.5 concentrations

DNNs trained on monthly aggregated data are applied to predict annual mean PM10 at 0.25° × 0.25° resolution. The estimated annual PM10 means are slightly lower than the observed values (mean bias of −3.15 µg m−3 over the entire period). Figure 2 compares two ordinary years (i.e. 2008 and 2014) against 2006 and 2015, which instead experienced widespread wildfires and, consequently, intense haze phenomena and extreme concentrations, particularly in southern Borneo and central Sumatra.

Figure 2. Annual mean PM10 concentrations (0.25° × 0.25° resolution) estimated with satellite-based DNN during ordinary (2008 and 2014) and wildfire (2006 and 2015) years.

Diffusion and dispersion phenomena are also captured by the model, as wildfire emissions appear to have spread towards densely populated areas (mostly Singapore and Kuala Lumpur), as also identified in prior modeling studies (Crippa et al 2016, Lee et al 2018, Mead et al 2018).

High values of PM10 (>50 µg m−3) also occur over Peninsular Malaysia, central/eastern Sumatra and part of Java every year (see figure 1 for the locations of these islands), due to the combined effect of local emission sources and transnational pollution transport (Lee et al 2016). Urban scale pollution is also captured by the model, as localized pollution peaks are present over metropolitan areas including Singapore, Jakarta and Kuala Lumpur. The yearly average PM10 over these areas always exceeds the World Health Organization threshold of 50 µg m−3, thus suggesting that both wildfires and large anthropogenic emissions are critical in deteriorating the regional air quality and lead to potentially severe impacts on human health. Analogous results are found for yearly maps of PM2.5 where most of the analyzed domain exceeds the yearly average WHO standard of 10 µg m−3 (figure S12).

4.4. Trends in human health impacts

Yearly satellite-based PM10 spatial fields, generated with DNNs, are converted to PM2.5 maps (see example in figure S12) using a PM2.5/PM10 ratio estimated from WRF-Chem (see Methods) and fed to the GEMMs. Premature deaths are computed for each year by integrating the estimated RR with population distribution maps. The total premature mortality burden and the associated 95% confidence interval (C.I.) are reported in table 3.

Table 3. Total estimated premature deaths associated with PM2.5 and 95% confidence interval (C.I.) (columns 2 and 3, respectively) over the analyzed domain for each year during 2005–2015. Columns 4–7 indicate the percentage of premature deaths associated with each of the diseases analyzed (see section 3.3 for their definition). Column 8 contains the total exposed population (in millions) over the analyzed domain.

| Year | Total deaths | 95% C.I. | %COPD | %LC | %IHD | %S | Population (millions) |
|---|---|---|---|---|---|---|---|
| 2005 | 149 500 | 108 400–193 400 | 10.51 | 6.97 | 49.20 | 33.31 | 284 |
| 2006 | 158 200 | 111 100–202 300 | 10.84 | 6.95 | 48.51 | 33.70 | 289 |
| 2007 | 149 500 | 106 000–193 500 | 10.69 | 7.02 | 49.57 | 32.72 | 294 |
| 2008 | 159 800 | 112 200–205 600 | 10.35 | 7.07 | 49.54 | 33.05 | 298 |
| 2009 | 162 100 | 115 300–210 100 | 10.41 | 6.87 | 49.66 | 33.06 | 303 |
| 2010 | 160 700 | 113 600–206 800 | 10.07 | 7.04 | 50.42 | 32.47 | 308 |
| 2011 | 173 900 | 124 200–223 400 | 10.00 | 7.08 | 49.97 | 32.95 | 313 |
| 2012 | 170 900 | 122 800–218 400 | 9.91 | 6.99 | 50.40 | 32.71 | 317 |
| 2013 | 173 500 | 122 800–224 400 | 9.71 | 7.08 | 50.56 | 32.65 | 322 |
| 2014 | 191 000 | 137 100–245 000 | 9.60 | 7.22 | 49.50 | 33.68 | 327 |
| 2015 | 203 900 | 145 200–260 100 | 9.50 | 7.24 | 49.22 | 34.04 | 332 |

The most relevant diseases over the analyzed decade are IHD and stroke, which on average contribute 49.69% and 33.12% of the premature deaths, while COPD and LC are responsible for 10.14% and 7.05%, respectively. A positive trend is seen in the absolute number of estimated deaths, partially due to the significant population growth over the area: from ∼284 M in 2005 to ∼332 M in 2015. A rise through the years is also evident when the number of deaths is normalized by the exposed population: the mean trend, calculated by excluding 2006 and 2015 as 'extraordinary' wildfire years, is significantly increasing (+5.19 deaths/Mpop/year, p-value = 0.033, figure 3). The trend including all years would be +6.38 deaths/Mpop/year. This suggests that regional air quality has deteriorated and its effects have intensified during the past decades, especially over big cities, where the majority of people live and population growth rates are higher. The effect of wildfire occurrence is clear, particularly in 2015 and to a lesser extent in 2006: the number of deaths per million inhabitants in these years is higher than the mean trend. The same happens in 2014 which, although less affected by wildfires, presented higher mean concentrations than other years (figure 3). Our mortality rate of ∼570 deaths/Mpop/year for the four analyzed diseases, quantified as the mean trend in figure 3, is moderately lower than the World Health Organization value of 676.4 deaths/Mpop/year for 2016 (computed as the mean rate for Indonesia and Malaysia) (WHO 2020).
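The population-normalized trends can be approximated directly from the totals and populations in table 3. The sketch below uses an ordinary least-squares slope; because the table values are rounded, the slopes should land near, but not exactly on, the figures quoted in the text. The helper `ols_slope` is a hypothetical name, and the choice of plain OLS is an assumption about how the trend was fitted.

```python
YEARS = list(range(2005, 2016))
DEATHS = [149500, 158200, 149500, 159800, 162100, 160700,
          173900, 170900, 173500, 191000, 203900]
POPULATION_M = [284, 289, 294, 298, 303, 308, 313, 317, 322, 327, 332]


def ols_slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Deaths per million inhabitants for each year.
rates = [deaths / pop for deaths, pop in zip(DEATHS, POPULATION_M)]

# Trend over all years, and over 'ordinary' years only (2006 and 2015 excluded).
slope_all = ols_slope(YEARS, rates)
ordinary = [(year, rate) for year, rate in zip(YEARS, rates) if year not in (2006, 2015)]
slope_ordinary = ols_slope([year for year, _ in ordinary], [rate for _, rate in ordinary])

print(round(slope_all, 2), round(slope_ordinary, 2))
```

Excluding the two wildfire years lowers the slope, which is consistent with 2015 sitting well above the trend line at the end of the record.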

Figure 3. Annual median premature deaths (white line) per million people due to PM2.5 chronic exposure. The blue shading indicates the standard deviation computed through Monte Carlo approach, assuming a Gaussian distribution of death estimates. The trend over 2005–2015 is indicated by the black dashed line.

The total burden of PM2.5-related deaths during 2005–2015 is mapped to highlight the most impacted regions (figure 4). Big metropolitan areas, including Jakarta, Singapore and Kuala Lumpur, stand out clearly as the most affected areas. This is certainly due to the large population, but PM2.5 also plays a crucial role, as it peaks over those locations (figure 2). Beyond the big cities, PM2.5-related health effects are prominent in highly urbanized areas, such as Java and the west coast of Peninsular Malaysia. Health effects of wildfires are instead moderate over southern Borneo and central Sumatra, which are sparsely populated (figure 4). The wildfire contribution to premature mortality is therefore most likely determined by transport from burnt areas to densely populated areas, where it adds to the total number of deaths (table 3).

Figure 4. Cumulative premature deaths associated with long-term exposure to PM2.5 concentrations during 2005–2015, as estimated by applying GEMMs to the satellite-based estimates of annual mean PM2.5. The number of deaths refers to each grid cell with 0.25° × 0.25° resolution.

5. Conclusions

In this study, we develop a novel DNN approach trained on a suite of satellite-retrieved variables related to atmospheric physics and chemistry, and land use, to remotely predict ground-level PM10 concentrations. The model is developed for Equatorial Asia but its applicability extends to any region with a sparse monitoring network. DNNs generally show enhanced predictive skill and lower bias compared to the more classical LM approach, as they are able to capture significant non-linear mechanisms and variable interactions dictating PM10 concentrations. On a seasonal basis, the dry period brought by the Southwest monsoon, possibly enhanced by ENSO, is found to be associated with higher PM10, which leads to improved model performance during summer and fall. Higher PM10 in the fall is associated with less frequent wet deposition and enhanced wildfire occurrence, which determined the extreme haze events recorded in 2006 and 2015. As significant spatio-temporal variability remains poorly explained for relatively low PM10, future research should focus on including additional meteorological variables (e.g. wind speed/direction, ground temperature and planetary boundary layer height), which may enable description of aerosol vertical profiles and of transport and dispersion processes on the local scale. Further, while this study shows the high skill of DNNs in predicting surface PM concentrations, future investigations could quantify the predictive skill of different machine learning approaches (e.g. random forests, gradient boosting machines or mixed models).

The annual PM10 and PM2.5 maps reveal significant spatial and inter-annual patterns related to both anthropogenic drivers and wildfires. The estimated health impacts indicate that metropolitan areas remain the most affected, due to the combined effect of numerous anthropogenic emissions and high population density. Conversely, the effect of wildfires dominates on the regional scale, as indicated by the strong inter-annual variability in the number of premature deaths over the region, which is significantly higher during fire years than in adjacent non-fire years. We also found a significant increasing trend in PM2.5-related mortality of +1600 additional deaths per year on average over 2005–2015. In addition to population growth, possible explanations for this include ongoing urbanization and land-use and land-cover changes, as well as large-scale climatic changes, such as the enhanced intensity of ENSO and more frequent wildfire events. Future research will aim to attribute the role of these drivers through numerical model simulations including different climate conditions and emission scenarios.

Acknowledgments

The authors gratefully acknowledge the National University of Malaysia (Universiti Kebangsaan Malaysia, UKM), METMalaysia (Malaysian Meteorological Department) and the Malaysian Department of Environment (DOE) for providing access to the observational data used in this study. The authors acknowledge NASA for access to MODIS and OMI data (downloaded from the open-access website https://earthdata.nasa.gov), MERRA-2 output (https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/data_access/) and population/demographic data (obtained from NASA SEDAC, https://sedac.ciesin.columbia.edu). The Global Burden of Disease data used in this study were accessed from the Institute for Health Metrics and Evaluation website: http://ghdx.healthdata.org/ihme_data.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.
