A novel approach for determining source–receptor relationships in model simulations: a case study of black carbon transport in northern hemisphere winter

A Gaussian process emulator is applied to quantify the contributions of local and remote emissions of black carbon to its concentrations in different regions using a Latin hypercube sampling strategy for emission perturbations in the offline version of the Community Atmosphere Model Version 5.1 (CAM5) simulations. The source–receptor relationships are computed based on simulations constrained by a standard free-running CAM5 simulation and the ERA-Interim reanalysis product. The analysis demonstrates that the emulator is capable of retrieving the source–receptor relationships based on a small number of CAM5 simulations without any modifications to the model itself. Most regions are found to be most susceptible to their local emissions. The emulator also finds that the source–receptor relationships are approximately linear, and the signals retrieved from the model-driven and reanalysis-driven simulations are very similar, suggesting that the simulated circulation in CAM5 resembles the assimilated meteorology in ERA-Interim. The robustness of the results provides confidence for application of the emulator to detect dose–response signals in the climate system.


Introduction
Black carbon (BC, commonly referred to as 'soot') is an important climate forcing agent owing to its strong radiative absorptivity. It has both anthropogenic and natural sources associated with incomplete combustion from fossil fuel and biomass burning. The lifetime of BC in the atmosphere ranges from several days to a few weeks, enabling these aerosols to undergo intercontinental transport (Ramanathan and Carmichael 2008). Because of uncertainties associated with the emission inventory of aerosols as well as the model treatment of atmospheric circulations and aerosol processes, general circulation models (GCMs) show diverse results in BC distributions (Koch et al 2009, Textor et al 2006, Lee et al 2012b particularly in remote regions (Rasch et al 2000) such as the Arctic (Shindell et al 2008), introducing challenges for quantifying domestic and foreign contributions to the BC concentration. A variety of approaches have attempted to attribute sources of atmospheric BC, ranging from the trajectory-based techniques (Barnaba et al 2011, Dentener et al 2006, Kuhn et al 2010, Stohl 2006 to the GCM-based approaches including the tracer-tagging technique (Liu et al 2009) and the simple sensitivity test, called the 'one factor at a time' (OFAT) approach, that changes emission sources one at a time (Bourgeois and Bey 2011, Fisher et al 2011, Koch and Hansen 2005. For the GCM-based approaches, while the tagging technique provides most of the information in a single model simulation, the complexity of the technical implementation increases as the complexity of the processes in which the tracer of interest is involved increases. The OFAT type of sensitivity study, on the other hand, is a straightforward approach, but it requires a large number of simulations and lacks the capability to account for the nonlinear effects and interactions between different emission sources.
In recent years, the climate modeling community has employed various techniques to perform sensitivity studies that investigate the response of a particular measure of the climate (e.g., cloud condensation nuclei, etc) to the variation of model parameters. These methods include the traditional OFAT type of approach (Haerter et al 2009, Lohmann andFerrachat 2010), the adjoint modeling approach (Karydis et al 2012b(Karydis et al , 2012a, and the 'design of experiment' (DOE) approach (Lee et al 2011, Le et al 2012a. The DOE approach does not require modifications to the model, similar to the OFAT approach, but it uses a relatively small number of model simulations and allows researchers to perturb multiple uncertain parameters simultaneously instead of adjusting the uncertain parameters in an OFAT fashion. Statistical surrogate models, or 'emulators', are then applied to extract signals from those model simulations, providing insights into the trends and relationships between variables. Compared with the OFAT method, the DOE type of approach has been proven to (1) greatly reduce the number of experiments, (2) provide more precise estimates of the effect of each parameter, (3) give accurate estimates of the effect of the interactions between two parameters, and (4) cover a larger portion of the parameter space (Czitrom 1999). Given the advantages, the DOE approach has been applied for uncertainty quantification studies to investigate the parametric uncertainty of climate models (Yang et al 2012, Lee et al 2011, Yang et al 2013, Zhao et al 2013, Le et al 2012a.
In this study, we utilize a Gaussian process (GP) emulator (Sacks et al 1989), part of the recently released Gaussian process models for simulation analysis (GPM/SA) toolkit (Higdon et al 2008), to explore the wintertime BC source-receptor relationships from the Community Atmosphere Model Version 5.1 (CAM5) (Neale et al 2010 simulations. Our primary goal is to consider whether a GP emulator is capable of identifying the response of a complex climate model based on a small number of simulations. The sensitivity of regional BC burden and deposition to the changes in local and remote BC emissions is explored using GP emulations as a demonstration of the applicability of the DOE approach for climate studies.

Methodology
We use CAM5 that includes a 3-mode version of the Modal Aerosol Model (MAM3) (Liu et al 2012) for representing the aerosol lifecycle, and use an improved representation of wet removal and convective transport of aerosols (as described in Wang et al (2013)) that provides better fidelity in aerosol simulations. The model is configured in an 'offline mode' (Rasch et al 1997, Lamarque et al 2012 where the model's meteorology is strongly constrained by an external dataset so that changes in aerosol loading and distribution have little effect on the model meteorology. Two meteorological datasets are used in this study: (1) an archive of an earlier CAM5 simulation where meteorological fields are evolved according to the governing equations of the model; and (2) an observation-constrained meteorological archive using the ERA-Interim reanalysis (Dee et al 2011). We perform two ensemble simulations (listed as CAM5 AMIP and CAM5 ERAI, respectively) with 20 members in each ensemble to (1) explore the source-receptor relationships of BC in the winter of 2008 (December 1, 2007-February 29, 2008, and (2) understand the effect of circulation biases potentially present in CAM5 meteorological fields in attributing sources of BC. The model uses a horizontal grid spacing of 1.9 • × 2.5 • with 56 levels in the vertical as described in Lamarque et al (2012). All model simulations use the same initial condition archived from the last time step of a previously completed 1 yr simulation for spin-up.
Aerosol emissions used in this study are compiled by Louisa Emmons for the Polar Study using Aircraft, Remote Sensing, Surface Measurements and Models, of Climate, Chemistry, Aerosols, and Transport (POLARCAT) Model Intercomparison Project (POLMIP), available at ftp://acd. ucar.edu/user/emmons/EMISSIONS/arctas streets finn/. The dataset consists of time-invariant anthropogenic emissions based on David Streets' inventory for Arctic Research of the Composition of the Troposphere from Aircraft and Satellites (ARCTAS) (Jacob et al 2010) and the daily varying aerosol emission from fires ('FRE') (Wiedinmyer et al 2011). We distribute the total column aerosol emission from FRE at elevated levels with a vertical profile described in the AEROsol model interCOMparison project (AeroCom) (Dentener et al 2006, Textor et al 2006. In this dataset, the BC emission from FRE takes place in Central America, South Asia, North Africa, southeast China, and northeast Russia in the simulation period (figure not shown), and accounts for about 20% of the total global BC emission.
We divide the Earth into ten geographical regions of interest (figure 1): Arctic ('ARC'), North America ('NAM'), North Asia ('NAS'), Central America ('CAA'), North Africa ('NAF'), East and Central Asia ('ECA'), South Asia ('SAS'), North Hemispheric oceans ('NHO'), and South Hemisphere ('SHM'). Anthropogenic BC emissions from these regions vary greatly due to the differences in areal extent and aerosol sources within each area, from 0.05 kg s −1 in ARC to 68.31 kg s −1 in ECA. Since the uncertainty associated with the BC emission inventory is generally considered to be at least a factor of two (Ramanathan andCarmichael 2008, Bond et al 2004), we perturb the anthropogenic BC emissions in these ten regions simultaneously in this study by adding each regional emission with a different emission perturbation allowed to vary between zero and 10% of the global anthropogenic emission, such that (1) the emission perturbations applied in every region have the same magnitude, and (2) the maximum possible global anthropogenic emission (when every regional emission is added with 10% of the global anthropogenic emission) is within the factor of two variation. The emission perturbations are chosen based on a Latin Hypercube design of experiment that takes 20 sets of ten regional emission perturbations for the 20 CAM5 experiments for each in the two ensembles.
The 20 CAM5-simulated results are then used to constrain a GP model to be used as an emulator of the simulation response at each point in the input domain. Markov Chain Monte Carlo is used to obtain 500 posterior samples of uncertain GP parameters including length scales of the negative-exponential-squared covariance function, process variance, and process residual or noise (section S1 of the supplementary material available at stacks.iop.org/ERL/ 8/024042/mmedia) (Higdon et al 2003(Higdon et al , 2008. Posterior realizations across these samples of uncertain parameters represent the uncertainty of emulator predictions of the GCM response. In this study, we consider the 95% confidence bounds in these posterior prediction distributions. A brief description of the GP emulator is given in the appendix.

Results
Since we concurrently perturb all regional emissions, the relationship between the global BC burden and the BC emission from a particular region is difficult to retrieve from a conventional correlation analysis, plotted in marginal projections in figure 2. The lines and vertical bars in figure 2 show the emulator-predicted mean and 95% confidence bounds (i.e., the uncertainty range) of the global BC burden for the emission variation at each region, conditional on the other regional emissions at their mid-range value. The approximately linear relationships explain the emulator's ability to effectively model the response given the small number of simulations relative to the number of parameters. Cross-validation that compares the emulator-retrieved mean response with the corresponding CAM5 simulation indicates an effective prediction of CAM5 results (section S2 of the supplementary material available at stacks.iop.org/ERL/8/024042/mmedia). The uncertainty in the emulated response, depicted as vertical bars in figure 2, is usefully narrow to proceed with analysis. The relationships retrieved from CAM5 AMIP and CAM5 ERAI ensembles are very similar, suggesting that CAM5 is skillful in generating meteorological fields similar to those produced by assimilation of observations.
Because the relationships are found to be approximately linear, we can define a 'contribution efficiency' parameter η for each region as the marginal increase of global atmospheric BC burden (M) per unit increase in regional emissions (E): η = ∂M ∂E (unit = kg kg −1 s). The values of η for global BC burden are given in figure 2. This analysis shows that the sensitivity of the global wintertime BC burden to a unit increase of emissions (1 kg s −1 ) greatly depends on the region where the BC particles are emitted, varying by about a factor of 4, with η ranging from 0.15-0.16 kg kg −1 s for NHO to 0.62-0.67 kg kg −1 s for NAF. The source region dependency of the contribution efficiency indicates that the fate of BC emitted from different regions may depend on local meteorological conditions including winds and clouds that regulate the transport and removal of BC, which have strong seasonal variations (Kunkel et al 2012, Burrows et al 2009. When BC particles are emitted from a region with strong convective transport or absence of cloud, they can remain in the atmosphere for a longer period of time, resulting in a greater impact on the global BC burden. In CAM5, the variation of BC loading in the atmosphere changes the size distribution and the hygroscopicity of Figure 2. Marginal projections of CAM5-simulated global total column BC burden (unit: 10 6 kg) versus regional anthropogenic BC emissions (unit: kg s −1 ), and the emulator-predicted relationships (line) and uncertainty (vertical bar) between global total column BC burden and regional anthropogenic BC emissions, constrained by the meteorology from free-running CAM (blue) and ERA-Interim (red). The correlation coefficients (R) associated with each scatter plot and the contribution efficiency parameters (η; unit = kg kg −1 s) of each emulator-predicted relationship are given.  aerosols and, hence, can affect the aerosol wet removal efficiency (Liu et al 2012, Wang et al 2011. As a result, the sensitivity of the BC burden and surface deposition rates to the emission perturbation of a particular region depends on the background BC concentration, which is affected by emissions from other regions as well. The nonlinearity associated with aerosol-cloud interactions is identifiable in many climate models (Koch et al 2011). While neither the OFAT nor the tagging approach can identify these interaction effects, the emulator is able to quantify these terms; they are, however, small in this experiment (figure 3).
We further explore the emulator-derived contribution efficiency for BC burden (figure 4, upper panel) and surface deposition rate (figure 4, lower panel) for different regions.
In general, BC burden and surface deposition rates are most sensitive to the increase of local emissions, and less sensitive to that in remote regions. Emission perturbations in some regions (e.g., ARC) are found to affect only the limited areas nearby, while emission perturbations in some other regions (e.g., NAF and ECA) can influence many other regions. This analysis also finds that the emission perturbation in a particular region can have a different effect on the BC burden and on the surface deposition in a receptor region. These effects may be related to the amount and vertical distribution of aerosols in a particular column of the atmosphere, which are associated with the meteorological fields (such as cloudiness and convective stability) along the transport pathways. These factors affect the aerosol advection and deposition. In a receptor region, if the BC influx takes place in the lower troposphere where most liquid clouds are, the aerosols are more likely to be removed by scavenging processes and, hence, are deposited at the surface. On the other hand, BC transported in the upper troposphere where the wet removal mechanism is absent is likely to stay in the atmosphere for a longer period. Accordingly, since BC transport into the Arctic from Europe occurs mostly in the lower troposphere (Stohl 2006), European BC emission can make a larger contribution to the surface BC deposition than the BC burden in the Arctic.
To explore the contribution of each emission source to the BC burden and deposition, the contribution efficiency parameters (figure 4) are multiplied by the regional anthropogenic emission (figure 1) and normalized with respect to the regional burden or surface deposition rate, as depicted in figure 5. The regions signaling a low sensitivity to anthropogenic emissions are those where BC emitted from fires dominates. This analysis shows that the local emissions contribute the largest portion of the BC burden and surface deposition over most regions, except for the Arctic (figure 5) where local emissions are negligible despite the fact that the contribution efficiency of the local emissions in the Arctic is very high (figure 4). We also find that due to its large emissions, East and Central Asia (ECA) is responsible for a significant portion of the aerosol atmospheric concentration in most regions in the northern hemisphere, despite the low contribution efficiency ( figure 4). Furthermore, this analysis indicates that many regions are greatly influenced by FRE emissions, which are placed at altitudes where wet scavenging is weak due to the absence of clouds.

Concluding remarks
The climate modeling community usually adopts the OFAT method for sensitivity studies to identify the effect of various factors on the climate system. However, owing to the a priori unknown complexity and nonlinearity of the climate system and simulation response, it is computationally expensive to systematically identify the cause-and-effect of multiple factors in climate models. In this study, taking the wintertime BC as an example, we use a GP emulator, constrained by a small number of CAM5 simulations from a Latin Hypercube experimental design, to demonstrate that the source-receptor relationships can be identified to characterize constituent burden and surface deposition rates from emissions in various regions. Therefore, with an appropriate design of experiment and emulations, source-receptor relationships of different tracers in different seasons, various kinds of sensitivity studies, as well as the studies for the detection and attribution of climate change can be achieved with a relatively low computational cost.
In this study, we have demonstrated that the simulations driven by meteorological fields from a model allowed to evolve and simulations driven by meteorological fields constrained by the assimilated meteorology from reanalysis products yield very similar results, suggesting that the climate model is quite skillful in simulating the transport of aerosols. Our analysis also finds that the aerosol burden and surface deposition rates are linearly related to emissions, and most regions are the most vulnerable to the increase of local emissions, although Asia is found to contribute a significant portion of BC burden and deposition over many regions in the north hemisphere. These results provide insights into the dynamical, physical, and chemical processes of the climate model, and the conclusions may have policy implications for making cost-effective global and regional pollution reduction strategies.
NSF/DOE/USDA Decadal and Regional Climate Prediction using Earth System Models (EaSM) program. The Gaussian process models for simulation analysis (GPM/SA) methods and code package were developed with support from the DOE/NNSA's Advanced Simulation and Computing program and the DOE's Scientific Discovery through Advanced Computing (SciDAC) Quantification of Uncertainty in Extreme Scale Computations (QUEST) Institute.
, or for all the sampled simulation values the matrix C(X) = e − 1 2 X T BX . The covariance (X) = ρC(X) + σ 2 I. When response samples Y are transformed by subtracting their mean and dividing by their standard deviation so that they are approximately standard normal, the statistical model is Y ∼ N(0, ) where ≡ (X). The free parameters of the emulator are the diagonal elements of B, ρ, and σ . The b i are the length scales, expressing the complexity of the simulation response to each simulation parameter. The parameter ρ is the process overall marginal variance, and is required to allow the best fit of the emulator to the standardized Y. The σ 2 is the residual variance, an estimate of noise on the response samples that are not predictable with the emulator.
The model then has a likelihood of the data given parameter settings: L(Y|B, ρ, σ ) ∝ | | − 1 2 e − 1 2 Y T −1 Y . Prior distributions on emulator parameters are chosen to allow the dataset to determine the parameters' preferred values within reasonable ranges rather than providing strong guidance to their setting. These so-called 'weak' priors are that the length scale should be small (i.e., response surface is smooth), the process variance ρ is allowed to be large (allowing the case where the model will be strongly linear) and the residual variance should be small. In the GPM/SA toolkit, the default priors of the parameters are assumed Beta or Gamma distributions: π(e − 1 4 b i ) = β(1, 0.1), π( 1 ρ ) = (1, 10 −3 ), and π( 1 σ 2 ) = (3, 3 × 10 −3 ), where π(·) are prior distributions. The posterior of these parameters are then sampled using Markov Chain Monte Carlo (MCMC) sampling, from the posterior probability distribution: p(B, ρ, σ ) = L × π(B)π(ρ)π(σ ). A representative sample of these parameters is illustrated in the supplementary material (available at stacks.iop.org/ERL/8/024042/mmedia). To express the full uncertainty of the emulator predictions, multiple predictions of µ p and p are computed for sets of emulator parameters drawn from the posterior distribution in the MCMC sampling. This statistical emulator is then used to predict unknown responses Y p at parameter setting X p by a conditional normal estimate, expressed as Y p ∼ N(µ p , p ), where µ p = (X, X p ) −1 Y, p = (X p ) − (X, X p ) −1 Y, and (X, X p ) = ρC(X, X p ), and the elements of C are c(x1, x2 p ). Posterior realizations across these samples of uncertain parameters represent the uncertainty of emulator predictions of the climate model response. A more detailed description of the GP emulator can be found in Higdon et al (2003Higdon et al ( , 2008.