Do responses to different anthropogenic forcings add linearly in climate models?

Many detection and attribution and pattern scaling studies assume that the global climate response to multiple forcings is additive: that the response over the historical period is statistically indistinguishable from the sum of the responses to individual forcings. Here, we use the NASA Goddard Institute for Space Studies (GISS) and National Center for Atmospheric Research Community Climate System Model (CCSM4) simulations from the CMIP5 archive to test this assumption for multi-year trends in global-average, annual-average temperature and precipitation at multiple timescales. We find that responses in models forced by pre-computed aerosol and ozone concentrations are generally additive across forcings. However, we demonstrate that there are significant nonlinearities in precipitation responses to different forcings in a configuration of the GISS model that interactively computes these concentrations from precursor emissions. We attribute these to differences in ozone forcing arising from interactions between forcing agents. Our results suggest that attribution to specific forcings may be complicated in a model with fully interactive chemistry and may provide motivation for other modeling groups to conduct further single-forcing experiments.


Introduction
The Coupled Model Intercomparison Project, currently in Phase 5 (hereafter CMIP5), aims to further our understanding of past, present, and future climate changes [1]. An important goal of the project is to standardize the output of multiple independently developed climate models and to make simulations freely available. This enables robust comparisons of different models and allows the scientific community to explore climate responses to external factors in a multi-model context.
The modeling groups participating in CMIP5 perform a standard set of experiments. Nearly all models submit an ensemble of 'historical' simulations over a common period. During this period, multiple physical drivers affected the balance between incoming solar energy and the energy re-radiated to space. Such changes in the energy balance are known as 'radiative forcings', and the physical drivers that give rise to them are commonly referred to as 'forcing agents' or simply 'forcings'.
The field of climate change detection and attribution (hereafter D&A; [2]) seeks to identify the effects of various forcings on the climate system. D&A is generally a three-step process. First, we identify the expected climate changes in response to a particular forcing: the 'fingerprint' of that forcing agent. Second, we determine whether the fingerprint is present in the observational record, and whether a signal of forced climate change is detectable above a noisy background of purely natural internal climate variability. Third, we compare the observed signal to that which emerges from climate models in order to attribute detected climate changes to particular forcings (see e.g. [3]).
Realistic historical simulations must include both natural forcings (such as solar changes and volcanic eruptions) and anthropogenic forcings (such as increased aerosols, stratospheric ozone depletion, land cover change, and greenhouse gases). In order to attribute changes over this period, many modeling groups performed simulations with subsets of these forcings. Most commonly, modeling groups submitted simulation output forced only by historical greenhouse gas concentrations (the 'historicalGHG' archive) or by natural forcings alone ('historicalNat'). A more limited number of simulations forced by other historical forcings or combinations of forcings were submitted to the CMIP5 'historicalMisc' archive; an overview can be found online at http://cmip-pcmdi.llnl.gov/cmip5/docs/historical_Misc_forcing.pdf.
A key assumption in many D&A studies, particularly those that use multiple linear regression to identify the signals of different forcings (e.g. [4,5]), is that the sum of the climate responses to all individual forcings (or subsets of forcings) is statistically indistinguishable from the response in the all-forcing 'historical' simulations. Whether the climate response is indeed additive across multiple forcings has therefore been an active area of study (e.g. [6-8]). The literature suggests that additivity may hold at large scales over the historical period for specific variables and specific subsets of forcings. Reference [9] found no significant difference between simulated historical temperature trends and the sum of trends in models forced by natural and anthropogenic forcings separately. Reference [10] found that additivity holds at global scales during the historical period but breaks down in certain RCP scenarios. Reference [11], however, tested the assumption of linear additivity for variables associated with the energy budget and hydrological cycle, and found that the responses to CO2 and (large) solar forcings are generally not additive.
In this paper, we rely on the freely available CMIP5 'historicalMisc' archive to test whether the historical global temperature and precipitation responses are statistically indistinguishable from the sum of responses in the relevant single-forcing experiments (detailed information for this repository is available at http://esdoc.org).

Methods
Only two modeling groups submitted the full complement of CMIP5 single-forcing simulations necessary to test the hypothesis of additivity across anthropogenic forcings: the NASA Goddard Institute for Space Studies (GISS) ModelE2 [12] and the NCAR Community Climate System Model, version 4.0 (CCSM4) [13]. In this paper, we use results from the GISS model coupled to the Russell ocean model (GISS-E2-R). Other modeling groups supplied separate simulations forced with anthropogenic or natural forcings only, but because we aim to test for additivity across different anthropogenic forcings, we are restricted to output from these two modeling groups.
The GISS group submitted two configurations of the GISS-E2-R model: a non-interactive physics configuration (GISS-NINT, denoted as 'p1' according to the CMIP5 nomenclature) in which aerosol and ozone concentrations are provided as inputs to the model and the aerosol indirect effect is parameterized; and a 'Tracers, Chemistry, Aerosols Direct/Indirect Effect' configuration in which aerosol concentrations and atmospheric chemistry are calculated interactively from transient emissions inventories and the aerosol impact on clouds is calculated (GISS-TCADI, denoted as 'p3') [14]. As in GISS-NINT, CCSM4 is forced with pre-computed concentrations of ozone and aerosols. All model output was submitted to the CMIP5 repository and is available for download on the Earth System Grid.
For the non-interactive models GISS-NINT and CCSM4, the CMIP5 archive contains five-member ensembles of simulations forced with:
Nat: natural forcings (solar and volcanic) only;
LU: land use changes only (due to changes in surface albedo from time-varying changes to the area and locations of crops and pasture);
Oz: ozone only, using pre-computed decadal-mean ozone concentrations including stratospheric depletion and tropospheric increases over the 20th century;
AA: anthropogenic tropospheric aerosol concentrations;
GHG: well-mixed greenhouse gas concentrations;
Hist: historical simulations forced with all of the above.
Because GISS-TCADI uses an interactive chemistry model to convert emissions into concentrations, its single-forcing experimental setup is different. Here, the CMIP5 archive contains five-member ensembles of simulations forced with:
Nat: natural forcings (solar and volcanic) only;
LU: land use changes only. For this forcing we use the GISS-NINT ensemble, since no separate LU-TCADI simulations were performed. LU changes produce very small trends in global-mean temperature and precipitation over the historical period, and because the GISS-TCADI model does not allow for temporal changes in anthropogenic dust sources, we expect the LU responses to be similar in the NINT and TCADI models;
LLGHG: long-lived greenhouse gas emissions (including CO2, N2O, and CFCs, but not CH4);
AA: anthropogenic emissions of tropospheric aerosol precursors;
ARG: anthropogenic emissions of reactive gases (NOx, CO, VOCs, and CH4);
Hist: historical simulations forced with all of the above.
Some additional forcing associated with orbital changes is present in the GISS historical simulations, but it has only a tiny impact on simulations over century time scales [14]. We note that while CCSM4 performed separate solar-only and volcanic-only simulations, no equivalents are available for the two GISS configurations because of an error in the experimental design of the volcanic-only output initially submitted. Details of all forcings are available in the CCSM4 [13] and GISS [14] model documentation.
In this paper, we test two different hypotheses. First, we consider the null hypothesis H_noise: that the temperature or precipitation trends from externally forced simulations are compatible with purely internal climate variability, or 'noise'. If H_noise can be rejected at a particular confidence level, a trend is said to be detectable above the background of climate noise [2]. Second, we consider the null hypothesis H_add: that the responses to different forcings are additive over the historical period. Specifically, we test whether global-average, annual-average temperature and precipitation trends in the ensemble of 'historical' experiments differ significantly from the sum of trends in n_F sets of single-forcing experiments (hereafter 'SUM').

Are responses to different forcings detectable above internal variability?
H_noise is tested by expressing temperature and precipitation trends in signal-to-noise units. This concept is used in climate change detection and attribution studies (e.g. [3,15,16]) and allows us to apply basic signal-processing methods to test significance. The 'signal' S_x(L) is defined as the L-length temporal trend in global-average, annual-average x, where x is surface air temperature (T) or precipitation (P). The signal alone, reported in standard units (K or mm d⁻¹ per decade), provides no indication of the trend's significance with respect to internal climate variability. We therefore normalize the signal by a measure of 'noise' N_x(L), i.e., the internal climate variability in variable x estimated on the same time scale L. Following e.g. [17], we obtain this noise measure by concatenating the first 200 yr of global-average, annual-average T and P from all 34 pre-industrial control simulations in the CMIP5 multi-model archive, yielding a single 6800 yr time series of unforced internal variability in variable x. This time series contains ⌊6800/L⌋ non-overlapping L-length segments. For each segment, we calculate the 'trend' as the slope of the least-squares best-fit line.
A histogram of the resulting trends shows that the unforced trends are quasi-normally distributed about zero. The standard deviation N_x(L) of this distribution is thus a measure of internal variability in L-length trends in global-average, annual-mean T or P.
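The noise estimate described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each control run is supplied as a plain list of annual global means, and the synthetic white-noise "control runs" in the usage lines are hypothetical stand-ins for the 34 CMIP5 pre-industrial controls.

```python
import random
import statistics


def ols_slope(y):
    """Least-squares slope of y against time steps 0, 1, ..., len(y) - 1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def noise_from_controls(controls, L, years_per_run=200):
    """Return (N_x(L), number of segments) from concatenated control runs.

    Only the first `years_per_run` years of each run are used. The series is
    cut into floor(total_length / L) non-overlapping L-year segments, and
    N_x(L) is the sample standard deviation of the segment trends.
    """
    series = [v for run in controls for v in run[:years_per_run]]
    n_seg = len(series) // L
    trends = [ols_slope(series[k * L:(k + 1) * L]) for k in range(n_seg)]
    return statistics.stdev(trends), n_seg


# Hypothetical usage: 34 synthetic 200-yr "control runs" of white noise.
random.seed(0)
controls = [[random.gauss(0.0, 0.1) for _ in range(200)] for _ in range(34)]
noise_100, n_100 = noise_from_controls(controls, L=100)  # n_100 == 68
```

Trends here are in units per year; multiplying by 10 gives per-decade values. The signal-to-noise ratio is unaffected as long as signal and noise use the same units.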
In obtaining the noise measure, we use concatenated output from all CMIP5 models rather than from the GISS-E2 or CCSM4 models alone. A longer record yields a larger sample (i.e., trends in many non-overlapping L-length segments), so the trend histogram is smoother and the sample standard deviation is a more reliable estimate of the spread of the underlying distribution. This is particularly important for large L: individual pre-industrial control simulations are not long enough to provide sufficient samples for, e.g., distributions of century-scale trends.
The signal-to-noise ratio

$$\mathrm{SN}_x(L) \equiv \frac{S_x(L)}{N_x(L)}$$

is also quasi-normally distributed and, by construction, has unit standard deviation. Using a conservative two-tailed standard z-test, a signal is thus considered detectable at the 95% (99%) confidence level if |SN| exceeds 1.96 (2.58).
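The detection test reduces to comparing |SN| with a standard-normal critical value. The short sketch below illustrates this with Python's `statistics.NormalDist`; the signal and noise values in the usage lines are hypothetical, not results from this paper.

```python
from statistics import NormalDist


def signal_to_noise(signal, noise):
    """SN_x(L) = S_x(L) / N_x(L); signal and noise must share units."""
    return signal / noise


def z_critical(confidence):
    """Two-tailed critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def is_detectable(sn, confidence=0.99):
    """Reject H_noise when |SN| exceeds the two-tailed critical value."""
    return abs(sn) > z_critical(confidence)


# Hypothetical per-decade trends in K, for illustration only.
sn = signal_to_noise(0.06, 0.02)
print(round(z_critical(0.95), 2), round(z_critical(0.99), 2))  # 1.96 2.58
print(is_detectable(sn))  # True
```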

Are responses to different forcings additive?
We also test the hypothesis H_add that the ensemble-mean 'historical' trend μ_H equals the sum of the ensemble-mean trends μ_F of the n_F = 5 single-forcing responses (SUM):

$$H_{\mathrm{add}}:\quad \mu_H = \sum_{F=1}^{n_F} \mu_F$$
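The paper does not spell out the test statistic at this point, so the following is only one standard way to test H_add, offered as an assumption: a linear contrast γ = μ_H − Σ μ_F estimated from ensemble means, with a standard error built from the within-ensemble variance pooled across all ensembles. The authors' exact procedure may differ, and the ensemble values below are hypothetical.

```python
import math
import statistics


def additivity_test(hist, singles):
    """Pooled-variance test of H_add: mu_H - sum_F mu_F = 0.

    hist: per-member trends (or SN values) from the historical ensemble.
    singles: one list of per-member values per single-forcing ensemble.
    Returns (gamma_hat, standard_error, t_statistic, degrees_of_freedom).
    """
    groups = [hist] + list(singles)
    # Contrast weights: +1 for the historical ensemble, -1 for each single forcing.
    weights = [1.0] + [-1.0] * len(singles)
    gamma_hat = sum(w * statistics.fmean(g) for w, g in zip(weights, groups))
    # Pool the within-ensemble variance across all ensembles.
    dof = sum(len(g) - 1 for g in groups)
    pooled_var = sum((len(g) - 1) * statistics.variance(g) for g in groups) / dof
    se = math.sqrt(pooled_var * sum(w * w / len(g) for w, g in zip(weights, groups)))
    return gamma_hat, se, gamma_hat / se, dof


# Hypothetical five-member ensembles whose means add exactly.
singles = [[1.0, 2.0, 3.0, 4.0, 5.0]] * 5
hist = [13.0, 14.0, 15.0, 16.0, 17.0]
gamma_hat, se, t_stat, dof = additivity_test(hist, singles)
```

With six five-member ensembles the statistic has 24 degrees of freedom; a two-sided p-value follows from Student's t distribution (for example, twice `scipy.stats.t.sf(abs(t_stat), dof)` if SciPy is available).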

Century-scale trends
We begin by calculating century-scale SN using L = 100 yr trends in both T and P. In that case, the unforced trend distribution includes 68 (6800/100) samples, two per control simulation. For each model, we calculate 1900-2000 temperature and precipitation trends for each member of the historical and single-forcing ensembles. Figure 1 shows 100 yr temperature (horizontal axis) and precipitation (vertical axis) signal-to-noise ratios for each ensemble member in the CCSM4, GISS-NINT, and GISS-TCADI single-forcing and historical ensembles. If SN lies above or below the horizontal gray shaded region, the 100 yr precipitation trend differs significantly from climate noise at 99% confidence; SN to the left or right of the vertical gray shaded box indicates a temperature trend significantly different from internal variability. Figure 1 also shows the sum of the mean T and P SN from the single-forcing experiments, along with 99% confidence intervals on the sums estimated from the pooled standard deviation. In CCSM4 and GISS-NINT, ozone forcing alone yields positive 100 yr trends in both T and P. However, these trends are small and, on average, undetectable above noise in either model. By contrast, GHG-forced T and P trends are positive and detectable above noise in all three models. Aerosol-forced T and P trends are negative. Ensemble-average trends in simulations forced with land use changes and natural forcings are undetectable and compatible with zero in all models. Table 1 lists the estimator of the difference between the historical and SUM trends, its standard error, the test statistic, and the p-value for H_add in each model. The results indicate that in all three configurations examined, 1900-2000 historical temperature and precipitation trends are additive across multiple forcings: they do not differ significantly (at the 99% confidence level) from the sum of global-mean, annual-mean T or P trends in the single-forcing experiments.

Additivity at multiple time scales
External radiative forcings over the historical period are neither constant nor monotonic, and different forcings may dominate on different timescales and at different times. We therefore test the additivity hypothesis H_add for sliding 5, 10, 20, 30, 50, and 100 yr trends over time. Figure 2 shows the intervals over which the sum of single-forcing temperature or precipitation trends differs significantly (at the 99% confidence level) from the historical trends.
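The paper does not describe how rejected windows are summarized into intervals, so the helper below is hypothetical: it enumerates every L-year window offset by 1 yr and merges runs of consecutive windows for which a user-supplied `reject(start, end)` callable returns True. In practice that callable would wrap an additivity test applied to the per-member trends in each window; the 1900-2000 span and the toy rejection rule are assumptions for illustration.

```python
def sliding_windows(first_year, last_year, L):
    """All L-year windows (start, end) within [first_year, last_year], offset by 1 yr."""
    return [(s, s + L - 1) for s in range(first_year, last_year - L + 2)]


def rejected_intervals(reject, first_year, last_year, L):
    """Merge runs of consecutive rejected windows into (start, end) spans.

    `reject(start, end)` is any callable returning True when H_add is
    rejected for that window.
    """
    spans = []
    prev_start = None
    for start, end in sliding_windows(first_year, last_year, L):
        if reject(start, end):
            if spans and prev_start == start - 1:
                spans[-1] = (spans[-1][0], end)  # extend the current run
            else:
                spans.append((start, end))       # open a new run
            prev_start = start
    return spans


# Hypothetical rule: pretend H_add is rejected for 30-yr windows ending in 1980 or later.
spans = rejected_intervals(lambda s, e: e >= 1980, 1900, 2000, 30)
print(spans)  # [(1951, 2000)]
```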
In CCSM4 and GISS-NINT, H_add generally cannot be ruled out for long-term T or P trends of 50 yr or longer. In GISS-TCADI, however, H_add is rejected for every consecutive 50 and 30 yr P trend late in the record. The sum of precipitation trends in the single-forcing runs is systematically larger than the simulated late historical precipitation trends in the GISS-TCADI model, which incorporates an interactive chemistry scheme.

Figure 1. [...] significance levels for century-scale trends relative to internal variability are shown as gray boxes. The purple box represents the 99% confidence interval for the sum of single-forcing T and P signal-to-noise ratios. The sum of the mean single-forcing responses is shown as a white circle.

Role of forcings
The strength and relative role of different forcings change with time, and identifying the important forcings at work late in the historical period may help to attribute the precipitation nonlinearities in the GISS-TCADI model. To illustrate this, we focus on 30 yr trends: an important timescale for detection and attribution studies, as it is the approximate length of many satellite datasets. For each single-forcing experiment, and for the historical experiments, we calculate overlapping ensemble-mean 30 yr SN for windows beginning in each year from 1900 to 1975. Figure 3 shows these ratios for T and P, as well as the sum of ensemble-average single-forcing SN, for GISS-NINT and GISS-TCADI.
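A minimal sketch of the overlapping 30 yr calculation follows, assuming each ensemble is a list of equal-length annual global means starting in 1900. Because the least-squares slope is linear in the data, the trend of the ensemble-mean series equals the mean of the members' trends, and SUM can be formed by adding the single-forcing SN window by window. The function names, the noise value, and the synthetic data are hypothetical.

```python
import statistics


def ols_slope(y):
    """Least-squares slope of y against time steps 0, 1, ..., len(y) - 1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def rolling_sn(ensemble, noise, first_year, L=30):
    """Ensemble-mean SN of overlapping L-yr trends, keyed by window start year."""
    mean_series = [statistics.fmean(vals) for vals in zip(*ensemble)]
    return {first_year + i: ols_slope(mean_series[i:i + L]) / noise
            for i in range(len(mean_series) - L + 1)}


def summed_sn(per_forcing):
    """SUM: add the single-forcing ensemble-mean SN window by window."""
    starts = per_forcing[0].keys()
    return {s: sum(d[s] for d in per_forcing) for s in starts}


# Hypothetical two-member ensemble spanning 1900-2004 with a 0.01 per-year trend.
members = [[0.01 * t + 0.1 for t in range(105)], [0.01 * t - 0.1 for t in range(105)]]
sn = rolling_sn(members, noise=0.005, first_year=1900)  # windows start 1900..1975
total = summed_sn([sn, sn])
```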
In both GISS-NINT and GISS-TCADI, anthropogenic aerosols and natural forcings result in negative 30 yr T and P trends. In GISS-NINT, greenhouse gases and ozone forcing lead to positive 30 yr trends in both T and P. Thirty-year precipitation trends in the GISS-NINT ozone-only simulations are largest toward the end of the record. We argue in section 3.4 that this likely reflects the impact of anthropogenic stratospheric ozone depletion, which increases after 1950 and peaks in the 1990s. Because GISS-TCADI is forced by emissions rather than concentrations, it has no 'ozone-only' simulation comparable to those of the NINT configuration. Instead, ozone changes are calculated as the model runs, from emissions of tropospheric reactive gases (included in the ARG simulations) and ozone-depleting substances (included in the LLGHG simulations). The resulting ozone concentrations in the historical (all-forcing) TCADI experiment were extensively compared to observations in [18], and they do not differ substantially from the ozone concentrations used in the GISS-NINT historical experiment. The 30 yr T and P trends in the GISS-TCADI LLGHG and ARG experiments are positive later in the historical record, with a marked increase in LLGHG 30 yr precipitation trends.
In GISS-NINT, the historical signal-to-noise ratios for both T and P track the sum of the single-forcing SN fairly well. This is not the case for GISS-TCADI, where there are large differences in precipitation trends, particularly toward the end of the record. In GISS-TCADI, the sum of 30 yr P trends ending in years from 1975 to ca. 2000 exceeds the historical P trends over the same windows, and the difference is significant at the 99% level for most of this period (figure 2(f)). However, there are no significant differences between the historical and SUM T trends in this model. Notably, these differences in precipitation trends (but not in temperature trends) emerge and persist during the period in which stratospheric ozone depletion is expected to be increasing.

Role of ozone
The precipitation response to external forcing is constrained by the availability of atmospheric water vapor [22], by the ability of the troposphere to radiatively balance the release of latent heat [19-21], and by the energy available to drive evaporation at the Earth's surface [23-25]. Different forcings change the tropospheric and surface energy balances in different ways. Continuously increasing greenhouse gas emissions, for example, result in smaller precipitation increases than might be expected from the increase in surface temperature alone [26,27]. This is because GHGs heat the atmospheric column, reducing the ability of the atmosphere to balance the latent heat released by precipitation with radiative cooling. Stratospheric ozone depletion, by contrast, both reduces instantaneous longwave forcing and increases the shortwave flux reaching the surface [18]. This increases the energy available to drive evaporation at the surface, which is in turn balanced by increased precipitation.
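The atmospheric constraint described above is often summarized with a schematic global-mean energy budget. The relation below is a standard textbook form rather than an equation from this paper; it assumes a global, long-term mean, in which horizontal energy transport vanishes and atmospheric energy storage is negligible, so that latent heating from precipitation plus surface sensible heating must balance the net radiative cooling of the atmospheric column:

$$L_v\,\Delta P \approx \Delta R_{\mathrm{atm}} - \Delta \mathrm{SH}$$

Here L_v is the latent heat of vaporization, ΔP the change in global-mean precipitation, ΔR_atm the change in net atmospheric radiative cooling, and ΔSH the change in surface sensible heat flux. Forcing agents that absorb within the column, such as GHGs, reduce ΔR_atm and so damp the precipitation increase for a given surface warming.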
In earlier GISS model simulations, low-latitude P was found to be sensitive to ozone-induced temperature changes near the tropopause [28]. Reference [18] found that ozone depletion and GHG increases affect P differently, and that P changes do not necessarily follow surface T trends. As noted above, nonlinearities in precipitation trends in the GISS-TCADI model appear during the height of historical stratospheric ozone depletion. In this section, we argue that this is due to differing ozone depletion rates in the GISS-TCADI historical and SUM ensembles, differences attributable to interactions between emissions that are not captured in the single-forcing simulations.
If rates of ozone depletion differ between the TCADI historical and SUM ensembles, statistically significant differences in the vertical profile of temperature trends should appear and could at least partially explain the differences in global precipitation trends. To test this, we calculate 1970-2000 global-mean, annual-mean temperature trends for each single-forcing ensemble at every model vertical level in GISS-NINT (figure 4(a)) and GISS-TCADI (figure 4(b)). Confidence intervals are then calculated for the resulting sums of trends at each vertical level and compared to the average trends in the historical ensembles. In both GISS-NINT and GISS-TCADI, the 99% confidence intervals overlap at pressures greater than 200 mb (i.e., altitudes below the 200 mb level). However, we note a significantly larger cooling trend aloft in the sum of GISS-TCADI single-forcing simulations than in the GISS-TCADI historical ensemble. No such difference is apparent in the non-interactive GISS-NINT simulations.
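The level-by-level comparison can be sketched as follows. This illustration rests on assumptions not stated in the paper: normal-theory 99% intervals built from each ensemble's standard error, with the variance of SUM taken as the sum of the single-forcing variances of the mean (i.e., independent ensembles). The pressure levels and trend values are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance

Z99 = NormalDist().inv_cdf(0.995)  # two-tailed 99% critical value, about 2.576


def mean_ci(members, z=Z99):
    """Confidence interval for one ensemble-mean trend."""
    half = z * math.sqrt(variance(members) / len(members))
    return fmean(members) - half, fmean(members) + half


def sum_ci(ensembles, z=Z99):
    """Confidence interval for the sum of several independent ensemble means."""
    centre = sum(fmean(e) for e in ensembles)
    half = z * math.sqrt(sum(variance(e) / len(e) for e in ensembles))
    return centre - half, centre + half


def non_overlapping_levels(hist_by_level, singles_by_level):
    """Pressure levels (mb) where the historical and SUM intervals do not overlap."""
    flagged = []
    for p_mb, hist in hist_by_level.items():
        lo_h, hi_h = mean_ci(hist)
        lo_s, hi_s = sum_ci(singles_by_level[p_mb])
        if hi_h < lo_s or hi_s < lo_h:
            flagged.append(p_mb)
    return flagged


# Hypothetical 1970-2000 trends (K per decade) at two levels, five members each.
hist = {500: [0.1, 0.2, 0.3, 0.2, 0.2], 10: [-0.5, -0.6, -0.55, -0.5, -0.6]}
singles = {
    500: [[0.05, 0.1, 0.15, 0.1, 0.1]] * 2,
    10: [[-0.5, -0.51, -0.49, -0.5, -0.5]] * 2,
}
print(non_overlapping_levels(hist, singles))  # [10]
```

Checking interval overlap is conservative; a direct test on the difference (as for H_add) would have more power, but the overlap check mirrors the comparison described in the text.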
In GISS-NINT (as well as CCSM4), SUM ozone concentrations are identical to those in the historical ensemble by construction. In GISS-TCADI, however, ozone concentrations are calculated from specified emissions of ozone-depleting substances (such as CFCs) and tropospheric precursor gases (such as CH4). Figure 4(c) shows 1970-2000 trends in the odd-oxygen (Ox) mass in GISS-TCADI, which is dominated by ozone changes. While historical and SUM trends are similar below the tropopause, they differ significantly in the stratosphere.
We suggest that these differences may be attributable to nonlinear interactions between chemical compounds related to individual forcing agents. For example, nitrogen oxides (NOx) play a dominant role in the ozone chemistry of the lowermost stratosphere. They are primarily produced in the stratosphere from nitrous oxide (N2O; included in the LLGHG simulation), but NOx and NOy can also enter the stratosphere from the troposphere, where their anthropogenic sources are largely related to direct emissions (included in the ARG simulation). While NOx can catalytically destroy ozone in the lower stratosphere, it may also increase ozone by removing hydrogen or chlorine oxides through the formation of reservoir species. Hence the impact of additional NOx can be highly non-additive, depending on how the background abundance compares with the sequestration capacity [29]. Other interactions may also play a role: for example, methane (included in the ARG simulations) may react with chlorine released from halocarbons (in the LLGHG simulations), reducing ozone depletion [30], and the NOx removal rate may be affected by nonlinearities in hydrogen oxide abundances related to the nonlinear water vapor response to the tropopause 'cold-trap' temperature. These interactions are possible in the GISS-TCADI historical ensemble but are not present in the single-forcing simulations. It is also conceivable that other factors contribute to the differences between GISS-TCADI SUM and historical trends; for example, the improved meteorological coherence of these tracers in the interactive simulations, relative to the non-interactive NINT simulations, may play a role [31].

Figure 2. Time intervals (x-axis) over which H_add is rejected at 99% confidence for global-average, annual-average T and P trends of varying length. The color indicates the magnitude of the difference between the ensemble-mean historical SN and the sum of the ensemble-mean single-forcing SNs.

Conclusions
Single-forcing climate model simulations allow us to perform sensitivity tests in order to attribute phenomena in the real world. If the climate response to multiple forcings is not additive, these inferences may be undermined, complicating attribution studies. Our results indicate that in models without interactive chemistry, global-average, annual-average temperature and precipitation responses at multiple timescales are generally additive across forcings. Additionally, we find no significant nonlinearities in any model when century-scale (1900-2000) T and P trends are considered. However, in the GISS-TCADI model, which incorporates interactive chemistry, the sums of single-forcing post-1950 precipitation trends of varying lengths are generally larger than precipitation trends in the historical ensemble. These precipitation differences are accompanied by differences in ozone concentrations and temperatures aloft, but not necessarily by large differences in surface temperatures. We suggest that the differences between historical and SUM trends arise from nonlinear interactions between chemical species that are captured in the historical experiment but absent from the single-forcing simulations. Reduced ozone depletion in the historical run (compared to SUM) may make the instantaneous radiative forcing due to ozone less negative. This, in turn, alters the atmospheric energy balance, leading to differences in the response of precipitation to temperature.

Figure 3. The ensemble-mean SN for each single forcing is shown as a colored bar. Single-forcing ensemble-mean SNs are stacked to show relative differences between trends. The sum of single-forcing mean SNs is shown in purple; the mean historical SNs are shown in black. Least-squares linear SN trends are calculated over 30 yr windows that overlap by all but 1 yr; that is, the first trend is over 1900-1929, the second over 1901-1930, and so on.
Thus, our results indicate that, at least for the GISS model, interactive chemistry captures interactions between radiative forcing agents, and that these interactions have consequences for important variables even at the largest scales.
Our conclusions are necessarily limited by our reliance on existing CMIP5 data. Single-forcing model experiments were not a high priority for many modeling groups, and the sparse nature of the CMIP5 'historicalMisc' archive limits our ability to examine the additivity of climate responses in a multi-model framework. We hope these results will encourage more modeling groups to perform complete suites of single-forcing experiments.