Global mean temperature indicators linked to warming levels avoiding climate risks

International climate policy uses global mean temperature rise limits as proxies for societally acceptable levels of climate change. These limits are informed by risk assessments which draw upon projections of climate impacts under various levels of warming. Here we illustrate that indicators used to define limits of warming and those used to track the evolution of the Earth System under climate change are not directly comparable. Depending on the methodological approach, differences can be time-variant and up to 0.2 °C for a warming of 1.5 °C above pre-industrial levels. This might lead to carbon budget overestimates of about 10 years of continued year-2015 emissions, and about a 10% increase in estimated 2100 sea-level rise. Awareness of this definitional mismatch is needed for a more effective communication between scientists and decision makers, as well as between the impact and physical climate science communities.


Introduction
Many climate change impacts relevant for societies scale with global mean surface air temperature (GMT) rise (Seneviratne et al 2016, UNFCCC 2015b), making it an adequate proxy for the assessment of global climate change risks (Knutti et al 2015). International climate policy has adopted levels of global mean temperature increase to guide global climate action. The most prominent example of such temperature rise levels is the long-term temperature goal of the UN Paris Agreement of 'holding the increase in the global average temperature to well below 2 °C above pre-industrial levels and pursuing efforts to limit the temperature increase to 1.5 °C above pre-industrial levels, recognizing that this would significantly reduce the risks and impacts of climate change' (UNFCCC 2015a, Schleussner et al 2016b). The second part of the goal provides highly relevant context as it explicitly links the temperature levels referenced in the Paris Agreement to the assessment of climate risks and impacts.
The adoption of global average temperature levels to avoid climate risks has been informed by a multi-year science-policy process (UNFCCC 2015b), which was predominantly based on the findings of the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC 2014). Its products, such as the 'reasons for concern' (O'Neill et al 2017), link various climate risks to levels of GMT increase. The warming levels at which these risks emerge depend on the method that underlies the global average temperature estimation, which ties them to the methods used in the scientific basis of the underlying risk assessment, the AR5. This context is key for scientists to understand how to interpret the Paris long-term temperature goals (Rogelj et al 2017).
With the Paris Agreement in place, international policy has shifted focus from defining its goals to implementing and tracking progress towards their achievement. Monitoring GMT rise has thus become a key component of assessments of whether climate mitigation actions are on track to achieve the Paris temperature goal, for example in terms of carbon budgets (IPCC 2014, Rogelj et al 2016). Clarity on how global mean temperature is assessed is essential for this process. However, there is no single established and agreed-upon method to assess GMT change. In particular, substantial differences emerge between observation-based and model-based GMT products (Richardson et al 2016, Cowtan et al 2015). Here we assess the implications of discrepancies between different GMT products for our ability to track progress towards the Paris Agreement temperature goal. To this end, we evaluate different methodological approaches to GMT that have been used for policy-relevant statements, carbon budget estimates or for resulting climate impacts.
The IPCC AR5 determines global mean temperature (hereinafter referred to as GMT AR5) relative to the 1986-2005 period. Past warming since the 1850-1900 preindustrial reference period is 0.61 °C based on the HadCRUT4 observational dataset (Morice et al 2012). Future warming relative to preindustrial is defined as the sum of past warming and the CMIP5 climate model ensemble mean relative to the 1986-2005 baseline (IPCC 2014). For carbon budget estimates the IPCC AR5 Working Group 1 uses the model-based global mean surface air temperature increase (hereafter GMT SAT) since the 1861-1880 period from the Coupled Model Intercomparison Project (CMIP5) (IPCC 2013). Note that carbon budgets have been assessed slightly differently in different working groups and subsequent publications (see Rogelj et al (2016) for an overview).
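As a minimal sketch of the GMT AR5 construction described above, the following Python snippet adds the observed past warming (0.61 °C) to a model anomaly taken relative to the 1986-2005 baseline. The function name `gmt_ar5` and the synthetic `toy_model` series are illustrative assumptions and not part of the original analysis. The AR5 applies this sum to the CMIP5 ensemble mean, whereas the sketch handles a single series.

```python
from statistics import mean

# Observed warming of 1986-2005 relative to 1850-1900 (HadCRUT4), as quoted in the text.
OBSERVED_WARMING_1986_2005 = 0.61  # degrees C


def gmt_ar5(model_sat, baseline=(1986, 2005), past_warming=OBSERVED_WARMING_1986_2005):
    """Return {year: warming since preindustrial} as past warming plus model anomaly.

    model_sat maps year -> modelled global mean surface air temperature;
    any constant offset cancels because only anomalies to the baseline are used.
    """
    first, last = baseline
    base = mean(model_sat[y] for y in range(first, last + 1))
    return {year: past_warming + (value - base) for year, value in model_sat.items()}


# Synthetic, purely illustrative series: linear warming of 0.03 degrees C per year.
toy_model = {year: 14.0 + 0.03 * (year - 1986) for year in range(1986, 2051)}
toy_gmt_ar5 = gmt_ar5(toy_model)
```

By construction, the 1986-2005 mean of the resulting series equals the observed past warming, so the method inherits any mismatch between modelled and observed warming only after the baseline period.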
The HadCRUT4 observational GMT product is based only on regions for which observational data exist. Parts of the rapidly warming Arctic, for example, are undersampled (Cowtan et al 2015). Furthermore, surface air temperatures over land and sea ice are blended with sea surface temperatures over the open ocean. In contrast, CMIP5-model-based global mean temperature is derived with global coverage and based on surface air temperatures (SAT) alone. These methodological differences between observation-based and model-based GMT have been shown to introduce considerable discrepancies (Richardson et al 2016, Cowtan et al 2015) and to be partly responsible for the mismatch between the observational record and model projections over the recent decade (Medhaug et al 2017). Correcting for discrepancies between the HadCRUT4 and infilled datasets also affects the warming level of the 1850-1900 period (Richardson et al 2016, Cowtan et al 2015). In the following, we investigate the implications of using non-AR5 GMT products for tracking progress against Paris Agreement warming levels, for carbon budgets as well as for climate impact indicators.

Methods
Based on the method of Richardson et al (2016) and Cowtan et al (2015), we have derived a model-based GMT estimate that has been corrected for masking and blending as in the HadCRUT4 observational record (GMT blend−mask). We use an ensemble of 32 CMIP5 models forced with the RCP8.5 scenario (see table S1 available at stacks.iop.org/ERL/13/064015/mmedia). For each GCM, all runs are averaged to one global mean temperature time series.
This GMT product can be considered a proxy for future observations if the HadCRUT4 approach to derive GMT is continued. Assessments of future GMT could also be rebased to the observational warming record since 1986−2005. This has been done recently, for example, by Millar et al (2017), using a human-induced warming until 2015 of 0.93 °C based on HadCRUT4 (GMT M17). The future warming difference for rebased products like GMT M17 depends solely on the offset to GMT SAT over the rebase period. An overview of the different GMT products is given in table 1. As other observational datasets show higher warming than HadCRUT4 over this period (Rohde et al 2013, Cowtan and Way 2014, Hansen et al 2010), we also included two sensitivity cases assuming an attributable warming of 1 °C and 1.1 °C until 2015 (Haustein et al 2017) (table S4). Conversions between different GMT products are based on the 20 year running mean values from the CMIP5 models which are closest to 1.5 °C in the source GMT product.
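The conversion step can be sketched as follows for a single model, assuming both GMT products are available as annual series over the same contiguous years. The helper names `running_mean` and `convert_level`, the labelling of each window by its first year, and the synthetic series are illustrative assumptions; the toy target product differs from the source by a constant 0.18 °C offset, mimicking a rebased product.

```python
def running_mean(series, window=20):
    """Map the first year of each complete window to the mean over that window."""
    years = sorted(series)
    out = {}
    for i in range(len(years) - window + 1):
        chunk = years[i:i + window]
        out[chunk[0]] = sum(series[y] for y in chunk) / window
    return out


def convert_level(source, target, level=1.5, window=20):
    """Return (window start year, target warming) where the source is closest to `level`."""
    src = running_mean(source, window)
    tgt = running_mean(target, window)
    best = min(src, key=lambda y: abs(src[y] - level))
    return best, tgt[best]


# Synthetic series (not CMIP5 output): linear warming and a constant offset of 0.18 degrees C.
toy_source = {y: 0.005 + 0.02 * (y - 1950) for y in range(1950, 2101)}
toy_target = {y: v - 0.18 for y, v in toy_source.items()}
start_year, converted = convert_level(toy_source, toy_target)
```

Applying this per model and then averaging the converted values over the models gives a multi-model estimate; for products whose difference grows with warming, the converted value depends on where in the series the 1.5 °C level is reached.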
The intensity of hot extremes is measured as the annual maximum value of daily maximum temperature (TXx). Following Fischer and Knutti (2014), we derive grid-cell based time averaged differences between the 1986−2005 reference period and model specific 21 year periods with a mean warming above 1986−2005 of 0.89 °C for GMT AR5 and 1.07 °C for GMT M17. The mid-years of the 21 year periods are listed in table S5. These differences are aggregated in a spatial probability density function (PDF) over the global land mass and all models, area-weighting each grid cell. The smoothed PDFs are estimated using a weighted Gaussian kernel density estimation method with a bandwidth estimated following 'Silverman's rule'. Sea level rise projections for different warming levels are derived using a component-based approach (Mengel et al 2018) with an updated Antarctic ice sheet contribution (Nauels et al 2017). The updated method emulates a recently proposed and more sensitive Antarctic response to future warming (Deconto and Pollard 2016).
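A weighted Gaussian kernel density estimate of this kind might look like the sketch below. It is not the implementation used for figure 2: the bandwidth formula (a one-dimensional Silverman-type rule using the weighted standard deviation and an effective sample size), the cosine-of-latitude area weights and the six grid cells are all assumptions made for illustration.

```python
import math


def weighted_kde(values, weights):
    """Return a function pdf(x) from a weighted Gaussian kernel density estimate."""
    total = sum(weights)
    w = [wi / total for wi in weights]                  # normalise weights to sum to 1
    mu = sum(wi * v for wi, v in zip(w, values))
    var = sum(wi * (v - mu) ** 2 for wi, v in zip(w, values))
    n_eff = 1.0 / sum(wi ** 2 for wi in w)              # effective sample size
    h = (4.0 / (3.0 * n_eff)) ** 0.2 * math.sqrt(var)   # Silverman-type bandwidth (assumed variant)

    def pdf(x):
        norm = h * math.sqrt(2.0 * math.pi)
        return sum(wi * math.exp(-0.5 * ((x - v) / h) ** 2) for wi, v in zip(w, values)) / norm

    return pdf


# Hypothetical land grid cells: (latitude in degrees, change in TXx in degrees C).
cells = [(-40.0, 0.6), (-10.0, 0.9), (5.0, 1.0), (30.0, 1.2), (50.0, 1.5), (70.0, 2.1)]
area_weights = [math.cos(math.radians(lat)) for lat, _ in cells]
txx_pdf = weighted_kde([dtxx for _, dtxx in cells], area_weights)
```

Because the weights are normalised, the estimated density integrates to one, so the area under the curve between two TXx changes can be read as the land fraction experiencing changes in that range.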

Results
The discrepancies between the different GMT products and GMT AR5 are displayed in figure 1. Deviations are smallest for GMT SAT, as modelled CMIP5 1986-2005 warming since 1861−1880 matches well with the HadCRUT4 reconstruction. This is remarkable, as considerable differences exceeding 0.1 °C between blended-masked GMT and surface air temperature-only products are already apparent for this period (Richardson et al 2016). This close match, which is coincidental, might be one of the reasons why the methodological difference between observed and modelled GMT products has not risen to larger prominence before. As a result, deviations that result from the mix of products for the AR5 impact appraisals and the AR5 carbon budget estimates are small. Deviations for the GMT blend−mask and the GMT M17 products are more pronounced. A 1.5 °C global mean temperature rise in GMT AR5 corresponds to a warming of just 1.31 °C in the GMT blend−mask product (full ensemble range: 0.85 °C−1.77 °C, compare figure 1(a) and table 2). The difference between GMT blend−mask and GMT AR5 is not constant in time (Richardson et al 2016) and increases with increasing warming (figure 1(b)). It is largely introduced by undersampling of fast warming Arctic regions and sea-ice loss. As a result, the future strength of this effect will depend on the emission scenario and will be less pronounced under stringent mitigation scenarios (Richardson et al 2018). A substantial discrepancy between model-based GMT SAT and GMT blend−mask is already apparent over the observational record and particularly pronounced in recent decades. Rebasing the reference period as in GMT M17 introduces a time-invariant offset. In this case, a GMT AR5 1.5 °C warming corresponds to 1.32 °C in GMT M17.
Differences between GMT products are sensitive to the choice of method. The difference between GMT products with recent reference periods (GMT AR5 and GMT M17) and GMT products referenced against a preindustrial period (GMT blend−mask and GMT SAT) depends on the choice of the preindustrial period (Hawkins et al 2017). Setting the preindustrial period to 1850−1900, for example, slightly reduces the difference between GMT blend−mask and GMT AR5 (see figure S3). Furthermore, differences depend on the method used to convert between GMT products. For example, basing the conversion into GMT AR5 on annual mean temperatures within a range of 1.5 °C ± 0.05 °C in the source GMT product (instead of analysing 20 year running mean values close to 1.5 °C) yields slightly lower differences between GMT AR5 and both GMT M17 and GMT blend−mask (see figure S5). Finally, these results depend on how the 'multi-model mean' is defined (see figure S2). If all available model runs are weighted equally instead of weighting contributions per model, GMT M17 and GMT blend−mask would both reach +1.7 °C at the time when GMT AR5 reaches +1.5 °C (see figure S4). All conversions between GMT products and choices in the method are listed in tables S2 and S3.

Table 1 (continued).
GMT M17: Analogous methodological approach to GMT AR5, but updates the observed warming to the 2010-2019 period as in Millar et al (2017). Past warming until that period is set to the observed attributable warming until 2015 based on HadCRUT4 (0.93 °C).
GMT Obs 1 °C: Analogous methodological approach to GMT M17, but assumes a past attributable warming until 2015 of 1 °C.
GMT Obs 1.1 °C: Analogous methodological approach to GMT M17, but assumes a past attributable warming until 2015 of 1.1 °C.
GMT blend−mask: Following the methodological approach of Cowtan et al (2015): this method includes regridding the CMIP5 model output to a 5° × 5° grid, blending surface air temperatures (tas) and surface ocean temperatures (tos) in grid points partly covered by sea ice, and masking the model output to the observational coverage of HadCRUT4. For this method, observational coverage is required. We assume that future observation coverage stays similar to the coverage of the years 1986-2016. For each month, we treat a grid cell as covered by observations if in 20 out of the 30 years of 1986-2015 observations were available for the given month. Observational coverage was taken from the CRUTEM4 and HadSST3 datasets.
GMT blend: Analogous methodological approach to GMT blend−mask, but using global coverage and regridding model output to a […]
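The coverage rule used for GMT blend−mask (a grid cell counts as observed in a given month if observations exist for that month in 20 of the 30 years 1986-2015) can be sketched as follows. The data layout, the function names `coverage_mask` and `masked_mean`, the reading of the rule as 'at least 20 years' and the two-cell toy world are illustrative assumptions; regridding and the blending of air and sea surface temperatures are not covered.

```python
def coverage_mask(available, cells, years=range(1986, 2016), min_years=20):
    """Return {(month, cell): bool} following the 20-out-of-30-years rule.

    available is a set of (year, month, cell) tuples for which an observation exists.
    """
    mask = {}
    for month in range(1, 13):
        for cell in cells:
            count = sum((year, month, cell) in available for year in years)
            mask[(month, cell)] = count >= min_years
    return mask


def masked_mean(field, mask, month, area):
    """Area-weighted mean of `field` over the cells treated as covered in `month`."""
    covered = [c for c in field if mask[(month, c)]]
    total_area = sum(area[c] for c in covered)
    return sum(field[c] * area[c] for c in covered) / total_area


# Toy world with two cells: the tropics are always observed, the Arctic only in 1986-1995.
toy_cells = ["tropics", "arctic"]
available = {(y, m, "tropics") for y in range(1986, 2016) for m in range(1, 13)}
available |= {(y, m, "arctic") for y in range(1986, 1996) for m in range(1, 13)}
toy_mask = coverage_mask(available, toy_cells)

warming = {"tropics": 1.0, "arctic": 3.0}   # illustrative warming in degrees C
area = {"tropics": 0.9, "arctic": 0.1}      # illustrative area fractions
masked = masked_mean(warming, toy_mask, month=1, area=area)
full = sum(warming[c] * area[c] for c in toy_cells)
```

In this toy case the masked mean is lower than the full-coverage mean because the fast-warming Arctic cell is excluded, which is the direction of the effect discussed for GMT blend−mask.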
The choice of GMT indicators for expressing current and future warming can influence how much carbon emissions are perceived to remain for limiting warming to internationally agreed levels such as 1.5 °C (see table 2). As indicated earlier, international temperature goals have been underpinned by climate risk assessments pegged to GMT AR5 levels of global mean warming. Warming in the real world, however, is expressed in observation-based indicators. Our GMT blend−mask time series aims to mimic the limitations of one commonly used indicator (HadCRUT4 (Morice et al 2012)). Consistent with Schurer et al (2018) and Richardson et al (2018), we estimate a mismatch between GMT AR5 and GMT blend−mask of about 0.2 °C at the time GMT AR5 reaches 1.5 °C (GMT blend−mask is cooler, see table 2). At the moment GMT AR5 reaches 1.5 °C, the remaining carbon budget for avoiding the assessed impacts of 1.5 °C warming should be effectively zero. However, because of the mismatch between GMT blend−mask and GMT AR5, a GMT blend−mask indicator would continue to suggest a remaining available budget of about 422 Gt CO2 at that point in time (using an average transient climate response to cumulative emissions of carbon of 1.65 × 10⁻³ °C/Gt C). This amounts to a carbon budget overestimate equivalent to about 10 years of continued year-2015 emissions. An adjustment of similar size would be required to make recently published carbon budget estimates (GMT M17, calculated as in Millar et al (2017)) consistent with the assessed warming levels for avoiding global warming risks (table 2).
Reaching 1.5 °C in GMT M17 or GMT blend−mask (here considered a proxy for expected observational warming) would correspond to climate risks at higher temperature levels when following the AR5 method. These levels are 1.68 °C for GMT M17 and 1.71 °C for GMT blend−mask (see table 2). Several highly vulnerable systems such as tropical coral reefs (Schleussner et al 2016a) or Arctic sea-ice (Screen and Williamson 2017) are very sensitive to small warming increments. Extreme weather indicators have also been found to increase robustly with increasing GMT SAT (Seneviratne et al 2016), and threshold-based indices even do so in a non-linear fashion (Fischer and Knutti 2015). Figure 2 illustrates how the different GMT products (GMT AR5 and GMT M17) lead to different projected changes in global extreme hot day temperatures (TXx, figure 2(a)) and 2100 sea-level rise (figure 2(b)).
The intensification of extreme hot days is stronger for the 1.68 °C GMT AR5 warming that is reached when GMT M17 reaches 1.5 °C than for 1.5 °C GMT AR5 warming. At 1.68 °C GMT AR5 warming, 40% of the land area experiences an increase in the annual maximum daily temperature of 1 °C relative to 1986−2005, while at 1.5 °C GMT AR5 warming only 30% of the land area would experience this increase. Similarly, the difference between 1.68 °C and 1.5 °C GMT AR5 warming could lead to an additional sea level rise of 5 cm in the median for the end of the century, about 10% of the projected median rise for 1.5 °C relative to the 1986−2005 period. Note that future sea level rise exhibits a considerable dependency on the temperature trajectory, and projections for Paris Agreement compatible pathways would therefore deviate slightly from the stylised estimates presented here (Mengel et al 2018).
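The carbon budget mismatch quoted in this section can be reproduced with a few lines of arithmetic. The sketch below uses the multi-model mean temperatures and the TCRE given in the text; the carbon-to-CO2 mass ratio of 44.01/12.011 and the variable names are assumptions added for illustration.

```python
C_TO_CO2 = 44.01 / 12.011   # mass ratio of CO2 to carbon (assumed conversion factor)
TCRE = 1.65e-3              # degrees C per Gt C, the value quoted in the text


def budget_offset_gtco2(temperature_gap, tcre=TCRE):
    """Convert a temperature mismatch (degrees C) into an apparent budget in Gt CO2."""
    return temperature_gap / tcre * C_TO_CO2


# GMT AR5 at 1.5 degrees C corresponds to 1.31 degrees C in GMT blend-mask (multi-model mean).
gap = 1.5 - 1.31
offset = budget_offset_gtco2(gap)

# The statement of 'about 10 years' implies annual emissions of roughly offset / 10.
implied_annual_emissions = offset / 10
```

The result rounds to 422 Gt CO2, and dividing by 10 years implies annual emissions of roughly 42 Gt CO2, consistent with the statement about continued year-2015 emissions.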

Discussion
Our analysis outlines important differences between GMT products and illustrates their implications for climate risk assessment. We have shown that by using GMT products other than those used in the IPCC AR5, risks identified for a certain level of global warming in this report would occur at other levels. The quantified discrepancies between observationally derived GMT products and climate change risk levels as expressed in international agreements have important consequences for on-going discussions in the climate policy arena. Climate action is guided by the desire to avoid impacts, not by reaching an imaginary GMT number. If the impacts policy makers aim to avoid (as indicated in the Paris Agreement) will occur at lower levels in other GMT products, then science needs to communicate this clearly and ideally provide adequate adjustments. In order to limit potential confusion, this requires an understanding of both the identified discrepancies between GMT products and the nature of the Paris Agreement temperature goal (Rogelj et al 2017). Indeed, the discrepancy between observed GMT products and GMT AR5 will not be easy to reconcile and communicate.
It is important to clarify that our argument is not rooted in scientific reasoning in favour of the IPCC AR5 method, which is not without shortcomings and ambiguities. The 1986−2005 reference period, for example, is not free of influences of natural variability (like volcanic eruptions). Climate models used for projecting future warming are not systematically evaluated and may already exhibit substantial deviations compared to observed present-day warming (see figure 1). Furthermore, the effect of different definitions of the 'pre-industrial level' needs to be considered (Hawkins et al 2017). At the same time, scientists will continue to use observed GMT products to assess the state of the climate system. Approaches to assess GMT will be, and should be, updated as our scientific understanding progresses. To ensure the policy relevance of future products in relation to the Paris Agreement and to maintain the agreement's integrity, it is therefore of key importance that different (updated) GMT metrics can be converted into the GMT AR5 values used at the time, and that full transparency is provided about methods, as we have attempted here. This also relates to other methodological choices in the IPCC AR5 such as the use of multi-model means instead of medians or the 'one-model-one-vote' principle (Flato et al 2013). Deviating from this approach by averaging over all available model runs yields slightly different estimates for the biases between the GMT products (compare figure S4).
Under the UNFCCC, climate policy now progresses in quinquennial cycles which include a stocktaking phase and a phase in which governments put forward new proposed actions to limit climate change. If, during the stocktaking phase, current progress and the current state of the Earth system are not assessed with metrics comparable and consistent with the metrics used to define the Paris Agreement long-term temperature goal, the assessment of progress will be imprecise, and, as we have shown, the risk of incurring rather than avoiding some particularly sensitive climate impacts would increase. In the context of the 1.5 °C and 2 °C global average temperature limits, our results show that following practices based on observational products (Millar et al 2018) would consistently lead to an underestimation of the urgency of emissions reductions (Schurer et al 2018, Richardson et al 2018).
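The difference between the 'one-model-one-vote' principle and equal weighting of all model runs can be illustrated with a short sketch. The model names, run counts and warming values are hypothetical and chosen only to show that the two averages differ when models contribute unequal numbers of runs.

```python
from statistics import mean


def one_model_one_vote(runs_by_model):
    """Average the runs of each model first, then average across models."""
    return mean(mean(runs) for runs in runs_by_model.values())


def all_runs_equal(runs_by_model):
    """Pool all runs and give each run the same weight."""
    return mean(value for runs in runs_by_model.values() for value in runs)


# Hypothetical warming values (degrees C) for three models with 4, 1 and 2 runs.
toy_ensemble = {
    "model_A": [1.2, 1.3, 1.4, 1.3],
    "model_B": [1.8],
    "model_C": [1.5, 1.7],
}
per_model = one_model_one_vote(toy_ensemble)
per_run = all_runs_equal(toy_ensemble)
```

Here the pooled average is pulled towards model_A, which contributes the most runs, so any quantity derived from the ensemble mean (including the conversions between GMT products) shifts with the weighting choice.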

Figure 1. Mismatch between global mean temperature products. (a) Warming in alternative products (GMT alt, y-axis) as compared to GMT AR5 (x-axis) for 32 CMIP5 models. Triangles show 20 year running mean warming from individual CMIP5 models in the respective products. The warming ranges in GMT AR5 are derived from a distribution of GMT AR5 values of the years in which each model's GMT alt is closest to 1.5 °C (highlighted by stars). Boxplots show the full model spread (whiskers), the 66% range (boxes) and the multi-model mean (white bar). In the legend the multi-model mean and the full ensemble range are indicated. (b) Warming-dependent differences between GMT blend−mask and GMT AR5. As in panel (a), boxplots represent the model spread at selected warming levels in GMT AR5 (+1 °C, +1.5 °C, +2 °C and +2.5 °C).

Table 2. Conversions between 1.5 °C warming levels in different GMT products and carbon budget implications. Temperatures are given in °C since preindustrial (see table 1). Multi-model mean and the full ensemble ranges in brackets are derived as in figure 1 for the 20 year running mean values closest to 1.5 °C in the respective GMT product. Carbon budget estimates are based on a TCRE of 1.65 °C / 1000 PgC, the arithmetic mean of the IPCC AR5's likely 0.8 to 2.5 °C / 1000 PgC range (Collins et al 2013), and assume invariable non-CO2 contributions. Positive values indicate an increasing budget in the alternative GMT product compared to GMT AR5.

Figure 2. Differences in climate hazards at 1.5 °C in different GMT products. (a) Changes in hot extremes (TXx) on global land area at 1.5 °C relative to the 1986−2005 reference period based on Fischer and Knutti (2014). Probability density functions show the globally aggregated land fraction that experiences a certain change in TXx for GMT AR5 (green) and GMT M17 corresponding to 1.68 °C warming in GMT AR5 (blue). The shaded areas show the range of land fraction PDFs of the individual models. (b) Sea level rise in 2100 relative to 1986−2005 with uncertainty, based on Mengel et al (2018). Boxes indicate the 66% range, the white bar the median. Projections are given for 1.5 °C warming in the GMT AR5 (green) and GMT M17 product (blue).

Table 1. Overview of the different GMT computation methods.
GMT SAT: Global mean surface air temperature increase in CMIP5 models relative to 1861−1880.
GMT AR5: IPCC AR5 method: global mean temperatures relative to preindustrial levels are obtained by adding model-based GMT SAT anomalies relative to the 1986-2005 reference period to the observed warming up to this period from the HadCRUT4 dataset relative to 1850-1900 (0.61 °C).