Changes in a suite of indicators of extreme temperature and precipitation under 1.5 and 2 degrees warming

Following the 2015 Paris agreement, the Intergovernmental Panel on Climate Change was tasked with assessing, in a special report, climate change impacts and mitigation options for a world that limits warming to 1.5 °C. To aid the scientific assessment process, three low-warming ensembles were generated over the 21st century based on the Paris targets using the NCAR-DOE community model, CESM1-CAM5. This study used those simulation results and computed ten extreme climate indices, from definitions created by the Expert Team on Climate Change Detection and Indices, to determine if the three different scenarios cause different intensity and frequency of extreme precipitation or temperature over the 21st century. After computing the indices, statistical tests were used to determine if significant changes affect their characteristics. It was found that at the grid point level significant changes emerge in all scenarios, for nearly all indices. The temperature indices show widespread significant change, while the behavior of precipitation indices reflects the larger role that internal variability plays, even by the end of the century. Nonetheless, differences can be assessed, in substantial measure for many of these indices: changes in nearly all indices have a strong correlation to global mean temperature, so that scenarios and times with greater temperature change experience greater index changes for many regions. This is particularly true of the temperature-related indices, but can also be assessed in some regions for the indices related to precipitation intensity. These results thus show that even for scenarios that are separated by only half of a degree in global average temperature, the statistics of extremes are significantly different.


Introduction
In December 2015, the Paris agreement set the goal of keeping warming below 2 °C above pre-industrial levels, while targeting 1.5 °C of warming relative to pre-industrial levels. The Intergovernmental Panel on Climate Change is currently preparing a special report to assess climate change impacts and mitigation options for a 1.5 °C world. Considering this, simulations under three low-warming scenarios over the 21st century based on the Paris targets using the CESM1-CAM5 model were made available to the community (Sanderson et al 2017, referenced hereafter as S17): one never exceeding 1.5 °C (1.5 C), another briefly exceeding 1.5 °C then declining to stabilize at 1.5 °C (overshoot), and one stabilizing at 2.0 °C by the end of the century (2.0 C). The analysis in S17 focused on changes in mean temperature and precipitation.
The present study investigates some of the potential benefits of mitigating climate change down from 2.0 C to the lower scenarios by computing ten extreme climate indices and investigating how their characteristics change across the three low-warming scenarios, mindful of the well-established fact that weather extremes have a more direct impact on human and natural systems than average changes (IPCC 2012). This perspective is aimed at informing impact research, and therefore does not focus on physical explanations of the changes from a process-based perspective.
The goal is to determine if the outcomes from the three scenarios exhibit statistically significant and meaningful differences in temperature and precipitation extremes by the end of the century, which could inform mitigation choices.
Ten extreme climate indices from the Expert Team on Climate Change Detection and Indices (ETCCDI) definitions (Alexander et al 2005) were chosen to represent a wide variety of temperature and precipitation extremes that occur in different climates. There are five precipitation-based indices, and five temperature-based indices addressing both minimum and maximum temperature behavior. This suite of indices gives a rich picture of climate tail behavior relevant to different regions and climates. There are limitations to the indices. For example, some of them rely on fixed thresholds (like 10 mm day⁻¹ of precipitation, or 0 °C), which may not qualify as extremes in certain regions (Zheng et al 2005). Their usefulness, however, has long been established among modeling and observational communities, and we refer to the large body of literature that has described them, analyzed them and validated model output with them (Sillmann et al 2013a, 2013b, and references therein).
Several papers have addressed aspects of changes in extremes under 1.5 and 2 °C warming levels. For example, Schleussner et al (2016) examined changes in some of the same ETCCDI indices based on Coupled Model Intercomparison Project (CMIP5) simulations, considering the times in the simulations when global average temperature reaches 1.5 and 2 °C; Fischer and Knutti (2015) used a method akin to pattern scaling to examine changes in extremes; Seneviratne et al (2018) used simulations under the Half a degree Additional warming, Prognosis and Projected Impacts (HAPPI) protocol. Other work has pointed out the need to examine the assumptions of approximating 1.5 and 2 °C warming worlds by experiments designed to stabilize at those levels (Mitchell et al 2016) and, in general, the validity of the relation between changes, especially in precipitation, and global average temperature across scenarios (Chadwick and Good 2013, Pendergrass et al 2015).
Existing literature does not use simulations dedicated to the Paris targets, and therefore we propose to use them to explore how extreme indices change in a world that stabilizes at 1.5 °C and 2 °C.

Methods
We compute annual values of each index at each gridpoint for the years in the simulations spanning the period 1995-2100, for each of the available CESM ensembles of initial conditions, using minimum and maximum daily temperature at reference height and total daily precipitation output. We use the ten available ensemble members for 1.5 C and 2.0 C scenarios, and the five available for the 1.5 C overshoot (see S17 for details on the model and scenario configurations).
The five ETCCDI precipitation extreme indices chosen are: 1. number of days per year with over 10 mm of precipitation (R10 mm); 2. precipitation intensity, defined as the total annual precipitation divided by the number of wet days in the year (SDII); 3. annual maximum precipitation occurring over five consecutive days (Rx5day); 4. annual sum of precipitation occurring on days exceeding the 95th percentile of daily precipitation during the baseline period (R95pTOT); 5. dry spell duration, defined as the number of consecutive days with less than 1 mm of precipitation (CDD). All the precipitation indices except for dry spell duration depict how precipitation intensity at different time durations will change within scenarios. Dry spell duration is indicative of potential water stress.
The five temperature indices chosen are: 1. annual highest maximum temperature (TXx); 2. annual lowest minimum temperature (TNn); 3. number of frost days, defined as the number of days where the minimum temperature is below 0 °C (FD); 4. warm spell duration, defined as the length of the longest streak of six or more days with the maximum temperature exceeding the 90th percentile of the baseline period (WSDI); 5. growing season length, defined as the maximum number of days between frost days (GSL).
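As a rough illustration of the five definitions above, the following Python sketch computes them for a single year of daily precipitation at one grid point. The function and argument names, the 1 mm wet-day threshold used for SDII, and the use of `>=` for the 10 mm count are assumptions made here for illustration, not details taken from the processing used in this study. The baseline 95th percentile is assumed to be precomputed, and the dry spell is counted within one year only.

```python
import numpy as np

def precipitation_indices(pr, p95_baseline, wet_day=1.0):
    """Annual precipitation indices from one year of daily totals (mm/day).

    pr           : 1-D array of daily precipitation at one grid point
    p95_baseline : baseline-period 95th percentile (assumed precomputed)
    wet_day      : wet-day threshold in mm (assumption: 1 mm)
    """
    pr = np.asarray(pr, dtype=float)
    wet = pr >= wet_day

    # R10 mm: number of days with at least 10 mm of precipitation
    r10mm = int(np.sum(pr >= 10.0))
    # SDII: precipitation on wet days divided by the number of wet days
    sdii = float(pr[wet].sum() / wet.sum()) if wet.any() else 0.0
    # Rx5day: maximum running 5-day precipitation total
    rx5day = float(np.convolve(pr, np.ones(5), mode="valid").max())
    # R95pTOT: total precipitation on days above the baseline percentile
    r95ptot = float(pr[pr > p95_baseline].sum())
    # CDD: longest run of consecutive days below the wet-day threshold
    cdd = run = 0
    for is_dry in pr < wet_day:
        run = run + 1 if is_dry else 0
        cdd = max(cdd, run)

    return {"R10mm": r10mm, "SDII": sdii, "Rx5day": rx5day,
            "R95pTOT": r95ptot, "CDD": cdd}
```

Applying the function to every grid point, year and ensemble member would give annual index fields of the kind analyzed in the following sections.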
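The temperature indices can be sketched in the same way. The Python example below follows the wording of the definitions given here (longest warm streak of at least six days, and longest frost-free stretch) rather than any particular reference implementation; the function names and the precomputed baseline percentile `tx90_baseline` are hypothetical and introduced only for illustration.

```python
import numpy as np

def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    longest = run = 0
    for flag in mask:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return longest

def temperature_indices(tmax, tmin, tx90_baseline):
    """Annual temperature indices from one year of daily data (deg C).

    tmax, tmin    : 1-D arrays of daily maximum and minimum temperature
    tx90_baseline : baseline 90th percentile of tmax (scalar or per-day
                    array, assumed precomputed)
    """
    tmax = np.asarray(tmax, dtype=float)
    tmin = np.asarray(tmin, dtype=float)
    frost = tmin < 0.0            # frost day: minimum temperature below 0 deg C
    hot = tmax > tx90_baseline    # day exceeding the baseline percentile

    warm_run = longest_run(hot)
    return {
        "TXx": float(tmax.max()),                   # annual highest maximum
        "TNn": float(tmin.min()),                   # annual lowest minimum
        "FD": int(frost.sum()),                     # number of frost days
        "WSDI": warm_run if warm_run >= 6 else 0,   # longest streak of >= 6 days
        "GSL": longest_run(~frost),                 # longest frost-free stretch
    }
```

Streaks are evaluated within a single calendar year in this sketch, so spells that straddle the turn of the year are split.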
Note that in our analysis all these indices are computed as annual summary values, rather than distinguishing seasonal behavior. For some of the ETCCDI indices seasonal analysis may add nuance to the results, but we chose explicitly indices that have meaning when calculated over the year, to make this overview as general and as concise as possible. By computing the indices over the entire year, the highest maximum temperature and the warm spell duration indices describe hot summer extremes at every location; lowest minimum temperature, conversely, describes winter conditions. The longest dry spell is picked up during the locally-varying dry season, whenever that takes place for specific regions, and the growing season length and frost days have meaning when computed over the whole year only. While the characteristics of intense precipitation events differentiated by season is certainly relevant for process oriented analysis of changes, our focus here is on impact relevance, and therefore we think that the annual statistics of such metrics provide important information nonetheless.
All the indices except for frost days and growing season length provide meaningful results globally, whereas frost days and growing season length are only relevant at mid to high latitudes, because tropical regions have zero frost days across all scenarios. Therefore, the absence of change between scenarios for these two indices in these regions only indicates that those areas are not cold enough for frost days to occur.
One could argue that growing season length has no meaning for regions in the high latitudes, but the metric might also be relevant for the life and evolution of natural ecosystems, so we chose to present results for all latitudes outside the tropics, even if in some areas the growing season is too short to grow actual crops.
All indices are computed only over land (i.e. grid cells of the model where more than half of the cell is occupied by land).
To organize our discussion, we break the index suite into three groups based on the impact-relevance of the various indices. Annual number of days with over 10 mm of precipitation, precipitation intensity, annual five day maximum precipitation, and annual sum of precipitation on days in the 95th percentile will be grouped together as the indices representing precipitation intensity. Maximum annual temperatures, minimum annual temperatures, and warm spell duration are examined together as the indices describing heat/cold extremes, which have particular relevance to impacts on human health (Patz et al 2005). Dry spell duration, annual count of frost days, and growing season length are grouped together as the indices with the most direct relevance for agricultural impacts.
We consider here how the indices change over time in the three scenarios, and the geographic patterns of that change. The data is broken into three time periods for comparison: a baseline data set for the indices from 1996-2015, mid-century data from 2046-2065, and long term data from 2081-2100.
Once all ten indices were computed at each grid point over land for all ensemble members of all scenarios, mean and standard deviation across the ensemble members were considered to compute changes and their significance. We computed differences and their significance for each scenario between time periods, and across the three scenarios for a given time. We focus on the changes rather than the absolute values of indices at the different time periods since previous work has shown how large biases affect climate model simulations of the climatologies of these indices, while the trends that have been validated within the historical period have been shown to better compare to observations (Tebaldi et al 2006, Sillmann et al 2013a). For the two indices describing annual maximum and minimum temperature this has been shown specifically for the model used to produce these low-warming simulations, CESM1-CAM5 (Tebaldi and Wehner 2016). Therefore, we mitigated potential model biases by focusing on changes between time periods and between scenarios.
To evaluate the statistical significance of the projected changes we exploited the availability of a number of initial condition ensemble members for each of the scenarios. Determining the statistical significance of the changes only indicates the rejection of the 'no change' hypothesis without necessarily implying that the change will produce consequential impacts. Nonetheless, we still consider significance from statistical testing as a necessary condition. A two-sided t-test is used at each grid point comparing the distribution of the changes from the ensemble of simulations, either across periods when testing the significance of change within scenarios, or between ensembles when testing the significance of the differences between scenarios, similar to the approach of Paeth et al (2017). The threshold for p-values deemed significant is determined by controlling for the false discovery rate (Wilks 2016). An alpha value of 0.1 is used, which limits the expected fraction of false discoveries (grid points wrongly declared significant) among all grid points declared significant to 10%. This technique has been shown to mitigate the problem of multiple testing when dealing with many locations of data fields, while not being too sensitive to high degrees of spatial correlation in the data (Ventura et al 2004). Additional details concerning the false discovery rate control are available in the supplementary materials available at stacks.iop.org/ERL/13/035009/mmedia.
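The change computation described above can be sketched as follows, assuming the annual index values are stored in a NumPy array with dimensions (member, year, lat, lon). The array layout, function name and default period bounds are assumptions for illustration; the sketch differences 20-year means per member and then summarizes across the ensemble.

```python
import numpy as np

def period_change(index, years, baseline=(1996, 2015), future=(2081, 2100)):
    """Change in an annual index between two periods, per ensemble member.

    index : array of shape (member, year, lat, lon)
    years : 1-D array of calendar years matching the year axis
    Returns (per-member change, ensemble mean, ensemble standard deviation).
    """
    years = np.asarray(years)
    in_base = (years >= baseline[0]) & (years <= baseline[1])
    in_future = (years >= future[0]) & (years <= future[1])

    # Period means for each member, then the difference (future minus baseline)
    change = index[:, in_future].mean(axis=1) - index[:, in_base].mean(axis=1)

    # Ensemble statistics across members (sample standard deviation)
    return change, change.mean(axis=0), change.std(axis=0, ddof=1)
```

Passing `future=(2046, 2065)` would give the mid-century change, and differencing the outputs of two scenarios for the same period would give the between-scenario comparison.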
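To make the false discovery rate step concrete, the sketch below applies the Benjamini-Hochberg step-up procedure to a field of grid-point p-values. It is an assumption here that this matches the procedure of Wilks (2016) as applied in this study; the function name is hypothetical, and the p-values are assumed to come from the grid-point t-tests, restricted to land points with no missing values.

```python
import numpy as np

def fdr_significant(p_values, alpha=0.1):
    """Boolean mask of p-values that pass the Benjamini-Hochberg criterion.

    p_values : array of any shape holding one p-value per grid point
    alpha    : target false discovery rate (0.1 as stated in the text)
    """
    p = np.asarray(p_values, dtype=float)
    sorted_p = np.sort(p.ravel())
    n = sorted_p.size

    # Compare the i-th smallest p-value with alpha * i / n
    passed = sorted_p <= alpha * np.arange(1, n + 1) / n
    if not passed.any():
        return np.zeros(p.shape, dtype=bool)

    # Every p-value up to the largest one that passed is declared significant
    threshold = sorted_p[passed].max()
    return p <= threshold
```

Because the threshold is set by the largest sorted p-value that meets its bound, a grid point can be declared significant even if its own p-value exceeds `alpha * i / n` at its rank.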

Statistical significance
Maps of the significance of the different groups of indices are useful for determining which indices show results worth investigating further, and in which locations. Figure 1 shows the number of indices with significant changes by the end of the century at each grid point in the 1.5 C scenario. The significance plots of the overshoot and 2.0 C scenarios are available in the supplementary materials and generally show similar results, with a larger extent of significant regions under the 2.0 C scenario, and a decrease in significance under the overshoot scenario compared to 1.5 C due to the smaller ensemble size of the overshoot (however, when results from the five overshoot members are compared to an ensemble of five randomly sampled 1.5 C members the significance after stabilization is nearly the same). The precipitation indices have spatially variable significance but some regions do show consistent significant changes across all precipitation indices. High latitudes tend to have greater levels of significance, with most of northern Canada and Alaska experiencing significant change for all precipitation indices. This is consistent with well-established results of warming scenarios, both for mean and extreme precipitation changes, showing that the forced signal towards wetter conditions emerges faster in the high latitude regions (Collins et al 2013). The agriculture indices show significant changes in at least two indices over most of the northern hemisphere. Note that the two temperature-based agriculture indices, frost days and growing season length, are not applicable in the tropics because they require minimum temperature falling below 0 °C during the year; therefore zero change in those areas should not be taken literally. The dry spell duration index changes with a different geographic pattern than any other precipitation-based index, as it is the only index measuring lack of precipitation.
We find very small areas of significant change in dry spell duration. Within the areas showing significant change, there is no dominant direction for that change, resulting in no global trend when those areas are averaged. Global mean aggregation plots of all three scenarios are provided in the supplementary materials. In each scenario, there is no point where the trend surpasses this index's variability. There is also no observable difference between the global average trajectories of the index across scenarios. This index will therefore be excluded from most of the analysis presented in the remainder of this paper.

Magnitude analysis
For the significant changes, we focus on the magnitude of the change to assess its importance. Few coherent regions show significant change in the precipitation indices, so in this main section we present results for the temperature indices, but corresponding figures for the precipitation indices are shown in the supplementary materials, where it can be noted how areas that do show significant changes trend towards an increase in precipitation intensity. The magnitude of internal variability (i.e. of the differences in the spatial pattern of change across members for a given scenario and time) is smaller for temperature indices than it is for precipitation indices. It is therefore easier to extract a consistent, large-scale picture of their changes by considering mean ensemble changes. Here we choose the warm spell duration index to represent the temperature indices. The index can be used to provide a depiction of changes in heat-wave duration, whose relevance for impacts on human health is easily argued (Tan et al 2009). Changes in warm spell duration are shown in figure 2 for all three scenarios (here as well we follow the convention of coloring only regions with statistically significant changes, but for this index every grid point shows significant change in both time periods and for all three scenarios). The mid-century time window (2046-2065) is shown side by side with the end of century (2081-2100), which allows us to assess if and how the indices' behavior changes after the model climate stabilizes at its temperature target, following the S17 scenario design. We find large geographic variability in the magnitude of the changes, yet there is a consistent trend in all regions toward longer warm spells across all scenarios as global mean temperature increases.
This index definition involves exceedances of a percentile that is computed from the baseline historical period, 1996-2015, so a day that can contribute to a warm spell is considered 'hot' relative to the local climatology, not relative to an absolute threshold. The distribution of maximum temperature shifts dramatically, and large changes occur in the index across all scenarios, with some regions seeing increases of up to 80 days in the 2.0 C scenario. These changes imply that the frequency of today's extreme high temperatures will increase to a point that the same temperatures would not be considered rare by the end of the century in the 2.0 C scenario, but significantly less so in lower scenarios, as shown in figure 4. The growing season length index is shown in figure 3 in the same format as figure 2. Significant change is seen in almost all high latitude grid points. There are large changes taking place in the highest latitudes, a manifestation of Arctic amplification (Screen and Simmonds 2010) that studies have linked to changes in Arctic ocean temperatures (Alexeev et al 2013) or dynamical changes in wave patterns in high latitudes (Luo et al 2016). Similar patterns are seen in the changes of other temperature indices (see supplementary materials), and have been seen before in other studies of change in temperature extremes, where the straightforward thermodynamic explanation has been coupled with investigation of changes in circulation, particularly blocking mechanisms (Whan et al 2016). Also for this index the relative magnitude of the overall change across time and scenarios follows what would be expected as a function of global mean temperature change, with larger change at the end of the century in the 2.0 C scenario than in the 1.5 C or the overshoot, and with more change in the overshoot than in the 1.5 C scenario at the mid-century mark, while by the end of the century the difference between the two lower scenarios has diminished.

[Figure panel titles: changes in 2081-2100 (number of days) for each scenario]
To summarize, the temporal pattern of most indices' global average time series follows the pattern of the global average temperature time series for each scenario. In the 1.5 C scenario it is observed that nearly all changes seen in temperature indices occur before the mid-century time, with very few additional changes occurring during the second half of the century. In the overshoot scenario temperature indices are farther from the baseline at the mid-century mark, then by the end of the century they stabilize nearly identically to where the 1.5 C scenario stabilizes. In the overshoot scenario, the end of century data is closer to the baseline than the mid-century data. The 2.0 C scenario has a significant change from the mid-century time to the end of the century. Representative patterns of change can be seen in figures 2 and 3.
To show how extensive changes of various magnitudes are over land, for all temperature indices the percent of land that experiences an ensemble mean change beyond a certain threshold is calculated (as mentioned before, the precipitation indices show irregular areas of significant changes and are therefore excluded from this calculation, since the ensemble mean percentages would be smaller than the individual members' due to different portions of land showing significant change). The bar plots show, for each scenario and at both time points, what percent of land will experience a change that goes beyond a threshold that is set for each temperature index. For each scenario, the leftmost bar is the mid-century time point (2046-2065), and the rightmost is the end of century time point (2081-2100).
To exemplify a quantification of differences between the scenarios we chose two thresholds for each of the indices at each time point to show how the land distribution of the indices differs at the low and high end of the distribution, as well as showing the fraction of land that falls somewhere in between the two thresholds. Obviously, the magnitudes of the thresholds chosen are not immediately translatable into a measure of the importance of the change: determining what magnitude of change constitutes a change of importance for adaptation consideration, for example, is beyond the scope of this paper, and likely varies according to the type of impact considered, the level of vulnerability of the system exposed, and the capacity for adaptation. Studies of the historical levels observed for these indices are available, and could be used for comparison (Donat et al 2013). Figure 4 depicts the difference between scenarios and between the two different time points. For all indices, the same trend is seen, showing that the changes in indices' values are strongly correlated to changes in mean global temperature. When mean temperature is highest, all index changes are largest. For all indices, the 2.0 C scenario has greater change at both the mid-century and end of century times than the other two scenarios. The overshoot always has a greater change at mid-century than the 1.5 C; at the end of the century the 1.5 C and the overshoot have comparable values, although the overshoot is still greater than the 1.5 C. Since the largest discrepancies occur in the Arctic, it is possible that the changes in snow/ice albedo feedback triggered by the higher temperatures under the overshoot scenario in the middle of the century may have ripple effects later in the century, preventing the temperatures from reverting to the same levels that the no-exceed 1.5 C scenario experiences.
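The percent-of-land calculation described above can be sketched as follows. The text does not state how grid cells are weighted, so the cosine-of-latitude area weighting used here is an assumption appropriate for a regular latitude-longitude grid; the function and argument names are likewise hypothetical.

```python
import numpy as np

def percent_land_exceeding(change, lat, land_mask, threshold):
    """Percent of land area where the ensemble-mean change exceeds a threshold.

    change    : 2-D array (lat, lon) of ensemble-mean index change
    lat       : 1-D array of grid latitudes in degrees
    land_mask : 2-D boolean array, True where the cell counts as land
    threshold : change threshold in the units of the index
    """
    change = np.asarray(change, dtype=float)
    # Assumed area weights: proportional to cos(latitude), zero over ocean
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones(change.shape)
    weights = np.where(land_mask, weights, 0.0)

    exceed = change > threshold
    return 100.0 * weights[exceed].sum() / weights.sum()
```

Evaluating the function at a low and a high threshold, and differencing the two results, gives the share of land that falls between the two thresholds.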
In the 1.5 C scenario, the mid-century point has greater magnitudes than the end of the century, hinting at possible path dependence of the changes in these quantities beyond a correlation with global average temperature.
The growing season length and frost days do not change significantly in the tropics, so their percentages

Conclusions and discussion
This evaluation of ten indices for the three S17 low-warming scenarios (1.5 C, 1.5 Overshoot, and 2.0 C) relevant to the Paris temperature targets provides the first overview of the behavior of a diverse suite of extreme climate indicators based on fully coupled model simulations tailored specifically to the Paris targets. Performing analysis on ten different extremes helps produce robust results while comparing indices representing the effects of similar processes related to temperature and precipitation change. The use of ensembles of initial conditions addresses the need to evaluate the significance of changes within and between scenarios against the background of internal variability. Projected changes over the 21st century of precipitation and temperature extremes were computed from the ETCCDI definitions for the three low-warming scenarios. The patterns of change support the findings of previous studies that extreme temperature events increase at a faster rate than global mean temperature increases, and that minimum temperatures increase faster than maximum temperatures (Donat et al 2013). It is consistent with previous literature that precipitation indices do not show extensive significant changes (at these low levels of warming) due to internal variability within the model; however, the regions with significant change do show trends that are compatible with the expected intensification of precipitation, and a tendency to wetter average conditions in the high latitudes, a well-established result under higher scenarios (Sillmann et al 2013b). Our results also appear to agree, at least for large-scale patterns of changes, with recent work that used approximations to these low-warming targets based on a selection of time windows from transient scenarios of CMIP5 simulations (Schleussner et al 2016) and in a less literal way (i.e. not for the same indices but for similar metrics of extremes) with the findings of Fischer and Knutti (2015).
If not in a formal manner, this agreement suggests a certain amount of linearity in the behavior of these indices, which was confirmed, for a set of extreme quantities, in the work of Wartenburger et al (2017).
The results presented show that all S17 scenarios have changes in extreme events by the end of the century, but precipitation and temperature extremes change very differently. The behavior of precipitation extremes is confounded by large noise from internal variability, and changes can thus be difficult to distinguish. One method for addressing this difficulty with precipitation indices is to aggregate the individual changes into a PDF describing a large area, thus providing a coherent picture that in the aggregate can be distinguished from the null distribution created solely by internal variability (Fischer and Knutti 2014). Area aggregation could prove to be a useful method to find significance in the precipitation index changes; however, one attribute of grid point area aggregation is that every point within the area is considered a part of the same distribution. With the spatial variability existing in the indices it can be problematic to make such an assumption. Further studies may pursue these approaches. This study does not address the impacts that the changing extremes may cause. The thresholds chosen for figure 4 are somewhat arbitrary, simply discretizing the range of changes for ease of comparison among the different scenarios. Individual impact studies will use case-specific thresholds and thus produce a more impact-relevant comparison of the changes in the indices across scenarios than we did here.
The robustness of our results is hindered by the use of a single model affected by individual biases and a single value of climate sensitivity. Different models tend to have different biases, and different strengths of feedbacks that may affect local climate and extremes, so by using different models a better sense of which changes are more robust can be achieved through an analysis of the models' consensus (Hagedorn et al 2005). However, no other model of the CESM class has run the same scenarios, and a multi-model ensemble is not currently an option. The CESM biases were mitigated by only comparing the outputs of the scenarios as changes within and between them, rather than arguing based on the absolute values achieved. All that said, these results constitute a survey of model-specific results to which eventually more models could be compared. For now, these results are the best indication we have of how the calculated extremes will change under the S17 scenarios.
It is not within the scope of this paper to discuss any impacts, or what mitigation efforts should be made to avoid such impacts. It is evident from the data that in all three S17 scenarios temperature indices experience a significant and marked change by the end of the century, to a different degree in the 2.0 C than the 1.5 C and the overshoot.