Is land use producing robust signals in future projections from Earth system models, all else being equal?

We use six Earth system models (ESMs) run under SSP3-7.0, a scenario characterized by a relatively large land use change (LUC) over the 21st century, and under a variant of the same scenario where a significantly different pattern of LUC, taken from SSP1-2.6, was used, all else being equal. Our goal is to identify changes in climate extremes between the two scenarios that are statistically significant and robust across the ESMs. The motivation for this study is to test a long-held assumption of the shared socio-economic pathway-representative concentration pathway (SSP-RCP) scenario framework: that the signal from LUC can be safely disregarded when pairing different SSPs to the compatible RCPs, where compatibility only considers global radiative forcing, predominantly determined by emissions of well-mixed greenhouse gases. We analyze extremes of daily minimum and maximum temperatures and precipitation, after fitting non-stationary generalized extreme value distributions in a way that borrows strength along the length of the simulation (2015–2100) and across initial condition ensembles. We consider changes in the 20 year return levels (RL20s) of these metrics by 2100, and focus on eight locations where LUC is large within each scenario, and strongly differs between scenarios, averaging the RL20s over a neighborhood characterized by the same LUC to enhance the signal-to-noise ratio. We find that precipitation extremes do not show significant differences attributable to LUC differences. For temperature extremes (cold and hot) results are mixed, with some location-index combinations showing significant results for some of the ESMs but not all, and few coherent changes appearing for indices across regions, or regions across indices. These ESMs are representative of what is typically adopted as the source of climate information for impact studies, when the SSP-RCP framework is put to use.
Overall, our analysis suggests that the hypothesis that SSPs can be paired with RCPs in a flexible fashion is defensible. However, the appearance of some coherence in a few locations and for some indices invites further investigation.


Introduction
Our perspective in this study is that of a user of climate model output interested in deriving measures of how some climatic impact-drivers (Ranasinghe et al 2021, Ruane et al 2022) are changing under scenarios of future anthropogenic forcings. To do so, the user interrogates Earth system models (ESMs) participating in ScenarioMIP of the Coupled Model Intercomparison Project Phase 6 (CMIP6), thus extracting model output at their standard resolution of about 100 km (∼1°).
But our user does not stop at the evaluation of the physical climate outcomes: she is an impact modeler who wants to exploit the matrix framework that organizes the use of shared socioeconomic pathways (SSPs) and representative concentration pathways (RCPs) (O'Neill et al 2014, van Vuuren et al 2014). The matrix framework works under the hypothesis that most RCPs can be associated with most SSPs, i.e., most radiative forcing levels at 2100 in the plausible range (∼1.9–8.5 W m−2) can be reached by emission and land use pathways developed on the basis of many of the five SSPs. Importantly, this matrix framework also implies that the five SSP socioeconomic scenarios, used to determine exposure and vulnerability, can be associated with future physical climate variables obtained under most of the RCPs, without endangering the coherence of the future climate risk estimates so derived. This SSP-RCP pairing is known to fail in a few instances, when the socioeconomic assumptions of the SSP cannot produce the magnitude of anthropogenic emissions leading to the forcing level of the RCP. For example, only SSP5 can produce emissions that result in 8.5 W m−2 radiative forcing by 2100; in the world of SSP3 (regional rivalry), it is very hard to implement successful mitigation policies on a global scale that would allow meeting the 1.5 °C target; SSP5, by construction reliant on fossil fuel, is not compatible with very low warming targets either (Riahi et al 2017, Rogelj et al 2018). Aside from these extreme pairings, however, the association of (almost) any SSP to (almost) any RCP is assumed to be internally consistent.
A different aspect that would undermine this consistency is if signals emerged in ESM output attributable to land use change (LUC) and/or short-lived climate forcers (SLCFs), components of scenarios that vary significantly across SSPs (Riahi et al 2017). If LUC and SLCFs were found to significantly alter ESM projections at the regional scale, output from a climate model run under SSP3-7.0, say, known to have large deforestation trends over the 21st century driven by the assumptions in SSP3 (Fujimori et al 2017), could not be used in an impact study with socio-economic input from a different SSP (say SSP1, which has opposite LUC trends in many regions, as documented in van Vuuren et al 2017). This would violate the requirement of internal consistency between the physical world and the human system influencing and impacted by it.
To our knowledge the emergence of these signals solely due to regional forcers has not yet been tested, despite having inspired a deliberate experimental design across MIPs: when designing the experiments to be run under ScenarioMIP (O'Neill et al 2016), SSP3-7.0 was chosen for Tier 1 for its enhanced use of LUC and SLCFs (relatively large aerosol precursor emissions until mid-century). At the same time two other MIPs, the land use MIP (LUMIP, Lawrence et al 2016) and the aerosol chemistry MIP (AerChemMIP, Collins et al 2017) included variants of SSP3-7.0 specifically designed to analyze the signal from these regional forcers. In particular LUMIP prescribed an SSP3-7.0 experiment where land use over the 21st century would be swapped for that of SSP1-2.6, a sustainable future with very different patterns and lower magnitudes of deforestation and considerable levels of reforestation or afforestation. AerChemMIP prescribed SSP3-7.0 variants with low SLCFs, but we do not consider these in this study.
Here we take advantage of SSP3-7.0 simulations under ScenarioMIP (from now on SSP370) and LUMIP SSP370-SSP126Lu (SSP370Lu from now on) and tackle the detectability of significant changes in output between the two scenarios, by design only attributable to the different land use assumptions, all else being equal (in particular greenhouse gas concentrations, by which these concentration-driven simulations are forced).
As we aim to support the use of the scenario framework by impact modelers, we focus on a set of extremes of temperature and precipitation, since we expect the most severe impacts to be a function of phenomena like heat and cold extremes and major downpours. Work on average temperature and precipitation changes under LUC, mostly using idealized deforestation experiments with a background of a stationary, pre-industrial climate, has already revealed the difficulty of finding a robust signal across models in terms of patterns and at times even sign of changes. The more robust effects are in the direction of a cooling and drying effect of deforestation (Boysen et al 2020, Luo et al 2022, Yu and Leng 2022) but with disparities among models on the size and significance of these changes. Here we present results from an analysis that attempts to maximize the LUC signal: we focus on changes by the end of the century, averaged over regions where LUC differences are largest and most consistent across ESMs. Even so, it remains challenging to derive a clear and robust picture of the effects of LUC on these extremes, particularly as we take a multi-model perspective, which has become standard in addressing important sources of uncertainty from models' structural choices, and which is also increasingly used in the impact research community (Sutton 2009, Frieler et al 2017).
The paper is structured as follows: in section 2 we describe data and methods, then we present our results in section 3 and we conclude with a discussion in section 4.

Data and methods
We target a multi-model ensemble in order to derive a measure of robustness of the signal from LUC across ESMs. Only a small subset of the ESMs participating in ScenarioMIP (more than 40, Tebaldi et al 2021) and running SSP370 also participated in the relevant LUMIP experiment, SSP370-SSP126Lu (SSP370Lu from now on). We require daily output of minimum and maximum temperature and average precipitation to compute six extreme metrics, and the variable 'treefrac' representing the fraction of trees in each grid cell at each timestep. We also consider orography and land masks for each ESM. Six ESMs provide all the required experiments and output, some of them with multiple initial condition members under each scenario. See table 1 for a list, including information about the land model coupled to the ESM, the resolution of its atmosphere component grid and the number of runs available.

Table 1. ESMs included in this study and their characteristics. In the column that specifies the grid resolution we also add information regarding the size of the neighborhood that we will use in our analysis at each of eight locations. The radius units are grid points, and the choice is made so that the radius of the neighborhood is about 5° for each of the models. [Table body not recovered; surviving column headers: ESM, Institution, Land]
All runs cover the length of the 21st century, from 2015 to 2100 (in some cases 2098 or 2099). As already described, the two scenario forcings differ only in the LUC maps, one corresponding to SSP3-7.0 (Fujimori et al 2017), the other corresponding to SSP1-2.6 (van Vuuren et al 2017). Figure 1 shows the difference in tree cover in each cell at 2100 when comparing the change in this variable between 2015 and 2100 in SSP370 to that in SSP126. Despite the imposition of a common set of LUC forcings (LUH2, Hurtt et al 2020), each of the six land models interprets the input maps in ways that can be substantially different across models. We therefore look for regions where changes appear consistent in sign and magnitude (at least in terms of relative magnitude according to each model's changes) across the six maps. Positive values (expressed as percentage of tree cover in the cell) indicate more tree cover in the standard SSP370 scenario compared to SSP370Lu. Negative values indicate that SSP370Lu has more trees in that location than SSP370. Circles identify the locations we choose to focus on, after having assessed that no coherent results could be synthesized for the entirety of the land area. As can be gathered by comparing the maps in figure 1, we choose locations where models agree in the sign of the difference, and where the differences are relatively large in absolute value on each map. The selection proceeded at first heuristically, by pinpointing areas with coherent and substantial tree fraction change common across ESMs (substantial defined here as falling below the 5th or above the 95th percentile of the distribution of change in treefrac for each model, across the land areas). Then, a centroid (longitude/latitude coordinates) for each of the eight regions thus identified was chosen, and so was the nearest grid point for each ESM.
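The percentile screening described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the function name, the demo field and the choice to flag each model separately on its own grid are assumptions, and the subsequent cross-model comparison (done heuristically in the text) is not reproduced.

```python
import numpy as np


def flag_substantial_change(delta_treefrac, land_mask, lower_pct=5.0, upper_pct=95.0):
    """Flag land cells in the tails of one model's tree-cover difference field.

    Returns an integer array: -1 below the lower percentile, +1 above the
    upper percentile, 0 elsewhere (including ocean cells).
    """
    delta = np.asarray(delta_treefrac, dtype=float)
    land = np.asarray(land_mask, dtype=bool)
    # Percentiles are taken over land cells only, separately for each model.
    low, high = np.percentile(delta[land], [lower_pct, upper_pct])
    flags = np.zeros(delta.shape, dtype=int)
    flags[land & (delta < low)] = -1
    flags[land & (delta > high)] = 1
    return flags


# Hypothetical 10 x 10 field of tree-cover differences (percentage points).
demo_delta = np.arange(100, dtype=float).reshape(10, 10) - 50.0
demo_land = np.ones((10, 10), dtype=bool)
demo_flags = flag_substantial_change(demo_delta, demo_land)
```

Flagging each model on its native grid avoids regridding the tree-cover fields; agreement across models would then have to be judged region by region.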
After an analysis at the single grid point scale showed a lack of statistically significant results in most cases (models/indices/regions combinations), a neighborhood of a varying number of grid points (depending on the grid resolution) but covering a similar spatial extent (about 5°) was chosen for each ESM (see table 1). For these neighborhoods we checked that orography was consistent across ESMs (see table 2), but we did not go further in assessing process-related characteristics of the land surface, in keeping with our multi-model descriptive perspective.
Note that a fundamental source of possible inconsistency across models is the differential use of dynamic vegetation (see land models marked by an asterisk in table 1), by which prescribing afforestation does not necessarily result in more trees if the planted trees do not grow. To address this aspect at least in part, we extract, when available, another variable from the two scenarios, indicative of vegetation height. We find (figure S1) that at least for the three models for which the variable vegHeight was saved (ACCESS-ESM1-5, CESM2 and UKESM1-0-LL, the latter using dynamic vegetation) areas that appear with more trees in figure 1 also show taller vegetation in figure S1 (and vice-versa), thus confirming [...]
The eight locations chosen are listed in table 2, together with the average size of the tree cover difference between the two scenarios across the ESMs considered, and elevation at the site (in both cases we also show the standard deviation across the ensemble). We note that one of the locations, southeast of the Amazon region (EAMZ), shows more tree cover under SSP370 for all models except ACCESS-ESM1-5 and CESM2, which we exclude from the comparison at that site. In one other location, central Africa (CAF), one of the models (CanESM5) has a small (1%) positive change, unlike the rest of the ensemble, which shows a strong negative change, and is accordingly excluded from the comparison. In all other cases we compare the responses of all six ESMs.
At each location, the metric of interest is averaged over the 5° area whose radius encompasses the strong tree cover change signal, thus maximizing the potential for signal detection in the extreme metrics from each ESM individually.
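A neighborhood average of this kind can be sketched as follows. The function name and the demo grid are hypothetical, and several choices are assumptions made for simplicity rather than details taken from the study: the radius is measured in degrees of latitude/longitude instead of grid points, no area weighting is applied, and ocean cells are not masked out.

```python
import numpy as np


def neighborhood_mean(field, lats, lons, lat0, lon0, radius_deg=5.0):
    """Average `field` (lat x lon) over grid points within `radius_deg` of (lat0, lon0).

    Distance is measured in degrees of latitude/longitude, an assumption made
    for simplicity; no area weighting or land masking is applied.
    Returns the mean and the number of grid points used.
    """
    field = np.asarray(field, dtype=float)
    lat2d, lon2d = np.meshgrid(
        np.asarray(lats, dtype=float), np.asarray(lons, dtype=float), indexing="ij"
    )
    dlat = lat2d - lat0
    # Wrap longitude differences into [-180, 180) so neighborhoods can cross 0/360.
    dlon = (lon2d - lon0 + 180.0) % 360.0 - 180.0
    inside = dlat**2 + dlon**2 <= radius_deg**2
    return float(field[inside].mean()), int(inside.sum())


# Hypothetical 1-degree grid carrying a constant field.
demo_lats = np.arange(-89.5, 90.0, 1.0)
demo_lons = np.arange(0.5, 360.0, 1.0)
demo_field = np.full((demo_lats.size, demo_lons.size), 2.0)
mean_mid, n_mid = neighborhood_mean(demo_field, demo_lats, demo_lons, 0.0, 180.0)
mean_wrap, n_wrap = neighborhood_mean(demo_field, demo_lats, demo_lons, 0.0, 0.0)
```

Returning the number of grid points alongside the mean matters because that count differs across ESM grids and enters the uncertainty of the averaged quantity.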
At each location and for each model we consider the 20 year return level (RL20) of six different extremes of maximum and minimum temperature (Tmax and Tmin) and precipitation (Pr). We start from daily output of these three variables and summarize, in each year of the simulation,
• The warmest day of the year, taking the maximum value of daily Tmax over the calendar year: TXx.
• The warmest night of the year, taking the maximum value of daily Tmin over the calendar year: TNx.
• The coldest day of the year, taking the minimum value of daily Tmax over the calendar year: TXn.
• The coldest night of the year, taking the minimum value of daily Tmin over the calendar year: TNn.
• The wettest day of the year, taking the maximum value of daily Pr over the calendar year: Rx1Day.
• The wettest pentad of the year, taking the maximum value of Pr accumulated over five consecutive days within the calendar year: Rx5Day.
Table 2. The eight locations chosen based on consistent tree cover change between scenarios and across ESMs. Also shown are the mean difference in the fraction of land covered by trees (and its standard deviation) across the ensemble, and the same statistics for elevation. In the column reporting tree cover percentage difference we also list the indicator (H when the difference is positive, as in higher percentage; L when the difference is negative, as in lower percentage) that is used in figures 2 through 4.
The individual metrics' time series at each grid point of the ESM native grids are fitted to a nonstationary generalized extreme value (GEV) distribution, using the time series of the logarithm of the annual atmospheric CO2 concentration as a statistical covariate in the GEV location parameter to represent the role of that anthropogenic forcing. From the estimated GEV parameters, the non-stationary RL20 can be calculated, together with a confidence interval, for each year in the series (2015-2100) at each grid point. Although by design the CO2 series is the same for both experiments, the parameters of the GEV and the resulting estimates of RL20s would be significantly different between experiments if LUC effects were significant. The method allows multiple ensemble members (available for some ESM/experiment combinations, see table 1) to be used as replicates, enhancing the precision of the estimates, as reflected by the width of the confidence intervals.
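To make the fitting step concrete, the sketch below fits a GEV whose location parameter is linear in log CO2 by maximum likelihood and evaluates the resulting RL20 for each year. It is an illustration written with SciPy, not the climextRemes code used in the study; the function names, the optimizer settings, the synthetic data and the CO2 pathway are all assumptions. Confidence intervals are not computed, and block minima (TXn, TNn) would need additional handling, commonly by negating the series, which is not shown.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme


def fit_nonstationary_gev(annual_max, log_co2):
    """Maximum-likelihood fit of a GEV with location mu0 + mu1 * log_co2.

    Returns (mu0, mu1, sigma, xi), with xi > 0 meaning a heavy upper tail.
    Ensemble members can be used as replicates by concatenating their series
    together with the matching covariate values.
    """
    x = np.asarray(annual_max, dtype=float)
    z = np.asarray(log_co2, dtype=float)
    zc = z - z.mean()  # centering the covariate helps the optimizer

    def neg_log_lik(params):
        mu0c, mu1, log_sigma, xi = params
        # SciPy's shape parameter is c = -xi.
        logpdf = genextreme.logpdf(x, c=-xi, loc=mu0c + mu1 * zc, scale=np.exp(log_sigma))
        return -logpdf.sum() if np.all(np.isfinite(logpdf)) else 1e10

    # Starting values: least-squares slope and Gumbel moment estimates.
    slope = np.polyfit(zc, x, 1)[0]
    sigma0 = np.std(x - slope * zc) * np.sqrt(6.0) / np.pi
    start = [np.mean(x) - 0.5772 * sigma0, slope, np.log(sigma0), 0.0]
    res = minimize(neg_log_lik, start, method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6})
    mu0c, mu1, log_sigma, xi = res.x
    return mu0c - mu1 * z.mean(), mu1, float(np.exp(log_sigma)), xi


def return_level(params, log_co2, period=20.0):
    """Level exceeded with probability 1/period in a year with the given log CO2."""
    mu0, mu1, sigma, xi = params
    loc = mu0 + mu1 * np.asarray(log_co2, dtype=float)
    return genextreme.ppf(1.0 - 1.0 / period, c=-xi, loc=loc, scale=sigma)


# Synthetic demonstration: hypothetical CO2 pathway and five pseudo-members.
rng = np.random.default_rng(0)
years = np.arange(2015, 2101)
demo_log_co2 = np.log(np.linspace(400.0, 850.0, years.size))
z_all = np.tile(demo_log_co2, 5)
x_all = genextreme.rvs(c=0.2, loc=30.0 + 5.0 * (z_all - z_all.mean()), scale=1.0,
                       size=z_all.size, random_state=rng)
demo_params = fit_nonstationary_gev(x_all, z_all)
demo_rl20 = return_level(demo_params, demo_log_co2)
```

Because the covariate enters only the location parameter, the fitted RL20 shifts with log CO2 while the scale and shape stay fixed over the simulation.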
The method also allows us to perform the multi-model comparison of the spatially averaged RL20s for the locations in figure 1 despite the fact that, to achieve the same spatial average, we need to consider a different number of grid points depending on each ESM grid resolution: the estimation produces standard deviations of the return levels at each grid point, and we can therefore compute the standard deviation of the mean return level of a number of grid points. This ensures that the confidence intervals for the averaged RL20s reflect the number of grid points that contribute to each average, enabling a rigorous calculation of statistically significant differences (or lack thereof).
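One simple way to propagate grid-point standard deviations to the neighborhood mean is sketched below. The text does not state how the grid-point estimates are combined, so the independence assumption used here is ours: if errors at neighboring grid points are positively correlated, this formula underestimates the uncertainty. The function name and the numbers are hypothetical.

```python
import math


def mean_and_se(values, standard_errors):
    """Mean of grid-point return levels and the standard error of that mean.

    Assumes the grid-point estimation errors are independent (an assumption
    of this sketch), so variances add and the sum is divided by n squared.
    """
    n = len(values)
    mean = sum(values) / n
    se = math.sqrt(sum(s * s for s in standard_errors)) / n
    return mean, se


# Hypothetical RL20 estimates (degrees C) and standard errors at three grid points.
demo_mean, demo_se = mean_and_se([1.0, 2.0, 3.0], [0.3, 0.4, 1.2])
# Approximate 95% confidence interval for the neighborhood mean.
demo_ci = (demo_mean - 1.96 * demo_se, demo_mean + 1.96 * demo_se)
```

Under this assumption the standard error shrinks as more grid points enter the average, which is how a finer grid covering the same 5° neighborhood ends up with a narrower interval.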
Thus, for eight locations and six ESMs we can ask the question of how, if at all, different LUC, all else being equal, influences extreme temperature and precipitation. As in previous work (Tebaldi and Wehner 2018) we choose a moderate extreme, the RL20 of each metric or, more appropriately in a changing climate, the event with a 5% chance of occurring in any given year. The metrics' behavior has been studied extensively in the literature based on climate model simulations (Tebaldi et al 2006, Sillmann et al 2013a) but also by analyses of observations (Zhang et al 2011, Donat et al 2013), therefore also enabling validation of model performance over the historical period (Sillmann et al 2013b, Wehner et al 2020). Here however we do not focus on the question of validation or performance, as our motivating question is a model-centric concern. Our results section focuses therefore on assessing the significance of the differences in the behavior of these extremes between SSP370 and SSP370Lu, together with the robustness of the direction of change across ESMs. One could also wonder about the consistency of these effects across space, i.e., do regions that see the same type of LUC also see the same type of changes in extreme behavior? We anticipate this to be a more complex question than the one we set out to answer, involving process-based understanding of local vegetation, evaporative regimes and climate features (Koster et al 2009), and the compounding of local and remote effects of LUC (Pongratz et al 2021, Grant et al 2022).

Results
Previous work sets expectations of how LUC of a given sign (more tree cover or less tree cover) would affect climate (Pongratz et al 2021), depending on the latitudinal, seasonal, and daily versus nightly nature of the atmospheric quantities of interest. For temperature extremes increased tree cover is expected to create cooling in the summer by increasing evapotranspiration but cause warming in the winter, especially at high latitudes, by decreasing surface albedo (substituting dark vegetation for snow, for example); also, daytime temperatures are expected to be affected differently from night time temperatures, with the difference also modulated by the latitude at which LUC is taking place, especially distinguishing tropical forest effects from high and mid-latitude forest effects (Christidis et al 2013, Meier et al 2019, Grant et al 2022). These effects are supported by observations of changes in temperature due to deforestation (Alkama and Cescatti 2016). Effects on precipitation are expected to be noisier, with signals typically emerging only when a strong deforestation pattern is imposed. For example, deforestation in the Amazon has been modeled by some ESMs as triggering drying (Li et al 2022), and even, potentially, a self-reinforcing cycle of drying, with the potential to reach a tipping point leading to an extensive die-off of the forest (Zemp et al 2017). Note however that, from a modeling perspective, Grant et al (2022) conclude that ESMs do not show a coherent response to LUC in their historical simulations, finding that diversity and uncertainties in their representation of land processes are too large to overcome other sources of noise, like internal variability.
Our findings confirm that precipitation behavior as described by the two indices, the wettest day of the year (Rx1Day) and the wettest pentad of the year (Rx5Day), through the lens of their RL20s by the end of the century, is not affected significantly and/or in a robust way across models at any of the locations we have analyzed. Figure 2, for Rx1Day (top panel) and Rx5Day (bottom panel), shows for each of the eight locations and the six ESMs, as the height of the bars, the central estimates of the differences in RL20 changes at 2100 (with respect to 2015) between SSP370 and SSP370Lu. Each bar color corresponds to an ESM, and the darker colors with a black line running along the length of the bar indicate statistical significance of the difference. The eight regions are plotted side by side, separated by a grey line. Most bars are plotted in light colors, with no black line running through their length, indicating that the change is not statistically significant. Consistent with the effects of internal variability, in the majority of cases the ESMs show differences of opposite sign within the same region. Note that for a few cases, indicated in the figure by a star at the base of the bar, changes in the RL20 within each scenario are not significant, and we therefore could not expect their difference to be, either. But these are a minority of region-model combinations, and do not invalidate our general conclusions. We conclude that for precipitation intensity, given the experimental design and this ESM ensemble, no significant and robust effects of afforestation or deforestation on precipitation extremes at these locations can be identified. It is important to note that the coarseness of the horizontal resolution of the CMIP6 models prevents realistic simulations of many of the storm types that produce extreme precipitation such as tropical cyclones (Wehner et al 2015) or mesoscale convective systems (Prein et al 2020).
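The quantity behind each bar, and the two flags attached to it, can be sketched as below. The inputs are the within-scenario RL20 changes (2100 relative to 2015) and their standard errors, taken as given; the normal-approximation test, the 1.96 threshold and the treatment of the two simulations as independent are assumptions of this sketch, as are all names and numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class BarSummary:
    difference: float   # change under SSP370 minus change under SSP370Lu
    significant: bool   # difference distinguishable from zero at the 5% level
    starred: bool       # neither within-scenario change is significant


def summarize_bar(change_ssp370, se_ssp370, change_ssp370lu, se_ssp370lu, z_crit=1.96):
    """Difference of RL20 changes between scenarios, with significance flags."""
    difference = change_ssp370 - change_ssp370lu
    # Independent simulations assumed, so the variances of the two changes add.
    se_difference = math.hypot(se_ssp370, se_ssp370lu)
    significant = abs(difference) > z_crit * se_difference
    starred = (abs(change_ssp370) <= z_crit * se_ssp370
               and abs(change_ssp370lu) <= z_crit * se_ssp370lu)
    return BarSummary(difference, significant, starred)


# Hypothetical values for two region-model combinations.
clear_case = summarize_bar(3.0, 0.5, 1.0, 0.5)
noisy_case = summarize_bar(0.2, 0.5, 0.1, 0.5)
```

Keeping the star separate from the significance flag mirrors the reading of the figure: a starred bar is one where no difference could reasonably be expected, because neither scenario shows a detectable change to begin with.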
We now consider the behavior of temperature indices. A synthesis here is more challenging, as we are considering four indices of cold and heat extremes, based on minimum (nighttime) and maximum (daytime) temperature. Figure 3 (for warmest day and night of the year, TXx and TNx respectively) and figure 4 (for coldest day and night of the year, TXn and TNn) summarize our results. Differently from the two precipitation indices, ESMs produce statistically significant changes for these temperature indices in most cases, as the darker colors and the black lines indicate. In fact, in the case of these temperature indices, all changes within scenarios are significant (there is no star at the bottom of any bar). Thus, all non-significant results in these plots are purely a reflection of the non-significant effect of the LUC changes between scenarios. For a given region, however, the sizes of the bars, indicating the magnitude of the difference estimated because of LUC, differ substantially among ESMs, even when the models agree on the sign of the difference. In almost all regions the sign is also contentious among ESMs. If we count only those ESMs producing a significant change, and only those regions where at least two ESMs produce such changes, only three regions show consistent significant changes in the same direction for the warmest day of the year (TXx), EAMZ, NEUS and WCAN; two for the coldest night of the year (TNn), WAF and NEUS; only one, NEU, for the warmest night of the year (TNx) and for the coldest day of the year (TXn). In all these cases but for WCAN under TXx, the direction of change is also what would be expected under afforestation (warming of cold extremes, cooling of hot extremes) or deforestation (cooling of cold extremes, warming of hot extremes). Every other region/index combination tells a more complex story of model disagreement and/or unexpected response to LUC.
[Figure caption; opening lost in extraction] ... figure 1 and table 2). The regions are shown side by side along the x-axis (see labels), separated by grey lines. In each region, six bars indicate the magnitude of the difference in the metric's RL20 by 2100 (RL20 under SSP370 minus RL20 under SSP370Lu) for each of six ESMs (four in the case of EAMZ, five in the case of CAF). Darker colors and a black line running along the length of the bar indicate statistically significant differences (5% level). Lighter colors indicate a non-significant change. In each plot the regions are grouped according to the sign of LUC: an H or an L associated with the region indicates higher (H) or lower (L) tree cover under SSP370 compared to SSP370Lu. The indicators H and L are positioned to suggest the expected sign of the change, so when bars point towards the indicator, they represent a change consistent with expectation. A star at the base of the bar indicates a case where the change in the extreme index is not significant in the individual scenarios.
From a location perspective:
• Of the eight regions considered (table 2), Northern Eurasia (NEU) and the Northeast of the United States (NEUS) are the only two regions where changes are consistent with expectations for all four temperature indices, given a large increase in tree cover, when comparing SSP370 to the 'greener' SSP370Lu, for NEU and a large decrease for NEUS, with a response robust across a majority of ESMs, if not all.
• WCAN sees its cold extremes (TNn and TXn) respond as expected, and consistently across most ESMs.
• Of the remaining regions, significant changes in opposite directions across ESMs or responses against the expected direction appear, preventing a coherent synthesis of the multi-model outcomes.
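The counting rule applied to the temperature indices (keep only ESMs with a significant difference, require at least two of them in a region, and require that they all agree in sign) can be written compactly. The function name and the example values are hypothetical; they are not results from the study.

```python
def consistent_direction(differences, significant, min_models=2):
    """Return +1 or -1 if the significant differences agree in sign, else 0.

    `differences` holds one RL20 difference per ESM for a region/index pair;
    `significant` holds the matching significance flags.
    """
    signs = [1 if d > 0 else -1
             for d, is_sig in zip(differences, significant)
             if is_sig and d != 0]
    if len(signs) < min_models or len(set(signs)) != 1:
        return 0
    return signs[0]


# Hypothetical region/index combinations.
agree_warm = consistent_direction([1.2, 0.4, -0.3, 0.8], [True, False, False, True])
disagree = consistent_direction([1.2, -0.5], [True, True])
too_few = consistent_direction([1.2, 0.3], [True, False])
agree_cold = consistent_direction([-1.0, -2.0, 0.5], [True, True, False])
```

Non-significant ESMs are ignored by this rule, so a region can be counted as consistent even when the remaining models show differences of the opposite sign.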
Figures S2 through S7, similar to figures 1 and S1, show the fields of these differences over the global land area for each index. We do not include ranges, which for the same index vary substantially across models, nor measures of significance, which is very sparse over these fields. Rather, we intend these figures as a qualitative depiction of the inter-model variability in response, and the noise affecting the geographic patterns of the models' projections. We have attempted to overcome the variability and noise by synthesizing these multi-model results through our location-based, spatial averaging approach. When comparing the patterns of LUC in figure 1 to the patterns of change in RL20 in the vicinity of the locations we have focused on, it is clear that we could have enhanced the signal-to-noise ratio further by defining locations not as regular circles, but as areas more tightly coupled to the LUC patterns, i.e., averaging over the specific grid points where decreases (or increases) in tree cover are largest. Such a choice, however, would have made our analysis an idealized one, while we strive to adopt the perspective of an impact study, for which location is not defined by the patterns of LUC, but by a geographic locale that is of specific interest because of assets at risk, for example, and around which a neighborhood is usually defined by the choice of a radius.

Discussion and conclusions
We have used results from two SSP370 variants constructed to explore the role of afforestation/deforestation, everything else being equal, to test the sensitivity of regional climate extremes to LUC. One of the fundamental assumptions of the SSP-RCP framework is that regional forcers like LUC will not affect the results of a scenario experiment strongly enough to invalidate the pairing of different SSPs, from which LUC derives, to different RCPs, which to first order are characterized by the overall strength of the forcings, not their regional distribution. Testing this assumption is the main motivation of our study.
We choose to analyze the effects of LUC on six impact-relevant metrics of heat, cold and precipitation extremes, since impact modeling is the main domain of application of the SSP-RCP framework.
We focus our analysis on changes in 20 year return values (RL20, calculated by fitting GEV distributions at the grid point scale), asking if these changes will be significantly different by 2100 in the two SSP370 variants. Rather than declaring defeat when faced with noisy geographical patterns of change across models, we focused our analysis on a set of eight locations, identified by the strength of LUC.
Wet precipitation extremes do not show significant differences within each ESM, and, not surprisingly for results affected by internal variability, changes are often in opposite directions across the multi-model ensemble. This is consistent with the large magnitude of internal variability affecting precipitation (Lehner et al 2020), but also with the nature of heavy precipitation represented in these relatively coarse-grained models, which is mainly advected over a region rather than depending on localized effects of land-atmosphere interactions.
Extreme temperature indices show significant changes, coherently across ESMs for a few locations but not for all. Most indices do not have a coherent behavior across ESMs in a majority of locations. Only one location sees all indices responding coherently. For the remaining locations the noise of internal variability affecting the uncertainty of the RL20 estimates overwhelms any signal, or significant signals of opposite directions are found across ESMs.
Overall, our conclusion is that, when using our metrics of temperature and precipitation extremes, users of climate model output for impact assessment adopting a multi-model framework to address uncertainties in future projections need not be concerned with the details of LUC. Of course, this is predicated on the use of a CMIP6-type multi-model ensemble, at horizontal resolutions of the order of 100 km. Natural variability and model uncertainty remain the main sources of variation, overwhelming scenario differences, when LUC is the only component of the scenario changing. This is consistent with the results for average temperature and precipitation that, even for more idealized and stronger deforestation experiments, found large differences among model responses, often even in significance and sign (Grant et al 2022).
This should reassure practitioners using the SSP-RCP matrix structure. It may additionally suggest that scenario design for this type of simulation may not need to invest significant resources in the representation of differing LUC patterns for the sake of producing a range of climate outcomes. We note also that these results support the practice of deriving global warming level projections by stratifying simulated future outcomes according to global temperature anomalies, irrespective of the scenario under which these anomalies are produced, i.e., disregarding the potential effects of local forcings that may vary from scenario to scenario within the same level of global radiative forcing (James et al 2017).
Importantly, however, our results have the limitations of an analysis that remains 'on the surface' of relatively coarse models. This is not a study of the physical processes governing land-atmosphere interactions, for which a focus on the details of ESMs and their land models is required. The emergence of a few instances of significant differences in particular locations and for particular indices, and even the lack of coherence among models or among locations, suggest however that further work is needed. Further analyses would be most promising if they indeed followed process-based approaches, focusing on land modeling choices, at higher resolutions than considered here, or on sub-grid scale land use effects. Such analyses might be able to separate remote from local effects. They could also account for the effects of dynamic vegetation modeling choices, or lack thereof. As figure S8 shows, the models that do not represent dynamic vegetation (see table 1) are broadly consistent with each other in terms of the behavior of globally aggregated tree cover changes under each scenario, though differences still persist. The ones that use dynamic vegetation can show significantly different trends in tree cover due to the combined response of the imposed LUC and endogenous tree migration, and this aspect could explain some of the inconsistent responses we found across the ESMs. Further, more in-depth process-based analyses could account for the possibility that other feedback mechanisms (cloud behavior pre-eminently) may be muting the LUC response, a possibility complicated by the fact that different models represent the impacts of surface temperature changes and fluxes on clouds differently. Notably, previous modeling studies produced contrasting impacts of LUC on mean and extreme temperature (e.g., Pitman et al 2012), partly because of uncertainty in the competing biophysical and albedo-driven effects (Pitman et al 2009, de Noblet-Ducoudre et al 2012).
As warming increases water vapor in the atmosphere, LUC effects may be more prominent in combined temperature and humidity extremes (Findell et al 2017, Song et al 2022), which should be further investigated in the future. Last, the scenarios analyzed here achieve relatively high radiative forcing (7.0 W m−2), for the most part from greenhouse gas emissions, and our results indicate that the strong warming signal from that forcing overwhelms the differences from forced LUC responses in most cases. Of course, this is the hypothesis we set out to test, but this also leaves open the possibility that a similar analysis conducted for lower warming scenarios (e.g. RCP2.6) may produce more significant and consistent results. All in all, many modeling choices can contribute to the type of results we have depicted here. However, the basic message remains that a user of model output for impact analyses that do not focus strictly on subtleties of land-atmosphere interaction may exchange scenarios under the same global radiative forcing, given model uncertainties and noise from internal variability.

Data availability statements
Code to fit the GEV distribution used the climextRemes package available at https://github.com/cran/climextRemes.
The data that support the findings of this study are openly available at the following URL/DOI: https://esgf-node.llnl.gov/projects/cmip6/.