Origin, importance, and predictive limits of internal climate variability

Adaptation to climate change has now become a necessity for many regions. Yet, adaptation planning at regional scales over the next few decades is challenging given the contingencies originating from a combination of different sources of climate projection uncertainty, chief among them internal variability. Here, we review the causes and consequences of internal climate variability, how it can be quantified and accounted for in uncertainty assessments, and what research questions remain most pertinent to better understand its predictive limits and consequences for science and society. This perspective argues for putting internal variability into the spotlight of climate adaptation science and intensifying collaborations between the climate modeling and application communities.


What is internal climate variability and where does it come from?
Internal variability refers to fluctuations that arise intrinsically in a non-linear dynamical system, even when that system is closed (energy, mass, and momentum are conserved) and not subject to any changes in external forcing. In the case of Earth's climate, internal variability arises primarily from the uneven distribution of energy across the planet at any given time. Physical processes, such as oceanic or atmospheric heat transport or radiation to space, act to balance this unevenness, but they do so at different temporal and spatial scales. Together with Earth's rotation, this means that an even energy distribution is never reached, leaving Earth's climate to shift energy around perpetually. For example, mesoscale processes such as tropical moist convection are often used as illustrative 'starting points' for internal variability (formally described as error growth in the context of weather forecasting; e.g. Judt 2018). Such processes individually or in aggregate then affect the larger synoptic scales, leading to baroclinic instabilities in the extratropics, expressed as mid-latitude weather systems or the meandering jet stream, both of which are deviations from the expected climatological mean state (Lorenz 2006). Conceptually similar processes exist in slower system components, such as the ocean or land, with expression in, e.g. sea surface temperature (SST) or soil moisture.
Fast and slow climate system components interact to create a wide spectrum of variability, which manifests, for example, as variations in temperature from one day to the next, but also as strengthening and weakening of ocean currents from one decade to the next. Internal variability thus affects virtually every aspect of the climate system. It is worth noting that for variables such as sea level pressure or terrestrial temperature and precipitation, a large fraction of the internal variability spectrum can be recovered by intrinsic atmospheric variability alone and does not require oceanic influence (Deser et al 2012b). This can be shown by comparing a climate model simulation with prescribed climatological SSTs to a fully coupled simulation with an interactive ocean component (figure 1). Comparing the variances of the two simulations reveals that the fraction of variance attributable to intrinsic atmospheric variability is between 50% and 100% over most land areas for interannual as well as longer time scale variability (warm colors in figure 1). Depending on the time scale and region, the ocean can amplify variability beyond what the atmosphere generates intrinsically, for example through the El Niño-Southern Oscillation (ENSO).
Importantly, internal variability occurs around a mean state, which is dictated by the long-term balance of the system at hand. Internal variability thus almost always occurs within a certain range (for example a range of temperature values) and does not drive the system unidirectionally. In Lorenz' seminal papers on chaos, a set of differential equations describes a non-linear system with one or more attractors around which the system varies (Lorenz 1963). In a simplified analogy, Earth's long-term average climate is such an attractor and variations around it represent internal variability. However, it is still debated what constitutes a climate attractor and thus how many attractors the climate system has (Nicolis and Nicolis 1984, Grassberger 1986, Lorenz 1991). The Lorenz model, for example, describes convection arising from heating below and cooling atop a fluid layer, much like atmospheric convection. The domain size, boundary conditions, and heating gradient set behavioral bounds to the system and establish the attractors. In other words, while individual convective cells can behave chaotically, their long-term statistics converge to a well-defined distribution. For an example on completely different timescales, one could think of Earth's thermostat as an attractor mechanism, in which warming increases rock weathering, which accelerates burying of carbon, which then cools temperatures (e.g. Brantley et al 2023). Transitions between warming and cooling could be chaotic but would vary around an attractor that reliably prevents runaway warming or cooling during Earth's history.
The Lorenz model offers insight into a key feature of internal variability: its inherent unpredictability. Even in a deterministic system such as the Lorenz model, minuscule differences in the initial conditions used to solve the equations can lead to very different trajectories after a while. This phenomenon exists in reality as well, where fluid media, such as the atmosphere and ocean, are intrinsically unstable to tiny perturbations. Because the initial conditions of a system under prediction (for example, weather) are never known perfectly, this inevitably leads to a gradual divergence between the prediction (for example, a weather forecast) and the actual outcome.
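The sensitivity to initial conditions described above can be illustrated with a minimal sketch that integrates the Lorenz (1963) equations twice, from initial states that differ by one part in a billion. The parameter values (sigma = 10, rho = 28, beta = 8/3) are the commonly used textbook choices; the Runge-Kutta scheme, time step, integration length, and initial state are assumptions made only for this illustration.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2.0))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2.0))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state, n_steps, dt=0.01):
    """Return the trajectory as a list of (x, y, z) states."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(state, dt)
        trajectory.append(state)
    return trajectory


def distance(a, b):
    """Euclidean distance between two states."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Two runs whose initial x differs by 1e-9 (the 'butterfly wing flap').
reference = integrate((1.0, 1.0, 1.0), 4000)
perturbed = integrate((1.0 + 1e-9, 1.0, 1.0), 4000)
separation = [distance(a, b) for a, b in zip(reference, perturbed)]

print(f"initial separation: {separation[0]:.2e}")
print(f"largest separation late in the run: {max(separation[3000:]):.2f}")
```

Both trajectories remain bounded (they stay on the same attractor), yet their separation grows until it is as large as the attractor itself, which is the sense in which the long-term statistics are predictable while the individual trajectory is not.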

Why care about internal variability?
The example of weather forecasting and initial condition predictability suggests that internal variability arises from small spatial and temporal scales and grows to encompass global and decadal scales. This leads to important issues for climate science. The historical climate record reflects not only a response to various external forcings affecting its energy balance (such as changes in orbital parameters, solar irradiance, greenhouse gas concentrations, or aerosols), but it also has embedded fluctuations that are the expression of a unique realization of internal variability. 'Unique' here means that it is just one of many possible and equally plausible realizations that could have occurred. Taking this insight to the extreme has led to the analogy that the flap of a butterfly could cause a hurricane. While proving such causality is beyond our reach, this concept of 'contingency' (Gould 1989) is useful when thinking about the possible consequences of even the smallest perturbations to the atmosphere and, once they grow, to the larger climate system, or, conversely, how the system would have evolved had that butterfly not flapped its wing.
This superposition of external and internal influences complicates the interpretation of the historical climate record (e.g. Sippel et al 2020), as our understanding of climate variability and change relies on a robust separation of these influences (Thompson et al 2009). Like the past, future climate will also reflect the combined influence of external forcing (for example, an increase in greenhouse gas concentrations) and internal variability. Precise prediction of future climate for lead times longer than a few years is thus not possible, but the importance of internal variability can be illustrated and quantified with climate models, which encapsulate enough relevant physical processes to also produce internal variability when integrated in time (e.g. Deser et al 2012a). Internal variability can also lead to extreme events beyond anything experienced so far. Because people's lived experience of extreme events influences their risk awareness and consequently adaptation actions (Moore et al 2019), this can leave them underprepared for surprising events (e.g. Bradshaw et al 2022). Internal variability thus poses a communication challenge and calls for probabilistic predictions, again very similar to weather forecasts, where a chance of rain rather than a binary forecast of rain or no rain is given.
This perspective reviews the role of internal variability in climate projections using the example of air temperature, provides an overview of methods used to separate and quantify external and internal drivers of climate variability, and assesses their predictability. An additional aim of this perspective is to discuss outstanding scientific challenges in the field of climate projections, as well as how physical understanding of internal variability and its manifestation in large ensemble model simulations, a primary tool in climate science, can be used to better inform climate change adaptation. For the case studies featured herein, we rely on internal variability as simulated by climate models and will also briefly discuss associated model biases.

An example: temperature projections
To illustrate the influence of internal variability on climate projections, we show projections of wintertime air temperature over North America from many different simulations with the same climate model under the same radiative forcing scenario, specifically the Community Earth System Model 1 Large Ensemble, or CESM1-LE (Kay et al 2015; figure 2). As with the Lorenz model, any individual simulation is started from slightly different initial conditions (here, round-off errors in atmospheric temperatures at year 1920 of each simulation, the butterfly wing flap), while the rest of the model setup is kept identical between the different simulations (Kay et al 2015). Specifically, each simulation is subject to the same historical external forcing, such as volcanic eruptions or changes in greenhouse gas concentrations, as well as an emissions scenario going out to year 2100. Such a setup is sometimes referred to as a single-model initial-condition large ensemble (SMILE; Maher et al 2020, Deser et al 2020a). The initial perturbations to each simulation are so small that the weather on the model planet over the first few days looks almost identical between simulations. Then, gradually, the different simulations diverge and eventually become decorrelated with regard to their interannual variability. At that point, the climate system has 'forgotten' about the initial conditions and the resulting range of temperature represents the many possible trajectories of climate around its climatological attractor. Here, the different trajectories are illustrated by linear trends over the next 30 years (2021-2050) from each of the first 30 SMILE ensemble members (figure 2). All ensemble members show widespread increases in winter temperature, consistent with the expectation of warming with future greenhouse gas emissions. The spatial pattern of this warming, however, shows substantial variations: there are even ensemble members with regional cooling trends over 2021-2050.
Due to the identical experimental setup in all ensemble members, any member-to-member differences are attributable solely to internal variability.
The average of the 30 trend maps is shown in the bottom left panel of figure 2. This 'ensemble mean' trend pattern represents the response of the climate model to the external forcing imposed on each simulation, as the differences between the individual ensemble members are largely averaged out. In other words, the forced response or 'signal' common to all simulations is distilled from the 'noise' of internal variability through the averaging process. This 'signal' shows a poleward-amplified pattern of warming throughout North America. The 'noise' pattern, quantified by computing the standard deviation across the 30 individual maps, also shows a poleward-amplified pattern (bottom middle panel of figure 2). As a result, the signal-to-noise map shows a more amorphous pattern than either the signal or the noise (bottom right panel of figure 2). Importantly, the signal-to-noise values are larger than 1 over almost all of North America, which indicates the emergence of the forced response from the background climate noise.
Assuming that the internal variability of this particular climate model is realistic (a point discussed later), any projection of temperature for the real world is subject to irreducible uncertainty of the magnitude shown in figure 2. In other words, the real world could end up looking like any one of the maps in figure 2, and we would not be able to predict ahead of time which one it might resemble.
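The signal, noise, and signal-to-noise estimates described above can be sketched for a single location as follows. This is a minimal illustration on synthetic data, not the CESM1-LE analysis itself: the forced trend of 0.04 degrees per year, the white-noise amplitude, and the function names are all hypothetical choices made for the example.

```python
import random
import statistics


def linear_trend(series):
    """Least-squares slope of a series against its index (units per step)."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


def signal_and_noise(member_trends):
    """Ensemble-mean trend (signal), across-member spread (noise), and their ratio."""
    signal = statistics.mean(member_trends)
    noise = statistics.stdev(member_trends)
    return signal, noise, abs(signal) / noise


# Synthetic 30-member ensemble: identical forced trend, independent noise per member.
rng = random.Random(0)
n_members, n_years = 30, 30
forced_trend = 0.04  # hypothetical value, degrees per year
trends = []
for _ in range(n_members):
    series = [forced_trend * year + rng.gauss(0.0, 1.0) for year in range(n_years)]
    trends.append(linear_trend(series))

signal, noise, ratio = signal_and_noise(trends)
print(f"signal = {signal:.3f}, noise = {noise:.3f}, signal-to-noise = {ratio:.2f}")
```

Applied to gridded data, the same calculation would simply be repeated at every grid cell to produce maps like those in the bottom row of figure 2.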

Contribution of internal variability to total projection uncertainty
Besides internal variability, other sources of uncertainty are important for climate change projections: model response uncertainty and scenario or socio-economic uncertainty. How does internal variability compare to these other sources of uncertainty? A common framework to quantify this is to calculate a 'total projection uncertainty' from a collection of simulations from different modeling groups run under a common set of emissions scenarios (Hawkins and Sutton 2009).
Taking all available models from the sixth Coupled Model Intercomparison Project (CMIP6) that provide simulations for multiple emissions scenarios results in a subset of 29 models and 4 different emissions scenarios (Lehner et al 2020). For each model and scenario, the forced response can be estimated either by averaging all of the available simulations, or (as is more common due to the lack of multiple simulations from some models) it can be estimated as a statistical fit to each model's simulation (for example, a 4th-order polynomial is fit to a temperature time series as done in Hawkins and Sutton 2009). The residual from this fit provides an estimate of internal variability (quantified by computing the variance of the residual time series). The variance across the estimated forced responses in each model for a given emissions scenario constitutes an estimate of response uncertainty. Finally, averaging the forced responses across all the models for each scenario separately and then calculating the variance across the scenarios constitutes an estimate of the scenario uncertainty. These uncertainties are approximately additive (Yip et al 2011), such that the total uncertainty can be estimated as the sum of the individual sources of uncertainty.
Each source of uncertainty can now be expressed as a time-varying fraction of total uncertainty. For example, for projections of decadal global mean temperature (figure 3(a)), internal variability initially dominates total uncertainty (figures 3(b) and (c)). Moving further into the future, internal variability contributes increasingly less to total uncertainty, while response uncertainty and eventually scenario uncertainty become the dominant sources of uncertainty (figures 3(b) and (c)). At regional or local scales, such as North America or a grid cell near Anchorage, AK, respectively, internal variability remains important or even dominant for a longer time, as climate tends to be more variable at smaller spatial scales (figures 3(d)-(i); see also Sutton 2009, Kumar and Ganguly 2018). In fact, the orange area in the right-most column of figure 3 nicely illustrates the gradually increasing fraction of uncertainty explained by internal variability at progressively smaller spatial scales.
It is worth noting that a robust separation into different sources of uncertainty becomes more difficult the smaller the spatial scales considered. This is due to the generally lower signal-to-noise ratio at smaller scales, which leads to internal variability aliasing into the estimate of the forced response (Lehner et al 2020). Ideally, the ensemble mean of a SMILE is used to estimate the forced response of a model, rather than a 4th-order polynomial fit, as the latter has a higher potential to conflate internal variability and forced response.
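The partitioning described in this section can be sketched as follows, using the polynomial-fit variant. This is a simplified illustration on synthetic data, not the analysis behind figure 3: the three models, two scenarios, their sensitivities and warming rates are hypothetical, and averaging the residual variance across models and the response variance across scenarios are assumptions made here to obtain a single value per source.

```python
import random
import statistics


def polyfit_values(series, degree=4):
    """Least-squares polynomial fit; returns the fitted value at each time step."""
    n = len(series)
    # Rescale time to [-1, 1] so the normal equations stay well conditioned.
    t = [2.0 * i / (n - 1) - 1.0 for i in range(n)]
    m = degree + 1
    aug = [
        [sum(ti ** (r + c) for ti in t) for c in range(m)]
        + [sum(yi * ti ** r for ti, yi in zip(t, series))]
        for r in range(m)
    ]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        rest = sum(aug[r][k] * coeffs[k] for k in range(r + 1, m))
        coeffs[r] = (aug[r][m] - rest) / aug[r][r]
    return [sum(c * ti ** p for p, c in enumerate(coeffs)) for ti in t]


def partition_uncertainty(simulations):
    """simulations[scenario][model] is a time series; returns fractions per time step."""
    forced = {
        scen: {model: polyfit_values(ts) for model, ts in models.items()}
        for scen, models in simulations.items()
    }
    # Internal variability: variance of the fit residuals (assumed constant in time).
    internal = statistics.mean(
        statistics.pvariance([y - f for y, f in zip(ts, forced[scen][model])])
        for scen, models in simulations.items()
        for model, ts in models.items()
    )
    n_time = len(next(iter(next(iter(forced.values())).values())))
    fractions = []
    for i in range(n_time):
        # Response uncertainty: across-model variance, averaged over scenarios.
        response = statistics.mean(
            statistics.pvariance([fit[i] for fit in forced[scen].values()])
            for scen in forced
        )
        # Scenario uncertainty: variance of the multi-model means across scenarios.
        scenario = statistics.pvariance(
            [statistics.mean(fit[i] for fit in forced[scen].values()) for scen in forced]
        )
        total = internal + response + scenario
        fractions.append(
            {"internal": internal / total, "response": response / total,
             "scenario": scenario / total}
        )
    return fractions


# Synthetic example: hypothetical model sensitivities and scenario warming rates.
rng = random.Random(1)
n_years = 80
sensitivity = {"model_a": 0.8, "model_b": 1.0, "model_c": 1.2}
scenario_rate = {"low": 0.01, "high": 0.03}
simulations = {
    scen: {
        model: [sens * rate * year ** 2 / n_years + rng.gauss(0.0, 0.1)
                for year in range(n_years)]
        for model, sens in sensitivity.items()
    }
    for scen, rate in scenario_rate.items()
}
fractions = partition_uncertainty(simulations)
print(fractions[10])
print(fractions[-1])
```

In this toy setup internal variability dominates early and scenario uncertainty dominates late, qualitatively mirroring the behavior described for global mean temperature. Replacing `polyfit_values` with a SMILE ensemble mean would correspond to the preferred approach mentioned above.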

Drivers of internal variability
Drivers of temperature variability
Generally, temperature changes as shown in figure 2 can be thermodynamically and/or dynamically induced. 'Thermodynamically-induced' refers to changes caused by time-varying radiative fluxes or changes in sensible and latent heat fluxes at the Earth's surface. Formally, this excludes any concomitant changes in the atmospheric circulation. 'Dynamically-induced' refers to temperature changes attributable to changes in the atmospheric circulation, irrespective of their cause.
Together with the presence of external forcing, the cause for a given temperature trend can be partitioned into four categories: forced-thermodynamic, forced-dynamic, unforced-thermodynamic, and unforced-dynamic. 'Forced' refers to the externally forced radiative imbalance of Earth (e.g. due to greenhouse gases) and 'forced-thermodynamic' then indicates that the external forcing impacts temperature via a thermodynamic process (e.g. changes to radiative fluxes caused by increasing greenhouse gases). Conversely, 'unforced-thermodynamic' also refers to a thermodynamic process influencing temperature, except this process is not caused by external forcing but just arises from unforced internal variability (e.g. changes in surface fluxes due to intermittent states of snow cover or soil moisture). Finally, 'forced-dynamic' and 'unforced-dynamic' refer to changes in temperature via forced or unforced changes in atmospheric circulation variability (e.g. changes in sea level pressure). Such a decomposition has previously been applied to the past 50 years, but is here applied to future projections.

Decomposition of drivers
The categories laid out above enable a decomposition of the relative contributions of internal variability and forced response to a given temperature trend (for example the 2021-2050 winter trend pattern over North America in one of the simulations shown in figure 2) and also a means of diagnosing the processes through which internal variability and forcing exert influence (figure 4). In practice, several steps are needed to conduct the partitioning quantitatively. As illustrated in figures 2 and 3, the forced contribution can be estimated either by using the ensemble mean of a SMILE, the multi-model mean of a set of CMIP models, or by making statistical assumptions about the forced response, with the residual representing the internal (or 'unforced') contribution. For certain variables such as atmospheric circulation or precipitation, it can be difficult to diagnose the forced trend from observations alone because of their typically small signal-to-noise ratio. Even a statistically significant trend within an observational dataset could be unforced, as the population distribution from which the confidence interval is derived might undersample the true distribution (Wittenberg 2009, Horton et al 2015). While time series models exist to estimate the magnitude of internal variability from observations alone, even for noisy variables (Thompson et al 2015), they cannot reliably estimate the forced response (Lehner et al 2020). Using SMILEs avoids this issue, at the cost of relying on models to provide a more complete distribution and robust estimate of the forced response. With the advent of multiple SMILEs, across-SMILE comparison can strengthen confidence in model-derived estimates of the forced response (Deser et al 2020a, Maher et al 2021). To further partition the forced and unforced components into thermodynamic and dynamic contributions, the method of dynamical adjustment is used.
This method aims to estimate the contribution of the atmospheric circulation to a given temperature pattern. Briefly, using the example of monthly mean data, the atmospheric circulation (e.g. sea level pressure) pattern in a given target month (say, January 1980) is reconstructed from the circulation patterns in all the other Januaries in the data set, for example using regression or analogs. Each circulation reconstruction is associated with a temperature reconstruction, which thus gives an estimate of the typical temperature pattern that occurs with this type of circulation pattern. Once repeated for all the months in the dataset, one obtains an estimate of the role of atmospheric circulation in bringing about the temperature changes seen in the original data, i.e. the dynamic contribution. The residual between the original data and the dynamic contribution is an estimate of the thermodynamic contribution. Further decomposition into forced-dynamic and forced-thermodynamic is achieved by conducting the above analysis for each member of a SMILE and then averaging the respective estimates. There exists now a range of other methods to achieve this separation, including tools of statistical learning and pattern recognition (Sippel et  The pattern seen in figure 4(a) can now be decomposed completely. This particular ensemble member was chosen for its interesting pattern of cooling over Western North America (which occurs despite the projected increase in greenhouse gases over the next 30 years), along with pronounced warming over Eastern North America. The decomposition reveals that internal variability contributes substantially to this cooling (figure 4(b)), in particular when compared to the total forced response which shows ubiquitous warming (figure 4(c)). Most of the cooling occurs due to atmospheric circulation, i.e. the dynamic contribution, specifically a strong trend towards atmospheric ridging off the west coast (figure 4(d)).
This dynamically-induced cooling is entirely internal (figure 4(e)) as there is essentially no forced trend in atmospheric circulation (figure 4(f)). On the other hand, the warming over Eastern North America is almost entirely due to thermodynamic processes (figure 4(g)), of which the forced component dominates (figure 4(i)); indeed, the internal-thermodynamic component drives a weak cooling trend (figure 4(h)). More generally, figure 4 shows that, for western North America, changes in atmospheric circulation are of almost negligible importance compared to thermodynamic processes for understanding regional temperature trends due to anthropogenic forcing. However, large internal variability of the atmospheric circulation can at times overwhelm the anthropogenic response in a given realization.
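The logic of dynamical adjustment can be sketched in a heavily simplified form. The example below replaces the spatial circulation pattern with a single hypothetical circulation index and uses a leave-one-out linear regression in place of the pattern reconstruction described above, so it should be read as an illustration of the idea rather than the method used for figure 4. All numbers (trend, regression coefficient, noise levels) are invented for the synthetic data.

```python
import random
import statistics


def linear_trend(series):
    """Least-squares slope of a series against its index (units per step)."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


def dynamical_adjustment(circulation, temperature):
    """Split temperature into dynamic and thermodynamic parts.

    For each target year, the temperature-circulation relationship is fit on
    all other years and then applied to the target year's circulation anomaly.
    """
    dynamic = []
    for target in range(len(temperature)):
        x = [c for i, c in enumerate(circulation) if i != target]
        y = [t for i, t in enumerate(temperature) if i != target]
        x_mean, y_mean = statistics.mean(x), statistics.mean(y)
        covariance = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
        slope = covariance / sum((a - x_mean) ** 2 for a in x)
        dynamic.append(slope * (circulation[target] - x_mean))
    # The residual is the estimate of the thermodynamic contribution.
    thermodynamic = [t - d for t, d in zip(temperature, dynamic)]
    return dynamic, thermodynamic


# Synthetic data: a trend-free circulation index that modulates a warming series.
rng = random.Random(2)
n_years = 50
circulation = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
temperature = [
    0.03 * year + 1.5 * c + rng.gauss(0.0, 0.2)
    for year, c in enumerate(circulation)
]
dynamic, thermodynamic = dynamical_adjustment(circulation, temperature)

print(f"total trend:         {linear_trend(temperature):.4f}")
print(f"dynamic trend:       {linear_trend(dynamic):.4f}")
print(f"thermodynamic trend: {linear_trend(thermodynamic):.4f}")
```

Because the synthetic circulation index has no imposed trend, any dynamic trend in this example is unforced by construction; in a SMILE, repeating the calculation for every member and averaging would separate the forced from the unforced parts of each contribution.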

Prospects for uncertainty reduction
Uncertainty from internal variability has here been portrayed as being irreducible, and for good reason (Hawkins et al 2016): initial conditions that would provide predictability are typically quickly forgotten by the climate system, unless they are tied to components with substantial memory such as the ocean. Efforts in decadal prediction via initialized climate model simulations are advancing rapidly, but currently show regionally limited skill for lead times of 2-9 years (Meehl et  The expected uncertainty reduction from initialized prediction can be illustrated using the CESM1 Decadal Prediction Large Ensemble (CESM1-DPLE; Yeager et al 2018), which provides 40-member ensemble predictions for lead times of up to 10 years with the same model as used in figure 2. The CESM1-DPLE is initialized using observations every November from 1954 to 2017, so its last prediction runs from November 2017 to December 2027. For illustration, we compare the CESM1-DPLE prediction for the period 2013-2022 with the uninitialized CESM1-LE over the same time period (figure 5(a)). Interestingly, the CESM1-DPLE predicted the accelerated pace of global warming from 2013-2016, followed by a slowed warming rate, both features seen in observations but not in the uninitialized CESM1-LE; thus, these changes in the rate of global warming are potentially attributable to information contained in the 2012 initial conditions. For winter temperatures over North America and Alaska (figures 5(d) and (g)), on the other hand, no skill is visible (nor expected given the highly variable winter weather, even at continental scale). This is consistent with the more comprehensive CESM1-DPLE skill assessment conducted in Yeager et al (2018).
Besides prediction skill, important here is the spread across CESM1-DPLE ensemble members as an indicator of the potential uncertainty reduction from initialization (figures 5(b), (e) and (h)). For global annual temperature, the CESM1-DPLE spread at lead time 1 year is about 50% of the CESM1-LE spread (measured by standard deviation); thus, initialization can reduce uncertainty from internal variability by about half. However, the memory from initial conditions quickly fades and the spread approaches 100% after lead year 5 (figure 5(b)). This pronounced but short-lived constraint on internal variability uncertainty is visualized alongside the other sources of uncertainty for CMIP6 global temperature projections until 2030 (figure 5(c)). For winter temperatures over North America and in Anchorage, AK, on the other hand, the ensemble spreads from the CESM1-DPLE and CESM1-LE are indistinguishable already at lead time 1, which nota bene is the December-February mean right after initialization in November (figures 5(e) and (h)). Thus, there appears to be no general benefit from initialization for this region on seasonal time scales and beyond (figures 5(f) and (i)). As these examples show, the potential for uncertainty reduction from initialization depends on the location, season, and lead time (and also physical quantity) and thus needs to be assessed and communicated carefully. Recent discovery of a signal-to-noise paradox (Scaife and Smith 2018) brought forth new methods to better isolate the predictable signal in large ensemble simulations, leading to improved prediction skill in regions affected by the North Atlantic Oscillation (Smith et al 2019, Moulds et al 2023). Application of these methods to other regions and climate indices might lead to improved skill over what is reported here.
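The spread comparison between an initialized and an uninitialized ensemble can be sketched as a ratio of across-member standard deviations per lead time. The example below is a toy illustration only: it stands in for the climate model with a first-order autoregressive process, and the persistence parameter (0.8), the common start state, and the function names are assumptions made for this sketch rather than properties of the CESM1-DPLE.

```python
import math
import random
import statistics


def simulate_ar1_ensemble(n_members, n_leads, phi, rng, start_state=None):
    """Simulate members of an AR(1) process with unit-variance innovations.

    With start_state=None each member starts from a random draw of the
    stationary distribution (uninitialized); otherwise all members share
    the given start state (initialized).
    """
    stationary_sd = 1.0 / math.sqrt(1.0 - phi ** 2)
    ensemble = []
    for _ in range(n_members):
        state = rng.gauss(0.0, stationary_sd) if start_state is None else start_state
        member = []
        for _ in range(n_leads):
            state = phi * state + rng.gauss(0.0, 1.0)
            member.append(state)
        ensemble.append(member)
    return ensemble


def spread_ratio(initialized, uninitialized):
    """Initialized spread as a fraction of uninitialized spread, per lead time."""
    ratios = []
    for lead in range(len(initialized[0])):
        init_sd = statistics.stdev([member[lead] for member in initialized])
        free_sd = statistics.stdev([member[lead] for member in uninitialized])
        ratios.append(init_sd / free_sd)
    return ratios


rng = random.Random(3)
initialized = simulate_ar1_ensemble(40, 10, 0.8, rng, start_state=1.0)
uninitialized = simulate_ar1_ensemble(40, 10, 0.8, rng)
ratios = spread_ratio(initialized, uninitialized)

for lead, ratio in enumerate(ratios, start=1):
    print(f"lead {lead:2d}: spread ratio = {ratio:.2f}")
```

A ratio well below 1 indicates that initialization constrains internal variability at that lead time, while a ratio near 1 indicates that the memory of the initial state has faded. With only 40 members the estimated ratios are themselves noisy, which is one reason such comparisons need to be interpreted carefully.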
As for constraining the other sources of uncertainty (response and scenario uncertainty), recent years have seen some progress that is worth exploring but is not covered in detail here. First, new climate-social system models suggest that some emissions scenarios are less likely to actually occur in the future (Moore et al 2022). These models simulate feedbacks between socio-economic dynamics and climate based on empirical relationships and argue that there exist dampening feedbacks in the system, e.g. society implementing more stringent climate mitigation in response to mounting extreme event damages. If such feedbacks hold, the highest emissions scenarios become less likely, which would substantially reduce the scenario uncertainty that often dominates total projection uncertainty later this century (figure 3(c)). Second, new physically-based constraints on global and regional climate projection uncertainty (Lorenz et al 2018, Brunner et al 2020, Sherwood et al 2020, Qasmi and Ribes 2022) promise to reduce response uncertainty, whose importance tends to peak mid-century when initial conditions are forgotten but scenarios have not yet diverged substantially (figure 3(c); see also Lehner et al 2020). Future work will seek to combine these two pathways to constrain projection uncertainty at regional scales.

Summary
This paper provides a review of the role of internal variability in climate projections using the example of temperature trends over the coming decades. The significant influence of internal variability, especially at regional scales, is illustrated by the range of possible future temperature trends simulated by the individual members of a SMILE (figure 2). The influence of internal variability is also shown to be important when compared to other sources of uncertainty such as response uncertainty originating from structural differences across models or the choice of future emissions scenario (figure 3). Its relative importance decreases with lead time but increases with progressively smaller spatial scales and can account for 50% of local projection uncertainty even by year 2100 (figure 3). The chaotic nature of the atmospheric circulation is a key driver of internal variability in regional temperature trends (figure 4). The prospect for reducing uncertainty from internal variability is limited, as shown by the prediction skill and ensemble spread reduction achieved with a state-of-the-art decadal prediction system (figure 5), though it might carry value for certain applications. Despite the focus on seasonal to annual temperature here, the lessons learned apply equally to other variables and research questions, for example precipitation (Deser et (Rodgers et al 2015, Lovenduski et al 2016). Generally, the smaller the scales and the noisier the variables of interest, the more likely the research is to benefit from including climate model large ensembles (Milinski et al 2020, Bevacqua et al 2023). The implications of internal climate variability are therefore far-reaching, for both science and society.

Implications for climate science
Internal variability continues to complicate our interpretation of the historical record, which relies on cleanly separating observed climate trends into contributions from external forcing and internal variability. Interestingly, this challenge has neither gotten easier nor less important over the last 20 years. If anything, one could argue that several recent developments in climate science urge a renewed focus on better understanding internal variability.
First, CMIP6 showed a relatively larger response uncertainty than CMIP5 for temperature and many temperature-related quantities, puzzling the community (Zelinka et al 2020). As a result, a flurry of emergent constraints were developed that use the agreement (or lack thereof) between observations and models to assess the models' credibility, arguing for example that models which warm too much over the historical period are also likely to warm too much in the future (Tokarska et al 2020). Such constraints can be powerful but rely critically on a robust understanding of the historical record. Is the historical record our best-guess central estimate of how the climate responds to external forcing or did a combination of forced response and internal variability render it more of an end-member of a plausible but elusive distribution of historical records that could have been? The answer to this question directly determines which climate models are considered to be in agreement or disagreement with observations. While much progress has been made in developing emergent constraints with the necessary care (Hall et al 2019, Sherwood et al 2020), the uncertainty from past aerosol forcing in particular still looms large as a confounding influence on our interpretation of the historical record and the inference we draw from it (Persad and Caldeira 2018, Deser et al 2020b, Lehner and Coats 2021). Second, and relatedly, while one can generally expect our understanding of the magnitude of internal variability to solidify with a longer observational record, it can also harbor surprises. One prominent current example is the long-term evolution of tropical Pacific SST, which has been trending La Niña-like for decades, while the forced response from climate models consistently predicts a more El Niño-like response (e.g. Seager et al 2019).
The tropical Pacific Ocean drives a myriad of impactful teleconnections; thus, its future state is of great interest for many regions around the world. Until a few years ago, this discrepancy between observations and models could readily be chalked up to internal variability, as one could find individual simulations in SMILEs that replicated the observed trend (Watanabe et al 2021), suggesting that reality is just an end-member of its own distribution of plausible trends and that observations will eventually reverse and fall in line with the models' forced response. With every passing year that this discrepancy persists, however, it has become more difficult to defend that interpretation, such that it is now suggested that models either underestimate the magnitude of internal variability in this region or, worse, are misrepresenting the forced response of tropical Pacific SSTs (Wills et al 2022). Similarly, recent unprecedented extreme events, such as the Pacific Northwest heatwave (Philip et al 2022), have brought into focus that our observational records are sometimes still too short to robustly constrain all aspects of internal variability and that we need to continue to observe and study it. Finally, there is evidence from subseasonal to decadal prediction experiments that climate models might be less predictable than the real world: in other words, their signal-to-noise ratio is too small, especially for mid-latitude atmospheric circulation (Scaife and Smith 2018, Klavans et al 2021). It is possible that this so-called 'signal-to-noise paradox' also applies to the response to anthropogenic forcing, which would have implications for our confidence in uncertainty partitioning as presented here (figures 3 and 5) and elsewhere. Resolving the signal-to-noise paradox thus remains an important goal for the climate modeling community.
While the search for the true magnitudes of internal variability and forced response will thus continue into the foreseeable future, it is on the topic of model validation that progress can be made today. The advent of multiple SMILEs (Deser et al 2020a) and the development of observational large ensembles (McKinnon and Deser 2018, 2021) enable a probabilistic comparison of models and observations (e.g. Deser and Phillips 2023, Wieder et al 2022), with the goal not necessarily of identifying which model is right or wrong, but which ones are more or less plausible. Such nuances are surprisingly important as we rethink model evaluation in light of internal variability. Model behavior that was previously viewed as inconsistent with observations can suddenly become compatible, as the observational record is revealed to under-sample variability. On the other hand, if a model has too much variability (like CESM1 for winter temperature over high latitudes; Simpson et al 2022), this can create a wider range of future temperatures than might actually occur in reality (Thompson et al 2015). Thus, if needed, certain models can now be rejected or down-weighted more confidently thanks to the larger sample sizes (van Oldenborgh et al 2020, Zscheischler and Lehner 2022).
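As a rough illustration of what a probabilistic model-observation comparison can look like, the following sketch ranks an observed linear trend within the distribution of trends from an ensemble. It is a minimal example under stated assumptions, not the method of any of the studies cited above: the data are synthetic, and the function names (`trend_per_decade`, `observed_percentile`), the ensemble size, the forced trend, and the noise amplitude are all hypothetical choices made for illustration.

```python
import numpy as np


def trend_per_decade(series, years):
    """Least-squares linear trend of `series`, in units per decade."""
    slope = np.polyfit(years, series, 1)[0]  # units per year
    return 10.0 * slope


def observed_percentile(member_trends, obs_trend):
    """Percentage of ensemble members whose trend is below the observed trend."""
    member_trends = np.asarray(member_trends, dtype=float)
    return 100.0 * np.mean(member_trends < obs_trend)


# Synthetic stand-in for a large ensemble (all numbers are assumptions):
# a shared forced warming plus independent noise in each member.
rng = np.random.default_rng(0)
years = np.arange(1980, 2020)
forced = 0.02 * (years - years[0])  # hypothetical forced signal, K
members = forced + rng.normal(0.0, 0.15, size=(50, years.size))

# Synthetic stand-in for the single observed record.
obs = forced + rng.normal(0.0, 0.15, size=years.size)

member_trends = np.array([trend_per_decade(m, years) for m in members])
obs_trend = trend_per_decade(obs, years)
pct = observed_percentile(member_trends, obs_trend)

print(f"Observed trend: {obs_trend:.3f} K/decade")
print(f"Percentile within ensemble: {pct:.0f}")
```

An observed trend near the 0th or 100th percentile would suggest either that reality is an end-member of the simulated distribution or that the model's forced response or variability is biased; the percentile alone cannot distinguish between these interpretations.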
Further, internal variability itself can change with external forcing (Rodgers et al 2021). For example, precipitation variability is expected to increase with warming (Pendergrass et al 2017, Stevenson et al 2022) and mid-latitude winter temperature variability is expected to decrease (Screen 2014), while the future of ENSO remains uncertain (Maher et al 2018). With the increasing number of SMILEs and their public availability, quantitative assessments of changes in variability become possible, also for typically under-sampled phenomena such as extreme events or modes of internal variability (Haszpra et al 2020, O'Brien and ...). Such assessments do, however, require confidence in the models' ability to correctly simulate internal variability itself.
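One simple way to quantify a change in variability from a large ensemble is sketched below: the ensemble mean is removed at each time step as an estimate of the forced response, and the spread of the remaining anomalies is compared between an early and a late period. This is a minimal sketch on synthetic data; the function name `variability_ratio`, the ensemble dimensions, and the imposed growth in noise amplitude are hypothetical and not taken from the cited studies.

```python
import numpy as np


def variability_ratio(ensemble, early, late):
    """Ratio of late-period to early-period internal variability.

    `ensemble` has shape (members, time). The ensemble mean at each time
    step is treated as the forced response and removed; the standard
    deviation of the remaining anomalies is pooled over members and years.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    anomalies = ensemble - ensemble.mean(axis=0, keepdims=True)
    sd_early = anomalies[:, early].std(ddof=1)
    sd_late = anomalies[:, late].std(ddof=1)
    return sd_late / sd_early


# Synthetic ensemble (all numbers are assumptions): a forced trend plus
# noise whose standard deviation grows from 1.0 to 1.5 over the record.
rng = np.random.default_rng(1)
n_members, n_years = 100, 120
trend = np.linspace(0.0, 3.0, n_years)
noise_sd = np.linspace(1.0, 1.5, n_years)
ensemble = trend + noise_sd * rng.normal(size=(n_members, n_years))

# Compare the first and last 30 years of the record.
ratio = variability_ratio(ensemble, slice(0, 30), slice(90, 120))
print(f"Late/early variability ratio: {ratio:.2f}")
```

A ratio above one indicates increased variability in the late period. With only a handful of members, the same estimate would be considerably noisier, which is one reason large ensembles are needed for this kind of assessment.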

Implications for society
Despite the remaining scientific challenges, the implications of internal variability for society are clear. Internal variability and the uncertainty it injects into projections for the coming decades will continue to accompany any climate change impact assessment and decision making. This suggests that a perspective focused on risk and robust decision making (Sutton 2019, Mankin et al 2020, Reed et al 2022) is more pertinent to users of climate model information than the focus on a 'best estimate'. This is increasingly being recognized by application communities dependent on climate science, such as climate-related health impacts (Garcia-Menendez et al 2017, Fiore et al 2022) or water resource management (Harding et al 2012, Chegwidden et al 2019, Smith et al 2022). A current example is the scoping of new guidelines for the management of the Colorado River in the Western U.S. The co-occurrence of declining precipitation since the 1980s (largely attributed to internal variability, e.g. Lehner et al 2018) with anthropogenic warming and increased water demand from socio-economic growth has led to rapid reservoir depletion (Lukas and Payton 2020). This puts pressure on new guidelines to be more robust to the full range of possible hydrologic outcomes in the face of internal variability and forced climate change. Yet, hydrologic projections based on climate model simulations show a paralyzingly wide range of outcomes (Lukas et al 2020). The water management community thus uses methods of Robust Decision Making to develop guidelines that perform optimally across this wide range of futures rather than assuming a best estimate (Smith et al 2022). However, these approaches do not currently distinguish sources of climate projection uncertainty, continue to undersample internal variability (Mankin et al 2020), and do not incorporate new observational constraints on regional climate projections (e.g. Brunner et al 2020, Grise 2022), thus motivating closer collaborations between the climate modeling community and practitioners. Also, the integration of internal variability and large ensemble simulations into decision making is perhaps not happening fast enough, as the superposition of internal variability and the forced response can create a rapid increase in unprecedented events, causing unanticipated socio-economic damages. This, in turn, prompts decision-makers and even scientists to conclude that climate change is 'happening faster than we thought', when, really, internal variability has often not been considered to the degree it should have been. Various new approaches with large ensembles, such as developing storylines of 'unseen' but plausible extreme events through resampling and 'boosting' of large ensembles (e.g. Gessner et al 2021, Kelder et al 2022), are poised to advance not just our scientific understanding of internal variability, but also societal communication around what to expect in a changing climate. Thus, the decade-old seminal call to improve communication around internal climate variability (Deser et al 2012a), especially when engaging with decision-makers, remains as relevant as ever. Beyond communication, lessons learned from the seasonal prediction community on co-designing information (e.g. Sánchez-García et al 2022) might also be applicable for decision-making on longer time scales: deriving impact-relevant metrics from climate model output, stress-testing real-world systems with unseen events from large ensembles, and discussing where projection uncertainty is potentially reducible and where it is not.