A revised interpretation of the signal-to-noise ratio paradox and its application to constrain regional climate projections

The signal-to-noise ratio paradox is interpreted as a climate model's ability to predict observations better than it predicts itself. This view is counterintuitive, given that climate models are simplified numerical representations of complex Earth system dynamics. A revised interpretation is provided here: the signal-to-noise ratio paradox represents excessive noise in climate predictions and projections. Noise is potentially reducible, providing a scientific basis for improving the signal in regional climate projections. The signal-to-noise ratio paradox was assessed in long-term climate projections using single-model and multi-model large ensemble climate data. A null hypothesis was constructed by bootstrap resampling of climate model ensembles to test a model's ability to predict the 20th-century temperature and precipitation trends locally and to compare it with the observations. Rejection of the null hypothesis indicates the existence of a paradox. The multi-model large ensemble does not reject the null hypothesis in most places globally. The rejection rate in the single-model large ensembles is related to a model's fidelity in simulating internal climate variability rather than its ensemble size. For regions where the null hypothesis is rejected in the multi-model large ensemble, for example, India, the paradox is caused by a smaller signal strength in the climate model ensemble. Ensemble selection based on past performance improved the signal strength by 100%, which reduced uncertainty in India's 30-year temperature projections by 25%. Consistent with previous studies, precipitation projections are noisier, leading to a paradox metric value 2-3 times higher than that of the temperature projections. The application of the ensemble selection methodology significantly decreased uncertainty in precipitation projections for the United Kingdom, Western Australia, and Northeastern America by 47%, 36%, and 20%, respectively.
Overall, this study makes a unique contribution by reducing uncertainty on the temporal scale, that is, in estimating trends, using the signal-to-noise ratio paradox metric.


Introduction
Reducing the uncertainty in regional climate projections is vital for climate change adaptation and planning. Recent progress in improving seasonal to decadal climate prediction skills (Merryfield et al 2020) has shown promising results (Smith et al 2020), but their benefits for long-term climate projections are largely unknown. For example, a recent study (Smith et al 2020) found skillful predictions of the North Atlantic Oscillation using an ensemble selection method that can improve temperature and precipitation predictions in North America and Europe on decadal time scales. In contrast, the multi-model mean provides a muted signal for regional climate change projections (Maloney et al 2014). Moreover, because the externally forced component, i.e., the greenhouse gas emissions-based climate change signal, is small in near-term climate projections (Deser et al 2012a, 2020), particularly in areas like the Southeastern United States (Kumar et al 2013c), the equal weighting of several climate realizations has been questioned (Tebaldi and Knutti 2007, Smith et al 2009, Scaife and Smith 2018).
Understanding the distinction between climate change signals and background noise is pivotal for adaptation planning and model refinement. Our method involves comparing climate model projections directly with observed data, specifically focusing on historical simulations. Increasing the ensemble size across various climate models and averaging the data reduced uncertainty related to external forcing impacting global average temperature trends (Kumar et al 2013b, Duan et al 2021). However, significant uncertainty persists in regional climate projections (Sheffield et al 2013, Kumar et al 2013c). Expanding the sampling of internal variability through a Large Ensemble (LE) approach (Deser et al 2012a, 2020) resulted in greater uncertainty and a smaller signal-to-noise ratio, as demonstrated later. This discrepancy, characterized by a higher correlation with observations and a limited ability to predict the model itself, is termed the 'signal-to-noise paradox' (Scaife et al 2014, Scaife and Smith 2018).
There is a growing consensus that the paradox is due to excessive noise (Boer et al 2019, Sévellec and Drijfhout 2019, Weisheimer et al 2019), which can be reduced by improving process representation and by high-resolution climate modeling (Yeager et al 2023). For example, representing mesoscale (eddy-resolving) ocean processes in high-resolution climate modeling reduces noise and improves the decadal predictability signal (Zhang et al 2021). Similarly, strengthened eddy feedback improves the ability of the climate model to predict observations (Hardiman et al 2022).
In climate change projections, understanding the noise component and determining a threshold for its inclusion is crucial for refining the accuracy of projections. We demonstrate that bringing the signal-to-noise ratio paradox metric value closer to one determines the threshold for keeping noise in climate change projections. First, we assessed the signal-to-noise paradox in long-term climate projections using the ratio of predictable component (RPC) metric (Smith et al 2020). This evaluation utilized single-model large ensemble (SLE) and multi-model large ensemble (MLE) datasets (Eyring et al 2016, Rodgers et al 2021). Subsequently, we employed an ensemble selection method, specifically targeting regions exhibiting a notably higher paradox, to reduce noise in these areas. Finally, our study demonstrates the practical application of this approach in refining temperature and precipitation projections at regional scales, showcasing the effectiveness of setting noise thresholds in improving the reliability of climate projections.
Previous efforts to constrain future projections have focused on the global scale, using observed warming as the constraining parameter (Tokarska et al 2020, Shiogama et al 2022). For example, Shiogama et al (2022) reduced the uncertainty in global average precipitation projections by 8%-30% using the emergent constraint method and observed temperature trends from 1980 to 2014. Our study uniquely contributes to constraining regional climate projections using a new signal-to-noise ratio paradox metric. Additionally, a novel contribution is reducing the uncertainty in regional climate projections on a temporal scale, i.e., providing estimates of temperature and precipitation trends for the next 30 years (shown later) and comparing the ability of the SLE and MLE to capture the uncertainty in regional climate projections. Traditional downscaling methods, such as bias correction and spatial disaggregation, have limited or no skill in constraining trend estimates (Xu and Wang 2019).

Data and method
Clarification regarding trend, variability, and projection terminologies: regionally, observed climate change results from a combination of externally forced responses, namely a trend component, and internal climate variability (Kumar et al 2013a). Consequently, comparisons between climate models and observations encompass both trend and variability components. However, our analyses specifically target low-frequency climate variability and change by employing a 5-year running mean and assessing long-term trends spanning 30 years or more.
In the stricter climate science literature, climate projections refer to non-initialized future climate simulations driven by model-derived emission scenarios (Eyring et al 2016). This stands in contrast to initialized decadal predictions, where observed atmosphere and ocean states are used to initialize the model predictions (Smith et al 2020). However, by design, future climate simulations cannot be verified using observations. Historical climate simulations, on the other hand, are scientifically akin to non-initialized simulations, utilizing historically observed emission levels, volcanic eruptions, and land use changes (Kumar et al 2016). Furthermore, historical simulations can be validated against observations. Therefore, within the scope of this study, we use the historical simulation and projection terminologies interchangeably.
Multi-model large ensemble (MLE) climate data: the coupled model intercomparison project phase 6 (CMIP6) is a federated experimental protocol for climate model intercomparison studies (Eyring et al 2016).
Participating climate models differ in their model structure, parameterization, resolution, etc, resulting in uncertainty in climate model projections (Knutti and Sedláček 2013, Tokarska et al 2020, Zelinka et al 2020) and in the representation of internal climate variability (Parsons et al 2020). The multi-model mean reduces inter-model uncertainty and provides global average temperature projections comparable to the observations in the historical record (Duan et al 2021). We used 23 CMIP6 climate models and all available ensembles, contributing a total of 220 ensemble members (supplementary table S1) (Duan et al 2021).
Single-model large ensemble (SLE) climate data: several climate realizations (e.g., >30) using the same climate model and forcing scenario were developed by perturbing the initial conditions in the climate simulations. They provide considerably different climate projections at the regional scale, e.g., from a cooling projection (−0.1 °C) to a strong warming projection (4.2 °C) for Seattle, USA, winter temperatures over the next 50 years (Deser et al 2012b). Uncertainty in an SLE is purely due to internal climate variability in the given climate model, as opposed to the combination of internal variability, model structure, and parameterization in the MLE. Hence, an SLE provides a cleaner dataset for studying the role of internal variability (Lehner et al 2020). However, an open question is whether an SLE can capture regional climate variability and trends.
The community earth system model version 2 large ensemble (CESM2-LE) is the latest SLE, providing 100 climate realizations for historical and future climate (Rodgers et al 2021). CESM2-LE uses a combination of macro- and micro-perturbations. Four phases of the Atlantic meridional overturning circulation states contributed to the macro-perturbations, and infinitesimal perturbations (10^−14) to the atmospheric potential temperature initial states gave the micro-perturbation ensemble (Rodgers et al 2021). Because of the chaotic nature of the climate system, any small or large initial condition perturbation quickly grows and saturates to a similar magnitude 2-3 weeks after initialization (Kumar et al 2014), providing 100 equally plausible long-term climate simulations. We analyzed the 90 ensemble members that were available at the time of the analysis.
In addition, three CMIP6 climate models had 30 or more ensembles: CanESM5 (ensemble size = 50), IPSL-CM6A-LR (32), and NorCPM1 (30).We analyzed these three climate models in addition to CESM2-LE, totaling four SLEs.We will show later that an ensemble size of ∼30 is the optimal threshold for selecting LE data for this study.
Observations: we used a high-resolution (0.5° × 0.5°) gridded surface temperature and precipitation dataset from the climatic research unit time series version 4 (CRUTS4) (Harris et al 2020) as the long-term observation (OBS) because the CRUTS4 data provide spatially and temporally continuous coverage from 1901 onwards. To address observational uncertainty, we performed an additional analysis for a better-observed period, i.e., 1950-2014, and with an alternative data source, the goddard institute for space studies (GISS) surface temperature data (Lenssen et al 2019).
Data processing: we re-gridded the monthly climate model data to a common 2.5° × 2.5° resolution using an area-average-preserving method (area_conserve_remap) (supplementary section T1) (NCL 2019). We then computed monthly temperature anomalies relative to the monthly climatology from 1951 to 1980. Monthly anomalies were averaged to compute the annual average temperature anomaly, f_ye, for year y and ensemble member e. We smoothed the data using a 5-year running mean filter to capture long-term climate variability and change and to minimize high-frequency climate variability, such as El Niño-Southern Oscillation (ENSO) effects. ENSO operates on a 2-7 year frequency; an uninitialized historical simulation is not anticipated to capture the variability associated with ENSO, unlike an initialized multiyear prediction experiment, which falls outside the scope of this study.
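The anomaly and smoothing steps above can be sketched as follows. This is a minimal NumPy illustration under our own naming and edge-handling assumptions, not the study's actual processing code:

```python
import numpy as np

def annual_anomaly_smoothed(monthly, years, clim_start=1951, clim_end=1980, window=5):
    """monthly: (n_years * 12,) monthly means starting in January of years[0].
    Returns annual-mean anomalies f_ye and their 5-year running-mean smoothing."""
    years = np.asarray(years)
    monthly = np.asarray(monthly, dtype=float).reshape(len(years), 12)
    base = (years >= clim_start) & (years <= clim_end)
    clim = monthly[base].mean(axis=0)          # monthly climatology (12 values)
    annual = (monthly - clim).mean(axis=1)     # annual average anomaly per year
    # centered running mean; edge values are damped (the study's exact edge
    # handling is not specified, so this is an assumption)
    smooth = np.convolve(annual, np.ones(window) / window, mode="same")
    return annual, smooth
```

By construction, a series equal to its climatology in every month yields zero anomalies before and after smoothing.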

The ratio of predictable component (RPC):
The RPC metric (equation (1)) quantifies whether the ensemble mean predicts the observations better than it predicts the model itself (Scaife and Smith 2018, Smith et al 2019, 2020). The ratio compares the model ensemble mean's correlation with the observations against its correlation with individual ensemble members (equation (1)). If the RPC is greater than one, the observed climate variability is more predictable from the ensemble mean than the model's own members are:

RPC_o = PC_obs / PC_mod    (1)

σ²_sig = var_y(f_y.),  σ²_tot = mean_e[var_y(f_ye)]    (2)

PC_mod = + sqrt(σ²_sig / σ²_tot)    (3)

Here, RPC_o refers to the RPC in relation to observations (see next paragraph). PC_obs represents the Pearson correlation between the ensemble mean and observations. PC_mod is the Pearson correlation between the ensemble mean and an individual ensemble member, estimated as the positive square root of the fraction of the total ensemble variance explained by the ensemble mean (equations (2) and (3)) (Scaife and Smith 2018). 'f' stands for climate forecasts (simulations), 'o' denotes observations, and 'y' denotes the year, e.g., 1901, 1902, …, 2014. f_ye represents the annual average temperature or precipitation anomaly for year y and ensemble member e; the dot (.) sign represents the corresponding index that has been averaged. Using dot notation, PC_obs is the Pearson correlation between the model ensemble mean (EM = f_y.) and the observation (OBS = o_y) time series. Additionally, σ²_sig refers to the signal variance of the ensemble mean, and σ²_tot stands for the total variance of the climate ensembles. The denominator in equation (1) is the upper limit of the theoretical forecast skill, that is, potential predictability (Boer et al 2013): the maximum forecast accuracy achievable with perfect initial information and a complete understanding of the climate system's processes, i.e., the perfect model world. That is why an RPC greater than one represents a paradox.
The original formulation proposed by Scaife and Smith (2018) employed the parameter RPC²_o, which inherently limits its range of values between 0 and ∞. In contrast, our approach (equation (1)) allows the correlation between observations and the ensemble mean to retain its sign, facilitating the discrimination between warming and cooling trends. Consequently, RPC_o values range from −∞ to ∞.
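A signed RPC_o per equation (1) can be computed as below. This is a sketch; the array layout and the form of the total variance (the mean of the per-member variances) are our assumptions:

```python
import numpy as np

def rpc_o(ens, obs):
    """Signed ratio of predictable components (equation (1)).
    ens: (n_members, n_years) smoothed anomalies f_ye; obs: (n_years,) o_y."""
    em = ens.mean(axis=0)                    # ensemble mean f_y.
    pc_obs = np.corrcoef(em, obs)[0, 1]      # keeps its sign (warming vs cooling)
    sig_var = em.var()                       # signal variance of the ensemble mean
    tot_var = ens.var(axis=1).mean()         # total ensemble variance (assumed form)
    pc_mod = np.sqrt(sig_var / tot_var)      # potential predictability
    return pc_obs / pc_mod
```

When every member equals the observations, PC_obs = PC_mod = 1 and RPC_o = 1; negating the observation series flips the sign, the property the signed formulation is meant to preserve.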
RPC null hypothesis: the null hypothesis is that the RPC calculated using a randomly selected climate model ensemble member (RPC_m) is not significantly different from the RPC calculated using the observations (RPC_o) (equation (1)). Using the bootstrap method, we test whether the observations lie within the spread of the model ensembles. The following procedure was followed.
Step 1: for CMIP6, 220 ensemble members and the observations generated a pool of 221 members. Randomly select a climate ensemble member (one of the 221) as the 'observation', compute its correlation with the CMIP6 220-member ensemble mean, and compute the corresponding RPC_m (equation (1)).
Step 2: repeat step 1 1000 times (with replacement) to obtain 1000 RPC_m values.
Step 3: rank the 1000 RPC_m values and compute their 2.5th and 97.5th percentiles as the 95% range of RPC_m.
Step 4: check whether RPC_o falls outside the 95% range of RPC_m; if so, reject the null hypothesis. A lower rejection rate represents a better sampling of the internal variability by the climate model (shown later) and, hence, a better chance of capturing the observation within the climate model's 95% uncertainty range. Rejecting the null hypothesis indicates the presence of the signal-to-noise ratio paradox, that is, excessive noise that can potentially be reduced.
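The four steps can be sketched as a self-contained routine. The pool construction and the RPC internals follow our reading of the text and are assumptions, not the study's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def _rpc(ens, obs):
    # signed RPC (equation (1)): corr(ensemble mean, obs) / potential predictability
    em = ens.mean(axis=0)
    return np.corrcoef(em, obs)[0, 1] / np.sqrt(em.var() / ens.var(axis=1).mean())

def bootstrap_rpc_test(ens, obs, n_boot=1000):
    """Return RPC_o, the 95% range of RPC_m, and whether the null is rejected."""
    pool = np.vstack([ens, obs[None, :]])    # Step 1: members plus obs form the pool
    rpc_obs = _rpc(ens, obs)
    # Steps 2-3: pseudo-observations drawn with replacement, then percentiles
    rpc_m = np.array([_rpc(ens, pool[rng.integers(len(pool))])
                      for _ in range(n_boot)])
    lo, hi = np.percentile(rpc_m, [2.5, 97.5])
    return rpc_obs, (lo, hi), not lo <= rpc_obs <= hi   # Step 4
```

Because the observations are included in the pool, an observation series statistically exchangeable with the members should fall inside the 95% range most of the time.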

Effects of ensemble size on RPC metrics:
We assessed the effects of ensemble size (e.g., 10, 20, 30) by randomly selecting members with replacement from the available SLE and MLE data, conducting the RPC null hypothesis testing, and repeating the process 500 times. Finally, we computed the median and 95% range of the RPC rejection rates from the 500 samples.
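The subsampling experiment can be sketched generically: draw members with replacement at each ensemble size, apply a statistic (any callable, standing in for the RPC rejection rate), and summarize the repetitions. Names and structure here are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def subsample_stat(ens, stat, sizes=(10, 20, 30), n_rep=500):
    """Median and 95% range of `stat` over random sub-ensembles of each size.
    ens: (n_members, ...); stat: callable mapping a sub-ensemble to a scalar."""
    out = {}
    for k in sizes:
        # draw k members with replacement, n_rep times
        vals = np.array([stat(ens[rng.integers(len(ens), size=k)])
                         for _ in range(n_rep)])
        out[k] = (np.median(vals),
                  np.percentile(vals, 2.5), np.percentile(vals, 97.5))
    return out
```

In the study, `stat` would be the fraction of grid cells rejecting the RPC null hypothesis for the drawn sub-ensemble.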
Ensemble selection method to reduce uncertainty in climate projections when the null hypothesis is rejected: following Smith et al (2020), we define the ratio of predictable signals (RPS) using equation (4).

RPS = PC_obs × (σ_o / σ_f,sig)    (4)

σ_o is the standard deviation of the observations, and σ_f,sig is the signal standard deviation of the forecast, that is, the standard deviation of the ensemble mean. To adjust for the underestimation of the predictable signal in the ensemble mean, we multiplied the ensemble mean (f_y.) by the RPS to obtain the adjusted ensemble mean (f_y. × RPS). We used the anomaly correlation between the adjusted ensemble mean and the original ensemble members (equation (5)) as the metric for ensemble selection:

ACC_e = corr(f_ye, f_y. × RPS)    (5)

The procedure is as follows:
Step 1: 220 CMIP6 and 90 CESM2-LE ensemble members generated a pool of 310 members.
Step 2: compute the adjusted ensemble mean (f_y. × RPS).
Step 3: calculate the ACC_e between each ensemble member and the adjusted ensemble mean (equation (5)).
Step 4: select the top 10% of ensemble members based on the highest anomaly correlation.
The ensemble selection varies across calibration periods, for example, 1900-1950 versus 1900-1980.
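The selection steps can be sketched as below. Equation (4) is taken here as RPS = PC_obs × σ_o/σ_f,sig with standard deviations, our reading of Smith et al (2020); the function and variable names are illustrative assumptions:

```python
import numpy as np

def select_top_ensembles(ens, obs, frac=0.10):
    """RPS adjustment and top-10% ensemble selection over a calibration period.
    ens: (n_members, n_years) anomalies f_ye; obs: (n_years,) observations."""
    em = ens.mean(axis=0)                       # ensemble mean f_y.
    pc_obs = np.corrcoef(em, obs)[0, 1]
    rps = pc_obs * obs.std() / em.std()         # equation (4), as reconstructed
    adj = rps * em                              # Step 2: adjusted ensemble mean
    # Step 3: anomaly correlation of each member with the adjusted mean
    acc = np.array([np.corrcoef(member, adj)[0, 1] for member in ens])
    n_keep = max(1, int(round(frac * len(ens))))
    keep = np.argsort(acc)[::-1][:n_keep]       # Step 4: top 10% by ACC
    return keep, rps
```

With a synthetic ensemble in which one member tracks the common signal far more closely than the rest, that member is the one selected.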

Signal-to-noise ratio paradox in MLE and SLE
The CMIP6 MLE captured the observed long-term warming from 1900 to 2014 on local scales (2.5° × 2.5° grid cell; figure 1(a)). The RPC null hypothesis was not rejected at most places (88%), and the RPC was close to one. The RPC increases with ensemble size, from 0.52 ± 0.16 with one ensemble member to 0.98 ± 0.05 with 30, and then saturates, i.e., there is no further increase in RPC as the ensemble size increases beyond 30 (figure 2(a)). A considerable drop in the signal-to-total ratio, the denominator in equation (1), contributes to the increase in the RPC value; for example, the signal-to-total ratio drops from 1 to 0.72 ± 0.04 at an ensemble size of 30 (red curve in figure 2(a)). The EM correlation with the observation also increases with ensemble size, e.g., from 0.53 ± 0.17 at one ensemble member to 0.70 ± 0.01 at an ensemble size of 30.
[Figure 2 caption fragment: the red line is the signal-to-total ratio (denominator of RPC); the x-axis is the number of ensembles randomly selected from the given model ensemble (CMIP6 and four SLEs); each selection on the x-axis is repeated 500 times; the colored lines are the mean of the 500 iterations, and the corresponding color shading is the 95% uncertainty range, double the 500 iterations' standard deviation.]
The CMIP6 MLE agrees more among themselves than with the observations in the Southeastern US, the western part of South America, and central-west Africa, as indicated by an RPC of less than one and rejection of the null hypothesis (blue color and hatching in figure 1(a)). This is partly due to observational uncertainty; for example, the GISS data show missing information in South America and Africa (supplementary figure S1). Multi-decadal climate variability, for example, Pacific decadal variability and Atlantic multi-decadal variability, significantly influences climate variability in the Southeastern US (Kumar et al 2013c, Meehl et al 2015, Pan et al 2022). Uncertainty can also result from the underrepresentation of high-resolution climate processes, for example, air-sea interactions in the Gulf Stream that impact decadal climate variability in the Southeastern US (Zhang et al 2021, 2022). The null hypothesis was rejected in India, and the RPC value was greater than one (discussed later).
The RPC was closer to 1 in most places (figure 1(a)), resulting in a global average value of 1 (figure 2(a)). We found a similar result using the 1951-2014 analysis period, with slightly improved climate model performance; for example, the rejection rate reduced to 8% in the CMIP6 MLE (supplementary figures S2 and S3). Overall, we did not find a widespread signal-to-noise ratio paradox in CMIP6 long-term temperature projections, unlike the decadal climate prediction problem (Smith et al 2020). One reason is that climate change signals were more detectable in temperature data. For example, we found three times greater paradox rejection in an atmospheric circulation-related variable, i.e., sea-level pressure data, than in temperature (supplementary figures S2(b) and S4).
SLEs can better predict themselves than the observations, as suggested by an RPC value of less than 1 (figures 1 and S2). For example, the RPC value saturates at 0.87 ± 0.02 for ensemble sizes of 32 and greater in CESM2-LE (figure 2(b)) and at 0.96 ± 0.04 for IPSL-CM6A-LR (figures 2(b) and (c)). The SLE captured the regional temperature variability and trends less robustly than the MLE in terms of the null hypothesis rejection rate. The rejection rate is generally higher for SLE than for MLE (39% for CESM2-LE and 1901-2014). The observed temperature variability and trends in many parts of North and South America, Central Africa, and parts of Southeast Asia are not well captured in the CESM2-LE projections. The rejection rate is a function of the model structure rather than its ensemble size. For example, CESM2-LE, with an ensemble size of 90, has a rejection rate of 39%, compared with IPSL-CM6A-LR, with 32 ensemble members and a rejection rate of 13%. With the same ensemble size, the fractions of rejection areas among the SLEs and MLE were different (figure 1(f)).
We then explored the reasons behind the varying rejection rates among the SLEs to suggest a plausible hypothesis. The rejection rates of the four SLEs were inversely related to their ability to simulate internal climate variability. For example, Parsons et al (2020) estimated the interdecadal climate variability of the four SLEs as follows: 0.12 °C for IPSL-CM6A-LR, 0.07 °C for CanESM5, 0.06 °C for NorCPM1, and 0.06 °C for CESM2; the corresponding RPC null hypothesis rejection rates are 13%, 20%, 21%, and 39%, respectively (figure 1). CESM2-LE performance improves during the 1950-2014 period (21% rejection rate) and becomes comparable to the NorCPM1 model (16% rejection rate) (supplementary figure S2). Overall, we hypothesize that a larger internal (interdecadal) variability improves a climate model's ability to capture observed temperature trends locally.

Application of the signal-to-noise ratio paradox metric to improve long-term temperature projections in India
Climate change signals are muted if there is excessive noise in the projections. We posit that the RPC metric can be used to identify noise and improve signals. In India, the RPC is significantly greater than one (RPC = 2.02), and the signal-to-total ratio is smaller (=0.48) (figure 3(a)). The RPC null hypothesis was rejected in India for the MLE and two SLEs (figure 1) and using the GISS data (supplementary figure S1). We combined all climate ensembles to create a larger ensemble pool (220 CMIP6 + 90 CESM2-LE = 310) for regional climate studies. Because of the excessive noise, the multi-model mean estimate shows considerably smaller warming (0.36 ± 0.05 °C from 1901 to 2014) than the observations (1.01 °C). Most of the uncertainty comes from the external forcing effects. After removing the linear trend, the MLE compares well with the observations; for example, the trend estimates are zero (by design), and the null hypothesis is generally not rejected in most Indian regions (supplementary figure S5).
We selected the top 10% of ensemble members (=31), showing the highest correlation with the adjusted ensemble mean, to improve the signal (see methods). As a result, the selected ensembles showed considerably improved signals (signal-to-total ratio = 0.85) and RPC values closer to one (RPC = 1.13) (figure 3(b)). The 20th-century temperature trend estimate also improved, to 1.07 ± 0.06 °C, closer to the observations (1.01 °C). This example showcases the application of a noise threshold criterion, specifically an RPC value nearing 1, which enhances the accuracy of regional climate change signals.
Of course, we cannot use observations for the projection period. To demonstrate the efficacy of the proposed method, we divided the historical period into an observation period (50 years or longer) and a forecast period (30 years). We used observations from the past (historical period) to adjust the model ensemble mean and select the ensemble in the same period. Only the model output in the forecast period was used to assess the model's performance. For example, we used the 1901-1950 period for ensemble selection and then used the selected ensemble to project the next 30 years' temperature trend (1951-1980). In addition, we used all observations preceding the forecast period for ensemble selection, for example, 1901-1980 for ensemble selection and 1981-2010 for the projection period. Overall, we assessed the efficacy of the RPC-based ensemble selection method for four projection periods, 1951-1980, 1961-1990, 1971-2000, and 1981-2010, using the preceding observations for ensemble selection.
The RPS-based ensemble selection method reduced the uncertainty in India's 30-year temperature trend projections by 25% (figure 4(a)). The uncertainty in temperature projections, represented by the full projection range shown using whiskers in figure 4(a), decreased by 44%, from −0.72 to 1.06 °C in the raw ensemble to −0.12 to 0.88 °C in the selected ensemble, for the period from 1971 to 2000. The observed warming rates are captured by the interquartile range of the selected ensemble in all four projection periods, compared to only one projection period in the unselected ensemble. The most recent projection period (1981-2010) showed accelerated warming (0.64 °C), which is generally consistent with India's climate change projection report (Sanjay 2020). The selected ensemble shows the least reduction in uncertainty (15%) during the 1981-2010 period. It is likely that the accelerated warming phase was not calibrated well by the long historical period, i.e., 1901-1980 (not investigated).
IPSL-CM6A-LR contributed the most (one-third) to the selected ensemble in the three projection periods 1951-1980, 1961-1990, and 1971-2000, when the reductions in uncertainty were substantial (figure 4(b)). Furthermore, IPSL-CM6A-LR performed on par with the multi-model ensemble for temperature projections; for example, compare figure 1(c) with figure 1(a), and the yellow and black lines in figure 1(f). Basha et al (2017) identified the predecessor model IPSL-CM5A-LR as the best model for capturing observed temperature trends in India. Compared to IPSL-CM5A-LR, IPSL-CM6A-LR shows reduced biases in capturing near-surface air temperature (Boucher et al 2020). Overall, the signal-to-noise ratio-based ensemble/model selection method performs on par with process-based and/or impact-modeling studies.
Other models that contributed 10% or more of their ensembles in two or more projection periods include NorCPM1, GISS-E2-1-H, UKESM1-0-LL, BCC-ESM1, MIROC-ES2L, and CAMS-CSM1-0 (figure 4(b)). However, several of these models have a small sample size; for example, MIROC-ES2L has only three raw ensembles, of which two were selected; hence, they were not investigated further.

Signal-to-noise ratio paradox in precipitation projections
Precipitation projections are generally noisy; the signal strength is smaller than for temperature, particularly in the historical period and over land, because of the opposing effects of global warming and aerosols (anthropogenic and volcanic) (Richardson et al 2018). Additionally, uncertainties in convection parametrization due to coarse model resolution (Kendon et al 2021), cloud microphysics processes (Morrison et al 2020), and circulation-driven dynamical responses (Pfahl et al 2017) contribute to the challenges in predicting regional precipitation changes. A smaller signal strength contributed to a higher RPC value in the precipitation projections (figures 5 and 6). For example, the RPC value ranged from −5 to +5 for precipitation projections (figure 5) versus −2 to +2 for temperature projections (figures 1 and S2). However, the rejection rates of the RPC null hypothesis for precipitation projections are not widespread and are generally comparable to the rejection rate for temperature projections. For example, the CMIP6 MLE precipitation projections show a 15% rejection rate for 1901-2014 and 12% for 1951-2014; most of the rejection areas show RPC values from 0.5 to −1 (figure 5). Therefore, a higher RPC value does not necessarily imply rejection of the null hypothesis.
The SLEs generally showed a higher rejection rate than the MLE. For example, CESM2-LE showed a rejection rate of 25% for 1901-2014 and 15% for 1951-2014. However, the best SLE (least rejection rate) differed between temperature and precipitation. For example, IPSL-CM6A-LR performed best for temperature projections, but its performance is similar to CESM2-LE for precipitation projections (figures 5(c)-(f)).

Reducing uncertainty in precipitation projections
We selected three regions: Northeastern America, the United Kingdom, and Western Australia; these regions showed positive and higher RPC values, e.g., 5.7 for Western Australia. The RPC null hypothesis was rejected for these regions. For generality, we selected a larger box region surrounding the grid cells where the null hypothesis was rejected. The box regions are shown in figure 5(a).
In Western Australia, the higher RPC value in the raw ensemble of precipitation projections (n = 310) is due to a very small signal-to-total ratio (=0.1) rather than an improved correlation with the observations (0.57) (figure 6(a)), supporting the main point that the signal-to-noise ratio paradox is due to excessive noise in the climate projections. The ensemble selection method improved the signal-to-total ratio from 0.1 to 0.45 and the correlation with observations from 0.57 to 0.67; therefore, the RPC decreased from 5.74 to 1.48 in Western Australia (figure 6(b)).
Using 1901-1980 as the calibration period for ensemble selection, the ensemble selection method considerably reduced the uncertainty in precipitation trend projections from 1981 to 2010. The projection uncertainty, as measured by the width of the whiskers, was reduced by 20% in Northeastern America, 47% in the United Kingdom, and 36% in Western Australia (figure 7(a)). Ensemble selection for precipitation differs from that for temperature; for example, CanESM5 contributed more than 10% of its ensembles in all three regions (figure 7(b)).

Discussion and conclusions
Improved sampling of internal climate variability in the MLE (Parsons et al 2020) allowed better capture of regional temperature variability and trends than the SLE. There is no widespread signal-to-noise ratio paradox in the MLE temperature projections for 1901-2014 and 1951-2014 (figure 1 and supplementary figure S2). Additionally, the MLE has a lower RPC null hypothesis rejection rate than the SLEs. A similar result was found for the precipitation projections. Hence, the MLE provides more robust sampling of regional climate variability and trends.
The signal strength within climate model ensembles, as assessed by the signal-to-total ratio, decreases with increasing ensemble size (red curve in figure 2). Concurrently, the corresponding correlation with the observations increases as the ensemble size grows (blue curve in figure 2), thereby ameliorating the paradox behavior by bringing the RPC value closer to one for temperature projections (black curve in figure 2). Nevertheless, if the signal is too weak, e.g., for precipitation projections, it gives rise to the signal-to-noise ratio paradox (e.g., figure 6(a)). Consequently, the paradox signifies excessive noise in climate projections, necessitating the optimal selection of ensemble members to converge the RPC value towards 1. This interpretation is termed 'revised' because it highlights that augmenting the ensemble size does not alleviate the paradox in long-term climate projections; instead, it underscores the significance of optimal ensemble/model selection, as shown in figures 3 and 6.
The MLE and SLE noise is potentially reducible in regional climate projections. Ensemble selection ameliorates the temperature projection paradox in India. We applied the RPS-based ensemble selection method to the climate models' past performance (50 years or longer), which improved the signal for the subsequent 30 year temperature projections in India (cf figure 4). Of the 31 selected members, IPSL-CM6A-LR contributed one-third, which is corroborated by process-based studies of CMIP5 (Basha et al 2017).
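The selection step can be sketched as follows. The paper ranks members with an RPS-based score; the sketch below instead scores members by their calibration-period anomaly correlation with observations (the criterion quoted for the precipitation analysis in figure 7), so the scoring function is a simplifying assumption:

```python
import numpy as np

def select_top_members(members, obs, frac=0.10):
    """Rank ensemble members by anomaly correlation with observations
    over the calibration period and keep the top `frac` (the top 10%
    of 310 members yields the paper's 31-member subset).

    members : (n_members, n_years) array of calibration-period anomalies
    obs     : (n_years,) observed anomaly series
    Returns the indices of the best-scoring members, best first.
    """
    scores = np.array([np.corrcoef(m, obs)[0, 1] for m in members])
    k = max(1, int(round(frac * len(members))))
    return np.argsort(scores)[::-1][:k]
```

The constrained projection is then the spread of the selected members' trends over the subsequent 30 year validation window, as in figures 4 and 7.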
Precipitation projections are noisier, leading to RPC values 2-3 times higher than those of the temperature projections. This finding supports the idea that lower signal strength or higher noise levels correspond to higher RPC values. The ensemble selection methodology considerably decreased uncertainty in precipitation projections for the United Kingdom, Western Australia, and Northeastern America by 47%, 36%, and 20%, respectively. The ensembles/models selected for precipitation projections differed from those for the temperature projections; that is, the ensemble selection methodology is sensitive to the variable of interest.
The signal-to-noise paradox is less widespread in long-term temperature and precipitation projections than in decadal climate predictions (Scaife and Smith 2018, Smith et al 2020). Internal climate variability is the dominant source of decadal climate variability, whereas external radiative forcing plays the major role in long-term climate projections (Hawkins and Sutton 2009). Hu and Zhou (2021) proposed a variance adjustment method that combines the effects of internal climate variability and long-term linear trends to improve decadal precipitation predictions over the Tibetan Plateau. A simple Markov model shows that the paradox can arise from weaker persistence in the climate model, that is, more unpredictable noise (Zhang and Kirtman 2019).
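Zhang and Kirtman's Markov argument can be illustrated with a first-order autoregressive process, whose lag-1 autocorrelation sets the predictable fraction of the next step: a model with a lower AR(1) coefficient than observed therefore carries more unpredictable noise. A minimal sketch, with purely illustrative coefficients:

```python
import numpy as np

def ar1_series(phi, n, rng):
    """x[t] = phi * x[t-1] + white noise; the persistence phi controls
    how much of each step is predictable from the previous one."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(0)
for phi in (0.9, 0.3):   # "observed-like" vs. "model-like" persistence
    x = ar1_series(phi, 50_000, rng)
    lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]   # empirically recovers phi
    print(f"phi = {phi}: lag-1 autocorrelation = {lag1:.2f}")
```

The less persistent series (lower phi) has a smaller predictable component, mimicking a model whose noise is too unpredictable relative to nature.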
Finally, our study was primarily motivated by the application perspective: given the MLE and SLEs, is the ensemble mean the best projection? The answer, of course, depends on the region and variable of interest, for example, precipitation or temperature. We demonstrate the utility of the signal-to-noise ratio paradox as a metric to answer this question. Our findings are generally consistent with process-based studies (Basha et al 2017). We did not investigate the process-level details of the selected ensembles or models, which is a limitation of our study. Future studies should investigate process-level details and apply machine learning techniques and emergent constraint methods (Yu et al 2022) to reduce uncertainty in regional climate projections.

Figure 1. Signal-to-noise ratio paradox in 20th-century temperature projections. The figure shows the ratio of predictable components (RPC) for 1901-2014 temperature variability and trends at local scales, which represent long-term warming in most places (not shown). Panels (a)-(e) are the RPC for CMIP6 or the MLE and four SLEs. The hatching shows the regions where the RPC null hypothesis is rejected at the 95% confidence level (see text). The legend gives the model name/ensemble size/rejection rate. A higher rejection rate (5% or greater) represents less reliable model performance. Panel (f) shows the effect of ensemble size on the rejection rate of the null hypothesis for NorCPM1 (purple), CanESM5 (blue), IPSL-CM6A-LR (yellow), CMIP6 (black), and CESM2-LE (reddish brown). The number of ensemble members given on the x-axis in panel (f) is randomly selected for the RPC calculation, and the process is repeated 500 times. Colored lines show the average of the 500 iterations, and the respective color shading shows the 95% uncertainty range from the 500 iterations. The rectangular box in panel (a) is the selected research region in India.

Figure 2. Effects of ensemble size on the signal-to-noise ratio paradox for the 20th-century temperature projections (1901-2014). Panels (a)-(e) are the RPC for CMIP6 or the MLE and four SLEs. The black line shows the global average of RPC values calculated on a 2.5° × 2.5° grid box. The blue line is the Pearson correlation between the ensemble mean and observations (the numerator of the RPC). The red line is the signal-to-total ratio (the denominator of the RPC). The x-axis is the number of ensemble members randomly selected from the given model ensemble (CMIP6 and four SLEs). Each selection on the x-axis is repeated 500 times. The colored lines are the mean of the 500 iterations. The corresponding color shading is the 95% uncertainty range, double the standard deviation of the 500 iterations.

Figure 3. Signal-to-noise ratio paradox in India's temperature projections. Observed (black line) and model cluster ensemble mean (red line) time series from 1901 to 2014. The surface temperature is the five-year running mean annual anomaly from 1901 to 2014 in India. The model cluster is a combination of CMIP6 and CESM2-LE (n = 310). Panels (a) and (b) are the time series before and after model selection, respectively. The red shading shows the 5%-95% confidence interval. The ACC is the anomaly correlation between the climate model's ensemble mean and observations (GISS). Dotted lines represent trend lines obtained through linear regression, with the corresponding equations (y = slope * x + intercept) displayed in their respective colors: red for climate models and black for observations.

Figure 4. Constraining uncertainty in long-term temperature projections in India using the signal-to-noise ratio paradox metric. (a) The temperature trend (Theil-Sen nonparametric trend) for unselected (left, brown box) and selected (right, blue box) ensemble members in India. The selected ensemble members are in the top 10% of the 310 total ensemble members (see text). The y-axis is the temperature trend during the new projected 30 years. We multiplied the Theil-Sen trend estimate (per year) by 30 to represent the total warming in a 30 year projection period. The black circle is the observed trend. Period one is ensemble member selection based on 1901-1950, with the selection applied to 1951-1980; period two is member selection over 1901-1960 applied to 1961-1990; period three is training over 1901-1970 for 1971-2000; and period four is training over 1901-1980 for 1981-2010. Outlier values are not shown. (b) Selected ensemble members in the first three periods of panel (a). The y-axis is the number of selected ensemble members contributed by each selected model. The black line is 10% of the respective climate model's ensemble size.
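The trend quantity plotted in figure 4(a) can be reproduced with a few lines of code. The Theil-Sen estimator is the median of all pairwise slopes, and the ×30 scaling converts the per-year slope into total warming over the 30 year window; the temperature series below is synthetic and purely illustrative.

```python
import numpy as np

def theil_sen_slope(x, y):
    """Theil-Sen nonparametric trend: median of all pairwise slopes."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)]
    return np.median(slopes)

years = np.arange(1981, 2011)              # a 30 year projection window
temps = 0.02 * (years - years[0]) + 1.0    # synthetic anomalies, 0.02 K/yr

slope = theil_sen_slope(years, temps)
total_warming = slope * 30                 # per-year trend scaled to 30 years
print(f"trend = {slope:.3f} K/yr, 30 yr total = {total_warming:.2f} K")
```

Because the median of pairwise slopes ignores outlying pairs, this trend estimate is robust to the occasional extreme anomaly, which is why the boxplots in figures 4 and 7 use it rather than ordinary least squares.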

Figure 6. Observed (black line) and model cluster ensemble mean (red line) time series of precipitation. The precipitation is the 5-year running mean annual anomaly from 1951 to 2014 in Western Australia. The model cluster is a combination of CMIP6 and CESM2-LE. Panels (a) and (b) are the time series before and after model selection, respectively. The red shading shows the 5%-95% confidence interval. The ACC is the correlation of the forecast ensemble mean with observations. The extra models shown here, which were not included in the temperature analysis, are E3SM-1-0, FGOALS-g3, and GISS-E2-1-G-CC.

Figure 7. Constraining uncertainty in long-term precipitation projections using the signal-to-noise ratio paradox metric. (a) The precipitation trend (Theil-Sen nonparametric trend) for unselected (left, brown box) and selected (right, blue box) ensemble members in Northeastern America (NE America), Western Australia, and the United Kingdom (UK). The selected ensemble members are in the top 10% of the 310 total ensemble members, showing the highest correlation with observations in the calibration period from 1901 to 1980 (see text). The y-axis is the precipitation trend during the projected 30 years. The black circle is the observed trend. The ensemble member selection training period is 1901-1980; the validation period is 1981-2010. In the box plots, the two whiskers show the maximum and minimum without outliers. The extra models shown here, which were not included in the temperature analysis, are E3SM-1-0, FGOALS-g3, and GISS-E2-1-G-CC. (b) Selected ensemble members from different climate models. The black line is 10% of the respective climate model's ensemble size.