Improving the seasonal forecast by utilizing the observed relationship between the Arctic Oscillation and Northern Hemisphere surface air temperature

Although the seasonal prediction skill of climate models has improved significantly in recent decades, the prediction skill of the Arctic Oscillation (AO), the dominant climate mode over the Northern Hemisphere, remains poor. Additionally, the local representation of AO impacts in these models diverges from observations, which limits their seasonal prediction skill. In this study, we attempted to improve the prediction skill of surface air temperature (SAT) by applying two post-processing corrections to a dynamical model's seasonal forecast: (1) correction of the AO impact on SAT pattern, and (2) correction of the AO index (AOI). The first correction involved replacing the inaccurately simulated impact of the AO on SAT with that observed. For the second correction, we employed an empirical prediction model of the AOI, a multiple linear regression model based on three precursors: summer sea surface temperature, autumn sea-ice concentration, and autumn snow cover extent. Applying the first correction alone led to a decrease in prediction skill. However, a significant improvement in SAT prediction skill was achieved when both corrections were applied. The average correlation coefficients for the North American and Eurasian regions increased from 0.23 and 0.06 to 0.28 and 0.30, respectively.


Introduction
The Arctic Oscillation (AO) is recognized as the primary mode of atmospheric variability in the Northern Hemisphere, impacting weather and climate at mid-high latitudes (Thompson and Wallace 1998). It is particularly influential on boreal winter climates in regions such as North America, Europe, and East Asia (Higgins et al 2002, Kolstad et al 2010, Park et al 2011, Tomassini et al 2012, Kim and Ahn 2015). Extensive studies have explored the variability of the AO and its profound impact not only on air temperature, precipitation, sea ice, snow cover, and the Asian monsoon system (e.g. Wang and Ikeda 2000, Rigor et al 2002, Bamzai 2003, Gong et al 2004, Sun and Ahn 2015) but also on tropical climates such as the El Niño-Southern Oscillation and the Indian Ocean Dipole (Chen et al 2014, 2015, 2017, Cheng et al 2023, 2024).
Despite the significant impact of the AO on the mid-latitude regions, climate models still demonstrate low prediction skill for the AO. Riddle et al (2013) investigated the seasonal prediction skill of the AO index (AOI), achieving effective forecasts for the winter AOI with lead times of over two months, but found that the models did not adequately represent related processes. Kang et al (2014) assessed AOI prediction in six state-of-the-art models, showing meaningful prediction skill at up to two months of lead time. These outcomes highlight the ongoing challenge of accurately predicting the AOI and representing its impacts in climate models. To address these limitations, Ren and Nie (2021) developed a linear empirical model using the average sea surface temperature (SST) and sea ice concentration (SIC) from August as precursors and achieved a high prediction skill of 0.67. Kang et al (2014) also evaluated the spatial patterns of surface temperature associated with the AO mode from each reforecast. Their findings indicate that, while most models reasonably reproduce the surface temperature anomaly pattern over land, the amplitude of these patterns is mostly underestimated in the reforecasts.
Using teleconnection patterns such as the AO, efforts have been made to enhance predictions of mid-latitude winter temperatures through postprocessing methods. Lee et al (2023) constructed multiple regression equations using key Northern Hemisphere teleconnection patterns and improved winter temperature predictions in the Northern Hemisphere. Jung et al (2020) demonstrated the APEC Climate Center Multi-Model Ensemble model's ability to simulate the Warm Arctic Cold Eurasia pattern (WACE) and the Arctic temperature (ART) index (Kug et al 2015). They further showed the potential to enhance mid-latitude temperature prediction by adjusting the ART index using observational data.
Motivated by these previous studies, the present study aimed to improve the seasonal prediction skill of the Northern Hemisphere surface air temperature (SAT) during the winter season by utilizing the AO as a predictability source.
Section 2 provides detailed information on the Community Earth System Model version 2 (CESM2) and methodology. Section 3 demonstrates the characteristics of the AO prediction skill in CESM2 and how the AO pattern is utilized to enhance the prediction skill of Northern Hemisphere winter SAT. A summary of the results and a general discussion are provided in section 4.

Model and retrospective predictions
The dynamical model used in this study was CESM2, which consists of components representing the atmosphere, ocean, land, and sea ice. The atmospheric model was the Community Atmosphere Model, version 6 (CAM6); the ocean model was the Parallel Ocean Program version 2 (POP2) (Smith et al 2010, Danabasoglu et al 2012); the land model was the Community Land Model version 5 (CLM5) (Lawrence et al 2019); and the sea ice model was CICE version 5.1.2 (CICE5) (Hunke et al 2015).
We conducted a comprehensive analysis of the performance of the CESM2 coupled model in the seasonal prediction of winter (December to February) mean SAT over 24 years, from 1993/1994 to 2016/2017 (table 1). Using retrospective prediction data, we assessed the ability of the model to predict the AO and aimed to improve winter SAT prediction skill. The prediction skill was evaluated using the anomaly correlation coefficient (ACC) between the SAT from the Japanese 55-year Reanalysis (JRA-55) and the model's forecasted SAT. All data were detrended before analysis.
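The skill metric described above can be illustrated with a minimal NumPy sketch. It assumes a linear least-squares detrending and a temporal ACC computed independently at each grid point; the text does not specify the detrending method, and the function names `detrend` and `acc` are hypothetical.

```python
import numpy as np

def detrend(series):
    """Remove the mean and least-squares linear trend along axis 0 (time).

    Assumption: linear detrending; the paper only says data were detrended.
    """
    series = np.asarray(series, dtype=float)
    t = np.arange(series.shape[0], dtype=float)
    t_anom = t - t.mean()
    anom = series - series.mean(axis=0)
    # Slope of the linear fit at every grid point.
    slope = np.tensordot(t_anom, anom, axes=(0, 0)) / np.sum(t_anom**2)
    t_shaped = t_anom.reshape((-1,) + (1,) * (series.ndim - 1))
    return anom - t_shaped * slope

def acc(forecast, observation):
    """Temporal anomaly correlation coefficient at every grid point."""
    f = detrend(forecast)
    o = detrend(observation)
    num = np.sum(f * o, axis=0)
    den = np.sqrt(np.sum(f**2, axis=0) * np.sum(o**2, axis=0))
    return num / den
```

With arrays shaped `(years, lat, lon)`, `acc(sat_forecast, sat_jra55)` would return a `(lat, lon)` map comparable in form to the ACC maps discussed later.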
To construct retrospective prediction experiments for the CESM2 coupled model, initialization of atmosphere, land, ocean, and sea ice conditions was performed annually on October 21st. The initial atmospheric ensemble conditions were prepared using the ECMWF Reanalysis v5 (ERA5) interpolated to the CAM6 model grid. The initialized fields of the variables on pressure levels consisted of the zonal and meridional wind, temperature, and specific humidity. The variables on a single level consisted of the surface pressure, temperature at 2 m, specific humidity at 2 m, and zonal and meridional wind at 10 m. A random field perturbation method was employed to generate 31 ensemble members. This approach, proposed by Magnusson et al (2009), is an effective method to generate model spread with performance comparable to other, more sophisticated methods (Richter et al 2020, 2022).
The initial ocean and sea ice conditions were configured according to the Coupled Model Intercomparison Project Phase 6 (CMIP6)-endorsed Ocean Model Intercomparison Project Phase 2 (OMIP2) (Griffies et al 2016, Tsujino et al 2020), which has been utilized in sub-seasonal to seasonal (S2S) predictions using CESM version 1 (CESM1) (Richter et al 2020) and CESM2 (Richter et al 2022).This approach involved repeating the process four times while prescribing atmospheric reanalysis data from JRA-55 Data Assimilation and Observation (JRA55-DO) (Tsujino et al 2018) from 1958 to 2009, and the ocean and sea ice data produced in the fifth cycle were used as initial conditions.
The land initial conditions were generated using the stand-alone CLM5 with satellite phenology (CLM5.0-SP), a configuration in which seasonal changes in vegetation are prescribed from satellite observation data. A 50 year spin-up was conducted, utilizing climate state 3-hourly atmospheric variables, consisting of precipitation, temperature, wind speed, and shortwave and longwave radiation, obtained from ERA5 (Hersbach et al 2020) for the period of 1991-2016. To obtain real-time initial land conditions, CLM5.0-SP was continuously forced with ERA5 data from 1993 to 2016, and the resulting initial condition files were utilized.

Definition of AOI
To analyze the winter mean AO, empirical orthogonal function (EOF) analysis was conducted using the monthly mean of area-weighted geopotential height anomalies at 1000 hPa north of 20°N in the Northern Hemisphere, following the definition provided by the Climate Prediction Center. Geopotential height anomalies at 1000 hPa were obtained from JRA-55. To assess the prediction skill of the AOI using retrospective prediction data, seasonally averaged observed and forecasted geopotential height anomalies at 1000 hPa were projected onto the observed AO loading vector. The resulting AOI time series were normalized by their standard deviation. Prediction skill was assessed by the temporal correlation coefficients between the observed and forecasted AOI.
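The AOI definition above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the area weighting is taken to be sqrt(cos(latitude)), each index is normalized by its own standard deviation, and the function names are hypothetical. The sign of an EOF is arbitrary and is not fixed here.

```python
import numpy as np

def area_weights(lat):
    """sqrt(cos(lat)) weights, broadcastable to (time, lat, lon). Assumed form."""
    return np.sqrt(np.cos(np.deg2rad(lat)))[None, :, None]

def leading_eof(z_monthly, lat):
    """Leading EOF (AO loading vector) of weighted height anomalies.

    z_monthly: (time, lat, lon) anomalies at 1000 hPa north of 20N.
    Returns a flattened loading vector in weighted space.
    """
    nt = z_monthly.shape[0]
    x = (z_monthly * area_weights(lat)).reshape(nt, -1)
    # Rows of vt are spatial patterns ordered by explained variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[0]

def project_aoi(z_seasonal, lat, loading):
    """Project seasonal-mean anomalies onto the loading and standardize."""
    nt = z_seasonal.shape[0]
    x = (z_seasonal * area_weights(lat)).reshape(nt, -1)
    pc = x @ loading
    return pc / pc.std()
```

The same `loading`, derived from observations, would be passed to `project_aoi` for both the observed and the forecasted seasonal means, so that the two indices measure the same spatial pattern.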

Correction of the AO impact on SAT pattern
We used the approach suggested by Jung et al (2020) to adjust winter SAT for the observed AO patterns. This was achieved by replacing the SAT variation coupled to the AO simulated by the model with that from observations. Unlike those authors, who corrected for the observed ART patterns, we focused on the AO impact on SAT pattern:
$$\mathrm{SAT}_{cor} = \mathrm{SAT}_{f} - \mathrm{AOI}_{f} \times \mathrm{Reg}(\mathrm{AOI}_{h}, \mathrm{SAT}_{h}) + \mathrm{AOI}^{*} \times \mathrm{Reg}(\mathrm{AOI}_{o}, \mathrm{SAT}_{o}) \qquad (1)$$
In equation (1), SAT_cor represents the SAT forecast corrected by the AO impact on SAT pattern. Here, f, h, and o denote the forecast, hindcast (retrospective prediction), and observation, respectively. The term Reg indicates the linear regression. Equation (1) consists of three terms: the first term (SAT_f) represents the dynamical model-predicted SAT, the second term (AOI_f × Reg(AOI_h, SAT_h)) removes the AO-SAT impact simulated by the dynamical model, and the third term (AOI* × Reg(AOI_o, SAT_o)) incorporates the observed AO-SAT impact related to the AOI (AOI*). Depending on the AOI correction, AOI* can be the observed AOI, the dynamical model-predicted AOI, or the empirical model-predicted AOI. It is important to note that this equation is applied only to regions where the linear regression results are statistically significant at p < 0.05. Through the analysis of observed data spanning four distinct periods (1969 to 1990, 1979 to 2000, 1989 to 2010, and 1999 to 2020), we have demonstrated the stability of the relationship between the observed AOI and SAT (figure S1). The linear regression approach can be improved by factoring in the observed AO-SAT impact.
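As an illustration of the three-term correction described above (model SAT, minus the simulated AO-SAT impact, plus the observed AO-SAT impact scaled by AOI*), a minimal NumPy sketch follows. The function names and the `mask` argument are hypothetical; in particular, which regression's significance test defines the mask is not specified in the text, so the mask is left to the caller.

```python
import numpy as np

def regression_map(index, field):
    """Slope of `field` regressed on `index` at every grid point (time is axis 0)."""
    i = index - index.mean()
    f = field - field.mean(axis=0)
    return np.tensordot(i, f, axes=(0, 0)) / np.sum(i**2)

def correct_sat(sat_f, aoi_f, aoi_star, reg_h, reg_o, mask=None):
    """SAT_cor = SAT_f - AOI_f * Reg_h + AOI_star * Reg_o.

    sat_f: (time, lat, lon); aoi_f, aoi_star: (time,); reg_h, reg_o: (lat, lon).
    mask: optional boolean (lat, lon) array; the correction is applied only
    where it is True (assumed to mark significant regression coefficients).
    """
    delta = (-aoi_f[:, None, None] * reg_h
             + aoi_star[:, None, None] * reg_o)
    if mask is not None:
        delta = np.where(mask, delta, 0.0)
    return sat_f + delta
```

Here `reg_h` would come from `regression_map(aoi_h, sat_h)` over the hindcasts and `reg_o` from `regression_map(aoi_o, sat_o)` over the observations; `aoi_star` is whichever AOI estimate is being tested.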

Correction of the AOI
To enhance the prediction skill of SAT in equation (1), AOI* must closely approximate the observations. Therefore, accurate prediction of the AOI is crucial. Ren and Nie (2021) effectively predicted the late winter (January-March) AOI using SST and SIC in August as precursors. In the present study, we developed a multiple linear regression (MLR) model to predict the winter (December-February) AOI using summer (June-August) SST, September SIC, and October snow cover extent (SCE) as precursors. These predictors exhibit weak correlations with each other, with SST and SIC having a correlation coefficient of 0.16, SST and SCE showing a correlation of −0.17, and SIC and SCE correlated at −0.07, indicating minimal multicollinearity issues in the MLR model:
$$\mathrm{AOI}_{MLR} = a \times \mathrm{SST}_{JJA} + b \times \mathrm{SIC}_{SEP} + c \times \mathrm{SCE}_{OCT} + d \qquad (2)$$
where a, b, c, and d are regression parameters to be estimated. SST_JJA was calculated by averaging over the North Atlantic region (45 which is known for its high correlation with the AOI (figure S2). For SIC_SEP, averages were taken over the Barents-Kara Sea (30 following the methodology presented by Ren and Nie (2021) (figure S3). SCE_OCT was derived by averaging over the North American region (110 , which exhibited a negative correlation with the AOI (figure S4).
The dataset utilized for empirical model construction spans 42 years and includes the retrospective prediction period of the dynamical model. A leave-one-out cross-validation method was employed for validation. Monthly SIC data were sourced from the National Aeronautics and Space Administration Special Sensor Microwave Imager/Sounder (SSMIS); SST data from the UK Meteorological Office's Hadley Centre Global Sea Ice
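The empirical AOI model with leave-one-out cross-validation can be sketched as below. This assumes an ordinary least-squares fit with an intercept on three area-averaged predictor series (summer SST, September SIC, October SCE); the function name and array layout are hypothetical, and the predictor time series themselves are taken as given.

```python
import numpy as np

def loo_mlr_predict(predictors, target):
    """Leave-one-out predictions from an ordinary least-squares MLR.

    predictors: (n_years, 3) array, one column per precursor.
    target: (n_years,) observed winter AOI.
    """
    predictors = np.asarray(predictors, dtype=float)
    target = np.asarray(target, dtype=float)
    n = target.size
    # Last column of ones carries the intercept.
    design = np.column_stack([predictors, np.ones(n)])
    pred = np.empty(n)
    for k in range(n):
        train = np.arange(n) != k  # hold out year k
        coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)
        pred[k] = design[k] @ coef
    return pred
```

The cross-validated skill would then be `np.corrcoef(loo_mlr_predict(X, aoi_obs), aoi_obs)[0, 1]`, since each year's prediction uses coefficients fitted without that year.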

AO prediction skill of dynamical model
To evaluate the impact of the AO on SAT, we compared the linear regression maps between the observed AOI and SAT with those between the dynamical model-simulated AOI and SAT (figure 1). Although the observation data demonstrated a significant positive relationship between the AOI and SAT over North America and Eurasia, the dynamical model inaccurately simulated this relationship. The dynamical model failed to capture the strong positive relationship observed in Eurasia and only simulated a significant positive relationship in certain localized regions of North America.
We evaluated the AOI prediction skill by assessing the correlation coefficient between the observed AOI and the dynamical model-predicted AOI for different ensemble sizes using the bootstrap method (figure S5). The bootstrap method involves randomly resampling data from a given dataset to obtain new statistics (e.g. Wilks 2011). Screen et al (2014) highlighted that sufficient ensemble members are required to separate the Arctic forced atmospheric response from the internal atmospheric noise, with insufficient members potentially resulting in an overestimation. To address this issue, we conducted 1000 resampling iterations. As the ensemble size increased, the prediction skill of the dynamical model-predicted AOI converged to 0.28 (figure 2, blue solid line). Before normalization, the dynamical model-predicted AOI corresponds to approximately 32% of the standard deviation of the observed AOI, indicating that the model tends to underestimate the magnitude of the AOI.
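The ensemble-size dependence of AOI skill can be estimated with a bootstrap along the following lines. This sketch assumes that members are drawn with replacement and that skill is the correlation between the observed AOI and the mean of the drawn members, averaged over the resamples; the text does not state these details, and `bootstrap_skill` is a hypothetical name.

```python
import numpy as np

def bootstrap_skill(aoi_members, aoi_obs, size, n_boot=1000, seed=0):
    """Mean correlation skill of a `size`-member ensemble mean.

    aoi_members: (n_members, n_years) AOI from each ensemble member.
    aoi_obs: (n_years,) observed AOI.
    Assumption: members are resampled with replacement.
    """
    rng = np.random.default_rng(seed)
    n_members = aoi_members.shape[0]
    skills = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.choice(n_members, size=size, replace=True)
        ens_mean = aoi_members[picked].mean(axis=0)
        skills[b] = np.corrcoef(ens_mean, aoi_obs)[0, 1]
    return skills.mean()
```

Evaluating `bootstrap_skill(members, obs, size)` for `size` from 1 to the full ensemble gives a skill curve whose flattening indicates convergence. Correlation is unaffected by scaling, so the indices need not be normalized first.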

Correction using AOI and observed AO-impact pattern
The dynamical model inaccurately simulated the impact of the AO on SAT and exhibited low AOI prediction skill. To overcome these limitations of the dynamical model, we employed two post-processing corrections: (1) correction of the AO impact on SAT pattern, and (2) correction of the AOI.
The first correction involved replacing the inaccurately simulated impact of the AO on SAT (Reg(AOI_h, SAT_h)) with that observed (Reg(AOI_o, SAT_o)) using equation (1). The degree of SAT improvement was assessed by adjusting AOI*, which determines the contribution of the observed AO-SAT relationship (AOI* × Reg(AOI_o, SAT_o)).
When assuming AOI* = AOI_o in equation (1), we observed a theoretical maximum improvement in model prediction skill (figures 3(b) and (e)), particularly in the previously weakly predicted regions of North America and Eurasia (figure 3(a)). A simple equation that replaces the inaccurately simulated AO-SAT impact with that observed yields meaningful results. However, equation (1) with AOI* = AOI_o relies on the observed AOI, which is not available for actual seasonal forecasts.
Assuming the application of equation (1) for actual predictions and setting AOI* = AOI_f, where AOI_f represents the dynamical model-predicted AOI, the corrected results show improved prediction skill in the northern Eurasian region but a significant decrease in predictability in southern Eurasia and eastern North America (figures 3(c) and (f)). This outcome was attributable to the low AOI_f prediction skill of 0.28, indicating poor correction of the observed AO-impact pattern.
For the second correction, recognizing that the dynamical model-predicted AOI skill of 0.28 did not reach statistical significance, we implemented an empirical model (equation (2)) to predict the AOI, a critical factor in improving SAT prediction skill. We performed leave-one-out, leave-three-out, and leave-five-out cross-validation. The results showed prediction skills of 0.55, 0.57, and 0.53, respectively, for the period 1979/80-2020/21. For the retrospective period from 1993/94 to 2016/17, the prediction skills were 0.39, 0.44, and 0.38, respectively. These results not only demonstrate the empirical model's stability but also indicate higher prediction skill than that of the dynamical model-predicted AOI, approaching the 95% significance level. Additionally, the empirical model-predicted AOI represents about 65% of the standard deviation of the observed AOI, demonstrating a relatively higher amplitude compared to the dynamical model-predicted AOI (32%).
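For orientation on the significance statements above, a rough threshold can be worked out under assumptions the text does not state: a two-sided Student's t test on the correlation coefficient, n − 2 degrees of freedom, and statistically independent winters. For the n = 24 winters of the retrospective period, the tabulated critical value is t ≈ 2.074 for 22 degrees of freedom:

```latex
r_{crit} = \frac{t_{0.975,\,n-2}}{\sqrt{t_{0.975,\,n-2}^{2} + n - 2}}
         = \frac{2.074}{\sqrt{2.074^{2} + 22}} \approx 0.40 \qquad (n = 24)
```

Under these assumptions, a skill of 0.28 falls clearly below the threshold, whereas the cross-validated skills of 0.38 to 0.44 lie close to or slightly above it. A one-sided test or a different effective sample size would shift the threshold.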
When both the correction of the AOI and the correction of the AO impact on SAT pattern were applied, as specified in equation (1) with AOI* = AOI_MLR, significant improvements in prediction skill were observed over North America and Eurasia (figures 3(d) and (g)).

Summary and discussion
We found that the dynamical model's low SAT prediction skill in mid-latitude landmass regions can be attributed to an inadequate representation of the relationship between the AO and SAT. We replaced the inadequately simulated impact of the AO on the SAT pattern with that observed. This post-processing significantly enhanced the model's mid-latitude SAT prediction skill in North America and Eurasia, as shown in figure 4. Setting AOI* to the observed AOI, i.e. AOI* = AOI_o, yields the potential predictability, which represents the maximum possible improvement in SAT prediction skill. In this case, the skill increased from the initial 0.23 and 0.06 to 0.32 and 0.34 for North America and Eurasia, respectively. Because the observed AOI is not available for actual forecasts and the dynamical model-predicted AOI has low skill, we used an empirical model to correct the AOI, based on three precursors: summer SST, autumn SIC, and autumn SCE. Corrections using the AOI predicted by the empirical model, i.e. AOI* = AOI_MLR, significantly improved prediction skill in both regions, to 0.28 and 0.30, respectively, closely approaching the maximum potential predictability.
In general, dynamical models show less prediction skill over land than over oceans because of complex terrestrial interactions and feedback mechanisms that are challenging to simulate accurately. This underlying difficulty highlights the broader challenge of enhancing forecast accuracy in regions such as North America and Eurasia. According to Jung et al (2020), these areas demonstrate low prediction skill across multiple seasonal forecast models within a multi-model ensemble, indicating that significant improvements in ensemble forecasting for these regions may be inherently difficult to achieve. We used 31 ensemble members of the CESM2 model to study the impact of ensemble size on forecasting. We divided the regions into four categories (Solid, Promising, Limited, and Hopeless) based on the number of ensemble members required for convergence and the ACC. Convergence with fewer than half the total number of ensemble members is denoted as ENS−, and with more than half as ENS+. If the converged ACC reaches a significance level of 95% or above, it is indicated as ACC+, and if below, as ACC− (see table S1). These criteria help us effectively analyze the impact of ensemble forecasting. The regions of North America and Eurasia fall into the 'Hopeless' and 'Limited' categories, indicating that ensemble forecasting has minimal impact on the prediction skill of SAT in these areas (figure 5). By applying the correction using the AOI and the observed AO impact pattern, the method proposed in this study significantly enhances prediction skill in regions where ensemble seasonal prediction is not effective. Our findings highlight the value of the proposed method in regions where increasing the number of ensemble members alone does not improve skill.
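The ENS and ACC flags described above could be derived from a skill-versus-ensemble-size curve roughly as follows. The convergence rule here (the smallest ensemble size from which the ACC stays within a tolerance `tol` of its final value) is an assumption made only for illustration, as is assigning exactly half to ENS+; the mapping of flag pairs to the four named categories is given in table S1 and is not reproduced.

```python
import numpy as np

def ensemble_flags(acc_by_size, acc_crit, tol=0.02):
    """Return (ENS flag, ACC flag) for one grid point or region.

    acc_by_size[k] is the ACC obtained with k + 1 ensemble members.
    acc_crit is the ACC value required for 95% significance.
    tol is a hypothetical convergence tolerance.
    """
    acc_by_size = np.asarray(acc_by_size, dtype=float)
    n_total = acc_by_size.size
    converged_acc = acc_by_size[-1]
    # Sizes whose ACC still differs from the final value by more than tol.
    outside = np.abs(acc_by_size - converged_acc) > tol
    if outside.any():
        # Convergence size is one member beyond the last non-converged size.
        n_conv = int(np.max(np.nonzero(outside)[0])) + 2
    else:
        n_conv = 1
    ens = "ENS-" if n_conv < n_total / 2 else "ENS+"
    acc = "ACC+" if converged_acc >= acc_crit else "ACC-"
    return ens, acc
```

For a 31-member ensemble, a curve that levels off after a handful of members at a significant ACC would be flagged `("ENS-", "ACC+")`, while one that is still rising near 31 members and remains insignificant would be flagged `("ENS+", "ACC-")`.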
However, the prediction skill of the empirical model developed for correcting the AOI for use in actual seasonal forecasts may vary depending on the analysis period. Additionally, the spatial pattern of the AO is subject to fluctuations influenced by climatic factors (Chen et al 2013, 2020). This is especially true because the AO is known to be influenced by long-term variabilities such as the Pacific Decadal Oscillation (PDO) and North Pacific Gyre Oscillation (Kang et al 2014), highlighting the importance of understanding the interplay between these variabilities and the AO. Therefore, future studies should focus on this relationship. Furthermore, changes in the AO patterns and their impacts are expected in warmer climates (Hamouda et al 2021), and further studies are also required to investigate these factors.

Figure 1. Impacts of AO on SAT from the observation and dynamical model. (a) Linear regression of JRA-55 SAT anomalies onto the observed AOI (Reg(AOI_o, SAT_o)), and (b) linear regression of SAT anomalies from the CESM2 retrospective prediction onto the simulated AOI (Reg(AOI_h, SAT_h)) from December to February of 1993/1994-2016/2017. Black and gray dots indicate regions where the results are statistically significant at p < 0.05 and meet the false discovery rate (FDR) criteria, respectively.

Figure 2. Time series of the observed AO index (AOI), and AOI predicted by dynamical and empirical models. The black dashed line represents the observed AOI, and the red solid line indicates the empirical model-predicted AOI (AOI_MLR) using a leave-one-out cross-validation approach for the boreal winter from 1979/1980-2020/2021. The blue solid line indicates the dynamical model-predicted AOI (AOI_f) for the boreal winter from 1993/1994-2016/2017.

Figure 3. The effect of AO-SAT impact correction and AOI correction on SAT prediction skill. SAT anomaly correlation coefficient (ACC) of the dynamical model (a) without correction and (b, c, d) with correction. SAT ACC with AO-SAT impact correction applied using (b) observed AOI (AOI_o), (c) dynamical model-predicted AOI (AOI_f), and (d) empirical model-predicted AOI (AOI_MLR). Corrections were validated through k-fold cross-validation. (e)-(g) The difference in SAT ACC between corrected and uncorrected forecasts. Black and gray dots indicate regions where the results are statistically significant at p < 0.05 and meet the false discovery rate (FDR) criteria, respectively.

Figure 4. Comparison of Eurasian and North American SAT prediction skills based on AO correction. The regional average of SAT ACC in (a) Eurasia and (b) North America, comparing potential and actual predictability with AO-SAT impact correction using dynamical model-predicted and empirical model-predicted AOI.

Figure 5. SAT prediction skill changes with increasing ensemble size. Improvement in prediction skill of surface air temperature (SAT) with the growth in ensemble members during December-February for 1993/1994-2016/2017.

Table 1. Summary of CESM2 retrospective prediction experiment.