Redefining the North Atlantic Oscillation index generation using autoencoder neural network

Understanding the spatial patterns of the North Atlantic Oscillation (NAO) is vital for climate science. For this reason, empirical orthogonal function (EOF) analysis is commonly applied to sea-level pressure (SLP) anomaly data in the North Atlantic region. This study evaluated the traditional EOF-based definition of the NAO index against the autoencoder (AE) neural network-based definition, using the Hurrell NAO Index (Station-Based) as a reference. Specifically, EOF and AE were applied to monthly SLP anomaly data from ERA5 (1950–2022) to derive spatial modes of variability in the North Atlantic region. Both methods produced spatial patterns consistent with the traditional NAO definition, with dipole centers of action between the Icelandic Low and the Azores High. During boreal winter (December to March), when the NAO is most active, the AE-based method achieved a correlation of 0.96 with the reference NAO index, outperforming the EOF-based method’s correlation of 0.90. The all-season Adjusted R-squared values were 50% for the AE-based index and 34% for the EOF-based index. Notably, the AE-based index revealed several other non-linear patterns of the NAO, with more than one encoded pattern correlating at least 0.90 with the reference NAO index during boreal winter. These results not only demonstrate the AE’s superiority over traditional EOF in representing the station-based index but also uncover previously unexplored complexities in the NAO that are close to the reference temporal pattern. This suggests that AE offers a promising approach for defining climate modes of variability, potentially capturing intricacies that traditional linear methods like EOF might miss.


Introduction
The North Atlantic Oscillation (NAO) is a climatic phenomenon in the North Atlantic region that represents fluctuations in the difference in atmospheric sea-level pressure (SLP) between the Icelandic Low and the Azores High [1]. These two pressure centers are semi-permanent atmospheric features over the North Atlantic region, and their relative strength and position significantly influence weather patterns in the region [2]. During the positive phase of the NAO, there is a strengthening of the Icelandic Low and the Azores High, leading to increased westerlies across the North Atlantic [1,2]. This typically results in milder and wetter conditions in Northern Europe and colder conditions in Southern Europe and the Mediterranean. Conversely, during the negative phase of the NAO, the pressure difference between the Icelandic Low and the Azores High is reduced. This leads to weakened westerlies, resulting in colder and drier conditions in Northern Europe and milder, wetter conditions in Southern Europe and the Mediterranean.
Several studies have indicated that the NAO has a significant impact on weather and climate in the North Atlantic region, affecting temperature, precipitation, and storm patterns [3][4][5][6]. These variations can have substantial effects on agriculture, energy demand, water resources, and ecosystems in the affected regions [7]. Indeed, the NAO is a critical climate pattern that plays a vital role in shaping weather and climate in the North Atlantic region and beyond [7][8][9]. Its different phases have distinct impacts, and enhancing our understanding and characterization of the NAO is essential for meteorology, climate science, and various practical applications.
The NAO exhibits variations across different seasons, with its effects often most pronounced during the boreal winter months [1]. The significant impact of the NAO on regional climates around the globe makes its optimal measurement, predictability, and characterization crucial [10]. On that note, several studies have examined the representation of the NAO in climate models [11][12][13]. The NAO index quantifies the state of the NAO at a given time and is typically calculated using the difference in SLP between specific stations in the Azores and Iceland [1]. Other methods, such as empirical orthogonal function (EOF) analysis of SLP over the North Atlantic region, are also often used [14].
While the station-based NAO measurement might serve as a reference [15,16], it is essential to recognize that climate modes of variability are complex, and their centers of action might change with time and the state of the atmosphere at a given moment. Therefore, EOF analysis often serves as a multivariate tool that might be useful in exploring the complexities of the NAO and other climate modes [14,17-20]. However, despite its usefulness in reducing data dimensionality and extracting patterns that explain the most variance, EOF analysis can be limited as it is a linear technique, and any non-linearity in the NAO mode might not be represented. This limitation is significant, as non-linearities can provide deeper insights into the underlying dynamics of the NAO.
Building on this understanding, the aim of this study is to evaluate the EOF-based NAO index against the autoencoder (AE) neural network-based NAO index. Unlike EOF, AE has the flexibility to model non-linear relationships [21]. Therefore, it is worth investigating AE's performance in reproducing the station-based NAO index compared to the EOF-based method.

Data
SLP was obtained from ERA5 reanalysis [22], at a monthly temporal resolution, spanning from 1950 to 2022. The horizontal resolution of the data is 0.25° in both longitude and latitude. Anomalies were calculated by subtracting the long-term mean of each calendar month from the corresponding monthly SLP values, a common practice in climate studies to remove the mean seasonal cycle, which might otherwise obscure the representation of the NAO mode. The study domain encompasses the North Atlantic basin, specifically 20°-80°N and 90°W-40°E, capturing the Azores High and the Icelandic Low, which are essential for defining the NAO [1,15]. The station-based monthly NAO index used as the reference was obtained from the Hurrell NAO Index (Station-Based) for the same period, 1950-2022 [15,23].
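The anomaly calculation described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code: the function name `monthly_anomalies`, the array layout (time, latitude, longitude), and the assumption that the record starts in January and covers whole years are all choices made for this example.

```python
import numpy as np

def monthly_anomalies(slp):
    """Subtract each calendar month's long-term mean from monthly data.

    slp: array of shape (n_months, n_lat, n_lon), assumed to start in
         January and to cover whole years (an assumption of this sketch).
    """
    n_months = slp.shape[0]
    if n_months % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    # Reshape to (year, month, lat, lon) so that axis 0 runs over years.
    by_year = slp.reshape(n_months // 12, 12, *slp.shape[1:])
    # Long-term mean of each calendar month at each grid point.
    climatology = by_year.mean(axis=0, keepdims=True)
    return (by_year - climatology).reshape(slp.shape)

# Synthetic stand-in for the SLP field: 3 years on a 2 x 2 grid (hPa).
rng = np.random.default_rng(0)
demo_slp = 1013.0 + rng.normal(size=(36, 2, 2))
demo_anom = monthly_anomalies(demo_slp)
```

By construction, the anomalies of every calendar month average to zero over the record at each grid point.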

Methods
EOF analysis is a widely used method in climate science for denoising climate data and maximizing variance [3,6,23], making it a conventional multivariate tool for extracting climate modes of variability such as the NAO. Conventionally, the EOF-based NAO index is defined as the time series of the leading EOF of SLP anomalies over the North Atlantic region (20°-80°N, 90°W-40°E) [23]. In this study, EOF was applied to the SLP anomaly field over the North Atlantic region to define the EOF-based index. Singular value decomposition was used to derive the spatial patterns (i.e. eigenvectors or EOFs) and the corresponding time series. To transform the EOFs into correlations between the time series (scores) and the standardized SLP data, they were multiplied by the square root of the corresponding eigenvalues. This step enhances the interpretability of the EOFs by scaling them in terms of correlations.
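The SVD-based decomposition and the eigenvalue scaling can be illustrated with a short NumPy sketch. It assumes the anomaly field has been flattened to a (time, grid point) matrix and standardized at each grid point. The function name `eof_analysis` is hypothetical, and refinements such as latitude weighting, which the text does not mention, are left out.

```python
import numpy as np

def eof_analysis(anom, n_modes=10):
    """EOFs of a (n_time, n_grid) anomaly matrix via SVD.

    Returns loadings (EOFs scaled by sqrt(eigenvalue)), unit-variance
    scores (time series), and the explained-variance fractions.
    """
    # Standardize each grid point so scaled EOFs can be read as correlations.
    z = (anom - anom.mean(axis=0)) / anom.std(axis=0, ddof=1)
    n_time = z.shape[0]
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigvals = s**2 / (n_time - 1)                   # variance carried by each mode
    eofs = vt[:n_modes]                             # unit-length spatial patterns
    scores = u[:, :n_modes] * np.sqrt(n_time - 1)   # unit-variance time series
    loadings = eofs * np.sqrt(eigvals[:n_modes, None])
    explained = eigvals[:n_modes] / eigvals.sum()
    return loadings, scores, explained

# Synthetic demonstration: 120 time steps, 6 grid points.
rng = np.random.default_rng(0)
demo = rng.normal(size=(120, 6))
demo[:, 1] += demo[:, 0]  # introduce some shared variability
loadings, scores, explained = eof_analysis(demo, n_modes=3)
```

With this scaling, `loadings[k, j]` equals the Pearson correlation between the time series of mode `k` and the data at grid point `j`, which is the interpretation described in the text.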
AE is a neural network architecture designed for unsupervised learning. It functions similarly to EOF in terms of reducing the dimensionality of data and extracting the most crucial patterns [21]. Unlike EOF, AE is capable of extracting complex and non-linear patterns. In this study, AE was applied to compress the SLP anomaly data into a compact form using the encoder. Specifically, during training, the AE learns to reconstruct the input SLP data from the encoded representations using the decoder. The reconstructed patterns are compared with the original input to compute the loss (such as mean squared error), which is then minimized during training. This process helps the model learn to capture the essential patterns of the data. Figure 1 presents the schematic of the AE model employed in this study, along with its workflow for deriving the NAO's spatiotemporal patterns.
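The encode, reconstruct, and minimize-the-loss cycle can be made concrete with a deliberately small autoencoder written in NumPy. This is only a sketch under stated assumptions: it uses one hidden layer, plain gradient descent rather than any particular optimizer, and synthetic inputs in [0, 1]. The names `train_autoencoder` and `encode` are hypothetical, and none of the settings are those of the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(x, n_hidden=4, epochs=300, lr=1.0, seed=0):
    """Train a one-hidden-layer autoencoder on x (n_samples, n_features).

    Encoder: ReLU. Decoder: sigmoid. Loss: mean squared error.
    Returns the parameters and the loss recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    history = []
    for _ in range(epochs):
        # Forward pass: encode, then reconstruct.
        pre = x @ w1 + b1
        code = np.maximum(pre, 0.0)
        recon = sigmoid(code @ w2 + b2)
        err = recon - x
        history.append(float(np.mean(err**2)))
        # Backward pass for the mean squared error.
        g_out = (2.0 * err / err.size) * recon * (1.0 - recon)
        g_w2 = code.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_code = (g_out @ w2.T) * (pre > 0)
        g_w1 = x.T @ g_code
        g_b1 = g_code.sum(axis=0)
        # Gradient-descent update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), history

def encode(x, params):
    """Map inputs to their compressed (bottleneck) representation."""
    w1, b1, _, _ = params
    return np.maximum(x @ w1 + b1, 0.0)

# Synthetic inputs in [0, 1] generated from two hidden factors.
rng = np.random.default_rng(1)
latent = rng.uniform(size=(200, 2))
mixing = rng.uniform(size=(2, 8))
demo_x = latent @ mixing / 2.0
params, loss_history = train_autoencoder(demo_x, n_hidden=3)
codes = encode(demo_x, params)
```

In practice a deep-learning library would normally handle the gradients and the optimizer. The hand-written backward pass is used here only to keep the example self-contained.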
The SLP field was split into the training set accounting for 80% of the data from 1950 to 2022 and the test set, used to evaluate the generalizability of the model, accounting for the remaining 20% of the SLP data.The validation set used for hyperparameter tuning accounts for 20% of the training set.
The validation set is used to tune the model's hyperparameters and assess its performance on a subset of the training data that the model has not seen during the training phase. This helps in optimizing the model. The test phase involves evaluating the model on completely unseen data (the test set). Moreover, the generalizability of the model is assessed by comparing its performance on the test data with its performance on the validation data.
The training set, which does not overlap with the validation and test sets, was normalized to the range [0, 1] using the MinMaxScaler from the sklearn.preprocessing module in Python [24]. The scaling parameters (Min and Max) from the training set were also applied to the validation and test sets, ensuring no data leakage. The normalization brings about uniformity, aiding faster convergence during neural network training. Input and output neurons were matched to the data's dimension, while hidden neurons were chosen through experimentation to balance the representation of crucial SLP patterns and model complexity (see appendix 1 for more detailed explanations). Various configurations of neurons (i.e. 2, 4, 8, 16, 32, 64, 128, 256…) and epochs (with the maximum number of epochs set to 50) were tested, aiming to minimize reconstruction error (see appendix A1 for details). The early stopping method was used to decide the optimal number of epochs: training was set to halt if the validation loss did not decrease for 5 consecutive epochs, to prevent overfitting. The 128-neuron configuration at 19 epochs was selected for efficiency and simplicity.
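The splitting, leakage-free scaling, and early-stopping rule can be sketched as below. The example uses plain NumPy to mimic per-feature min-max scaling rather than calling MinMaxScaler, and it assumes a chronological split, which the text does not specify. The names `split_and_scale` and `stopping_epoch` are hypothetical.

```python
import numpy as np

def split_and_scale(data, test_frac=0.2, val_frac=0.2):
    """Split rows into train/validation/test and min-max scale them.

    The minimum and maximum are computed on the training rows only and
    then reused for the validation and test rows, so no information
    from those sets leaks into the scaling.
    """
    n = data.shape[0]
    n_test = int(round(n * test_frac))
    n_val = int(round((n - n_test) * val_frac))  # 20% of the training portion
    n_train = n - n_test - n_val
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns

    def scale(block):
        return (block - lo) / span

    return scale(train), scale(val), scale(test)

def stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would halt, or None."""
    best = np.inf
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None

# Synthetic demonstration: 100 samples with 4 features.
rng = np.random.default_rng(0)
demo = rng.normal(size=(100, 4))
train_s, val_s, test_s = split_and_scale(demo)
halt = stopping_epoch([5, 4, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6])
```

Because the scaling limits come from the training rows alone, validation and test values can fall slightly outside [0, 1]. In the example loss sequence the best value occurs at epoch 3, so a patience of 5 halts training at epoch 8.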
The encoder used the rectified linear unit function to introduce non-linearity in the model [25], and the decoder used a sigmoid function to produce outputs in [0, 1]. The AE was trained with the Adam optimizer due to its adaptive learning rates and efficiency [26]. The compiled model was then trained, and the encoder reduced the dimension of the SLP anomaly data while preserving essential patterns in the North Atlantic for NAO exploration. This part of the network, where the data is in its most compressed form, is referred to as the bottleneck in figure 1. For evaluating the performance of the EOF and AE methods in reproducing the reference station-based NAO index, Pearson's correlation and adjusted R-squared were applied. These metrics were chosen to assess the relative closeness of the EOF-based index and the AE-based index to the reference, providing a quantitative measure of how well each method captured the underlying patterns of the NAO.
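The two evaluation metrics can be written out explicitly. The sketch below assumes that the adjusted R-squared comes from a simple linear regression of the reference index on a single derived index (one predictor). The text does not spell out this detail, so it is an assumption of the example, and the function names are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])

def adjusted_r2(reference, predictor, n_predictors=1):
    """Adjusted R-squared for a regression of reference on one predictor.

    For a single predictor, R-squared equals the squared Pearson
    correlation. The adjustment penalizes the number of predictors.
    """
    n = len(reference)
    r2 = pearson_r(reference, predictor) ** 2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Small worked example with a correlation of exactly 0.8.
reference = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
candidate = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
r = pearson_r(reference, candidate)
adj = adjusted_r2(reference, candidate)
```

Here R-squared is 0.64, and with five samples and one predictor the adjusted value is 1 - 0.36 × 4/3 = 0.52.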

Results and discussion
The encoded patterns were matched to the station-based NAO index to identify those patterns that closely represent the NAO's spatial structure. For this reason, all patterns with an all-season correlation coefficient of at least 0.75 were retained for further analysis. For the EOF analysis, the first ten EOFs, explaining about 93.5% of the monthly SLP data's variance, were retained. Figure 2 shows the AE-based and EOF-based patterns over the North Atlantic sector, based on these criteria.
Both methods suggest the possible existence of several dipole SLP patterns in the North Atlantic region, in addition to the traditional NAO pattern. These dipole patterns, characterized by opposing centers of action, have been documented in the literature. Examples include the Atlantic Meridional Mode [27] and the Atlantic tripole mode [28], among others [29]. The spatial pattern of Node 3 exhibits distinct patterns of anomalies across various regions of the Atlantic. Specifically, there is a pronounced positive anomaly over the northwestern part of the Atlantic. This is accompanied by a relatively milder positive anomaly over the region of the Azores High. Conversely, a negative anomaly is evident over the area of the Icelandic Low. When analyzing the relationships between the encoded patterns and known climatic indices in the North Atlantic region, Node 3 shares a moderate correlation with the Atlantic Meridional Mode, exhibiting a correlation coefficient of approximately 0.45. This suggests that the variations in the Atlantic Meridional Mode have some level of association with the anomaly patterns observed in Node 3. Furthermore, the encoded spatial patterns in figure 2 bear significant correlations with the Arctic Oscillation. Notably, the same patterns that correlate with the NAO also correlate with the Arctic Oscillation and, for Node 3, the Atlantic Meridional Mode. This underscores the intertwined nature of these climatic modes.
Since the focus of this study is the NAO, figure 3 explores the patterns that strongly contain the NAO signal. This was achieved by correlating the station-based NAO index with the time series of the spatial patterns shown in figure 2. Figures A1 and A2 contain the statistical significance of the correlations in figure 3 at a 95% confidence level. For the EOF-based patterns, EOF1, which explained 43.8% of the variance in the SLP anomaly field, had the highest correlation with the station-based NAO index (figure 3, bottom). This finding is consistent with prior applications of EOF in defining the NAO index [23].
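The month-by-month comparison behind figure 3 amounts to grouping two monthly series by calendar month and correlating each group separately. The sketch below uses synthetic series of 73 years and assumes both start in January. The function name `monthly_correlations` is hypothetical, and the significance testing reported in figures A1 and A2 is not reproduced.

```python
import numpy as np

def monthly_correlations(index_a, index_b):
    """Pearson correlation between two monthly series, per calendar month.

    Both inputs are 1-D, start in January, and span whole years.
    Returns an array of 12 coefficients (January to December).
    """
    a = np.asarray(index_a, dtype=float).reshape(-1, 12)
    b = np.asarray(index_b, dtype=float).reshape(-1, 12)
    return np.array(
        [np.corrcoef(a[:, m], b[:, m])[0, 1] for m in range(12)]
    )

# Synthetic stand-ins: a reference index and a noisy estimate of it.
rng = np.random.default_rng(0)
ref = rng.normal(size=73 * 12)
est = ref + 0.5 * rng.normal(size=73 * 12)
r_by_month = monthly_correlations(ref, est)
```

Each coefficient is computed from 73 pairs of values, one per year, matching the sample size stated in the figure 3 caption.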
In contrast, for the encoded patterns, Nodes 1, 3, 6, and 9 (i.e. specific components in the AE model) showed relatively higher correlations with the station-based NAO index (figure 3, top). These encoded patterns reveal the possible existence of other non-linear patterns of the NAO beyond the traditional NAO pattern, indicating changes in the centers of action between the Azores High and the Icelandic Low. The spatial configuration of Node 1 bears the closest resemblance to the patterns typically associated with the NAO. This is most evident in the pronounced anomalies observed in both the Azores High and the Icelandic Low. In contrast, Node 3 displays a relatively weaker anomaly across the Icelandic Low. Nodes 6 and 9, on the other hand, indicate zonal shifts in anomalous pressure systems over the Azores High, with a stronger anomaly on the east coast of North America under Node 9 but more centralized towards Western Europe/North Africa under Node 6. These variations from the traditional dipole patterns of the NAO arise because the Azores High and Icelandic Low are not static atmospheric features. Their dynamic nature leads to the diverse patterns observed within the NAO. Thus, unlike the EOF-based patterns, where EOF1 is the only pattern applicable in exploring the traditional NAO mode, the AE-based patterns provide a more holistic representation of the NAO's spatial patterns.
Figure 4 shows the dates when the patterns associated with Nodes 1, 3, 6, and 9 had the highest amplitudes during the analysis period. By visual examination of figure 4, the variations in the centers of action of the pressure systems become evident. Positive NAO phases were captured by Node 1 in February 1990 and Node 3 in February 1997. As reflected in the encoded patterns in figure 2 under Nodes 1 and 3, the amplitude of the low-pressure system over Iceland was lower in February 1997 than in February 1990. In the same respect, a negative NAO phase was captured in Node 6 in March 2013 and in Node 9 in March 1962. Again, as depicted by the encoded patterns in figure 2 under Nodes 6 and 9, there are subtle zonal spatial variations in the pressure patterns over the Azores High from the composite maps in figure 4. In March 2013 the center of action was located more toward Western Europe, while in March 1962 it was situated more on the east coast of North America. The AE learns to extract historically prominent patterns in the SLP data, and the subtle spatial differences in the NAO modes at a given time that are neglected by linear techniques might hold the key to the accurate forecasting of regional weather impacts of the NAO. The ability of AE to capture non-linear patterns has broader implications, potentially translating into a more comprehensive understanding of the NAO. Similar complexities have been observed in other climate modes, such as the El Niño Southern Oscillation (ENSO), where various patterns exist beyond the classical El Niño, as measured by the Niño 3.4 index [30][31][32]. These findings could lead to improved understanding, modeling, and prediction of climate variability in the North Atlantic region.
The AE-based and EOF-based indices consistently had the highest correlations with the station-based NAO index from December to March, peaking in December and weakening in April (figure 3). This trend aligns with the fact that the NAO is most active during boreal winter [1], which the encoded patterns and the EOF patterns captured. During these months the NAO exerts its most substantial influence on weather patterns, characterized by a more robust and coherent signal [14]. The contrast in SLP between the Azores High and the Icelandic Low becomes more pronounced in winter, driven by the increased temperature gradient between the polar and tropical regions. This leads to a strengthening of the westerly winds across the North Atlantic [1,14], influencing storm tracks and weather patterns over Europe and North America.
Understanding these complex interactions provides insights into the NAO's seasonal behavior and explains the high correlations observed with both EOF-based and AE-based methods during boreal winter. The ability to capture the NAO's non-linear seasonal patterns through the AE-based method has far-reaching implications for climate science and weather prediction. By revealing the NAO's spatial and temporal characteristics, the AE approach contributes to a deeper comprehension of the NAO's role in shaping weather patterns across the North Atlantic region. This knowledge can be leveraged to improve seasonal forecasts, particularly in regions heavily influenced by the NAO, such as Europe and the eastern United States. Furthermore, recognizing the NAO's non-linear patterns and seasonal variability enhances our ability to model and predict broader climate phenomena, including potential interactions with other climate modes like the ENSO.
In terms of the relative performance of the EOF-based and the AE-based methods, figure 3 shows that for all months, Nodes 1, 3, 6, and 9, which contain the NAO signal, outperformed EOF1 based on the correlations with the station NAO index. This suggests that the AE-based method may be more adept at capturing the underlying spatial and temporal patterns of the NAO, possibly due to its ability to model non-linear relationships.
To achieve a more thorough analysis, the Hurrell EOF-based NAO index was also matched to the station-based index (figure 5). Indeed, in terms of reproducing the station-based index, the Hurrell EOF-based index in figure 5 has higher correlations than the index derived in this work by applying EOF analysis to ERA5 SLP data (figure 3, bottom). This discrepancy may be attributed to differences in data sources, methodologies, or spatial and temporal resolutions. However, despite the improvement, the correlation between the AE-based index and the station NAO index under Nodes 1, 3, 6, and 9 (figure 3, top) is higher than that of the Hurrell EOF-based index, except in February (as exemplified with Node 1, figure 5).
From figure 6, based on the station NAO index, the wavelet analysis indicated that the NAO is characterized by approximately 8 year periodicity (i.e. black contour inside the cone of influence). Another 2-3 year periodicity of the NAO is equally evident (figure 6). The 2-8 year periodicity of the NAO (i.e. the interannual and decadal variations) is consistent with existing literature [33,34]. Interestingly, Nodes 1, 6, and 9 have consistent results with the station-based NAO index in terms of the observed 2-8 year cycle of the NAO (figure 6). The EOF-based index derived in this study using ERA5 SLP captured the 8 year periodicity but not the 2-3 year periodicity (i.e. interannual variation). This was, however, improved in the Hurrell EOF-based index ('PC' in figure 6), which produced more consistent periodicity of the NAO. Again, the performance of the AE is quite promising in producing a consistent periodicity of the NAO as observed. Finally, figure 7 illustrates the time series of the annual mean December to March station-based, EOF-based, and AE-based NAO index. Both methods (AE and EOF) capture the inter-annual variability of the station-based NAO index, but the AE-based method exhibits a stronger correlation (0.96) with the reference NAO index compared to the EOF-based method (correlation = 0.90). The AE-based method also outperformed the Hurrell EOF-based index, which had a correlation of about 0.93. Moreover, the all-season (1950-2022) Adjusted R-squared, a measure of how well the model fits the observed data, is approximately 50% for the AE-based index and 34% for the EOF-based index. These findings highlight the potential of artificial neural networks, such as AEs, in enhancing our understanding of atmospheric processes [36]. By providing a more accurate representation of climate modes like the NAO, these advanced neural network techniques can lead to improvements in weather prediction accuracy, more robust climate modeling, and more precise risk assessment in regions affected by the NAO. Thus, the findings herein contribute to the growing body of evidence supporting the use of machine learning techniques in climate science and offer promising avenues for future research and application.

Conclusion
The performance of EOF-based and AE-based methods was compared in this work, focusing on their ability to reproduce the station-based NAO index using SLP modes in the North Atlantic sector. Across the analysis period, the AE-based method consistently outperformed the EOF-based method, particularly during the boreal winter months (DJFM) when the NAO is most active. The correlation between the AE-based and station-based NAO index reached 0.96, compared to 0.90 for the EOF-based index. The all-season explained variance further supported the AE-based method's advantage, with ∼50% of the variance in the station-based NAO index explained by the AE-based index, compared to ∼34% for the EOF-based index. Also, the observed 2-8 year periodicity of the NAO was better represented in the AE-based index compared to the EOF-based index.
While EOF effectively identified the components explaining the most variance in the SLP anomaly data, the AE-based method's ability to model complex nonlinear relationships captured aspects of the NAO that linear methods might have missed. These findings have significant implications for our understanding of the NAO and its representation in climate models. The AE-based method's ability to capture nonlinearities and multiple spatial patterns offers a more complete view of the NAO, potentially enhancing weather prediction, climate modeling, and risk assessment.
The analysis presented is specific to the ERA5 SLP data and the period considered (1950-2022), as well as the architecture employed in constructing the AE. While the correlations with the reference NAO index were generally higher for the AE-based method, there may be room to further optimize the AE parameters. The findings, while promising, should be seen as a starting point. Extending the method to various other datasets, analysis periods, and climate modes will provide a more comprehensive evaluation. If subsequent studies consistently demonstrate the AE-based method's superiority, it may suggest that AE could emerge as a potent alternative to EOF analysis in climate science. Such a shift could unlock new insights and capabilities, enhancing our ability to understand, model, and respond to complex climate phenomena.

Figure 1. Schematic representation of the AE model and its application in deriving the spatiotemporal patterns of the NAO. The bottleneck represents the learned encoded representations of the input data. The time series of the encoded spatial patterns were obtained by projecting the encoded spatial patterns onto the monthly SLP anomaly data in the North Atlantic sector.

Figure 2. Encoded patterns in the North Atlantic sector using the autoencoder (a) and the first ten EOFs (b). Analysis was carried out with monthly SLP anomaly data from 1950-2022.

Figure 3. Pearson correlation between the monthly station-based NAO index and the monthly AE-based NAO index (a); and the monthly EOF-based NAO index (b). The correlation analysis includes monthly data from 1950 to 2022, i.e. 73 data values for each month.

Figure 4. SLP maps of dates with the highest encoded temporal values for Nodes 1, 3, 6, and 9.

Figure 5. Monthly correlation between Node 1 and the reference NAO index, and between Hurrell EOF-based NAO index and the reference NAO index. The values on the bars are the correlation coefficients. The correlation analysis includes monthly data from 1950 to 2022, i.e. 73 data values for each month.

Figure 6. Local power spectrum of the NAO index from the AE-based method, station NAO index, EOF-based NAO index derived in this study (third row, right, named 'EOF'), and the Hurrell EOF-based index (fourth row, named 'PC'). Wavelet analysis was conducted with the biwavelet package in R [35]. The black contour shows 95% statistical significance.

Figure 7. December to March annual mean time series of Node 1, EOF1, and station-based NAO index from 1950 to 2022. The correlations in the graph are the annual mean December to March correlations with the station NAO index. The winter season is treated as a continuous period. For instance, the mean for 1951 is calculated using the values from December 1950 and January to March 1951.
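The winter-mean convention in the figure 7 caption, where December is grouped with the following January to March, can be sketched as below. The function name `djfm_means` and the assumption that the series starts in January of a known first year are choices made for this illustration only.

```python
import numpy as np

def djfm_means(monthly_index, first_year):
    """December-to-March means with December taken from the previous year.

    monthly_index: 1-D series starting in January of first_year and
                   covering whole years.
    Returns (years, means), where the winter labelled Y averages
    December of Y - 1 and January to March of Y.
    """
    table = np.asarray(monthly_index, dtype=float).reshape(-1, 12)
    december_prev = table[:-1, 11]   # December of each year except the last
    jan_to_mar = table[1:, 0:3]      # January to March of the following year
    means = (december_prev + jan_to_mar.sum(axis=1)) / 4.0
    years = np.arange(first_year + 1, first_year + table.shape[0])
    return years, means

# Toy series: the values 0..35 stand in for three years of monthly data.
years, means = djfm_means(np.arange(36), first_year=1950)
```

For this toy series the winter labelled 1951 averages the values 11, 12, 13, and 14 (December 1950 to March 1951), giving 12.5. The first calendar year has no preceding December and therefore has no winter mean.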