Application of autoencoders artificial neural network and principal component analysis for pattern extraction and spatial regionalization of global temperature data

Spatial regionalization is instrumental in simplifying the spatial complexity of the climate system. To identify regions of significant climate variability, pattern extraction is often required prior to spatial regionalization with a clustering algorithm. In this study, the autoencoder (AE) artificial neural network was applied to extract the inherent patterns of global temperature data (from 1901 to 2021). Subsequently, Fuzzy C-means clustering was applied to the extracted patterns to classify the global temperature regions. Our analysis involved comparing AE-based and principal component analysis (PCA)-based clustering results to assess consistency. We determined the number of clusters by examining the average percentage decrease in the Fuzzy Partition Coefficient (FPC) and its 95% confidence interval, seeking a balance between obtaining a high FPC and avoiding over-segmentation. This approach suggested that, for a more general model, four clusters are reasonable. The Adjusted Rand Index between the AE-based and PCA-based clusters is 0.75, indicating considerable overlap between the two. The observed difference between the AE-based and PCA-based clusters is suggested to be associated with AE’s capability to learn and extract complex non-linear patterns; this attribute, for example, enabled the clustering algorithm to accurately detect the Himalayas region as the ‘third pole’, with temperature characteristics similar to those of the polar regions. Finally, when the analysis period is divided into two (1901–1960 and 1961–2021), the Adjusted Rand Index between the two sets of clusters is 0.96, which suggests that historical climate change has not significantly affected the defined temperature regions over the two periods. In essence, this study both indicates AE’s potential to enhance our understanding of climate variability and reveals the stability of the historical temperature regions.


Introduction
The climate system is inherently complex, influenced by a multitude of factors that interact in intricate ways [1]. This complexity arises from the interplay of various components, including the atmosphere, oceans, land surfaces, and ice sheets [1,2]. Different regions on Earth experience varied climates due to their geographical location, altitude, proximity to large water bodies, and ocean currents. Solar radiation is the primary driver of Earth's climate. The amount of sunlight received by different regions varies due to the Earth's tilt and orbit. This uneven distribution of solar energy leads to temperature differences, driving atmospheric circulation patterns [3]. Atmospheric circulation, in turn, redistributes heat and moisture around the planet, influencing regional climates [4].
Given the spatial heterogeneity of the climate system, there is a need to spatially simplify it by grouping regions with similar climate characteristics, allowing for a more focused analysis [5]. Climate classification systems, such as the Köppen-Geiger system, help in understanding and categorizing the vast array of climates found around the world [6]. However, [7] noted that Köppen never intended for his classification to be used as a finished product, and he made several revisions aiming to reconcile his classification with physical theory. Indeed, classifying the climate system is challenging due to its non-linear behavior and the myriad of factors influencing it [8]. In that respect, traditional classification systems may not capture the nuances of transitional zones, regions with complex non-linear behavior, or the impacts of rapid climate change [9]. Moreover, climate data can be noisy for several reasons, including measurement errors and other external influences that affect climate measurements but are not directly related to the natural climate system, such as instrumentation changes and data processing methods [10]. Understanding and disentangling the signal (i.e. the genuine climatic patterns and trends that reflect natural climate variability and long-term changes) from the noise (which includes measurement errors, instrumentation changes, and other external non-climatic factors) is crucial for accurate climate classification and analysis.
Principal Component Analysis (PCA) has been employed to denoise climate data and capture the main variability in the data [5]. However, its limitation lies in the inability to extract non-linear relationships, which are often present in climate datasets [5,11]. Alternatively, autoencoders (AE), a type of artificial neural network that can capture non-linear relationships, show promise in extracting the crucial patterns in data and in denoising data [12]. Very few studies have applied AE to extract climate patterns. Among them, [13] applied AE to extract features from large climatological time series data. Saenz et al [14] applied AE to encode temperature field datasets from pre-industrial control runs in the CMIP5 first ensemble. Tibau et al [15] explored AE as a tool to extract features and patterns from complex high-dimensional climate data without the need for labels, and [16] explored the use of AE to extract latent patterns in climate variables, emphasizing their utility as noise reduction tools. Hence, this study adds to the growing body of literature on the potential of AE in climate science, in view of its application to pre-process climate data prior to applying a clustering algorithm, such as K-means.
While K-means clustering has been applied to cluster climate data [17], its limitation arises from the continuous nature of climate data [18]. Fuzzy C-means, on the other hand, offers a more flexible approach, allowing data points to belong to multiple clusters, potentially resulting in a more detailed classification [17][18][19][20]. The major focus of this work is to apply Fuzzy C-means to cluster the global temperature patterns extracted with AE. We also investigate the consistency of temperature regions classified by the combination of AE or PCA with K-means or Fuzzy C-means. Through a comparative analysis, we seek to evaluate the sensitivity of the classification to (i) the method used to extract the temperature patterns; (ii) the clustering method; (iii) the different architecture of the combination of Fuzzy C-means and AE; and (iv) the impact of historical drift in the global climate system on the temperature regions.

Data and methods
Gridded global temperature data were obtained from the Climate Research Unit (CRU) [21] at a monthly temporal resolution from 1901 to 2021. Prior to its application, the CRU global temperature data were evaluated by comparison with the Berkeley Earth (BE) [22] temperature data over the analysis period. Figure 1 shows the 10 yr moving average of the global mean temperature anomaly data from CRU and BE, and indicates that the temporal variability of the CRU and BE temperature anomalies is consistent. The correlation coefficient between the monthly values from 1901-2021 is 0.84, the coefficient of determination is 70%, and the mean absolute error is 0.27. The difference between the two data sets can be attributed to the different methods used in creating them. Nonetheless, the evaluation indicates that both data sets are generally consistent, supporting the further use of CRU in this study.
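As a rough illustration of how such agreement statistics can be computed, the following sketch evaluates the correlation coefficient, its square, and the mean absolute error for two series. The series are synthetic stand-ins rather than the CRU or BE data, the function name agreement_metrics is hypothetical, and the sketch assumes that the coefficient of determination is taken as the square of the correlation coefficient, which the text does not state explicitly.

```python
import numpy as np

def agreement_metrics(a, b):
    """Return Pearson r, r squared, and mean absolute error of two series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    r = np.corrcoef(a, b)[0, 1]
    mae = np.mean(np.abs(a - b))
    return r, r ** 2, mae

# Synthetic stand-ins for two monthly anomaly series (not real data).
rng = np.random.default_rng(0)
series_a = rng.normal(size=1452)                      # 121 years x 12 months
series_b = series_a + rng.normal(scale=0.5, size=1452)

r, r2, mae = agreement_metrics(series_a, series_b)
print(f"r = {r:.2f}, r^2 = {r2:.2f}, MAE = {mae:.2f}")
```

Under that assumption, a correlation of 0.84 corresponds to 0.84 squared, or about 0.71, which is in line with the reported value of roughly 70%.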
PCA and AE were applied comparatively in pre-processing the temperature data before the application of a clustering algorithm. PCA is a statistical procedure that transforms a high-dimensional dataset into a lower-dimensional space, retaining as much variance as possible from the original data [5]. For our dataset, PCA is employed to retain only those principal components that explain 95% of the variance. The rationale behind this is to capture the most significant patterns in the data while discarding noise or redundant information.
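The 95% variance criterion can be sketched with a singular value decomposition, as below. This is a minimal illustration rather than the implementation used in the study: the function name pca_reduce is hypothetical, and the layout of the input (rows as samples to be clustered, columns as features) is an assumption.

```python
import numpy as np

def pca_reduce(X, variance_to_keep=0.95):
    """Project the rows of X onto the leading principal components.

    X has shape (n_samples, n_features). Returns the component scores and
    the number of components needed to reach the requested variance share.
    """
    Xc = X - X.mean(axis=0)                           # centre each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)               # variance share per component
    cumulative = np.cumsum(explained)
    # first index at which the cumulative share reaches the target
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    k = min(k, len(s))
    return Xc @ Vt[:k].T, k

# Synthetic example: 200 samples, 30 features with unequal variances.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 30)) * np.linspace(5.0, 0.1, 30)
scores, n_components = pca_reduce(X_demo)
print(scores.shape, n_components)
```

The retained scores, rather than the original fields, would then be passed to the clustering step.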
AE, on the other hand, is a type of neural network architecture designed for unsupervised learning [23]. AE works by compressing the input into a compact latent space representation using the encoder and then reconstructing the original input from this representation using the decoder. In this study, the primary aim of utilizing AE is data compression, providing a different dimensionality reduction approach compared to PCA. Figure 2 shows the flow chart in applying AE and PCA for pattern extraction and regionalization of the global temperature data.
Prior to defining the architecture of the AE, the temperature data was normalized to a [0, 1] range, ensuring uniformity vital for neural network training. The architecture considered input/output neurons reflecting the temperature data dimensionality, with a chosen hidden layer of 128 neurons. The 128 neurons were objectively determined by splitting the data into training and validation sets. The AE model was trained with varying neuron counts in the hidden layer, ranging from 2 to 256 neurons. After training with each neuron count, the model's reconstruction capability was assessed by computing the error between the original validation data and the reconstructed data. The optimal number of neurons is the one that minimizes the reconstruction error without excessively increasing the model's complexity and tendency to overfit. In this study, 128 neurons were found to be sufficient.
The encoder used the rectified linear unit activation function [24], which introduces non-linearity in the model, while the decoder adopted the sigmoid function, ensuring outputs in the [0, 1] range, with the Adam optimizer [25] and mean square error as the loss function. The AE was trained for a maximum of 50 epochs, with training halted when there was no decrease in the validation reconstruction error after five consecutive epochs (i.e. a form of early stopping). This resulted in using 11 epochs for the model training. This trained AE was applied to reduce the global temperature data's dimensions, preserving crucial patterns for the regionalization task.
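The structure described above (normalization to [0, 1], a ReLU encoder, a sigmoid decoder, mean square error loss, and early stopping with a patience of five epochs) can be sketched in plain NumPy as follows. This is an illustrative sketch under stated assumptions, not the model used in the study: plain full-batch gradient descent is swapped in for the Adam optimizer to keep the example short, and the function names, learning rate, weight initialization, and synthetic data are all hypothetical.

```python
import numpy as np

def minmax_normalize(X):
    """Scale all values of X to the [0, 1] range."""
    return (X - X.min()) / (X.max() - X.min())

def train_autoencoder(X_train, X_val, n_hidden=128, max_epochs=50,
                      patience=5, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder; return best weights and val losses."""
    rng = np.random.default_rng(seed)
    n_in = X_train.shape[1]
    W1 = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=np.sqrt(1.0 / n_hidden), size=(n_hidden, n_in))
    b2 = np.zeros(n_in)

    def forward(X):
        Z1 = X @ W1 + b1
        H = np.maximum(Z1, 0.0)                       # ReLU encoder
        Y = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))      # sigmoid decoder
        return Z1, H, Y

    best_val, best_params, wait, history = np.inf, None, 0, []
    for _ in range(max_epochs):
        # Gradient of the mean square reconstruction error (backpropagation).
        Z1, H, Y = forward(X_train)
        dZ2 = 2.0 * (Y - X_train) / X_train.size * Y * (1.0 - Y)
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (Z1 > 0)
        dW1, db1 = X_train.T @ dZ1, dZ1.sum(axis=0)
        # Plain gradient descent step (assumption: stands in for Adam).
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

        val_loss = float(np.mean((forward(X_val)[2] - X_val) ** 2))
        history.append(val_loss)
        if val_loss < best_val:                       # improvement: reset the counter
            best_val, wait = val_loss, 0
            best_params = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
        else:                                         # no improvement this epoch
            wait += 1
            if wait >= patience:                      # early stopping
                break
    return best_params, history

def encode(X, params):
    """Map inputs to the hidden (latent) representation."""
    W1, b1 = params[0], params[1]
    return np.maximum(X @ W1 + b1, 0.0)

# Synthetic stand-in data: 300 samples with 40 features each.
rng = np.random.default_rng(1)
X_all = minmax_normalize(rng.normal(size=(300, 40)))
params, val_history = train_autoencoder(X_all[:240], X_all[240:], n_hidden=16)
print(len(val_history), encode(X_all, params).shape)
```

The encoded output of encode, not the reconstruction, is what would be handed to the clustering algorithm. Repeating the training over a range of n_hidden values and comparing the validation errors mirrors the neuron-count selection described earlier.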
The Fuzzy C-means clustering algorithm [26] was applied to the encoded temperature data. Unlike K-means clustering, which assigns each data point to only one cluster, Fuzzy C-means assigns each data point a degree of membership to each cluster. This allows for a more holistic representation of data points that might inherently belong to more than one cluster. The Fuzzy C-means clustering process was initiated with a predefined number of clusters. The fuzziness coefficient was set to 2, which is a commonly used standard in fuzzy clustering, to provide a balance that allows for some overlap between clusters while maintaining distinct groupings. Clusters were iteratively adjusted until either a maximum of 1000 iterations was reached or the change in membership values between iterations was below 0.005. The membership degree lies in the interval [0, 1].
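A minimal NumPy sketch of the standard Fuzzy C-means iteration, using the settings quoted above (fuzziness coefficient 2, at most 1000 iterations, tolerance 0.005), is given below. It is not the implementation used in the study. The function name, the random initialization of the memberships, and the use of the largest absolute membership change as the stopping measure are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=1000, tol=0.005, seed=0):
    """Return (centers, U, fpc); U has shape (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1 per point
    for _ in range(max_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        # squared distance from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                       # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # standard membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    fpc = float(np.sum(U ** 2) / X.shape[0])          # Fuzzy Partition Coefficient
    return centers, U, fpc

# Synthetic example: two well-separated groups of points.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(10.0, 0.5, (40, 2))])
centers, U, fpc = fuzzy_c_means(X_demo, n_clusters=2)
print(U.shape, round(fpc, 3))
```

Each row of U holds the membership degrees of one point, which lie in [0, 1] and sum to 1 across clusters.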
To determine the optimal number of clusters, we performed Fuzzy C-means clustering for a range of cluster numbers and calculated the Fuzzy Partition Coefficient (FPC) for each. FPC provides a measure of how cleanly the data is divided into fuzzy clusters, with values closer to 1 indicating better partitioning. We applied bootstrapping with replacement to calculate the 95% confidence interval of the FPC decrease. We iterated this process, each time randomly sampling from our data with replacement, then computing and recording the FPC. After repeating this process many times, we used the empirical distribution of FPC values to construct a 95% confidence interval. The optimal cluster number was chosen based on the strongest FPC decrease within the 95% confidence interval, as this signifies the best trade-off between the precision of the clustering (reflected in the FPC) and the granularity (the further details provided by the number of clusters). This decision is also consistent with the Elbow Method, and the idea is to choose a number of clusters so that adding another cluster does not significantly improve the total variance. We subjectively evaluated threshold values of 0.5, 0.6, and 0.7 for defining fuzzy membership, and investigated how adjusting the threshold influences the granularity of the clustering.
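The quantities involved in this selection can be sketched as follows: the FPC of a membership matrix, the percentage decrease in FPC between successive cluster numbers, and a percentile bootstrap interval. The function names and the FPC values in the example are hypothetical. For brevity, the bootstrap here resamples a list of already-computed values, whereas the procedure described above resamples the data and recomputes the FPC each time.

```python
import numpy as np

def fuzzy_partition_coefficient(U):
    """FPC of a membership matrix U with shape (n_points, n_clusters)."""
    U = np.asarray(U, dtype=float)
    return float(np.sum(U ** 2) / U.shape[0])

def percent_decrease(fpc_values):
    """Percentage drop in FPC between successive cluster numbers."""
    f = np.asarray(fpc_values, dtype=float)
    return 100.0 * (f[:-1] - f[1:]) / f[:-1]

def bootstrap_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # each row of idx is one resample (with replacement) of the input
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    return float(np.quantile(means, alpha)), float(np.quantile(means, 1.0 - alpha))

# Made-up FPC values for 2 to 7 clusters, for illustration only.
fpc_by_k = [0.80, 0.70, 0.62, 0.59, 0.57, 0.55]
drops = percent_decrease(fpc_by_k)
low, high = bootstrap_ci(drops)
print(np.round(drops, 1), round(low, 1), round(high, 1))
```

A fully crisp partition gives an FPC of 1, while memberships spread evenly over c clusters give 1/c, which is why the FPC tends to fall as clusters are added.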
Finally, to quantitatively assess the congruence between two clusterings of the data, we employed the Adjusted Rand Index [27], which corrects the Rand Index for chance so that random cluster assignments yield an Adjusted Rand Index near 0. The Adjusted Rand Index quantifies the degree of similarity between two distinct sets of clusters. This measure is particularly useful in evaluating the consistency of cluster assignments between two different methods or the same method with different parameters. The Adjusted Rand Index is bounded above by 1 and can take negative values. A score of 1 indicates perfect congruence between the two clustering solutions, a score near 0 indicates agreement no better than chance, and a negative score indicates agreement worse than expected by chance.
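For reference, the Adjusted Rand Index can be computed from the contingency table of two labelings, as in the sketch below, which follows the standard pair-counting formula. The function name is hypothetical and the label vectors are toy examples.

```python
import numpy as np

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two hard labelings of the same points."""
    a = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)[1].ravel()
    b = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)[1].ravel()
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)                       # contingency table of counts

    def pairs(x):
        return x * (x - 1) / 2.0                      # number of pairs among x items

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(len(a))    # agreement expected by chance
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:                           # e.g. both labelings have one cluster
        return 1.0
    return float((sum_cells - expected) / (maximum - expected))

# The index ignores how the clusters are named.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # identical partitions
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))   # worse than chance
```

The first call returns 1.0 and the second returns -0.5, which illustrates that the index is not confined to the [0, 1] interval.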

Results
Figure 3(a) illustrates that the FPC decreases as the number of clusters increases. There is a noticeable drop from 2 clusters to 3 clusters, and then another significant drop from 3 to 4. Beyond 4 clusters, the FPC decrease becomes smaller and relatively incremental. From figure 3(b), cluster 5 marks the first cluster with the lowest percentage FPC decrease. This suggests that 4 clusters are optimal for a more general model. Hence, we considered 4 clusters but also examined the granularity associated with 5 and 6 clusters. The exploration of 5 and 6 clusters was undertaken to assess the added granularity and its impact on the model's interpretability. However, beyond 5 clusters, the additional complexity does not yield proportional benefits in terms of clarity or understanding of climate zoning. Thus, we limited our analysis to a maximum of 5 clusters.
Figure 4 shows the temperature regions obtained by applying K-means clustering to the PCA-based patterns and the encoded patterns from AE, respectively. Both methods produced consistent regions. The Adjusted Rand Index between the two clusterings in figure 4 is 0.75, indicating significant overlap. From figure 4, the K-means clustering applied to the temperature data processed with AE and PCA, respectively, roughly regionalized global temperature into the polar regions (yellow color), the temperate regions (blue color), the tropical regions (green color), and the subtropical regions (red color). The polar regions are characterized by extremely low temperatures year-round, with long, severe winters and short, cool summers. Ice and snow cover is predominant. The temperate regions experience moderate temperatures with distinct seasonal changes. Summers are warm to hot, and winters are cool to cold, with varied precipitation patterns. The tropical regions are known for their consistently high temperatures and humidity. Rainfall is often abundant, and these regions do not experience traditional winter and summer seasons. The subtropical regions feature hot summers and mild winters, often with high humidity. Rainfall can be seasonal, with some areas experiencing a wet-dry climate pattern.
Despite the similarity of the patterns in figure 4, there is also observable discordance between the clustering from the PCA and AE patterns. For example, over the southwestern part of southern Africa, which typically has the Mediterranean type of climate (also commonly considered a temperate zone), the spatial extent clustered in the temperate region (blue color in figure 4) is higher in the AE-based clusters compared to the PCA-based clusters. Similarly, the clustering from AE patterns, compared to PCA-based clusters (figure 4(a)), correctly identified the spatial extent of high-elevation south-central Eurasia regions (i.e. the Himalayas), which share similar climatic conditions with those found in polar regions [28] (figure 4(b)).
The differences between the AE-based clusters and the PCA-based clusters might be attributed to the capability of AE to extract complex and non-linear patterns from the global temperature data, which might not be properly characterized by linear PCA. Domain knowledge indicates that the clusters from the AE-based patterns are more representative than the clusters from the PCA-based patterns. Hence, in subsequent analysis, we will focus on the AE-based clusters. Also, despite its seemingly satisfactory performance in figure 4, K-means does not have the flexibility to consider the possibility of a region exhibiting the characteristics of more than one climate type, which is not unusual in climatic regions. Hence, we will further explore the Fuzzy C-means applied to the encoded AE patterns.
Figure 5 shows the four fuzzy clusters obtained by the application of Fuzzy C-means to the encoded patterns. The possibility of a specific region having the characteristics of multiple climates is evident in figure 5. For example, the high-elevation region of south-central Eurasia can exhibit the characteristics of the polar, temperate, and subtropical climates, based on its membership in clusters 1, 2, and 4 (figure 5). The same argument also holds for some southern parts of South America, for clusters 1 and 2. From figure 6, analyzing five fuzzy clusters indeed increases the granularity, allowing for the further segmentation of the tropical climate under 4 fuzzy clusters (i.e. cluster 3 in figure 5) into the tropical and desert climate (i.e. cluster 1 and cluster 3 in figure 6). Keeping six clusters or more leads to further segmentation but at the expense of the FPC value (cf figure 3), which might impact the overall quality of the classification and the generalizability. Moreover, the 4-cluster solution has a lower Partition Entropy than the 5-cluster solution, suggesting clearer cluster assignments. The 4-cluster solution also has a higher Partition Coefficient, indicating more distinct cluster assignments compared to the 5-cluster solution.
A challenge in characterizing fuzzy clusters is the choice of the threshold used to define cluster membership. So far, we have subjectively used 0.5, as only 4.64% of the grids were unclassified using a threshold of 0.5 for four clusters. A threshold of 0.6 implies that 15.08% of the grids will have no dominant cluster; for 0.7 it is 26.86%, and for 0.9 it is 69.52%. Thus, as the threshold increases, the number of unclassified grid points increases as well. In the same respect, using a threshold of 0.5, 4 clusters imply that 4.64% of the grids were unclassified, 5 clusters imply that 5.85% of the grids were unclassified, and 6 clusters imply that 8.25% of the grids were unclassified. So, for a given threshold, increasing the number of clusters decreases the number of classified grids, as can be ascertained from figure 7. Since we use a threshold of 0.5 to determine the dominant cluster of a point/grid, if none of the cluster memberships exceeds this threshold, we record that there is 'no dominant cluster' for that point. With an increasing number of clusters, and the consequent spreading out of membership values, it becomes more common for points to fail to meet this threshold for any cluster. Figure 7 shows that for 4 clusters the membership values are more concentrated around higher values, especially closer to 1. This means that many grid points have a high membership to a particular cluster, making it less likely for them to fall into the 'no dominant cluster' category when using a threshold of 0.5. For 5 clusters, the membership values start to spread out more. This spreading out means that it is more common for points to have lower membership values across all clusters, making it harder for them to meet the 0.5 threshold. For 6 clusters, the spread continues to increase, with many more grid points having membership values below 0.5. Hence, as the number of clusters increases, the likelihood of grid points not meeting the threshold for any cluster also increases.
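The thresholding rule described above can be sketched as follows: each grid point receives the label of its largest membership, unless that membership does not exceed the threshold, in which case it is marked as having no dominant cluster. The function names, the use of -1 as the 'no dominant cluster' marker, and the small membership matrix are illustrative assumptions.

```python
import numpy as np

def dominant_cluster(U, threshold=0.5):
    """Label each row of U by its largest membership, or -1 if none exceeds threshold."""
    U = np.asarray(U, dtype=float)
    labels = U.argmax(axis=1)
    labels[U.max(axis=1) <= threshold] = -1           # no dominant cluster
    return labels

def unclassified_percent(U, threshold=0.5):
    """Percentage of points with no membership above the threshold."""
    return float(100.0 * np.mean(dominant_cluster(U, threshold) == -1))

# Toy membership matrix: 4 grid points, 4 clusters (rows sum to 1).
U_demo = np.array([
    [0.90, 0.10, 0.00, 0.00],
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.60, 0.20, 0.10],
])
for t in (0.5, 0.6, 0.7):
    print(t, dominant_cluster(U_demo, t), unclassified_percent(U_demo, t))
```

Because the memberships of a point sum to 1, adding clusters tends to lower the largest membership of points in transitional zones, so more of them fall below any fixed threshold.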
Finally, we investigated the impact of historical climate change on the climate regions. We divided the analysis period (1901-2021) into two periods, 1901-1960 and 1961-2021, applied AE to extract the most important patterns for each period, and applied the Fuzzy C-means to the output for each period. Figure 8 shows the output of the fuzzy clustering, with each grid assigned to the cluster for which it has the dominant membership value, to enable visual comparison. The clusters in the two periods are very comparable, with an Adjusted Rand Index of 0.96.

Discussion and conclusions
Despite the usefulness of climate regionalization in reducing the complexity and high spatial heterogeneity of climate variables, the accuracy of the classification is dependent on the classification algorithm [18]. Moreover, the inherent patterns of climate data are often obscured by noise [5]. Therefore, denoising the data prior to the application of a clustering algorithm is often beneficial. In this study, we have evaluated the performance of AE and PCA in denoising global temperature data alongside extracting the most important inherent global temperature patterns.
AE is a type of artificial neural network that has the flexibility to be tuned to capture different extents of non-linearities and complexities associated with (climate) data [23]. PCA, on the other hand, works best in capturing only linear relationships in the data [29]. We found that the application of Fuzzy C-means to the AE-based patterns and the PCA-based patterns resulted in generally consistent temperature regions with an Adjusted Rand Index of 0.75. However, the clusters derived from the AE-based patterns appeared to offer some improved accuracy in regionalizing global temperature. The high-elevation regions of south-central Eurasia (i.e. the Himalayas) are characterized by a complex climate [30]. The Himalayas, owing to their substantial elevation, host ice caps and exhibit a polar climate at their peaks. This resemblance in glacial and climatic characteristics to the Arctic and Antarctic regions has led to the Himalayas' frequent designation as the 'third pole'. This comparison extends beyond mere climatic similarities, acknowledging the Himalayas' significant role in global climatic patterns and their extensive ice reserves, akin to the Earth's polar regions [31][32][33][34]. The AE-based clusters correctly identified the Himalayas region as having temperature characteristics similar to those of the polar regions, whereas the PCA-based clusters barely captured this characteristic. This was promising and supports studies that documented the potential of AE in climate science [12,14,15]. Hence, using AE to extract the crucial patterns in climate data should be prioritized over PCA.
Climate regions do not have step boundaries; this underscores the need for the application of fuzzy clustering algorithms [35]. The Fuzzy C-means applied in this study to the extracted patterns with AE correctly identified that the Himalayas region, for example, has the characteristics of a subtropical and temperate climate due to its location at the low latitudes but dominantly has the characteristics of the polar climate due to its high elevation. Hence, we conclude that despite the simplicity offered by using the combination of PCA and hard clustering algorithms, a Fuzzy clustering algorithm applied to patterns extracted with a technique that can capture linear, non-linear, and complex relationships is beneficial for the accuracy of climate classifications.
Finally, the temperature regions classified during 1901-1960 matched with the regions classified during 1961-2021 with an Adjusted Rand Index of 0.96, suggesting that historical climate change between these two periods has barely affected the spatial nature of the temperature regions. This could be due to key climatic features defining the zones, such as persistent geographical features, ocean currents, or prevailing wind patterns that do not rapidly change. Moreover, while there have been significant global temperature changes, these changes may not have been evenly distributed to cause changes in the climate zones during the analysis period considered. This implies that although climate change has been significant over the last century, its impact on climate zoning might not be as pronounced or immediate.
In future studies, we will combine several climate variables such as precipitation, temperature, wind, etc, and perform a more detailed clustering of the global climate zones using AE and Fuzzy C-means. Also, we will examine the representation of the climate zones in CMIP6 climate models in the historical and future climate scenarios.

Figure 1. Ten-year moving average of annual mean temperature anomalies from CRU and BE. Anomaly was calculated with respect to the 1901-2000 period.

Figure 2. Flow chart of the steps in applying AE and PCA for pattern extraction and regionalization of the global temperature data.

Figure 3. Relationship between the number of clusters and Fuzzy Partition Coefficient (a) and the percentage decrease in the Fuzzy Partition Coefficient (b).

Figure 4. Clustering of global temperature data by the application of K-means to PCA-based patterns (a) and autoencoder-based patterns (b). The number of clusters is 4.

Figure 5. Clustering of global temperature data by the application of Fuzzy C-means to autoencoder-based patterns for 4 clusters. The threshold value for deciding group membership is 0.5.

Figure 6. Clustering of global temperature data by the application of Fuzzy C-means to autoencoder-based patterns for 5 clusters. The threshold value for deciding group membership is 0.5.

Figure 7. Relationship between membership values for group membership and the number of grid points grouped under a given class for different cluster numbers.

Figure 8. Clustering of global temperature data by the application of Fuzzy C-means to autoencoder-based patterns for 4 clusters during 1901-1960 and 1961-2021. A grid is assigned to the cluster with the dominant cluster membership value.