Meta-learning-based estimation of the barrier layer thickness in the tropical Indian Ocean

Accurately estimating the barrier layer thickness (BLT) is crucial for understanding the ocean's role in climate variability on both regional and global scales. Here, we propose a meta-learning-based ensemble model to estimate the BLT in the tropical Indian Ocean from satellite observations. The results show that the meta-learning-based ensemble model outperforms the three individual models it combines (GRU, CNN, and ANN) in terms of spatial distribution and accuracy, with substantially reduced root mean square errors in the southeast Arabian Sea, Bay of Bengal, and eastern equatorial Indian Ocean. Furthermore, we find that sea surface salinity plays the most important role in the estimation of the BLT, highlighting the dominant influence of salinity stratification. These preliminary results demonstrate the feasibility of estimating the BLT from satellite observations and have implications for studying upper-ocean dynamics with machine learning techniques.


Introduction
The barrier layer is the layer between the base of the density-defined mixed layer and the base of the temperature-defined isothermal layer (i.e., the top of the thermocline); it forms where salinity stratification is stronger than temperature stratification (Lukas and Lindstrom 1991, Sprintall and Tomczak 1992, Vinayachandran et al 2002, Qu and Meyers 2005, Qu et al 2014). Variability of the barrier layer thickness (BLT) is believed to play a role in the global climate system by modulating mixed layer dynamics. Among other effects, its impact on sea surface temperature (SST) has been widely recognized (Vialard and Delecluse 1998, Masson et al 2005). Changes in SST may in turn influence local air-sea interaction and contribute directly to the variability of precipitation, the evolution of tropical cyclones, and large-scale modes of climate variability such as the Indian Ocean Dipole (IOD) and the El Niño-Southern Oscillation (ENSO) (Maes et al 2002, Balaguru et al 2012, Qiu et al 2012, Ivanova et al 2021).
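Expressed in terms of the two layer depths, this definition gives the BLT as the isothermal layer depth (ILD) minus the density-defined mixed layer depth (MLD):

```latex
\mathrm{BLT} = \mathrm{ILD} - \mathrm{MLD}
```

The BLT therefore vanishes when salinity does not contribute to the near-surface density stratification, and it becomes positive when a halocline sets the base of the mixed layer above the top of the thermocline.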
Most previous studies of the BLT were based either on limited in situ observations or on numerical model output. Both approaches have limitations. Observation-based BLT estimates usually lack the resolution needed to characterize BLT variability in time and space because in situ observations are scarce, while numerical models are time-consuming and computationally expensive. Rapid advances in remote sensing have revolutionized many aspects of ocean observation by providing temporally continuous and spatially extensive sampling of the sea surface, including SST, sea surface salinity (SSS), sea surface height (SSH), and sea surface wind (SSW). To make full use of these satellite observations, substantial efforts have been devoted to retrieving the ocean's interior variability from sea surface data (Chu et al 2000, Ali et al 2004, Liu et al 2017, Foster et al 2021, Qi et al 2022). For instance, Felton et al (2014) related the BLT to satellite-derived SSS, SST, and SSH anomalies using a multi-linear regression model. Applying the model to the Indian Ocean, they demonstrated the feasibility of monitoring the BLT from satellite observations at reasonably high frequency and fine resolution.
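To make a multi-linear regression of this kind concrete, the sketch below fits the BLT as a linear function of SSS, SST, and SSH anomalies with ordinary least squares. It is a minimal illustration on synthetic data only: the coefficients, the intercept, and the helper names (`fit_multilinear`, `predict_multilinear`) are hypothetical and are not taken from Felton et al (2014).

```python
import numpy as np


def fit_multilinear(features, target):
    """Ordinary least squares for target ~ b0 + sum_k b_k * features[:, k].

    features: array of shape (n_samples, n_features); target: shape (n_samples,).
    Returns the coefficient vector [b0, b1, ..., bK].
    """
    design = np.column_stack([np.ones(len(target)), features])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs


def predict_multilinear(coeffs, features):
    """Evaluate the fitted linear model for new feature rows."""
    return coeffs[0] + features @ coeffs[1:]


# Synthetic data: the three columns stand in for SSS, SST and SSH anomalies (hypothetical values).
rng = np.random.default_rng(0)
anomalies = rng.normal(size=(200, 3))
true_coeffs = np.array([10.0, -4.0, 1.5, 2.0])  # made-up intercept (m) and slopes
blt = predict_multilinear(true_coeffs, anomalies) + rng.normal(scale=0.1, size=200)

fitted = fit_multilinear(anomalies, blt)
print(np.round(fitted, 2))
```

With enough samples and little noise, the fitted coefficients recover the synthetic ones; with real data, the coefficients would be estimated separately for each grid point or region.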
Machine learning techniques have advanced rapidly in recent years. They provide a data-driven and flexible approach to regression tasks and have been applied in various disciplines, including remote sensing (Su et al 2018), climate forecasting (Ham et al 2019), and data assimilation (Reichstein et al 2019). Despite some success, machine learning techniques still face challenges in settings with limited data and/or computational resources (Rossi et al 2014). Meta-learning, an emerging field of supervised machine learning, has received particular attention from the community (Willard et al 2021, Tian et al 2022), and experimental results across different applications have demonstrated its advantages over traditional machine learning techniques.
In this study, we propose a novel hybrid ensemble approach based on meta-learning to estimate the BLT in the tropical Indian Ocean. We first select three robust machine learning models as base models: gated recurrent units (GRU) (Song et al 2020), a convolutional neural network (CNN) (Ham et al 2019), and an artificial neural network (ANN) (Ali et al 2004). We then employ a Bayesian neural network (BNN) to ensemble the results of the three individual models. The data and methodology are described in Section 2, the results are presented in Section 3, and Section 4 summarizes our findings and insights.

Data
In this study, we used satellite-derived sea surface data, including the Soil Moisture and Ocean Salinity (SMOS) Level-3 SSS product (Boutin et al 2018), the National Oceanic and Atmospheric Administration (NOAA) Optimum Interpolation Sea Surface Temperature (OISST) version 2 product (Reynolds et al 2002), the sea surface height anomaly (SSHA) product from the Archiving, Validation, and Interpretation of Satellite Oceanographic (AVISO) data center of CNES (Centre National d'Études Spatiales) (Hauser et al 2020), and the Cross-Calibrated Multi-Platform (CCMP) wind velocity product (Atlas et al 2011). To account for the cyclical nature of the seasonal cycle, we also included the month of the year, expressed through its sine and cosine, as input data. This was intended to improve the performance of the machine learning models by representing annual cyclical influences. The annual cyclical feature is defined as follows:

$$\mathrm{sin\_time} = \sin\left(\frac{2\pi\,\mathrm{time}}{12}\right), \qquad \mathrm{cos\_time} = \cos\left(\frac{2\pi\,\mathrm{time}}{12}\right),$$

where time represents the month in which the data were recorded.
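A minimal sketch of the cyclical month encoding, assuming the common convention angle = 2π·month/12 (the helper name `encode_month` is hypothetical). Unlike the raw month index, the encoded values place December and January next to each other.

```python
import numpy as np


def encode_month(month):
    """Map month of year (1-12) to (sin, cos) so that December and January are neighbours.

    Assumes the convention angle = 2*pi*month/12.
    """
    angle = 2.0 * np.pi * np.asarray(month, dtype=float) / 12.0
    return np.sin(angle), np.cos(angle)


sin_time, cos_time = encode_month(np.arange(1, 13))

# Distance on the unit circle: Dec->Jan equals Jan->Feb, whereas the raw indices differ by 11 and 1.
points = np.column_stack([sin_time, cos_time])
dec_to_jan = np.linalg.norm(points[11] - points[0])
jan_to_feb = np.linalg.norm(points[0] - points[1])
print(dec_to_jan, jan_to_feb)
```

Two features are needed because a single sine (or cosine) maps two different months to the same value; the pair identifies each month uniquely.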
The observation-based BLT data used in this study were obtained directly from the Asia-Pacific Data-Research Center, University of Hawaii (http://apdrc.soest.hawaii.edu/dods/public_data/Argo_Products/monthly_mean); they were derived from Argo profiles using a variable density criterion (Kara et al 2000). We processed the data on a monthly basis at a horizontal resolution of 1° in both latitude and longitude; they serve as the baseline dataset for assessing the fidelity of the meta-learning model in reproducing the BLT of the tropical Indian Ocean (10°S-27°N, 35°E-110°E) from satellite observations. The meta-learning model was developed in two stages, training and testing: BLT data from 2010-2018 and 2020 were used for training, and data from 2019 were used for testing.
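The APDRC processing code is not reproduced here, but the sketch below illustrates the idea behind deriving the BLT from a single Argo-like profile with a variable density criterion: the isothermal layer depth is where temperature first falls by ΔT below its reference value, the mixed layer depth is where density first increases by the amount that a ΔT cooling would cause at the reference salinity, and the BLT is their difference. Assumptions: ΔT = 0.8 °C (the value commonly associated with Kara et al 2000), the shallowest level as the reference depth (a simplification), a linear equation of state with illustrative coefficients in place of the full seawater equation of state, and hypothetical helper names (`linear_density`, `first_crossing`, `barrier_layer_thickness`).

```python
import numpy as np

# Illustrative linear equation-of-state constants (assumed, not from the APDRC product).
RHO0, ALPHA, BETA = 1025.0, 2.0e-4, 7.6e-4
T0, S0 = 10.0, 35.0


def linear_density(temp, salt):
    """Linear approximation to seawater density (kg m^-3); real products use EOS-80/TEOS-10."""
    return RHO0 * (1.0 - ALPHA * (np.asarray(temp) - T0) + BETA * (np.asarray(salt) - S0))


def first_crossing(depth, increase, threshold):
    """Interpolated depth where `increase` (zero at the reference level) first reaches `threshold`."""
    for k in range(1, len(depth)):
        if increase[k] >= threshold:
            frac = (threshold - increase[k - 1]) / (increase[k] - increase[k - 1])
            return depth[k - 1] + frac * (depth[k] - depth[k - 1])
    return depth[-1]  # criterion never met: fall back to the profile bottom (a simplification)


def barrier_layer_thickness(depth, temp, salt, delta_t=0.8):
    """Return (MLD, ILD, BLT) in metres, referenced to the shallowest level."""
    depth, temp, salt = (np.asarray(a, dtype=float) for a in (depth, temp, salt))
    t_ref, s_ref = temp[0], salt[0]
    # Isothermal layer depth: temperature has dropped by delta_t from the reference value.
    ild = first_crossing(depth, t_ref - temp, delta_t)
    # Variable density criterion: density change equivalent to a delta_t cooling at fixed salinity.
    delta_sigma = linear_density(t_ref - delta_t, s_ref) - linear_density(t_ref, s_ref)
    rho = linear_density(temp, salt)
    mld = first_crossing(depth, rho - rho[0], delta_sigma)
    return mld, ild, ild - mld


# Hypothetical profile: isothermal down to 40 m, with a fresh layer in the upper 20 m.
depth = [0, 10, 20, 30, 40, 50, 60]
temp = [29.0, 29.0, 29.0, 29.0, 29.0, 28.0, 27.0]
salt = [33.0, 33.0, 33.0, 34.0, 35.0, 35.0, 35.0]
mld, ild, blt = barrier_layer_thickness(depth, temp, salt)
print(round(mld, 2), round(ild, 2), round(blt, 2))
```

Because the density threshold is tied to the temperature threshold, a profile with uniform salinity gives MLD ≈ ILD and hence BLT ≈ 0; a positive BLT arises only when salinity stratification sets a shallower mixed layer.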

Methods
As mentioned above, a meta-learning algorithm can make better predictions by taking the outputs and metadata of other machine learning algorithms as input. The proposed hybrid ensemble model for estimating the BLT consists of a meta-learning part and an individual predictor part. A flowchart of the model is presented in figure 1.
We used a multi-input, multi-output BNN with multiple hidden layers to implement the meta-learning algorithm (figure 1). The BNN was selected as the meta-learner because it incorporates uncertainty into the estimation and has performed well in previous meta-learning studies (Springenberg et al 2016). The individual predictor part consists of three pre-trained estimators, each based on a different machine learning model: GRU, CNN, and ANN. The BNN takes the outputs and metadata of these individual predictors as inputs and generates a weight coefficient for each of them. The outputs of the individual predictors are then weighted by these coefficients and combined into the final prediction. The satellite-based SSS, SST, SSHA, and SSW observations were pre-processed to normalize the inputs before being fed into the network. Here, the SSW was decomposed into eastward (USSW) and northward (VSSW) wind components. The annual cycle of each surface parameter was treated as a cyclical feature and represented by its sine and cosine components. The meta-learning model was trained using all data from 2010-2018 and 2020 and evaluated using data from 2019. The model parameters were optimized using a grid search, and the optimal parameter combinations are listed in table S1 of the Supporting Information. The performance of each machine learning model was evaluated using the root mean square error (RMSE), defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\widehat{\mathrm{BLT}}_i - \mathrm{BLT}_i\right)^2},$$

where $N$ is the total number of monthly samples, $\widehat{\mathrm{BLT}}_i$ is the model-based monthly BLT, and $\mathrm{BLT}_i$ is the Argo-based monthly BLT at time step $i$.
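The year-based split, RMSE, and grid search described above fit together as in the following sketch. It is illustrative only: a closed-form ridge regression stands in for the GRU, CNN, ANN, and BNN models, the inputs and BLT values are synthetic, the single-parameter grid is hypothetical (the actual grids are in table S1), and holding out 2018 for validation inside the training period is an assumption, since the text does not say how candidate parameters were scored.

```python
import itertools

import numpy as np


def rmse(predicted, observed):
    """Root mean square error; NaNs (months or cells without Argo data) are ignored."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.sqrt(np.nanmean(diff ** 2)))


def split_by_year(years, test_year=2019):
    """Boolean masks: training = every year except test_year, testing = test_year."""
    years = np.asarray(years)
    return years != test_year, years == test_year


def fit_ridge(x, y, alpha):
    """Closed-form ridge regression, a hypothetical stand-in for the neural networks."""
    design = np.column_stack([np.ones(len(y)), x])
    penalty = alpha * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalised
    return np.linalg.solve(design.T @ design + penalty, design.T @ y)


def predict(coeffs, x):
    return coeffs[0] + x @ coeffs[1:]


def grid_search(x_fit, y_fit, x_val, y_val, grid):
    """Try every parameter combination in `grid`; keep the one with the lowest validation RMSE."""
    best_params, best_score = None, np.inf
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = rmse(predict(fit_ridge(x_fit, y_fit, **params), x_val), y_val)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic monthly samples for 2010-2020 (hypothetical inputs and BLT values).
rng = np.random.default_rng(1)
years = np.repeat(np.arange(2010, 2021), 12)
x = rng.normal(size=(years.size, 4))
y = 8.0 + x @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=years.size)

train, test = split_by_year(years, test_year=2019)
val = train & (years == 2018)  # assumed validation year inside the training period
best_params, _ = grid_search(x[train & ~val], y[train & ~val], x[val], y[val],
                             {"alpha": [0.0, 0.1, 1.0, 10.0]})
coeffs = fit_ridge(x[train], y[train], **best_params)
test_rmse = rmse(predict(coeffs, x[test]), y[test])
print(best_params, round(test_rmse, 3))
```

Keeping the 2019 test year out of both fitting and parameter selection ensures that the reported test RMSE is not biased by the grid search.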
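The BLT at a target time is estimated as

$$\mathrm{BLT} = c_1(x_4)\,f_1(x_1) + c_2(x_4)\,f_2(x_2) + c_3(x_4)\,f_3(x_3),$$

where $x_1$, $x_2$, $x_3$, and $x_4$ denote the different forms of the sea surface parameters required by the GRU, CNN, ANN, and BNN models, respectively, at the time to be estimated; $f_1(x_1)$, $f_2(x_2)$, and $f_3(x_3)$ are the BLT outputs of the three individual estimators at that time; and $c_1(x_4)$, $c_2(x_4)$, and $c_3(x_4)$ are the weight coefficients that the BNN generates for the three individual estimators, allowing the meta-learner to improve on the individual models. See the Supporting Information for more details about the machine learning techniques used in this study.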
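The weighting step can be illustrated with a small Bayesian meta-network. The sketch below is not the authors' BNN: it assumes a two-layer network without biases whose weights follow a factorised Gaussian posterior with hypothetical, untrained means and standard deviations, and it assumes the three weight coefficients are normalised with a softmax so that they sum to one (the text does not state how the coefficients are constrained). Drawing many weight samples and combining the GRU, CNN, and ANN outputs with each sample's coefficients gives a Monte Carlo predictive mean and spread for the BLT.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sample_weights(rng, mean, std):
    """Draw one weight matrix from a factorised Gaussian posterior N(mean, std^2)."""
    return mean + std * rng.standard_normal(mean.shape)


def meta_ensemble(x4, base_preds, posterior, n_samples=200, seed=0):
    """Monte Carlo estimate of BLT = sum_i c_i(x4) * f_i.

    x4: (n, d) meta-inputs; base_preds: (n, 3) BLT from the GRU, CNN and ANN.
    Returns predictive mean (n,), predictive std (n,) and mean coefficients (n, 3).
    """
    rng = np.random.default_rng(seed)
    blt_samples, coeff_samples = [], []
    for _ in range(n_samples):
        w1 = sample_weights(rng, *posterior["w1"])
        w2 = sample_weights(rng, *posterior["w2"])
        hidden = np.tanh(x4 @ w1)
        coeffs = softmax(hidden @ w2)  # c_1..c_3 for each row (sum to one: an assumption)
        blt_samples.append((coeffs * base_preds).sum(axis=1))
        coeff_samples.append(coeffs)
    blt_samples = np.stack(blt_samples)  # (n_samples, n)
    return blt_samples.mean(axis=0), blt_samples.std(axis=0), np.stack(coeff_samples).mean(axis=0)


rng = np.random.default_rng(42)
n, d, h = 5, 6, 8
x4 = rng.normal(size=(n, d))                      # hypothetical normalised meta-inputs
base_preds = np.tile([12.0, 10.0, 11.0], (n, 1))  # hypothetical GRU, CNN, ANN outputs (m)
posterior = {  # hypothetical (mean, std) pairs; a trained BNN would learn these
    "w1": (0.1 * rng.normal(size=(d, h)), np.full((d, h), 0.05)),
    "w2": (0.1 * rng.normal(size=(h, 3)), np.full((h, 3), 0.05)),
}
blt_mean, blt_std, mean_coeffs = meta_ensemble(x4, base_preds, posterior)
print(np.round(blt_mean, 2), np.round(blt_std, 3))
```

Because softmax weights are non-negative and sum to one, each combined estimate lies between the smallest and largest individual prediction. Unconstrained coefficients would let the ensemble correct a bias shared by all three base models, at the cost of less stable behaviour.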

Basic performance of the meta-learning model
To evaluate the accuracy and stability of the models in estimating the BLT, we compared their outputs for the 2019 test period with in situ observations from Argo (figure 2). Argo shows a thick BLT along the coast of the southeast Arabian Sea (SEAS: 68°E-77°E, 5°N-15°N), in the eastern equatorial Indian Ocean (EEIO: 85°E-95°E, 5°S-5°N), and in the Bay of Bengal (BoB: 85°E-93°E, 13°N-19°N), with a maximum exceeding 18 m in the northern BoB (figure 2(a)). This spatial distribution is essentially the same as that during 2010-2020 (figure S1a in the Supporting Information), and both are consistent with previous studies (Qu and Meyers 2005, Felton et al 2014). The results from the four models resemble the observations in nearly all details, suggesting that machine learning techniques can reliably estimate the BLT. The standard deviation of the Argo BLT is considerably larger in the SEAS and BoB than in other regions (figure 2(f)), consistent with the long-term values for 2010-2020 (figure S1b). Large differences between the two periods are particularly evident in the EEIO, presumably owing to the 2019 IOD event; this is discussed further below.
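Maps such as those in figure 2 reduce monthly fields to annual statistics. A minimal sketch, assuming the monthly BLT for 2019 is stored as a (12, nlat, nlon) array with NaN over land or where Argo data are missing, and assuming a model-minus-Argo sign convention for the difference maps (the caption does not state the sign); the function names and data are hypothetical.

```python
import warnings

import numpy as np


def annual_statistics(monthly_blt):
    """Annual mean and standard deviation maps from a (12, nlat, nlon) monthly BLT field.

    NaNs (land or missing Argo data) are skipped; all-NaN cells stay NaN.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # silence all-NaN cell warnings
        return np.nanmean(monthly_blt, axis=0), np.nanstd(monthly_blt, axis=0)


def difference_map(model_monthly, argo_monthly):
    """Annual-mean model BLT minus annual-mean Argo BLT (positive = model too thick)."""
    model_mean, _ = annual_statistics(model_monthly)
    argo_mean, _ = annual_statistics(argo_monthly)
    return model_mean - argo_mean


# Synthetic 2 x 3 grid with a seasonal cycle and one no-data cell (hypothetical values).
months = np.arange(12)
argo = np.stack([np.full((2, 3), 10.0) + 5.0 * np.sin(2 * np.pi * m / 12) for m in months])
argo[:, 0, 0] = np.nan
model = argo + 1.5  # a hypothetical model with a uniform +1.5 m bias
mean_map, std_map = annual_statistics(argo)
bias = difference_map(model, argo)
print(mean_map, std_map, bias, sep="\n")
```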
While all four machine learning models (the meta-learning model and the three individual models) reproduce the fundamental features of the BLT, there are differences among them. For example, all three individual models (GRU, ANN, and CNN) tend to overestimate the BLT in the EEIO and underestimate it in the SEAS (figures 2(g)-(i)). After ensembling with the meta-learning model, the discrepancies between the model and the observations are substantially reduced (figure 2(j)), particularly in the SEAS. This result suggests that the meta-learning model performs better than the three individual models in reproducing the BLT in the tropical Indian Ocean.
To further evaluate the effectiveness and reliability of the meta-learning model, we compared its test results for 2019 with those of the three individual models, using the Argo data as a reference. The GRU model has the largest RMSEs, with a maximum exceeding 8.21 m in the SEAS (figure 3(a)); large RMSEs (>7.50 m) from the GRU model are also visible in the EEIO and the northwestern BoB. The ANN model exhibits relatively small RMSEs, ranging from 3 m to 7 m, with the maximum (∼7.20 m) occurring in the SEAS (figure 3(b)). The CNN model resembles the GRU model, displaying large RMSEs (>5 m) in the SEAS and the northwestern BoB (figure 3(c)). The RMSEs of the meta-learning model are markedly lower (figure 3(d)). In the SEAS, for example, the maximum RMSE reaches 10.52 m, 7.21 m, and 7.52 m for the GRU, ANN, and CNN models, respectively, whereas that of the meta-learning model drops to 5.11 m.
To evaluate the meta-learning model's performance more comprehensively, we conducted a statistical analysis using the area-mean RMSE, the median and the lower and upper quartiles of the RMSE (figures 3(e)-(g)), and the Pearson correlation coefficient (R) (table S2 in the Supporting Information). The meta-learning model consistently outperformed the three individual models. For example, it achieved the lowest area-mean RMSEs, 3.29 m, 3.97 m, and 2.78 m in the SEAS, BoB, and EEIO, respectively (table S2). The other RMSE statistics (median and lower and upper quartiles) show the same result in all three regions. The R values of the meta-learning model are 0.91, 0.92, and 0.81 in the SEAS, BoB, and EEIO, respectively. These results further indicate that the meta-learning model reproduces the BLT in the tropical Indian Ocean better than the three individual models.
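The regional statistics above can be computed along the following lines. This is a sketch under stated assumptions: the region boxes are those given in the text, "area mean" is taken as an unweighted mean over 1° grid points (a cos(latitude) weighting would be a reasonable alternative), R is computed over all valid space-time points in a region, and the data and function names are hypothetical.

```python
import numpy as np

# Region boxes (lon_min, lon_max, lat_min, lat_max) as given in the text.
REGIONS = {
    "SEAS": (68.0, 77.0, 5.0, 15.0),
    "BoB": (85.0, 93.0, 13.0, 19.0),
    "EEIO": (85.0, 95.0, -5.0, 5.0),
}


def region_mask(lon, lat, box):
    """Boolean (nlat, nlon) mask of grid points inside a lon/lat box."""
    lon_min, lon_max, lat_min, lat_max = box
    lon2d, lat2d = np.meshgrid(lon, lat)
    return (lon2d >= lon_min) & (lon2d <= lon_max) & (lat2d >= lat_min) & (lat2d <= lat_max)


def region_statistics(model, argo, lon, lat, box):
    """RMSE statistics and Pearson R over one region from (ntime, nlat, nlon) arrays."""
    mask = region_mask(lon, lat, box)
    rmse_map = np.sqrt(np.nanmean((model - argo) ** 2, axis=0))  # RMSE over time per grid point
    errors = rmse_map[mask]
    errors = errors[~np.isnan(errors)]
    q1, median, q3 = np.percentile(errors, [25, 50, 75])
    m, o = model[:, mask].ravel(), argo[:, mask].ravel()
    valid = ~np.isnan(m) & ~np.isnan(o)
    r = np.corrcoef(m[valid], o[valid])[0, 1]
    return {"area_mean_rmse": errors.mean(), "median": median, "q1": q1, "q3": q3, "r": r}


# Synthetic 1-degree grid over the study domain (hypothetical data, 12 monthly maps).
lon = np.arange(35.5, 110.5, 1.0)
lat = np.arange(-9.5, 27.5, 1.0)
rng = np.random.default_rng(3)
argo = rng.uniform(0.0, 20.0, size=(12, lat.size, lon.size))
model = argo + rng.normal(scale=2.0, size=argo.shape)
stats = {name: region_statistics(model, argo, lon, lat, box) for name, box in REGIONS.items()}
for name, s in stats.items():
    print(name, {k: round(float(v), 2) for k, v in s.items()})
```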

Seasonal variations
Figure 4 compares the model-based seasonal variations of the BLT in the three selected regions with the Argo data. In the SEAS, the BLT seasonal cycle from the meta-learning model is nearly identical to that from Argo (figure 4(a)), both showing a thicker BLT in winter (December-February) and a thinner BLT in spring and summer (April-August). Advection of low-salinity water from the BoB by monsoon-driven currents is likely the dominant process responsible for this seasonal variation (Shenoi et al 2004, Thadathil et al 2008).
In the BoB, the meta-learning model and Argo show a similar pattern: the BLT is thickest in January-February and thinnest in April-May (figure 4(b)), in agreement with previous studies based on in situ observations. Earlier studies have shown that the seasonal variation of the BLT in the BoB is primarily controlled by two factors: the timing of upper-ocean freshening associated with precipitation and river discharge into the BoB, and the monsoon-driven surface circulation (Thadathil et al 2007, Agarwal et al 2012, Kumari et al 2018).
In the EEIO, the BLT seasonal variation from the meta-learning model also agrees well with the Argo data. Variations in the Wyrtki jets, local rainfall, and monsoon-related surface currents all contribute to this seasonal variation (Sprintall and Tomczak 1992, Masson et al 2005). Notably, the BLT seasonal variation during 2019 is essentially the same as that during 2010-2020 in the SEAS and BoB (figures 4(a) and (b)). This does not hold in the EEIO, where both the meta-learning estimate and the Argo-derived BLT for 2019 differ substantially from the long-term climatology (figure 4(c)). The differences are especially pronounced in October-December, when the long-term mean BLT is much thicker than in 2019. They likely reflect the positive IOD event that occurred in 2019. During a positive IOD event, enhanced equatorial easterly winds induce upwelling Kelvin waves, which shoal the isothermal layer and thin the BLT in the EEIO (Qiu et al 2012, Ma et al 2020).
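The comparison between 2019 and the 2010-2020 climatology amounts to computing a mean seasonal cycle and subtracting it from one year. A minimal sketch, assuming a regional-mean monthly BLT series is already available; the series here is synthetic, the imposed October-December thinning is purely illustrative rather than the observed 2019 anomaly, the helper names are hypothetical, and the climatology includes the target year, as a 2010-2020 climatology does for 2019.

```python
import numpy as np


def monthly_climatology(series, months):
    """Mean seasonal cycle: average of all values sharing a calendar month (1-12)."""
    return np.array([np.nanmean(series[months == m]) for m in range(1, 13)])


def departure_from_climatology(series, years, months, year):
    """Monthly values of one year minus the multi-year climatology (NaN where missing)."""
    clim = monthly_climatology(series, months)
    sel = years == year
    departure = np.full(12, np.nan)
    departure[months[sel] - 1] = series[sel] - clim[months[sel] - 1]
    return clim, departure


# Synthetic regional-mean BLT for 2010-2020 (hypothetical): a seasonal cycle plus a
# 6 m thinning imposed in October-December 2019 to mimic an anomalous year.
years = np.repeat(np.arange(2010, 2021), 12)
months = np.tile(np.arange(1, 13), 11)
series = 10.0 + 4.0 * np.cos(2 * np.pi * (months - 1) / 12)
series[(years == 2019) & (months >= 10)] -= 6.0
clim, dep_2019 = departure_from_climatology(series, years, months, 2019)
print(np.round(dep_2019, 2))
```

Because the anomalous year is part of the 11-year climatology, the departure in October-December is slightly smaller in magnitude than the imposed thinning.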
Both SSS and SST are widely acknowledged to influence BLT variations significantly (Felton et al 2014). To examine their individual roles in the estimation of the BLT, we conducted three sensitivity experiments (Case 1, Case 2, and Case 3). In Case 1, all variables (SSS, SST, SSW, SSHA, sin_time, and cos_time) were included as inputs. In Case 2, SSS was excluded to assess its impact, and in Case 3, SST was excluded to assess its influence. The results (figures 4(d) and (e)) indicate that both SSS and SST contribute substantially to the BLT estimation, but SSS is the larger contributor, in line with previous findings (Qu et al 2014). Barrier layer formation fundamentally reflects the salinity stratification near the sea surface (Sprintall and Tomczak 1992, Thadathil et al 2007), which is primarily governed by SSS; SST, while still influential, plays a secondary role.
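A feature-ablation experiment of this kind can be sketched as follows: retrain the model with one input removed and compare the test RMSE and R². For brevity, the sketch uses a linear least-squares model instead of the paper's networks, and the data are synthetic with an SSS dependence deliberately made stronger than the SST dependence, so the ranking it produces is built in and says nothing about the real ocean. Feature names follow the text; the helper names and all data are hypothetical.

```python
import numpy as np

FEATURES = ["SSS", "SST", "USSW", "VSSW", "SSHA", "sin_time", "cos_time"]


def fit_predict(x_train, y_train, x_test):
    """Least-squares linear model (a stand-in for the neural networks)."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coeffs, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coeffs


def r_squared(pred, obs):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)


def ablation(x_train, y_train, x_test, y_test, cases):
    """Retrain without each case's dropped inputs and report (RMSE, R^2) on the test set."""
    results = {}
    for name, dropped in cases.items():
        keep = [i for i, f in enumerate(FEATURES) if f not in dropped]
        pred = fit_predict(x_train[:, keep], y_train, x_test[:, keep])
        rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
        results[name] = (rmse, float(r_squared(pred, y_test)))
    return results


rng = np.random.default_rng(7)
x = rng.normal(size=(600, len(FEATURES)))  # random stand-ins for every input column
# Synthetic target built so that SSS matters more than SST (an assumption for illustration).
y = 10.0 - 3.0 * x[:, 0] + 1.0 * x[:, 1] + 0.5 * x[:, 4] + rng.normal(scale=0.5, size=600)
cases = {"Case 1": set(), "Case 2": {"SSS"}, "Case 3": {"SST"}}
results = ablation(x[:480], y[:480], x[480:], y[480:], cases)
for name, (rmse, r2) in results.items():
    print(name, round(rmse, 2), round(r2, 3))
```

The larger the increase in RMSE (and drop in R²) when an input is removed, the more the model relies on that input; with correlated inputs, however, the model can partly compensate, so ablation results should be read alongside physical reasoning.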

Conclusions
We have proposed a novel meta-learning-based ensemble model to estimate the BLT and its seasonal variations in the tropical Indian Ocean. The model performs well and reproduces most of the fundamental features observed by Argo. Comparison with the three individual machine learning models (GRU, ANN, and CNN) suggests that the meta-learning model outperforms all of them in terms of spatial distribution and accuracy. In particular, the meta-learning model shows a clear improvement in the SEAS, where the individual models tend to underestimate the BLT. The meta-learning model also reproduces the BLT seasonal variations in the tropical Indian Ocean well.
Among all the sea surface parameters, SSS plays the dominant role in the estimation of the BLT, suggesting that salinity-related processes dominate BLT formation in the tropical Indian Ocean. Changes in SSS may alter the salinity stratification near the sea surface and thereby contribute directly to BLT variations. SST, while influential, appears to play a secondary role.
The present study has applied machine learning to BLT estimation in the tropical Indian Ocean, and the results demonstrate the feasibility of monitoring the BLT using satellite observations. Although the model has implications for the application of machine learning in oceanographic research, it has limitations and requires further refinement. First, the model is purely data-driven and does not incorporate relevant physical constraints. Physics-Informed Neural Networks (PINNs), which integrate data-driven machine learning with physical constraints, offer substantial potential in this respect and represent an important direction for future study. Second, the sensitivity of the BLT estimates to the criteria used to define the BLT deserves investigation; exploring it could deepen our understanding of the estimates and potentially improve the robustness of the model. Third, all input variables in this study are derived from satellite data, and some satellite datasets, such as SSS, have relatively short records, which may limit the study of climate phenomena such as ENSO and the IOD. Datasets with longer records, such as CMIP6 and ORAS5, could be used to validate the suitability of machine learning more extensively and to expand its application in physical oceanography.


Figure 1.Flowchart of the meta-learning-based model for estimating the BLT in the tropical Indian Ocean.

Figure 2. Annual mean BLT in the tropical Indian Ocean during 2019 estimated from (a) Argo, (b) GRU, (c) ANN, (d) CNN, and (e) the meta-learning model, and (f) the standard deviation of the Argo BLT; annual mean BLT differences between Argo and the (g) GRU, (h) ANN, (i) CNN, and (j) meta-learning models. The black rectangles in (a) show the locations of the three selected regions (SEAS, BoB, and EEIO).

Figure 3. Spatial distribution of the BLT root mean square error (RMSE; unit: m) from the (a) GRU, (b) ANN, (c) CNN, and (d) meta-learning models, and their statistics in the (e) SEAS, (f) BoB, and (g) EEIO. The red rectangles in (d) show the locations of the three selected regions. In the lower panels, the horizontal orange line across each box marks the median, and the lower and upper edges of the box indicate the lower and upper quartiles, respectively. The green diamond inside each box marks the area-mean value, the black whiskers extend to data within 1.5 times the interquartile range, and circles indicate outliers.

Figure 4. Seasonal evolution of the BLT in the (a) SEAS, (b) BoB, and (c) EEIO, and results of the sensitivity experiments in terms of (d) RMSE and (e) R². Here, the climatology in (a), (b), and (c) is the mean BLT seasonal cycle from Argo during 2010-2020.