Compression AutoEncoder for High-Resolution Ocean Sound Speed Profile Data

High-resolution ocean sound speed profile (HROSSP) data are essential for ocean acoustic modeling and sonar performance evaluation. However, the large volume and storage requirements of these data severely restrict their practical application in ocean acoustics. In this paper, we propose a compression autoencoder specifically designed for managing HROSSP data (CAE-HROSSP) and investigate its optimal network structure. Experimental results demonstrate that using min-max normalization for the input data and the corresponding inverse normalization for the output data, together with a LeakyReLU function as the final activation layer, significantly improves the accuracy of the reconstructed data. To tackle the difficulty of fitting the distribution of surface sound speed data, which exhibits significant variation and noise, we propose two loss functions: slice mean squared error and elemental mean squared error. These are combined with the mean squared error through a weighted sum to enhance CAE-HROSSP's ability to fit the distribution of surface sound speed values and to minimize reconstruction errors. Performance evaluation experiments reveal that CAE-HROSSP outperforms two existing methods in compressing HROSSP data, achieving smaller reconstruction errors at higher compression ratios. Furthermore, transfer learning is used to further train CAE-HROSSP with HROSSP data from an area containing a mesoscale eddy and from a convergence zone of cold and warm ocean currents. In these seas, where the structure of the sound speed profile varies greatly, the compression performance on the training and validation sets is comparable. This indicates that, with transfer learning, CAE-HROSSP can compress highly variable sound speed profile data in more sea areas and has the potential to be extended globally. The findings and insights obtained from this study provide guidance for future work on using autoencoders to compress HROSSP data.


Introduction
In the field of ocean acoustic technology, high-resolution ocean sound speed profile (HROSSP) data are critical for obtaining detailed ocean acoustic information.
Such data are critical for researching ocean acoustic modeling and analysis, evaluating sonar performance, and detecting underwater acoustic targets. However, currently available high-resolution ocean reanalysis products generally have a horizontal spatial resolution as fine as 0.1°×0.1° and a temporal resolution as fine as 3 hours. Consequently, converting environmental parameters such as temperature and salinity from these reanalysis products to obtain HROSSP data yields a sizable data volume: the file for a single data sampling event may reach the gigabyte (GB) level. Autoencoders offer one route to compressing such data. An ordinary autoencoder is an end-to-end neural network model for unsupervised learning, whose simplest form is a feed-forward, non-recurrent neural network. The goal of an autoencoder is to map input data to a latent space (encoding) and reconstruct the original input as closely as possible (decoding).
Figure 1 illustrates an autoencoder with three fully connected layers. $L_1$, $L_2$, and $L_3$ represent the input layer, code layer, and output layer, respectively. Here, $L_1$ and $L_2$ form the encoder part, defined by the transformation $f$, while $L_2$ and $L_3$ form the decoder part, defined by the transformation $g$. Assume we have a set of training samples $X = \{x_1, x_2, x_3, x_4, x_5\}$, a set of code layer neurons $Z = \{z_1, z_2, z_3\}$, and a set of training outputs $X' = \{x_1', x_2', x_3', x_4', x_5'\}$. Since there is only one hidden layer in this instance, for the encoder transformation $f$ we have

$$Z = f(X) = \sigma(WX + b),$$

where $W$ is the weight matrix and $b$ is the bias vector. The $Z$ layer is usually referred to as the code layer, which can be regarded as a compressed representation of the input $X$. $\sigma$ is the activation function, such as the sigmoid, tanh, ReLU, LeakyReLU, or linear activation, and $u$ denotes the input variable of the activation function. The expressions for the sigmoid, tanh, ReLU, and LeakyReLU functions are as follows:

$$\mathrm{sigmoid}(u) = \frac{1}{1 + e^{-u}}, \qquad \tanh(u) = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}},$$

$$\mathrm{ReLU}(u) = \begin{cases} u, & u > 0 \\ 0, & u \le 0 \end{cases}, \qquad \mathrm{LeakyReLU}(u) = \begin{cases} u, & u > 0 \\ ku, & u \le 0, \end{cases}$$

where $k$ is a small positive slope.
After encoding, we reconstruct $Z$ into $X'$ through the decoder transformation $g$, where $X'$ has the same shape as $X$:

$$X' = g(Z) = \sigma(W'Z + b').$$

If the code layer $Z$ has a lower dimensionality than the input data, the network is forced to learn a compressed representation of the input. This can be seen as compressing and abstracting the input data, thus forming a compression autoencoder (CAE). By learning the compressed representation, a CAE can reconstruct the input data, and it therefore has great potential in data compression, as discussed in the Introduction of related research. We believe that the CAE-HROSSP proposed in this paper can serve as a lossy compressor for HROSSP, achieving higher compression ratios and higher reconstruction accuracy than CDL and CEOF.
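As a concrete illustration of the encoder transformation $f$ and decoder transformation $g$, the following minimal sketch shows a three-layer autoencoder in PyTorch (the framework the paper's implementation uses); the class name and the 5-3-5 layer sizes mirror figure 1 and are otherwise illustrative assumptions:

```python
import torch
import torch.nn as nn

class SimpleAutoencoder(nn.Module):
    """Three-layer autoencoder matching figure 1: 5 inputs, 3 code neurons, 5 outputs."""
    def __init__(self, n_in=5, n_code=3):
        super().__init__()
        # Encoder f: Z = sigma(W X + b)
        self.encoder = nn.Sequential(nn.Linear(n_in, n_code), nn.LeakyReLU())
        # Decoder g: X' = sigma(W' Z + b')
        self.decoder = nn.Sequential(nn.Linear(n_code, n_in), nn.LeakyReLU())

    def forward(self, x):
        z = self.encoder(x)      # code layer: compressed representation of x
        return self.decoder(z)   # reconstruction with the same shape as x
```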

CAE-HROSSP
The complete structure of CAE-HROSSP is shown in figure 2, which comprises a total of 13 layers. It begins with a Scaler layer and concludes with a Reverse Scaler (Re-Scaler) layer, as described in section 2.3 below. In the middle, there are 11 fully connected layers: 5 encoder fully connected layers $E_1, E_2, E_3, E_4, E_5$, 1 code layer $Z$, and 5 decoder fully connected layers $D_1, D_2, D_3, D_4, D_5$. All the fully connected layers are interconnected using the LeakyReLU activation function, except for the activation function after $D_5$, which will be discussed in section 3.3.1 below.

Figure 2. Structure of CAE-HROSSP
Each layer of the encoder is associated with a weight matrix and bias vector that facilitate dimensionality reduction. The input file is compressed through the 5 encoder layers, so that its information is represented with fewer neurons in the $Z$ layer. The decoder mirrors the encoder, with each layer again having a weight matrix and bias vector. The data in the $Z$ layer are processed through the 5 decoder layers and then written to the output file.
If we consider CAE-HROSSP as a compressor, the $E_1$ layer represents the original file, the $Z$ layer represents the compressed file, and the $D_5$ layer represents the decompressed file. The encoder and decoder correspond to the compression and decompression processes, respectively.

Scaler and Re-Scaler Layer
The Scaler layer performs scaling operations such as normalization and standardization on the input data, transforming the data into a specific distribution and range. The Re-Scaler layer reverses the scaling operation performed by the Scaler layer, restoring the data distribution and range to their original state. It should be noted that a Scaler and its Re-Scaler are always used together as a matched pair; there is no cross-combination. Therefore, in the following text, once a Scaler type is selected, the corresponding Re-Scaler is selected automatically.
During model training, the training dataset is often divided into several batches that are fed to the model sequentially. In the sound speed profile dataset, the different depth layers in the sound speed profile samples can be considered different features of the dataset. To address the problem of significant differences in the range and distribution of speed values across depth layers, we add a Scaler layer at the front end of the original autoencoder network and a Re-Scaler layer at the back end. By performing scaling operations, we ensure that the speed values of each layer have similar ranges and distributions, helping the algorithm balance the weights of each layer and improving the stability and convergence speed of model training.
Assume a batch of data $X = \{x_1, x_2, \dots, x_n\}$; after applying the Scaler layer, the output is $X' = \{x_1', x_2', \dots, x_n'\}$. Both $X$ and $X'$ are matrices with $m$ rows and $n$ columns, where $i$ denotes the row index and $j$ the column index, giving a total of $N = m \cdot n$ elements. Here $m$ represents the number of sound speed profile samples in the batch, and $n$ represents the number of depth layers, i.e., the number of features in the samples. $x_j$ and $x_j'$ denote the length-$m$ feature vectors of the $j$-th feature.
Several commonly used Scaler types are introduced next; the choice among them is made experimentally in section 3.3.1 below.
StandardScaler: It transforms each feature to have a distribution with zero mean and unit variance:

$$x_{ij}' = \frac{x_{ij} - \mu_j}{\sqrt{\sigma_j^2 + \epsilon}}.$$

The Re-StandardScaler is given by

$$x_{ij} = x_{ij}'\sqrt{\sigma_j^2 + \epsilon} + \mu_j,$$

where $x_{ij}$ represents an element of the feature vector $x_j$, $\mu_j$ represents the mean of the $j$-th feature, $\sigma_j^2$ represents the variance of the $j$-th feature, and $\epsilon$ is a small constant added during normalization to avoid division by zero when the variance is zero.
MinMaxScaler: It normalizes the data to a specified range, typically between 0 and 1:

$$x_{ij}' = \frac{x_{ij} - x_{j,\min}}{x_{j,\max} - x_{j,\min}},$$

where $x_{j,\min}$ represents the minimum value of the $j$-th feature, and $x_{j,\max}$ represents the maximum value of the $j$-th feature.
The Re-MinMaxScaler is given by

$$x_{ij} = x_{ij}'\,(x_{j,\max} - x_{j,\min}) + x_{j,\min}.$$

MaxAbsScaler: It scales the data to lie within the range [-1, 1]:

$$x_{ij}' = \frac{x_{ij}}{\max(|x_{j,\max}|,\, |x_{j,\min}|)},$$

where the denominator represents the maximum absolute value of the $j$-th feature. The Re-MaxAbsScaler is given by

$$x_{ij} = x_{ij}'\,\max(|x_{j,\max}|,\, |x_{j,\min}|).$$

Note that, to keep the feature distribution of each batch consistent with the input file, each batch must be scaled using the statistical values of the training set (maximum, minimum, mean, and variance) rather than the statistics computed within a single batch. The validation and test sets must also be scaled with the training-set statistics, because the test set is an independent dataset, unseen during training, whose purpose is to examine the performance of the trained model on completely unfamiliar data and thus assess the model's ability to generalize.
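A minimal sketch of how a min-max Scaler/Re-Scaler pair can be implemented with training-set statistics follows (PyTorch; the class name and the synthetic stand-in data are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

class MinMaxScalerLayer(nn.Module):
    """Per-feature min-max Scaler/Re-Scaler pair fitted on training-set statistics."""
    def __init__(self, x_min, x_max, eps=1e-12):
        super().__init__()
        # Buffers travel with the model but are never updated by the optimizer.
        self.register_buffer("x_min", x_min)
        self.register_buffer("rng", (x_max - x_min).clamp_min(eps))

    def forward(self, x):          # Scaler: map each feature into [0, 1]
        return (x - self.x_min) / self.rng

    def inverse(self, x_scaled):   # Re-Scaler: restore the original range
        return x_scaled * self.rng + self.x_min

# Statistics come from the training set only (one value per depth layer) and are
# reused for every batch and for the validation/test sets.
train = 1500.0 + 5.0 * torch.randn(1000, 50)   # stand-in for SSP training data
scaler = MinMaxScalerLayer(train.min(dim=0).values, train.max(dim=0).values)
batch_scaled = scaler(train[:16])              # Scaler at the network front end
batch_restored = scaler.inverse(batch_scaled)  # Re-Scaler at the back end
```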

Optimization
Assuming the input and output matrices are $X$ and $X'$ respectively, we hope that $X = X'$. However, due to the nature of neural networks, some data loss always occurs. Typically, the reconstruction error of an autoencoder, i.e., the loss function, uses the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(x_{ij} - x_{ij}'\right)^2.$$

The goal of CAE-HROSSP is to efficiently compress HROSSP data while preserving high reconstruction precision. Sound speed in seawater is substantially affected by temperature, and sea surface temperature is influenced by factors including solar radiation, weather conditions, the surface wind field, and ocean currents, leading to notable diurnal and seasonal fluctuations with significant noise. The distribution of sound speed is therefore more variable at the sea surface than below it. Because MSE squares the errors, larger errors are amplified and outliers contribute disproportionately to the overall loss, so minimizing the loss drives the model to suppress outliers. Consequently, for sea surface sound speed data with high variance and noise, the MSE loss helps the model reduce reconstruction errors and improves the fit to the surface layer. Building on this, the paper proposes two new loss functions, Slice Mean Squared Error (SMSE) and Element Mean Squared Error (EMSE), in addition to computing the MSE over the entire dataset.
SMSE calculates the MSE loss of the surface layer (shallower than 200 meters), i.e., the first $n_1$ layers of the sound speed profile. SMSE increases the weight of the model's response to the surface sound speed profile during training, which helps improve the fit to the surface-layer data:

$$\mathrm{SMSE} = \frac{1}{m \cdot n_1}\sum_{i=1}^{m} \lVert x_i - x_i' \rVert^2,$$

where $x_i$ and $x_i'$ represent the $i$-th sound speed vectors of the input and output data restricted to the first $n_1$ layers. EMSE separately calculates the MSE loss of the $n_2$ elements whose absolute sound speed error exceeds a default threshold. EMSE fine-tunes and focuses on important positions, which helps prevent the model from optimizing the whole while ignoring individual important locations, thereby improving the overall performance and robustness of the model:

$$\mathrm{EMSE} = \frac{1}{n_2}\sum_{k=1}^{n_2}\left(e_k - e_k'\right)^2,$$

where $e_k$ represents an element that exceeds the error threshold and $e_k'$ its reconstruction. Finally, by taking a weighted sum of MSE, SMSE, and EMSE, the model is guided to improve the fit of the noisy layers while also minimizing the maximum error, which better matches the accuracy requirements of data reconstruction in practical applications. The total loss is

$$L = \alpha\,\mathrm{MSE} + \beta\,\mathrm{SMSE} + \gamma\,\mathrm{EMSE},$$

where $\alpha$, $\beta$, and $\gamma$ represent the weights of MSE, SMSE, and EMSE, respectively.
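The combined loss can be sketched as follows (PyTorch; the argument names, the requirement to pass $n_1$ and the threshold explicitly, and the equal default weights are assumptions, since the paper does not publish its weight values):

```python
import torch

def total_loss(x, x_rec, n1, thresh, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of MSE, SMSE (surface layers), and EMSE (large-error elements)."""
    err = x_rec - x                    # x, x_rec: [batch, n_depth_layers]
    mse = err.pow(2).mean()            # MSE over every element
    smse = err[:, :n1].pow(2).mean()   # SMSE: only the first n1 (surface) layers
    mask = err.abs() > thresh          # elements exceeding the error threshold
    emse = err[mask].pow(2).mean() if mask.any() else err.new_zeros(())
    return alpha * mse + beta * smse + gamma * emse
```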

Evaluation Metrics for Compression Performance
The performance of the model is evaluated in terms of compression ratio and data recovery accuracy. The theoretical compression ratio $R_{th}$ and actual compression ratio $R_{ac}$ are used to evaluate the compression efficiency, while the mean absolute error (MAE), mean relative error (MRE), percentage of exceeding absolute error bound (PEAEB), and maximum absolute error (Max-AE) are used to evaluate the accuracy of the reconstructed data. $R_{th}$ is the ratio of the input dimensionality to the code layer dimensionality:

$$R_{th} = \frac{n_{\mathrm{input}}}{n_{\mathrm{code}}}.$$

The actual compression ratio $R_{ac}$ is calculated as

$$R_{ac} = \frac{S_{\mathrm{original}}}{S_{\mathrm{compressed}} + \delta},$$

where $S$ denotes file size and $\delta$ represents the size of other additional files that need to be stored.
The expressions for MAE, MRE, and Max-AE are as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i,j}\left|x_{ij} - x_{ij}'\right|, \qquad \mathrm{MRE} = \frac{1}{N}\sum_{i,j}\frac{\left|x_{ij} - x_{ij}'\right|}{\left|x_{ij}\right|} \times 100\%, \qquad \text{Max-AE} = \max_{i,j}\left|x_{ij} - x_{ij}'\right|.$$

The PEAEB is calculated as

$$\mathrm{PEAEB} = \frac{\text{Number of samples with absolute error exceeding bound}}{\text{Total number of samples}} \times 100\%.$$

We also need to set an absolute error bound (AEB) and calculate the PEAEB, because simply averaging the errors is not sufficient to comprehensively evaluate the performance of CAE-HROSSP's lossy compression. For example, if the average absolute error in a compression test is 1 m/s but the error bound requirement is 3 m/s, it could be that 80% of the output values have errors smaller than 1 m/s while the absolute errors of the remaining values are far larger than the bound. Since the average ocean sound speed is about 1500 m/s, an absolute error below 3 m/s corresponds to a relative error below 0.2%, which meets the accuracy requirements of most ocean sound speed profile applications. In this paper, we set the AEB to 3 m/s and evaluate the compression performance of the model more comprehensively by calculating the PEAEB. In section 3.5 below, corresponding solutions are provided for cases where errors in the reconstructed results of CAE-HROSSP exceed the AEB.
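A sketch of these four reconstruction metrics follows (NumPy; the function and variable names are illustrative assumptions):

```python
import numpy as np

def compression_metrics(x, x_rec, aeb=3.0):
    """MAE, MRE (%), Max-AE (m/s), and PEAEB (%) of reconstructed data."""
    abs_err = np.abs(x_rec - x)
    mae = abs_err.mean()
    mre = (abs_err / np.abs(x)).mean() * 100.0   # relative error in percent
    max_ae = abs_err.max()
    peaeb = (abs_err > aeb).mean() * 100.0       # share of elements beyond the AEB
    return mae, mre, max_ae, peaeb
```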

Implementation Details
3.1.1 Datasets. This article utilizes the second-generation global high-resolution coupled ice-ocean reanalysis product, China Ocean Reanalysis 2.0 (CORAv2.0), to obtain high-resolution ocean sound speed profiles. CORAv2.0 is produced by the National Marine Data and Information Service of China and can be downloaded from http://mds.nmdis.org.cn. The required variables are sea water temperature, salinity, and pressure. For the oceanic domain, longitude and latitude are divided into equally spaced grids with a spatial resolution of 0.1°. The temporal resolution is 3 hours, and the depth is non-uniformly divided into 50 layers.

3.1.2 Data Preprocessing. We select a local subset of data from CORAv2.0, with a spatial range of [175.95°E-179.95°E, 40.95°N-44.95°N], consisting of 1600 raster samples. The time span covers the entire year 2014, with a total of 2920 sampling moments.
(1) The seawater temperature, salinity, and depth data of the local dataset are converted to sound speed data using an empirical formula:

$$c(s, t, p) = C_w(t, p) + A(t, p)\,s + B(t, p)\,s^{3/2} + D(t, p)\,s^{2},$$

where $c$ represents the sound speed of seawater, and $s$, $t$, and $p$ represent salinity, water temperature, and static pressure, respectively. The values of $s$, $t$, and $p$ should satisfy $0 \le s \le 40\,‰$, $0\,^{\circ}\mathrm{C} \le t \le 40\,^{\circ}\mathrm{C}$, and $0 \le p \le 10^{8}\,\mathrm{Pa}$. $C_w$, $A$, $B$, and $D$ are empirical functions of seawater temperature and static pressure. The standard deviation of the calculated sound speed is 0.19 m/s, indicating that the calculated results are close to the actual sound speed and can meet the requirements of the application scenario.
(2) The quadratic spline sliding interpolation method is used to fill in sound speed values at locations near the seafloor where values were originally missing. This ensures that each sound speed profile sample has sound speed values in all 50 depth layers.
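A simplified sketch of this gap-filling step follows (SciPy; the paper's quadratic spline sliding interpolation works on a sliding window, which this illustrative version omits, and the function name is an assumption):

```python
import numpy as np
from scipy.interpolate import interp1d

def fill_missing_profile(depths, speeds):
    """Fill NaN sound speeds in one 50-layer profile by quadratic interpolation.

    Requires at least three valid points; NaNs mark layers inside the seafloor.
    """
    valid = ~np.isnan(speeds)
    f = interp1d(depths[valid], speeds[valid], kind="quadratic",
                 fill_value="extrapolate")
    out = speeds.copy()
    out[~valid] = f(depths[~valid])
    return out
```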
(3) The data are organized in a two-dimensional matrix. Specifically, the sound speed profile samples at a given latitude and longitude coordinate are sorted columnwise in chronological order to obtain a high-resolution sound speed profile matrix corresponding to a single sampling point for the entire year. Then, the matrices for all latitude and longitude points are stacked columnwise to obtain a complete sound speed profile matrix of size 4,672,000 × 50. The data are then converted to double precision floating point format, where each number occupies 8 bytes, and written to a MATLAB binary data file in MAT format. The processed high-resolution ocean sound speed profile dataset is referred to as A.
(4) Dataset A is divided into a training set (2,803,200 samples), a validation set (934,400 samples), and a test set (934,400 samples) in the ratio 6:2:2. The three sets have no overlap and are all in double precision floating point format.
Due to the distinct seasonal and regional characteristics of the sound speed profiles, spatially adjacent sound speed profiles at the same time have similar structures that vary roughly along the latitude direction.In addition, sound speed profiles exhibit significant temporal evolution, especially in shallow water, where they show clear seasonal variations and short-term variations influenced by factors such as short-term weather systems.In summary, the temporal variation of sound speed profiles is often complex due to the interaction of multiple factors.
Therefore, a spatially stratified sampling method is used to partition the dataset. Specifically, starting from the highest latitude and lowest longitude, within each group of five coordinates, one randomly chosen sampling point contributes its 2920 sound speed profile samples to the test set, while another contributes its 2920 samples to the validation set. All samples from the remaining coordinates are included in the training set. This process is repeated until all coordinate points have been traversed. Adam (Adaptive Moment Estimation) is used as the optimizer to update the weight matrices and bias vectors. After each training epoch, the validation set is run once to check for overfitting; the validation set does not participate in the parameter updates. After model training is complete, the final weight matrices and bias vectors are obtained, and the test set is fed into the model. Since the test set does not overlap with the training set, the test results reflect the generalization ability of the model.
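The spatially stratified split described above can be sketched as follows (NumPy; the names and fixed seed are assumptions). With the paper's 1600 grid points this yields 320 test and 320 validation coordinates, i.e., 320 × 2920 = 934,400 samples each, matching the counts in step (4):

```python
import numpy as np

def stratified_split(coords, seed=0):
    """Spatially stratified split: per group of 5 coordinates, one point's 2920
    profiles go to the test set, another's to the validation set, and the rest
    to training. `coords` is ordered from the highest latitude / lowest longitude."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for g in range(0, len(coords), 5):
        group = coords[g:g + 5]
        pick_test, pick_val = rng.choice(len(group), size=2, replace=False)
        for i, c in enumerate(group):
            if i == pick_test:
                test.append(c)
            elif i == pick_val:
                val.append(c)
            else:
                train.append(c)
    return train, val, test
```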

Training Hyperparameters
In CAE-HROSSP there are several important hyperparameters that need to be set. Layers: We trained several models and compared their reconstruction errors on the same test set to determine the appropriate number of layers for the CAE-HROSSP network. We found that 11 fully connected layers, excluding the normalization and inverse-normalization layers at the beginning and end, yielded the best results in terms of both generalization performance and model size; this configuration also allows convenient deployment on personal laptops. If the number of layers is too large, overfitting can occur: an overly complex model fits the noise in the training data rather than capturing its general characteristics [12].

Number of Neurons:
The number of neurons in each layer determines the model's parameter count, i.e., the amount of data in the weight matrices and bias vectors of the network. In theory, more neurons imply better fitting ability, but also larger memory requirements and lower computational efficiency, so a balance must be struck between model performance and efficiency. In addition, the ratio of the number of neurons in the input layer to the number of neurons in the code layer determines the theoretical compression ratio of the model, i.e., the degree of dimensionality reduction. Through multiple experiments, the following neuron counts were selected for the 11 fully connected layers in this study: 50, 5000, 3000, 2000, 1000, Z (the number of neurons in the code layer), 1000, 2000, 3000, 5000, 50. This configuration achieved good performance while remaining compatible with personal laptops.
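A sketch of the resulting fully connected stack follows (PyTorch; the Scaler/Re-Scaler layers are omitted and the helper name is an assumption). The 11 numbers are layer widths, so the loop builds the 10 weight matrices connecting them, with LeakyReLU after every layer including the last, per the ablation result in section 3.3.1:

```python
import torch.nn as nn

def make_cae_stack(n_code=10):
    """The 11 fully connected layers of CAE-HROSSP (Scaler/Re-Scaler omitted)."""
    widths = [50, 5000, 3000, 2000, 1000, n_code, 1000, 2000, 3000, 5000, 50]
    layers = []
    for w_in, w_out in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(w_in, w_out), nn.LeakyReLU()]
    return nn.Sequential(*layers)
```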
Batch Size: The batch size defines the number of samples propagated through the network. In this study, the input files were divided into multiple batches of 16 samples each. The Adam optimizer was used, which is a variant of the stochastic gradient descent algorithm. Using smaller batch sizes allows more frequent gradient updates, which tends to drive the model toward flat minima. These flat minima have little variation within a small neighborhood of the minimum, which helps reduce the risk of overfitting and improves generalization. Conversely, larger batch sizes tend to converge to sharp minima that exhibit significant variation, making it easier to get trapped in local optima [13].
Epochs: The number of epochs determines how many times the neural network is trained on the full training set. To ensure convergence, no fixed epoch limit is set; instead, if the absolute difference between the validation loss and the training loss remains below 0.001 for ten consecutive epochs, the model is considered to have converged and training stops.
Learning Rate: In this study, the learning rate was set to 0.0001. A smaller learning rate implies smaller parameter updates during training, which promotes stability and steadier convergence toward a good solution rather than oscillating around it.
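Putting these hyperparameters together, a training-loop sketch with the stated convergence rule follows (PyTorch; the loader construction, function names, and two-argument loss signature are assumptions):

```python
import torch

def train_until_converged(model, loss_fn, train_loader, val_loader,
                          lr=1e-4, tol=1e-3, patience=10):
    """Train with Adam until |val loss - train loss| < tol for `patience` epochs."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    stable = 0
    while stable < patience:
        model.train()
        tr_loss, n_batches = 0.0, 0
        for x in train_loader:              # batches of 16 samples
            opt.zero_grad()
            loss = loss_fn(x, model(x))
            loss.backward()
            opt.step()
            tr_loss += loss.item()
            n_batches += 1
        model.eval()
        with torch.no_grad():               # validation never updates parameters
            va_loss = sum(loss_fn(x, model(x)).item()
                          for x in val_loader) / len(val_loader)
        stable = stable + 1 if abs(va_loss - tr_loss / n_batches) < tol else 0
```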

Ablation Study
To determine the optimal neural network structure and loss function for CAE-HROSSP, this study progressively modifies the structure of the base CAE-HROSSP and trains models using different types of loss functions. The best configuration is determined from the MAE, Max-AE, and PEAEB of the reconstructed data on the test set. The base CAE-HROSSP has a fixed $R_{th}$ of 5, which means the number of neurons in the code layer $Z$ is fixed at 10. The model has a total of 13 layers, including 11 fully connected layers with the following neuron counts: [50, 5000, 3000, 2000, 1000, 10, 1000, 2000, 3000, 5000, 50].

Neural Network Structure.
The modifications to the model structure focus on the activation function of the last layer and the Scaler layer, because both directly affect the distribution and range of the model's output. Furthermore, since an autoencoder does not perform a specific classification or regression task, it has no dedicated softmax activation in the last layer to transform the continuous outputs into class probabilities. Based on this, this study conducted experiments to determine the best activation function for the last layer and the best Scaler layer, which is one of the novel contributions of this study. In addition, in this structure-ablation experiment, the weighted sum of MSE, SMSE, and EMSE was used as the loss function to train the various CAE-HROSSPs. The optimal combination of loss functions is discussed in section 3.3.2 below.
There are 5 choices for the last layer's activation function: linear, sigmoid, tanh, ReLU, and LeakyReLU, and 4 choices for the normalization layer: NoScaler, StandardScaler, MinMaxScaler, and MaxAbsScaler. Pairing these gives a total of 20 combinations. Each corresponding CAE-HROSSP is trained, and the results for MAE, Max-AE, and PEAEB are recorded in tables 1, 2, and 3, respectively, in units of m/s. The best results are highlighted in bold; the errors of the 5 combinations using MinMaxScaler are relatively small and are also shown in bold.
a The number of units exceeding the AEB for the other combinations is not shown, because their Max-AE is much larger than 3 m/s and does not meet the accuracy requirements of sound speed profiles.
b The parentheses after PEAEB indicate the number of elements greater than the AEB (3 m/s).
From tables 1, 2, and 3, the combination of LeakyReLU and the MinMaxScaler layer achieves the best performance, with a Max-AE of 3.3203 m/s and an MAE of 0.1084 m/s. Only nine units have absolute errors greater than the AEB (3 m/s), and the performance metrics are better than those of the remaining 19 combinations. This count is negligible compared with the total of 46,720,000 units in the 934,400 test-set samples, so the output meets the accuracy requirements of underwater acoustic engineering experiments. Note that since the average sound speed is about 1500 m/s, the MRE is approximately the MAE divided by 1500, i.e., MAE and MRE are approximately proportional; to save space, the MRE results are not shown here. The results are analyzed as follows. From tables 1 and 2, the output errors of the linear and LeakyReLU functions combined with the various Scaler layers are generally lower than those of the sigmoid, tanh, and ReLU combinations.
First, we believe this is because the ranges of the LeakyReLU and linear functions are wider, with no upper limit in the positive part, which better preserves large values in the original data. Moreover, both the LeakyReLU and linear functions have nonzero slopes on the negative axis, which allows the activation output to respond to negative inputs and avoids vanishing gradients, improving model robustness and generalization. This helps error signals propagate and assists the network in learning data representation and reconstruction more accurately [14].
Second, the LeakyReLU and linear functions are linear in the positive domain, and the Scaler layer performs linear scaling; neither changes the distribution of the input data. The sigmoid and tanh functions, in contrast, are nonlinear activations that change the data distribution, concentrating values near the extremes while leaving other regions relatively sparse [15]. They also limit the output magnitude to less than 1, causing information loss for some data and increasing reconstruction errors in the output.
Finally, the LeakyReLU function still retains nonlinearity, and compared to the linear function, it can better capture nonlinear relationships in the data and learn more complex feature representations, thus improving the reconstruction accuracy of the data.
Overall, all five activation functions paired with the MinMaxScaler layer have low Max-AE and MAE, indicating that the MinMaxScaler layer is well suited to this autoencoder task. The output errors of models without Scaler scaling are quite large, which shows that scaling the data with Scaler layers improves training effectiveness and that a Scaler layer is necessary.

Loss Type. We trained the optimal CAE-HROSSP structure identified above using different combinations of loss functions: MSE only, MSE + SMSE, MSE + EMSE, and MSE + SMSE + EMSE. The results are presented in table 4; the best result is shown in red and the next best in bold.
a If Max-AE is significantly larger than the AEB (3 m/s), it does not meet the accuracy requirements and displaying its PEAEB is not meaningful, so the cell is filled with "---".
b Parentheses below PEAEB indicate the number of elements greater than the AEB (3 m/s).
From table 4, both MSE + SMSE and MSE + EMSE achieve better compression performance than MSE alone, indicating that the proposed SMSE and EMSE are effective. Moreover, when the three losses are weighted and summed, training performance is optimal, indicating that the improvements from SMSE and EMSE do not cancel each other out. The optimal setting of the weighting parameters is left as a starting point for future research.

Quantitative Comparisons
To compare the best CAE-HROSSP obtained from the ablation study above with the traditional CDL and CEOF methods, experiments were conducted at several fixed values of $R_{th}$ on the test set, which has no intersection with the training data. The MAE, MRE, Max-AE, and PEAEB of the three methods were compared, and the results are presented in table 5. The reason the $R_{th}$ of CDL cannot reach 50 is as follows:

$$\frac{1}{R_{th}^{\mathrm{CDL}}} = \frac{2k}{d} + \frac{m}{n} + \frac{1}{n}, \qquad (30)$$

where $d$ represents the number of depth layers, $n$ the number of input data samples, $m$ the number of samples in the dictionary set of dictionary learning, and $k$ the sparsity, i.e., the sparsity of the sparse matrices in the sparse representation used by the CDL method. The larger $k$ is, the sparser the representation, and $k$ can only be an integer greater than 0. Under normal circumstances, the number of input samples $n$ is much larger than the number of dictionary samples $m$, so the second and third terms of the equation are approximately equal to 0 and can be ignored. Therefore, equation (30) can be approximated as

$$R_{th}^{\mathrm{CDL}} \approx \frac{d}{2k}.$$

In this paper, $d = 50$ and $k \ge 1$; therefore, the maximum $R_{th}$ of CDL is 25. In table 5, the best and next-best results are shown in bold. As shown there, for each $R_{th}$, CAE-HROSSP achieves the best values on all performance metrics, demonstrating its superiority over the traditional methods. However, when $R_{th}$ is 8 or greater, the Max-AE of CAE-HROSSP exceeds 6 m/s, twice the maximum allowable error of 3 m/s for sound speed profiles in underwater acoustics. Although the percentage of units with such large errors is negligible, this deserves attention and can serve as a starting point for future model optimization. Solutions to overcome this limitation are provided in section 3.5 below.

Example Result
In the conducted experiments, the optimal configuration of CAE-HROSSP was determined: a fixed compression ratio of 5, 10 neurons in the code layer Z, and a model depth of 13 layers. The first and last layers are the min-max normalization and de-normalization layers, while the remaining 11 layers are fully connected with the following neuron counts: [50, 5000, 3000, 2000, 1000, 10, 1000, 2000, 3000, 5000, 50]. The loss function is the weighted combination of MSE, SMSE, and EMSE, and a LeakyReLU activation function is employed at the output layer. For the test set, the input data are presented in figure 4 and the reconstructed data in figure 5; the difference between reconstructed and input data is shown in figure 6. In these three figures, the horizontal coordinate is the sample serial number (934,400 samples in total) and the vertical coordinate is the depth layer index (50 layers in total). Qualitatively, there is almost no difference between figures 4 and 5. As shown in figure 6, the difference between the input and reconstructed data is very small, mostly around 0 m/s. This shows that the data compressed by CAE-HROSSP can be reconstructed with high precision.
Furthermore, a quantitative evaluation of CAE-HROSSP's performance is provided in table 6. The MAE is 0.1084 m/s and the MRE is 0.0073%, rendering the average error negligible. The Max-AE remains about 3.3 m/s, close to the specified error bound of 3 m/s, and the data exceeding the error bound account for a mere 1.93×10⁻⁵% of the total dataset. This maximum error satisfactorily fulfills the accuracy requirements of most underwater acoustic engineering tasks.
Regarding data points surpassing the AEB, we examine each predicted value. If the prediction error exceeds the AEB, we store the difference between the input and output values together with the corresponding index, creating a difference-index pair file. During decompression, we retrieve the difference-index pairs, match the differences to their indices, and add them to the predicted values, so that the entire decompressed file adheres to the AEB [16].
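A sketch of this difference-index pair mechanism follows (NumPy; the function names are illustrative assumptions):

```python
import numpy as np

def build_diff_index_pairs(x, x_rec, aeb=3.0):
    """Store (flat index, residual) for every element whose error exceeds the AEB."""
    diff = (x - x_rec).ravel()
    idx = np.flatnonzero(np.abs(diff) > aeb)
    return idx, diff[idx]

def apply_diff_index_pairs(x_rec, idx, residuals):
    """At decompression time, add the stored residuals back at their indices so
    the whole decompressed file satisfies the AEB."""
    out = x_rec.copy().ravel()
    out[idx] += residuals
    return out.reshape(x_rec.shape)
```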
In addition, the results show a PEAEB of 1.93×10⁻⁵%, so the data volume of the difference-index pair file is negligible compared to the original data size. Thus, the theoretical compression ratio $R_{th}$ is consistent with the actual compression ratio $R_{ac}$.

Highly Variable Sound Speed Profiles
CAE-HROSSP was further trained and tested using data from regions where the sound speed profile varies significantly under the influence of different oceanographic phenomena, and the compression performance of the new CAE-HROSSP was then evaluated.
Since the spatial resolution of CORAv2.0 adopted in this paper is 0.1°, sound speed profiles affected by oceanic sub-mesoscale and small-scale processes with spatial scales of 10 km and below, such as internal waves and turbulence, cannot be analyzed; only large-scale and mesoscale processes with scales of 100 km and above, such as mesoscale eddies and currents, can be analyzed.
In this section, sound speed profiles from the major warm-cold current convergence regions and from regions where mesoscale eddies were located in 2014, together with one third of the original dataset, are selected and merged into a new dataset. Transfer learning (TL) [17] is used to further train the pre-trained CAE-HROSSP of section 3.5 above; that is, the HROSSP compression capability learned in the original sea area is transferred to the new compression task.
During TL, a smaller learning rate of 0.00001 is used and gradually reduced to 0.0000005 as training progresses. This strategy, on the one hand, saves training cost, since CAE-HROSSP does not need to be re-trained from scratch to adapt to the new compression task in the new seas. On the other hand, the lower learning rate and the merging of old and new data limit how far the pre-trained parameters are adjusted, preventing CAE-HROSSP from over-adapting to the new seas at the cost of degraded performance in the original sea area.
The TL in this section was divided into two steps: first, the original data were merged with data from the main warm-cold current convergence regions to train the pre-trained CAE-HROSSP of section 3.5 above; second, the mesoscale eddy data were further merged for a second round of training. In addition, META3.1EXPDT (https://www.aviso.altimetry.fr/en/data.html), a mesoscale eddy trajectory product, was used to determine the center positions of mesoscale eddies, which served as the basis for extracting the HROSSP data covered by and around each eddy.
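The TL fine-tuning stage can be sketched as follows (PyTorch; the epoch count and the geometric decay schedule are assumptions, since the paper states only the start and end learning rates):

```python
import torch

def fine_tune(model, loss_fn, merged_loader, epochs=50):
    """Fine-tune the pre-trained model on merged old+new data with a small,
    geometrically decaying learning rate (1e-5 down to 5e-7)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    gamma = (5e-7 / 1e-5) ** (1.0 / epochs)   # reaches 5e-7 after `epochs` epochs
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=gamma)
    for _ in range(epochs):
        for x in merged_loader:
            opt.zero_grad()
            loss_fn(x, model(x)).backward()
            opt.step()
        sched.step()
```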
Table 7 quantitatively shows the compression performance of the three models: before TL, after the first TL, and after the second TL. After the two TL stages, the MAE of CAE-HROSSP in the original sea area increases by about 0.009 m/s, but the Max-AE decreases from 3.3203 m/s to 3.2186 m/s, and the number of units exceeding 3 m/s decreases from 9 to 3. Meanwhile, the Max-AEs in the two new sea areas are both below 3 m/s. This indicates that CAE-HROSSP adapts to the new compression tasks, and that its compression performance on the original task improved rather than degraded as a result of training on the new task. TL can thus be used to train on HROSSP data from more varied seas, giving CAE-HROSSP the potential for global generalization.

Summary and Conclusion
The CAE-HROSSP proposed in this study demonstrates improved training effectiveness through the SMSE and EMSE losses. Furthermore, the combination of the MinMaxScaler and LeakyReLU as the final activation function achieved the highest reconstruction accuracy.
Compared to traditional methods such as CDL and CEOF, CAE-HROSSP exhibited lower reconstruction errors at the same $R_{th}$. The maximum compression ratio of CAE-HROSSP could reach up to 50, which was unattainable with CDL.
Through TL, the compression scope of CAE-HROSSP was extended to the area where mesoscale eddies were located and to the area where cold and warm ocean currents converge, while the compression performance in the original sea area was not adversely affected.

3.1.3 Training and Testing Details. In this paper, CAE-HROSSP is implemented in the PyTorch framework and trained on NVIDIA A100 GPUs. The testing and evaluation experiments are performed on a Windows 10 laptop with an Intel(R) Core(TM) i5-10210U CPU @ 1.60 GHz and 16 GB RAM, which further demonstrates that CAE-HROSSP can be deployed and run on ordinary personal laptops. Note that the datasets used in the testing and evaluation experiments do not overlap with the training set, and all CAE-HROSSP models used in the experiments are converged models. The weights and bias vectors of each layer of the network are initialized with the Xavier normal distribution, and all model parameters are stored in double precision floating point format, consistent with the input data. During model training, multiple epochs are performed, and the training set is divided into multiple batches for sequential training in each epoch; Adam (Adaptive Moment Estimation) is used as the optimizer.

Figure 3 presents the training and validation loss curves during model training. Note that validation is carried out only once per training epoch, and the training loss is plotted once per 10,000 batches. As shown in figure 3, although the validation loss converges slightly more slowly than the training loss, both converge to about 0.004 m/s, showing that CAE-HROSSP converges and generalizes well.

Figure 6. The difference between reconstructed data and input data of the test set.

a Before TL represents the pre-trained CAE-HROSSP of section 3.5, 1st TL represents the model trained after the first TL, and 2nd TL the model trained after the second TL. The unit is m/s.
b Numbers in parentheses after Max-AE indicate the number of absolute errors exceeding 3 m/s. A "-" indicates that the number of cells exceeding 3 m/s is too large to be shown in detail.
The following two examples qualitatively show the compression performance of CAE-HROSSP in the two new seas. Figures 7, 8 and 9 show HROSSP data before compression and after reconstruction at the convergence of the warm and cold currents (longitude 150°E, latitude 30°N-45°N).

Figure 7. The data before compression.

Figure 8. The data after reconstruction.

Figure 9. The difference between reconstructed data and input data.

Figures 10, 11 and 12 show HROSSP data before compression and after reconstruction in a mesoscale eddy (longitude 30°W, latitude 3°N-9°N) with a radius of about 290 km on January 1, 2014.

Figure 10. The data before compression.

Figure 11. The data after reconstruction.

Figure 12. The difference between reconstructed data and input data.

Table 1. Performance of reconstructed data of the 20 combinations on MAE.

Table 2. Performance of reconstructed data of the 20 combinations on Max-AE.

Table 3. PEAEB of the reconstructed data when the MinMaxScaler layer is used.

Table 4. Performance of models trained with the 4 combinations of loss functions.

Table 5. Performance of the three compression methods under 4 fixed $R_{th}$.
a Outside the brackets in the table head is $R_{th}$; inside the brackets is the compression efficiency, i.e., the ratio of the compressed data volume to the original data volume.
b CAE indicates the CAE-HROSSP model.
c Parentheses below PEAEB indicate the number of elements greater than the AEB (3 m/s).

Table 6 shows the error indicators of the test set after CAE-HROSSP compression and reconstruction.

Table 7. Compression performance of the three models for the three sea areas.