Removing the systematic errors of the model in operational oceanography forecasting system using data assimilation method

For the operational oceanography forecast, the synoptic forecast error is partly from the long-term systematic bias of the model, which can be partly counteracted by adjusting the values of the physical parameters. To this end, a four-dimensional optimization system is implemented into the South China Sea operational oceanography forecasting system, to adjust the values of multi-parameters using data assimilation method. By assimilating Argo temperature profiles of 51 days in the model domain, five physical parameters (coefficients of horizontal/vertical diffusion/viscosity and linear bottom drag) of the model have been adjusted simultaneously, and then the optimal values are obtained. The RMSE of temperature simulations in the assimilation window decreases from 1.17 to 0.97 K, when using the optimal values. The validation of the freerun experiments shows that the temperature RMSE decreases from 0.97 to 0.88 K, which indicates that the optimal values are still valid in a longer and independent period. Finally, the validation of the hindcast experiments shows that at the synoptic scale the temperature RMSE decreases from 0.90 to 0.80 K and other variables also present improvements. It hints that it is feasible to reduce the synoptic forecast errors by adjusting the parameter values at the climatological scale to partly counteract the systematic bias of the model. Therefore, it also provides a potential pathway to improve the synoptic forecast skill for the operational oceanography forecasting system.


Introduction
In recent years, marine environmental forecasting in the South China Sea (SCS) has become increasingly important with the development of marine resource exploitation, shipping and fisheries [1]. The SCS Operational oceanography Forecasting System (SCSOFS) has been implemented for this purpose and consists of two parts: a data assimilation system and a numerical model. By assimilating observations and integrating the model, the SCSOFS provides forecasts of temperature, salinity, velocity and sea level anomaly for the next seven days to support offshore activities [2,3]. To improve the forecast skill, the assimilation system and the model of the SCSOFS are continually being modified [4]. As part of this work, a multi-parameter optimization system was developed using a four-dimensional assimilation method to reduce the forecast errors by adjusting the parameter values of the model in the SCSOFS.
Broadly speaking, forecast errors result from initial errors and model errors [5]. Model errors originate from approximations in the model dynamics and physical parameterization schemes, errors in the external forcing, and misfits of the parameter values in the parameterization schemes [6]. For an operational forecasting system, it is practically impossible to identify which factors dominate the model errors or to quantify their respective contributions. Even when the sources of model errors are not distinguished, the model errors can be partially compensated by introducing an extra term, which improves the forecast to some extent [7]. This approach does not correct the inherent defects of the model itself, but from a practical perspective it is useful for improving model performance [8].
Researchers have developed various methods to partially correct model errors. For example, Tao and Duan [7] introduced a tendency term on the right-hand side of the governing equations of the model to partially offset the combined effects of model errors from different sources. This method extends skillful ENSO forecasts from a lead time of 6 months to 12 months. Parameter adjustment, also known as parameter estimation or optimization, is commonly used to reduce model errors [9-14]. It automatically finds the optimal parameter values by assimilating observations over a period so that the simulations approach the observations as closely as possible [15]. Based on a three-dimensional POM and its adjoint model, Peng et al. [8] implemented a four-dimensional variational assimilation system. By assimilating water level observations during a typhoon event, they optimized the two-dimensional wind stress drag coefficients and thereby improved the storm surge simulations.
In climate prediction, researchers have conducted a series of studies on parameter adjustment [16]. For example, Zhang [17] simultaneously optimized multiple parameter values in a simple air-sea coupled model under the framework of twin experiments, and found that parameter adjustment can significantly improve the skill of decadal prediction. Wu et al. [6] used an intermediate-complexity coupled model to adjust parameter values, also under the framework of twin experiments. By assimilating synthetic observations of SST anomaly, they effectively improved the prediction skill of ENSO.
From the perspective of time scales, previous studies can be broadly divided into two categories. 1) Parameter values are adjusted for a given synoptic event, such as a typhoon, to improve the simulations of that event [8]. 2) Parameter values are adjusted for a climatological event, such as an ENSO event [17]. The forecasting of the SCSOFS belongs to the first category, as it provides forecasts with a lead time of one week. At this synoptic scale, the forecast errors are dominated by the initial errors, so the optimal values for the current forecast may perform poorly in the next forecast. Meanwhile, changes of parameter values have little time to influence the model variables at this time scale, so the effect of parameter adjustment is limited. To avoid these two problems, this work adjusts the model parameter values directly at a climatological scale. Independent hindcast experiments are then conducted to verify whether the optimal values can improve the forecasts at the synoptic scale.
In the following, Section 2 introduces the forecasting system, the methodology, and the observations used for assimilation and validation; Section 3 discusses the sensitivity of the model simulations to changes in the parameter values; Section 4 presents the results of parameter adjustment and validation; finally, Section 5 provides conclusions and discussions.

The forecasting system
The SCS Operational oceanography Forecasting System (SCSOFS) consists of a data assimilation system and an ocean model [3]. The assimilation system employs the Ensemble Optimal Interpolation (EnOI) method [18,19]. The ocean model uses the Regional Ocean Modeling System (ROMS) version 3.7, configured on the domain [4.5°S-28.4°N, 99°E-144°E] with a horizontal resolution of 1/12° in the western Pacific and 1/30° in the SCS, and 50 vertical layers with enhanced resolution near the surface. Atmospheric forcing is taken from the Climate Forecast System Reanalysis (CFSR) dataset at a 6-hour interval. Lateral boundary conditions are from the monthly Simple Ocean Data Assimilation (SODA) dataset. The barotropic and baroclinic time steps are 6 and 180 seconds, respectively. Temperature and salinity diffusion uses a harmonic parameterization with a horizontal and a vertical diffusion coefficient. Momentum viscosity also uses a harmonic parameterization with a horizontal and a vertical viscosity coefficient. Bottom stress is parameterized as a quadratic drag function with a bottom drag coefficient. These five parameters are adjusted simultaneously in the following to reduce the model biases and, in turn, the forecast errors.
The data assimilation system is used in the "hindcast" experiments (see Section 2.2) for model initialization [4]. It assimilates observations of Sea Level Anomaly (SLA) and temperature/salinity profiles within a 7-day window, and Sea Surface Temperature (SST) within a 1-day window, using a First Guess at Appropriate Time (FGAT) approach. The output of the assimilation system, i.e., the analysis increment, is added proportionally to the model governing equations at each time step through the Incremental Analysis Updates (IAU) method.
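As a rough illustration of how IAU spreads an analysis increment over the time steps, the following Python sketch applies an equal share of the increment at every step of a scalar toy model. The equal weighting, the function name `integrate_with_iau` and all numbers are assumptions made for illustration; the weighting actually used in the SCSOFS is not specified here.

```python
def integrate_with_iau(x0, tendency, increment, n_steps, dt):
    """Step a scalar toy model forward while adding the analysis
    increment in equal shares (assumed weighting) at every time step."""
    x = x0
    share = increment / n_steps          # portion of the increment per step
    for _ in range(n_steps):
        x = x + dt * tendency(x) + share
    return x


# Hypothetical case: no model tendency, so the state should end up
# shifted by exactly the full increment once all steps are done.
final_state = integrate_with_iau(
    x0=20.0, tendency=lambda x: 0.0, increment=0.6, n_steps=120, dt=1.0
)
print(final_state)
```

With a zero tendency the final state differs from the initial state by the full increment, which shows that the gradual insertion delivers the same total correction as a single update while avoiding an abrupt jump.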
It should be noted that the data assimilation system in the SCSOFS is used to yield the initial condition for the model, which will be used in the "hindcast" experiments. However, the focus of this paper is to develop another data assimilation system that is used to adjust the parameter values. To avoid confusion, the data assimilation system denotes the latter in the following, while the former is denoted as the Initial Condition (IC) assimilation system.

The methodology
There are three experiments in this study, as shown in Figure 1. (a) In the adjustment experiments, the model is integrated without IC assimilation from January 1 to May 30, 2017, of which the first 99 days (January 1 to April 9) are used as the spin-up stage and the rest (April 10 to May 30) as the assimilation window for adjusting the parameter values. The choice of the assimilation window is explained in Section 3. The model is integrated repeatedly over the five months, and the parameter values are adjusted to make the simulations approach the observations in the assimilation window. (b) In the freerun experiments, the model is integrated from May 31, 2017 to December 31, 2018 without IC assimilation, to test the performance of the model with the optimal parameter values in an independent period. The freerun experiments integrate the model twice: once with the original values and once with the optimal values. (c) The hindcast experiments run the SCSOFS from August 3, 2017 to August 22, 2018 with IC assimilation and forecast cycles, to validate the performance of the SCSOFS at the synoptic scale when using the optimal parameter values. The hindcast experiments are also conducted twice, once with the original values and once with the optimal values.
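The lengths of the spin-up stage and the assimilation window follow directly from the dates given for the adjustment experiments. The short Python check below is only an arithmetic sketch; the variable names are illustrative.

```python
from datetime import date

# Dates taken from the description of the adjustment experiments.
run_start = date(2017, 1, 1)
window_start = date(2017, 4, 10)
window_end = date(2017, 5, 30)

# Spin-up covers January 1 to April 9, i.e. every day before the window opens.
spinup_days = (window_start - run_start).days
# The assimilation window includes both its first and its last day.
window_days = (window_end - window_start).days + 1

print(spinup_days, window_days)  # 99 51
```

The 51-day window corresponds to model days 100 to 150.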

Figure 1. The configurations of the experiments, where "Adjust", "Freerun" and "Hindcast" correspond to the adjustment, freerun and hindcast experiments, respectively, and the dates denote the start and end dates of the experiments.
In the adjustment experiments, a cost function $J$ is defined to measure the residual between the model simulations and the observations, where $y$ represents the observations within the assimilation window, $N$ is the number of observations, $x$ denotes the variables of the model simulations, and $H$ is an interpolation operator mapping the model variables to the times and locations of the observations. Parameter values are adjusted through the optimization algorithm to minimize the cost function $J$ and thereby make the simulations approach the observations.
The adjustment procedure is illustrated in Figure 2. For the five parameters to be adjusted (the horizontal and vertical diffusion coefficients, the horizontal and vertical viscosity coefficients, and the bottom drag coefficient), first guesses of the parameter increments are given using random numbers. The model is integrated once from January 1 to May 30, 2017 using the increments, and the cost function $J_0$ is computed within the assimilation window (from April 10 to May 30) to measure the residual between the model simulations and the observations, where $x(\Delta_1, \Delta_2, \Delta_3, \Delta_4, \Delta_5)$ denotes the model states using the parameter increments $(\Delta_1, \Delta_2, \Delta_3, \Delta_4, \Delta_5)$. Next, small perturbations $\delta_i$ are added to the five increments in turn, and the model is integrated five more times to obtain the respective cost functions $J_i$. For example, for $i = 1$, $J_1$ is calculated using $x(\Delta_1 + \delta_1, \Delta_2, \Delta_3, \Delta_4, \Delta_5)$. The gradients of the cost function with respect to the parameter increments $\Delta_i$ are estimated by finite differences, $\partial J / \partial \Delta_i \approx (J_i - J_0)/\delta_i$. The gradients are input into a sequential quadratic programming algorithm [20] to update the parameter increments $\Delta_i$. This process iterates until the optimization converges to an optimal solution [21].
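The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system used in the paper: `run_model` is a toy five-parameter function standing in for a full model integration, the cost is assumed to be the mean squared misfit, the first guess is set to zero rather than drawn at random so that the result is repeatable, and a plain projected gradient descent is swapped in for the sequential quadratic programming algorithm. All function and variable names are hypothetical.

```python
import math


def run_model(increments, times):
    """Toy stand-in for one model integration (NOT an ocean model)."""
    d1, d2, d3, d4, d5 = increments
    return [d1 + d2 * t + d3 * t * t + d4 * math.sin(t) + d5 * math.cos(t)
            for t in times]


def cost(increments, times, obs):
    """Assumed cost: mean squared misfit between simulation and observations."""
    sim = run_model(increments, times)
    return sum((s - y) ** 2 for s, y in zip(sim, obs)) / len(obs)


def fd_gradient(increments, times, obs, delta=1e-4):
    """One-sided finite differences: one base run plus one run per parameter."""
    j0 = cost(increments, times, obs)
    grad = []
    for i in range(len(increments)):
        perturbed = list(increments)
        perturbed[i] += delta
        grad.append((cost(perturbed, times, obs) - j0) / delta)
    return j0, grad


def adjust(first_guess, bounds, times, obs, step=0.1, n_iter=200):
    """Projected gradient descent, used here in place of SQP for brevity."""
    p = list(first_guess)
    for _ in range(n_iter):
        _, grad = fd_gradient(p, times, obs)
        # Take a descent step, then clip each increment to its allowed range.
        p = [min(max(pi - step * gi, lo), hi)
             for pi, gi, (lo, hi) in zip(p, grad, bounds)]
    return p


times = [k / 19 for k in range(20)]
obs = run_model([0.5, -0.2, 0.1, 0.3, -0.4], times)  # synthetic "observations"
bounds = [(-1.0, 1.0)] * 5
first_guess = [0.0] * 5

optimal = adjust(first_guess, bounds, times, obs)
j_first = cost(first_guess, times, obs)
j_opt = cost(optimal, times, obs)
print(j_first, j_opt)
```

Each iteration costs six evaluations of the toy model (one base run and five perturbed runs), which mirrors why the repeated integrations dominate the computational cost when the model is a full ocean simulation.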

The observations
In the adjustment experiments, 314 Argo temperature profiles from April 10 to May 30, 2017 are assimilated (Figure 3a). There are about 6 profiles per day in the model domain, concentrated in the central SCS and the western Pacific (Figure 3b). Each Argo temperature profile is densely sampled in the vertical, so the observations need to be thinned before assimilation. To this end, the observations in each profile are interpolated onto 8 standard depths [150, 200, 250, 300, 400, 500, 700, 1000 m]. No near-surface temperature observation is assimilated in the adjustment experiments, so SST serves as an independent variable for validation.
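The thinning step can be illustrated with a small Python sketch. It assumes linear interpolation in depth and skips standard depths outside the sampled range instead of extrapolating; neither choice is stated in the text. The function name `interp_profile` and the profile values are hypothetical.

```python
STANDARD_DEPTHS_M = [150, 200, 250, 300, 400, 500, 700, 1000]


def interp_profile(depths, temps, targets):
    """Linearly interpolate a profile (depths increasing) onto target depths.

    Targets outside the sampled depth range are skipped, not extrapolated.
    """
    out = {}
    for z in targets:
        if z < depths[0] or z > depths[-1]:
            continue
        for k in range(len(depths) - 1):
            z0, z1 = depths[k], depths[k + 1]
            if z0 <= z <= z1:
                w = (z - z0) / (z1 - z0)
                out[z] = temps[k] + w * (temps[k + 1] - temps[k])
                break
    return out


# Made-up profile for illustration only (depth in m, temperature in degC).
depths = [100, 175, 300, 450, 800]
temps = [20.0, 17.0, 12.0, 9.0, 5.5]

thinned = interp_profile(depths, temps, STANDARD_DEPTHS_M)
for depth in sorted(thinned):
    print(depth, round(thinned[depth], 2))
```

In this example the 1000 m level is dropped because the made-up profile ends at 800 m, so only seven of the eight standard depths receive a value.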
In the freerun experiments, Argo temperature and salinity profiles, SST and SLA observations are used to validate the performance of the model with the optimal parameter values (Figure 3c-h). On each day there are around 20 Argo profiles, 1000 SST observations and 4000 SLA observations. As above, the observations in each profile are interpolated vertically, here onto 27 standard levels from 0 to 900 m, for validation.
In the hindcast experiments, the SCSOFS is run with the optimal parameter values to validate its performance. Observations assimilated for the IC include Argo temperature and salinity profiles, AVHRR SST, and AVISO SLA. The observations used to validate the hindcasts are the same as those in the freerun experiments.
Theoretically, the adjustment of parameter values should be unconstrained. In other words, the ranges over which the parameter values may vary should be large enough to make the model simulations approach the observations as closely as possible. However, the model integration may blow up if the value range is too large. Here, we choose broad ranges of the parameter values to test the sensitivity, but some results are not presented because the model integrations blew up (Figure 4). Finally, the minimum and maximum parameter values that do not cause blowups define the adjustment ranges, as listed in Table 1.
In this paper, the RMSE of the model simulations with respect to the observations is used to measure the sensitivity of the model simulations to variations in the parameter values. The observations are the Argo temperature profiles. If the RMSEs obtained with different parameter values differ significantly, it can be inferred that the simulation is influenced by the perturbed parameter value. Figure 4 shows the time series of the RMSE of the model simulations against the observations. In general, after initial adjustments, the RMSE differences start to increase with oscillations. Within the first 99 days, the RMSE differences are small, suggesting that parameter changes have only slight impacts on the RMSE and the model simulations. After this stage, the RMSE differences increase markedly and can reach about 0.1 K. This indicates that the simulations are influenced by the variations of the parameter values. Therefore, the model simulation is sensitive to changes of the parameter values only after the first 99 days.
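The sensitivity measure can be written down compactly. The Python sketch below computes the RMSE of two runs against the same observations and the difference between them; the temperature values are made up for illustration and the function name `rmse` is hypothetical.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between co-located simulated and observed values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n)


# Made-up temperatures at four observation points.
observed = [12.0, 10.5, 9.0, 7.5]
run_original = [13.0, 11.5, 10.0, 8.5]   # every point 1.0 too warm
run_perturbed = [12.5, 11.0, 9.5, 8.0]   # every point 0.5 too warm

rmse_original = rmse(run_original, observed)
rmse_perturbed = rmse(run_perturbed, observed)
rmse_difference = rmse_original - rmse_perturbed

print(rmse_original, rmse_perturbed, rmse_difference)  # 1.0 0.5 0.5
```

Repeating this calculation day by day for each perturbed run gives time series of the kind compared in the sensitivity analysis.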
Table 1. The adjustment ranges, the original and optimal values of the five parameters.
The assimilation window should be placed in a period in which the perturbed parameter values can effectively influence the simulations. Otherwise, the optimization cannot converge. From the above analysis, the first 99 days are not appropriate for the assimilation window. The assimilation window is therefore defined as day 100 to day 150, as denoted by the two vertical solid lines in Figure 4. The assimilation window could of course be longer or later (e.g., day 100 to 200), but the optimization involves repeated model integrations, which are computationally intensive. Considering that the simulations are not sensitive to the value variations in the first 99 days and that the computational cost increases rapidly with longer integrations, an assimilation window from day 100 to 150, i.e., April 10 to May 30, is chosen in this study.

The adjustment experiments
The optimization procedure is shown in Figure 2. The five parameters start from their original values and, through repeated iterations and integrations, finally converge to the optimal values (Table 1). Within the assimilation window, the temporal and vertical distributions of the RMSE show that using the optimal values significantly reduces the RMSE (Figure 5a and b). The horizontal distribution of the RMSE differences also shows that blue grids are in the majority, indicating a decreased temperature RMSE at most grids when using the optimal values (Figure 5c). On average, the temperature RMSE in the assimilation window is 1.1656 K with the original parameter values (Table 2) and is reduced to 0.9680 K with the optimal values, i.e., a decrease of 0.1976 K (16.95%). It should be noted that the optimal values are all located at the boundaries of the given ranges (Table 1). For example, the optimal value of one horizontal coefficient (subscript h) is 3.0E+1, which is the lower bound of its adjustment range. This indicates that the adjustment ranges are too narrow. If the adjustment ranges were relaxed further, the model variables would probably get even closer to the observations. However, the sensitivity experiments show that broader ranges cause the model to blow up and the integration to fail. Nevertheless, adjusting the parameter values within the above ranges reduces the RMSE by 16.95%, which suggests that such tuning is still meaningful in practical applications.

The freerun experiments
In the adjustment experiments, the aim of parameter adjustment is to make the model simulations approach the observations as closely as possible within the assimilation window. Section 4.1 has shown that using the optimal values significantly decreases the RMSE in the adjustment experiments. However, do the optimal values still decrease the RMSE in an independent period? To check their effectiveness, the freerun experiments are conducted. The independent validation shows that in the 1.5 years outside the assimilation window (Table 2), the temperature RMSE with the optimal parameter values (0.8849 K) is still lower than that with the original values (0.9712 K), a decrease of 0.0863 K (8.89%). The vertical structure of the temperature RMSE shows noticeable improvements of about 0.1 K in the thermocline from 100 to 200 m and at the layers deeper than 400 m (Figure 6a). The temporal distribution shows a significantly lower RMSE with the optimal parameters throughout the freerun period (Figure 6c). To show the RMSE difference clearly, a monthly average is used to filter out the high-frequency variations of the RMSE. Spatially, the areas with error reductions using the optimal values are concentrated in the central SCS and east of Taiwan and Luzon Island (Figure 7a).
Besides temperature, the simulation of SLA also shows noticeable improvement, with the RMSE decreasing from 16.13 cm to 14.06 cm (Table 2). The temporal distribution of the RMSE (Figure 6f) also shows that the monthly RMSE with the optimal values is always lower than that with the original values over the whole validation period. The horizontal distribution of the SLA RMSE indicates a significant improvement in the areas from the SCS to the western Pacific, with a decrease of about 3 cm (Figure 7d). The improvement in the salinity simulation is relatively small. Vertically, the RMSE with the optimal values decreases at the layers from 300 to 600 m (Figure 6b). Horizontally, the improved areas are concentrated east of Taiwan and Luzon Island, with a slight RMSE decrease (Figure 7b). With the optimal parameter values, the RMSE of the SST simulation increases slightly from 0.5999 K to 0.6274 K (Table 2). Temporally, the SST errors do not change significantly (Figure 6e), while spatially, the areas with reduced RMSE are located in the southwestern SCS and east of Taiwan Island (Figure 7c).
The above analysis shows that the optimal values obtained from the data assimilation method are effective not only within the assimilation window, but also in another longer and independent period. Moreover, using the optimal values also improves the simulation of SLA and salinity, although the SST error increases slightly. This indicates that the adjustment of the model states by using the optimal values is likely reasonable.

The hindcast experiments
The adjustment and freerun experiments both validate the simulations using the five optimal parameter values at the climatological scale. Are these optimal values still effective in the operational forecast at the synoptic scale? To answer this question, we tested the performance of the optimal values using the SCSOFS at the synoptic scale in the hindcast experiments. The validation of the hindcast experiments shows that with the original parameter values the temperature RMSE is 0.8985 K (Table 2), while with the optimal values the RMSE decreases to 0.7998 K, a reduction of 0.0987 K (10.98%). The vertical structure of the temperature RMSE shows significant improvements at the layers deeper than 100 m, with a decrease of 0.1 K (Figure 8a). Temporally, the monthly RMSE with the optimal values is significantly lower than that with the original values (Figure 8c). Spatially, the grids with decreased RMSE account for the majority when using the optimal values (Figure 9a).
IOP Publishing doi:10.1088/1742-6596/2718/1/012027
The improvement in the SLA hindcast is also obvious: with the optimal values, the RMSE decreases from 8.36 cm to 7.79 cm, i.e., a reduction of 0.57 cm (Table 2). The horizontal distribution of the SLA RMSE shows decreased RMSEs in the central SCS and the western Pacific, but increased RMSEs in the coastal SCS (Figure 9d). The RMSEs of salinity and SST also decrease slightly, with the salinity RMSE reduced from 0.0973 psu to 0.0935 psu and the SST RMSE from 0.4926 K to 0.4745 K (Table 2).
In this study, the optimal values are obtained at the climatological scale. Therefore, the model error is partly compensated at this time scale in the adjustment and freerun experiments. However, the optimal values also reduce the forecast errors at the synoptic scale, according to the validation of the hindcast experiments. It can be inferred that the model errors at the climatological and synoptic scales are similar. Therefore, the optimal values obtained at the climatological scale also work at the synoptic scale.
In summary, using the optimal parameter values can significantly reduce the errors of the temperature and other variables in the hindcast using the operational SCSOFS. This indicates that adjusting parameter values at the climatological scale can improve the forecast skill at the synoptic scale for an operational forecasting system. It provides a feasible pathway to partly decrease the forecast errors at the synoptic scale.

Conclusions and discussions
In this study, an assimilation system that simultaneously adjusts multiple parameter values of the model is implemented to reduce systematic biases in the South China Sea Operational oceanography Forecasting System (SCSOFS). By assimilating Argo temperature profiles, it yields the optimal values of the five parameters and thus reduces the temperature RMSE by 16.95% in the assimilation window. The validation of the freerun experiments shows that the optimal values also effectively improve the temperature simulations in an independent and longer period, where the temperature RMSE decreases by 8.89%. Furthermore, the SLA simulations are improved, with an RMSE reduction of 12.83%. The validation of the hindcast experiments demonstrates that in the operational forecasting system, the optimal parameter values significantly decrease the temperature RMSE by 10.98% and the SLA RMSE by 6.82%, and slightly improve the salinity and SST hindcasts. The results show that adjusting the five parameter values using the data assimilation method is a valid way for the operational forecasting system to reduce the systematic bias.
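The percentage reductions quoted above follow from the RMSE pairs reported in the text. The Python sketch below simply redoes that arithmetic; the dictionary keys are illustrative labels and the function name `percent_reduction` is hypothetical.

```python
def percent_reduction(before, after):
    """Relative RMSE reduction in percent."""
    return (before - after) / before * 100.0


# (original, optimal) RMSE pairs as quoted in the text.
cases = {
    "adjustment temperature (K)": (1.1656, 0.9680),
    "freerun temperature (K)": (0.9712, 0.8849),
    "freerun SLA (cm)": (16.13, 14.06),
    "hindcast temperature (K)": (0.8985, 0.7998),
    "hindcast SLA (cm)": (8.36, 7.79),
}

reductions = {name: percent_reduction(b, a) for name, (b, a) in cases.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
```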
For synoptic-scale forecasts, such as those of the SCSOFS, the forecast errors are dominated by the initial errors. However, the model errors are superimposed at each time step, exerting a small but continuous impact. If the parameter values are adjusted at a synoptic scale, the results are affected by the initial errors. Furthermore, the impact of parameter value changes is restricted at this scale, so the systematic bias cannot be detected and then removed. In this study, the parameter values are adjusted at the climatological scale, by using a long assimilation window after a spin-up stage, to correct the systematic bias of the model. The optimal parameter values are then tested in the operational SCSOFS, which indicates that they are indeed effective in decreasing the forecast errors at the synoptic scale.
It should be noted that the optimal values from the parameter adjustment are not necessarily the true values. The parameter values are adjusted using the assimilation method, aiming to counteract the model errors. However, the source of the model errors is not limited to the value errors of the adjusted parameters. The model errors may also originate from errors of the external forcing, inaccurate descriptions of physical processes in the parameterization schemes, etc. In this study, using the optimal values changes the physical processes involved, which offsets the model errors and thereby improves the simulations of the physical variables concerned. According to the validation, the adjustment improves not only the temperature simulations but also other variables to a certain degree, which suggests that parameter adjustment reduces the model errors from a holistic perspective and improves the simulation of multiple variables. It also indicates that this adjustment is reasonable.
However, different forecasting systems and models have different model errors, so the optimal values in this paper may not be suitable for other models. Furthermore, if the model configuration in this paper is modified, the optimal values may not remain valid and would need to be adjusted again for the new configuration. This paper only adjusts 5 common physical parameters for model tuning, and does not discuss whether other parameters have more significant impacts on the model simulations. In follow-up studies, we plan to adopt the methods of Wang et al. [22] to systematically select the model parameters to be adjusted. In addition, this work only assimilates temperature profiles, so the improvement of the temperature simulations and hindcasts is the most significant, while the improvement for other variables is relatively small. In future work, we plan to assimilate multi-source observations simultaneously in order to obtain better parameter values.

Figure 2. Flowchart of the adjustment to the parameter values.

Parameter sensitivity
Before adjusting the parameter values, it is necessary to confirm that changes of the parameter values can significantly influence the model simulations; otherwise the optimization cannot converge. To this end, we conduct five groups of experiments to evaluate the sensitivity of the model simulations to changes in the five parameters, i.e., the horizontal and vertical diffusion coefficients, the horizontal and vertical viscosity coefficients, and the bottom drag coefficient. Each group includes seven independent experiments, in which the given parameter takes perturbed values while the other parameters keep their original values, to isolate the contribution of the given parameter. For the total of 35 experiments with different parameter values, the model is integrated for one year from an identical IC.

Figure 3. Temporal (left) and spatial (right) distributions of the observations. Row 1 presents the observations assimilated in the adjustment experiments, while rows 2-4 present the observations used for validation in the freerun and hindcast experiments.

Figure 4. Time series of temperature RMSE when using the perturbed values of one given parameter. Because the model blew up with certain extreme values, the number of lines in (a), (c) and (e) is smaller than the number of cases in the legend.

Figure 5. Temporal (a) and vertical (b) distributions of the temperature RMSE in the assimilation window using the original (blue) and optimal (red) parameter values, and horizontal (c) distribution of RMSE differences, where blue (red) grids denote lower RMSEs using the optimal (original) values.

Figure 6. Vertical (a, b) and temporal (c-f) distributions of the RMSE in the freerun experiments using the original (blue) and optimal (red) parameter values. Thick lines in (c-f) show monthly-averaged RMSE.

Figure 7. Horizontal distributions of the RMSE differences in the freerun experiments, where blue (red) grids denote lower RMSEs when using the optimal (original) values.

Figure 8. Same as Figure 6, but for the hindcast experiments.

Figure 9. Same as Figure 7, but for the hindcast experiments.

Table 2. RMSE using the original and optimal values in the adjustment, freerun and hindcast experiments.