A generalized data assimilation architecture of digital twin for complex process industrial systems

As one of the critical cores of the digital twin (DT), data assimilation (DA) maintains consistency and synchronization between the DT and the physical system. Kalman filtering is a common DA method, but its estimation performance deteriorates under model inaccuracy and time-varying noise covariance in practical applications. The errors caused by these multiple uncertainties are all coupled into the measurements, which makes it harder for the DT to recover information about the physical system. To tackle the DA problem with multiple uncertainties, this paper proposes a generalized DA architecture for DT in complex process industries. First, by combining Stein variational gradient descent (SVGD) with the nonlinear Bayesian filtering paradigm, a recursive estimation framework is established that estimates the noise covariance more accurately than traditional methods. Second, to deal effectively with model inaccuracy using filtering residuals contaminated by time-varying noise, we propose a neural network and modified wavelet-based model error compensation (NNMW-MEC) block. Based on the modified wavelet technique, the filtering residual denoising built into NNMW-MEC copes with time-varying noise better than existing wavelet methods and smoothly extracts the low-frequency signal carrying model error information from the noisy residual. In addition, owing to its neural network-based state-compensation subblock, NNMW-MEC is better able to compensate state deviations with a large variation range. Finally, we take the boiler system of a coal-fired power plant as an example to verify the effectiveness of our architecture. Experimental results show that the proposed DA architecture improves the estimation performance of the DT under inaccurate models and uncertain noise statistics.


Introduction
Background and challenges. As an inevitable trend in social development, Industry 4.0 is the industrial intelligent revolution driven by information technologies such as artificial intelligence, the Internet of Things, and big data [1][2][3]. Among these technologies, the DT has been recognized by many researchers and practitioners as a critical tool for achieving intelligent manufacturing [4][5][6]. Michael Grieves first proposed the concept of the DT [7], and NASA later applied it to the aerospace field [8]. Since then, as the understanding of DT has gradually deepened, researchers have proposed numerous DT frameworks for various domains, such as manufacturing [9][10][11], energy [12][13][14], civil engineering [15] and healthcare [16]. According to the summary in [17], a DT can be generally described as follows: 'an equivalent mapping of the physical system in virtual space, updated through bidirectional data exchange to maintain consistency and synchronism between the physical and virtual systems.' Indubitably, if we want to successfully establish a DT and apply it to complex industries, ensuring consistency between the DT and the physical system is a necessary prerequisite. However, it is difficult for a DT to stay consistent with the physical system by only establishing a high-fidelity model, for the following reasons [18]: (1) because of modeling errors in the virtual model, the digital states are inconsistent with the physical states; (2) even if the established virtual model is accurate enough, the accumulation of calculation errors makes the digital states gradually deviate from the physical states; (3) unpredictable changes of time-varying parameters in the actual system may reduce the prediction accuracy of the original DT model; and (4) the degradation of actual system components may cause parameter drift, finally resulting in deviation between the two systems.
To maintain consistency with the true states, the DT needs to obtain real-time measurements from sensors to calibrate the digital states, which is exactly what DA can do [20]. In the general five-dimensional DT model (shown in figure 1) proposed by Thelen et al [19], updating the digital state from measurements (physical to virtual) is an important dimension of the DT framework. Therefore, DA is an indispensable part of the DT model.
Presently, DA methods can be divided into two categories: variational DA and Kalman filter (KF)-based DA [21, 22]. Variational DA defines a time window of specified size and optimally estimates all observed data and states within this window [23]. In contrast, KF-based DA is a recursive estimation method that achieves the optimal estimate of the next states based on the current states and the received observation data [24]. KF-based DA is therefore more suitable for DT, because it has a lower computational burden and better real-time performance [25]. However, uncertain factors such as unpredictable parameters and time-varying noise covariance in the actual system can degrade the estimation ability of the KF [26][27][28]. Take the boiler system of a coal-fired power plant as an example. On the one hand, owing to the complex working environment of industrial sites and the resulting electromagnetic interference, the noise statistics of most sensors are time-variant rather than constant [29, 30]. On the other hand, the net calorific value of coal and the specific heat capacity of flue gas are typical disturbance parameters [31, 32]: differences in coal quality and incomplete combustion can change these parameters [33, 34]. Once these unobservable parameters change, the prediction accuracy of the DT model inevitably suffers, corrupting subsequent control and optimization decisions to some extent [35, 36]. In addition, because of modeling and calculation errors, the DT model can hardly be identical to the real system [37]. Apparently, one of the main challenges in DT is the lack of a DA algorithm that handles multiple uncertainties well. Thus, further research on DA algorithms is needed for the DT to maintain satisfactory consistency with the physical system.

Literature review. As mentioned above, the common uncertainties in KF-based DA fall into two types: model uncertainty caused by time-varying parameters and modeling error, and uncertainty in noise
statistics. To deal with time-varying noise covariance, the adaptive KF (AKF) is a commonly adopted approach. An early solution that received attention and application is the Sage-Husa adaptive KF (SHAKF) [38]. Subsequently, Särkkä and Nummenmaa proposed the variational Bayesian AKF (VBAKF), which iteratively estimates the noise covariance using a simple family of exponential distributions to approximate the posterior distribution of the hidden states via mean-field theory [39]. This method improves on the estimation performance of SHAKF, but the mean-field assumption forcibly breaks the correlation among latent variables [40]. Meanwhile, the cumbersome derivation of the coordinate ascent variational inference (CAVI) method used in VBAKF is unfriendly to high-dimensional dynamic systems [41]. To overcome the limitations of VBAKF and simplify the derivation, the black box variational inference AKF (BBVIAKF) was proposed, which approximates gradients through sampling [42][43][44][45]. In particular, a Stein variational gradient descent (SVGD)-based Bayesian inference approach put forward recently has shown promising prospects in estimating target distributions [46, 47]. Similar to black box variational inference (BBVI), SVGD uses a prior distribution to generate a set of particles and then gradually drives these particles toward the target distribution, cleverly overcoming the flaws of VBAKF [48][49][50]. In fact, the results of Casey Chu in [51] show that BBVI with a specific kernel is precisely equivalent to SVGD. Although VBAKF and its modifications can handle noise uncertainty effectively, once the system model is inaccurate it is difficult for them to achieve satisfactory results [52].
For model uncertainty, the interacting multiple model AKF (IMMAKF) and the H∞ KF are commonly used methods [53, 54]. IMMAKF creates a model set describing the possible operating conditions and updates the probability of each model based on a model-matching likelihood function to form a suboptimal estimate [55]. That is, IMMAKF requires establishing a set of system models with different parameter values [56]. One drawback of this approach is that if the real system has many time-varying parameters with large variation ranges, the number of interacting models may grow exponentially, which is a disaster for IMMAKF [57]. As another way to deal with model uncertainty, the H∞ KF minimizes the H∞ norm of the estimation error in the worst-case scenario without making any assumptions about model uncertainty or noise statistics [58][59][60]. This method is specifically designed for robustness, but it is hard to guarantee that the H∞ performance parameter γ will satisfy the filter existence conditions at every step [28, 61]. With the popularization of deep learning, using deep neural networks (DNNs) to assist the KF in dealing with model uncertainty has become a new solution [62][63][64][65][66]. In particular, Guy Revach presented KalmanNet, a hybrid data-driven/model-based filter that does not require full knowledge of the underlying model parameters [67, 68]. Numerical experiments illustrate that KalmanNet can achieve near-optimal performance without full knowledge of the model parameters [69, 70]. Nevertheless, when facing multiple uncertain factors in practice, the estimation performance of these methods can be compromised.
To reflect changes in the physical system synchronously and accurately, the DA algorithm for DT needs to estimate the system state under multiple uncertainties. For the state estimation problem of nonlinear systems with time-varying model parameters and inaccurate noise covariance, an NN-VBACKF method is proposed in [71]. The simulation results demonstrate that NN-VBACKF achieves better estimation accuracy and robustness than VBAKF, but its analysis of the range of model uncertainty and of the resulting improvement is not sufficient. If significant changes or offsets of the system parameters arise, the estimation performance and robustness of this method require more intensive validation. More recently, Li W studied a robust particle filtering (PF) method for nonlinear systems with both additive time-varying uncertainty in the state transition equation and inaccurate process noise covariance [72], which exhibits enhanced robustness compared with conventional PF. Unfortunately, the uncertainty of some nonlinear systems may not be expressible in additive form. For the fusion filtering problem of networked multi-sensor systems with multiple uncertainties, some results are elaborated in [73] and [74]. Evidently, for the state estimation of physical systems with multiple uncertainties, the DA methods proposed in the existing limited literature can hardly meet the requirements of DT. That is to say, further research on DA algorithms suitable for maintaining DT consistency is still needed.
Motivation. Our purpose is to establish a universally applicable DA architecture that not only ensures consistency between the DT and the physical system under multiple uncertainties, but can also adopt different methods for different problems and requirements. Motivated by this, we propose in this paper a generalized DA architecture for DT in the complex process industry. The architecture includes two major blocks: the Stein variational adaptive nonlinear filtering (SVANF) block and the NNMW-MEC block. First, the role of SVANF is to estimate the noise covariance iteratively while predicting the states of the physical system. Considering that most physical systems are nonlinear, we establish a recursive estimation framework based on nonlinear filtering (NF) to analyze the latent states dynamically. Then, to obtain higher accuracy in noise covariance estimation, we combine SVGD with NF to form SVANF. Second, the purpose of NNMW-MEC is to handle model inaccuracy. It includes three subblocks: filtering residual denoising (FRD), neural network-based state compensation (NNSC) and the measurement network (MN). FRD is used to smoothly extract model error information from filtering residuals with time-varying noise. To obtain higher-quality model error information from the noisy residuals, we construct the FRD with an exponential smoothing and interval-dependent (ESID) wavelet denoising technique; compared with previous methods, it maintains good denoising performance under changing noise levels. The function of NNSC is to infer state errors from the denoised residuals and to compensate the SVANF estimation results in real time. This subblock can not only tackle state deviations with a large variation range but also retrain the network in a timely manner when the residual exceeds a threshold. The MN performs the same operation as the system measurement function: we train a network to replace the original measurement function so as to facilitate the parameter training of NNSC. In this way, the influence of the magnitude disparity among the parameters in the measurement equations is avoided during gradient backpropagation. Finally, the simulation results demonstrate that the proposed scheme can improve the DA performance for DT systems with inaccurate models and time-varying noise.

Contributions.
The main contributions of this paper lie in the following three aspects: (1) a generalized DA architecture for DT is established, which not only has better estimation accuracy and convergence performance than traditional schemes under an inaccurate model and uncertain noise covariance, but can also adopt different implementation algorithms for different problems and requirements; (2) to overcome the issue of an inaccurate system model, we propose the NNMW-MEC block: on the one hand, the model error information can be extracted more smoothly under time-varying noise; on the other hand, the estimation results of SVANF are compensated more effectively; and (3) the estimation performance under time-varying noise covariance is further improved by establishing a recursive estimation framework that combines SVGD and the NF paradigm.
The rest of this paper is organized as follows. Section 2 formulates the problem to be addressed. The DA architecture for the DT system is presented in section 3. Section 4 describes in detail the algorithms for each block in the DA framework. Simulation results illustrating the effectiveness of the proposed method are provided in section 5. Section 6 discusses conclusions and future work.

Problem formulation
Consider the following nonlinear discrete-time state-space model with an uncertain parameter and a time-varying measurement noise covariance matrix:

x_{k+1} = f(x_k, u_k, p^{var}_k) + w_k,
y_{k+1} = h(x_{k+1}) + v_{k+1},

where x_k ∈ R^n, u_k and y_{k+1} are the state, control input and measurement, w_k ∼ N(0, Q_k) is the process noise and v_{k+1} ∼ N(0, R_{k+1}) is the measurement noise. Based on the above assumptions, the values of the time-varying parameter p^{var}_k and the measurement noise covariance R_{k+1} are uncertain. In this problem, we assume that they are generated by an unknown exogenous dynamic system driven by the unknown disturbance factors d^p_k ∈ R^d and d^R_k ∈ R^r. We then consider a nonlinear discrete-time state-space model for the DT in which the time-varying parameter is replaced by its nominal value p̄^{var}_k obtained from offline identification. When the DT model is accurate, the filtering residual Δy_{k+1} is zero-mean, i.e. Δy_{k+1} ∼ N(0, Σ_{k+1}), where Σ_{k+1} is the residual covariance at time step k + 1. When the system model is inaccurate, an estimation bias occurs and the mean is no longer 0, which is written as Δy_{k+1} ∼ N(µ_{k+1}, Σ_{k+1}), where µ_{k+1} is the mean of the residual at time step k + 1.
To maintain consistency between the estimated state of the DT and the true state of the physical system, our objective in this paper is to minimize the mean squared error of the residual mean µ_{1:k+1} with respect to the weights W_sc and biases b_sc of the compensation network. For this optimization problem, three key issues need to be addressed: (1) how to better calculate the posterior probability density; (2) how to implement a state-compensation network; and (3) how to obtain the mean of the measurement residuals µ_{1:k+1}. Hence, in the next section, we propose a DA architecture that uses the SVANF, NNSC, and FRD blocks to solve these three problems, respectively.
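As a concrete illustration of this formulation, the following sketch simulates a hypothetical scalar system with the same structure: a nonlinear transition with a drifting parameter p_var and a measurement noise covariance that steps up over an interval. The functions f and h, the drift magnitude and the covariance values are assumptions made for demonstration only, not the paper's boiler model.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u, p):
    # Hypothetical nonlinear transition; p plays the role of p_var_k.
    return 0.9 * x + 0.1 * p * np.sin(x) + 0.05 * u

def h(x):
    # Hypothetical nonlinear measurement function.
    return x + 0.2 * x**2

T, Q = 200, 0.01
x, y = np.zeros(T), np.zeros(T)
p_var = 1.0
for k in range(T - 1):
    p_var += 0.001 * rng.standard_normal()       # unknown exogenous drift d_p_k
    R_k = 0.04 if 80 <= k < 120 else 0.01        # time-varying measurement covariance
    x[k + 1] = f(x[k], 1.0, p_var) + np.sqrt(Q) * rng.standard_normal()
    y[k + 1] = h(x[k + 1]) + np.sqrt(R_k) * rng.standard_normal()
```

The DA task studied in this paper is exactly the inverse of this generator: recovering x and R from y alone.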

DA architecture for DT
To maintain consistency between the DT and the physical system under multiple uncertainties, this section presents a DA architecture for DT comprising SVANF and NNMW-MEC, as shown in figure 2. First, the DT system constructs a virtual model of the physical plant to generate prediction data. Combining the predicted and measured data, SVANF calculates the estimation results for the next time step; meanwhile, the noise covariance of the system is estimated iteratively with the SVGD method. Then, the filtering residual is formed in NNMW-MEC from the measured data and the predicted observations. Before the residual is fed into NNSC, FRD is used to eliminate the time-varying noise and obtain the mean value of the residual. The compensated data are the final estimate and serve as the input of the DT model for the next step. Finally, through the calibration performed by SVANF and NNMW-MEC, consistency between the DT and the physical plant can be maintained under multiple uncertainties. The remainder of this section describes each block in detail.
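The data flow just described can be summarized as one loop per time step. The sketch below is purely schematic: the four classes are placeholder stubs standing in for the blocks (their real implementations are derived in section 4), and the numeric choices inside them are arbitrary.

```python
class DTModel:
    def predict(self, x, u):
        x_pred = 0.9 * x + 0.1 * u            # virtual-model prediction
        return x_pred, x_pred                 # predicted state and observation (identity h)

class SVANF:
    def update(self, x_pred, y):              # stub: blend prediction and measurement,
        return 0.5 * x_pred + 0.5 * y, 0.01   # return state estimate and covariance

class FRD:
    def denoise(self, r):                     # stub for wavelet residual denoising
        return r

class NNSC:
    def compensate(self, mu):                 # stub for the compensation network
        return 0.5 * mu

def da_step(dt, svanf, frd, nnsc, x_prev, u, y_meas):
    """One DA cycle: predict -> estimate -> denoise residual -> compensate."""
    x_pred, y_pred = dt.predict(x_prev, u)
    x_est, R_est = svanf.update(x_pred, y_meas)
    mu = frd.denoise(y_meas - y_pred)             # residual mean from FRD
    return x_est + nnsc.compensate(mu), R_est     # compensated state feeds the DT
```

The compensated state returned here is what the architecture feeds back into the DT model at the next step.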
SVANF: SVANF is mainly responsible for calculating the posterior probability density. Based on the nonlinear Bayesian filtering paradigm, a recursive estimation algorithm for complex systems is established. The SVGD method is then used to estimate the unknown noise covariance more accurately: it continuously iterates and updates a set of particles to gradually approximate the target distribution of the covariance, which gives better estimation performance than traditional schemes.
FRD: in practice, the only information we can use to analyze system model errors is the filtering residuals, which contain noise that may even be time-varying. The purpose of FRD is to extract the error information from the filtering residuals effectively; that is, this block is used to obtain the mean of the residuals. As an outstanding technique for signal decomposition and denoising, wavelets can extract low-frequency error information from noisy data. Compared with existing methods, the ESID wavelet denoising built in this paper copes better with residual signals carrying time-varying noise, which suits our scenario well. Moreover, this block improves the prediction accuracy of NNSC, thereby enhancing the stability of the whole system.
NNSC: the function of NNSC is to infer state errors from the denoised residuals and to refine the estimated states of SVANF before they serve as the input of the DT model for the next step. In this paper, we use neural networks to build the state error compensation block. It can not only tackle state deviations with a large variation range but also retrain the network in a timely manner when the residual exceeds the threshold.
MN: if the original measurement equation is too complex, the magnitude disparity among its parameters and the difficult derivation process will seriously degrade the optimization performance and convergence speed of NNSC. Therefore, under the assumption that the measurement equation remains unchanged, we use a DNN to fit the measurement equation through offline training. Performing the same operation as the measurement function, the MN makes the parameters of NNSC converge faster.
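The offline fitting of MN can be sketched as follows: a small feedforward network is trained by plain full-batch gradient descent to mimic a stand-in measurement function (here sin(x) + 0.5x, an assumption for illustration; the paper fits the boiler measurement equations instead).

```python
import numpy as np

rng = np.random.default_rng(1)

def h_meas(x):                 # stand-in for the original measurement equation
    return np.sin(x) + 0.5 * x

# Offline training data generated from the (assumed unchanged) measurement equation.
X = rng.uniform(-2.0, 2.0, size=(512, 1))
Y = h_meas(X)

# One-hidden-layer tanh network, trained with full-batch gradient descent on MSE.
W1 = 0.5 * rng.standard_normal((1, 32)); b1 = np.zeros(32)
W2 = 0.5 * rng.standard_normal((32, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)            # forward pass
    P = H @ W2 + b2
    G = 2.0 * (P - Y) / len(X)          # dMSE/dP
    GH = (G @ W2.T) * (1.0 - H**2)      # backprop through tanh
    W2 -= lr * (H.T @ G);  b2 -= lr * G.sum(0)
    W1 -= lr * (X.T @ GH); b1 -= lr * GH.sum(0)

def mn(x):                              # the trained measurement network
    return np.tanh(x @ W1 + b1) @ W2 + b2
```

Once trained, mn is frozen and only NNSC's weights are updated online, so backpropagation never has to pass through the original, badly scaled measurement equations.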
It should be emphasized that we mainly propose a generalized DA architecture for DT systems to tackle the state estimation problem under time-varying parameters and inaccurate noise covariance. The specific implementation of each block in the architecture can be modified for different problems or plants. For systems with linear or weakly nonlinear dynamics, we can directly use the KF or the extended KF (EKF), while for highly nonlinear systems we should use the cubature KF (CKF) or the unscented KF (UKF). Meanwhile, FRD can be replaced flexibly according to the noise level and the type of plant. For example, if the measurement accuracy of the sensors is high and the noise contained in the filtering residual is almost negligible, then we can use ordinary wavelet methods or even remove FRD entirely. Since the architecture is mainly proposed for the process industry, which has large inertia and relatively slow state changes, it is reasonable to assume that the low-frequency part of the filtering residual is caused by model inaccuracy, while the high-frequency part is random noise; naturally, we can use FRD to extract the low-frequency component of the residual smoothly. However, for targets with high maneuverability, it is necessary to consider whether FRD will cause loss of model error information. For NNSC, a convolutional neural network (CNN), a recurrent neural network (RNN) or another suitable network can be selected based on specific demands such as feature extraction or sequence prediction. Briefly speaking, as long as the requirements of the corresponding block in the architecture are met, the implementation algorithm can be chosen freely for a specific system. Furthermore, our architecture can be applied not only to process industries represented by coal-fired power plants, but also to discrete manufacturing, with one prerequisite: the state used to describe the specific problem in discrete manufacturing must be defined in real space, i.e. it must satisfy x ∈ R^n. Detailed studies on the application of DA in discrete manufacturing can be found in [75][76][77]. In the next section, we derive the specific implementation algorithms for each block in the architecture.

Algorithm implementation of each block in the architecture
In this section, the specific algorithm implementation for each block in the DA architecture is described in detail; the overall scheme is shown in figure 3. In addition, the convergence of the whole DA system is analyzed.

SVANF
In this subsection, a generalized adaptive NF (ANF) framework based on SVGD is established. On the one hand, this method can estimate the state of nonlinear systems under time-varying noise more accurately; on the other hand, the general framework can be combined with any KF algorithm, such as the EKF, UKF and CKF. In figure 3, the input of SVANF includes not only the corrected state, control signal and measurement but also the estimated covariance. Besides the estimated state, the output also carries the gradient of the logarithmic probability density function, which is used to update the covariance distribution. Based on this gradient information, the SVGD algorithm iteratively updates the particles generated from the prior distribution to approximate the target distribution. Naturally, the updated particles are used to estimate the posterior distribution of the covariance, and a new set of particles is then generated from this distribution for the SVGD calculation at the next time step. In this paper, we only analyze the case of a time-varying measurement covariance; that is, the measurement noise covariance R_{k+1} of the sensors is time-varying due to electromagnetic interference, while Q_k is constant. Then, the joint posterior distribution of the nonlinear DT model described in equation (3) with the random variable R_{k+1} can be written via Bayes' rule, where p(y_{k+1}|y_{1:k}) is the normalization constant, which is difficult to calculate, and p(x_{k+1}, R_{k+1}|y_{1:k}) and p(y_{k+1}|x_{k+1}, R_{k+1}) are the joint prior density and the likelihood density, respectively. Because x_{k+1} and R_{k+1} are independent of each other, p(x_{k+1}, R_{k+1}|y_{1:k}) can be factorized as p(x_{k+1}|y_{1:k}) p(R_{k+1}|y_{1:k}). The first term on the right in equation (8) can be derived through Bayesian filtering. In the Bayesian filtering paradigm, the posterior density of the state is updated iteratively through two basic steps: time update and measurement update [78]. We only provide the final equations of Bayesian filtering in algorithm 1.
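The two-step recursion can be sketched in its EKF form; this is a scalar sketch with hypothetical f and h, whereas algorithm 1 states the general nonlinear (e.g. cubature) form.

```python
def ekf_step(x, P, u, y, f, F_jac, h, H_jac, Q, R):
    """One Bayesian-filtering cycle (scalar EKF form): time update, then
    measurement update with innovation covariance S (the P_zz of the text)."""
    # Time update (prediction)
    x_pred = f(x, u)
    Fk = F_jac(x, u)
    P_pred = Fk * P * Fk + Q
    # Measurement update (correction)
    Hk = H_jac(x_pred)
    S = Hk * P_pred * Hk + R          # innovation covariance
    K = P_pred * Hk / S               # Kalman gain
    x_new = x_pred + K * (y - h(x_pred))
    P_new = (1.0 - K * Hk) * P_pred
    return x_new, P_new

# One cycle on a hypothetical near-linear scalar system, starting from x = 0, P = 1.
x1, P1 = ekf_step(0.0, 1.0, 0.0, 1.0,
                  f=lambda x, u: 0.9 * x + u, F_jac=lambda x, u: 0.9,
                  h=lambda x: x, H_jac=lambda x: 1.0, Q=0.01, R=0.1)
```

After the update, the posterior covariance shrinks relative to the prediction, reflecting the information gained from the measurement.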
Specific details can be found in [78]. The second term on the right in equation (8) is the prior distribution of the measurement covariance at time k + 1. The inverse-gamma distribution with shape parameter α and scale parameter β is usually used as the conjugate prior of the Gaussian distribution to approximate the target distribution of the covariance. Because x_{k+1} and R_{k+1} are unrelated, R_{k+1} can temporarily be regarded as a constant in the current measurement update step of Bayesian filtering. Then, based on ŷ_{k+1|k} and P_{zz,k+1|k}, we can update R_{k+1} online with the SVGD method. The main idea of SVGD is to approach the target distribution gradually with a set of particles {R^0_{k+1,i}}^N_{i=1} initialized from a prior distribution. Using the gradient information obtained by minimizing the Kullback-Leibler divergence between the posterior and prior densities, the particles are driven iteratively toward the target distribution [79]:

R^{t+1}_{k+1,i} = R^t_{k+1,i} + ε φ(R^t_{k+1,i}),
φ(R) = (1/N) Σ^N_{j=1} [ker(R^t_{k+1,j}, R) ∇_{R^t_{k+1,j}} log p(R^t_{k+1,j} | y_{1:k+1}) + ∇_{R^t_{k+1,j}} ker(R^t_{k+1,j}, R)],

where N is the total number of particles, t is the iteration index, ε is the learning rate, ker(•, •) is an RBF kernel with bandwidth h typically set from med(•), the median of the pairwise Euclidean distances. Notice that the SVGD update involves only gradient information of the target distribution, avoiding the tedious calculation of the normalization constant. In algorithm 1, we list the general forms of the numerical calculation method: it usually uses a set of points {x*_s}^S_{s=1} to pass through the system and measurement equations, and then computes the weighted mean of the propagated results to complete the state estimation.
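The point-propagation step can be sketched with the third-degree cubature rule, which for an n-dimensional Gaussian uses 2n equally weighted points (the cubature rule is one option; the text allows any sigma-point scheme).

```python
import numpy as np

def cubature_points(mean, cov):
    """2n third-degree spherical-radial cubature points for N(mean, cov)."""
    n = len(mean)
    L = np.linalg.cholesky(cov)
    return np.vstack([mean + np.sqrt(n) * L[:, i] for i in range(n)] +
                     [mean - np.sqrt(n) * L[:, i] for i in range(n)])

def propagate(func, mean, cov):
    """Approximate E[func(x)] for x ~ N(mean, cov) by averaging propagated points."""
    pts = cubature_points(mean, cov)
    return np.array([func(p) for p in pts]).mean(axis=0)

# For x ~ N(0, 1) the rule reproduces E[x^2] = 1 exactly.
m = propagate(lambda x: x**2, np.zeros(1), np.eye(1))
```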
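The particle update of SVGD described above can be sketched as below. For clarity, the target here is a one-dimensional Gaussian stand-in for the covariance posterior (in the paper the particles represent R_{k+1} and the gradient comes from the filtering densities); the RBF kernel uses the median-distance bandwidth heuristic.

```python
import numpy as np

def rbf_kernel(x):
    """RBF kernel matrix K and the gradients d k(x_j, x_i)/d x_j (1-D particles)."""
    d2 = (x[:, None] - x[None, :])**2                 # pairwise squared distances
    h = np.median(d2) / np.log(len(x) + 1) + 1e-12    # median bandwidth heuristic
    K = np.exp(-d2 / h)
    gK = -2.0 * (x[:, None] - x[None, :]) / h * K     # gK[j, i] = d k(x_j, x_i)/d x_j
    return K, gK

def svgd(grad_logp, particles, eps=0.05, iters=300):
    """Drive particles toward the target density along the Stein variational direction."""
    x = particles.copy()
    n = len(x)
    for _ in range(iters):
        K, gK = rbf_kernel(x)
        # phi_i = (1/n) sum_j [k(x_j, x_i) grad log p(x_j) + d k(x_j, x_i)/d x_j]
        phi = (K @ grad_logp(x) + gK.sum(axis=0)) / n
        x = x + eps * phi
    return x

# Toy target N(2, 0.5^2): grad log p(r) = -(r - 2) / 0.25.
r0 = np.random.default_rng(2).normal(0.0, 1.0, size=50)
r = svgd(lambda v: -(v - 2.0) / 0.25, r0)
```

The kernel term spreads the particles apart while the gradient term pulls them toward high-density regions, so the final set approximates the whole posterior rather than collapsing to its mode.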

FRD
The reason for adding FRD to the architecture is to obtain the mean of the filtering residual. Wavelet analysis is an excellent method for signal transformation and denoising. It represents the original residual signal Δy^w_k by a weighted sum of wavelet coefficients [80]. The wavelet coefficients exist at different scales corresponding to different degrees of approximation to the original data [81, 82]. For the residual signal, this means that the lower frequencies, which usually carry useful information about the state error, are represented by a small number of larger-scale coefficients, while the high-frequency noise is represented by a large number of small coefficients at the finer scales. Hence, the main idea of wavelet denoising is to selectively eliminate the small wavelet coefficients through a thresholding trick. To obtain high-quality error data from the residuals, this paper proposes the ESID denoising method, which has two advantages. First, wavelet denoising based on interval-dependent thresholding calculates the mean of the residuals better under time-varying noise: when the noise covariance is uncertain, this method still retains good noise reduction capacity. Second, exponential smoothing is employed to suppress occasional vibrations remaining after wavelet denoising. ESID comprises four steps: (1) calculate the wavelet coefficients as C_w = W_w · Δy^w_k, where W_w is the transform matrix; (2) threshold the wavelet coefficients using soft thresholding [83] within each interval, D_w = sign(C_w) max(|C_w| − T_w, 0), where T_w is the interval-dependent threshold; (3) estimate the wavelet-denoised signal by applying the inverse transform to the modified coefficients, Δŷ^w_k = W^{−1}_w · D_w; and (4) compute the output denoised signal with the exponential smoothing method, Δȳ^w_k = γ Δŷ^w_k + (1 − γ) Δȳ^w_{k−1}, where γ ∈ (0, 1] is the weight parameter.
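A simplified one-level version of these four steps is sketched below (Haar transform, interval-dependent universal thresholds estimated per segment, then exponential smoothing). The segment count, threshold rule and γ are illustrative assumptions; the paper's multi-level construction differs in detail.

```python
import numpy as np

def haar_decompose(sig):
    """One-level Haar transform: approximation and detail coefficients."""
    a = (sig[0::2] + sig[1::2]) / np.sqrt(2.0)
    d = (sig[0::2] - sig[1::2]) / np.sqrt(2.0)
    return a, d

def haar_reconstruct(a, d):
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2.0)
    out[1::2] = (a - d) / np.sqrt(2.0)
    return out

def soft_threshold(c, t):
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def esid_denoise(residual, n_intervals=4, gamma=0.3):
    """Interval-dependent soft thresholding of the detail band, then
    exponential smoothing of the reconstruction (one-level sketch)."""
    a, d = haar_decompose(residual)
    for seg in np.array_split(np.arange(len(d)), n_intervals):
        sigma = np.median(np.abs(d[seg])) / 0.6745           # local noise level
        t = sigma * np.sqrt(2.0 * np.log(max(len(seg), 2)))  # universal threshold
        d[seg] = soft_threshold(d[seg], t)
    rec = haar_reconstruct(a, d)
    out = np.empty_like(rec)
    out[0] = rec[0]
    for k in range(1, len(rec)):                             # exponential smoothing
        out[k] = gamma * rec[k] + (1.0 - gamma) * out[k - 1]
    return out
```

Estimating the threshold per interval is what lets the scheme track a noise level that changes partway through the window, which a single global threshold cannot do.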

NNSC and MN
The NNSC block provides state compensation in real time according to the residual mean calculated by FRD. Owing to model inaccuracy and the accumulation of computational errors, the estimated state x̂_{k+1|k+1} of SVANF will gradually deviate from the true value. Thus, the function of NNSC is to infer a reasonable state deviation Δx_{k+1} from the extracted observation error Δŷ_k, and then form the compensated state x̆_{k+1|k+1}. Generally speaking, the network used to build NNSC can be of any type, such as a multi-layer feedforward neural network (MLFNN), a CNN or an RNN. In this paper, considering the real-time requirement of the DT system, we choose an MLFNN, which has a smaller computational burden and training cost, to form the NNSC. Additionally, updating the weight parameters depends on the setting of the threshold T_n: when the measurement error Δŷ_k exceeds the threshold, the weights are trained online until the error falls below the threshold again. The purpose of approximating the measurement function with the MN is to improve the convergence speed during the training of NNSC. The magnitudes of the parameters in the original measurement equations may differ greatly, which degrades the gradient optimization path and slows down convergence; moreover, complex derivative calculations may hinder parameter training. To avoid these issues, we use an MLFNN to fit the measurement equations. Under the assumption that the measurement equations are unchanged, we can use historical data for offline training. The weights in the MN are regarded as constant while those of NNSC are updated online. Therefore, we can reasonably suppose that the predicted observation y^p_k of the fully trained network is extremely close to that of the original equations. Finally, the full computational process of the DA architecture is shown in algorithm 2.
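A minimal scalar version of the threshold-triggered online training can be sketched as follows. Here the MN is a stand-in tanh function, the compensator has a single hidden unit, and the loss drives the MN output of the compensated state toward the measurement; all sizes and rates are illustrative assumptions.

```python
import numpy as np

def mn(x):                      # stand-in for the trained measurement network
    return np.tanh(x)

class NNSC:
    """Single-hidden-unit compensator, retrained online only when the
    residual mean exceeds the threshold T_n (schematic version)."""
    def __init__(self, lr=0.5, t_n=0.05):
        self.v, self.w = 1.0, 0.0
        self.lr, self.t_n = lr, t_n

    def compensate(self, mu):
        return self.w * np.tanh(self.v * mu)

    def update(self, x_est, mu, y_meas, steps=300):
        if abs(mu) <= self.t_n:                  # below threshold: keep weights
            return
        for _ in range(steps):
            hidden = np.tanh(self.v * mu)
            x_c = x_est + self.w * hidden
            err = mn(x_c) - y_meas               # drive predicted observation to y
            g_xc = err * (1.0 - np.tanh(x_c)**2) # chain rule through mn
            g_w = g_xc * hidden
            g_v = g_xc * self.w * (1.0 - hidden**2) * mu
            self.w -= self.lr * g_w
            self.v -= self.lr * g_v
```

Because the gradient flows through the smooth surrogate mn rather than the original measurement equations, the online updates stay well scaled.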
As a generalized architecture for handling multiple uncertainties in the process industry, appropriate algorithms can be designed for each block based on the specific physical plant. In the next subsection, we analyze the convergence of the DA architecture.

Convergence analysis
DA needs to ensure that the digital state of the DT is consistent with the physical state; therefore, we should analyze the convergence of DA. As described in section 2, our goal is to minimize the residual mean µ_{1:k+1}. We first give two lemmas before stating our theorem.
Lemma 1 (Weierstrass approximation theorem) [84]: for any continuous function G on a closed interval and any given ε > 0, there exists an algebraic polynomial P such that |G(x) − P(x)| < ε for every x in the interval. Lemma 1 indicates that continuous functions on closed intervals can be uniformly approximated by polynomials, and it plays a role analogous to the universal approximation theorem for neural networks. For a fully trained neural network, i.e. one with appropriate parameters W_N and b_N, the deviation between the output and the target value is bounded, meaning that the output will eventually converge to a neighborhood of the target value.
Lemma 2 (smoothing theorem) [80]: for an unknown function sequence {µ(t_i)} observed with additive noise in the form Δy^w_{k+1}(t_i) = µ(t_i) + σξ(t_i), where ξ ∼ N(0, 1) is standard Gaussian white noise, σ is the noise level, t_i = i/N_w and N_w is the sample number, there are universal constants π_n with π_n → 1, and constants C_w(F_w, ψ_w) depending on the function space F_w[0, 1] ∈ S_w and on the wavelet basis ψ_w, such that the wavelet-thresholded estimate {µ̂(t_i)} is, with probability Pr{•} tending to 1, at least as smooth as µ in every smoothness space F_w taken from the scale S_w. Thus, if µ is bounded, then µ̂ is also bounded and there exists a maximum estimation error satisfying max_k ∥µ̂_k − µ_k∥ ⩽ M_w. Matrices composed of multi-dimensional state vectors can be decomposed into function sequences by row, and the wavelet denoising result in each dimension satisfies equation (18). Then, based on lemmas 1 and 2, we have theorem 1 as follows.
Theorem 1: for a given constant J and estimated residual mean µ̂_k, the established state error-compensation model F_NNSC and measurement model F_MN guarantee that the residual mean is bounded, i.e. max_k ∥µ_k − (F_MN ∘ F_NNSC)(Δŷ_k)∥ ⩽ J, where W_sc, b_sc and W_mn, b_mn are the weights and biases of F_NNSC and F_MN, respectively.
Proof: by the triangle inequality, the left-hand side of equation (19) can be bounded as

max_k ∥µ_k − (F_MN ∘ F_NNSC)(Δŷ_k)∥ ⩽ max_k ∥µ̂_k − (F_MN ∘ F_NNSC)(Δŷ_k)∥ + max_k ∥µ_k − µ̂_k∥.

According to lemma 1, the nonlinear function composed of F_NNSC and F_MN can, with suitable parameters after sufficient training, uniformly approximate the estimated mean µ̂_{k+1}, which is used as the target value; the first term on the right is therefore bounded by ε. Substituting the bound of equation (17) for the second term gives max_k ∥µ_k − (F_MN ∘ F_NNSC)(Δŷ_k)∥ ⩽ ε + M_w. Letting J = ε + M_w proves theorem 1. As long as the parameters of F_NNSC and F_MN, the wavelet function and the threshold are selected appropriately, ε + M_w can be made small enough and tends gradually to 0. In short, the DA architecture in this paper guarantees the convergence of the estimation results under multiple uncertainties: the estimates gradually approximate the true values, and the error remains bounded. In the next section, we further demonstrate the effectiveness of DA through numerical experiments.

Experimental results and analysis
In this section, we evaluate the accuracy and robustness of the proposed DA architecture through specific experimental cases. As a typical process-industry system, the boiler system of a coal-fired power plant is chosen as the test object. The experiments are implemented in MATLAB 2022a. In section 5.1, we compare the estimation results of the proposed method with those of existing methods; some performance investigations of the proposed method are then presented in section 5.2.

Experimental results
To evaluate the effectiveness of the proposed method, we compare it with existing methods, including CKF- and VBACKF-based schemes.
The discrete-time state-space DT model of the boiler system is defined by the system, control and observation matrices A, B and C, respectively. The physical meanings of the parameters are given in the nomenclature, and the parameter values of the DT model are listed in appendix A1. In the boiler system, pulverized coal mixes with the secondary air and heats the water wall through combustion. This system is chosen because its working process is influenced by various complex time-varying parameters and noise. As one of the most typical and closely watched time-varying disturbance parameters, the net calorific value Q_net changes with coal type and composition, which seriously affects the estimation accuracy of the gas temperature and oxygen content. Even worse, obtaining accurate measurements of Q_net through sensors is expensive and impractical. Similarly, the air density ρ_a and the specific heat capacity of gas C_gs are time-variant due to factors such as temperature and gas composition, and treating them as constants likewise reduces the predictive performance of the model. Therefore, Q_net, C_gs and ρ_a are selected as the time-varying parameters in our experiment. First, we assume that the nominal values of these three parameters in the DT model are exactly equal to the corresponding physical parameters. Then, owing to disturbance factors, the three parameters in the physical system change gradually after 1000 s from the initial values Q⁰_net = 20.899 MJ kg⁻¹, C⁰_gs = 2.03 kJ (kg·°C)⁻¹ and ρ⁰_a = 1.273 kg m⁻³. Meanwhile, the covariance of the process noise remains unchanged, while that of the measurement noise undergoes a step change between 2000 s and 3000 s, as described by equation (25), owing to electromagnetic interference or sensor failure. For FRD, the receding window length of ESID wavelet denoising is set to 400. The wavelet basis function and decomposition level required in the
MATLAB wavelet toolbox are set to db1 and 9, respectively. Moreover, the exponential smoothing weight γ is 0.9. To quantitatively evaluate the filtering performance, the simulation results are averaged over MC = 100 Monte Carlo runs, and the average root mean square error (RMSE) over all states is used as the benchmark for comparing the different algorithms, where T_da is the length of the time sequence, agm denotes the corresponding algorithm, and x̂^mc_{i,k} and x^mc_{i,k} are the estimated and true values of the ith state in the mcth Monte Carlo run, respectively.
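The RMSE expression itself is elided in the extracted text; under the standard reading of the definitions above, the benchmark can be sketched in pure Python as follows (the function name and argument layout are our assumptions, not the paper's code):

```python
import math

def average_rmse(x_est, x_true):
    """Average RMSE over all states: for each state i, compute the RMSE across
    all MC Monte Carlo runs and T_da time steps, then average over the states.
    Both arguments are nested lists indexed as [mc][i][k]."""
    MC, n, T_da = len(x_est), len(x_est[0]), len(x_est[0][0])
    total = 0.0
    for i in range(n):
        sq_err = sum((x_est[mc][i][k] - x_true[mc][i][k]) ** 2
                     for mc in range(MC) for k in range(T_da))
        total += math.sqrt(sq_err / (MC * T_da))
    return total / n
```

A perfect estimator yields an average RMSE of exactly zero, and a constant estimation error of e in every state yields e.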
The simulation results of the states obtained by the CKF, SVACKF, SVGD-based DA (named NNW-SVACKF) and VB-based DA (named NNW-VBACKF) schemes are shown in figures 4 and 5. The upper subgraphs depict the global trends for all methods, and the lower subgraphs show the details of the curves. The gas density ρ_gs is a latent variable, so it has no observation curve; the result for the gas flow V_gs is not displayed because it is unaffected by the parameter changes. As mentioned earlier, the system parameters change gradually after 1000 s. Compared to CKF and SVACKF, NNW-SVACKF and NNW-VBACKF compensate promptly when a small deviation appears in the state, making the estimation results converge gradually to the true values. The estimation error of SVACKF is the largest because it cannot evaluate model accuracy: SVACKF assumes that the DT model describes the real system almost exactly and that the observation errors are caused by time-varying noise. Consequently, SVACKF attributes all errors to changes in variance, and its state estimates remain nearly equal to those before the parameters changed. To confirm this, we present the variance estimation results for each observation of SVACKF in figure 6; after attributing all errors to the noise variance, the estimated variances deviate significantly from the corresponding true values set in equation (25). Because CKF can neither predict the noise variance nor compensate the state error, its estimation results are also unsatisfactory. In addition, between 2000 s and 3000 s, when the noise variances undergo a step change, the estimation results of NNW-SVACKF are less affected than those of NNW-VBACKF. The results in table 1 indicate that the average RMSE of NNW-SVACKF is the lowest, more than 39% smaller than that of NNW-VBACKF. Furthermore, the RMSEs of these two
methods are only 2.2% and 6.7% of those of SVACKF and CKF, respectively. Clearly, NNW-SVACKF copes better with state deviations caused by model inaccuracy and uncertain noise. These experimental results show that, compared with existing methods, the proposed DA architecture can maintain the consistency between the DT and the physical system even when the physical system encounters parameter changes and variance mutations simultaneously. In other words, the proposed DA architecture effectively handles time-varying parameters and inaccurate noise covariance, and can meet the requirements of DT. In the next subsection, we further explore the performance of DA under different methods, parameters, and variance change frequencies.

Performance analysis
In this subsection, we further test the robustness of the DA system under different methods, parameters, and variance change frequencies. First, the impact of the FRD block, implemented with different methods, on the filtering results of the whole system is analyzed. We then compare the filtering results under different weights of the exponential smoothing trick. Finally, to test the tracking ability of the DA system for variance signals, we provide estimation results for noise with sinusoidal variance at different frequencies.

The effect of the FRD block with different methods.
To analyze the impact of FRD on DA stability and convergence, we compare the single-test estimation results of three schemes, FRD-free, wavelet denoising-based FRD (WD-FRD), and ESID WD-FRD, as shown in figures 7 and 8. Without the FRD to pre-eliminate the noise in the filtering residuals, the stability and convergence of the estimation results are unsatisfactory. After selecting ESIDWD as the implementation of the FRD, we further compare the influence of different exponential smoothing weights γ on the estimation results, as shown in figures 9 and 10. We simulate the single-test results under weights of 0.6, 0.8, 0.9 and 0.99. The results show that a weight between 0.8 and 0.9 is appropriate. If the weight is less than 0.8, the estimation results can still converge gradually, but they oscillate significantly when the covariance changes. Conversely, if the weight is set too large, the smoothness of the estimation results and the stability of the DA system are ensured, but real-time responsiveness is compromised. The average RMSEs for the different weights are shown in table 3; the RMSEs for weights between 0.8 and 0.9 are comparatively small. Therefore, the default value of γ is set to 0.85. The parameter values are shown in appendix A2.
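The ESIDWD procedure itself is not reproduced here. As a rough illustration of the two ingredients the FRD combines (windowed wavelet thresholding of the residuals, followed by exponential smoothing with weight γ), the following pure-Python sketch uses a plain Haar (db1) transform with a universal soft threshold; the function names and the threshold rule are our assumptions, not the paper's ESIDWD.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar (db1) analysis transform."""
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def haar_inverse(a, d):
    """Inverse of one Haar analysis level."""
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) / math.sqrt(2), (ai - di) / math.sqrt(2)])
    return x

def wavelet_denoise(window, levels, sigma):
    """Soft-threshold the detail coefficients with the universal threshold
    sigma * sqrt(2 ln N); the window length must be divisible by 2**levels."""
    thr = sigma * math.sqrt(2.0 * math.log(len(window)))
    a, details = list(window), []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append([math.copysign(max(abs(c) - thr, 0.0), c) for c in d])
    for d in reversed(details):
        a = haar_inverse(a, d)
    return a

def smooth_residual_mean(residuals, gamma=0.85):
    """Exponential smoothing: mu_k = gamma * mu_{k-1} + (1 - gamma) * r_k."""
    mu, out = residuals[0], [residuals[0]]
    for r in residuals[1:]:
        mu = gamma * mu + (1.0 - gamma) * r
        out.append(mu)
    return out
```

With the 400-sample receding window used in the experiments, up to four Haar levels keep the window length divisible by 2^levels (400 = 2^4 x 25); the db1/level-9 setting reported above would require padding the window to a dyadic length. Larger γ gives smoother residual means at the cost of slower reaction, matching the trade-off observed in figures 9 and 10.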

The tracking ability of SVANF for noise variance.
In this subsection, we investigate the tracking ability of the SVANF block for noise variance under different frequencies. In practice, the noise variances may change with some frequency, and if the transition rate of the noise variance is high enough, the estimated variance can no longer track the true value in real time. Exploring the frequency limit that the DA system can track is therefore meaningful. We provide the tracking results of the DA system for noise with sinusoidal variance under different frequencies, as shown in figure 11; the parameter values are shown in appendix A2. The covariance of the measurement noise undergoes a sinusoidal change between 1500 s and 3500 s. The frequency ω_s is set to 0.02, 0.05, 0.08, 0.12, 0.15 and 0.18, respectively. Since the tracking results for each noise variance are similar, to save space we only present the estimation results for the gas oxygen content O_cp in figure 11. Based on the average RMSEs of the states in table 5, a high changing frequency of the noise variance only slightly affects the state estimation accuracy of the proposed architecture. Therefore, a high noise-variance frequency significantly affects the variance estimation results of SVANF, but does not markedly affect the state estimation results of DA.
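The sinusoidal covariance expression is garbled in the extracted text; a minimal sketch of such a time profile, with illustrative nominal variance and amplitude values that are our assumptions rather than the paper's settings, is:

```python
import math
import random

def sinusoidal_variance(k, r0=1.0, amp=0.5, omega=0.05, start=1500, stop=3500):
    """Measurement-noise variance that oscillates sinusoidally between start
    and stop and equals the nominal value r0 elsewhere; amp < 1 keeps it positive."""
    if start <= k < stop:
        return r0 * (1.0 + amp * math.sin(omega * (k - start)))
    return r0

def measurement_noise(k, rng):
    """Draw one zero-mean noise sample with the time-varying variance."""
    return rng.gauss(0.0, math.sqrt(sinusoidal_variance(k)))

# Example: rng = random.Random(0); v = measurement_noise(1600, rng)
```

Raising omega shortens the oscillation period, which is the mechanism by which the tracking experiment stresses SVANF: beyond some frequency, the estimated variance lags the true one.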

Conclusion
In this paper, a generalized DA architecture of DT for sophisticated process industries with high nonlinearity and multiple uncertainties is proposed, comprising SVANF, FRD, NNSC and MN. The experimental results show that, when the parameters and the noise covariance are both time-varying, the proposed NNW-SVACKF assimilation architecture reduces prediction errors by 39%, 97.8%, and 93.3% compared to NNW-VBACKF, SVACKF, and CKF, respectively. We also analyze the performance of the DA system under different methods, parameters, and signal frequencies. First, the results of using different FRD blocks show that the estimation accuracy of ESIDWD-FRD improves by 70% and 46% over the FRD-free and WD-FRD schemes, respectively. Second, we set the default value of the exponential smoothing weight to 0.85 because the RMSE of the system is lower when the weight lies between 0.8 and 0.9. Finally, when the change frequency of the noise variances is larger than 0.1, the real-time tracking ability of SVANF for the measurement noise covariance deteriorates, which affects the estimation performance of the DA system. In summary, the experimental results show that the proposed method can improve the estimation performance of the DT system under model inaccuracy and unknown noise covariance. Currently, we only consider the case where the measurement noise is unknown; in the future, we shall further investigate the situation where the process noise is also time-variant.

Figure 2 .
Figure 2. The data assimilation architecture for digital twin.

Figure 3 .
Figure 3. The implementation scheme of the DA architecture.

Figure 4 .
Figure 4. The estimation results of gas density and temperature.

Figure 5 .
Figure 5. The estimation results of boiler pressure and gas oxygen content.

Figure 6 .
Figure 6. The measurement variances of each observation estimated by SVACKF.

Figure 7 .
Figure 7. DA estimation results of gas density and temperature.

Figure 8 .
Figure 8. DA estimation results of boiler pressure and gas oxygen content.

Figure 9 .
Figure 9. DA estimation results of gas density and temperature.

Figure 10 .
Figure 10. DA estimation results of boiler pressure and gas oxygen content.

Figure 11 .
Figure 11. The estimated noise variance of gas oxygen content with different frequencies.
Q_net: net calorific value, MJ kg⁻¹
H_a: air enthalpy, kJ kg⁻¹
N(x|µ, Σ): the Gaussian distribution with random variable x, mean µ, and covariance Σ
x_k / u_k / y_k / w_k / v_k / Q_k / R_k / µ_k / x^true_k / y^o_k / y^p_k / ∆x_k / ∆y*_k / ∆y^w_k: the state / input / measurement / process noise / measurement noise / process covariance / measurement covariance / residual mean / true state / actual measurement / predicted measurement / error compensation / ideal residual / actual residual of time step k
x_{1:k} (y_{1:k} / R_{1:k} / µ_{1:k}): the state (measurement / measurement covariance / residual mean) sequence from time step 1 to time step k
x̂_{k+1|k} (P_{k+1|k} / ŷ_{k+1|k} / P_{zz,k+1|k} / P_{xz,k+1|k}): the estimated state (state error covariance / measurement / measurement-associated covariance / cross-covariance) of step k+1 at step k
x̂_{k|k} (x̆_{k|k} / P_{k|k} / α̂_{k|k} / β̂_{k|k}): the updated state (compensated state / state error covariance / shape parameter / scale parameter) of step k at step k
{·}_{s=1}: the set of propagation points and corresponding weights in ANF
W_sc / b_sc: the weight / bias of NNSC
T_N / T_n: the total number of iterations / update threshold in NNSC
W_w / C_w / D_w: the transform matrix / wavelet coefficients / refined wavelet coefficients in FRD
γ: weight parameter of exponential smoothing in FRD
V_gs: outlet gas flow rate, m³ s⁻¹
ρ_gs: gas density, kg m⁻³
T_gs: gas temperature in furnace, °C
O_cp: gas oxygen content, %
W_f: inlet coal flow rate, kg s⁻¹
V_a: inlet air flow rate, m³ s⁻¹
C_gs: specific heat capacity of gas, kJ (kg·°C)⁻¹
ρ_a: air density, kg m⁻³
The process and measurement functions are nonlinear; w_k and v_{k+1} are process and measurement white noise with covariances Q_k and R_{k+1}, and p^con_k ∈ R^c and p^var_k ∈ R^v are the constant and time-varying parameters, respectively. In this paper, we make the following assumptions.
Assumption 1: the noise w_k and v_{k+1} are mutually independent.
Assumption 2: the parameters p^con_k and the process noise covariance Q_k are exactly known, whereas the knowledge about p^var_k and the measurement noise covariance R_{k+1} is inaccurate.
Assumption 3: the nonlinear function h(·) is exactly known.
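A generic state-space model of this form can be sketched in pure Python as follows; the helper is illustrative only (the function and argument names are our choices, and scalar states stand in for the vector case):

```python
import random

def simulate(f, h, x0, u_seq, q_std, r_std, rng):
    """Propagate x_{k+1} = f(x_k, u_k) + w_k and observe y_k = h(x_k) + v_k,
    with white Gaussian w_k ~ N(0, q_std^2) and v_k ~ N(0, r_std^2)."""
    xs, ys = [], []
    x = x0
    for u in u_seq:
        ys.append(h(x) + rng.gauss(0.0, r_std))   # noisy measurement of current state
        xs.append(x)
        x = f(x, u) + rng.gauss(0.0, q_std)       # noisy state transition
    return xs, ys

# Example: rng = random.Random(0)
# xs, ys = simulate(lambda x, u: 0.9 * x + u, lambda x: 2.0 * x, 1.0, [0.1] * 5, 0.01, 0.1, rng)
```

Any filter discussed in this paper consumes the y sequence and tries to recover the x sequence; the DA architecture additionally compensates for the case where the f used by the filter differs from the f that generated the data.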
In equation (5), x̆_{k|k} is the compensated posterior estimated state, x̂_{k|k} is the posterior estimated state, and ∆x_k is the error compensation for the estimated state x̂_{k|k} at time step k. Algorithm 2 (DA algorithm for the DT system) takes x̆_{k|k}, u_k, y^o_{k+1}, y^o_k, y^p_k, P_{k|k} and Q_k as inputs and returns x̆_{k+1|k+1} and y^p_{k+1}; it infers the state error ∆x_{k+1} from the observation error ∆ŷ_k and, if ∆ŷ_k > T_n, trains the weights of NNSC online using the error ∆ŷ_k.

Table 1 .
The average RMSE of different algorithms.

Table 2 .
The average RMSE of different algorithms.

Table 4 .
The average RMSE of noise variance with different frequencies.
Figure 12. The estimation results of gas oxygen content under different noise variance frequencies.

Table 5 .
The average RMSE of states with different noise variance frequencies.
The experimental results indicate that when the changing frequency of the noise variance is larger than 0.1, the estimation accuracy of SVANF for the noise variance decreases. The average RMSEs shown in table 4 also illustrate that the system estimates the noise variance well as long as the frequency is within 0.08. Nonetheless, the estimation results of SVANF for the state O_cp are not significantly affected, as shown in figure 12.

Table 8 .
Different parameter values used in section 5.2.