
Parameter estimation via weak measurement with machine learning


Published 28 January 2019 © 2019 IOP Publishing Ltd
Citation: Wenhuan Liu et al 2019 J. Phys. B: At. Mol. Opt. Phys. 52 045504, DOI 10.1088/1361-6455/aafbb0


Abstract

The weak measurement technique can provide high sensitivity for parameter estimation tasks. In practice, however, time-varying errors induced by noise significantly compromise its advantages. While prolonging the measurement time can reduce the effect of short-term correlated noises (e.g. white noise), it increases the estimation error in the presence of long-term correlated noises (e.g. 1/f and 1/f² noises). The main obstacle to compensating this kind of error is to precisely predict its time-varying trend, since the disturbances that cause the error are highly unpredictable. In this work, we propose a weak measurement scheme assisted by a machine learning algorithm, which can 'learn' the time-varying trend of the errors from data without an explicit model. To verify the feasibility of this scheme, we carry out a time-delay measurement experiment and achieve noise compensation via the machine learning algorithm. Experimental results demonstrate a 6 dB reduction of the mean-square error compared with the setup without machine learning, which suggests that our scheme can effectively improve the performance of weak measurement in the presence of long-term correlated noises.


1. Introduction

Weak value amplification (WVA) [1] has been developed as a new metrological technique and successfully employed in various parameter estimation tasks [2–6]. Although extremely high precision can be achieved in the laboratory [7], the practical performance of WVA depends strongly on various kinds of technical noise [8].

In previous works, estimation errors induced by constant but unknown deviations (e.g. alignment errors [9]) and short-time correlated noises (e.g. white noise [10]) have been taken into consideration. However, the problem of overcoming estimation errors induced by long-time correlated noises (e.g. 1/f noise [11] and 1/f² noise) has not been sufficiently studied. For short-time correlated noises (noise correlation time much shorter than the mean time between successive measurements), the estimation error can be effectively reduced by increasing the measurement time. In contrast, for long-time correlated noises (noise correlation time similar to or longer than the mean time between successive measurements), the estimation error increases with the measurement time. To overcome long-time correlated noises, an error compensation process should be performed, and the main obstacle is to predict the time-varying error when the varying trend of the noise is highly unpredictable.

Instead of deriving an explicit model to predict the time-varying error, in this work we employ a machine learning algorithm to 'learn' the time-varying trend of the errors from existing experimental data. In recent years, machine learning has shown its power in many applications, such as face recognition [12], speech recognition [13] and natural language processing [14]. In parameter estimation, machine learning has been applied to some specific metrological tasks, such as qubit-phase estimation [15] and temperature drift [16]. As far as we know, this is the first time a machine learning algorithm has been employed to improve the performance of a weak measurement experiment.

Among the various machine learning algorithms, the long short-term memory (LSTM) network [17] can handle long delays and signals with a mix of low and high frequency components, which fulfills our requirement. This algorithm is therefore adopted in our scheme, the details of which are presented in section 2. To verify the feasibility of our scheme, we carry out a time-delay measurement experiment and achieve noise compensation via the machine learning algorithm. By simulating 1/f² noise through a random modulation of a nematic liquid crystal and compensating the resulting error with the machine learning algorithm, we experimentally demonstrate a 6 dB reduction of the mean-square error (MSE) compared with the original setup without the error compensation process.

This paper is organized as follows: in section 2, we introduce the scheme of applying machine learning for error compensation in weak measurement. In section 3, we design and perform an experiment to verify our scheme, and the experimental results and discussion are summarized in section 4. Finally, a brief conclusion is presented in section 5.

2. Scheme

2.1. Weak-value-based parameter estimation

WVA can be used to estimate a small coupling parameter. As pointed out in [1], with a proper post-selection state of the system, the average shift of the pointer can be much larger than any eigenvalue of the observable. The amplification effect of weak measurement has proved effective in many practical experiments, such as the spin Hall effect of light [2], small time delays [6], optical frequency shifts [18], temperature measurement [5] and optical phase shifts [4].

Consider a physical interaction between a system and a pointer, described by the Hamiltonian $H=g\hat{A}\otimes \hat{P}$. Here, $\hat{A}$ and $\hat{P}$ are operators acting on the system and the pointer respectively, and g characterizes the coupling strength. Suppose the initial states of the system and pointer are $| {\varphi }_{i}\rangle $ and $| {\phi }_{i}\rangle $ respectively; after the interaction, the joint state of the system and pointer evolves to $| {\rm{\Phi }}\rangle =\exp (-{ig}\hat{A}\otimes \hat{P})| {\varphi }_{i}\rangle | {\phi }_{i}\rangle $.

Afterwards, the system is post-selected by $| {\varphi }_{f}\rangle $, and the pointer collapses to the (unnormalized) state $| {{\rm{\Phi }}}^{{\prime} }\rangle =\langle {\varphi }_{f}| \exp (-{ig}\hat{A}\otimes \hat{P})| {\varphi }_{i}\rangle | {\phi }_{i}\rangle $. When the strength of the interaction is sufficiently weak (g ≪ 1), we have approximately $| {\phi }_{f}\rangle \approx \langle {\varphi }_{f}| {\varphi }_{i}\rangle (1-{{igA}}_{w}\hat{P})| {\phi }_{i}\rangle $, where ${A}_{w}=\tfrac{\langle {\varphi }_{f}| \hat{A}| {\varphi }_{i}\rangle }{\langle {\varphi }_{f}| {\varphi }_{i}\rangle }$ is the so-called weak value [1]. In general, the weak value is a complex number, with its real and imaginary parts playing different roles in the amplification effect [19].

In the case of $g| {A}_{w}| \ll 1$, $| {\phi }_{f}\rangle $ can be rewritten as $| {\phi }_{f}\rangle \approx \exp (-{{igA}}_{w}\hat{P})| {\phi }_{i}\rangle $. Let p be the eigenvalue of $\hat{P}$; the shift of the average value of p between the initial and final pointer states is given in [19] as $\langle p{\rangle }_{f}-\langle p{\rangle }_{i}=2g\,{\rm{Im}}({A}_{w})\,{{\rm{Var}}}_{p}$, where $\langle p{\rangle }_{(i,f)}\equiv \langle {\phi }_{(i,f)}| p| {\phi }_{(i,f)}\rangle $ are the average values of p that can be acquired from experiment, Im(Aw) denotes the imaginary part of Aw, and Varp is the variance of the initial pointer state. According to this linear relation, the value of g can be estimated by

$\hat{g}=\dfrac{\langle p{\rangle }_{f}-\langle p{\rangle }_{i}}{2\,{\rm{Im}}({A}_{w})\,{{\rm{Var}}}_{p}}.$  (1)

We note that the scale factor can be modulated by Im(Aw).
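As an illustration of how equation (1) is used, the following minimal Python sketch estimates g from the mean pointer shift of simulated data; the pointer variance, weak value and coupling strength are illustrative assumptions, not values taken from the experiment.

```python
import numpy as np

# Illustrative numbers only (not from the experiment): a Gaussian pointer
# of variance var_p, a purely imaginary weak value Im(A_w), and a small
# coupling g_true to be recovered via equation (1).
var_p = 1.0
im_aw = 50.0
g_true = 1e-4

rng = np.random.default_rng(0)
p_initial = rng.normal(0.0, np.sqrt(var_p), 200000)
# After post-selection the pointer mean is shifted by 2 g Im(A_w) Var_p.
p_final = p_initial + 2 * g_true * im_aw * var_p

g_hat = (p_final.mean() - p_initial.mean()) / (2 * im_aw * var_p)  # equation (1)
print(g_hat)  # ~1e-4
```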

In practice, estimation errors are unavoidable, and $\hat{g}$ can be expressed as $\hat{g}(t)={g}_{0}(t)+{\rm{\Delta }}\hat{g}(t)$. Here, g0 denotes the true value of the parameter, and ${\rm{\Delta }}\hat{g}$ denotes the estimation error introduced by the disturbances in the experiment. The dependence on t reflects the fact that both g0 and ${\rm{\Delta }}\hat{g}$ are time-varying. While adaptive methods can be applied to maintain high precision with a time-varying signal [20], techniques for overcoming time-varying errors in weak measurement are not fully developed. In previous works, only unknown but constant errors (e.g. misalignment errors [9]) and fast-varying, short-term correlated technical noises (e.g. jitter noise, air turbulence [21] and 1/f noise [11]) were taken into account. Here, we study how to deal with the slowly varying, long-term correlated noises (e.g. 1/f² noise [22]) present in weak measurement.

In theory, estimation errors can be sufficiently reduced by averaging over many measurement results. For short-term correlated noises, the estimation error drops rapidly with the averaging time, because the correlation time of the noise is short. However, for long-term correlated noises, a much longer averaging time is required (e.g. days or months), which is obviously impractical. Therefore, compensating the time-varying deviation is key to enhancing the estimation accuracy in weak measurement. Among the various error compensation methods that have been proposed [23, 24], machine learning has been applied to error compensation in many applications [15, 25] and achieved good performance. Here, for the first time, we introduce this technique for parameter estimation via weak measurement.

2.2. Error compensation with the assistance of machine learning

The most significant difficulty in compensating errors is deriving a model that can precisely predict the time-varying errors in the future, because the complex disturbances are nearly impossible to understand fully. On the other hand, machine learning [26] uses statistical techniques to 'learn' from data and progressively improve performance on specific tasks [27, 28] without being explicitly programmed, which makes it a promising solution for error compensation. In particular, in order to apply machine learning to weak measurement with time-varying errors, an algorithm that can handle long-term correlated errors (i.e. with low frequency components) must be found.

As one of the most important machine learning algorithms, the LSTM network [17] is a kind of recurrent neural network [29] that fulfills this requirement. In particular, LSTM can prevent the back-propagated errors from vanishing by means of the memory block [29], which contains one or more memory cells and a pair of adaptive, multiplicative gating units that gate the input and output of all cells in the block. Hence the LSTM network can handle long-term and short-term time-dependent complex sequential data that contain a mixture of low and high frequency components. The basic LSTM neural network structure and the computation process are explained in appendix A.

The whole error compensation process is divided into three periods: data preparation, training and compensation. The training period produces the LSTM model, and the compensation period performs error compensation based on this established model. Details of these periods are presented as follows.

In the data preparation period, we carry out data acquisition and preprocessing. First, we obtain a series of parameter estimation error data $\{{\rm{\Delta }}{\hat{g}}_{1},{\rm{\Delta }}{\hat{g}}_{2},{\rm{\Delta }}{\hat{g}}_{3},\ldots \}$ from the experiment by cutting off the signal (i.e. g0 = 0). We note that these data are arranged in the order of acquisition. The data are then divided into three sets, namely the training set, the validation set and the testing set, with a proportion of 7:1:2. The training set is used to derive an optimal LSTM model for error compensation. The validation set is used for a simple evaluation of the model during training, since it allows us to track the change of the model error throughout the training process. The testing set is designed to accurately evaluate models via rolling prediction (see section 3.2). The difference between the validation set and the testing set will become clear later in the application process.

In the training period, we derive an optimal LSTM model for error compensation; the basic process is depicted in figure 1. After initializing the hyper-parameters (the set of parameters determining the basic structure of the model, such as the number of layers), the training set is used to derive an LSTM model. Suppose the length of the training data is T and the time-step (input window) length is N; we transform these data into two formatted matrices Xtrain and Ytrain, which can be expressed as:

$${X}_{{train}}=\left(\begin{array}{cccc}{\rm{\Delta }}{\hat{g}}_{1} & {\rm{\Delta }}{\hat{g}}_{2} & \cdots & {\rm{\Delta }}{\hat{g}}_{N}\\ {\rm{\Delta }}{\hat{g}}_{2} & {\rm{\Delta }}{\hat{g}}_{3} & \cdots & {\rm{\Delta }}{\hat{g}}_{N+1}\\ \vdots & \vdots & & \vdots \\ {\rm{\Delta }}{\hat{g}}_{T-N} & {\rm{\Delta }}{\hat{g}}_{T-N+1} & \cdots & {\rm{\Delta }}{\hat{g}}_{T-1}\end{array}\right),\qquad {Y}_{{train}}=\left(\begin{array}{c}{\rm{\Delta }}{\hat{g}}_{N+1}\\ {\rm{\Delta }}{\hat{g}}_{N+2}\\ \vdots \\ {\rm{\Delta }}{\hat{g}}_{T}\end{array}\right).$$

Here, Xtrain is the training sample matrix and Ytrain is the vector of predicted target values of the training samples. The jth row of Xtrain corresponds to the jth sample Xj, and Yj is the target value corresponding to Xj. By feeding the matrices Xtrain and Ytrain into the paradigm described in appendix A, an LSTM model can be established.
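The sliding-window construction of Xtrain and Ytrain can be written compactly in code. A minimal sketch, assuming the deviation sequence is held in a NumPy array and the window length equals the LSTM time step N:

```python
import numpy as np

def make_training_matrices(errors, n_steps):
    """Each row of X holds n_steps consecutive deviations; the matching
    entry of Y is the deviation immediately following that window."""
    X, Y = [], []
    for j in range(len(errors) - n_steps):
        X.append(errors[j:j + n_steps])
        Y.append(errors[j + n_steps])
    return np.asarray(X), np.asarray(Y)

# Hypothetical usage: 'delta_g' stands for the recorded deviation series.
delta_g = np.random.default_rng(1).normal(size=1000)  # placeholder data
X_train, Y_train = make_training_matrices(delta_g, n_steps=50)
```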

Figure 1. The basic process of the training period.

The data of the validation set can be used for a simple evaluation of the model's performance. The matrices Xvali and Yvali are composed in the same way as Xtrain and Ytrain. From Xvali we can derive a series of prediction results. If the length of the validation set is $T^{\prime} $, the prediction result matrix ${\widetilde{Y}}_{{vali}}$ can be denoted as

$${\widetilde{Y}}_{{vali}}={\left({\rm{\Delta }}{\tilde{g}}_{N+1},{\rm{\Delta }}{\tilde{g}}_{N+2},\ldots ,{\rm{\Delta }}{\tilde{g}}_{T^{\prime} }\right)}^{{\rm{T}}},$$

where the subscripts of the data ${\rm{\Delta }}\tilde{g}$ correspond to the order of the data in the validation set. The performance of the current LSTM model can then be evaluated by the prediction accuracy, which is calculated from the MSE between ${\widetilde{Y}}_{{vali}}$ and Yvali:

$${{\rm{MSE}}}_{{vali}}=\frac{1}{T^{\prime} -N}\sum _{j=N+1}^{T^{\prime} }{\left({\rm{\Delta }}{\tilde{g}}_{j}-{\rm{\Delta }}{\hat{g}}_{j}\right)}^{2}.$$

According to this evaluation result, we can decide how to adjust the hyper-parameters for the next training run or choose the optimally trained LSTM model for error compensation.

Finally, in the compensation period, we apply the optimally trained LSTM model to predict the forthcoming error ${\rm{\Delta }}\tilde{g}$ in a parameter estimation process with a nonzero signal (${g}_{0}\ne 0$). To obtain the predicted future deviation ${\rm{\Delta }}\tilde{g}$ from the past real estimated results ${\rm{\Delta }}\hat{g}$, without requiring the full LSTM model input Xk, a rolling forecast can be used. In this paper, we propose two different rolling prediction strategies based on the established LSTM model; they are explained in section 3.2. In order to test the performance of the error compensation, we can preset a number of parameter values, say g0 = {g01, g02, ..., g0L}, and then form a testing set ${\rm{\Delta }}\hat{g}=\{{\rm{\Delta }}{\hat{g}}_{1},{\rm{\Delta }}{\hat{g}}_{2},\ldots ,{\rm{\Delta }}{\hat{g}}_{L}\}$, where ${\rm{\Delta }}{\hat{g}}_{k}={\hat{g}}_{k}-{g}_{0k}$ with k = 1, 2, ..., L. After error compensation, the final output of the kth estimated result is given by

${\tilde{g}}_{k}={\hat{g}}_{k}-{\rm{\Delta }}{\tilde{g}}_{k}.$  (2)

The performance of error compensation can then be evaluated by the MSE between ${\rm{\Delta }}\tilde{g}$ and ${\rm{\Delta }}\hat{g}$:

${\sigma }_{g}=\dfrac{1}{L}\displaystyle \sum _{k=1}^{L}{\left({\rm{\Delta }}{\hat{g}}_{k}-{\rm{\Delta }}{\tilde{g}}_{k}\right)}^{2}.$  (3)

Obviously, σg should be as small as possible, while the program efficiency should also be taken into account.

3. Experiment

3.1. Time-delay experiment

To verify our scheme, we perform a time-delay measurement experiment via WVA to obtain the data, and we artificially inject noise into the system. First, we perform the weak measurement experiment many times to obtain time-delay data. As mentioned in section 2, these time-delay data are the deviation values ${\rm{\Delta }}\hat{g}(t)$. Second, we establish the LSTM model in the training period and obtain the predicted future deviation ${\rm{\Delta }}\tilde{g}$ for the testing set. Finally, we subtract the predicted deviation ${\rm{\Delta }}\tilde{g}$ from the measured value ${\rm{\Delta }}\hat{g}$ of the testing set to obtain the noise compensation result $({\rm{\Delta }}\hat{g}-{\rm{\Delta }}\tilde{g})$. Next, we explain our experimental procedure.

The diagram of the experimental setup is shown in figure 2. It has been shown that white light can be used for very precise phase estimation when weak measurements are performed [7, 9]. Hence we use white light from a commercial light-emitting diode (LED) in our experiment. The light, with a measured central wavelength of 778 nm and a spectral width of 20 nm, passes through a collimating optical path consisting of a lens and a pinhole and enters the first Glan polarizer, which prepares the pre-selection polarization state. The light then enters a Sagnac interferometer composed of a polarizing beam splitter (PBS), two mirrors and the nematic liquid crystal (NLC). The PBS separates the beam into two beams with horizontal and vertical polarizations. The NLC, which accepts a wide voltage range, is placed vertically in the Sagnac interferometer and acts as a phase-modulating sensor. A time delay τ is introduced by the NLC sample as the light passes through the Sagnac interferometer. If we change the voltage on the NLC, the time-delay value changes accordingly. The voltage on the NLC is precisely controlled by a computer with an accuracy of 0.001 V. The light then passes through a quarter-wave plate (QWP) with its fast axis rotated by 45° and through the second Glan polarizer, which is nearly orthogonal to the first one. Finally, we use a spectrometer with a range of 690–850 nm to detect the output light.

Figure 2. Experimental setup: LED, light-emitting diode; P, polarizer; PBS, polarizing beam splitter; NLC, nematic liquid crystal; QWP, quarter-wave plate. The NLC is placed vertically in the optical path, and the voltage applied to the NLC is controlled by a computer.

In this experiment, we use the optical angular frequency as the meter, which obeys a Gaussian distribution, i.e. $f(\omega )=C\exp [-{(\omega -{\omega }_{0})}^{2}/2{\sigma }^{2}]$. Here, ω0 and σ² are the central value and variance of the optical angular frequency, respectively, and C is the normalization factor. On the other hand, we use the optical polarization as the system, with pre-selection and post-selection states $| {\varphi }_{i}\rangle =\tfrac{1}{\sqrt{2}}(| H\rangle +i| V\rangle )$ and $| {\varphi }_{f}\rangle =\tfrac{1}{\sqrt{2}}(-{{ie}}^{i\epsilon }| H\rangle +{e}^{-i\epsilon }| V\rangle )$, respectively. Here, $| H\rangle $ and $| V\rangle $ represent the horizontal and vertical polarizations, respectively.

The experimental process can be described by the mathematical model derived in section 2.1, with the replacements $p\to \omega $ and $g\to \tau $. After post-selection, the spectral shift of the light is given by [5]:

$${\rm{\Delta }}\omega \equiv {\omega }_{f}-{\omega }_{0}=\frac{{\sigma }^{2}\tau \,{e}^{-{\sigma }^{2}{\tau }^{2}}\sin [2({\omega }_{0}\tau +\epsilon )]}{2{P}_{{pass}}},$$

where ωf is the average optical angular frequency of the final state and ${P}_{{pass}}=0.5\{1-\exp (-{\sigma }^{2}{\tau }^{2})\,\cos [2({\omega }_{0}\tau +\epsilon )]\}$ is the successful post-selection probability. Under the weak measurement condition τ ≪ 1, taking the first-order approximation gives ${P}_{{pass}}\approx {\epsilon }^{2}$ and ${\rm{\Delta }}\omega =\tfrac{{\sigma }^{2}\tau }{\epsilon }$. Accordingly, we can obtain the time delay τ from the spectral shift of the light.
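To make the inversion concrete, the following sketch computes τ from measured spectra via the first-order relation Δω = σ²τ/ε; the function name and inputs (a reference spectrum and a post-selected spectrum sampled on a common wavelength grid) are assumptions for illustration.

```python
import numpy as np

def estimate_tau(wavelengths_nm, spectrum_ref, spectrum_post, epsilon):
    """Estimate the time delay from the spectral centroid shift,
    inverting the first-order relation delta_omega = sigma**2 * tau / epsilon."""
    c_nm_per_s = 2.998e17                           # speed of light in nm/s
    omega = 2 * np.pi * c_nm_per_s / wavelengths_nm # angular frequency (rad/s)
    mean_ref = np.sum(omega * spectrum_ref) / np.sum(spectrum_ref)
    mean_post = np.sum(omega * spectrum_post) / np.sum(spectrum_post)
    var_ref = np.sum((omega - mean_ref) ** 2 * spectrum_ref) / np.sum(spectrum_ref)
    return epsilon * (mean_post - mean_ref) / var_ref  # tau in seconds
```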

To verify the performance of our scheme, we experimentally simulate a long-time correlated noise, namely 1/f² noise, which has not previously been studied in the context of weak measurement. As the power spectrum of 1/f² noise has the form $S(f)\propto \tfrac{{S}_{0}}{{f}^{2}}$, we mimic this noise by randomly modulating the voltage on the NLC so as to apply a varying time delay, with the time-varying function given by the inverse Fourier transform of S(f), i.e. S(t) = F−1[S(f)].
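A trace with a 1/f² power spectrum can be generated numerically by shaping white Gaussian noise with a 1/f filter in the frequency domain; a minimal sketch follows (how the trace is mapped onto the NLC voltage is not shown, and the sampling interval and amplitude are illustrative assumptions).

```python
import numpy as np

def one_over_f2_noise(n_samples, dt=1.0, s0=1.0, seed=0):
    """Real-valued time series whose power spectrum falls off as S(f) = s0/f**2,
    obtained by multiplying the spectrum of white noise by sqrt(s0)/f."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.normal(size=n_samples))
    freqs = np.fft.rfftfreq(n_samples, d=dt)
    shaping = np.zeros_like(freqs)
    shaping[1:] = np.sqrt(s0) / freqs[1:]   # amplitude ∝ 1/f, so power ∝ 1/f²
    return np.fft.irfft(spectrum * shaping, n=n_samples)

noise = one_over_f2_noise(8000)  # e.g. one value per measurement shot
```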

As mentioned in section 2, we set ${\tau }_{0}=0$. By performing the above weak measurement experiments, we obtain a large amount of time-delay data ${\rm{\Delta }}\hat{\tau }$. Via the rolling prediction strategies and the LSTM model obtained after the training period, we can predict the future time-delay deviation data ${\rm{\Delta }}\tilde{\tau }(t)$. The compensation then proceeds as $\tilde{\tau }(t)=\hat{\tau }(t)-{\rm{\Delta }}\tilde{\tau }(t)={\tau }_{0}(t)+({\rm{\Delta }}\hat{\tau }(t)-{\rm{\Delta }}\tilde{\tau }(t))$. Here we propose two prediction strategies (see the next section for details).

Finally, we use ${\tilde{\tau }}_{k}={\hat{\tau }}_{k}-{\rm{\Delta }}{\tilde{\tau }}_{k}$ as the estimated value of the time delay, and ${\tilde{\tau }}_{k}$ is the algorithm compensation result. In an actual application, τ0(t) can take an arbitrary value. However, in order to observe the compensation result of the model conveniently, we still set τ0 = 0. Hence the data after algorithm compensation are the error between the prediction ${\rm{\Delta }}\tilde{\tau }(t)$ and the real value ${\rm{\Delta }}\hat{\tau }(t)$. In this paper, the measurement stage of the second strategy is omitted because the deviation data of the testing set have been obtained beforehand. In an actual application, however, the measurement stage cannot be omitted, because the prediction stage collecting the data $\hat{\tau }$ and the measurement stage collecting the data ${\rm{\Delta }}\hat{\tau }$ cannot be carried out at the same time.

3.2. The prediction strategies

In this section, we introduce the two prediction strategies in detail. Both strategies are based on rolling prediction. We explain them using the testing set of time-delay data.

Strategy I: first, we predict the deviation ${\rm{\Delta }}{\tilde{\tau }}_{N+1}$ from the measured time-delay deviations $({\rm{\Delta }}{\hat{\tau }}_{1},{\rm{\Delta }}{\hat{\tau }}_{2},{\rm{\Delta }}{\hat{\tau }}_{3},\ldots ,{\rm{\Delta }}{\hat{\tau }}_{N})$, where ${\rm{\Delta }}{\hat{\tau }}_{1}$ is the first datum of the testing set. We then use the predicted deviation ${\rm{\Delta }}{\tilde{\tau }}_{N+1}$ as a substitute for the true time-delay deviation ${\rm{\Delta }}{\hat{\tau }}_{N+1}$ and obtain the (N + 2)th predicted time-delay deviation ${\rm{\Delta }}{\tilde{\tau }}_{N+2}$ from $({\rm{\Delta }}{\hat{\tau }}_{2},{\rm{\Delta }}{\hat{\tau }}_{3},\ldots ,{\rm{\Delta }}{\hat{\tau }}_{N},{\rm{\Delta }}{\tilde{\tau }}_{N+1})$. In effect, this strategy replaces the true time-delay deviation ${\rm{\Delta }}\hat{\tau }$ with the predicted time-delay deviation ${\rm{\Delta }}\tilde{\tau }$ to form the new input sample. The prediction process is shown in figure 3, and a code sketch is given below. After such iterative rolling prediction, we can obtain the predicted time-delay deviation ${\rm{\Delta }}{\tilde{\tau }}_{T}$ at any time ($T\geqslant N$). This strategy is very convenient. However, as the prediction time extends, the correlation between the data becomes weaker and the prediction accuracy of the model decreases. We therefore introduce a second prediction strategy.
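A minimal sketch of strategy I, assuming 'model' is any callable that maps a length-N window of deviations to the next predicted deviation (e.g. the trained LSTM wrapped in a helper function):

```python
import numpy as np

def rolling_predict_strategy1(model, history, n_future):
    """Strategy I: feed the model its own predictions. 'history' holds the
    last N measured deviations; each new prediction is appended to the
    window and the oldest value is dropped."""
    window = list(history)
    predictions = []
    for _ in range(n_future):
        next_dev = float(model(np.asarray(window)))
        predictions.append(next_dev)
        window = window[1:] + [next_dev]   # slide the window forward
    return predictions
```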

Figure 3. The prediction process of the first prediction strategy via the testing set.

Strategy II: a single implementation of this strategy consists of two stages, a prediction stage and a measurement stage. The prediction stage is similar to the first prediction strategy: it performs rolling prediction without adding the measured deviations ${\rm{\Delta }}\hat{\tau }$. Hence we obtain the predicted time-delay deviations ${\rm{\Delta }}\tilde{\tau }$ within the time interval of the prediction stage. In an actual application, the experiment can be performed simultaneously within this interval to obtain the data $\hat{\tau }$. In the measurement stage, we carry out the weak measurement experiment to obtain N deviation values ${\rm{\Delta }}\hat{\tau }$, which are used as the input of the LSTM model for the next prediction stage.

Suppose we predict the future L time-delay deviations ${\rm{\Delta }}\tilde{\tau }$ in the prediction stage. After we obtain the (N + L)th predicted time-delay deviation ${\rm{\Delta }}{\tilde{\tau }}_{N+L}$, we perform the measurement stage, in which we measure N time-delay deviations ${\rm{\Delta }}\hat{\tau }$ for the next prediction stage. The prediction stage and measurement stage are executed repeatedly for a long-term and accurate prediction. In other words, whenever we predict the first value of a new prediction stage, we use the real deviation data ${\rm{\Delta }}\hat{\tau }$ obtained in the preceding measurement stage as the input sample X of the model. The process is shown in figure 4, and a code sketch is given below.
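A corresponding sketch of strategy II, reusing the strategy I routine above; 'measure_block(N)' stands in for the weak-measurement routine that returns N freshly measured deviations (a hypothetical placeholder, not an interface from the experiment):

```python
def rolling_predict_strategy2(model, measure_block, n_blocks, L, N):
    """Strategy II: alternate an L-step prediction stage (strategy I style)
    with a measurement stage that supplies N fresh deviations as the
    input window for the next prediction stage."""
    predictions = []
    window = measure_block(N)                      # initial measurement stage
    for _ in range(n_blocks):
        predictions += rolling_predict_strategy1(model, window, L)
        window = measure_block(N)                  # refresh inputs for next stage
    return predictions
```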

Figure 4. The prediction process of the second prediction strategy via the testing set.

4. Result

In this section, the basic settings of the optimal LSTM model and the time-delay compensation results are presented, and the MSEs after error compensation for the two prediction strategies are summarized.

The LSTM neural network is programmed in Python with the deep learning framework PyTorch [30]. After adjusting the model several times, we finally established an LSTM model consisting of a four-layer LSTM neural network combined with a single fully connected layer. In order to prevent the model from over-fitting, we apply the dropout [31] trick inside the LSTM neural network, with a dropout probability of 0.5. Note that the dropout trick slows down training. In order to accelerate network training, we add a batch normalization (BN) [32] layer between the LSTM neural network and the fully connected layer. Moreover, to optimize the performance of our algorithm, we adopt the RMSprop [33] and mini-batch SGD [34] optimization algorithms and set the initial learning rate to 0.01 during the training process. This is the fundamental setup of the LSTM model. We trained the network for 50 epochs on the training set, reshuffling the training set before each epoch (i.e. epoch = 50). In total, establishing the LSTM model took 8.5 h on an Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz (4 CPUs).
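The text above fixes the overall architecture (four LSTM layers, dropout 0.5, a BN layer, one fully connected output layer, RMSprop with learning rate 0.01) but not every dimension; the following PyTorch sketch fills in the unspecified choices (hidden size, input size of 1) with assumed values for illustration.

```python
import torch
import torch.nn as nn

class ErrorPredictor(nn.Module):
    """Four-layer LSTM with dropout 0.5, batch normalization, and a single
    fully connected output layer, as described in the text. The hidden
    size of 64 is an assumption."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=4, dropout=0.5, batch_first=True)
        self.bn = nn.BatchNorm1d(hidden_size)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, N, 1)
        out, _ = self.lstm(x)
        last = out[:, -1, :]               # hidden state at the last time step
        return self.fc(self.bn(last)).squeeze(-1)

model = ErrorPredictor()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
```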

Strategy I: in the first strategy, the current predicted value ${\rm{\Delta }}{\tilde{\tau }}_{k}$ is used as the last value of the input for the next prediction, and no extra measurement is performed during the procedure (see section 3.2 for details). The MSE of this strategy is 1.8495 × 10−3 for a data length of 8000, and the prediction result is shown in figure 5(a).

Figure 5. Comparison of prediction values (in red) and measured values (in green) under (a) strategy I and (b) strategy II.

Strategy II: in the second strategy, the current predicted value ${\rm{\Delta }}{\tilde{\tau }}_{k}$ is also inserted as the last datum for the next prediction. In contrast to the first strategy, after an L-step prediction a set of N (the time-step length) measurement results is inserted for the next prediction run (see section 3.2 for details). Setting L = 300 and N = 50, the MSE of this strategy is 3.9561 × 10−4 for a data length of 8000, and the prediction result is shown in figure 5(b). Since increasing N improves prediction accuracy but compromises efficiency, while increasing L improves efficiency but compromises accuracy, the values of N and L are chosen as a trade-off.

Figure 6 shows the time-delay estimation error after error compensation via the two prediction strategies. In comparison, without error compensation the MSE of the testing set is 1.9714 × 10−3 for a data length of 8000. The MSEs of the time-delay estimation in these situations are summarized in table 1. Moreover, if we consider a shorter prediction time, the performance is even better. If we predict the testing set data from the first datum to the 3000th datum, the MSE drops to 4.9214 × 10−4 for the first strategy and 1.5783 × 10−4 for the second strategy, compared with an MSE of 8.3459 × 10−4 without error compensation (see table 2). In this case, a reduction of the estimation error of up to 6 dB is achieved.

Figure 6. Estimation error of the time delay in the presence of simulated 1/f² noise after error compensation, under the first prediction strategy (in green) and the second prediction strategy (in red).

Table 1. Comparison of the MSE of the time-delay estimation in three situations: without error compensation (initial data), and with error compensation under strategy I (S.I) and strategy II (S.II), for a data length of 8000.

Strategy     Initial data      S.I               S.II
MSE          1.9714 × 10−3     1.8495 × 10−3     3.9561 × 10−4
Efficiency   —                 higher            lower

Table 2. Comparison of the MSE of the time-delay estimation in three situations: without error compensation (initial data), and with error compensation under strategy I (S.I) and strategy II (S.II), for a data length of 3000.

Strategy     Initial data      S.I               S.II
MSE          8.3459 × 10−4     4.9214 × 10−4     1.5783 × 10−4
Efficiency   —                 higher            lower

Finally, we note that in this work the error distribution is assumed to be unchanged for different parameter values. However, in some practical cases, e.g. quantum parameter estimation, the error distribution may change with the true value of the parameter, which would compromise the prediction accuracy of our method. Solving this problem requires extra analysis and modifications of the current scheme, which we leave for future work.

5. Conclusion

In conclusion, we propose a weak measurement scheme with the assistance of machine learning to overcome the effect of long-term correlated noises in parameter estimation. In particular, we employ the LSTM network algorithm in our scheme. By learning the time-varying trend of the error from existing data, the LSTM network algorithm can predict the forthcoming error with high accuracy. We then carry out a time-delay measurement experiment to verify the feasibility of this scheme and experimentally demonstrate a 6 dB reduction of the MSE compared with the setup without machine learning. The experimental results suggest that our scheme can effectively improve the performance of weak measurement in the presence of long-term correlated noises.

Acknowledgments

The authors would like to thank Professor Jianping Fan from UNC-Charlotte for his substantial assistance throughout this work. This work is supported by National Natural Science Foundation of China (Grant No. 61701302).

Appendix A.: The LSTM neural network

The LSTM model is trained via the LSTM network, whose basic structure is shown in figure A1. In the testing set, we mark the kth sample as {Xk,1, Xk,2, ..., Xk,N}. If the input of the model is {Xk,1, Xk,2, ..., Xk,N}, the final predicted value of the model is yk.

Figure A1. Basic structure of an LSTM neural network combined with a fully connected network. The specific calculation process of the LSTM network is described in the main text.

The word 'recurrent' indicates that a process is executed many times. As a concept unique to recurrent neural networks, the time step specifies the number of times the process is repeated. In figure A1, the structure A is executed N times, indicating that the time step is N. At each time step j, the network has the same structure and parameters. An LSTM network maintains a hidden state h and a memory cell C, which preserves long-term memory information.

In figure A1, the direction of the arrows indicates the direction of information transmission. As a recurrent neural network, an LSTM can transfer information from the previous time step to the next. The computation at each time step is given by Graves et al [35] as follows:

$$\begin{array}{rcl}{C}_{k,j} & = & {f}_{k,j}\ast {C}_{k,j-1}+{i}_{k,j}\ast {g}_{k,j},\\ {h}_{k,j} & = & {o}_{k,j}\ast \tanh ({C}_{k,j}),\end{array}\qquad ({\rm{A}}1)$$

where

$$\begin{array}{rcl}{i}_{k,j} & = & \sigma ({W}_{i}[{h}_{k,j-1},{X}_{k,j}]+{b}_{i}),\\ {f}_{k,j} & = & \sigma ({W}_{f}[{h}_{k,j-1},{X}_{k,j}]+{b}_{f}),\\ {g}_{k,j} & = & \tanh ({W}_{g}[{h}_{k,j-1},{X}_{k,j}]+{b}_{g}),\\ {o}_{k,j} & = & \sigma ({W}_{o}[{h}_{k,j-1},{X}_{k,j}]+{b}_{o}).\end{array}$$
The meanings of the symbols are as follows: k is the serial number of the sample and j is the time step number; Xk,j denotes the input data at time step j of the kth sample; hk,j−1 and hk,j denote the hidden states at time steps j − 1 and j of the kth sample; Wi, Wf, Wg and Wo are weight matrices and bi, bf, bg and bo are bias vectors; $\sigma (x)=\tfrac{1}{1+{e}^{-x}}$ and $\tanh (x)=\tfrac{{e}^{x}-{e}^{-x}}{{e}^{x}+{e}^{-x}}$; [hk,j−1, Xk,j] means concatenating the two vectors hk,j−1 and Xk,j into one vector, and the ∗ operation denotes the element-wise product.
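For concreteness, a single time step of equation (A1) can be written as follows; this is a plain NumPy sketch with the weight matrices and bias vectors gathered in dictionaries, not the implementation used in the experiment (which relies on PyTorch's built-in LSTM).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One time step of equation (A1): W and b hold W_i, W_f, W_g, W_o and
    b_i, b_f, b_g, b_o; each W_* acts on the concatenation [h_{k,j-1}, X_{k,j}]."""
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W["i"] @ z + b["i"])   # input gate
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate
    g = np.tanh(W["g"] @ z + b["g"])   # candidate cell update
    o = sigmoid(W["o"] @ z + b["o"])   # output gate
    c = f * c_prev + i * g             # new memory cell
    h = o * np.tanh(c)                 # new hidden state
    return h, c
```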

We can adjust some hyper-parameters, such as the length of the hidden state h (equal to the length of hk,j), the number of layers of the LSTM network and the value of the time step N, and use a parameter optimization algorithm to establish the model based on the training set data.

Figure A1 shows the basic structure of a single-layer LSTM neural network combined with a fully connected network. To build a multi-layer LSTM neural network, the output data hk,j of one LSTM layer serve as the input data Xk,j of the next LSTM layer. This creates a multi-layer LSTM network structure, and the different LSTM layers have the same time step length N. If the length of hk,N is one, the fully connected network is not needed, and the output hk,N of the LSTM is equal to the prediction value yk. Usually, however, the length of hk,j is not 1, which means the length of the LSTM output hk,N is not 1. To obtain the predicted scalar of the kth sample, we map hk,N to a scalar yk using a fully connected network. For a multi-layer LSTM neural network, the fully connected layer is added at the last time step of the last LSTM layer.
