A Deep Learning Technique to Control the Non-linear Dynamics of a Gravitational-wave Interferometer

In this work we developed a deep learning technique that successfully solves a non-linear dynamic control problem. Instead of directly tackling the control problem, we combined methods in probabilistic neural networks and a Kalman-Filter-inspired model to build a non-linear state estimator for the system. We then used the estimated states to implement a trivial controller for the now fully observable system. We applied this technique to a crucial non-linear control problem that arises in the operation of the LIGO system, an interferometric gravitational-wave observatory. We demonstrated in simulation that our approach can learn from data to estimate the state of the system, allowing a successful control of the interferometer's mirror . We also developed a computationally efficient model that can run in real time at high sampling rate on a single modern CPU core, one of the key requirements for the implementation of our solution in the LIGO digital control system. We believe these techniques could be used to help tackle similar non-linear control problems in other applications.


A. Non-linear Dynamic Controls
Some of the hardest problems faced in the area of control theory deal with non-linear control problems.While a well established theoretical and practical formalism exists to design feedback controllers for linear systems, there is no general approach that work in the non-linear case [1].Adhoc approaches are often developed on a case-by-case basis.Non-linear problems like the classic inverted pendulum have well known and studied solutions [2].Other commonly used approaches involve developing approximations to recast the non-linear problem into a linear one by local mapping of the observables or states [3].Often the solutions involve cleverly addressing specific problems by making certain simplifying assumptions about the system.
Recently, due to the advent of machine learning and deep learning we are seeing new, more generally applicable approaches.Deep Neural Networks offer the promise of efficiently fitting complex non-linear relationships and are therefore considered a potential basis for novel non-linear control P. Ma is with the Department of Mathematics, University of Toronto, 40 St. George Street, Toronto, ON M5S 2E4, Canada e-mail: (peterxy.ma@mail.utoronto.ca)G. Vajente is with the LIGO Laboratory at the California Institute of Technology, 1200 E. California Blvd, Pasadena (CA) 91101 email: (vajente@caltech.edu)strategies.Research in areas like deep reinforcement learning are developing algorithms that learn control policies directly from interacting with these complex systems from scratch [4].
Overall, one of the goals in the field of control theory is to explore how these novel deep learning based techniques can address specific non-linear control problems and to gauge their performance against classical techniques.With this motivation, we explore such a case in this paper.We study how deep learning can solve the problem of controlling the longitudinal translational degrees of freedom of the LIGO (Laser Interferometer Gravitational-Wave Observatory) detector, bringing the system from a condition where all degrees of freedom are varying over a large phase space, to a situation when the system is tightly controlled around the operating point, where the system is highly linear.The only observables available to the control system are highly non-linear function of the state.This lock acquisition problem [5] [6] has been historically solved with solutions developed case by case that are difficult to scale to more complex systems.Deep Learning techniques have also been recently applied to the steady state linear control of angular motions in gravitational-wave interferometers [7].

B. Introduction to LIGO
The LIGO (Laser Interferometer Gravitational-wave Observatory) is a 4-km-long Michelson laser interferometer [8], with the goal of detecting the very small differential distance variation (of the order of 10 −21 m and smaller) caused by the space-time metric fluctuation generated by the passage of gravitational waves [9] produced by astrophysical events [10].To reach such extreme sensitivity the LIGO detectors employ multiple resonant optical cavities [11], that need to be maintained at resonance using feedback control systems: high sensitivity and a linear response of the optical system is obtained only when the distances between the mirrors composing the cavities are kept close to the operating point.Additionally, to isolate the system from ground vibration, all mirrors are suspended to sophisticated seismic isolation systems [12], [13]: the mirrors motion at the frequency of interest for the detection of gravitational waves (above 10 Hz) is largely reduced, at the price of a large residual motion at low frequency that spans multiple resonances of the optical cavities.LIGO uses a frontal modulation technique [8], [5] to extract optical signals that can be used to precisely measure the fluctuation of all length degrees of freedom from the operating point.Those signals, readout by photo-detectors probing pickoff laser beams at different places in the instrument, provide a highly sensitive observation of the distance between mirrors (the state of the optical system we are interested in) only for a small region of the phase space, near the operating point.For most of the state space spanned by the seismically-induced random motion of the mirrors the signal responses are highly non linear.
In order to further enhance the detector sensitivity to gravitational waves, LIGO uses additional resonant cavities to increase the laser power circulating inside the interferometer and to shape the systems response to gravitational waves [14].The additional Power Recycling and Signal Recycling mirrors (see figure 1) add complexity to the control problem.A simplified optical layout of the full Advanced LIGO interferometer, not to scale.The input laser is phase modulated at several RF frequencies, including 45 MHz, to allow implementing a Pound-Drever-Hall sensing scheme [5].The two 4-km-long Fabry-Perot resonant cavities in the arms are comprised of Input Test Masses (ITMX and ITMY) ans End Test Masses (ETMX and ETMY).The input laser beam is separated in two equal parts by the Beam Splitter (BS) mirror.The additional Power Recycling Mirror (PRM) in the input path amplifies the laser power circulating inside the interferometer, and the output Signal Recycling Mirror (SRM) is used to increase the detector response to gravitational waves.For the study considered here, only a reduced optical configuration is considered, composed of the PRM, BS, ITMX and ITMY.The other mirrors, greyed out in the layout, are misaligned so that they do not contribute to the optical system behavior.Two longitudinal degrees of freedom, MICH and PRCL, are relevant for this configuration, and are defined with the equation in the figure, from the microscopic changes of the mirror relative distances, measured from the operating point.Two photo detectors are probing laser beams (POP, a pick-off of the Power Recycling Cavity and AS, the anti-symmetric transmission into the Signal Recycling Cavity) and providing the observable optical signals.

C. Challenges
The task of bringing the longitudinal degrees of freedom of a gravitational wave interferometric detector from the initial state, dominated by large random motion, to the final high sensitivity configuration, where all degrees of freedom are kept very close to the working point, is called lock acquisition [6], [15].The controllers use the optical signals as inputs, and are able to apply forces to the mirrors through suitable actuators in the seismic isolation and suspension chain [12], [13].
Traditionally, controlling the motion of a pendulum is a well known and solved problem [16].However, these methods require information about the state of the pendulum, that in our case corresponds to the distances between mirrors.In our problem we do not have the capability of observing continuously the states with the accuracy needed to control the system.The only data we can retrieve from the system are the optical signals, which are non-linear and non-unique functions of the mirror relative positions [11].In other words, the optical signals are linearly related to the systems state only for very small fluctuations around the working point: once the system has been driven in this linear regime, classical linear feedback controllers can be used to maintain the lock.However, this linear region is a small fraction, of the order of 10 −6 or smaller of the entire explorable state space: in most of the state space the optical signals are non-linear and noninvertible functions of the state variables.The non-uniqueness rise from the periodic nature of the laser wave: any change in the state that increases or decreases the distance travelled by laser beams by multiple of the wavelength has no effect on the optical signals.This also implies that there is no unique working point that is acceptable: any shift that meets the criterion specified above provides us with an equally suitable working point.
Hence this is a difficult control problem, where the system dynamics is linear, but the observables are non-linear functions of the states.Despite the difficulty, as long as we can estimate the state of the system, that is the distances between mirrors, the control problem is solvable by classical linear techniques.Thus the central problem is to construct a non-linear state estimator based on the measurable optical signals which we can then be the input to a simple control technique to acquire the lock.

D. Current Lock Acquisition Solutions
The entire Advanced LIGO interferometric detector has five main longitudinal degrees of freedom (see figure 1), with seismically-induced fluctuations that without any control can span several laser wavelengths.To simplify the lock acquisition problem, auxiliary lasers with a different wavelength are injected from the end of the arm resonant cavities [17], allowing independent control of the two 4-km-long cavities.The remaining interferometer degrees of freedom are three (see figure 1): the differential distance between the beamsplitter and the two input test masses (MICH); the average distance between the beam splitter and the two test masses plus the distance from the beam splitter to the power recyling mirror (PRCL); the average distance between the beam splitter and the two test masses plus the distance from the beam splitter to the signal recyling mirror (SRCL).The lock acquisition of those three degrees of freedom works by attempting to switch on classical feedback controls when the mirrors pass through certain relative resonance conditions, determined by power level crossings in several photodiode signals [18].Although this technique works in the current LIGO detectors, it is not ideal: the intrinsic randomness of the mirror motion makes it impossible to predict how long it would take for the lock acquisition, and the process can sometimes last tens of minutes; development of the technique involves expert knowledge about the system; when building new and more complex detectors this technique is not directly transferable.
In this work we study the possibility of applying a Machine Learning technique to solve the problem of the lock acquisition of the corner degrees of freedom.For simplicity, we study the case of a Power Recycled Michelson (PRMI) interferometer, where the signal recycling mirror is omitted.Although this configuration is simpler than the Dual Recycled Michelson (DRMI) case used in Advanced LIGO, it contains the characteristic non-linearity and non-uniqueness of the full problem, and therefore it serves as a suitable proof of principle of the main ideas.

E. Non-Linear and Non-Uniqueness of Optical Signals
Before attempting to solve the control problem we need to understand what makes the relationship between the optical signals and the state of the mirrors non-linear and nonunique.The optical signals are obtained from the laser fields exiting from various ports of the interferometers.Following an extension of the Pound-Drever-Hall technique called frontal modulation, the input laser beam is phase-modulated at fixed radio-frequencies.The laser fields are probed by fast photodetectors and the output signal demodulated so to obtain the full set of signals used to observe the state [19].The frontal modulation produces additional sidebands fields spaced around the main laser field by the modulation frequency f mod which is typically of the order of several tens of MHz.The carrier and sideband fields propagate inside the interferometer independently, and are mixed only at the photodetector level.
In the PRMI setup we are considering for this study, there are two important longitudinal degrees of freedom at play: PRCL and MICH.We consider the zero of both degrees of freedom to be the operating point, so that when propagating inside the PRMI the laser fields accumulate a phase given by, were k = 2π/λ is related to the laser wavelenght λ and Ω = 2π f mod [11].Note that δL MICH and δL PRCL are the deviation of MICH and PRCL from the operating point An analytical expression for the laser fields in the reflection, power recycling cavity, anti-symmetric port and antisymmetric ports of the interfrometer, can be derived following methods similar to what is explained in [11].Here we report the results, since the derivation is outside the scope of this paper.Note that Ψ IN is the input laser field (either carrier or sidebands) Those equations can be used to compute the main carrier laser fields, for which Ω = 0 as well as the two sideband fields with Ω = ±2π f mod .The optical signals are obtained as combination of the product of pair of fields, from photodetectors that can measured the time-varying power impinging on them: The low frequency component (well below f mod ) is called the DC component.The power P(t) contains also a component oscillating at the modulation frequency f mod : the real and imaginary parts of this component can be extracted by demodulating the photodetector output with a heterodyne scheme and provide the I and Q quadrature signals.All signals are products of pairs of field amplitudes and complex conjugates.Inspection of equations 5,6, 7 shows that the fields contain exponential terms that are functions of the MICH and PRCL degrees of freedom, and thus are periodic.The presence of those exponential terms explains the fact that multiple positions, differing by multiples of λ/2, result in the same values for the observed signals, and are therefore indistinguishable and equally suitable as operating point.Additionally, the exponential terms, together with the fact that the optical signals are products of pairs of fields, explain the highly non-linear dependency on MICH and PRCL.

F. Numerical Simulation
Our work is based on numerical simulations of the interferometer motion, obtained by creating time series of the MICH and PRCL degrees of freedom that mimic what is observed in the real world.Then using the field equations 5, 6, 7 we can produce the optical signals that a model may use as input during real-time operations.The simulation provides a data sampling rate of 2048Hz.This was chosen as it provides a good compromise between high resolution of the trajectories and the optical signals while remaining manageable in data volume for our models to handle long sequence inputs and in computation speed for the real time implementation of the

G. Hardware and Engineering Restrictions
We also have engineering restrictions at production time.For any technique to be of practical use we need to produce position estimates at approximately 2048 times per second on a single CPU core.This restrictive limitation is due to  1: AS is the antisymmetric port, POP is the PRC pick-off.For each of those beam, we sense the total power (labelled 'DC'), the Pound-Drever-Hall demodulated signals at 45 MHz (labelled 'RF45 I' and 'RF45 Q') and at 90 MHz (labelled 'RF90 I' and 'RF90 Q').Each of the demodulated signals has two quadratures: in-phase (I) and quadrature (Q) referring to the fact that we look for a RF signal at the photo-diode that is either in-phase with the laser frontal modulation, or out-of-phase by 90 degrees.
the fact that LIGO's realtime system executes control tasks on single processors, which makes parallelization unfeasible [20].Furthermore, there are no GPUs available on the realtime machine.
Beyond just simulating free motion, we can also simulate the interaction of the feedback control loops in order to demonstrate the locking technique.

H. Problem Statement
With these assumptions in place, we can cast the difficult non-linear control problem into an easier state estimator problem.Given the historical data on the optical signals, can we use that to estimate the current state of the mirror's positions?If so can we do this under the constraints of our real-time hardware?We know that if we can answer this question then the control problem is solved, since controlling the motions of pendulums given the state is a well-known and solved problem.

A. Our Approach
Here we begin by introducing our approach which will be expanded on in subsequent sections.Firstly, as discussed in section I-H the control problem can be solved if we can reconstruct the mirror's states.However this is difficult because the relationship of input data to states is non-unique and non-linear.Recall that the non-linear aspect comes from the equations that govern the laser fields (see section I-E), and the non-uniqueness comes from the exponential oscillatory terms.To address the non-linear aspect, we have accurate simulations that allow us to naturally cast the problem into a supervised machine learning problem.This is because these simulations effectively provide us targets which are the true trajectories, thus the supervised learning objective is to learn a mapping between the optical signals and the target trajectories.To address the non-unique aspect we "wrap" the data by taking the trajectories modulo λ 2 in the Z 1 /Z 2 , obtained from MICH and PRCL via a linear transformation that will be explained in section II-B below.Using this wrapped data we train a Gated Recurrent Unit (GRU) to learn the non-linear mappings between the optical signals and the states.We chose GRU networks firstly for their recurrent architecture and their ability to deal efficiently with time series.Secondly, we chose the GRU for its superior performance over classical Recurrent Neural Networks on long time series data.Finally, we trained a model on the wrapped positions, however this introduces artificial discontinuities in the data which need to be fixed to reflect the true motion of the mirrors.Thus we need to unwrap the data using techniques like the Kalman Filter to provide the best state estimate to ensure continuity in trajectories.Here the Kalman filter is a method to help combine information from different sources to provide the best optimal estimation, using sensor data and knowledge of the systems dynamics.For our use case we would need position and velocity estimates from the trained GRU network as sensors.It will be sufficient to use a simplified model of the mirror's dynamics (constant velocity).To utilize the Kalman Filter we need predictions on both the position and the velocity along with their uncertainties [21].Hence we will use techniques in probabilistic deep learning to aid our model in producing uncertainties as well.Here, Probabilistic Deep Learning is a sub class of deep learning methods built to produce outputs that model probability distributions given input data [22].Examples include Bayesian Neural Networks [23], Variational Autoencoders [24][25] and more.

B. Wrapping Data -Non-Uniqueness
As discussed in section I-E a core problem is that there exists an infinite number of positions that would produce the same optical signals we inputted to the model.Any of these solutions would produce the same optical signal.Recall, this is due to the fact that the laser field equations are periodic.This is not a problem, choosing any solution would suffice.The issue is, once a solution is chosen, we need to follow that solution over time to preserve the continuity of the mirror's trajectories.To avoid the non-uniqueness problem, we can "wrap" the data, but to figure that out we need to find the periodicity of these optical signals and how the oscillatory terms in the laser field equations depend on PRCL and MICH.
We inspect the field equations 5, 6, 7: the periodicity is due to the complex exponential terms, that are functions of the variable φ MICH and φ PRCL from eq. 1, 2. With some straightforward algebraic computations one can show that the only terms appearing in complex exponential are the following lengths degrees of freedom, obtained with an invertible linear transformation from MICH and PRCL: All fields, and therefore all optical signals, are independently periodic in the two variables Z 1 and Z 2 with period λ/2.
To wrap the data, firstly we will take the PRCL and MICH positions and follow the transformations described for Z 1 and Z 2 .After that is done, we iterate through the data and linearly shift up or shift down by integer multiples of λ 2 such that all positions are restricted between λ 4 → − λ 4 .Then we transform back returning the PRCL and MICH positions successfully wrapped.This technique guarantees that for every signal the corresponding wrapped position is unique.The uniqueness allows us to invert the state estimation problem.Once again, we will need to undo this process with the model outputs, since this is was done to create a trainable model and thus are not representative of true motions of the mirrors at the boundary where we introduced instantaneous jumps from wrapping the data.

C. Preparing Data
Recall that we are working with is a set of 10 optical signals retrieved from our simulation described in I-F and shown as an example in figure I-F.These 10 optical signals need to be normalized, and thus to do so we simulated about 40,000 seconds data and found the maximum of the absolute value of each signals and used it as a normalization constant throughout all training, testing and in production.The data used to provide the normalization constants is also the training data.
Next we prepare data for training and testing thePOSITION model.We first simulate data as described in I-F and then wrap the positions using the technique described in section II-B.We will use these wrapped positions as the targets to train our POSITION model.We normalize the positions by λ 4 to bring the target values between −1 and 1.We simulated about 40000s of trajectories.We took 0.5s intervals of historical data and asked the model to predict the normalized wrapped position of the current state at the end of each interval.In total we had 63839 training samples and 15960 testing samples.This was the maximum number of samples we could fit both into memory and to train in a reasonable amount of time.
Then we prepare data for the VELOCITY model.We know that although the position is not unique, the velocity of the mirrors are in fact unique.Thus no need to wrap the data.We obtained the velocity of the mirrors by computing the finite differences of the positions by looking at a current time step t and one time step a head t +1 or 1 2048 seconds ahead.Also note that the velocity is unbounded and thus we set a normalization constant of 2λ m/s.
To increse diversity, we also added data produced with the time-domain simulation that has been used later for the lock acquisition tests, as explained in section I-F, accounting for about one sixth of the number of examples (95639 training and 23960 test)

D. Probabilistic Models
As mentioned in section II-A we need a means to estimating uncertainties in our model predictions.To do so, we take inspiration from Variational Autoencoders [25] in our design.Consider a classical feed-forward neural network for a regression problem.Now consider that the last layer splits into two layers, see figure 3. Recall that we are trying to model a probability distribution.We chose to model it with a Gaussian.Thus we denote one layer as the mean µ and the other as the standard deviation σ, where the µ, σ are the parameters of a Gaussian distribution.Notice that this is effectively the encoder model of a Variational Autoencoder with the sampling layer removed.Then when optimizing the model, we compute the probability using which checks how well the target agrees with our predicted probability distribution.We try to maximize this value.Intuitively this could be an effective way to obtain the error of the estimator.By conceptually investigating the limiting behaviors we see that if the model were poor at making predictions, the variance would be high.This means that since the model does not have an accurate estimation of the mean, it compensates that fact by estimating a larger variance this way there is still a random chance that the model predicts a solution close to the target.However, if the model is highly accurate, then the variance should drop and the mean should align with the target value and there is a higher probability the predicted data points fall close to the target.Furthermore, we see that the model would also naturally produce larger uncertainties around discontinuous points.This is because neural networks produce smooth outputs and thus have trouble modeling discontinuous behaviours and therefore to compensate for poor inference the model produces high uncertainty.This is exactly what we wanted since high uncertainty suggests to our Kalman filter to rely more on known dynamics to predict the position rather than the POSITION model.Overall, in theory as the  model improves, the variance should drop and the mean should approach the target value.We can mathematically formulate this as an optimization problem that minimizes the negative log -likelihood using weights θ given neural network model function p and given input data x: In the end, we frame the optimization problem so that the neural network produces outputs modeled by a Gaussian distribution.Specifically, it forces the model to produce the best estimation of the targets and at the same time produce realistic estimates of how close the output is to the target.

E. Position and Velocity Model
With this technique of modeling uncertainties described in II-D we can develop our POSITION and VELOCITY models.Firstly we know that from testing our existing hardware on a single core of an AMD EPYC 7763 64-CORE PROCESSOR that a pure C implementation of our neural network has an upper limit of ≈ 70, 000 parameters for each model (both models run on a the same CPU core).We know that this is around the upper limit since, we found that models with 80,000 parameters were too slow while models with 60,000 parameters (all with the same architecture) were faster than required.
For the POSITION model, it contains 2 GRU layers [26] followed by a Dropout Layer [27] then 3 fully connected dense layers with Leaky ReLU activations [28].Lastly it splits to two separate layers with linear activation for the mean and softplus activation [29] for the standard deviation model.See table I for specific hyperparameters.This gave a total of 67,827 trainable parameters.
For the VELOCITY model, the design is identical to the POSITION model except with different hyperparameters.For specifics see table II.This gave a total of 77,895 trainable parameters.
For a visual representation as to how data flows through the model please see figure 3.

F. Training Position and Velocity Model
We trained both models with 3000 epochs with early stopping with a patience 1 of 100 epochs using the ADAM optimizer [30].We noticed improvements in the models training by implementing a simple warm up learning rate plus exponential decay scheduler, see algorithm 1.We used a warm-up period of 20 epochs, with a decay time of 500 epochs, with an initial learning rate of 1×10 −6 , a base learning rate of 1 × 10 −4 , and a minimum decayed learning rate of 1 × 10 −7 .The batch size is 64 for the POSITION model and 500 for the VELOCITY model.Those settings gave the best performing model and are obtained by manually tuning the hyperparameters.During our training phase we noticed that, when strictly optimizing for equation 12, the reconstructions of the mean values of the estimates were relatively poor but with sensible uncertainties .We believe this is in part due to the low capacity of the model.To enable us to have more control over how the model weighs the importance in predicting the mean versus estimating the uncertainty we devise a new loss function L θ shown in equation 13.
We denote p(x; θ) µ as the µ value returned by the model.Simply put, we added an additional mean squared error term to help drive the predicted mean closer to the actual position.We chose a scaling factor α = 10 such that the model produce favourable results.

G. Testing Model
We now test the performance of the models by running a new simulation of the mirror motions and optical signals, and running the model with continuous inputs of signals to produce a 10 seconds worth of predicted wrapped trajectory.The POSITION model appears to produce reliable predictions in that predictions match the targets closely whereas the VELOCITY model produces less accurate results.We will show that this performance is sufficient in acquiring the lock in subsequent sections.The results are shown in figure 5.
We note that the velocity model performance is worse than the position model.We believe the reason for this reduced performance is the limited capacity of the model, that thus underfitted.We previously attempted to train with larger models and varying architectures and saw marginal improvements.Thus we kept the smaller model due to the engineering constraints described in section I-C.We have also tried training larger models with L1 regularization to remove small weights, however we discovered that the model still contained too many non-vanishing parameters after training.We attribute these training challenges to the fact that the velocity values are unbounded whereas the position has bounded values.Therefore when training a model it is harder to fit the velocity data since the target space is larger and requires more generalization.This result will affect the design of our Kalman filter, as described below in section II-H.

H. Sensor Fusion -Kalman Filter Inspired
The last part of the solution is to implement a version of the Kalman filter for our special case.Specifically, we want to use an approach inspired by the Kalman filter to fuse two simple measurements, the position and velocity, to unwrap the data to preserve continuity in predicted trajectories.Our Kalman filter is unique in that our measurements come from neural networks.Since the rate at which new predictions are available is high on the time scale of the state evolution, we can use a simplified dynamical model..We begin with building intuition.For every given new time step we can use the models to predict the wrapped position and the velocities.
There are two things we can do with this data.First we can take the positions and the velocities at the previous time step and use simple kinematics to estimate the next positions assuming that the interferometer degrees of freedom evolve as free bodies moving at constant velocity.It is important to note that the real dynamics of the mirror's positions is not so simple.This is because the mirrors are subject to random external disturbances due to the ground seismic motion, and are also suspended to pendulums with resonances at about 1 Hz that introduces long-term slow dynamics .But on the time scale of a few time steps the mirror dynamics can be approximated as a constant velocity.Secondly, we can also get the model prediction for the wrapped position at the current time step.However, recall from section II-B that there are an infinite of possible solutions which can produce the observed optical signals: our goal is to select the optimal solution with no discontinuities.Intuitively, since we can use the dynamics to estimate a forward position, we can use this information to help us "select" one of the infinite solutions for the positions: we need to optimally combine the data of the estimated position derived from the dynamics and the estimated position from our sensor (the POSITION model) to produce the best estimate.This takes inspiration from the Kalman filter.
The process is schematically depicted in figure 6.First, we take the estimated position and velocity at the previous time step and use the simplified dynamical model to predict the positions at the current time step.Then we use the POSITION model to predict the wrapped positions at the current time step and we lay out a set of possible solutions by linearly shifting the wrapped positions by multiples of λ/2 in the Z 1 /Z 2 space.For each possible solution we will compute the Kalman gain K.This is a factor computed from the uncertainties of the dynamic prediction and the uncertainty of the sensor data.More concretely, the Kalman gain gives the coefficient of a linear combination of the two Gaussian distribution of positions (one from the dynamics, one the POSITION model) that minimize the variance of the combined prediction.This factor helps us optimally weigh the adjusted wrapped positions and the positions estimated from dynamics to form new Gaussian distribution for each candidate solution.Now we want to pick the candidate position that gives us the maximum probability.This corresponds to finding the candidate distribution that has the highest peak of the Gaussian in the new fused distribution, computed using the Kalman gain.The detailed algorithm is described in inset 2.
Finally let us formalize this in an algorithm which takes inspiration from [31][21], please see pseudocode 2, where we defined our algorithm.Let us define the model p(θ p , x t ) for the position and v(θ v , x t ) for velocity where the θ are the respective weights.This gives us then the mean and covariance µ p t , Σ p t and µ v t , Σ v t where the subscript denotes either position or velocity at times t.x t is the optical signals received at time step t.The total time is T .We denote P * as the best prediction of position.pt is the position predicted by the dynamics.dt is the time step.
When adapting the Kalman filter for these neural networks we found that the VELOCITY model continuously produces relatively poorer reconstructions than the position model.This is anticipated since during testing phase (see section II-G) we realized that the velocity model needs to map to a much larger range and thus generalization is more difficult than the position model.However due to parameter constraints this model was still the best performing out of all the experiments.Thus to mitigate the generalization error, in the Kalman filter we artificially scaled the uncertainty of the velocity to be 15x greater than its original estimate.Although this biases against the velocity model trusting more the position model we can justify this since we know that uncertainties of the velocity model is less trustworthy due to our underfitted explained previously.
We now attempt to reconstruct the mirror's states using this technique we just described.We see that upon first glance our reconstruction appears accurate, see figures 7 and 8. To verify such a claim we compute the residuals, which are the differences between ground truth and estimated values..We know that if we are successful then the residuals should be constant with a value that is one of the integer multiples of λ/2 in the Z 1 /Z 2 space.This is because, as explained in section II-B, any of these position solutions would suffice as long as trajectories remain continuous.We see that our model produces such a result with a constant shift that is allowed by our wrapping technique with deviations from constancy of the order of 10nm.To understand if this accuracy is sufficient, we can compare it with the line-width of the power recycling cavity, that is the maximum displacement from resonance of the PRCL degree of freedom that maintain the power level at more than half the maximum, and ensure a reasonable linearity of the optical signals.If the estimate is accurate within this level, then a control system based on the state estimates would be able to drive the system near resonance, where a classical feedback controller based on linearized optical signals would work.With the parameters of the system simulated here, the power recycling cavity line-width is about 7 nm, of the same order of magnitude of the residual shown in figure 8.We therefore conclude that the performance of out state estimator is sufficient for our goals.Algorithm 2 Kalman Filter Inspired Model Initialize: we set m as the number of solutions we search for.This number specifies the how many different adjustments made by shifting by λ/2 in the Z 1 /Z 2 space do we check for in our algorithm.Initial estimate Best estimate initially comes directly from the model.
We first use the dynamics and update

III. RESULTS
Having developed a reliable means of state reconstruction, we can now address the original question of how to use ML to acquire the lock: that is to design a control system that can bring the system reliably and deterministically to the desired operating point, that is the (0, 0) position in the MICH and PRCL space.In the following sections we demonstrate how our novel technique successfully solves the locking acquisition problem for the PRMI configuration, in simulation.

A. Lock acquisition
The main goal of this work was to develop a state estimator for the simulated power-recycled interferometer, and then use it to drive the mirrors to suppress their random seismic motion and maintain the system close to the zero working position.This procedure, called lock acquisition, is currently implemented in Advanced LIGO by relying on a empirical approach: linear feedback controller are engaged when the power measured by the power-recycling-cavity pick-off (POP) exceed a given threshold, typically corresponding to 50% of the peak value, that is the steady state power measured when the system is maintained at the operating point.An example of the result of this algorithm applied to our simulation is shown in figure 9.This algorithm is intrinsically non-deterministic, since it relies on waiting for the system to cross near the operating point: when that happens, as detected by large power in the recycling cavity, linear controllers are switched on to quickly stop the mirrors motion and drive them to the working point.This is a process that requires a relatively large force (note that the peak force applied to the mirrors in our simulated lock acquisition is 200 mN, as shown in figure 9) and also is not always successful, resulting in multiple attempts and longer times to acquire the lock, of the order of tens of seconds even in the simple PRMI configuration considered here.Finally, the parameters of this algorithm, such as the loop gains and the triggering levels, need to be adjusted manually by experts, to match the system.In more complex configurations, like the dual-recycled interferometer used in Advanced LIGO [20], the increased complexity results in reduced reliability and much longer wait times.
The alternative lock acquisition we present here, based on the non-linear state estimator we developed, can overcome some of the limitations of the classical lock acquisition: it can be developed with supervised learning for any optical configuration, without the need of expert knowledge; it does not rely on the system to randomly move close enough to the working point to trigger the controller, and therefore it is faster and deterministic; the full knowledge of the system state allows us to implement feedback control system that can operate with lower peak force, since there is no need to stop the mirror motion in the short time the system spend near resonance, when moving randomly due to seismic ground motion.
We first develop the linear feedback controllers that are needed to drive the system to resonance, assuming perfect knowledge of the system state.We implement a two-step approach: first we engage a velocity-damping feedback loop, that reduces the mirror velocities; then we engage an integrator so that the actual mirror relative positions are driven to zero, that is the cavity resonance.An example of a completely successful lock is shown in figure 10.Once the final state shown in figure 10 is obtained, the system is actively maintained very near the operating position.In this state it would be possible to engage directly the classical linear feedback controller used for the example shown in figure 9, without any triggering, and recover the highly accurate controlled state needed too operate a gravitational-wave detector.This last transition is not shown in our simulations.
As a second step, we utilize the state estimates described in our methods section to actively control the motions of the mirrors in numerical simulations, using the feedback control strategy described above.In this configuration the entire lock acquisition procedure is based on the non-linear state estimator that we developed and the simple and robust feedback controllers just described.
Figure 10 shows that the estimate produced by our model are good enough to achieve a lock acquisition and maintain the system near resonance.The accuracy of the control is lower than what is obtained with the classical linear controllers, but as already mentioned, it would be possible to engage them, without triggering, once our control strategy has driven the system near the working point.Our new lock acquisition strategy is completely deterministic, faster than the classical lock acquisition, and requires a much lower peak force on the mirrors (compare the maximum force of 20 mN in figure 10 with the 200 mN in figure 9.Here we see three panels depicting a successful lock acquisition using perfect state estimates.Firstly in panel (a) we see that the motions of the mirrors are driven down to close to 0. Secondly, we show in panel (b) that the force needed to acquire the lock is low.Finally, we see in panel (c) that where the power in the power recycling cavity is still fluctuating, but remains high most of the time.This is a good sign since this means the cavities are maintained near.Note that the red shaded region shows the time interval when we ramp up the velocity-damping controller, and the green shaded region shows the time interval when we ramp up the integrator feedback control.Secondly we see panel (c) where the power in the power recycling cavity is still fluctuating, but remains high most of the time.This is similar to the locking scheme using perfect knowledge of the state thus demonstrating success.

IV. DISCUSSIONS
In conclusion, we demonstrated in sections II-A that we have successfully developed an algorithm to accurately produce non-linear state estimation of the mirrors positions.Then we proved that this technique is accurate enough to allow acquiring the lock of the Power Recycled Michelson interferometer in a simulation, by using a two stage dampening locking scheme.We believe this is a superior approach to the classical locking scheme currently deployed because it does not rely on expert knowledge about the system.Due to this advancement this opens up possibilities to generalize to many kinds of GW detector configurations for faster development.Furthermore, perhaps most importantly, this technique allows us to lock the motions of the mirrors faster and with more reliability than relying on random motions as in the classical locking scheme.
Our next step is to implement this on real hardware to verify that this work can translate to the real setting.Should the results not translate, there are a number of investigations that can be done.We can check how well the locking simulation aligns with the real configuration, we can check how well the simulated mirror motions align with the real motions of the mirrors, and lastly we can investigate improvements to the neural network models in sections II-G, perhaps using larger models or different architectures.For future investigations we could explore if our techniques applied to more complex mirror configurations.
Fig.1.A simplified optical layout of the full Advanced LIGO interferometer, not to scale.The input laser is phase modulated at several RF frequencies, including 45 MHz, to allow implementing a Pound-Drever-Hall sensing scheme[5].The two 4-km-long Fabry-Perot resonant cavities in the arms are comprised of Input Test Masses (ITMX and ITMY) ans End Test Masses (ETMX and ETMY).The input laser beam is separated in two equal parts by the Beam Splitter (BS) mirror.The additional Power Recycling Mirror (PRM) in the input path amplifies the laser power circulating inside the interferometer, and the output Signal Recycling Mirror (SRM) is used to increase the detector response to gravitational waves.For the study considered here, only a reduced optical configuration is considered, composed of the PRM, BS, ITMX and ITMY.The other mirrors, greyed out in the layout, are misaligned so that they do not contribute to the optical system behavior.Two longitudinal degrees of freedom, MICH and PRCL, are relevant for this configuration, and are defined with the equation in the figure, from the microscopic changes of the mirror relative distances, measured from the operating point.Two photo detectors are probing laser beams (POP, a pick-off of the Power Recycling Cavity and AS, the anti-symmetric transmission into the Signal Recycling Cavity) and providing the observable optical signals.
results.An example of the motions simulated are in figure I-F.Together they produce a set of 10 optical signals shown in figure I-F.Notice that in figure I-F the typical motion range between −2 × 10 −6 m to 2 × 10 −6 m which spans multiple halfwavelengths of λ 2 = 5.32 × 10 −7 m.

Fig. 2 .
Fig. 2. (a) Shows the simulated 120 seconds worth of trajectories of PRCL/ MICH.(b) is a zoomed-in plot of the optical signals.For naming conventions, figure1: AS is the antisymmetric port, POP is the PRC pick-off.For each of those beam, we sense the total power (labelled 'DC'), the Pound-Drever-Hall demodulated signals at 45 MHz (labelled 'RF45 I' and 'RF45 Q') and at 90 MHz (labelled 'RF90 I' and 'RF90 Q').Each of the demodulated signals has two quadratures: in-phase (I) and quadrature (Q) referring to the fact that we look for a RF signal at the photo-diode that is either in-phase with the laser frontal modulation, or out-of-phase by 90 degrees.

Fig. 3 .
Fig.3.This depict the general structure of the position or velocity models, for the specific parameters check table I and table II.
Now we check the training results of both models, specifically their loss curves in figure4.We see that the position model has plateaued in performance but has yet to completely overfit since the validation and the testing curve remain close together.On the other hand the velocity model has a harder time training as the validation curve tapers at a much slower rate than the training curve, hence the early stopping preventing the model from training any longer.This indicates to us that stopping training around the times chosen was close to optimal.

Fig. 4 .
Fig. 4. (a) Is the loss curve for the position model whereas (b) is the loss of the velocity model.Notice in the VELOCITY model training, the relatively smoother loss curve is due to the larger batch size used.

Fig. 5 .
Fig. 5. Panel (a) shows the wrapped positions reconstructed by the model of PRCL where the shaded area shows the uncertainty as predicted by the model.Similarly for panel (b), showing the PRCL velocity prediction.In panel (c) -(d) we show the results for MICH.We notice that the velocity reconstructions are poorer than the position reconstructions.

Fig. 6 .
Fig.6.Here we visualize the algorithm 2. 1) start from previous step best position 2) propagate forward with dynamics 3) get the new prediction 4) add suitable multiples of λ/2 to the new model prediction 5) pick the best.

Fig. 7 .Fig. 8 .
Fig. 7.Panel (a) shows the MICH position prediction obtained with the Kalman unwrapping.Notice that the two trajectories are effectively the same.Panel (b) shows the residuals between the true motions and the reconstructions.We see that the deviations are small, on the order of 1 × 10 −8 m from zero.

Fig. 9 .
Fig. 9. Classical lock, here we see three panels depicting a successful classical lock acquisition currently used in production.Firstly in panel (a) we see that the motions of the mirrors are driven down such that the optical laser is in resonances, this is shown in panel (c) where the power remains high.In figure (b) we see the forces applied in the PRCL and MICH.
Fig.10.Here we see three panels depicting a successful lock acquisition using perfect state estimates.Firstly in panel (a) we see that the motions of the mirrors are driven down to close to 0. Secondly, we show in panel (b) that the force needed to acquire the lock is low.Finally, we see in panel (c) that where the power in the power recycling cavity is still fluctuating, but remains high most of the time.This is a good sign since this means the cavities are maintained near.Note that the red shaded region shows the time interval when we ramp up the velocity-damping controller, and the green shaded region shows the time interval when we ramp up the integrator feedback control.

Fig. 11 .
Fig.11.Here we see three panels depicting a successful lock acquisition using our neural network + Kalman filter state estimates.Firstly in panel (a) we see that the motions of the mirrors are driven down to close to 0. Secondly we see panel (c) where the power in the power recycling cavity is still fluctuating, but remains high most of the time.This is similar to the locking scheme using perfect knowledge of the state thus demonstrating success.
Peter Xiangyuan Ma is an undergraduate student at the University of Toronto and Laidlaw Scholar studying Mathematics and Physics.He is curently at UC Berkeley investigating deep learning techniques for radio astronomy.Previously he was an intern at Caltech LIGO working on this current project.Before that he was at the Dunlap Institute of Astronomy and Astrophysics developing machine learning techniques for Fast Radio Burst detection with the CHIME/FRB project.He is interested in leveraging machine learning to accelerate scientific discovery.Gabriele Vajente is the Deputy Head of System Science and Engineering of the LIGO Laboratory at the California Institute of Technology.He is currently involved in the development, design, construction and operation of the Advanced LIGO detectors, and on the research and development of future upgrades for gravitational-wave interferometric observatories.Among other research interests, he is studying application of Machine Learning techniques to control and noise reduction in Advanced LIGO and beyond.
table I and table II.
Learning rate scheduler Require: EPOCH is current epoch Require: LR is learning rate Require: WARMUP EPOCHS is number of epochs to warm up Require: INITIAL LR is initial learning rate Require: MIN LR is minimum learning rate Require: BASE LR is learning rate we want to warm up to Require: DECAY EPOCHS is number of epochs we wish to take to decay to the minimum learning rate.if EPOCH ≥ WARMUP EPOCHS then pct ← EPOCH/WARMUP EPOCHS RETURN ((BASE LR − INITIAL LR) • pct) + 1Patience: The number of epochs to wait before stopping if there is no progress on the validation set Algorithm 1 1 • dt 2 Now we construct a Gaussian mixture of solutions.Get wrapped position from the model that looks one timestep ahead µ p t , Σ p t ← p(θ p , x t )µ v t , Σ v t ← v(θ v , x t )We construct m possible solutions which have the same covariance but have means shifted by integer multiples of λ 2 in the Z 1 /Z 2 space.We denote µ i p t , ∀i ∈ (0, 1, • • • m) as these adjusted means where µ i pt is computed by transforming it into the Z 1 /Z 2 space and applying integer multiple adjustments by λ/2 and then transforming it back to PRCL MICH space.