Nonlinear Wave Front Reconstruction from a Pyramid Sensor using Neural Networks

The pyramid wave front sensor (PyWFS) has become increasingly popular to use in adaptive optics (AO) systems due to its high sensitivity. The main drawback of the PyWFS is that it is inherently nonlinear, which means that classic linear wave front reconstruction techniques face a significant reduction in performance at high wave front errors, particularly when the pyramid is unmodulated. In this paper, we consider the potential use of neural networks (NNs) to replace the widely used matrix vector multiplication (MVM) control. We aim to test the hypothesis that the NN's ability to model nonlinearities will give it a distinct advantage over MVM control. We compare the performance of a MVM linear reconstructor against a dense NN, using daytime data acquired on the Subaru Coronagraphic Extreme Adaptive Optics system (SCExAO) instrument. In a first set of experiments, we produce wavefronts generated from 14 Zernike modes and the PyWFS responses at different modulation radii (25, 50, 75, and 100 mas). We find that the NN allows for a far more precise wave front reconstruction at all modulations, with differences in performance increasing in the regime where the PyWFS nonlinearity becomes significant. In a second set of experiments, we generate a data set of atmosphere-like wavefronts, and confirm that the NN outperforms the linear reconstructor. The SCExAO real-time computer software is used as baseline for the latter. These results suggest that NNs are well positioned to improve upon linear reconstructors and stand to bring about a leap forward in AO performance in the near future.


INTRODUCTION
AO systems are essential to the scientific productivity of many ground-based optical/IR telescopes, as they enable diffraction-limited imaging performance despite the angular smearing caused by seeing, which would otherwise limit resolution to approximately 0.5-2 arcseconds.The main components that make up an AO system are: a wavefront sensor (WFS) that measures instantaneous phase aberrations, a deformable mirror (DM) which rapidly corrects aberrations upon reflection, and a control system that determines the DM shape from the wavefront sensor.The post-AO image quality relies on the accuracy to which aberrated wavefronts can be measured, reconstructed, and compensated; it is therefore ideal to employ highly sensitive wavefront sensors and effective reconstruction techniques.
Employment of PyWFSs 1 has increased as AO systems have begun to trade the more conventionally used Shack-Hartmann wavefront sensor (SHWFS) [2][3][4][5] for the PyWFS, which offers greater sensitivity.The PyWFS also offers users flexibility as its sensitivity and dynamic range can be tuned through modulation and, when used with a charged-coupled device (CCD) camera, the resolution of the PyWFS image can be changed to accommodate various noise levels.PyWFSs are in use at some of the world's leading general and high-contrast AO facilities, [6][7][8] including SCExAO. 95][16] These telescopes are ∼ 4 times larger in diameter than the current largest optical telescopes and will benefit greatly from the improved resolution enabled by the AO systems of tomorrow.The advent of the ELTs with their AO systems will enable diffraction-limited imaging with visible-wavelength angular resolutions better than ∼10 mas.
One major challenge that comes with using a PyWFS is that it is an inherently nonlinear sensor (approximately linear only for small wavefront aberrations) while most of the commonly used wavefront reconstruction techniques are linear.Modulation -steering the beam using a rapid scanning tip-tilt mirror synchronised with the acquisition camera -increases the linear regime of the PyWFS, but at the cost of sensitivity.The modulation amplitude can be chosen based on seeing conditions to maximize sensitivity while maintaining favorable conditions for stable closed-loop control with a linear reconstructor.No facility PyWFS AO system, to date, has achieved a consistent, reliable operational performance while using a PyWFS at its maximum, unmodulated sensitivity.Yet, there is strong interest in achieving such a design, particularly to meet the requirements of extreme adaptive optics (ExAO) systems.This has motivated the development of various nonlinear wavefront reconstruction techniques. 17Iterative phase retrieval algorithms 18 are not amenable to fast real-time use, and approaches have been proposed to address this via pre-computation of optical systems and solving with quasi-Newton methods. 19Another approach consists in dynamically adapting a linear reconstruction law to accommodate for nonlinear changes in the PyWFS firstorder response. 20,21 redictive control algorithms 22,23 can also help tackle the nonlinearity problem.
Reconstruction techniques have more recently turned to applying methods that have emerged from deep learning (DL).Early examples where DL has been applied to AO include the design of NNs to predict Zernike coefficients from in-focus and out-of focus images [24][25][26] or off-axis WFS measurements. 272][33] More recently, NNs with more complex architectures have been applied to wavefront prediction and reconstruction.This includes the use of recurrent neural network (RNN)s and convolutional neural network (CNN)s on e.g.SHWFS [34][35][36] data; the PSF; 37 or pairs of in-focus and out-of-focus images. 38[41] Previous research has already demonstrated wavefront reconstruction from PyWFS measurements using NNs, first using CNNs to recover wavefronts from a three-sided PyWFS 42 for opthalmologic applications.More recently, a CNN has been used in combination with an MVM prediction and has shown that a model combining both improves the reconstruction for large wavefront aberrations where the PyWFS response is nonlinear 43 in closed-loop operation.Additionally, a comparative study was conducted that investigated the performance of a number of NN architectures in their ability to reconstruct wavefronts in the Zernike basis from both a SHWFS and PyWFS. 36 a tangentially related and rapidly growing area of research, automatic differentiation has proven to be a powerful tool for the optimisation of optical systems. 44,45 his has since been used to jointly optimize the sensitivity of Fourier-filtering wavefront sensors and their reconstructor and are able to achieve greater sensitivity compared to the PyWFS. 46Reinforcement learning is also surfacing as a promising tool in AO control, which could be used in conjunction with a PyWFS. 47 this paper we compare the performance of various NNs against linear reconstructors, using data obtained using the SCExAO system at the Subaru Telescope, in off-sky testbed mode.Test wavefronts were applied to the  system's DM and measured using the system's PyWFS for a range of modulation amplitudes.We then measured the accuracy to which these wavefronts could be reconstructed.
In Section 3 we study the reconstruction of low-order wavefront errors comprised of Zernike polynomials and compare the reconstruction accuracy between methods and for different pyramid modulation amplitudes.We also compare the performance of a NN predicting modal coefficients, and one directly predicting actuator values.In Section 4 we repeat this analysis using realistic, high-order wavefront error data created by applying von Kármán phase screens to the SCExAO DM.We show that the NN greatly out-performs linear reconstructors at low modulation amplitudes, and at high wavefront error.

METHODOLOGY
The data used for our experiments were obtained using the SCExAO system at the Subaru Telescope.Using this system, we acquired data consisting of wavefronts applied by the system's DM, together with corresponding PyWFS signals recorded at a wavelength of 750 nm and a 50 nm bandwidth.
SCExAO is equipped with a 2040-actuator Boston Micromachines DM, with a 50 × 50 square pitch grid layout.Of these, 1365 actuators lie within the illuminated area of the pupil and are considered active.The root mean squared (RMS) reconstruction errors and the wavefront RMSs are calculated using only these active elements.The PyWFS modulates the beam via a 2-axis piezo steering mirror and uses an OCAM 2 K EMCCD camera from First Light Imaging as its detector.The frame rate (and modulation speed) used was 1 kHz.For further details on the SCExAO system, see Ref. 48.
The SCExAO bench was used to produce two kinds of wavefront condition: a low-order Zernike dataset and a dataset intended to mimic atmospheric turbulence.For each dataset, we built linear reconstructor models (described in Sec.2.1) and NNs (described in Sec.2.2).For clarity, we provide visualisations of the training and testing processes in Figures 3 and 4.

Linear reconstructors
For each of the experiments described in this paper, we produced a simulated dataset using the SCExAO bench as well as measuring a conventional response matrix for the linear reconstructor.The dataset was used to train the NN, and to evaluate the success of both the NN and linear methods.This allows us to benchmark the NN performance against linear reconstructors.For the low-order Zernike set, a response matrix was acquired by a classical push-pull method around the best-effort flat wavefront used as reference on SCExAO (which provides ⪆ 97% H-band Strehl).The 14 first non-piston Zernike terms were probed, as shown on Fig. 1.These also formed the basis for wavefront generation for our dataset.The system control matrix, which lets us retrieve DM-space wavefront maps from normalised PyWFS measurements, was obtained by direct inversion of the response matrix.The restriction to 14 modes against a high-order PyWFS guaranteed proper conditioning of the matrix.
For the turbulence-like dataset, we used the existing AO control software used for SCExAO operations, the Compute-and-control for Adaptive Optics (CACAO 49 ) software.A more complete description of the AO calibration process using CACAO can be found in Ref. 49.A response matrix is measured from push-pull sequences of a basis of Hadamard modes; the control matrix is obtained through a block-wise double diagonalisation of The first 14 modes obtained through the CACAO response matrix calibration method, which is used as a benchmark for the neural network method.The color scale is in units of microns.These modes are normalised to 1 rad RMS wavefront at λ = 750 nm. the response matrix, applying singular value decomposition (SVD) truncations on each of 15 blocks spanning ∼1500 modes.For illustrative purposes, we plot in Fig. 2 the 14 lowest order modes obtained through this double-diagonalisation inversion process.

Neural network implementations
For comparison with the MVM linear reconstructors, we implemented and trained dense NNs.Two models were evaluated -one which predicted Zernike mode coefficients, and one which directly predicted the 50×50 DM actuator map.Our NNs followed a simple architecture and comprised of fully connected layers of neurons.The input to each network was 14 400 flattened pixel values from a 120×120 PyWFS image and the output was either the set of mode coefficients or the set of 2500 DM actuator values.
Reconstruction in the Zernike mode basis was performed for the low order experiments using the same basis that was used to generate the wavefronts (low-order Zernike-basis NN hereafter).This is advantageous for basic testing as it provided a complete reconstruction, with no loss of modal subspace between the introduced modes and the reconstructed modes.We later switched to perform wavefront reconstruction in the actuator basis where the network directly outputs stroke commands for each actuator (low-order actuator-basis NN ).This is more realistic, because typically the optimal mode basis is unknown.Additionally, the actuator basis is complete and can be easily extended to real seeing data, which we demonstrate in Section 4.
The number of hidden layers were adjusted per experiment and their exact architectures are detailed in the respective sections of this paper (Section 3 and Section 4).The NNs were implemented with a ReLU 50 activation function, which gave them the advantage over the linear reconstructors as it allowed the NNs to model nonlinearities.To improve model performance, batch normalisation 51 was used between the hidden layers.All networks were implemented using Keras, 52 with the Tensorflow 53 backend.Training and predictions were run on an Nvidia 2080Ti Graphics Processing Unit (GPU).
A key advantage of NNs over linear reconstructors is that they have much more powerful regularisation capabilities. 54Regularisation of the MVM models was achieved by truncating singular values in the SVD computation as the small singular values tend to encode noise.NNs can represent more complex relationships, and so regularisation is key to good performance by preventing overfitting.Methods of regularisation used in regression L1 (lasso), 55 L2 (ridge) or a combination (elastic net) are often applied to NNs.L1 regularisation penalises the norm of the weights in the NN, which encourages sparsity and L2 regularisation penalise the square of the weights in the NN, which encourages edges in the NN graph to be close to 0. Elastic net is a weighted combination of the L1 and L2 regularisation.A more popular method of NN regularisation is dropout, 56 in which neurons (or nodes) in the NN are randomly removed i.e., 'dropped out' during training so that the NN doesn't become heavily reliant over a subset of the neurons, and effectively 'trains' sub-networks within the NN.As a result, the NN behaves like an ensemble of smaller networks at test time.To determine the optimal method of regularisation and for hyperparameter tuning we performed holdout validation.This is where we put aside a small portion of the data during model training so that we could evaluate our models on unseen data and select For each dataset, we built 3 models: the MVM, Zernike-basis NN and actuator-basis NN.The MVM was 'trained' by calculating a response matrix as described in Sec.2.1.The NN models were trained via backpropagation on a dataset consisting of 70 000 pairs of wavefronts and their corresponding PyWFS measurements.The Zernike-basis NN was trained on wavefronts represented by Zernike mode coefficients while the actuator-basis NN was trained using the pixel values of the wavefronts.At test time, all three models were used to reconstruct wavefronts from PyWFS images, which were then compared to the true wavefronts.The results of these models for each modulation are shown in Figure 6.1: L1 regularisation hyperparameters for the NNs trained in Section 3.These values were selected using holdout validation.
the model with the best generalisation capabilities.We found that in all experiments L1 regularisation was the most effective.
NNs work well in practice, but often with little insight into how or why.To gain an understanding of what a NN might learn from our data, we also implemented a bottleneck-NN.This is a NN where the network has its smallest layer (least number of neurons) at the middle of the network, hence creating a 'bottleneck' through which information passes through.To gain insight into how a NN 'thinks' the bottleneck can be examined to reveal learned representations of the network, because the bottleneck forces the NN to learn an efficient representation of the information it carries.This network, and its results, are described in Section 3.4.

Dataset
100 000 wavefronts were generated from a linear combination of the first 14 Zernike modes where the coefficient, c, for each mode was generated from a uniform random distribution (c ∼ U[−1, 1]).Wavefront maps were then scaled such that the wavefront RMS of the samples was approximately uniformly distributed between 0 and 3.25 radians.These samples were broken up into 70 000 training samples, 10 000 validation samples and 20 000 test samples.

Model hyperparameters
Reconstructing in the Zernike mode basis only required small NNs as the model only had to output 14 mode coefficients.These low-order Zernike-basis NNs had three hidden layers with 3000, 2000 and 1000 neurons respectively and had a learning rate of 10 −5 .Reconstructing in the actuator mode required larger networks due to the increased complexity of the basis.These low-order actuator-basis NNs had four hidden layers with 10 000, 7000, 5000 and 5000 neurons respectively and had a learning rate of 10 −3 .We applied L1 regularisation to the weights of these models and have tabulated the corresponding hyperparameter in Table .1.These networks were trained for 2500 epochs.

Discussion and results
Results showing the reconstruction accuracy for the 25 mas pyramid-modulation dataset are shown in Figure 5.These heatmaps visually show the RMS reconstruction error against wavefront across the 20 000 test samples.Here we define RMS reconstruction error as σreconstr.where the PyWFS response is the most linear.However, the NNs are seen to work successfully even at large wavefront RMSs, despite the PyWFS' nonlinearities.All NNs have very similar performance, which implies that the actuator basis is as effective as the optimal Zernike mode basis.
where pred i and true i are the wavefront value (in radians) for the ith actuator number, for a total of N active (i.e.unobstructed) actuators.
Figure 6 shows a summarised view of these reconstruction errors alongside those for the 50, 75 and 100 mas modulation datasets.In Figure 6, the point at which the PyWFS response becomes nonlinear -and the reconstruction accuracy for linear models rapidly decreases -is clearly visible as the 'knee' of the plot.As expected, the onset of nonlinearity occurs at a larger wavefront RMS as the modulation radius is increased.
An important result that can be observed from Figure 6 is that the NNs are able to handle the PyWFS nonlinearities induced by introducing a wavefront whose gradient reaches into a saturated regime.Further adding to this result is that the NN offers better performance over the MVM at all modulations and all wavefront errors.This implies the NN may be suitable for wavefront reconstruction from a pyramid with no modulation and, therefore maximum sensitivity.We considered this to be beyond the scope of this paper as the operation of SCExAO at 0 PyWFS modulation is technically more involved as it is not typically used.Additionally, at the time the dataset was obtained, the tip-tilt drift was too large to allow for consistent unmodulated dataset, but we hope that it will be the subject of future work.
Figure .6 shows a clear offset in the RMS reconstruction errors, with the best predictions plateauing at ∼0.1 radians RMS reconstruction error, even as the applied wavefront RMS approaches zero.This is consistent across all model types.Inspection of the MVM model trained on the 25 mas dataset showed that the average reconstruction residual map for samples with wavefront RMS < 0.5 radians was found to have an overall wavefront RMS of 4.5 × 10 −2 rad, with a maximum standard deviation for a given pixel to be 0.125 rad, which is relatively small and indicates high confidence in our mean residual maps.We obtained similar results when inspecting the other datasets and implies that there isn't a systematic bias in the reconstructions and that the offset is consistent with measurement noise.
It should also be noted that there is a small difference in offset between the MVM models and the NNs.While it is expected that both models should offer the same performance a low wavefront RMS where the PyWFS response is linear, we must remember that the MVM has been globally optimised to perform well at all wavefront RMS.It is likely that the MVM sacrifices some performance at low wavefront RMS to achieve better performance at high wavefront RMS.
A selection of representative wavefront samples are shown in Figure .7, and shows reconstructions for wavefronts with various RMSs.
Another important outcome of this demonstration is that the actuator-basis NNs performed equally as well as the Zernike-basis NNs.As the Zernike-basis NN was, by design, optimal on the Zernike dataset, this result implies that the NN models are agnostic to the mode basis.This means an actuator basis should be suitable for reconstructing arbitrarily shaped wavefronts and additional efforts into choosing an optimal mode basis may not be necessary.

Bottleneck network
As mentioned in Section 2.2, to gain an insight into how our NNs 'think', we produced a bottleneck network, which we trained on one of the low-order datasets used in Section 3.This NN performed reconstruction in the actuator basis but was designed with a central bottleneck layer with just 14 neurons, one for each mode used to generate the data.This encoder-decoder structure allowed for a mode 'basis' to be recovered by breaking the network at the bottleneck and then activating each of the 14 neurons, thus emulating the two-step structure of a dual-stage modal MVM reconstructor, from wavefront measurements into modal space and then into actuator space.The set of wavefronts represented within the bottleneck-NN were probed by maximally activating a neuron (inputting a large value) in the bottleneck layer while setting the remaining 13 neurons to 0. For each excited neuron in the bottleneck there was a corresponding wavefront mode produced by the network.The result is shown in Figure 8.This is analogous to the modal basis of a linear system, except in this case these 'modes' are created and combined in a nonlinear fashion to produce the output wavefront.Thus, these are not modes in the usual sense, but they do permit qualitative insight into the features the network is most strongly discriminating on.True Wavefront Predicted Wavefront These bottleneck modes bear a striking resemblance to the Zernike modes (Figure .1), with difference due to the network's ability to generate and combine these modes nonlinearly.It is interesting to observe that for many modes, the NN naturally trained to learn direct-Zernike terms rather than linear combinations thereof, which in a truly linear regime would be equivalent.We performed this experiment a number of times, and the resultant mode basis always resembled the Zernike basis, but with slight variations to the mode shape and with random changes to the mode order.

RECONSTRUCTION OF HIGH-ORDER TURBULENCE-LIKE WAVEFRONTS
The efficacy of the NN compared to the linear method was also tested on high-order wavefront error data, with data generated via a simulated atmospheric phase screen applied to the SCExAO DM.While for the linear model the standard response matrix was used, the choice of training basis for a NN is arbitrary.Any set of known input wavefronts can be used to train the model, as long as they broadly span a similar volume of wavefront space similar to that of the real, on-sky wavefronts.To that end, we simply used other randomly generated sets of Kolmogorov phase screens as the training data.

Dataset
The high-order data set was produced with wavefronts generated using HCIPy 57 to simulate real seeing conditions.We simulated a telescope with an 8.2 m diameter aperture (to match the Subaru telescope) and used a wavefront sensing wavelength of 750 nm.One dataset was produced with outer scale L 0 = 20 m to simulate general atmospheric conditions, and another dataset was produced with outer scale L 0 = 1 m to simulate residual seeing after initial low spatial frequency correction from the facility AO system (AO188), after which SCExAO is located.For simplicity and to ensure the absence of statistical correlation across the training set, we generated independent wavefront maps that were not temporally related.The amplitudes of these wavefront maps were then multiplied by a scaling factor to produce the desired distribution of wavefront RMS.
The datasets were taken at a PyWFS modulation of 75 mas, which is typical for SCExAO observations.Each dataset contained 220 000 samples, which were broken into 170 000 training samples, 20 000 validation samples and 30 000 test samples.The test and validation sets were constructed to contain a uniform distribution of RMS wavefronts between 0 and 3 radians, which was achieved by multiplying wavefronts by appropriate constants.In a similar fashion, the training set was constructed to follow a cubic distribution so that there were more training samples available for wavefronts with large aberrations.This was based on an educated guess as it is expected that more training samples were required for large wavefront RMS, and the optimal distribution was not know to us.When the full 50 × 50 wavefront images are analysed the wavefront RMS follows a true cubic distribution, however we re-examined these distributions after masking out the inactive pixels and retaining only the active pixels (corresponding to actuators in the pupil), and results in the adjusted cubic distributions shown in (Figure.9).

Model hyperparameters
The network trained on the L 0 = 20 m dataset had three hidden layers with 5000, 4000 and 3000 neurons and used L1 regularisation with hyperparameter λ L1 = 10 −9 .It was trained for 2500 epochs with a learning rate of 10 −4 .The network trained on the L 0 = 1 m dataset had four hidden layers with 3000, 2000, 2000 and 2000 neurons.This network was trained for 2500 epochs with a learning rate of 10 −3 and found that it performed best with no regularisation.Batch normalisation was used for both models.These networks were trained using the raw, un-normalised PyWFS images.
We chose NNs that were overly large and complex for the problem at hand so that regularisation could be used to automatically reduce model complexity for us and select a model of appropriate complexity.It was important that we didn't start with a model that was too simple, because a simpler model cannot become more complex, but a more complex model can become simpler through regularisation.
For the linear reconstructor benchmark, the control matrix was produced as per Section 2.1.The modes are roughly sorted such that as the mode number increases, so does the spatial frequency.The first 14 modes are shown in Figure .2. To prevent overfitting, the MVM was regularised by selecting the number of modes kept for reconstruction which was chosen using holdout validation.This has the effect of reducing the complexity of the MVM model and while it may perform worse on the training data, it should generalise better to unseen data.In this process, the reconstruction accuracy was measured for the same data but containing a varying number of modes.The number of modes which produced the most accurate reconstructions was then used for the subsequent experiments.For the L 0 = 20 m dataset this kept the first 136 modes and for the L 0 = 1 m dataset it kept the first 379 modes.

Discussion and results
Heatmaps showing the RMS reconstruction error versus the wavefront RMS are illustrated in Figures 10 and 11, and tabulated summaries are provided in Tables 2 and 3.The results clearly show that the NN outperforms the MVM method at all wavefront RMS for both datasets.The performance improvement of the NN over the MVM is much more pronounced for the L 0 = 20 m dataset than for the L 0 = 1 m dataset.This may be because the MVM algorithm is better optimised for the L 0 = 1 m dataset (i.e. for the wavefront properties seen following correction by the facility low-order AO system), whereas the NNs have been equally optimised for both datasets.
An obvious observation is that in both datasets the NN outperforms the MVM even when the wavefront RMS is small, i.e., when the PyWFS is operating in the linear regime.Initially, one would expect that both models should offer similar performance here, however it is important to recognise that the regularisation for the MVM was optimised on the entire dataset and most likely some performance on low wavefront errors was sacrificed in order to obtain a better performance on higher wavefront errors.
Additionally, reconstruction errors for the L 0 = 20 m dataset are smaller than the L 0 = 1 m datasets for both the MVM and NN.This indicates that the L 0 = 1 m dataset is more difficult to model than the L 0 = 20 m dataset owing to the higher spatial frequencies in the dataset.This is supported by the fact that the MVM required more modes to reconstruct the L 0 = 1 m than the L 0 = 20 m wavefronts.Additionally, the NN did not benefit from regularisation, which implies that it may not have been complex enough to model the trends in data.
One potential reason that the NN performs better on the L 0 = 20 m dataset rather than the L 0 = 1 m dataset is that they both have datasets of the same size but each wavefront in the L 0 = 1 m dataset contains significantly more information and is harder for the model to learn.More training data is expected to improve the performance of the model, so the results presented do not necessarily demonstrate the best performance a NN could theoretically achieve.
To further understand the performance of the MVM and NN models, we show some representative reconstruction samples in Figure .12. Like in Figure .7, we see that while the MVM does well to reconstruct the The first column of each subfigure shows the true wavefronts that were applied to the DM with the corresponding wavefront RMS amplitude in radians shown in white.For display purposes, these figures were cropped to show only the active pixels.The color scale is in units of microns.We also see a similar effect to that in Figure .7 where the MVM method underestimates the wavefront amplitude as the aberrations increase.
wavefront profile, it tends to underestimate the magnitude of the wavefront, particularly when the wavefront errors become large.The NN does not appear to suffer the same problem.Additionally, the NN does appear to more accurately reconstruct the wavefront profile than the MVM, most clearly seen in the L 0 = 1 m examples in Figure 12.
Figure. 13 shows sample NN reconstructions without the masking of inactive actuators.Superimposed over the top is an outline of the SCExAO pupil.It is interesting to note that the NN somewhat sensibly extrapolates the wavefront outside of the pupil, a region where it has no information about the wavefront.In order for the network to inpaint outside the pupil, it has presumably learnt about the statistics behind the seeing, which it then uses to extrapolate the missing wavefront regions.That is, in the cases of spatial frequencies of a similar spatial extent to the missing regions, the set of wavefront solutions that maintain spatial continuity and obey the learned distribution is well constrained.

CONCLUSION
Here we have presented a number of experiments which compare the performance of the traditional linear, MVM-based method of wavefront reconstruction with a dense NN reconstructor.Test data was acquired using the SCExAO system's PyWFS, for a range of test wavefronts applied to the system's DM.A key finding is that the NN was much better able to handle nonlinearities that arose from the PyWFS in the presence of large wavefront aberrations.The NN performed well on all PyWFS modulation amplitudes.while the performance of the MVM control matrix quickly decreased as the modulation decreased.It is possible then that a NN could be used in conjunction with an unmodulated PyWFS, thus taking full advantage of its high sensitivity.
The MVM method and the NN were also compared using simulated seeing data, with and without simulated low-order pre-correction.It was found that the NN method would consistently perform more accurate wavefront reconstructions than the MVM method at all wavefront RMSs, suggesting it may offer improved performance under poor seeing conditions.While the results presented in this paper largely address an open loop AO system, where the WFS errors are large and more likely to push the PyWFS into the linear regime, these results still have important implications to a closed loop system.Firstly, nonlinear reconstruction at large wavefront errors makes it easier to close the AO loop during bad conditions.While in theory you can close the loop with any large wavefront error, convergence can be slow.Thus, it can be impractical (or impossible) to close the loop during bad conditions when seeing is varying extremely rapidly, but this can be alleviated by being able to handle large wavefront errors.Secondly, in closed loop operation, there are many modes that cannot be sensed by the PyWFS.These interact nonlinearly with the lower order modes that the PyWFS can 'see', which means that nonlinear reconstruction is still beneficial.
A clear next step is to test a NN reconstruction in closed loop, on sky.One requirement is for the NN to reconstruct on sub-millisecond timescales.Current benchmarks demonstrate latencies well below 1 millisecond on far more complicated NNs with 100s of layers, 58 but work needs to be done to implement this within existing AO software stacks.Either an existing low-latency machine-learning framework could be used (such as TensorRT 59 ), or since operations for a NN can be broken down into a series of matrix operations, each followed by an evaluation of the activation function for the neurons, current AO software could be adapted to implement specific models.An ideal test configuration would be to quickly switch between NN and MVM based algorithms while observing a given target, repeated for range of seeing conditions.
In addition to the absolute wavefront correction performance, the robustness of a NN model over different seeing conditions and across multiple observing epochs needs to be evaluated.NN models generally take some time to train (the models in this paper taking ∼10 hours to train on a single consumer-grade GPU) compared to an MVM control matrix calculation.Having one pre-trained model, but with a small additional amount of 'fine-tuning' training applied on sky, should be investigated.
The fully connected networks used in this paper are a very simple architecture.Further work in improving the NN model is required, including experimenting with different network architectures, such as a convolutional neural network, which is an ideal choice as it is designed for processing image data.The translation-invariance of these network's inputs may also make them robust against PyWFS misalignment.These also have fewer free parameters, though may take longer to train.Additionally, a time domain component can be included in the model to allow for predictive control and overcome latencies in the AO loop.Sensor fusion is also an important area of interest, but it is difficult for the MVM method to handle a variety of different data types from multiple sources.A NN however, should be much more amenable to this problem.

Figure 1 :
Figure 1: The first 14 Zernike modes (excluding piston) as used on the SCExAO 50 × 50 actuator grid.The color scale is in units of microns.Each mode is normalised to 1 radian wavefront RMS at λ = 750 nm.

Figure 2 :
Figure2: The first 14 modes obtained through the CACAO response matrix calibration method, which is used as a benchmark for the neural network method.The color scale is in units of microns.These modes are normalised to 1 rad RMS wavefront at λ = 750 nm.

Figure 3 :
Figure 3: Visualisation of the training and testing process on the low order Zernike wavefronts as described in Sec. 2 and Sec.3.For each dataset, we built 3 models: the MVM, Zernike-basis NN and actuator-basis NN.The MVM was 'trained' by calculating a response matrix as described in Sec.2.1.The NN models were trained via backpropagation on a dataset consisting of 70 000 pairs of wavefronts and their corresponding PyWFS measurements.The Zernike-basis NN was trained on wavefronts represented by Zernike mode coefficients while the actuator-basis NN was trained using the pixel values of the wavefronts.At test time, all three models were used to reconstruct wavefronts from PyWFS images, which were then compared to the true wavefronts.The results of these models for each modulation are shown in Figure6.

Figure 4 :
Figure 4: Visualisation of the training and testing process on the turbulence-like dataset as described in Sec. 2 and Sec. 4. For each dataset, we built 2 models: the MVM and actuator-basis NN.The MVM was 'trained' by calculating a response matrix as described in Sec.2.1.The NN was trained via backpropagation on a dataset consisting of 170 000 pairs of wavefronts and their corresponding PyWFS measurements.At test time, both models were used to reconstruct wavefronts from PyWFS images, which were then compared to the true wavefronts.The results of these models for each modulation are shown in Figures 10 and 11.

Figure 5 :
Figure 5: Heatmaps illustrating the RMS reconstruction error for different wavefront RMSs on the test data for a PyWFS modulated at 25 mas.(a) Linear reconstructor (b) low-order Zernike-basis NN and (c) low-order actuator-basis NN.The counts in these heatmaps are normalised by dividing the raw counts by the total number of samples.

Figure 6 :
Figure 6: Average RMS reconstruction error for the MVM and NNs for different pyramid modulations, for the low-order experiments.The MVM models are plotted with a solid line, the Zernike-basis NNs are plotted with a dashed line and the actuator-basis NNs are plotted with a dotted line.The width of the shaded region shows 1σ uncertainties and the colors indicate the PyWFS modulation.The MVM only performs well at a high modulationwhere the PyWFS response is the most linear.However, the NNs are seen to work successfully even at large wavefront RMSs, despite the PyWFS' nonlinearities.All NNs have very similar performance, which implies that the actuator basis is as effective as the optimal Zernike mode basis.

Figure 7 :Figure 8 :
Figure 7: Sample reconstructions from the MVM, the Zernike-basis NN and the actuator-basis NN on the 25 mas modulation dataset with the RMS reconstruction error in radians shown in white.The first column shows the true wavefronts that were applied to the DM with the corresponding RMS amplitude in radians shown in black.Note the differing color scales for each row.The color scale is in units of microns.

Figure 9 :
Figure 9: Distribution of the wavefront RMSs in the training data for the two outer scales tested, normalised so that the area under the curve is 1.There are more training samples at larger wavefront RMS where the mapping from PyWFS measurements to wavefront is more complex.Wavefront RMS (rad) 0-1 1-2 2-3 Model MVM 0.40 0.73 1.32 NN 0.18 0.31 0.48 Table 2: A summary of the mean RMS reconstruction error (rad) for different wavefront RMSs for the L 0 = 20 m dataset comparing the MVM (linear reconstructor) with the NN.

Figure 10 :
Figure 10: Heatmaps showing the mean RMS reconstruction error for samples of different wavefront RMSs, for the L 0 = 20 m dataset, which simulates general seeing conditions.

Figure 11 :
Figure 11: Heatmaps showing the RMS reconstruction error for samples of different wavefront RMSs, for the L 0 = 1 m dataset, which simulates wavefronts already corrected by a low-order AO system.

Figure 12 :
Figure12: Sample reconstructions from the MVM and NNs models with the RMS reconstruction error in radians shown in white.The first column of each subfigure shows the true wavefronts that were applied to the DM with the corresponding wavefront RMS amplitude in radians shown in white.For display purposes, these figures were cropped to show only the active pixels.The color scale is in units of microns.We also see a similar effect to that in Figure.7 where the MVM method underestimates the wavefront amplitude as the aberrations increase.

Figure 13 :
Figure 13: Typical samples from the L 0 = 20 m and L 0 = 1 m datasets.Left: True wavefront.The 50 × 50 array of actuator commands to apply a wavefront to the DM.Right: NN reconstruction.The NNs produce a 50 × 50 pixel map.Superimposed over these images is the SCExAO pupil.It is interesting to note that the PyWFS is blind to anything outside of the pupil, however the NN can makes a sensible estimate of the wavefront in these regions based on the statistics learned from the training dataset.