Rapid retrieval of femtosecond and attosecond pulses from streaking traces using convolutional neural networks

Attosecond streaking is a powerful and versatile technique that allows the full-field characterisation of femtosecond to attosecond optical pulses. It has been instrumental in the verification of attosecond pulse generation and probing of ultrafast dynamics in matter. Recently, machine learning (ML) has been applied to retrieve the fields from streaking data (White and Chang 2019 Opt. Express 27 4799; Zhu et al 2020 Sci. Rep. 10 5782; Brunner et al 2022 Opt. Express 30 15669–84). This offers a number of advantages compared with traditional iterative algorithms, including faster processing and better resilience to noise. Here, we implement an ML approach based on convolutional neural networks and limit the search to physically realistic pulses that can be specified with a small number of parameters. This leads to substantial reductions in both training and retrieval times, enabling near kHz retrieval rates. We examine how the retrieval performance is affected by noise, and for the first time in this context, study the effect of missing data. We show that satisfactory retrievals are still possible with signal-to-noise ratios as low as 10, and with up to 40% of data missing.


Introduction
The capability to generate attosecond light pulses via the process of high harmonic generation (HHG) [1][2][3] has enabled the probing of ultrafast electron dynamics in matter [4,5] on their natural timescales and spawned the field of attosecond science [6,7]. However, future progress in this field, including the attainment of shorter attosecond pulses and the development of new attosecond light sources, e.g. from x-ray free electron lasers (XFELs), relies on precise knowledge of the electric field of the attosecond pulses. Such pulses, which are typically in the extreme ultraviolet (XUV) to soft x-ray range, cannot be characterised using techniques such as SPIDER [8], FROG [9] and d-scan [10] employed for visible to infra-red ultrafast lasers, the primary reason being the lack of suitably fast nonlinear processes in this spectral range. Hence, most measurements currently use the attosecond streaking technique [11] where the XUV pulse ionises atoms in the presence of a synchronised streaking laser field (typically NIR-THz) that modulates the final kinetic energy of the photoelectrons. Recording the photoelectron spectrum as a function of the delay between the two fields results in a 2D streaking trace (photoelectron energy versus delay) that contains sufficient information to fully reconstruct the fields of both pulses, but the information is encoded in a complex way which usually requires sophisticated iterative algorithms [12][13][14][15][16] to retrieve it.
Despite substantial effort in developing improved algorithms, this retrieval remains a challenging and error-prone process. For example, most methods assume that the spectrum of the XUV pulse is narrow compared to the ejected photoelectron's central energy (the central momentum approximation, CMA [16]), which greatly simplifies the retrieval; however, this assumption is often dubious. Further, the retrieval process can be relatively slow. One of the fastest conventional iterative algorithms, ePIE [14], takes at least a few minutes to run, while other commonly used algorithms [15,16] take 30-90 min, which limits analysis of very large data sets and prohibits real-time pulse characterisation. Real-time characterisation refers to the ability to fully characterise the field(s) at or near the full repetition rate of the source. This is a requirement for real-time control and optimisation of the source, e.g. via feedback control.
Until very recently, the bottleneck was the acquisition time of the streaking trace, which requires the acquisition of data for a range of delays between the XUV and streaking pulses. This typically takes of order tens of minutes for sources operating at 1 kHz due to the need to average many shots at each delay for improved statistics. However, an attosecond source operating at 100 kHz was recently demonstrated [17] based on the latest developments in laser technology, with higher repetition rates almost certain to follow in the coming years. There are also single-shot implementations of attosecond streaking [18][19][20] which are required for characterising attosecond sources for which pulse averaging is not possible due to high shot-to-shot pulse variation, e.g. XFELs [21]. The bottleneck for real-time attosecond pulse characterisation is thus becoming the time required for the retrieval. Matching the kHz rates at which streaking traces can currently be generated requires new approaches, beyond the standard iterative algorithms.
With its success in numerous other fields, attention has turned to the use of machine learning (ML). The retrieval task can be viewed as a type of image identification problem (mapping streaking traces to fields) and therefore can leverage the power of neural networks (NNs) that are the workhorse for ML image analysis. NNs are large functions that are structured to mimic the way information passes through the human brain. The parameters controlling these functions are found by training the NN with large amounts of data for which the target (in this case the fields) is known. After training, unseen data can be fed into the NN which can then very rapidly return the result, with much greater tolerance to noisy data than conventional algorithms. NNs have been applied successfully to retrieve the fields of femtosecond laser pulses from 2D traces generated by FROG [22] and d-scan [23] pulse characterisation techniques.
In this paper, we present retrieval results from a purpose-built convolutional neural network (CNN) that has a similar retrieval accuracy to previous work but with substantially less training (four times fewer streaking traces were used) and achieves retrieval times of around 2 ms on a standard laptop without GPU acceleration. We also show that the retrievals are robust against noise and missing data.

Attosecond streaking
Attosecond streaking is a widely used technique for measuring the electric-field waveform E(t) of femtosecond to attosecond pulses [11,24]. It transfers the pulse information to the energy distribution of photoelectrons produced when the attosecond pulse (typically in the XUV region, but also in the vacuum ultraviolet range [25,26]) ionises a medium (usually in the gas phase). The photoelectrons are produced in the presence of a second, 'streaking field' (usually in the near-infrared) which modifies the photoelectron spectrum in a characteristic manner dependent on the relative phase between the two fields. For attosecond pulses generated via HHG, the streaking field is conveniently obtained from the HHG driving field, which is typically of femtosecond duration and intrinsically synchronised with the attosecond pulse. By measuring the photoelectron energy (K) spectrum at different values of the delay (τ_d) between the two fields one can obtain a streaking trace s(K, τ_d) [27] which contains all temporal information about both pulses:

$$ s(\vec{v},\tau_d) = \left| \int_{-\infty}^{\infty} \mathrm{d}t\, e^{i\Phi_G(\vec{v},t)}\, \vec{d}_p(t)\cdot\vec{E}_{\mathrm{XUV}}(t-\tau_d)\, e^{i(K+I_p)t} \right|^2. \tag{1} $$

Here, s(v⃗, τ_d) is the probability density (in energy) of the photoelectrons (in an experiment this is the number of photoelectrons at each energy value K = v²/2), v⃗ is the photoelectron's velocity, I_p is the ionisation energy of the atomic target, and Φ_G(v⃗, t) is the quantum phase term that contains the information about the streaking field,

$$ \Phi_G(\vec{v},t) = -\int_t^{\infty} \mathrm{d}t' \left[ \vec{v}\cdot\vec{A}(t') + \frac{A^2(t')}{2} \right], \tag{2} $$

where A⃗(t) is the vector potential of the streaking field. Finally, d⃗_p(t) is the dipole transition moment of the atomic target, evaluated at the instantaneous momentum p⃗ = v⃗ + A⃗(t) of the free electron in the laser field.
Because equation (1) describes a 2D streaking trace, phase information is encoded in the differences between the spectra at different delays and therefore can be recovered, despite s(K, τ_d) being the modulus squared of a complex quantity. However, this is not a straightforward process, and most methods of doing it use an iterative algorithm that passes over the same trace hundreds of times before converging to a result.

Frequency-resolved optical gating (FROG)-complete reconstruction of attosecond bursts (CRAB)
FROG [9] is a well-established laboratory technique to measure the amplitude and phase of femtosecond laser pulses. The technique cannot be straightforwardly extended to XUV pulses primarily due to the challenge of obtaining the necessary nonlinear material response in the XUV range. However, under certain assumptions, including the central momentum approximation, an attosecond streaking trace assumes the mathematical form of a FROG trace and therefore can be processed in a similar way using an algorithm known as FROG-CRAB [12]. The details of FROG-CRAB are beyond the scope of this paper, but it is a widely-used algorithm which we have used to benchmark our CNN approach.

Previous work using NNs
Within the last few years NNs have begun to be applied to field retrieval from attosecond streaking traces. In essence, a NN is a mapping function between input values and the output values, found by iterating over large quantities of known inputs and outputs. This can be applied to attosecond streaking by treating the streaking trace as the input, from which the NN is trained to retrieve the pulses as its output.
In the first application of NNs for attosecond streaking retrieval [28], the authors built a CNN, which is a NN comprising a series of convolutional layers followed by a series of densely connected layers, a configuration commonly used for image processing [29]. The architecture is similar to that used for FROG phase retrieval using a NN [22]. The convolutional layers effectively reduced the size of their input streaking trace (301 × 58 points) to a more manageable vector length of 1024 that was then fed into the first of two densely connected layers, with a final output vector of length 290 representing the complex fields in the frequency domain (250 points for the XUV pulse and 40 points for the streaking pulse). They trained the CNN on an Nvidia Titan X graphics card using 80 000 synthetic streaking traces produced by a physics model, taking 3.5 h. The CNN was then able to successfully retrieve the XUV field, with sub-second processing time, eliminating the need for the CMA.
The same group subsequently demonstrated an alternative ML approach using a conditional variational generative network (CVGN) that can model all possible pulses that are consistent with a given streaking trace, and can thus provide a retrieval error [30]. Rather than outputting the complex fields themselves (250 numbers previously), the CVGN was set up to output just 9 numbers: the first 5 coefficients of the Taylor series expansion of the XUV spectral phase (which, together with a measurement of the spectrum, is sufficient to reconstruct the XUV pulses) plus 4 numbers parameterising the streaking field in the time domain (its CEP, central wavelength, pulse duration and peak field strength), the streaking field being assumed to be an unchirped Gaussian pulse. Their network was trained on synthetic streaking traces with various levels of Poisson noise added, to better simulate experimental data, which typically suffers from significant statistical noise due to low XUV photon flux levels. By considering the retrieval error from the CVGN when fed with both synthetic and real streaking traces, they found that a signal-to-noise ratio (SNR) of at least six was required to obtain a satisfactory retrieval.
In recently published work [31], the authors describe the implementation of a data pipeline for processing streaking traces using ML, including preprocessing of the traces to improve the performance of their NNs. They used common CNN architectures, including Google's GoogLeNet. For XUV and streaking field retrieval they trained their CNNs with 275 000 synthetic streaking traces. The exact details are not provided, but their CNNs were trained to output the vector potential of the streaking field and the XUV wavepacket. Performance in the presence of synthetic noise was compared to ePIE. They found that their CNNs outperformed ePIE for SNR values greater than 6, and reported retrieval times of 100 ms, though no details are provided on the processing power used.

NN
We built our own CNN, which was primarily optimised for speed. To simplify its architecture and achieve a substantial speed-up, we model both the XUV and streaking fields as Gaussian wavepackets with the possibility of linear chirp, requiring ten parameters to describe both fields, i.e. the output vector of our CNN was of length 10. Neglecting higher-order phase distortions of the pulses (third order and higher) is a good approximation for attosecond streaking experiments, since the laser pulses used for streaking are usually carefully optimised using separate laser pulse diagnostics to minimise spectral phase distortions and are typically close to transform-limited to ensure isolated attosecond pulse generation. Furthermore, it is well known that attosecond pulses from HHG exhibit a closely linear 'attochirp', with a positive slope for the short trajectories and a negative slope for the long trajectories [32]. We are thus able to achieve a substantial speed-up by restricting the CNN to search only for physical solutions. An additional benefit of our parameterisation, compared with parameterising the pulse spectral phase as in [30], is that no separate measurement of the XUV pulse spectrum is required, which is advantageous for high-speed processing. Our CNN consisted of six convolutional layers and two dense layers, all using the rectified linear unit (ReLU) activation function. Our architecture is schematically depicted in figure 1. It was built in Python using Keras.
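A model along these lines can be sketched with the Keras functional API. The filter counts, kernel sizes and pooling placement below are our own illustrative guesses rather than the exact hyperparameters used here; only the layer counts (six convolutional, two dense), the ReLU activations and the length-10 linear output follow the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(50, 20, 1)):
    # input shape matches the 50 x 20 streaking traces (energy x delay)
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for i, filters in enumerate((16, 16, 32, 32, 64, 64)):  # six conv layers
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        if i in (1, 3):                    # downsample twice (our choice)
            x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)  # first dense layer
    outputs = layers.Dense(10)(x)          # 5 XUV + 5 streaking-field parameters
    return keras.Model(inputs, outputs)

model = build_cnn()
model.compile(optimizer="adam", loss="mse")
```

The output layer is linear, since parameters such as the CEP and chirp can take negative values.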

Electric field representation
We describe our pulses as linearly-chirped Gaussian wavepackets, defined in the frequency domain by

$$ \tilde{E}(\omega) = E_0 \exp\!\left[ -2\ln 2\, \frac{(\omega-\omega_0)^2}{\Delta\omega^2} \right] \exp\!\left\{ i\left[ \phi + \frac{q}{2}(\omega-\omega_0)^2 \right] \right\}, \tag{3} $$

where the five parameters used in this description are E_0 the amplitude, ω_0 the central frequency, ∆ω = 4 ln(2)/∆t the full width at half maximum (FWHM) of the pulse spectrum (where ∆t is the FWHM of the transform-limited pulse intensity profile), ϕ the carrier envelope phase (CEP) and q a quantity describing the linear chirp in terms of the group delay dispersion (GDD) [33]:

$$ q = \mathrm{GDD} = \left. \frac{\partial^2 \Phi(\omega)}{\partial \omega^2} \right|_{\omega=\omega_0}, \tag{4} $$

where Φ(ω) = ϕ + (q/2)(ω − ω_0)² is the spectral phase in equation (3). The output of our CNN is a ten-vector ⃗y which contains these five parameters representing the XUV field and another five for the streaking field.
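A numerical sketch of this five-parameter model: a Gaussian spectrum with quadratic spectral phase is inverse-Fourier-transformed to give the time-domain field. All numerical values (grid, central frequency, bandwidth) are illustrative and in arbitrary units.

```python
import numpy as np

def chirped_gaussian_field(E0, w0, dw, phi, q, n=4096, w_max=8.0):
    """Linearly-chirped Gaussian pulse from its five parameters."""
    w = np.linspace(0.0, w_max, n)                 # angular-frequency grid
    amp = E0 * np.exp(-2 * np.log(2) * ((w - w0) / dw) ** 2)
    phase = phi + 0.5 * q * (w - w0) ** 2          # CEP + linear chirp (GDD)
    Ew = amp * np.exp(1j * phase)
    Et = np.fft.fftshift(np.fft.ifft(Ew))          # complex time-domain field
    t = np.fft.fftshift(np.fft.fftfreq(n, d=(w[1] - w[0]) / (2 * np.pi)))
    return t, Et

# transform-limited example: FWHM duration should be 4 ln 2 / dw
t, Et = chirped_gaussian_field(E0=1.0, w0=4.0, dw=0.4, phi=0.0, q=0.0)
```

A nonzero q stretches the pulse in time while leaving the spectrum unchanged, which is the behaviour the CNN must learn to disentangle from the trace.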

Training
We produced a total of 20 500 streaking traces, of which 18 000 were used for training and 2500 for validation during training. We did this by integrating equation (1) for different parameter vectors ⃗y, constructed by randomly sampling the pulse parameters from normal distributions describing physically realistic ranges. For each trace ⃗x, a grid of dimensions 50 × 20 was used, corresponding to 50 steps in energy, ranging from 70 eV to 140 eV, and 20 steps in time (delay), ranging from −3 fs to 3 fs. The training data was fed into the CNN to find the set of weights ⃗θ that minimises the cost function C(⃗θ, ⃗x), which measures the difference between the predicted output ⃗y_pred(⃗θ, ⃗x) and the true value ⃗y_true for a given input streaking trace ⃗x:

$$ C(\vec{\theta},\vec{x}) = \frac{1}{n} \sum_{i=1}^{n} \left[ y_i^{\mathrm{pred}}(\vec{\theta},\vec{x}) - y_i^{\mathrm{true}} \right]^2, \tag{5} $$

where the y_i are the individual entries of the two vectors and n is their length. The task can be thought of as defining a multi-dimensional error surface and trying to reach its global minimum [29]. To achieve this, we used a well-established gradient descent method (adaptive moment estimation, ADAM [34]).
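In practice the minimisation is handled by the ML framework, but the two ingredients, the mean-squared-error cost above and the ADAM update rule, are compact enough to sketch directly. The toy quadratic problem below merely stands in for the CNN weights.

```python
import numpy as np

def cost(y_pred, y_true):
    """Mean-squared-error cost over the n output parameters."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.mean((y_pred - y_true) ** 2)

def adam_step(theta, grad, m, v, k, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update (Kingma & Ba); k is the step index, counting from 1."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** k)               # bias corrections
    v_hat = v / (1 - b2 ** k)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimise C(theta) = mean((theta - target)^2) with ADAM
target = np.array([1.0, -2.0, 0.5])
theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for k in range(1, 5001):
    grad = 2.0 * (theta - target) / theta.size   # gradient of the cost
    theta, m, v = adam_step(theta, grad, m, v, k, lr=0.01)
```

The per-parameter moment estimates are what make ADAM robust to the very different scales of the ten pulse parameters.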

Adding noise
A key feature of any streaking-trace inversion scheme is its robustness to noise. In [28], the network was trained with noisy samples and was able to achieve a retrieval accuracy similar to FROG-CRAB for narrow-band XUV pulses. In this work, we have carried out a systematic study of the effect of different levels of noise on the retrieval accuracy. We added noise to the streaking traces according to

$$ \tilde{S} = \tilde{S}_0 + \tilde{N}, \tag{6} $$

where S̃_0 is the matrix corresponding to the noise-free trace and Ñ is a noise matrix sampled from a Gaussian distribution with mean zero and standard deviation σ, which we could adjust to simulate different SNRs. Specifically, we calculated the mean S̄ of the clean trace and set σ = S̄/100 and σ = S̄/10 to generate three data sets: the clean data, and data with SNRs of 100 and 10. We then used each of these data sets to train a different CNN, labelled CNN1, CNN2 and CNN3, corresponding to clean data, SNR = 100 and SNR = 10, respectively.
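This noise model is straightforward to reproduce; in the sketch below a random matrix stands in for a simulated clean trace.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_noise(trace, snr):
    """Additive Gaussian noise with sigma = mean(trace) / SNR."""
    sigma = trace.mean() / snr
    return trace + rng.normal(0.0, sigma, trace.shape)

clean = rng.random((50, 20))      # stand-in for a clean 50 x 20 trace
noisy_100 = add_noise(clean, 100) # SNR = 100 data set
noisy_10 = add_noise(clean, 10)   # SNR = 10 data set
```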

Retrieval accuracy
To quantify the accuracy of the retrievals, we use the root-mean-square field error [17,35]

$$ \varepsilon_{\mathrm{field}} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( E_i^{\mathrm{true}} - E_i^{\mathrm{pred}} \right)^2 }, \tag{7} $$

where E_i^true and E_i^pred are the time-sampled true XUV field and the time-sampled XUV field predicted by the CNN, respectively, and N is the number of sample points. To ensure a reproducible value, the fields were normalised before calculating the error. Figure 2 shows three XUV retrievals with increasing ε_field values (0.0012, 0.10 and 0.26) to provide a visual reference for this error metric. From this, we see that ε_field < 0.1 corresponds to a high retrieval accuracy, while ε_field < 0.3 corresponds to a satisfactory accuracy for most purposes.
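A direct implementation of equation (7) is given below; the fields are normalised to unit peak amplitude, one reasonable choice since the exact normalisation is not specified here.

```python
import numpy as np

def field_error(e_true, e_pred):
    """RMS field error between two time-sampled, peak-normalised fields."""
    e_true = np.asarray(e_true, dtype=float)
    e_pred = np.asarray(e_pred, dtype=float)
    e_true = e_true / np.abs(e_true).max()   # normalise before comparing
    e_pred = e_pred / np.abs(e_pred).max()
    return np.sqrt(np.mean((e_true - e_pred) ** 2))

t = np.linspace(0.0, 2 * np.pi, 1000)
err = field_error(np.sin(t), np.sin(t))      # identical fields give zero error
```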

Results
We generated 4000 new streaking traces not previously seen by any of the models (i.e. not shared with the validation set). The XUV pulse parameters were selected randomly from physically sensible ranges. We then added noise to these traces with SNRs of 100 and 10, resulting in three data sets: no noise, SNR 100 and SNR 10, each comprising 4000 traces. A single retrieval took on average 2 ms. Figure 3 shows the retrievals obtained by CNN1 (trained on noise-free data) when applied to the noise-free data set not previously seen by CNN1. We note, in the right-hand column of the figure, the excellent agreement between the XUV waveforms predicted by CNN1 and the true waveforms. This validates our methodology of predicting the XUV pulse parameters rather than the field itself. To quantify the agreement, we sampled the predicted and true waveforms at N = 1000 evenly spaced time points and used equation (7) to calculate the field error. For this noise-free data set, CNN1 achieved an average retrieval error of 0.102 ± 0.005. Hence the average agreement across the entire data set was similar to the centre plot in figure 2.

Resilience to noise
Having shown that our CNN formulation is effective, we proceeded to investigate its resilience to noise. Figure 4 shows the reconstructions obtained from the three CNN models (CNN1-3). As expected, the performance is excellent in cases where the model has been trained with data of the same SNR as presented to it (rows (a)-(c), (d)-(f) and (g)-(i)). Row (d)-(f) shows the reconstruction when a model is presented with data that has a different SNR to its training data: in this case CNN1, trained on noise-free data, presented with data with SNR = 100. The agreement is still satisfactory.
In table 1, we list the reconstruction errors for each CNN applied to each of the three data sets. These are plotted in a bar chart in figure 5. If the SNR of the experimental data is known in advance, the best reconstruction will be achieved with a CNN trained with data of a similar SNR. However, we see, with reference to figure 2, that all the CNNs provide a satisfactory agreement. Not surprisingly, CNN2, which was trained on data with an intermediate level of noise, performs best overall, providing reconstructions with ε_field ≲ 0.13 for data with a SNR down to about 10. This is consistent with the findings of previous studies [30,31].

Resilience to missing data
It is not uncommon for data to be missing or corrupted in an experimental streaking trace, e.g. due to laser drop-outs, electronic glitches, or intentional skipping of delay positions. We have simulated such traces by randomly selecting columns and replacing them with zeros. Figures 6(a)-(c) show that when 10% of the columns are missing the model can still produce an acceptable reconstruction (ε_field = 0.309), but the reconstruction is unsatisfactory when 40% are missing (ε_field = 0.359), see insets (d)-(f). However, by substituting each missing column with the average of its adjacent columns, a satisfactory reconstruction is achieved (ε_field = 0.313), as can be seen in (g)-(i).
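The missing-column simulation and repair can be sketched as follows. Where the text averages 'adjacent' columns, the sketch uses the nearest valid column on either side, since an adjacent column may itself be missing; the toy trace is a stand-in for a simulated one.

```python
import numpy as np

rng = np.random.default_rng(2)

def drop_columns(trace, fraction):
    """Zero out a random fraction of delay columns."""
    out = trace.copy()
    n_cols = trace.shape[1]
    missing = rng.choice(n_cols, size=int(fraction * n_cols), replace=False)
    out[:, missing] = 0.0
    return out, sorted(missing.tolist())

def fill_columns(trace, missing):
    """Replace each missing column with the mean of its nearest valid neighbours."""
    out = trace.copy()
    valid = [j for j in range(trace.shape[1]) if j not in set(missing)]
    for j in missing:
        left = [k for k in valid if k < j]
        right = [k for k in valid if k > j]
        if left and right:
            out[:, j] = 0.5 * (trace[:, left[-1]] + trace[:, right[0]])
        else:  # missing column at an edge: copy the nearest valid column
            out[:, j] = trace[:, left[-1]] if left else trace[:, right[0]]
    return out

trace = np.tile(np.arange(1.0, 21.0), (50, 1))   # toy 50 x 20 "trace"
dropped, missing = drop_columns(trace, 0.4)      # 40% of columns removed
filled = fill_columns(dropped, missing)
```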

Comparison to FROG-CRAB
Here, we applied the FROG-CRAB iterative retrieval algorithm to streaking traces from the three data sets with different noise levels, in the same way as above, and summarise the results in table 2. The algorithm was run until the error between subsequent reconstructed traces varied by less than 0.001%, i.e. until further iterations stopped improving the reconstruction. FROG-CRAB took an average of 1.33 s for SNR 100 and 2.46 s for SNR 10 (cf 2 ms for our CNNs). The retrieval error for FROG-CRAB (≈0.38) was consistently higher than for CNN2 (<0.13) across the different noise levels.

Conclusion
We implemented CNNs that were trained for pulse retrieval from attosecond streaking traces. By using physically realistic models of the pulses specified by a relatively small number of parameters, retrieval times of 2 ms were achieved on a mid-range laptop computer. Training times were reduced by a factor of about 20 compared to previous work [28], with similar levels of retrieval accuracy. Our CNNs were able to make accurate retrievals of noisy traces and traces with substantial amounts of missing data. Though this work is computational, our simulated streaking traces are derived from an established physics model of streaking, and we included real-world effects such as noise and missing data in these traces. This gives confidence in the effectiveness of our approach for accurately retrieving pulses from experimental data. We anticipate that this work will be useful for high-throughput streaking analysis, e.g. real-time pulse retrieval for either light sources operating at a few kHz rates using single-shot streaking implementations, or light sources operating at much higher repetition rates where conventional multi-shot streaking implementations can deliver traces at rates of a few kHz. We note that the methodology is not restricted to photoelectron energy versus delay streaking traces: a CNN can be trained to retrieve pulse information from other 2D streaking images, e.g. photoelectron momentum maps from angular streaking measurements [20].

Data availability statement
The data cannot be made publicly available upon publication because they are not available in a format that is sufficiently accessible or reusable by other researchers. The data that support the findings of this study are available upon reasonable request from the authors.