Bayesian experimental design and parameter estimation for ultrafast spin dynamics

Advanced experimental measurements are crucial for driving theoretical developments and unveiling novel phenomena in condensed matter and materials physics, yet they often suffer from the scarcity of large-scale facility resources, such as x-ray or neutron scattering centers. To address these limitations, we introduce a methodology that leverages the Bayesian optimal experimental design paradigm to efficiently uncover key quantum spin fluctuation parameters from x-ray photon fluctuation spectroscopy (XPFS) data. Our method is compatible with existing theoretical simulation pipelines and can also be used in combination with fast machine learning surrogate models in the event that real-time simulations are infeasible. Our numerical benchmarks demonstrate superior performance both in predicting model parameters and in delivering more informative measurements within limited experimental time. Our method can be adapted to many different types of experiments beyond XPFS and spin fluctuation studies, facilitating more efficient data collection and accelerating scientific discoveries.


Introduction
Ever since the discovery of x-rays, considerable breakthroughs have been made using them as a probe of matter, from testing models of the atom to solving the structure of DNA. Over the last few decades, with the increasing number of synchrotron x-ray sources around the world, applications across many scientific fields have progressed tremendously, enabling studies of complicated structures and phenomena such as protein dynamics and crystallography [1,2], electronic structures of strongly correlated materials [3,4], and a wide variety of elementary excitations [5,6]. With the development of the next generation of light sources, especially x-ray free electron lasers (X-FELs) [7,8], not only have discoveries accelerated, but completely novel techniques have been developed and new fields of science have emerged, such as laboratory astrophysics [9][10][11][12][13] and single-particle diffractive imaging [14][15][16].
Among these emerging techniques brought by X-FELs, the development of x-ray photon fluctuation spectroscopy (XPFS) holds particular relevance for condensed matter and materials physics [17][18][19][20]. XPFS is a unique and powerful approach that opens up numerous opportunities to probe ultrafast dynamics on timescales corresponding to the µeV to meV energy scale [21,22]. As the high degree of coherence of the x-ray beam encodes subtle changes in the system at these timescales, XPFS is capable of investigating fluctuations of elementary excitations, such as those of the spin [23]. The fluctuation spectra collected using this method can be directly related back to correlation functions derived from Hamiltonians [24], yielding invaluable experimental insights for theoretical developments and an in-depth understanding of the underlying physics.
Despite such breakthroughs, the critical dependence of XPFS on the rare experimental resource of X-FEL beamtime has prevented widespread adoption of such advanced XPFS measurements and hindered further scientific exploration. Targeted measurements of fluctuating dynamics are often complicated. For instance, in the study of ultrafast spin fluctuations, a multitude of excitation modes can exist simultaneously, leading to complex time-dependent signals that require many delay-time measurements to accurately capture the oscillatory and decaying profiles that reveal the crucial physical information and quantitatively inform model parameters. The scarcity of beamtime resources and the complexity of the measured signals underscore the vital importance of theory-informed and data-driven experimental design methods when utilizing techniques such as XPFS for the study of ultrafast fluctuations.
One promising avenue to streamline XPFS measurements is to couple data acquisition with Bayesian optimal experimental design (BOED) statistical methods [25][26][27][28], which have recently been harnessed in real experimental applications [29][30][31]. To achieve the most informed experimental design, physically realistic forward computations of system dynamics directly from the model Hamiltonian are critically important. However, such forward model evaluations are often computationally intensive, and practical applications of BOED can become prohibitive due to the considerable number of forward model computations required for its distribution updates and utility function calculations [27,32,33]. Therefore, the ability to perform fast and cost-efficient forward model computations is a key factor in the successful incorporation of BOED into XPFS measurements of ultrafast dynamics. It is also important to note that the conventional sequential Bayes update method can fail given poorly initialized prior distributions, which are typically based on human input [34,35]. As such, besides rapid forward computation, the implementation of a distribution correction mechanism is also crucial to the successful application of BOED, ensuring that errors in the initial parameter estimation can be identified and rectified during data collection.
Some recent progress has hinted at the potential for achieving more computationally efficient BOED by incorporating machine learning techniques into the workflow of Bayesian experimental design and parameter estimation [33,36,37]. However, combined machine learning (ML)-BOED methods tailored for XPFS and ultrafast dynamics have not yet been the focus of these developments. In this work, we introduce an ML-enabled BOED approach specifically designed to guide measurements of ultrafast dynamics with XPFS, which we expect to greatly drive advances in this field. For concreteness, we focus on ultrafast spin fluctuations for a realistic Hamiltonian, modeled for XPFS. Our method relies on a neural network as an efficient and accurate surrogate model for linear spin wave theory (LSWT). This model facilitates precise evaluations of distributions and utility functions and thereby enables LSWT-guided, real-time Bayesian design and estimation. Furthermore, with an automatically differentiable forward model, gradient descent (GD)-based parameter estimation becomes feasible and can be naturally incorporated into BOED to achieve distribution corrections and more robust parameter estimation under various experimental conditions. We demonstrate the performance of our method through comprehensive benchmarking on simulated experimental data. Our approach provides a powerful tool that leverages both physical models and Bayesian analysis for real-time guidance in ultrafast spin fluctuation studies.

Spin excitation and its neural network surrogate model
In the work described here, we demonstrate our approach by choosing a specific spin model Hamiltonian relevant for van der Waals and other two-dimensional magnets. This model contains a relatively complex parameter phase space in the spin interaction degrees of freedom and uses an in-plane honeycomb lattice, as illustrated in figure 1(a). The spin Hamiltonian is

H = J Σ_⟨i,j⟩ S⃗_i · S⃗_j + Σ_⟨⟨i,j⟩⟩ D⃗ · (S⃗_i × S⃗_j) + J_⊥ Σ_⟨i,j⟩⊥ S⃗_i · S⃗_j − D_z Σ_i (S_i^z)²,    (1)

where J and D⃗ = [0, 0, D]^⊤ represent the exchange and Dzyaloshinskii-Moriya (DM) interaction strengths between the nearest neighbors and second nearest neighbors, respectively. J_⊥ characterizes the inter-layer exchange coupling and D_z defines an easy-axis anisotropy for each spin. This model has been employed to describe inelastic neutron scattering observations of topological spin excitations in materials such as CrI_3 [38] and CrXTe_3 (X = Si, Ge) [39], where the DM interaction is considered responsible for the spin gaps at the Dirac points of the magnon dispersion [40,41].
We calculate the spin excitations of Cr^3+ ions with spin S = 3/2 on the honeycomb lattice and their dynamical structure factors S(q, ω) from the Hamiltonian in the LSWT approximation with the SpinW package [42], since for large spin the magnetic excitation spectrum is reasonably captured by LSWT. An exemplary calculated magnon dispersion and S(q, ω) are shown on the left side of figure 1(b).
In addition to the magnon-related inelastic peaks in S(q, ω), we also include a magnon-unrelated peak near ω = 0 in the intermediate scattering function (ISF) calculation, since the collected intensities in a real experiment will almost certainly include unwanted elastic or quasi-elastic scattering contributions, such as structural or diffuse scattering signals, which can be modeled by a Gaussian-shaped peak for concreteness, as illustrated by the dashed curve in the right panel of figure 1(b). More details on this magnon-unrelated factor are presented in appendix F.

[Figure 1. (a) Van der Waals-layered hexagonal lattice structure and spin interactions (the z-axis anisotropy is not shown for clarity). (b) Magnon dispersion ω(q) and dynamical structure factor S(q, ω) along a high-symmetry path (left) and its slice at the K-point (right). In the right panel, we add a Gaussian-shaped peak (dashed gray curve) to simulate magnon-unrelated intensities captured by detectors. (c) The absolute squared intermediate scattering function (ISF), |S(q, t)|², calculated from both magnon-related and magnon-unrelated scattering signals at the slice q = K. (d) Neural network-based surrogate model for the experimental measurement s(t). The calculation from y to s(t) follows equation (2). (e) Two-dimensional histogram showing excellent agreement between neural network predictions and the ground truth on the testing dataset.]
The surrogate model for the spin excitations is a fully connected neural network (NN). As shown in figure 1(d), the network takes Hamiltonian parameters x = [J, D] as input and predicts the dynamical structure factor S(q, ω) in the form y = [ω_1, ω_2, S(q, ω_1), S(q, ω_2)] at some fixed momentum vector q. Owing to the specific symmetry of the considered Hamiltonian [38,43], we restrict our attention in this work to dynamics associated with the K reciprocal lattice point (the Dirac point) at q = [1/3, 1/3, 0], which informs the energy gap induced by the DM interaction [38,40,41]. This corresponds to the scenario in an XPFS measurement where the detector is adjusted to capture speckle patterns generated over the area of momentum space that provides the most valuable information. In figure 1(e), we find that the trained NN model predicts the magnetic excitations for the testing parameter sets very well. More detailed discussions of XPFS and the neural network model are provided in appendices A and B.
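As an illustration of this input-output structure, such a surrogate can be sketched as a small fully connected network; the layer widths, activation, and example parameter values below are hypothetical, not the exact architecture used in this work.

```python
import torch
import torch.nn as nn

class SpinWaveSurrogate(nn.Module):
    """Sketch of a fully connected surrogate mapping Hamiltonian parameters
    x = [J, D] to y = [omega_1, omega_2, S(K, omega_1), S(K, omega_2)]."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),   # two magnon energies and two intensities
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = SpinWaveSurrogate()
x = torch.tensor([[-2.7, 0.3]])   # illustrative [J, D] values
y = model(x)                      # shape (1, 4); untrained, for structure only
```

In practice such a model would be trained on SpinW-generated (x, y) pairs before being used in the design loop.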
To simulate the time-dependent signals that are expected to be extractable from the squared contrast functions C_2(q, t) determined by XPFS, we use the Fourier-cosine transformation to convert the dynamical structure factor S(q, ω) to the corresponding ISF, denoted by S(q, t). The transformed time-domain signal is then multiplied by an exponential-decay function with a constant magnon inverse lifetime Γ. Γ is kept separate from the network output because the magnon lifetime is not intrinsically connected to the Hamiltonian in the low-temperature limit of LSWT. The resulting function is then convolved with a rectangular function Π(t; a) = h(t + a/2) − h(t − a/2), with h(t) being the Heaviside step function, which approximates the effects induced by the finite photon-pulse width a. The mapping from y to S(q, t) is expressed as follows:

S(q, t) = ∫ Σ_i S(q, ω_i) cos(ω_i τ) e^{−Γτ} Π(t − τ; a) dτ.    (2)

For instance, the slice of S(q, ω) at the K-point (white dashed line in figure 1(b)) and its ISF are shown in the right panels of figures 1(b) and (c), respectively. It is worth mentioning that this transformation is differentiable with respect to the inputs, e.g. ω_i and S(q, ω_i). Since the direct quantity derived from C_2(q, t) is the squared ISF |S(q, t)/S(q, 0)|², we denote s_q(t) = |S(q, t)/S(q, 0)|² and further s(t) ≡ s_K(t) for brevity. The complete mapping from x to s(t) is displayed in figure 1(d).
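The mapping from y to s(t) described above can be sketched numerically as follows; Γ, the pulse width a, and the peak values are illustrative placeholders, and the Π-convolution is approximated by a boxcar average over a fine grid.

```python
import numpy as np

def isf(y, t, gamma=0.1, a=0.2):
    """Sketch of the y -> S(K, t) mapping: cosine transform of the two magnon
    peaks in y = [w1, w2, S1, S2], exponential decay with inverse lifetime
    gamma, and convolution with a rectangular pulse Pi(t; a)."""
    w1, w2, s1, s2 = y
    tau = np.linspace(-a / 2, a / 2, 201)   # finite-pulse window
    vals = []
    for ti in np.atleast_1d(t):
        tt = ti + tau
        sig = (s1 * np.cos(w1 * tt) + s2 * np.cos(w2 * tt)) * np.exp(-gamma * np.abs(tt))
        vals.append(sig.mean())             # boxcar average over the pulse
    return np.asarray(vals)

def s_signal(y, t, **kw):
    """s(t) = |S(K, t) / S(K, 0)|^2, the quantity derived from C2(q, t)."""
    return (isf(y, t, **kw) / isf(y, 0.0, **kw)[0]) ** 2
```

By construction s(0) = 1 before any experimental normalization is applied.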

Bayesian experimental design and parameter estimation
In this section, we briefly introduce the formulation of BOED in the context of XPFS measurements of magnetic excitations, based on recent advancements in BOED [25][26][27][28][45]. We assume the measurements are taken around the fixed K reciprocal lattice point and at different delay times t between two x-ray probe pulses. The central idea of BOED is to take full advantage of previously obtained measurement data to make the most informed decisions and to estimate unknown parameters under an assumed analytical model, as illustrated in figure 2(a). We adapted the Python implementation of BOED, optbayesexpt [28,45], for online parameter learning and experimental steering. We define the probability distribution of the model parameters after measuring n different delay-times as P_n(x) = P(x|S_n, T_n), where S_n = {s_1, s_2, ..., s_n} and T_n = {t_1, t_2, ..., t_n} are the sets of measured data and time points, respectively. To find the next measurement that maximizes information gain, we define the utility function U_n(t) as the P(s_{n+1}|t_{n+1})-averaged Kullback-Leibler divergence between the distributions in two scenarios: before and after making the measurement s_{n+1} at delay-time t_{n+1}. The complete expression of U_n(t) reads

U_n(t) = ∫ ds P_{n+1}(s|t) ∫ dx P_{n+1}(x) ln[P_{n+1}(x)/P_n(x)],    (3)

where P_{n+1}(s|t) is shorthand for P(s_{n+1}|t_{n+1}). This expression can be computationally expensive to evaluate, but is well substituted by the variance σ_s²(t; P_n(x)) of simulated signals s(t) based on the parameter distribution P_n(x) [28]. A detailed discussion of this simplification is provided in appendices C and D.
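This variance-based substitute can be sketched over the weighted particle representation of P_n(x); here `forward` is a placeholder for any forward model, e.g. the NN surrogate followed by the ISF transformation.

```python
import numpy as np

def utility_variance(particles, weights, forward, t_grid):
    """Weighted variance of simulated signals s(t) over the particle
    representation of P_n(x) -- the fast substitute for the KL-based
    utility function."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    sims = np.stack([np.asarray(forward(x, t_grid)) for x in particles])
    mean = (w[:, None] * sims).sum(axis=0)
    return (w[:, None] * (sims - mean) ** 2).sum(axis=0)
```

The next delay-time would then be chosen near the maximizer of this variance, optionally penalized by a proximity cost as described below.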
Additionally, we consider a cost function of the form c(t) = 1 + Σ_{t_n∈T_N} h exp[−(t − t_n)²/w²] to keep the proposed measurements reasonably apart from each other, where t_n ∈ T_N represents previously measured delay-times, and the height and width parameters are h = 10 and w = 0.25 ps. Thus, the final effective utility function used in this work is

U_n^eff(t) = σ_s²(t; P_n(x)) / c(t),    (4)

which provides a rapid assessment of the key information contained in the original expression (3). The next suggested delay-time point to measure is obtained by finding the maximizer, namely,

t_{n+1} = argmax_t U_n^eff(t).    (5)

Upon collection of the new data point s_{n+1} at t_{n+1}, the likelihood P(s|x, t) is given by a Gaussian distribution centered at the prediction ŝ_{n+1} with σ = √(η max(s_{n+1}, 1.0)), where s_{n+1} and ŝ_{n+1} denote the measured and predicted signal values, respectively. Following the calculation of the likelihood, the probability distribution of the model parameters x after collecting n + 1 data points is obtained by applying Bayes' theorem,

P_{n+1}(x) ∝ P(s_{n+1}|x, t_{n+1}) P_n(x).    (6)

In figure 2(b), we depict the mean model prediction at the nth measurement step (blue curve) based on P_n(x), with the shaded area representing the magnitude of the utility function U_n(t). The suggested time delay for the subsequent measurement is selected to maximize this utility function, as denoted by the orange marker. Following the acquisition of the new data point s_{n+1} at t_{n+1}, parameter estimations are updated by applying Bayes' theorem in equation (6). The evolving statistics of parameters J and D through the measurement iterations are illustrated in figure 2(c). Additionally, the joint prior, likelihood, and posterior distributions of the parameters x = [J, D] are presented in figure 2(d), illustrating the application of Bayes' theorem for a selected iteration given the new measurement s_{n+1} at t_{n+1}. (The heat-maps in figure 2(d) are obtained by kernel density estimation (KDE) from the original particle representations, i.e. particle locations and weights in the particle filter method, while the black markers in the prior and posterior distributions show the corresponding particle locations. © [2022] IEEE. Reprinted, with permission, from [44].) These figures illuminate the step-by-step updating process for the parameter distribution P_n(x) obtained by iteratively applying equations (4)-(6), as shown earlier in figure 2(a).
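A particle-filter weight update implementing this Bayes step can be sketched as follows; whether σ uses the measured or the predicted signal value is a modeling choice, and the predicted value is used here as an assumption.

```python
import numpy as np

def bayes_update(weights, predictions, s_new, eta=1.0):
    """Sketch of the Bayes update on particle weights: a Gaussian likelihood
    with sigma = sqrt(eta * max(s_hat, 1)), mirroring eta-scaled Poisson
    counting noise, is multiplied into the prior weights and renormalized."""
    pred = np.asarray(predictions, dtype=float)
    sigma = np.sqrt(eta * np.maximum(pred, 1.0))
    like = np.exp(-0.5 * ((s_new - pred) / sigma) ** 2) / sigma
    w = np.asarray(weights, dtype=float) * like
    return w / w.sum()
```

Particles whose predictions disagree with the new datum lose weight; resampling then concentrates particles in high-posterior regions.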

Automatic differentiation-enabled distribution correction
While BOED is effective in estimating parameter distributions from sequential measurements, it can be limited by chosen prior distributions, making it susceptible to poor initializations [34,35].To address this challenge, we take further advantage of our NN-based forward model by utilizing its automatic differentiation (AD) capability to conduct GD optimizations to update model parameters.
Specifically, the mean-squared error between the measurements s_N and the model predictions ŝ_N(x) for each particle x is calculated as

L(x) = (1/N) Σ_{n=1}^{N} [s_n − ŝ_n(x)]².

The gradient ∂L/∂x obtained from AD is then used to update each x through any gradient-based optimization algorithm. This process is indicated by the horizontal dashed gray line in figure 2(a).
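A sketch of this AD-based correction using PyTorch follows; `forward` is a placeholder for a differentiable surrogate, and the optimizer choice, learning rate, and step count are illustrative.

```python
import torch

def gd_correct(particles, s_meas, t_meas, forward, steps=300, lr=0.1):
    """Minimize L(x) = (1/N) sum_n (s_n - s_hat_n(x))^2 for all particles x
    simultaneously by gradient descent, with gradients obtained by automatic
    differentiation through the forward model."""
    x = torch.tensor(particles, dtype=torch.float64, requires_grad=True)
    s = torch.tensor(s_meas, dtype=torch.float64)
    t = torch.tensor(t_meas, dtype=torch.float64)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((forward(x, t) - s) ** 2).mean()  # MSE over measured delay-times
        loss.backward()
        opt.step()
    return x.detach().numpy()
```

With a toy linear model whose true parameter is 2, particles initialized on either side of the truth are pulled toward it.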
In figure 3, we demonstrate the effectiveness of this GD-enhanced BOED strategy for correcting poor priors, where J was initialized from the uniform distribution J ∼ U(−2.5, −1), whose lower bound is higher than the true value J* ≈ −2.7, indicated by the dashed blue line. Initially, the standalone BOED fails to estimate the parameter values correctly, as evidenced by the two plateaus in figure 3(a) preceding the gray markers. The parameter distribution is heavily skewed towards incorrect values, as depicted in figure 3(b). Although the resampling algorithm in the particle filter method succeeds in placing some particles near x*, their contributions are overwhelmed by the dominant population of incorrect estimations. With the application of AD-enabled GD optimization at the nth step, a greater number of particles are moved to areas close to the true value. While GD optimization does not immediately move all particles near the true values, it effectively overcomes the limitations set by the poor priors, as demonstrated in figure 3(c). This step lays the foundation for BOED to converge to x* in subsequent measurement iterations. Here, we intentionally choose a smaller parameter space that excludes x* for the purpose of demonstration in figure 3. A sensible choice of the initial parameter space is the parameter space of the training dataset, in which case this hybrid BOED-GD strategy remains effective, as shown in section 2.4.
In practice, this GD optimization can be performed intermittently to strike a balance between BOED and GD optimization, e.g. after a certain number of measurement iterations or upon meeting specific criteria. For instance, the GD optimization could be triggered when there are substantial discrepancies between experimental measurements and model predictions and when parameter estimations stagnate. It is worth noting the potential risk of underestimating parameter uncertainties by reusing the obtained data during GD, since the prior distribution produced by GD is optimized to reduce discrepancies at previously measured delay times. However, the task of balancing more robust parameter estimation against significant underestimation of uncertainties is left for future studies.

Benchmark results
In this section, we present the results of applying our proposed method to simulated experimental data and discuss its performance in comparison with other strategies. We conduct a thorough evaluation of four distinct experiment steering strategies: random delay-time selection, sequential delay-time selection, and ML-BOED selection, both with and without GD. These strategies are tested on the same testing dataset containing 100 samples (different parameters x_i = [J_i, D_i]; see figure 1(e)) under six distinct experimental conditions, encompassing two x-ray pulse durations, a = {0.1, 0.2} ps, and three noise levels, η = {0.5, 1.0, 2.0}. Figure H4 showcases some examples of these experimental conditions.
To add further realism to the simulations, we add randomly generated Gaussian-shaped lower-energy peaks before calculating s(t) = |S(K, t)|². These extraneous peaks S_ext(K, ω) take a Gaussian form scaled to the highest magnon peak, with height h_ext and width w_ext drawn as clamped random variables; details are presented in appendix F.
The successful detection of spin excitations with XPFS requires this magnon-unrelated factor to have an intensity weak enough, or at most comparable, relative to the magnon-related intensities, to avoid obscuring the magnon-induced features in s(t). This could be achieved by particular instrument configurations, e.g. incident angles and photon polarization analysis, or by looking at areas of momentum space with weak elastic scattering signals. The final s(t) is calculated from the ISF

S(q, t) = S_mag(q, t) + ∫_0^∞ ∫_0^∞ S_ext(q, ω) cos(ωτ) Π(t − τ; a) dω dτ    (7)

at q = K, where the first term S_mag(q, t) follows equation (2). Specifically, we first normalize s(t) such that s(0) = 100 and then introduce noise by sampling from the Poisson distribution P(λ = s(t)/η) and multiplying the sampled signal by η. The varying noise levels can be interpreted as different signal-collection durations; for instance, shorter collection durations yield lower intensities and signal-to-noise ratios (SNRs). Given that we have normalized the signal based on s(0), any experiment-dependent variation in SNR can be addressed by adjusting η to align with the experimental SNR.
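The normalization and η-scaled Poisson noise described above can be sketched as follows; the seed and η value are illustrative.

```python
import numpy as np

def add_counting_noise(s, eta=1.0, seed=0):
    """Normalize s(t) so that s(0) = 100, then draw from the Poisson
    distribution P(lambda = s(t)/eta) and scale back by eta; larger eta
    mimics shorter collection times and hence lower SNR."""
    rng = np.random.default_rng(seed)
    s = 100.0 * np.asarray(s, dtype=float) / s[0]
    return eta * rng.poisson(s / eta)
```

For example, with η = 2.0 the first point is 2 × Poisson(50), i.e. mean 100 with a larger spread than η = 0.5 would give.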
In each test run for a given strategy and condition, we gather data for 40 delay-time points, with more details provided in appendix E. This number of measurements roughly equates to four to five 12-hour shifts at the Linac Coherent Light Source (LCLS) of SLAC National Accelerator Laboratory, a common allocation for most beamtime proposals. To obtain more reliable benchmark results for the four strategies, we conduct five runs for each set of parameters (J, D) under each of the six experimental conditions. All strategies are initialized with the same set of parameters for each test run. In the left panel of figure 4(a), we depict the mean absolute errors (MAEs) of J and D over 40 measurement iterations for one specific experimental condition, a = 0.2 ps and η = 1.0. Each curve represents the average of five repeated runs over 100 testing samples using each strategy. The distribution of the MAE of the parameters after the last measurement step is displayed in the right panel. Figure 4(b) shows the MAE averaged across all six experimental conditions. Here, we observe that the ML-enabled BOED strategies outperform the sequential and random measurement strategies. Remarkably, the combined BOED-GD strategy results in the lowest online learning error.
In particular, the sequential mode converges rapidly within roughly the first 10 iterations, but plateaus thereafter. We attribute this behavior to the fact that, for this particular system, the most informative range of delay times resides near t ≈ 1 ps, as indicated by the left two panels of figure 6 and further explained below. Given the time separation ∆t = 0.075 ps between two successive measured delay times in the sequential mode (refer to appendix E), the most informative range has been largely exposed to the sequential mode within the first 10 iterations. Furthermore, the relatively higher magnitudes of s(t) encountered at smaller delay times also result in larger variations in the likelihoods. This rapidly narrows down the parameter distributions and makes the sequential mode more susceptible to getting trapped in local minima during parameter estimation. The lack of valuable information after the initial iterations and the over-tightened distribution together lead to the observed behavior of the sequential mode.
The detailed comparisons are presented in figure 4(c), where the panels provide the MAE for J after the last measurement iteration under two different pulse durations: the top and bottom panels correspond to a = 0.1 ps and 0.2 ps, respectively. Each panel depicts the MAE under three different noise levels for each strategy, namely the left, middle, and right bars for η = 0.5, 1.0, and 2.0, respectively. In general, lower online estimation errors are expected for shorter pulse durations and lower noise levels.
To investigate the impact of parameter initialization on final estimation errors, we visualize the Euclidean distances d = √[(J − J*)² + (D − D*)²] between predictions x = [J, D] and their true values x* = [J*, D*] in figure 5(a). Generally, higher estimation errors are expected when the true parameters are located near the boundary of the initialized parameter space. This relationship is alternatively reflected in the positive correlation between the initial and final estimation errors (Euclidean distances between x and x*), as shown in figure 5(b).
In addition to online learning accuracy, the informativeness of the data collected by different strategies is another significant factor to consider. We take the absolute value of the derivative, |∂s(t)/∂t|, as an indicator of the informativeness of a measurement. In general, delay-times with higher absolute derivatives carry more information about the curve profile, as they cover regions where s(t) changes more rapidly, especially on both sides of an extremum of s(t), as illustrated in figure 4(d). Measurements with higher |∂s(t)/∂t| may lead to better offline parameter fitting results, i.e. data analysis after collecting all data during the allocated beam time, since the extrema reflect physical information about the magnon excitation modes in the time domain. We summarize the comparison of the average informativeness in figure 4(e), which is derived from all 40 recommended delay-times over 100 testing samples across the six experimental conditions. We find that the ML-BOED methods significantly outperform the other methods in terms of this informativeness indicator. More detailed comparisons per pulse duration and noise level are presented in figure 4(f).
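This indicator can be sketched with a finite-difference derivative over the recommended delay-times:

```python
import numpy as np

def informativeness(s, t):
    """|ds/dt| at each measured delay-time via central finite differences;
    higher values mark delay-times where s(t) changes rapidly and thus
    carry more information about the curve profile."""
    return np.abs(np.gradient(np.asarray(s, dtype=float),
                              np.asarray(t, dtype=float)))
```

Averaging this quantity over all recommended delay-times gives a per-strategy informativeness score of the kind compared in figure 4(e).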
To offer a macroscopic perspective, we plot the measurements suggested by each strategy from one run (out of the five test runs) across all testing samples as a function of measurement iteration in figure 6. The left two panels for the ML-BOED methods are color-coded in blue and red by the parameters J and D, respectively, and are zoomed in to provide more detailed views of the first 15 iterations; the corresponding complete views are provided in figure G3 (appendix G). It is noticeable that the suggestion patterns of the ML-BOED methods initially focus on local domains around 1 ps and subsequently branch out to cover the entire measurable domain. This implies that delay-times around 1 ps initially contain the most critical information for narrowing down the Hamiltonian parameterization. However, it does not follow that only the initial portion of delay times is important and that one can simply measure the range sequentially, as evidenced by the stagnation of the sequential mode shown in figure 4(a). The sequence in which the most informative time ranges are measured plays a crucial role in correctly narrowing down the parameter distribution. In subsequent measurement steps, the ML-BOED methods strategically avoid measurements in close proximity to these domains, as demonstrated by the blank areas in figure 6; instead, they explore broadly across both smaller and larger delay-times, gradually taking the most informative time ranges into account. Ultimately, the entire measurable domain is filled up, as each parameter set necessitates measurements from distinct regions to capture the corresponding nuanced features. In contrast, the sequential and random strategies yield either trivially linear or uniformly dispersed patterns, as shown in figure 6, representing two distinct extremes in experimental measurement design.
The distribution of suggested measurements in the first few iterations can also illuminate where the most distinctive features of different parameter values appear in the time domain. In general, the patterns displayed by the ML-BOED methods in the first few iterations of figure 6 can be interpreted as an estimated significance ranking of the time domains, where a domain appearing in earlier iterations has a higher priority for measurement. Thus, in cases where the proposed method cannot be run online during data collection, an alternative application could involve preparing these scatter plots from simulated data prior to the beam time; delay-time measurements can then be arranged based on the importance of each time domain informed by the pre-prepared scatter plots.

Discussion
In this work, we presented a ML-enabled BOED method for measuring magnetic excitations with XPFS.We leverage a NN-based forward model to serve as a surrogate model for LSWT in order to enable massive forward computations that are crucial to precise distribution estimations and utility function calculations in BOED.We further incorporate the capability of AD from our NN-based forward model in the BOED workflow as a complementary parameter optimization method.A thorough benchmarking conducted over 100 testing samples under six distinctive experimental conditions demonstrated the superior performance of the ML-enabled BOED methods, with the joint Bayes and GD strategy being especially noteworthy.
Despite these promising results, there is still room for further improvement in terms of broadening application scenarios and refining utility functions. Currently, our method is developed for suggesting delay-time measurements only. More advanced machine learning techniques can be adopted into the presented framework to predict more complicated spin excitations and guide more 'tuning knobs' (experimental parameters) in experiments. For instance, while the current work focuses on a single momentum vector q = K with the NN model predicting a vector including S(K, ω) for a fixed number of energies ω, the proposed method can readily incorporate alternative NN surrogate models. One could, for example, employ implicit neural representation-based forward models for S(q, ω) that take continuous momenta q and energies ω as inputs and output the scalar value S(q, ω), as demonstrated in [46], which could be used to capture more excitation modes and guide measurements in momentum space.
Moreover, the effective utility function is considerably simplified from the original Kullback-Leibler (KL) divergence-based expression and may not be a faithful representation of the full expression. Future work could explore policy optimization and reinforcement learning techniques for the development of faster and more powerful NN-based utility functions. Another interesting extension of our work would be to consider additional interactions in the Hamiltonian, such as the Kitaev interaction, which would provide a theoretical platform for estimating competing parameters between nearly indistinguishable dispersion features [43,47].
This proposed framework combines conventional BOED methods and ML techniques in a synergistic manner. It allows for more informed experimental planning, harnessing the power of physical models and Bayes' theorem to collect more meaningful data and richer information within the same amount of allocated beam time. Admittedly, the single q-point magnon calculation demonstrated in this work does not strictly necessitate a neural network surrogate in scenarios where the original forward calculation is efficient enough for real-time experimental planning and parameter estimation. However, the rapid inference speed of a neural network provides benefits beyond those of steering the experiment with generally slower, non-ML forward models, such as an increased number of particles in the particle filter for more accurate probability estimation, or multiple realizations of parameter initialization for more robust performance. We present a prototypical work that can be readily extended to multiple or continuous q-point calculations, either by adopting one neural network per q-point or by using neural representations. Furthermore, the generality of the network modeling allows it to be trained with simulation data generated by more advanced computational methods, such as exact diagonalization (ED) or the density matrix renormalization group (DMRG), to capture fundamental spin excitations beyond those described by LSWT. Therefore, for experiments that measure extensive ranges of q-points and for material systems that require more advanced modeling, the integration of neural networks is expected to play a significantly more essential role. This paper sets the groundwork upon which these more complex applications can be built.
Finally, this method can be applied to guide measurements beyond magnetic systems and XPFS, such as time-resolved resonant inelastic x-ray scattering with a surrogate model for ED [48,49]. The increased information gain provided by this method will facilitate more efficient measurements and expedite the capture of complex physical phenomena. We expect the developed method to greatly benefit simulation-based experiment planning and, eventually, to accelerate scientific discovery.

Appendix C. Details of Bayesian optimal experimental design
We elaborate on each component of the BOED in this appendix, which is an expanded version of section 2.2. Suppose that after measuring n different delay-times, we have the current probability distribution of the model parameters, P_n(x) = P(x|S_n, T_n), where S_n = {s_1, s_2, . . ., s_n} and T_n = {t_1, t_2, . . ., t_n} are the sets of measured data and time points, respectively. The distribution is numerically represented by a group of discrete particles (parameters) x with associated weights, which is known as the particle filter method [25,28,58]. Here, the utility function U(t) is defined as the distance between the distributions before and after making the measurement s_{n+1} at delay-time t_{n+1}, averaged over P(s_{n+1}|t_{n+1}). More specifically, it can be expressed through the KL divergence between P_{n+1}(x) and P_n(x), where we use the shorthand P_{n+1}(s|t) and P_{n+1}(s|x, t) for P(s_{n+1}|t_{n+1}) and P(s_{n+1}|x, t_{n+1}). The first term in equation (C1) represents the negative expected differential entropy of the noise distributions only, while the second term is the differential entropy of P_{n+1}(s|t), i.e.
the distribution of the predicted signal at t being s under the current parameter distribution P_n(x). In the ideal case where P_{n+1}(s|x, t) and P_{n+1}(s|t) are both Gaussian distributions, the utility function can be reduced to the form U_n(t) = (1/2) ln(1 + σ²_s(t)/σ²_η), where σ_η and σ_s(t) are the standard deviations of the measurement noise and the noise-free simulated signal, respectively. This can be seen as follows. For Poisson measurement noise in a Gaussian approximation, the first term is simply E_x[−(1/2) ln σ²_η(t; x)] + const. The dependence on x comes from the fact that the Gaussian approximation of a Poisson distribution has a standard deviation that varies with the calculated signal value, σ_η(t; x) ∼ √s(t; x). When the noise is instead Gaussian with a fixed variance σ²_η, the first term reduces to −(1/2) ln σ²_η + const. If the noise is Gaussian with a fixed variance σ²_η and the calculated noise-free s(t; x) over the parameter distribution P(x) is also Gaussian with variance σ²_s (say, P^sim_{n+1}(s|t) ∼ N(µ_s, σ_s)), the second entropy becomes H[P_{n+1}(s|t)] = (1/2) ln(σ²_η + σ²_s) + const, owing to the fact that P_{n+1}(s|t) is the convolution of two Gaussian distributions. Together, these assumptions lead to the expression for U_n(t) above.
Strictly speaking, this decomposition no longer holds when Poisson noise is adopted, since the noise distribution then has a signal-dependent variance; in particular, the second term can no longer be decomposed as the sum of two variances. This is demonstrated by the two blue curves in figure D2, which represent the Gaussian-approximated noise distributions at s = 35 and 65, respectively. However, we numerically show that P_{n+1}(s|t) can still be well approximated by a Gaussian distribution, as illustrated by comparing the black curve and the dashed gray curve in figure D2. This implies that we can still safely approximate H[P_{n+1}(s|t)] by (1/2) ln σ²_total + const. Despite this restriction, σ_s(t) reflects the parameter uncertainties in the measurable domain, which, together with computational efficiency, motivates the simplification discussed next.
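As a sanity check on the Gaussian-case reduction (a minimal sketch using SciPy; the numerical widths are arbitrary choices), the entropy difference of the two Gaussians reproduces (1/2) ln(1 + σ²_s/σ²_η), where the 1/2 prefactor follows directly from the Gaussian differential entropies:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical widths for the measurement noise and the noise-free signal spread.
sigma_eta, sigma_s = 2.0, 5.0

# First term: expected differential entropy of the fixed-variance noise, H[N(0, sigma_eta)].
h_noise = norm(scale=sigma_eta).entropy()

# Second term: entropy of the predictive distribution P_{n+1}(s|t), the convolution
# of two Gaussians, i.e. a Gaussian with variance sigma_eta^2 + sigma_s^2.
h_pred = norm(scale=np.hypot(sigma_eta, sigma_s)).entropy()

# The (1/2) ln(2*pi*e) constants cancel in the difference.
utility = h_pred - h_noise
closed_form = 0.5 * np.log(1.0 + sigma_s**2 / sigma_eta**2)
```

The difference `utility` matches `closed_form` to machine precision, confirming that the constants drop out of the entropy difference.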
In practice, given that σ²_s can be directly calculated from P_n(x) while σ²_total requires additional calculations, and considering that the utility is a monotonically increasing function of σ²_s, we simply take σ²_s as our effective utility function. Since the noise variance σ²_η(t) is no longer constant across the measurement domain, this simplification could lead to suggested delay-times that differ from those calculated with the original utility function. However, σ²_s still provides the key information included in (3) and serves as a good effective utility measure, as discussed in section 2.4. Further improvements of the effective utility function are left for future studies.
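The effective utility can be computed directly from the weighted particle ensemble representing P_n(x). The sketch below is illustrative only: `simulate` is a hypothetical forward model (a toy damped cosine standing in for the NN surrogate), and the particle count, prior range, and time grid are arbitrary choices:

```python
import numpy as np

def effective_utility(particles, weights, times, simulate):
    """Weighted variance sigma_s^2(t) of the simulated noise-free signal
    s(t; x) over the particle ensemble representing P_n(x)."""
    # signals[i, j] = s(times[j]; particles[i])
    signals = np.stack([simulate(x, times) for x in particles])
    mean = np.average(signals, axis=0, weights=weights)
    return np.average((signals - mean) ** 2, axis=0, weights=weights)

# Toy forward model: damped cosine with unknown frequency parameter x.
def simulate(x, t):
    return np.exp(-0.2 * t) * np.cos(x * t)

rng = np.random.default_rng(0)
particles = rng.uniform(1.0, 2.0, size=200)   # samples representing P_n(x)
weights = np.full(200, 1.0 / 200)             # uniform particle weights
times = np.linspace(0.0, 10.0, 101)           # candidate delay-times

u = effective_utility(particles, weights, times, simulate)
t_next = times[np.argmax(u)]                  # suggested next delay-time
```

The suggested measurement lands where the particles disagree most about the signal, which is exactly the "parameter uncertainty in the measurable domain" that σ²_s(t) captures.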

Appendix G. Full view of settings
We present the full settings suggested by the two ML-BOED strategies in figure G3. Unlike figure 6 in the main text, there is no cutoff on the x-axis in the left two panels.

Appendix H. Noise levels and pulse widths
The effects of different noise levels and pulse widths are demonstrated in figure H4.

Figure 1. Illustration of the physical problem setup and the machine learning surrogate model. (a) Van der Waals-layered hexagonal lattice structure and spin interactions (the z-axis anisotropy is not shown for clarity). (b) Magnon dispersion ω(q) and dynamical structure factor S(q, ω) along a high-symmetry path (left) and its slice at the K-point (right). In the right panel, we add a Gaussian-shaped peak (dashed gray curve) to simulate magnon-unrelated intensities captured by detectors. (c) The absolute squared intermediate scattering function (ISF), |S(q, t)|², calculated from both magnon-related and magnon-unrelated scattering signals at the slice q = K. (d) Neural network-based surrogate model for the experimental measurement s(t). The calculation from y to s(t) follows equation (2). (e) Two-dimensional histogram showing excellent agreement between neural network predictions and the ground truth on the testing dataset.

Figure 2. Illustrations of Bayesian optimal experimental design and parameter estimation. (a) A schematic illustration of the machine learning-enabled Bayesian experimental design framework. The speckle pattern shown in the left panel is the magnetic scattering from a van der Waals magnetic system at the LCLS at the Ni K-edge. (b) A new measurement is proposed by finding the time delay that maximizes the chosen utility function (the amplitude of the blue shaded area). Blue and orange curves represent mean model predictions based on P_n(x) and P_{n+1}(x), respectively, for n = 8. (c) Mean values (solid lines) and standard deviations (shaded areas) of the key model parameters J and D versus measurement iterations. (d) Joint prior (left), likelihood (middle), and posterior (right) distributions illustrate the application of Bayes' theorem for the selected iteration given the new measurement s_{n+1} at t_{n+1}. The heat-maps in (d) are obtained by kernel density estimation (KDE) from the original particle representations (particle locations and weights in the particle filter method), while the black markers in the prior and posterior distributions show the corresponding particle locations. © [2022] IEEE. Reprinted, with permission, from [44].

Figure 3. Effectiveness of automatic differentiation (AD)-based optimization in correcting false parameter priors. (a) The pure Bayes method fails to correctly estimate parameter values due to the narrow prior, J ∼ U(−2.5, −1). (b) The resulting parameter distribution has most of its particles concentrated at values away from the correct value. (c) After application of the AD-based optimization, more particles are brought closer to the true values. In (b) and (c), the pink star-shaped marker represents the ground truth x*.

Figure 4. Summarized benchmark results over the testing dataset. (a) Online estimation errors versus measurement iterations for one tested experimental condition (a = 0.1 ps and η = 1.0); the right panels illustrate the final error distributions at the last measurement iteration. (b) Final error distributions averaged over all six tested conditions, a = {0.1, 0.2} ps and η = {0.5, 1.0, 2.0}. (c) Detailed final error distributions for each tested condition. For each of the parameters J and D, the top and bottom panels display results for a = 0.1 and 0.2 ps, respectively. Within each panel, three bars are shown for the three noise levels, i.e. bars from left (lightest color) to right (darkest color) correspond to η = 0.5, 1.0, and 2.0, respectively. (d) A representative example of |∂s(t)/∂t| and its original signal s(t). (e) Benchmark of the mean |∂s(t)/∂t| for the four investigated strategies, averaged over all six tested conditions. (f) Detailed comparison of the mean |∂s(t)/∂t| for each condition, listed in the same order as in (c).

Figure 5. Effects of initial parameters on final estimation accuracies. (a) The estimation error visualized at the true parameter locations. Each panel showcases 100 test samples, with the top-left panel colored by absolute error and subsequent panels by relative error, i.e. a positive color indicates a higher error than in the top-left panel. Circle size indicates the error standard deviation across 6 conditions and 5 test runs. (b) The correlation between initial and final estimation errors (distances between predictions and truths) for all test samples.

Figure 6. Visualization of suggested delay-time measurements by different strategies. The Bayes-based strategies have two panels, where the top panel is colored by values of J and the bottom panel by D; the positions of the plotted points are the same. The ML-BOED strategies have the same initial suggestion t0, since the same set of randomly sampled prior parameters is used throughout the 100 test samples for each test run.
Figure B1. Training and validation loss as a function of training steps. The black dashed line represents the step of smallest validation loss.

Table F2. Marginalized prior distributions P0(xn) for the parameter xn. The parameter vector x reads x = [J, D, . . ., wext], and U(a, b) represents a uniform distribution from a to b.

Figure G3. Complete visualization of suggested delay-time measurements by different strategies. The Bayes-based strategies have two panels, where the top panel is colored by values of J and the bottom panel by D, and the positions of the plotted points are the same.

Figure H4. Illustration of noise levels and pulse widths. The top left panel shows the pristine simulated s(t) without noise and pulse-duration convolution. The center and lower left panels give corresponding examples of pulse-duration convolutions with a = 0.1 and 0.2 ps, respectively, with the black dashed curves being the s(t) shown in the top left panel. The three right panels have different levels of Poisson noise, namely η = 0.5, 1.0, and 2.0, on top of the s(t) from the center left panel (the black dashed curves).

Table B1. Details about the neural network architectures.
We take σ²_s(t) as our basic utility function in this work. A detailed discussion of this simplification is provided in appendix D. In practice, σ²_s(t) is calculated by passing x ∼ P_n(x), i.e. the particles (parameters) used to represent P_n(x), into the NN.

Figure D2. Comparison between a representative P_{n+1}(s|t) and a fitted Gaussian probability density. We used P^sim_{n+1}(s|t) = N(µ = 50, σ = 10) as the distribution of noise-free simulated signals, and the noise distribution is simply N(s, √(ηs)) with η = 2.0. Two representative noise distributions centered at s = 35 and 65 are shown in blue curves. The black solid curve represents the final convolution result for P_{n+1}(s|t), which is well approximated by a Gaussian N(µ = 49.25, σ = 14.13), shown as the dashed gray curve.

P_{n+1}(s|x, t) in the first term represents the measurement noise distribution when the delay-time t and the model parameter x are both fixed. Suppose P_{n+1}(s|x, t) is a Gaussian distribution, or a Gaussian-approximated Poisson distribution, with variance σ²_η(t; x). The utility function then reads

U_n(t) = ∬ P_n(x) P_{n+1}(s|x, t) ln P_{n+1}(s|x, t) dx ds − ∫ P_{n+1}(s|t) ln P_{n+1}(s|t) ds
       = ∫ P_n(x) [ ∫ P_{n+1}(s|x, t) ln P_{n+1}(s|x, t) ds ] dx − ∫ P_{n+1}(s|t) ln P_{n+1}(s|t) ds,

i.e. U_n(t) = − ∫ P_n(x) H[P_{n+1}(s|x, t)] dx + H[P_{n+1}(s|t)].
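The Gaussian fit quoted for figure D2 can be reproduced with a short Monte Carlo sketch, assuming the same N(µ = 50, σ = 10) signal distribution and η = 2.0 as in the caption (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample the noise-free simulated signal P^sim_{n+1}(s|t) = N(50, 10), then add
# signal-dependent, Gaussian-approximated Poisson noise N(0, sqrt(eta * s)).
eta = 2.0
s_clean = rng.normal(50.0, 10.0, size=200_000)
s_noisy = s_clean + rng.normal(0.0, np.sqrt(eta * np.clip(s_clean, 0.0, None)))

# Moments of the resulting P_{n+1}(s|t).
mu, sigma = s_noisy.mean(), s_noisy.std()
```

The sample moments come out near µ ≈ 50 and σ ≈ √(σ²_s + η µ) = √200 ≈ 14.1, consistent with the fitted Gaussian N(µ = 49.25, σ = 14.13) in the figure.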