Neural networks as effective surrogate models of radio-frequency quadrupole particle accelerator simulations

Radio-frequency quadrupoles (RFQs) are multi-purpose linear particle accelerators that simultaneously bunch and accelerate charged particle beams. They are ubiquitous in accelerator physics, especially as injectors to higher-energy machines, owing to their impressive efficiency. The design and optimization of these devices can be lengthy due to the need to repeatedly perform high-fidelity simulations. Several recent papers have demonstrated that machine learning can be used to build surrogate models (fast-executing replacements of computationally costly beam simulations) for order-of-magnitude computing time speedups. However, while these pilot studies are encouraging, there is room to improve their predictive accuracy. Particularly, beam summary statistics such as emittances (an important figure of merit in particle accelerator physics) have historically been challenging to predict. For the first time, we present a surrogate model trained on 200 000 samples that yields < 6% mean average percent error for the predictions of all relevant beam output parameters from defining RFQ design parameters, solving the problem of poor emittance predictions by identifying and including hidden variables which were not accounted for previously. These surrogate models were made possible by using the Julia language and GPU computing; we briefly discuss both. We demonstrate the utility of surrogate modeling by performing a multi-objective optimization using our best model as a callback in the objective function to select an optimal RFQ design. We consider trade-offs in RFQ performance for various choices of Pareto-optimal design variables—common issues for any multi-objective optimization scheme. Lastly, we make recommendations for input data preparation, selection, and neural network architectures that pave the way for future development of production-capable surrogate models for RFQs and other particle accelerators.


Introduction
The replacement of highly accurate, but computationally costly, particle-in-cell simulations with surrogate models (sometimes called virtual accelerators) is a topical field of increased interest [1,2,3,4,5,6].Surrogate models use machine learning (ML) to create fast-executing virtual representations of a complex system like a particle accelerator.We can then use this surrogate model to, for instance, speed up (multi-objective) design optimization, or obtain real-time feedback during the commissioning, tuning, and running of the particle accelerator.The surrogate model is typically built from a neural network (NN) or some other statistical learning technique (like polynomial chaos expansion [2]).In the case of using the surrogate model as an autonomous tuning tool, training data can be obtained not just from simulations, but also by measurements from existing hardware [7].
The design of the IsoDAR (isotope decay-at-rest) project [8,9,10], a planned experiment in neutrino physics, is the primary motivation for this work.In IsoDAR, a compact particle accelerator produces a 10 mA proton beam that impinges with an energy of 60 MeV/amu on a beryllium target surrounded by lithium-7, producing electron antineutrinos (ν e ) with a well-understood energy distribution through isotope decay-at-rest [10] (as opposed to other experiments using decay-in-flight).The νe can then be measured in a nearby liquid scintillator detector via inverse beta decay (IBD).This configuration yields unprecedented sensitivity to so-called sterile neutrinos, hypothesized new particles thought to resolve νe deficits observed at experiments worldwide [11,12,13,14].The requirements for IsoDAR are 10 mA of protons on target in a continuous wave beam at 80% duty factor to produce about 1.15 • 10 23 νe over the course of 5 years.Paired with the planned 2.3 kiloton liquid scintillator detector (LSC) [15] in Korea, this will yield 1.67 million IBD events in the detector.
The IsoDAR particle accelerator comprises an ion source, radio-frequency quadrupole (RFQ), and a cyclotron [16,17].Surrogate modeling has proven invaluable to IsoDAR's development, allowing us to demonstrate the robustness of IsoDAR's cyclotron design [17] through uncertainty quantification [2], and to perform a small pilot study to investigate the use of surrogate models for RFQs [18].
In this paper, we expand upon work presented in Ref. [18] to build a neural network-based surrogate model of an RFQ, but rethink the RFQ parametrization to account for collinear effects in the feature space, physical RFQ design constraints, and incorporate variables previously hidden from trained surrogate models.We use these insights to generate an accurate surrogate model for a 32.8 MHz RFQ covering a wide design parameter space (subject to physical design constraints).We use the highly efficient Julia programming language [19] to train our NNs with a widely cast net for hyperparameter tuning and an unusually large batch size.
The working principle of an RFQ and the generation of training data for the surrogate model were discussed in detail in Ref. [18].We briefly summarize them in Sec.1.1, and Sec.2.1, respectively.In Sec. 2, we discuss our methods, including data preparation and how we enhance the predictive accuracy of beam summary statistics outputs like the beam emittance (an important figure of merit for beam quality), followed by our results for training of several NNs and optimization of the RFQ in Sec. 3. Finally, we discuss our results, general observations, and recommendations for future development of NN-powered surrogate models in Sec.4-5.Code relevant to this project is available on GitHub [20].

The Radio-Frequency Quadrupole
An RFQ is a multi-purpose linear particle accelerator element able to bunch (compress along the direction of movement) and accelerate a high-current ion beam, while keeping the beam tightly focused in the transverse direction [21,22,23].In an RFQ, an oscillating electric field is generated between four vanes (or rods).The arrangement for one cell can be seen in Fig. 1, where the length is β • λ/2, with β the ratio of the beam velocity to the speed of light, and λ the wavelength corresponding to the electromagnetic wave driving the RFQ.Typical RFQs comprise tens to hundreds of such cells, each defined by three main parameters (the focusing strength B, the phase Φ, and the modulation m).The IsoDAR RFQ is discussed in detail elsewhere [24,25].To optimize an RFQ, the parameters of each cell have to be fine-tuned to yield desirable beam output qualities like a high beam transmission.In this paper, we call the RFQ parameters design variables (DVARs) and the beam output parameters objectives (OBJs).

The Julia Language
Julia is an open-source dynamically typed programming language with significant performance improvements over other languages common in scientific computing like Python and Matlab [19].Julia's computational efficiency is especially apparent when performing costly calculations like neural network training, motivating our use of the language throughout this analysis and allowing for straightforward implementation of multi-threading and distributed computing to facilitate the completion of expensive computational tasks.In addition, Julia has built-in support for GPU programming using CUDA.jl[26,27], making the language an obvious choice for building and training series of NNs on both local (CPU) and remote (GPU) machines.

Data Generation
In this study, we reuse the dataset from Ref. [18], consisting of 217,292 samples, each representing a randomly configured RFQ with corresponding output beam parameters obtained from beam dynamics simulations using the well-established code PARMTEQM [28].The beam dynamics characteristics of an RFQ are fully described by a set of three main cell parameters (B n , Φ n and m n ) for each RFQ cell n, resulting in a total of 3n design variables.While historically the LANL Four-Section Procedure (FSP) has been used as a design strategy since the very first proof-of-principle RFQs at LANL at the end of the 1970s [22], over the years, modified approaches have evolved to improve RFQ performance for specific applications.In the FSP, the RFQ is divided into four sections with dedicated functions: a radial matcher, a shaper, a gentle buncher and an accelerator, with the overall aim of smoothly matching the beam to the structure, bunching it, and accelerating it with as little losses as possible.
As a first approach for the IsoDAR RFQ, a baseline design was created via an adaptation of the FPS framework, but with special characteristics: (1) in addition to the typical linear increase of the synchronous phase Φ and the modulation m in the shaper (called the "linear shaper"), a section of exponential increase is introduced (called the "exponential shaper"); (2) in the gentle buncher, not only the modulation but also the synchronous phase is ramped up exponentially (called the "exponential buncher"); and (3) the accelerator section is omitted due to the IsoDAR RFQ being intended for use as a dedicated pre-buncher.
For a machine learning based design optimization, a reduction of the number of the initially 3n design variables was pursued and thus a parametrization according to Fig. 2 was performed.This allowed to capture all crucial characteristics of the design functions while reducing the number of design variables to 14.While DVAR1 corresponds to the value of the (constant) focusing strength and DVAR2 and DVAR3 set the lengths of the linear and exponential shaper, the total slope and smoothness of the shaping/bunching effect are characterized by DVAR4-DVAR13.DVAR14 specifies the design energy at the RFQ exit, which determines the length of the RFQ.
Besides the 14 design variables (DVARs), the dataset from Ref. [18] contains 6 objectives (OBJs): beam transmission, output energy, RFQ length, and three beam emittances (one longitudinal, two transverse to the beamline).This data is summarized in Tab. 1.

Data Preprocessing
Unlike Ref. [18], we perform transformations on the data set, as some variables have values that directly affect values of others.In particular, for some upper bounds u i and buffers δ i , i ∈ {3, 5, 9, 12, 13}, after a subset of the 14 DVARs for the jth sample are drawn from their respective uniform distributions, the rest are acquired via: DVAR12 j ∼ Uniform(DVAR5 j + δ 12 , u 12 ) DVAR13 j ∼ Uniform(DVAR9 j + δ 13 , u 13 ) With: • δ 3 the minimum allowable length of the exponential shaper, • δ 5 the minimum allowable modulation increase in the exponential shaper, • δ 9 the minimum allowable phase increase in the exponential shaper, • δ 12 the minimum allowable modulation increase in the exponential buncher, and • δ 13 the minimum allowable phase increase in the exponential buncher.
Values of the δ i and u i can be inferred from Tab. 1. Having these buffers ensures that physically realistic RFQ sections are created.For instance, if DVAR3 j ∼ Uniform(DVAR2 j , u 3 ), it is possible that a randomly generated RFQ would have an exponential shaper component of zero length, which is nonphysical for the family of accelerators considered in this study.As a result, design variables like DVAR3 are not uniformly distributed.We can introduce a variable DVAR3 ′ that is uniformly distributed according to the following transformation of DVAR2, DVAR3, and constants: And likewise for the remaining pairs of features.A side-by-side comparison of distributions of DVAR3 and the transformed DVAR3' is shown in Fig. A1 in Appendix A.
Correlation matrices for the 14 design variables before and after the described pre-processing transformation of necessary DVARs are shown in Fig. 3.We call these transformed design variables decorrelated, since all correlations have been removed.Each feature (after the decorrelating step) was later scaled to have minimum −1 and maximum +1.
To assess the impact of decorrelating data on NN training and performance, we trained two identical neural networks on both correlated and decorrelated scaled datasets.We observed no significant difference in the training times or prediction accuracy between these two NNs; this is to be expected, NNs are, in general, over-parameterized so decorrelation of features is not generally a necessary step.Eventually, however, the top-performing surrogate model in this study will serve as an important component of a design optimization.It is easier in a prebuilt Bayesian optimization package to enforce the relationships between these DVARs listed in Eqs. 1 by working with decorrelated features; we can perform a multi-objective design optimization under no additional constraints (besides the constraint that the scaled, decorrealted features should have minima −1 and maxima 1).All NNs we present in the remainder of this study are trained on the uncorrelated data, and we recommend decorrelating the design variables as a best practice moving forward, particularly when built surrogate models will be plugged into any kind of optimization scheme.

Number of RFQ Cells as a Previously Hidden Variable
The models reported in Ref. [18] and early NNs built in this study have one thing in common: predictions of transverse emittances (OBJ5 and OBJ6) were significantly less accurate than those of the other objective variables.In Refs.[7,17] the authors also observe this for a cyclotron and the LCLS-II injector beam line, making the failure to accurately predict RMS emittance a problem worth exploring.In Appendix B, we show that for surrogate models similar to those presented in Ref. [18], patterns in the true and surrogate-model-predicted emittances indicate some underlying hidden variable should be included to resolve the asymmetry between the two transverse directions.
To this end, we extend the the dataset from Ref. [18] to include a fifteenth feature used for training the surrogate models produced in this work; namely, whether the number of RFQ cells is even or odd.As discussed in Sec.1.1, across any particular RFQ cell, the beam is simultaneously focused in one of the two transverse directions while being defocused in the other.The number of RFQ cells (specifically whether the number of RFQ cells is even or odd, which we denote as the "parity" of the RFQ in this work) is essential in resolving any asymmetries in the beam characteristics across the two transverse directions, such as the two transverse emittances ϵ x and ϵ y .Histograms of ϵ x − ϵ y for simulated RFQs with odd and even numbers of cells are shown in Fig. 4, further elucidating this concept.We added an extra column to the features: a binary variable denoting whether the total number of RFQ cells is odd or even.This "15th DVAR" can be obtained from the other 14 DVARs in a fully deterministic way, and this calculation becomes another preprocessing step of our data.The algorithm that we devised in Python is based on the work of Kapchinskii and Teplyakov [21] as well as Crandall and Wangler [29,30,31,32,23], calculating an ideal particle's energy gain for each RFQ cell, based on the parametrization of cell parameters shown in Fig. 2. Each preceding cell thus determines the position along the z-axis at which to evaluate the parametrization curves for the next cell.We tested the algorithm on all data in the training and validation sets and obtained a > 83% success rate in predicting even or odd number of cells (cf.Fig. 5).‡ The code for ‡ Feeding NNs the raw number of cells, rather than the binary parity variable, experimentally proved this routine is available in the compute cellnumber/compute cellnumber.pyfile on our GitHub [20].The ≈ 17% miss rate can be attributed to the fact that our Python script uses approximations for each full cell, while Pari, an RFQ design code used in sequence with PARMTEQM, integrates over the cell numerically by slicing it.A much slower, but equally legitimate way of calculating DVAR15 during preprocessing would be to perform a system call to Pari.exe and to read the corresponding output file.This would predict the parity of the cell number with 100% success.This is how the training set was generated.For the optimizer, we chose the fast Python-based algorithm as it is not restricted to Windows platforms and not export-controlled in the USA (i.e.only allowed to be used without restrictions within the USA).
It is important to note that the inclusion of this additional design variable does not impact the integrity of the surrogate model, nor does it significantly increase our surrogate model's computational overhead.The algorithm is fast-executing and the 15th DVAR follows from the other 14 DVARs fully deterministically.All NNs are trained on correct parity information; the approximation algorithm is only used during the optimization phase.During the optimization process, the use of the Python algorithm, which is incorrect ≈ 17% of the time, may lead to slightly longer time-tosolution and a larger error associated with the emittance results, however, this error is comparable with the NN MAPEs for the emittance in general.Furthermore, each point in the Pareto-front is ultimately calculated by a call to the Surrogate Model.We observed empirically that artificially swapping the parity from +1 to −1 has the effect of (approximately) swapping the predictions of ϵ x and ϵ y (i.e.moving from one branch to the other in Fig. B1), while keeping predictions on the other 4 objectives constant.Since the distribution of even and odd in the incorrectly calculated parities is about equal, the Pareto front will look approximately the same.Finally, the chosen Pareto-optimal point after optimization will have to be corroborated by the original simulation software, where incorrect parity predictions will be evident.The designer can modify the machine accordingly.Even with a 17% miss rate in predicting the parity of an RFQ, we still achieve substantial performance improvements over previous efforts to predict emittances.

Surrogate Model Reporting
Neural network performance is evaluated by computing R 2 scores and mean absolute percent errors (MAPEs).We cite the "aggregate R 2 " score, the average of the R 2 scores for each of the six response variables, to report the total accuracy of the model.This does not, however, give any insight into predictive accuracies of each of the six objectives separately: we use MAPEs on the unscaled data to handle this task.difficult for the NNs to use when determining the sign of ϵ x − ϵ y .We suspect that this is due to the fact that the number of cells in the virtual RFQs in this study is ∼ O(100), so performing the uniformminimum-and-maximum transformation degrades the ability of the network to resolve whether the number of cells was originally odd or even.

Training on GPUs
As part of our efforts to fully explore the Julia language, and to exploit the significant speedup coming from massive parallelization of matrix algebra, we migrated NN training to GPUs.Here we used two NVIDIA A30 Tensor Core GPU cards with 24 GB of VRAM each, combined with Intel Xeon(R) Silver 4310 12-core CPUs (we assigned one thread per GPU) on the MIT "SubMIT" cluster [33] for training "RFQNet1" and "RFQNet2" (see below).We used CUDA version 11.6.While we do not present a GPU benchmark, we can point out that the speedup over CPU training was significant.

Implementation of Neural Network Training and Hyperparameter Scans
Operating on the full decorrelated dataset, we use Julia to implement grid search scans over NN architecture hyperparameters; namely, width and depth.We hold batch size, learning rate, and activation functions constant for this study but plan to expand future searches to these hyperparameters, as well.All NNs were trained using mean squared error as the loss function.Scanned NN hyperparameters are summarized in Tab. 2. Neural network training was performed primarily using the Flux.jlpackage, an open-source Julia library used for training deep ML models Ref. [34,35].NNs of different architectures were trained in parallel to reduce script runtime.Results from Ref. [18] indicate that neural networks trained with the largest batch sizes were desirable.While larger batch sizes may be preferred in principle to show NNs more data per training step, they ultimately increase total training time [36].The use of Julia allows us to feasibly consider training neural networks with larger batch sizes in a more reasonable amount of time.Hence, we choose to hold batch size constant at 1024, outside of the range of that explored by Ref. [18].While it is difficult to interpret why the surrogate models developed in this study seemed to prefer larger batch sizes, we suspect that because we are sampling the 14-dimensional feature space uniformly, each training step sees a more representative sample of the training data, allowing our networks to generalize better.Also, to fully leverage the parallelization capabilities of GPUs used for training these NNs, larger batch sizes are computationally preferred.Data was split into training and test sets (of proportions 80% and 20%, respectively).The test set was withheld from any analysis for the entirety of the scan, and was only used for computing the final test-set MAPEs shown in Tab. 3. We used the MLUtils.jlpackage (Ref.[37]) to perform 5-fold cross-validation, wherein each neural network was trained on an 80% subsample of the training data, with the remaining 20% used as a validation dataset to estimate out-of-sample model performance.NNs were not trained on this smaller subset of data.The validation sets used in each fold of the cross-validation procedure were disjoint.
Each neural network was constructed using a sigmoid activation function.The Flux.jl ADAM (Ref.[38]) optimizer was used to perform stochastic gradient descent in neural network training, using the default learning rate of 0.1% unless otherwise specified.All neural networks were trained for 2500 epochs, which appeared experimentally to be a reasonable threshold for convergence (one in which improvements in predictive accuracy began to plateau while preserving agreement between the validation and training set prediction errors).The results of this hyperparameter scan are reported in Sec.3.1, and the best performing of these networks is referred to as "RFQNet1".This network is summarized in table Tab. 4.
In prototyping of the surrogate models developed in this study, we also added dropout regularization to trained neural networks and tested different combinations of dropout rates and learning rates.The inclusion of dropout regularization, in general, negatively impacted the validation-set predictive accuracy of the networks in the worst case, while different objectives preferred different combinations of dropout and learning rates, in the best case.None of the tested networks outperformed RFQNet1 and RFQNet2 presented in this work.

Additional Design-Motivated Data Preprocessing
The RFQ dataset used both in this study as well as in Ref. [18] was generated by performing basic random sampling of the 14 design parameters within their respective specified value ranges, and determining the objective variables by simulating beam traversing the RFQ using a physics engine.However, generating RFQs in this way can result in RFQs that are especially undesirable and even unrealistic.Many of the virtual RFQs were determined to have an alarmingly low beam transmission of 60% or less.In the case that an accelerator engineer would like to use a surrogate model to optimize their RFQ design, it is reasonable to assume their RFQ's beam transmission will be much higher.We therefore perform the same hyperparameter scan outlined in Sec.2.6 on data culled to have simulated beam transmissions of at least 60%.The bestperforming neural network of this hyperparameter scan is referred to as "RFQNet2".RFQNet2's structure is compared to that of RFQNet1 in Tab. 4.
Due to RFQs being complex machines with many cells, and parameters like transmission depending on the interplay between the DVARs rather than being smooth monotonically changing functions, it is not possible to determine ab-initio which DVARs need to be restricted (and where, as there are many local minima) to limit transmission to > 60%.Thus, we cut based on the OBJ rather than the DVARs.Chosing a value of 60%, which is still far away from the desired transmissions of RFQs, gives the surrogate model enough margin for us to be able to perform optimizations.

RFQ Design Optimization
To demonstrate the utility of such ML-powered surrogate models, we used RFQNet2 as the surrogate model for the beam dynamics simulator to run an optimization of the design parameters of an RFQ like the one proposed in the IsoDAR experiment.Such an RFQ has the following optimal properties: • Maximal beam transmission • Minimal difference between output and target (70 KeV) energies Finding solutions to such a multidimensional optimization problem motivates the use of Bayesian optimization.The goal of such a procedure is to find the set of RFQs that are Pareto-optimal, where all points in the Pareto front are optimal in the sense that improvements in some objectives come at the cost of others.To this end, we make use of the package Surrogates.jl, as part of the open-source Scientific Machine Learning Initiative of Julia [39,40].This package allows us to use our surrogate model as part of the acquisition function in the optimization algorithm to select and evaluate data points in the feature space.As the algorithm selects each 14-dimensional data point to test for optimality, we take the intermediate step of employing the procedure outlined in Sec.2.3 to determine whether the number of RFQ cells will be odd or even to extend the feature vector with a fifteenth entry.To narrow the Pareto-optimal solution space to a set of RFQ design parameters feasible for the goals of IsoDAR's RFQ, we search for points in the Pareto set with beam transmission of at least 90% and as low as possible transverse beam emittance (ϵ x , ϵ y ≤ 0.04 cm mrad).Since longitudinal and transverse beam emittances are correlated, we expect that the longitudinal beam emittance will similarly be low.

Results of the Two Preliminary Hyperparameter Scans
Loss curves for the full-dataset training of the 9 different neural network architectures are shown in Fig. 6, top, and the validation set aggregate R 2 scores are summarized in Tab. 3, top.The NN with the highest validation set aggregate R 2 is the neural network with depth 5 and width 100, which we select over the 6 × 100 network due to the lower cross-fold standard deviation.This 5 × 100 neural network is referred to as RFQNet1.Complete validation set MAPEs for all neural networks generated in this scan are summarized in Tab.C1 in Appendix C. As discussed in Sec.2.7, limiting the dataset used in training and testing NNs to samples with a transmission of at least 60% is physically well-motivated.Loss curves for each of the neural networks trained on the data subset having transmission ≥ 60% are shown in Fig. 6, bottom, and the validation-set aggregate R 2 scores are summarized in Tab. 3, bottom.The depth 6 and width 100 neural network had the highest aggregate R 2 score, and is referred to as RFQNet2.The complete validation set MAPEs for similar NNs scanned are summarized in Tab.C2 in Appendix C.
The test set MAPEs for both RFQNet1 and RFQNet2 are compared to the bestperforming model from Ref. [18] in Tab. 5. Checks for overfitting to the data are shown by reporting the training and validation set MAPEs for each of the 5 folds performed in the cross validation for choosing the optimal architectures of the NNs, Fig. 7   RFQNet2 should be chosen over RFQNet1 for use as a surrogate model in a design optimization.

Design Optimization of IsoDAR's RFQ
As mentioned in Sec.2.8, we use RFQNet2 as a surrogate model for the multi-objective Bayesian Optimization of an RFQ suitable for IsoDAR's needs.This model was a depth 6, width 100 neural network trained on data having to beam transmissions of ≥ 60%.2D projections of the 6D Pareto front are plotted in Fig. 8.Of the hundreds of points computed to be Pareto-optimal, we narrow down the possible designs to ones having transmissions of at least 95% with radial and longitudinal beam emittances below 0.04 cm-mrad and 0.04 MeV • , respectively.We then weigh these remaining options according to their overall performance in energy matching and minimal RFQ length.The optimum's design variables and corresponding beam output parameters from this study are summarized in Tab. 6.
It is important to note that the boundaries and relationships between the first 14 design variables were chosen to ensure physical realizability of an RFQ.Because these were folded into the optimization procedure (via the decorrelation procedure described in Sec.2.2), the optimal RFQ is indeed a physically realistic design.6. Optimal design configuration and corresponding beam summary statistics of RFQ for IsoDAR's use case.This RFQ's design was created by running a multiobjective Bayesian optimization using the best-performing neural network, RFQNet2, as a surrogate model for the beam dynamics simulation.For the optimal set of design variables shown, RFQNet2's predictions for the beam summary parameters are shown alongside PARMTEQM's simulation results.

Discussion
In this work, we, for the first time, build a surrogate model (RFQNet2) for an RFQthroughgoing beam simulator that can achieve < 6% MAPE on predictions of the outgoing beam dynamics.The performance of this surrogate model by objective studied can be seen in Tab. 5. Significant improvements in prediction accuracy were achieved by training a series of neural networks with an unusually large batch size of 1024, and by unveiling a previously hidden variable: the number of RFQ cells, which is computed deterministically from the other 14 design variables.The test-set MAPEs are improved over previous work for each of the objectives studied; in the case of transverse emittances, the MAPEs are a third of those seen in Ref. [18].However, we should note that there is no one-size-fits-all NN that achieves the lowest possible MAPE for all objectives studied; this is emphasized by the boldfaced MAPEs shown in the tables in Appendix C.

Time Costs of Surrogate Models and their Development
The development of surrogate models, in this study, is meant to (1) reduce the computational burden that running high-fidelity simulations places on optimization routines; and (2) facilitate quasi-real-time feedback for accelerator operators during tuning and running a machine.The timescale of fine-tuning an RFQ can quickly balloon; a single run of PARMTEQM to compute the summary statistics of a beam traversing an RFQ of arbitrary design takes, conservatively, 30 seconds, while the surrogate models RFQNet1 and RFQNet2 execute in just a fraction of a second.We must point out, however, that the end-to-end development of the surrogate models presented do not, on their own, beat PARMTEQM at saving time for a single optimization.Data generation for the training and test samples took about a week of continuous running on a single workstation.Performing hyperparameter scans and cross-validation took another day of runtime.A single optimization run took about 1000 function calls to the surrogate model.
However, considering the lengthy process of designing an accelerator system (typically consisting of more elements than the RFQ) and iterations on design choices, it is not realistic to assume a single optimization run will yield the desired machine.Rather, tens of optimizations will have to be performed with ever changing output energies, beam sizes, etc. to match the RFQ to the previous and following acceleration stages.
Finally, an avenue of future study will be to understand the lower limit of feasible training sample sizes.Generation of the training data used in this study served as the main bottleneck, and a circumstance in which fewer training samples could be used to build surrogate models with comparable predictive accuracy may, eventually, prove faster than PARMTEQM.
The prototyping of an accurate surrogate model demonstrated in this study paves the way for future work using high-fidelity simulation codes based on the particle-incell (PIC) method, like OPAL [41] and WARP [42].As opposed to PARMTEQM, these codes can accurately calculate the behaviour of space-charge dominated beams, albeit at much higher computation time (it can take hours to compute a beam traversing a single RFQ).The replacement of these softwares with fast-executing surrogate models would result in orders of magnitude improvement in runtime [1].Moreover, even though PARMTEQM is much faster than these higher-fidelity codes, it would still be hard to use it in a real-time commissioning tool due to its ≈ 30-second runtime, while RFQNet1 or RFQNet2 are, by comparison, virtually instantaneous.One can envision a program that allows an accelerator scientist to make small adjustments to certain controllable RFQ parameters (such as the target energy or the electromagnetic fields responsible for beam focusing, shaping, and bunching) and see, in almost real-time, a description of the outgoing beam.Such a program could find use during the running of an accelerator, where rapid computations are necessary to ensure the successful operation of the RFQ, and would further justify the need for the development of surrogate models similar to those presented here.

Recommendations
The significant improvements in MAPE for top-performing NNs in this work over [18] lead us recommend the following avenues for future work: (i) Explore different neural network architectures for different objectives.Different objective variables preferred different NN architectures (indicated explicitly in the out-of-sample MAPEs for each objective in Appendix C).We can feasibly run hyperparameter scans for NNs aiming to predict one (or perhaps some subset of) objective(s) for additional boosts in performance.We are especially interested in implementing such models for the poorest-performing objective, namely, longitudinal emittance.
(ii) Limit datasets to physically realistic scenarios.A large boost in performance of the neural network was due in part to the rejection of data samples with simulated beam transmission below 60%.In the real-world commissioning of an RFQ, beam transmissions below 60% are unrealistic, and we expect that the quality of the simulation of softwares like PARMTEQM would deteriorate in these circumstances.While the design variable boundaries were set as large as practically physically applicable when generating the initial training dataset to cover a possible large number of (even unreasonable) RFQ configurations, ensuring physically realistic scenarios enhances general data quality and predictive accuracy.
(iii) Explore surrogate model architectures beyond fully-connected neural networks and summary statistic inputs.We emphasize that multivariate regressive tasks are not limited to just fully-connected neural networks, or that the only way to develop an RFQ surrogate model is by feeding the exact same 14 design variables that we reference in this study.We are especially interested in the use of Residual Neural Networks (ResNets) for improving the predictive accuracy of the general multivariate regression task, as well as the use of Convolutional Neural Networks (CNNs) that can be trained on pixelated "pictures" of the throughgoing beam, giving a more granular picture of the underlying physics.Recurrent neural networks (RNNs) may also be useful in propagating a beam iteratively through some number of RFQ cells.

Conclusion
We presented an in-depth investigation of surrogate modeling for RFQ linear particle accelerators.We especially scrutinized input data preparation, and unearthed hidden variables to address the predictive shortcomings of previous works.
Correlations between the design variables did not impact the quality of the model, though we still recommend decorrelation especially when performing Bayesian optimizations necessitating complex relationships between feature variables.
Choosing physically sensible limits and removal of nonphysical data points is essential.For example, we omitted all data with RFQ transmission < 60 % from NN training, as these data points are undesirable (and sometimes even nonphysical) in the context of particle accelerator engineering.
In order to resolve asymmetries between response variables that are otherwise symmetric, such as the two transverse emittances, it is essential to carefully prepare data to include some variable having this distinguishing power.For example, determining whether the emittance in the x or y directions were higher was only possible if we included whether the number of RFQ cells was odd or even.
With these considerations in mind, our best-performing model was a depth 6, width 100 neural network trained on a ≈160,000-subsample of >200,000 point dataset having 14 design variables, one additional deterministic feature, and 6 objectives, which performs exceptionally well compared to previous efforts of a similar kind (we reproduce all relevant objectives with a MAPE of < 6 %).To demonstrate the utility of this surrogate model, we performed a multi-objective optimization of the IsoDAR RFQ, and confirmed the optimized design using PARMTEQM (the simulation software used to generate the training data).
Our next steps are to test network architectures other than deep fully connected networks and to build surrogate models for other types of particle accelerators such as the IsoDAR cyclotron.These are important steps forward to creating a system of surrogate models that can speed up the development and real-time tuning of a multitude of particle accelerator experiments.Evident in these plots is the fact that the "double band" structure discussed in Fig. B1 is not recovered.These predictions were generated by a neural network that did not include the number of cells as a fifteenth design variable.

Appendix C. Out-of-sample MAPEs for scanned neural networks
Since the validation-set R 2 of each of the 9 NN architectures presented in Tab. 3 do not account for the predictive accuracies of each of the 6 objective variables separately, we report the MAPE for each objective for each neural network archtecture scanned, as the cross-fold average ± standard deviation.No one neural network observed the lowest validation set MAPE for all objective variables, highlighting the conclusion that different neural network architectures may be better equipped to predict different objectives.

Figure 1 .
Figure 1.A single RFQ cell with electric fields in quasi-static approximation.Left: Front view (beam and z-axis point into the paper).The green arrows indicate the focusing of the charged particle beam and the red arrows the defocusing.Right: Side view (beam propagates from left to right).Cell geometry parameters are shown, and the electric field is indicated by arrows.From Ref. [18].

Figure 2 .
Figure 2. The three main cell parameters (focusing function B, phase Φ, and modulation m), parametrized along the length of the RFQ by fourteen DVARs (DVAR14, not shown, is the design energy determining the overall length).From Ref. [18].

Figure 4 .
Figure 4. Overlaid distributions of the difference in transverse emittances for samples having odd and even cell numbers.The clear separation of these distributions indicates the inclusion of the number of cells is vital to the prediction of transverse emittance.

Figure 5 .
Figure 5.Comparison of algorithmic predictions for the total number of RFQ cells based on the 14 other DVARs to ground truth values.

Figure 6 .
Figure 6.Training set loss curves (mean squared error) for scanned neural network architectures (Tab.2), trained on the complete dataset (top) and samples having beam transmission ≥ 60% (bottom).The solid line in the center of each curve represents the cross-fold loss mean, and one standard deviation is shaded above and below.

Figure 7 .
Figure 7.Comparison of mean absolute percent errors (MAPEs) on each of the 5 performed cross-validation folds on training and validation datasets for RFQNet1 (top) and RFQNet2 (bottom).RFQNet1/RFQNet2 are the optimal networks whose defining characteristics are summarized in Tab. 4. Nearly equivalent MAPEs for both training and validation sets point to models that are unlikely to be overfit to the training data.

Figure 8 .
Figure 8.The RFQ design optimal for IsoDAR's use case was computed by running a multi-objective Bayesian optimization scheme using the best-performing neural network as the surrogate model for the algorithm's acquisition function.2D projections of the 6D Pareto front are shown above, with the optimal design configuration (see Tab. 6) marked with an orange star.

Figure B2 .
Figure B2.Joint distributions of predicted x and y emittances (OBJ5 and OBJ6, respectively) for sample training and test sets.Evident in these plots is the fact that the "double band" structure discussed in Fig.B1is not recovered.These predictions were generated by a neural network that did not include the number of cells as a fifteenth design variable.

Table 1 .
Summary of design variables (DVARs) and objectives (OBJs) used for surrogate model training.* In uniformly drawing random RFQ design variables, certain relationships between these parameters must be satisfied to ensure the RFQ is physical.This is further discussed in Sec.2.2.

Table 2 .
Values chosen for initial hyperparameter scan in Julia.

Table 3 .
. The MAPEs for the training and validation sets are generally close enough that we can conclude that neither model is overfit to the training data.Since RFQNet2's MAPEs for each of the 6 objectives are equal to or lower than those of RFQNet1, Aggregated validation-set R 2 scores for each set of hyperparameters (Tab.2), for NNs trained on the complete dataset (top) and samples having transmission ≥ 60% (bottom).Each R 2 score is computed across all objective variables, but is not representative of the prediction accuracy of individual objectives; MAPEs for each objective for each NN architecture are shown in Appendix C. Small apparent differences in validation-set R 2 scores correspond to a few percentage-point differences in MAPEs.

Table 4 .
Summary of RFQNet1 and RFQNet2, whose architectures where chosen by the cross-validation results of a hyperparameter scan over neural network depth and width.R 2 scores correspond to those reported in Tab. 3.

Table 5 .
[18] set MAPEs for each of the 6 objective variables studied, compared to previous results in Ref.[18].Specifications of RFQNet1 and RFQNet2 are given in Tab. 4. These NNs were chosen via the hyperparameter scans summarized in Tab. 3, top and bottom, respectively.

Table C1 .
Cross-fold average ± standard deviation validation-set MAPEs by objective variable for each of the 9 NN architectures explored in the initial hyperparameter scan (Tab.2), trained on all available training data.RFQNet1 was selected to be the depth 5, width 100 neural network based on scores reported in Tab. 3, top, though this model was not the best-performing for each of the 6 objectives studied.The lowest MAPEs for each objective are shown in bold.

Table C2 .
Cross-fold average ± standard deviation validation-set MAPEs by objective variable, for each of the 9 NN architectures explored in the initial hyperparameter scan (Tab.2), trained on samples having transmissions ≥ 60%.RFQNet2 was selected to be the depth 6, width 100 neural network based on scores reported in Tab. 3, bottom, though different objectives preferred different NN architectures.The lowest MAPEs for each objective are shown in bold.