An iterative deep learning procedure for determining electron scattering cross-sections from transport coefficients

We propose improvements to the artificial neural network (ANN) method for determining electron scattering cross-sections from swarm data, previously proposed by some of the present authors. A limitation inherent to this problem, known as the inverse swarm problem, is the non-unique nature of its solutions, particularly when there exist multiple cross-sections that each describe similar scattering processes. Consequently, prior methods leveraged existing knowledge of a particular cross-section set to reduce the solution space of the problem. To reduce the need for prior knowledge, we propose the following modifications to the ANN method. First, we propose a multi-branch ANN (MBANN) that assigns an independent branch of hidden layers to each cross-section output. We show that, in comparison with an equivalent conventional ANN, the MBANN architecture enables an efficient and physics-informed feature map of each cross-section. Additionally, we show that the MBANN solution can be improved upon by successive networks that are each trained using perturbations of the previous regression. Crucially, the method requires much less input data and fewer restrictive assumptions, and only assumes knowledge of the energy loss thresholds and the number of cross-sections present.


Introduction
Electron transport models are crucial to enable the predictive control of low-temperature plasma systems [1]. Underpinning these models is the use of accurate and complete electron scattering cross-section sets. Scattering cross-sections are typically derived through experimental and theoretical techniques, coupled with verification through swarm experiments to ensure their validity [2]. In regions where these techniques are limited, "educated guesses" and numerical techniques are often used to bridge the gap. This knowledge gap motivates the need for reliable and benchmarked numerical techniques to aid in the development of accurate and complete cross-section sets.
Here, we focus on the determination of cross-section sets from swarm data, otherwise known as the inverse swarm problem [3]. Presently, two primary numerical techniques aim to solve the inverse swarm problem: the iterative swarm technique and, more recently, the application of artificial neural networks (ANNs). The iterative swarm technique originally used approximations of the electron energy distribution function, such as a Maxwellian or Druyvesteyn distribution, to calculate transport coefficients that are compared with experimental transport coefficients and improved iteratively [4,5,6]. The accuracy of the iterative swarm technique was later improved by including accurate electron energy distribution functions derived from the solution of the Boltzmann equation [7,8,9,10].
A substantial limitation to solutions of the inverse swarm problem lies in its ill-posed nature. In particular, the existence of multiple cross-sections that describe similar scattering processes, such as vibrational modes with similar thresholds, results in substantial degeneracy of the transport data for a given species [11,3]. In this limit, the iterative swarm technique relies on the intuition and experience of an expert, which, along with the trial-and-error nature of the approach, results in an inefficient procedure that is difficult to reproduce. Several methods that attempt to automate this procedure have been proposed to address this issue [12,13,14,15,16,17]. Of interest to this study is the use of ANNs trained on existing cross-section data to determine scattering cross-sections for electron transport in gases.
In the early 1990s, Morgan et al. [18] first demonstrated a solution to the inverse swarm problem through an ANN trained on example cross-sections and their associated transport coefficients. Recently, Stokes et al. [3] revisited this problem, utilising advances in network architecture, model size and available cross-section data to improve the network's predictive power. Since then, the method has been successfully applied to the improvement of tetrahydrofuran, α-tetrahydrofurfuryl and nitric oxide electron scattering cross-sections [19,20,21]. Jetly et al. [22] evaluated the performance of three network architectures for the regression of a single electron-scattering cross-section and found that a DenseNet architecture resulted in the highest regression accuracy.
While the application of ANNs to the inverse swarm problem shows great promise, the method is constrained by a number of key limitations, two of which this study aims to address. First, we demonstrate that a conventional ANN is less able to model multiple independent cross-section regressions than an equivalent network that uses physics-informed parallel branches of densely connected layers. Second, as the prediction of multiple cross-sections is limited by the degenerate nature of the inverse swarm problem, we propose an iterative procedure that enables the network to incrementally explore the solution space of the problem.
In Section 2, we outline each modification to the network architecture and methodology before evaluating their performance using methane as a case study.We then summarise the results in Section 3.

Artificial neural network regression of cross-section sets
The application of ANNs to determining complete cross-section sets through the inversion of macroscopic experimental data has been the focus of a recent project at James Cook University [19, 3, 20, 21, 23]. While the technique has predominantly been used to improve existing cross-section sets [19, 20, 21], the determination of complete cross-section sets for complex targets remains elusive due to the ill-posed nature of the problem. In this section, we present two modifications to the methodology that aim to improve the ability of the network to determine complete cross-section sets from transport data.
First, to aid the network in representing the independent nature of each cross-section, we propose a Multi-Branch Artificial Neural Network (MBANN) and evaluate its performance through a regression of the cross-section set for methane recommended by Biagi [24]. To reduce the impact of the non-unique nature of determining cross-section sets with multiple similar scattering processes, we then propose an iterative procedure that incrementally explores the solution space by using perturbations of the previous regression. To demonstrate the iterative procedure, we compare the initial regression and the best regression found for methane's cross-section set.

Multi-branch neural network regression
Stokes et al. [3] proposed an ANN where each element of the output corresponds to a single cross-section. The simultaneous prediction of each cross-section ensures that the full set of cross-sections is self-consistent, which in turn ensures an accurate replication of the target swarm transport data. In their investigation of various ANN architectures, Jetly et al. [22] used separately trained networks to enforce independent feature maps for each cross-section. The authors state that the simultaneous prediction of multiple cross-sections would force the network to share feature maps across different types of cross-sections and thus severely inhibit its predictive capability.
Here, we propose a Multi-Branch ANN (MBANN) to bridge the gap between the requirement for self-consistency and the desire for independent feature maps for each cross-section. That is, each cross-section has its own independent branch of dense layers, and every branch extends from a single shared block of dense layers. Each parallel branch is then able to develop a feature set specific to a single cross-section while each regression is still conducted in the context of the full cross-section set.
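We utilise an MBANN of the form
$$\sigma_n(\mathbf{x}) = \left(A^n_5 \circ \mathrm{mish} \circ A^n_4 \circ \mathrm{mish} \circ A^n_3 \circ \mathrm{mish} \circ A_2 \circ \mathrm{mish} \circ A_1\right)(\mathbf{x}), \qquad n = 1, \dots, N,$$
where $A_i(\mathbf{x}) \equiv W_i \mathbf{x} + \mathbf{b}_i$ are affine mappings defined by dense weight matrices $W_i$ and bias vectors $\mathbf{b}_i$, and $\mathrm{mish}(x) = x \tanh\left(\ln\left(1 + e^{x}\right)\right)$ is a nonlinear activation function [25] that is applied element-wise. The final output, $\sigma_n$, then represents the $n$th cross-section of interest within a set of $N$ cross-sections. $A^n_3$, $A^n_4$ and $A^n_5$ form an array of $N$ parallel branches that each use the output of $A_2$ to independently represent the $n$th cross-section. The bias vectors $\mathbf{b}^n_3$ and $\mathbf{b}^n_4$ contain 32 elements each and $\mathbf{b}^n_5$ contains a single element for each output $n$, while $\mathbf{b}_1$ and $\mathbf{b}_2$ in the initial two layers contain 128. The weight matrices are sized accordingly.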
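The composition of affine maps and mish activations above can be sketched in NumPy as a forward pass. This is a minimal, untrained sketch, assuming a hypothetical Gaussian weight initialisation and a linear output layer $A^n_5$ (no activation); the names `init_mbann` and `mbann_forward` are illustrative and not taken from the authors' code.

```python
import numpy as np


def mish(x):
    # mish(x) = x * tanh(ln(1 + e^x)); logaddexp(0, x) evaluates ln(1 + e^x) stably
    return x * np.tanh(np.logaddexp(0.0, x))


def dense(rng, n_in, n_out):
    # Hypothetical initialisation: Gaussian weights scaled by 1/sqrt(n_in), zero biases
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
    b = np.zeros(n_out)
    return W, b


def init_mbann(rng, n_in, n_out, shared=128, branch=32):
    # Shared trunk (A_1, A_2) followed by N independent branches (A_3^n, A_4^n, A_5^n)
    trunk = [dense(rng, n_in, shared), dense(rng, shared, shared)]
    branches = [
        [dense(rng, shared, branch), dense(rng, branch, branch), dense(rng, branch, 1)]
        for _ in range(n_out)
    ]
    return trunk, branches


def mbann_forward(params, x):
    trunk, branches = params
    (W1, b1), (W2, b2) = trunk
    h = mish(W2 @ mish(W1 @ x + b1) + b2)  # mish(A_2(mish(A_1(x))))
    outputs = []
    for (W3, b3), (W4, b4), (W5, b5) in branches:
        z = mish(W4 @ mish(W3 @ h + b3) + b4)  # branch-specific features
        outputs.append(W5 @ z + b5)  # A_5^n: a single output neuron
    return np.concatenate(outputs)  # [sigma_1, ..., sigma_N]
```

Because each output depends only on its own branch beyond $A_2$, perturbing the weights of one branch leaves every other output unchanged, which is the independence the architecture is designed to provide.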
From previous investigations and a simple hyper-parameter optimisation procedure outlined in Appendix B, we found that approximately 32 neurons in each parallel layer are required for a suitable regression of each cross-section, with more neurons resulting in modest improvements. Here, we choose this minimum to isolate and demonstrate the differences between an MBANN and an equivalent ANN architecture. All other parameters, such as the activation function and the number of hidden layers, were chosen from a set of reasonable values using a comparison of validation accuracy and prior experience. A schematic representation of the MBANN architecture is shown in Figure 1. For comparison, we use an ANN of the form
$$\boldsymbol{\sigma}(\mathbf{x}) = \left(A_5 \circ \mathrm{mish} \circ A_4 \circ \mathrm{mish} \circ A_3 \circ \mathrm{mish} \circ A_2 \circ \mathrm{mish} \circ A_1\right)(\mathbf{x}),$$
where $\mathbf{b}_3$ and $\mathbf{b}_4$ now contain $32 \times N$ elements while $\mathbf{b}_5$ contains $N$ elements to match the number of outputs. The ANN thus contains the same number of neurons per layer as the MBANN outlined above. We note, however, that the ANN has more trainable parameters than the MBANN, since its $A_4$ and $A_5$ layers connect every neuron to every neuron of the following layer rather than only within a branch. Each cross-section is a function of the incoming electron kinetic energy, $\varepsilon$, which, alongside the available swarm data, forms the input to the neural network,
$$\mathbf{x} = \left(\varepsilon,\; W,\; n_0 D_L,\; k_{\rm eff}\right),$$
where $n_0$, $W$, $n_0 D_L$ and $k_{\rm eff}$ are the neutral density, bulk drift velocity, reduced bulk longitudinal diffusion coefficient and effective ionisation rate of the electron swarm, each evaluated at a number of reduced electric fields $E/n_0$.
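The statement that the equivalent ANN has more trainable parameters can be checked by counting weights and biases layer by layer. This sketch assumes the layer widths stated above (128 shared neurons, 32 neurons per branch layer) and $N = 9$ outputs, matching the nine methane cross-sections regressed in this section; the input size `n_in` is a hypothetical placeholder because it cancels in the comparison.

```python
def dense_params(n_in, n_out):
    # Weights plus biases of one affine map A(x) = W x + b
    return n_in * n_out + n_out


def mbann_params(n_in, n_out, shared=128, branch=32):
    trunk = dense_params(n_in, shared) + dense_params(shared, shared)
    per_branch = (dense_params(shared, branch) + dense_params(branch, branch)
                  + dense_params(branch, 1))
    return trunk + n_out * per_branch


def ann_params(n_in, n_out, shared=128, branch=32):
    width = branch * n_out  # same number of neurons per layer as the MBANN
    trunk = dense_params(n_in, shared) + dense_params(shared, shared)
    return (trunk + dense_params(shared, width) + dense_params(width, width)
            + dense_params(width, n_out))


n_in = 100  # hypothetical input size; it cancels in the difference
N = 9       # elastic, ionisation, attachment and six excitation cross-sections
print(mbann_params(n_in, N), ann_params(n_in, N))  # 76393 152425
```

The first branch layer ($A_3$) has an identical parameter count in both networks; the ANN's extra 76 032 parameters come from its fully connected $A_4$ and $A_5$ layers.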
To train each neural network, we generate an appropriate set of physically plausible example swarm transport data using augmentations of cross-sections from the LXCat project [26,27,28]. The data generation process and subsequent training procedure follow the method outlined by Stokes et al. [3], with some modifications that are described in detail in Appendix A.
A total of $10^5$ training iterations are performed, each consisting of a mini-batch of 32 cross-section sets sampled at 128 energies, drawn from a total of $2 \times 10^4$ training examples for each cross-section. For each training set, a multi-term Boltzmann equation solver was used to calculate $W$, $n_0 D_L$ and $k_{\rm eff}$ at 80 log-spaced reduced electric fields between $10^{-3}$ and $10^{4}$ Td, while the cross-section regression was conducted between 0.01 and 200 eV. A detailed outline of the training procedure can be found in Appendix B.
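For concreteness, the field and energy grids described above can be set up as follows. The 80 log-spaced reduced fields follow the text directly; how the 128 energies per cross-section set are chosen is not specified, so drawing them log-uniformly over 0.01–200 eV is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# 80 log-spaced reduced electric fields between 1e-3 and 1e4 Td
E_over_n0 = np.logspace(-3, 4, 80)

# Assumption: 128 energies per cross-section set, drawn log-uniformly in [0.01, 200] eV
batch_size, n_energies = 32, 128
log_eps = rng.uniform(np.log10(0.01), np.log10(200.0), size=(batch_size, n_energies))
energies = 10.0 ** log_eps  # shape (32, 128): one row per cross-section set in the mini-batch
```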
To demonstrate the improvements offered by the proposed architecture, we present a regression of methane's cross-section set for both the MBANN and an equivalent ANN. The cross-section set was retrieved from the LXCat database [26,27,28] and originates from Biagi's Magboltz code (version v7.1) [24]. In this work, we perform a regression of the elastic momentum transfer, total ionisation, total attachment and each of the 6 excitation cross-sections. In addition, while it has been shown that an ANN can determine some energy loss thresholds [3], any target cross-section set that exhibits multiple similar threshold processes introduces a high degree of degeneracy. We therefore assume knowledge of each energy loss threshold and leave their determination for future investigations.
A comparison between the resulting regressions for the ANN and MBANN architectures is shown in Figure 2. In each, the extent of the 100 best fits sampled during the training process is shown as a shaded region to indicate the network's variability. The ANN regression resulted in a mean absolute relative percentage difference (MARPD) of 2.2, 6.3 and 21 % for $W$, $n_0 D_L$ and $k_{\rm eff}$, respectively. The MBANN regression resulted in a comparable MARPD of 2.7, 5.2 and 21 % for $W$, $n_0 D_L$ and $k_{\rm eff}$, respectively. While both networks exhibited a similar global accuracy in the replication of transport coefficients, non-physical fluctuations are present in the ANN regression of the elastic and excitation cross-sections between 0.1–0.4 eV and 7–20 eV. The large gradients present in these regions, due to the energy loss thresholds of 0.162, 0.363, 7.5, 9.1, 12.36, 15.5 and 15.5 eV, appear to have prevented the ANN from independently representing each cross-section's feature map, in contrast to the MBANN, despite the same number of neurons being available to each architecture. While a larger ANN would be required to mitigate this effect, the MBANN is able to leverage its efficient and intuitive architecture to represent the feature map of each target cross-section.
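The MARPD values quoted above summarise a pointwise absolute relative percentage difference (ARPD). A minimal sketch, assuming the mean is taken uniformly over the points at which each transport coefficient is evaluated and that the difference is taken relative to the magnitude of the target value (so that negative $k_{\rm eff}$ values are handled):

```python
import numpy as np


def arpd(pred, target):
    # Absolute relative percentage difference at each evaluation point
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return 100.0 * np.abs(pred - target) / np.abs(target)


def marpd(pred, target):
    # Mean of the ARPD over all evaluation points
    return float(np.mean(arpd(pred, target)))


print(marpd([1.1, 2.0, 2.7], [1.0, 2.0, 3.0]))  # ARPD = [10, 0, 10] %, mean ≈ 6.67 %
```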
As demonstrated by both networks presented here, there remains much room for improvement in the regression of methane's electron-scattering cross-section set. While additional improvements of the network architecture may be available, such as those seen in the work of Jetly et al. [22], the ill-posed nature of the inverse swarm problem places an inherent limit on the accuracy of methods that seek to learn the feature map between transport data and cross-section sets. In what follows, we aim to mitigate this restriction through a new procedure that uses a sequence of MBANNs to incrementally explore the solution space.

An iterative approach to neural network regression
The regression of numerous similar cross-sections poses a substantial challenge for solutions to the inverse swarm problem. As shown previously by Stokes et al. [3,20,21], the predictive power of the neural network can be improved by restricting the training data to perturbations around a reference cross-section set. We extend this work and propose an iterative procedure in which a sequence of networks is trained using a weighted mixing of the previous solution with example cross-section data. As illustrated in Figure 3, the procedure consists of three phases: initialise, explore and refine.
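The control flow of the three phases can be summarised in a short Python skeleton. The two training callables are hypothetical placeholders for the full data generation, training and best-fit selection steps; the defaults of two explore and five refine iterations per cycle follow the parameters reported in this section, while the number of cycles is left as a free parameter.

```python
from typing import Callable, List, Tuple


def run_iterative_procedure(
    initialise: Callable[[], list],
    train_on_perturbations: Callable[[list, str], list],
    n_cycles: int,
    n_explore: int = 2,
    n_refine: int = 5,
) -> Tuple[list, List[str]]:
    # Initialise: train without prior information; returns the best 100 fits (sigma_c)
    fits = initialise()
    history = ["initialise"]
    for _ in range(n_cycles):
        # Explore: major perturbations around sigma_c to traverse the solution space
        for _ in range(n_explore):
            fits = train_on_perturbations(fits, "explore")
            history.append("explore")
        # Refine: minor perturbations to converge on the current solution
        for _ in range(n_refine):
            fits = train_on_perturbations(fits, "refine")
            history.append("refine")
    return fits, history


# Usage with trivial stand-ins for the (hypothetical) training steps
fits, history = run_iterative_procedure(lambda: [0], lambda f, phase: f, n_cycles=2)
print(history.count("explore"), history.count("refine"))  # 4 10
```

In practice one would also record the best regression found across all iterations, since the procedure reports the best iteration rather than the last one.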
In the initialise phase, we follow the procedure outlined in Appendix B. In this phase, no prior information about the target cross-section set is given to the network other than the energy loss thresholds and the number of processes present. The resulting best 100 regressions then form an array of current fits $\sigma_c$. During the following two phases, we seek to improve the regression by generating stochastic perturbations around each current fit through a weighted mixing with example cross-section data.
To train each subsequent network, we use augmented LXCat cross-sections to generate perturbations around each current fit. First, a sample $\sigma_s$ is generated with the same method used in the initialise phase. We then use $\sigma_s$ to generate a perturbation around the $i$th current fit $\sigma_{c,i}$ using a weighted sum in log space,
$$\ln \sigma = r \ln \sigma_{c,i} + (1 - r) \ln \sigma_s,$$
where $i$ is a uniformly distributed random index and $r$ is a pseudo-random number sampled from a scaled Laplace distribution. The parameter $r$ then defines how similar each resulting training sample $\sigma$ is to $\sigma_{c,i}$: values close to 1 result in minor perturbations around $\sigma_{c,i}$, while values close to 0 result in major perturbations. Values of $r$ greater than 1 can be used to produce accentuated perturbations around $\sigma_{c,i}$ that extend the solution space beyond the available data.
The extent of these perturbations then defines the network's ability to either explore the solution space or refine the existing solution. If the training data are restricted to minor perturbations, the solution may become trapped in a local minimum. Conversely, major perturbations may leave the network unable to determine a sufficiently accurate cross-section set. The explore and refine phases of the procedure aim to strike a balance between these two regimes. In the explore phase, major perturbations of $\sigma_c$ are made to assist the network in traversing the solution space beyond the current fits, while in the refine phase, minor perturbations are used to further refine $\sigma_c$. Through a trial-and-error process, we found the following parameters to be suitable for each training phase. In the explore phase we conduct two iterations, while in the refine phase we conduct five to help ensure that a particular solution is sufficiently refined after each exploratory phase. For high-energy (> 10 eV) processes, such as electronic excitation and ionisation, we sample $r$ from the domain [0.5, 0.8] for each iteration in the explore phase, while during the refine phase we set $r = 0.8$. For low-energy processes, such as vibrational excitation and elastic scattering, $r$ is sampled from [0.5, 1.5] in the explore phase and [0.8, 1.2] in the refine phase. In the case of low-energy processes, values of $r$ greater than 1 are used to generate training examples that accentuate low-energy cross-section features present in the sample data.
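The log-space mixing step can be sketched as follows. The phase-dependent ranges for $r$ are those quoted above; the procedure draws $r$ from a scaled Laplace distribution whose exact scaling is not given, so this sketch substitutes a plain uniform draw over each range as a simpler stand-in. The names `perturb`, `R_RANGES` and `make_training_sample` are hypothetical.

```python
import numpy as np

# r ranges by process energy regime and phase (from the text); (0.8, 0.8) means r = 0.8
R_RANGES = {
    ("high", "explore"): (0.5, 0.8),
    ("high", "refine"): (0.8, 0.8),
    ("low", "explore"): (0.5, 1.5),
    ("low", "refine"): (0.8, 1.2),
}


def perturb(sigma_c, sigma_s, r):
    # ln(sigma) = r ln(sigma_c) + (1 - r) ln(sigma_s), i.e. sigma = sigma_c**r * sigma_s**(1 - r)
    return np.exp(r * np.log(sigma_c) + (1.0 - r) * np.log(sigma_s))


def make_training_sample(fits, sigma_s, regime, phase, rng):
    i = rng.integers(len(fits))  # uniformly chosen current fit sigma_c,i
    lo, hi = R_RANGES[(regime, phase)]
    r = rng.uniform(lo, hi)  # stand-in for the scaled Laplace draw used in the text
    return perturb(fits[i], sigma_s, r)
```

With $r = 2$, a current fit of 2 mixed with a sample of 1 yields 4, illustrating how $r > 1$ pushes the training example further from $\sigma_s$ than $\sigma_{c,i}$ itself.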
Direct parallels can be drawn between the iterative MBANN technique and the well-known iterative swarm technique. In each, an informed supervisor guides the procedure towards a solution to the inverse problem that is both physical and accurate. In the iterative swarm technique, this is generally the role of an expert in the field, who may adjust the solution or procedure where necessary. In the iterative MBANN technique, this role is, in the ideal case, automated by the neural network. Depending on the application, the guidance of an expert may still be required to choose suitable parameters and to monitor the procedure's performance.
We demonstrate the proposed iterative procedure through a regression of the methane cross-section set presented in Section 2.1. While 32 neurons were chosen for the hidden layers in each parallel branch as the limiting case in the previous section, we increase this to 64 in what follows due to modest improvements in the validation accuracy. In Figure 4, we compare both the initial regression and the best regression of methane's cross-section set found during the procedure, along with their associated transport coefficients. The transport coefficients associated with the initial fit exhibit substantial discrepancies from those of the original set: the initial MBANN regression resulted in a MARPD of 1.8, 5.1 and 27 % for $W$, $n_0 D_L$ and $k_{\rm eff}$, respectively. After 40 iterations, the procedure substantially improved upon the initial regression, with the best iteration resulting in a MARPD of 0.48, 1.69 and 5.84 % for $W$, $n_0 D_L$ and $k_{\rm eff}$, respectively.
Crucially, we also find substantial improvements in the agreement between the total cross-section of each collision type in the set and that of the target cross-section set. The shaded regions in Figure 4 show the extent of the best 100 regressions found in the initial fit. Both the total excitation and the attachment cross-sections of the best regression lie partly outside this initial extent of $\sigma_c$. The network was therefore able to explore effectively where necessary to improve the resulting fit of the target transport coefficients.
In its current form, the procedure assumes prior knowledge of the threshold energies. This assumption is particularly important when groupings of similar thresholds are present. If, instead, effective excitation cross-sections were used to represent groupings of similar threshold energies, this assumption could be avoided at the expense of physical threshold energies in the resulting cross-section set.
In this investigation, we use only calculated transport coefficients over a large range of reduced electric fields. In reality, such a range is rarely available. Tailoring the energy domain of the cross-section regression to the available transport data will partly alleviate this limitation, although with limited returns. We thus encourage the measurement of swarm coefficients over as broad a range of reduced electric fields as possible. Finally, while we have made a concerted effort to develop a robust iterative procedure, depending on the particular problem and the extent of available transport data, the network may still produce non-physical cross-sections or become trapped in a local minimum. The parameters used here should thus serve only as a guide, to be modified as needed in future applications.

Conclusion
In this work, we demonstrate a new iterative procedure that uses a Multi-Branch Artificial Neural Network (MBANN) to solve the inverse swarm problem. Building upon the foundations laid by Stokes et al. [3,19,21,20] and Jetly et al. [22], we address two key limitations of an ANN solution to the inverse swarm problem.
We first evaluate the use of an MBANN that includes an independent branch of dense layers for each output, with every branch stemming from a common feature map of the input. We then compare the MBANN to an equivalent conventional ANN using Biagi's methane cross-section set [24] and demonstrate that the use of parallel layers can improve the resulting regression, as the network is able to efficiently and independently represent multiple distinct cross-sections.
In addition, taking inspiration from the iterative swarm technique, we propose an iterative MBANN procedure that incrementally explores the solution space to mitigate the ill-posed nature of the problem. After an initial regression is found, we use a sequence of MBANNs that are each trained using perturbations around the previous regression. The iterative MBANN procedure then converges towards a particular solution of the inverse swarm problem.
To demonstrate the iterative MBANN procedure, we evaluate its performance using Biagi's methane cross-section set [24]. Over the 40 iterations conducted, the MARPD of the transport coefficients was substantially reduced from 1.8, 5.1 and 27 % for the initial regression to 0.48, 1.69 and 5.84 % for the best regression, for $W$, $n_0 D_L$ and $k_{\rm eff}$, respectively. Additionally, the total cross-section for each collision type within the best set found exhibited good agreement with the original set, in contrast to the initial regression.
Overall, we have demonstrated an improved artificial neural network solution to the inverse swarm problem that utilises both an iterative procedure and parallel branches of densely connected layers that represent each cross-section.In conjunction, these additions improve the ability of the network to generate both self-consistent and physical cross-sections, particularly when large degeneracies may exist for a particular species.
In future work, we aim to apply this procedure to derive complete cross-section sets for complex targets while also investigating the use of convolutional architectures.

Appendix A. Data generation
… a decaying distribution may, however, be sufficient for this purpose. Ratios greater than 1 are used here to accentuate cross-section features found within the sample set, reducing the extent of outliers within the set. Note that for the elastic cross-section, we sample two cross-sections from the three groups so that each group is equally represented in the training set.
In addition, due to the limited nature of the available data, the solution may lie at the extremes of the available training data, which can introduce unwanted bias in the data augmentation process. To alleviate this, Equation (A.1) is modified such that the energy domain and magnitude of each cross-section are multiplied by the scaling factors $10^a$ and $10^b$, respectively. Each exponent is a pseudo-random number uniformly distributed within a defined range. Here, we set $a \in [-0.5, 0.5]$, $b \in [-0.5, 0.5]$ for elastic cross-sections, $a = 0$, $b \in [0, 2]$ for excitation cross-sections, $a = 0$, $b \in [0, 1]$ for ionisation cross-sections and $a \in [-1, 1]$, $b \in [-1, 1]$ for attachment cross-sections. Each range was chosen to extend the coverage of the available training data by a reasonable amount. Finally, we log-transform and then scale each cross-section to lie between −1 and 1. If a cross-section magnitude is below $\delta = 10^{-6}$, it is replaced by $10^{-7}$ before applying the transform.
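The sketch below covers only the two steps described in this paragraph, applied to a tabulated cross-section, and not the augmentation of Equation (A.1) itself: the random $10^a$ and $10^b$ rescaling of the energy grid and magnitude, and the floored log transform to [−1, 1]. The scaling bounds `log_min` and `log_max` are hypothetical parameters, since their values are not stated.

```python
import numpy as np

# (a range, b range) per collision type, from the text; (0.0, 0.0) means a = 0
SCALE_RANGES = {
    "elastic": ((-0.5, 0.5), (-0.5, 0.5)),
    "excitation": ((0.0, 0.0), (0.0, 2.0)),
    "ionisation": ((0.0, 0.0), (0.0, 1.0)),
    "attachment": ((-1.0, 1.0), (-1.0, 1.0)),
}


def rescale(energies, sigma, kind, rng):
    # Multiply the energy grid by 10**a and the magnitude by 10**b, with a and b uniform
    (a_lo, a_hi), (b_lo, b_hi) = SCALE_RANGES[kind]
    a = rng.uniform(a_lo, a_hi)
    b = rng.uniform(b_lo, b_hi)
    return energies * 10.0**a, sigma * 10.0**b


def to_network_units(sigma, log_min, log_max, delta=1e-6, fill=1e-7):
    # Replace magnitudes below delta by fill, take log10, then map [log_min, log_max] onto [-1, 1]
    sigma = np.where(sigma < delta, fill, sigma)
    y = np.log10(sigma)
    return 2.0 * (y - log_min) / (log_max - log_min) - 1.0
```

Because $a = 0$ for excitation and ionisation, their energy grids (and hence thresholds) are left unchanged and only their magnitudes are rescaled.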
For each generated cross-section set, the resulting transport coefficients are calculated with a multi-term Boltzmann equation solver at the same reduced electric fields as the target experimental transport coefficients. If a particular set results in non-physical transport coefficients, the set is removed from the training set and new sets are generated until the condition is satisfied. For this investigation, physical transport coefficients are defined as $W > 0$ and $n_0 D_L > 0$, where the electric field is directed along the negative $z$-axis. In addition, we apply a logarithmic transformation to ensure that all inputs and outputs of the network are dimensionless and lie within [−1, 1], with special consideration given to $k_{\rm eff}$ due to the presence of negative values. The vectors $k^+_{\rm eff}$ and $k^-_{\rm eff}$ are created to represent the positive and negative portions of the original input, respectively. For $k^+_{\rm eff}$, each negative value is set to a sufficiently small positive value, while for $k^-_{\rm eff}$, each positive value is set to a sufficiently small negative value before the absolute value of each element is taken.
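The physicality check and the splitting of $k_{\rm eff}$ into two strictly positive vectors, so that both parts can be log-transformed, can be sketched as below. The floor value `tiny` is a hypothetical choice of "sufficiently small", and this sketch also maps exact zeros to `tiny`, a case the description does not address.

```python
import numpy as np


def is_physical(W, n0DL):
    # Physical sets require W > 0 and n0*D_L > 0 at every reduced field
    return bool(np.all(np.asarray(W) > 0) and np.all(np.asarray(n0DL) > 0))


def split_keff(k_eff, tiny=1e-30):
    k = np.asarray(k_eff, dtype=float)
    # k+ keeps positive entries; all other entries become a small positive value
    k_pos = np.where(k > 0, k, tiny)
    # k- keeps negative entries; all others become a small negative value; then take |.|
    k_neg = np.abs(np.where(k < 0, k, -tiny))
    return k_pos, k_neg
```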

Appendix B. Training procedure
Training of the network is conducted using a mini-batch of 32 cross-section sets evaluated at 128 energies for each iteration. Each weight and bias is then updated using the NAdam optimiser [29] with a learning rate of 0.001 and exponential decay rates of 0.9 and 0.999 for the first and second moment estimates, respectively. For each batch, random noise is applied to each transport coefficient. The noise is sampled from a lognormal distribution with a standard deviation chosen to replicate the experimental uncertainties typically observed for each transport coefficient.
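The multiplicative lognormal noise applied to each batch can be sketched with NumPy's `Generator.lognormal`. The per-coefficient widths in `NOISE_SIGMA` are hypothetical placeholders, since only their purpose (replicating typical experimental uncertainties) is stated. The NAdam settings quoted above would be configured separately in whichever deep learning framework is used.

```python
import numpy as np

# Hypothetical widths (standard deviation of the underlying normal) per transport coefficient
NOISE_SIGMA = {"W": 0.01, "n0DL": 0.03, "k_eff": 0.05}


def add_noise(coeffs, rng):
    # Multiply each coefficient by a lognormal factor with median 1, which preserves its sign
    return {
        name: np.asarray(values, dtype=float)
        * rng.lognormal(mean=0.0, sigma=NOISE_SIGMA[name], size=np.shape(values))
        for name, values in coeffs.items()
    }
```

A multiplicative factor is a natural fit here because it perturbs each coefficient by a relative amount and never flips the sign of a negative $k_{\rm eff}$.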
The prediction of multiple cross-sections of the same collision type has previously been shown to be a highly degenerate problem, particularly in the case of excitation collisions [3]. To emphasise the importance of the total cross-section, we use a loss function that includes a penalty on the total cross-section of each process type, in addition to each individual cross-section. The loss function is defined as follows, where $\boldsymbol{\sigma}_i$ is the set of $M_i$ cross-sections for collision type $i$ and $I$ is the number of collision types present. Note that the total cross-sections are calculated in linear space before the scaling is re-applied.
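The following is one plausible sketch of a loss with a total cross-section penalty, not necessarily the exact form used here. It assumes a mean-squared error in the scaled [−1, 1] space and a log10 scaling with hypothetical bounds `log_min` and `log_max`; as described above, each collision type's cross-sections are summed in linear space before the scaling is re-applied.

```python
import numpy as np


def unscale(y, log_min, log_max):
    # Invert y = 2 (log10(sigma) - log_min) / (log_max - log_min) - 1
    return 10.0 ** ((y + 1.0) * 0.5 * (log_max - log_min) + log_min)


def scale(sigma, log_min, log_max):
    return 2.0 * (np.log10(sigma) - log_min) / (log_max - log_min) - 1.0


def loss_with_total_penalty(pred, target, groups, log_min=-7.0, log_max=1.0):
    """pred, target: arrays of shape (batch, n_cross_sections) in scaled units.
    groups: collision type -> column indices of its M_i cross-sections."""
    loss = np.mean((pred - target) ** 2)  # individual cross-sections
    for cols in groups.values():
        # Sum each collision type in linear space, then re-apply the scaling
        tot_pred = scale(unscale(pred[:, cols], log_min, log_max).sum(axis=1), log_min, log_max)
        tot_true = scale(unscale(target[:, cols], log_min, log_max).sum(axis=1), log_min, log_max)
        loss += np.mean((tot_pred - tot_true) ** 2)  # total cross-section penalty
    return float(loss)
```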
To validate the network, we set aside 5 % of the available cross-sections for each collision type to form the basis of our validation set. In the case of the elastic cross-sections, 5 % of each of the feature groups is selected. The chosen validation cross-sections then undergo the same data augmentation as the training data to produce 100 samples. Every 100 iterations, we test the network on the validation set and calculate the MARPD over the energy domain between the predicted and true total cross-sections $\sigma_{t,i}$ of each collision type $i$. During training, the validation loss is used both to compare network parameters and to help prevent overfitting.
In addition, every 10 training iterations we store the neural network's prediction of the cross-section set for the target transport data and compare its resulting transport coefficients with the target. Due to the presence of multiple distinct fit parameters, we repeatedly select one fit parameter at random and remove the cross-section set that predicts it worst, until only one set remains. This process is replicated 1000 times, and the cross-section set that survives most frequently is chosen as the best overall fit. If desired, the best overall fit can be removed and the process repeated to find the next best fit.
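The random-elimination selection of the best overall fit can be sketched as follows. Here `errors[c, p]` is a hypothetical matrix holding the discrepancy of candidate set `c` for fit parameter `p` (for example, the relative difference of one transport coefficient at one reduced field); how that discrepancy is measured is an assumption of this sketch.

```python
from collections import Counter

import numpy as np


def select_best_fit(errors, n_trials=1000, rng=None):
    """errors[c, p]: discrepancy of candidate cross-section set c for fit parameter p."""
    rng = np.random.default_rng() if rng is None else rng
    errors = np.asarray(errors, dtype=float)
    n_cand, n_par = errors.shape
    wins = Counter()
    for _ in range(n_trials):
        alive = list(range(n_cand))
        while len(alive) > 1:
            p = rng.integers(n_par)  # one fit parameter chosen at random
            worst = max(alive, key=lambda c: errors[c, p])
            alive.remove(worst)  # drop the set that predicts this parameter worst
        wins[alive[0]] += 1
    best = wins.most_common(1)[0][0]  # the set that survives most often
    return best, wins
```

To find the next best fit, one would drop the winning row from `errors` and call the function again.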
To select each hyperparameter, a simple tuning process was conducted that compared the validation loss across reasonable values for the batch size (32, 64, 128), number of hidden layers (2, 3, 4, 5), hidden layer size (32, 64, 128, 256) and optimiser (NAdam, Adam). For simplicity, this process was conducted with a conventional ANN in which each layer contained an equal number of neurons and the output consisted of an elastic and an ionisation cross-section along with three electronic excitation cross-sections. The tuning process indicated that 4–5 hidden layers with 128–256 neurons resulted in the smallest validation error. Due to computational limitations, a more extensive tuning that includes branching hidden layers and a greater variation of parameters was not conducted.
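Assuming every combination of the listed values was evaluated (whether the search was a full grid is not stated), the candidate configurations can be enumerated as below; the training and validation step for each configuration is omitted.

```python
from itertools import product

SEARCH_SPACE = {
    "batch_size": [32, 64, 128],
    "hidden_layers": [2, 3, 4, 5],
    "hidden_size": [32, 64, 128, 256],
    "optimiser": ["NAdam", "Adam"],
}

# One dict per combination, e.g. {"batch_size": 32, "hidden_layers": 2, ...}
configs = [dict(zip(SEARCH_SPACE, values)) for values in product(*SEARCH_SPACE.values())]
print(len(configs))  # 96
```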

Figure 1: Diagram of the multi-branch artificial neural network used for the regression of cross-sections (green) as a function of energy (orange), given an associated set of transport coefficients (blue). The first two hidden layers contain 128 neurons, while each hidden layer within each parallel branch contains 32 (shown not to scale). Each output layer contains a single neuron, and the outputs are concatenated to form an array of N elements matching the number of output cross-sections.

Figure 2: Comparison of a conventional (a) and an equivalent multi-branch (c) artificial neural network applied to the regression of Biagi's methane cross-section set [24]. The shaded regions show the extent of the best 100 regressions of each network. While only the total excitation cross-section is shown here for simplicity, each of the six excitation processes present in the original set is included in the regression. The bulk drift velocity $W$, reduced bulk longitudinal diffusion coefficient $n_0 D_L$ and effective ionisation rate $k_{\rm eff}$ for each network are shown in (b) and (d). All transport coefficients displayed here are calculated with a multi-term Boltzmann equation solver. Below each panel is the absolute relative percentage difference (ARPD) across the energy domain between each regression and the original set.

Figure 3: Illustrative diagram of the iterative neural network procedure (top) along with a flow chart outlining each step (bottom). The procedure consists of three phases: initialise, explore and refine. The initialise step follows a methodology similar to that outlined by Stokes et al. [3]. First, training data are generated through augmentations of existing LXCat cross-sections [26,27,28]. During training, the network's output is periodically sampled and the associated transport coefficients are verified against the target transport coefficients to determine the best 100 fits. In the explore and refine steps, augmented LXCat data are used to generate major and minor perturbations, respectively, of the previous best 100 fits before the same training and verification procedure as in the initialise step is applied.

Figure 4: MBANN regression of Biagi's methane cross-section set [24] using an iterative procedure. (a) compares the initial regression and the best regression found with the original cross-section set. The shaded regions show the extent of the best 100 regressions for the initial regression. While only the total excitation cross-section is shown here for simplicity, each of the six excitation processes present in the original set is included in the regression. The bulk drift velocity $W$, reduced bulk longitudinal diffusion coefficient $n_0 D_L$ and effective ionisation rate $k_{\rm eff}$ for each regression are shown in (b). All transport coefficients displayed here are calculated with a multi-term Boltzmann equation solver. Below each panel is the absolute relative percentage difference (ARPD) across the energy domain between each regression and the original set.