
Extracting electron scattering cross sections from swarm data using deep neural networks


Published 14 June 2021 © 2021 The Author(s). Published by IOP Publishing Ltd
Citation: Vishrut Jetly and Bhaskar Chaudhury 2021 Mach. Learn.: Sci. Technol. 2 035025. DOI: 10.1088/2632-2153/abf15a


Abstract

Electron-neutral scattering cross sections are fundamental quantities in simulations of low temperature plasmas used for many technological applications today. From these microscopic cross sections, several macro-scale quantities (called 'swarm' parameters) can be calculated. However, measurements as well as theoretical calculations of cross sections are challenging. Since the 1960s, researchers have attempted to solve the inverse swarm problem of obtaining cross sections from swarm data; but the solutions are not necessarily unique. To address these issues, we examine the use of deep learning models which are trained using the previous determinations of elastic momentum transfer, ionization and excitation cross sections for different gases available on the LXCat website and their corresponding swarm parameters calculated using the BOLSIG+ solver for the numerical solution of the Boltzmann equation for electrons in weakly ionized gases. We implement artificial neural network (ANN), convolutional neural network (CNN) and densely connected convolutional network (DenseNet) for this investigation. To the best of our knowledge, there is no study exploring the use of CNN and DenseNet for the inverse swarm problem. We test the validity of predictions by all these trained networks for a broad range of gas species and we deduce that DenseNet effectively extracts both long- and short-term features from the swarm data and hence predicts cross sections with significantly higher accuracy compared to ANN. Further, we apply Monte Carlo dropout as a Bayesian approximation to estimate the probability distribution of the cross sections to determine all plausible solutions of this inverse problem.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Plasma science has an admirable track record as an enabling technology that underpins our modern society and has the potential to make wide-ranging contributions to addressing many societal challenges [1, 2]. Technologies based on low-temperature plasmas (LTPs) are ubiquitous in today's society. These include mature technologies such as fluorescent lamps and gas lasers, as well as more 'modern' technologies in use but still being developed, such as plasma reactors for semiconductor processing and microelectronics fabrication [3]. Today, there are extensive research activities and rapidly emerging applications of LTPs in medicine and agriculture [4, 5]. LTPs are generated most simply by applying a sufficiently high voltage across a gas gap between two electrodes [6]. The properties of the plasmas so generated vary considerably with the experimental parameters—gas pressure and composition, geometrical configuration, means of applying an electromagnetic field (e.g. application of a DC, AC and/or rf, pulsed or steady-state voltage across the electrodes, or injection of microwaves) and the specifics of the external circuit. For purposes of discussion here, we will consider only an applied electric field. LTPs consist of electrons and ions flowing through a neutral background gas in response to the applied electric field, and, for many applications, the number density of the neutral molecules exceeds that of the charged particles by many orders of magnitude. Our knowledge of electron and ion interactions with atomic and molecular species within the plasma, and the evaluation of cross sections and reaction rates for such collisions, has played an important role in the exploitation of plasmas in several applications [3].

Being much lighter than ions, electrons are more easily accelerated in the electric field that sustains the plasma, and hence the electrons are the vector through which electrical energy is transferred to the neutrals through collisions. For a wide range of conditions in LTPs, electrons collide predominantly with background neutral gas molecules in their ground state. In these conditions, the electron energy distribution function is generally non-Maxwellian and the electron 'temperature' is much higher than the temperature of the ions or that of the neutrals. Because of the huge range of realizable conditions, optimization of an LTP for a particular application necessarily requires a combination of experiment and modeling. The data required for modeling LTPs depend on the level of description but are in all cases extensive [7, 8]. In fluid models, the electrons and the ions are treated as separate fluids because of their widely disparate temperatures, and these fluids are coupled to Maxwell's equations for the electromagnetic fields. In their simplest form, fluid models require electron and ion mobilities, diffusion coefficients, and electron ionization/attachment rate coefficients. The product of the mobility and neutral density, µN; the product of the diffusion coefficient and neutral density, DN; and the rate coefficients depend on E/N, the ratio of the electric field strength to the neutral density, in the limit of a constant (in time and space) electric field. These transport and rate coefficients as functions of E/N are commonly called 'swarm' parameters, in analogy with a drifting, spreading swarm of bees in which the average kinetic energy is much higher than the directed or drift energy. On the other hand, more detailed kinetic models (such as particle-in-cell simulations with Monte Carlo collisions) require electron-neutral and ion-neutral cross sections vs. energy for each possible outcome of a collision, whether it be elastic, excitation, or ionization.
Of course, there are many different possible excitation channels, and a cross section for each as a function of energy is required in general. Future developments in LTP areas will depend on our ability to manipulate plasma properties, which requires a thorough understanding of plasma chemistry and the availability of accurate cross section data and swarm parameters [9, 10]. Swarm parameters can be measured fairly easily and with very high accuracy (0.5% for the drift velocity, for example), and since the works of Ramsauer, Mayer, Townsend and Bailey in the early 1920s, researchers have aimed to extract information about microscopic cross sections from measurements of macroscopic swarm parameters. Cross sections, on the other hand, are much more difficult to measure, and highly accurate quantum mechanical calculations even for simple atomic target species are only now becoming available. Despite their significance, many plasma processes are not well understood because the required cross sections are unavailable, and this absence is a major impediment for experimentalists as well as computational investigators.

A complete set of cross sections includes a momentum transfer cross section (MTCS) for elastic scattering, and cross sections for excitation and ionization processes for a given target species. A partial set includes a subset of the important scattering processes for that species. Complete sets of cross sections are needed as input to a Boltzmann equation solver to determine the electron or ion energy distribution function. Therefore, complete sets of cross section data play an important role in designing new experiments as well as simulations. A typical inverse swarm problem consists of deriving cross sections from swarm data, as shown in figure 1. The advantage of swarm-derived cross sections lies in the fact that they contain all processes either explicitly, as individual cross sections, or implicitly within other cross sections [11]. Obtaining cross sections from swarm parameters was pioneered by Townsend and Ramsauer in the 1920s. The method used in those early analyses involved inverting the integral relating the drift velocity and the MTCS using a simplified expression for the electron energy distribution. This approach was significantly improved in the 1960s by Phelps and collaborators, who employed iterative methods to solve the Boltzmann equation and obtain an accurate energy distribution of the electrons [12–15]. This allowed accurate computation of the momentum transfer and lower energy inelastic cross sections from the available swarm data. The iterative process of inverting the swarm data to obtain cross sections involves solving the Boltzmann equation, calculating the electron energy distribution and altering the model cross sections until a satisfactory match is found between the original and computed swarm parameters, making it a computationally expensive problem to solve. To address this issue, [16, 17] used numerical optimization algorithms to speed up the process of obtaining cross sections from swarm data.
However, the inverse swarm problem is ill-posed, especially when there is a lack of available swarm data. Therefore, these optimization algorithms would often get stuck in local minima due to the non-uniqueness of the inverse swarm mapping.


Figure 1. The forward problem of mapping from a set of cross sections to a set of corresponding swarm data is essentially well-posed and can be solved numerically by solving the Boltzmann equation. The inverse problem, on the other hand, is ill-posed, and a well-defined inverse mapping function does not exist.


Neural networks have been successfully used to learn non-linear mappings between two sets of data, and once a network has been trained, it can produce outputs in roughly $\mathcal{O}(1)$ time. It is also relatively easier to avoid local minima during optimization using neural networks. Hoping to exploit these advantages, W L Morgan investigated the feasibility of using neural networks to solve the inverse swarm problem [18] and concluded that neural networks were indeed useful for determining cross sections from electron swarm data, but could not achieve high accuracy due to the lack of quality cross section data available, along with various limitations of the commercially available neural net simulators of that time. Artificial neural networks (ANNs) have also been used successfully to predict the proton impact single ionization double differential cross sections of atoms and molecules [19].

Since Morgan's findings in 1991, there has been an increase in the amount of available cross section and swarm data (LXCat [20]). Recently, a study carried out by Stokes et al [21] verified Morgan's claims, and their subsequent work [22] demonstrated that their automatic ANN-based solution had an accuracy comparable to that of a human expert in determining cross sections of the biomolecule tetrahydrofuran (THF) from experimentally measured swarm data. The obtained THF cross sections can be considered self-consistent because they accurately reproduced many of the swarm measurements that were used to derive them [22]. In [21], they also showed that the use of a large amount of synthetic training data, generated from the real cross sections available on LXCat, indeed gave good results when used to predict the elastic momentum transfer and ionization cross sections of helium and argon. However, the same needs to be verified for a number of different gas species to safely conclude the feasibility of this machine learning (ML) approach for solving the inverse swarm problem. Moreover, their study was limited to the use of an ANN, which had only minor improvements over the architecture proposed by Morgan to increase the parameter efficiency and training speed of the model. Additionally, in the last decade there has been a drastic increase in computing power along with vast improvements in ML algorithms, allowing the creation of large and powerful neural networks. There are numerous applications in computer vision and image processing where other neural network types, such as the convolutional neural network (CNN), outperform ANNs because of their ability to extract spatial information. In this problem too, the swarm data used as input to the neural network take the form of a continuous series, and thus it becomes imperative to study the performance of CNN architectures in solving the inverse swarm problem.
Additionally, since the inverse problem is itself ill-posed, it is more reasonable to find the entire distribution from which plausible solutions can be sampled.

Existing literature suggests that one of the major challenges modelers often face is the inconsistency between available cross sections and swarm measurements. This arises because researchers frequently need to acquire cross section data from disparate sources due to the unavailability of self-consistent data, which also leads to guessing cross section values using intuition and experience. Therefore, it is important to explore and establish ML-based alternative approaches to obtain high quality cross section data, for the most important collision processes, that are consistent with swarm measurements. One of the major advantages of this data-driven approach is that it provides an effective way to evaluate the accuracy and self-consistency of cross sections. Thus, in this study, we explore the suitability of deep neural networks for identifying the inverse relationship for a wide range of gas species and assess the efficacy of different neural network architectures in predicting scattering cross sections from simulated swarm data. Furthermore, we perform uncertainty quantification (UQ) to estimate the distribution of all the plausible solutions of the inverse problem. To the best of our knowledge, no study exploring the use of CNN and densely connected convolutional network (DenseNet) architectures for this inverse swarm problem has been reported. In section 2, we describe our complete data-driven methodology, from data preparation to the implementation of two new neural networks (CNN and DenseNet based) for the solution of this inverse swarm problem. In section 3, we present a detailed comparison of the performance of three neural networks (ANN, CNN and DenseNet) in determining the cross sections of seven gas species. The reliability of the predictions is also evaluated using a UQ method in section 3.1. Finally, we conclude the paper with a summary of our results and a brief discussion of how the accuracy of this data-driven approach can be further improved.

2. Methodology

Our data-driven methodology for determining a set of cross sections consistent with swarm parameters involves several steps: data collection and profiling, feature engineering, and building suitable ML models, followed by training and evaluation. Figure 2 describes the complete workflow used in this study for solving the inverse swarm problem. Firstly, complete sets of cross sections for different gas species are obtained from the LXCat [20] database; however, since these data are limited, we also generate abundant synthetic cross section data. Secondly, using the cross section data, we compute the corresponding swarm coefficients using the freeware Boltzmann equation solver BOLSIG+ [23]. Thirdly, we perform feature selection followed by data normalization. Finally, different neural network models are designed and trained using the combined cross section and swarm data. The predicted results are compared to the cross sections obtained from LXCat. We then also estimate the complete distribution of the plausible cross sections by quantifying the uncertainty in the solution using Monte Carlo dropout [24]. In the following subsections we provide a detailed description of each of the above mentioned steps.


Figure 2. Complete workflow used in this study for solving the inverse swarm problem.


2.1. Dataset

Efficient training of the neural network to identify an inverse non-linear relationship between swarm data and cross sections requires abundant training data. Morgan generated these training cross sections using a power-law model of the form $\sigma(\epsilon) = \epsilon_0 / \epsilon^{p}$, where $\epsilon_0$ and p are randomly chosen from $(10^{-17},\; 10^{-14})$ and (0, 1) respectively [18]. This parameterized method allows one to generate an infinite number of training examples and can thus be considered ideal for ML problems. However, this parameterized equation represents only a small subset of physically plausible cross sections. To expose our deep learning models to more realistic data, we use cross section data for gas species compiled on the LXCat website, shown in figure 3. The cross sections include the energy-dependent MTCS for elastic scattering, and total (angle-integrated) cross sections for excitation and ionization processes for a given target species. In general, the probability of a collision of a particular type occurring depends on the relative velocity of the collision partners and the scattering angle. However, it has been shown that the additional detail regarding angular scattering has very little effect on the calculated swarm parameters. Note that there are many different excitation processes with different energy thresholds, and predicting all (or even the most important) of them is a challenging task. In this work, and for the sake of demonstrating the features of different ML algorithms, we consider only one excitation cross section; that is, the training data consider only the lowest excitation process from the compilations available on LXCat. The Boltzmann equation is solved using only these three input cross sections, and the swarm parameters so calculated are not to be compared with those tabulated from experiments on the LXCat website, which are generally well predicted when a complete set of cross sections is used.
This procedure considerably simplifies the computational requirements and is expected to correctly demonstrate the capabilities of each of the ML algorithms studied here.
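For reference, Morgan's parameterized generator described above can be sketched in a few lines; the function name and the log-uniform sampling of $\epsilon_0$ are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def morgan_power_law(energies, rng):
    """One synthetic cross section sigma(eps) = eps0 / eps**p, following
    Morgan's parameterization [18]: eps0 is drawn (log-uniformly, an
    assumption) from (1e-17, 1e-14) and p uniformly from (0, 1)."""
    eps0 = 10.0 ** rng.uniform(-17, -14)
    p = rng.uniform(0.0, 1.0)
    return eps0 / energies ** p

energies = np.logspace(-1, 2, 100)      # 0.1 eV .. 100 eV
sigma = morgan_power_law(energies, rng)
```

Because p > 0, every cross section generated this way decreases monotonically with energy, which is one reason this family covers only a small subset of realistic shapes.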


Figure 3. Complete cross section data of various gas species obtained from LXCat. These consist of Ar, Kr, SF6, Xe, CO2, D2, H2, He, Ne, O2 and N2 sourced from [25]; Si(CH3)4, CF4 and CHF3 sourced from [26]; C, Be, F, C(2p(2)_1S), C(2p(2)_1D), Be(2 s_2p_1P), N, C(2p3s_1Po) and C(2p3s_3Po) sourced from [27]; H(1S), H(2P), H(2S), H(3D), H(3P), H(3S), H(4D), H(4F), H(4P) and H(4S) sourced from [28]; CH4, H2O, HCl, SiH4, C2H2, and C2H4 sourced from [29]; O, N-elec, CO and H sourced from [30]; C3H6 sourced from [31]; Cu sourced from [32]; O2(0.98) sourced from [33].


2.1.1. Extrapolation of inelastic cross sections

In this work, we aim to predict the elastic momentum transfer, ionization and excitation cross sections for energies in the ranges ($10^{-1}$ eV, $10^{2}$ eV), ($10^{0}$ eV, $10^{4}$ eV) and ($10^{-1}$ eV, $10^{3}$ eV) respectively. As evident from figure 3, the inelastic cross sections of many gas species in the LXCat databases are not available for the entire energy domain under consideration. Thus, we use an analytical expression to extrapolate these cross sections to higher energies. For the ionization cross sections, we use the parameterization (equation (1)) proposed by Rost and Pattard [34]:

Equation (1)

where E is the excess energy of the system measured from the ionization threshold, $E_M$ corresponds to the energy at which the cross section attains its maximum value, and k and α are parameters computed to obtain the best fit.

Various approximations from quantum mechanics could be used to extrapolate excitation cross sections to higher energies. However, we have chosen to simply use a power-law relationship, equation (2), to extrapolate the data:

Equation (2)
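One simple way to realize such a power-law extrapolation, assuming a fit of the form $\sigma \approx A\epsilon^{-p}$ to the high-energy tail of the tabulated data (the helper name and the number of tail points used for the fit are our own choices, not specified in the text), is a linear fit in log-log space:

```python
import numpy as np

def extrapolate_power_law(energies, sigma, new_energies, n_tail=5):
    """Extend a cross section to higher energies via sigma ~ A * eps**(-p).

    A straight line is fitted to the last n_tail tabulated points in
    log-log space; the slope gives -p and the intercept gives log10(A)."""
    slope, intercept = np.polyfit(np.log10(energies[-n_tail:]),
                                  np.log10(sigma[-n_tail:]), 1)
    return 10.0 ** (intercept + slope * np.log10(new_energies))

# a tail that already follows eps**-2 is recovered exactly
eps = np.logspace(1, 2, 20)
sig = 3e-20 * eps ** -2
ext = extrapolate_power_law(eps, sig, np.array([1e3, 1e4]))
```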

2.1.2. Synthetic data generation for training

Deep neural networks require large training datasets for effective performance. The cross section data obtained from LXCat, however, are very limited (complete cross sections of only 46 different gas species) and clearly insufficient to properly train the models. Therefore, we generate synthetic training examples by interpolating the actual cross sections. Firstly, all 46 gas species for which complete sets of data exist on LXCat are manually classified into three groups based on visual inspection of their elastic MTCS characteristics, as shown in figure 4. Group-1, Group-2 and Group-3 consist of 12, 18 and 16 different species respectively. To avoid the generation of nonphysical cross sections, a new artificial cross section is calculated by taking a weighted geometric average [21] of two actual cross sections belonging to the same group using equation (3):

Equation (3): $\sigma_{\textrm{new}}(\epsilon) = \sigma_1(\epsilon)^{1-r}\,\sigma_2(\epsilon)^{r}$

where $\sigma_1(\epsilon)$ and $\sigma_2(\epsilon)$ are the cross sections of gas species belonging to the same group, $\epsilon_1$, $\epsilon_2$ and $\epsilon_1^{1-r}\epsilon_2^r$ are the threshold energies of $\sigma_1(\epsilon)$, $\sigma_2(\epsilon)$ and $\sigma_{\textrm{new}}(\epsilon) $ respectively, and 0 ≤ r ≤ 1 is a uniformly distributed random variable. This formula generates a physically-plausible cross section and retains the correlation between the magnitude of a cross section and its threshold energy [21].
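A minimal sketch of this mixing step on a common energy grid (the two toy cross sections below are illustrative placeholders, not LXCat data):

```python
import numpy as np

rng = np.random.default_rng(1)

def mix_cross_sections(sigma1, sigma2, r):
    """Weighted geometric average of two cross sections tabulated on the
    same energy grid, sigma_new = sigma1**(1-r) * sigma2**r (equation (3)).
    The new threshold energy is eps1**(1-r) * eps2**r, handled separately."""
    return sigma1 ** (1.0 - r) * sigma2 ** r

energies = np.logspace(-1, 2, 100)
sigma1 = 1e-20 / np.sqrt(energies)   # toy cross section of species 1
sigma2 = 4e-20 / np.sqrt(energies)   # toy cross section of species 2
r = rng.uniform()                    # 0 <= r <= 1
sigma_new = mix_cross_sections(sigma1, sigma2, r)
```

By construction the result always lies pointwise between the two parent cross sections, which is what keeps the synthetic data physically plausible.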


Figure 4. Gas species separated into three different classes based on the characteristics of their elastic MTCS. Group 1 consists of Ar [25], Kr [25], SF6 [25], Xe [25], CO2 [25], Si(CH3)4 [26], CF4 [26], CH4 [29], H2O [29], HCl [29], SiH4 [29] and Cu [32], Group 2 consists of D2 [25], H2 [25], He [25], C [27], Be [27], C2H2 [29], Ne [25], O2 [25], N2 [25], F [27], C(2p(2)_1D) [27], C(2p(2)_1S) [27], C2H4 [29], O [30], N-elec [30], CO [30], C3H6 [31] and O2(0.98) [33], and Group 3 consists of CHF3 [26], Be(2 s_2p_1P) [27], N [27], C(2p3s_1Po) [27], C(2p3s_3Po) [27], H(1S) [28], H(2P) [28], H(2S) [28], H(3D) [28], H(3P) [28], H(3S) [28], H(4D) [28], H(4F) [28], H(4P) [28], H(4S) [28] and H [30].


Out of the 46 gas species available, we set apart one gas species so that we can later use our deep learning model to predict its cross sections and compare them with the actual cross sections from LXCat to determine the accuracy of our model. Then, for our training data, we use equation (3) to generate a total of 10 000 different cross sections (figure 5), including the actual complete cross sections of the 45 remaining gas species. Thus, only the cross sections of the gas species on which the model is to be tested are excluded, while all the other gas species contribute equally to generating these synthetic training cross sections. Through a simple visual comparison between figures 3 and 5, one can verify that the general trends of the synthetically generated cross sections are indeed similar to those of the actual cross sections. Subsequently, these cross sections are sampled at 100 discrete log-spaced energy values within the energy range considered for prediction. So, we have a total of $10^6$ energy–cross section pairs in our training dataset.
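The size of the training set follows directly from this sampling scheme; a sketch (the stand-in cross section is purely illustrative):

```python
import numpy as np

# 100 log-spaced energies spanning the prediction range for the elastic MTCS
energies = np.logspace(-1, 2, 100)

def toy_sigma(eps):
    """Stand-in for one tabulated cross section (illustrative only)."""
    return 1e-20 / np.sqrt(eps)

# each of the 10 000 (real + synthetic) cross sections is sampled on this grid
n_cross_sections = 10_000
samples = toy_sigma(energies)
total_pairs = n_cross_sections * energies.size   # 10^6 training pairs
```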


Figure 5. Synthetically generated cross section data.


2.1.3. Swarm data calculation and feature selection

Finally, we complete the input–output training pairs by computing the swarm coefficients corresponding to the cross sections present in our training dataset. Swarm data are computed using the BOLSIG+ [23] solver for the numerical solution of the Boltzmann equation [35], with the cross section data as input. Swarm data are calculated at temperature T = 300 K for 100 equally log-spaced reduced electric fields in the range $10^{-3}$ Td to $10^{3}$ Td (1 Td = $10^{-21}$ V m$^2$). Note that BOLSIG+ can extrapolate cross sections to higher energies if needed for very high E/N.

Mean energy, mobility, diffusion, energy mobility, energy diffusion, total collision frequency, momentum frequency, total ionization frequency, Townsend ionization coefficient, power, elastic power loss, inelastic power loss, growth power, maximum energy and drift velocity are the 15 different quantities included in the output of the BOLSIG+ solver. (Unlike in BOLSIG+, in most Boltzmann solvers the maximum energy is an input.) Using all of these quantities as input to the neural network is not feasible, as it would increase both the training time of the model and its memory requirements. It might even reduce the overall effectiveness of the model; hence, we use feature selection to reduce the input data by removing redundant variables. We compute the Pearson correlation coefficient between all these possible inputs, as depicted in figure 6. The Pearson correlation coefficients shown in figure 6 are averaged over all the gas species except helium (assuming He is the test species on which the model will be tested). Features with a high correlation value (${\gt}0.85$ or ${\lt}{-0.85}$) are more linearly dependent and hence carry almost the same information content; thus we keep one and drop the rest of these highly correlated variables.
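The pruning rule described above can be sketched as follows; the feature names and toy data are illustrative stand-ins (in the paper the selection is performed on the BOLSIG+ outputs), and the greedy keep-first strategy is our own assumption about how ties are resolved:

```python
import numpy as np

def prune_correlated(features, names, threshold=0.85):
    """Drop every feature whose |Pearson r| with an already-kept feature
    exceeds the threshold; the first member of each correlated group is
    kept. `features` has one column per named feature."""
    corr = np.corrcoef(features, rowvar=False)
    kept = []
    for j in range(len(names)):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[k] for k in kept]

# toy example: the second column is (almost) a scaled copy of the first
rng = np.random.default_rng(2)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=1e-3, size=200)   # highly correlated with x0
x2 = rng.normal(size=200)                          # independent
data = np.column_stack([x0, x1, x2])
kept = prune_correlated(data, ["mean_energy", "mobility_copy", "diffusion"])
```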


Figure 6. Pearson correlation coefficient of different swarm parameters obtained from BOLSIG+.


Using this feature selection method, we are left with mean energy, mobility, diffusion, Townsend ionization coefficient, elastic power loss and inelastic power loss. However, mean energy, elastic power loss and inelastic power loss are swarm coefficients whose data are not readily available on LXCat because they are generally not measured experimentally. We therefore drop them from our feature set, since in the long term we would like to apply our methodology to the analysis of experimental swarm data. As a future study, it will nevertheless be interesting to see how the results are affected if these three parameters are included in the data.

2.2. ML methods and model training

2.2.1. Data normalization

Cross sections, along with swarm data, span many orders of magnitude. Directly using these data to train the neural network would severely impede the network's ability to learn meaningful trends in the data. Also, large input values would result in large weight values in the neural networks, making them highly unstable. Small input values with zero mean and a standard deviation of one are generally considered ideal for neural networks, and thus we log-transform everything (equation (4)) and then normalize it to the [−1, 1] range (equation (5)):

Equation (4)

Equation (5)

If a data value is zero, it is replaced by a sufficiently small positive quantity (δ = $10^{-50}$) before applying the log transformation.
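A sketch of this preprocessing, assuming a per-array min/max rescaling for the [−1, 1] step (the exact normalization constants are not spelled out in the text, so that choice is ours):

```python
import numpy as np

DELTA = 1e-50   # replacement for exact zeros before the log transform

def log_minmax_normalize(x):
    """Log-transform (equation (4)) and rescale to [-1, 1] (equation (5)).
    Per-array min/max constants are an assumption of this sketch."""
    x = np.where(x == 0.0, DELTA, x)
    lx = np.log10(x)
    return 2.0 * (lx - lx.min()) / (lx.max() - lx.min()) - 1.0

sigma = np.array([0.0, 1e-22, 1e-20, 1e-18])   # spans many orders of magnitude
norm = log_minmax_normalize(sigma)
```

The zero entry maps to −1 (it becomes the smallest log value), and the ordering of the data is preserved.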

2.2.2. Neural network architecture

The input to our network consists of different swarm parameters—mobility (µN), diffusion coefficient (ND) and Townsend ionization coefficient (α/N)—measured at 100 distinct reduced electric fields E1/N, E2/N, ..., where N (or n0) is the number density of the background neutrals. We use the neural network itself to estimate the cross sections as a function of energy. Thus, the energy $\epsilon$ is also added to the input of the neural network, and the output is the single value of the cross section corresponding to that energy, $\sigma(\epsilon)$:

Equation (6)

Neural networks are composed of several artificial neurons. The structure of these neurons and their connections plays an important role in inferring the function that maps the input to the output. Hence, we test different neural networks to study how performance changes with architecture. The ANN is the most basic form of neural network, and its use to solve the inverse swarm problem was proposed in [18]. Minor improvements were made to this architecture by [21], which made the network simpler and faster to train. This ANN architecture has three hidden layers, each having 128 neurons, with swish as the non-linear activation function, where ${\textrm{swish}}(x) = x/(1+\exp{(-x)})$. We consider this our benchmark architecture.
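A numpy sketch of the benchmark architecture's forward pass (the weights here are random placeholders; in the study the network is trained with Keras, and the input size of 301 = 3 swarm parameters × 100 fields + 1 energy is our reading of the input description):

```python
import numpy as np

def swish(x):
    """swish(x) = x / (1 + exp(-x)), the activation used in all models."""
    return x / (1.0 + np.exp(-x))

def ann_forward(inputs, weights, biases):
    """Forward pass of the benchmark ANN: three hidden layers of 128
    swish neurons, then a single linearly activated output neuron."""
    h = inputs
    for w, b in zip(weights[:-1], biases[:-1]):
        h = swish(h @ w + b)
    return h @ weights[-1] + biases[-1]     # linear output layer

rng = np.random.default_rng(3)
n_in = 301                                  # 3 swarm params x 100 fields + energy
sizes = [n_in, 128, 128, 128, 1]
weights = [rng.normal(scale=0.05, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
out = ann_forward(rng.normal(size=(4, n_in)), weights, biases)  # batch of 4
```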

For our second model, we implement a 1D CNN [36] because of its ability to extract spatial information from input data in the form of a continuous series. Various CNN architectures were trained to determine the optimal hyper-parameters, and figure 7(a) shows the one for which the best results were obtained. Features from the different swarm coefficients are extracted by three successive blocks, each consisting of a batch normalization (BN) layer, a convolutional layer with 64 filters and a kernel size of 5 × 1, and a swish activation layer. This is followed by an average pooling layer, whose output is flattened and passed to two fully connected (FC) layers along with the energy input. The FC layers have 256 and 64 neurons with the swish activation function. Finally, they are connected to the linearly activated single output neuron.


Figure 7. Neural network layouts used in this study. Various CNN and DenseNet architectures with different hyperparameters, such as the number of convolutional filters, kernel size, number of hidden layers and choice of activation function, were implemented, and the layout with the best performance was finally chosen.


DenseNet is an extension of the CNN that provides substantial performance improvements compared to previous CNN architectures [37]; hence, we use a 1D-DenseNet architecture as our third model. DenseNet improves the information flow between the layers by introducing direct connections from any layer to all subsequent layers. This also leads to feature reuse throughout the network; hence, it requires fewer parameters than a CNN architecture to achieve similar performance (parameter efficiency). The concatenation of the feature maps of all preceding layers, $x_0, x_1, \ldots, x_{l-1}$, is provided as input to the lth layer:

Equation (7): $x_l = H_l\left([x_0, x_1, \ldots, x_{l-1}]\right)$

The composite block Hl consists of three successive layers: BN, followed by swish activation and a convolution layer.

We trained and tested different DenseNet layouts having 3–6 such composite blocks, where the number of convolution filters in each block was kept constant (32 filters) and zero padding was applied to each end of the input so as to keep the feature map size fixed. The highest accuracy was achieved for the DenseNet having five composite blocks Hl (figure 7(b)). We use a longer convolution kernel to begin with, and gradually decrease its size in the subsequent layers. The concatenation of all these different-length features allows the network to learn short-term as well as long-term trends from the swarm data; on the downside, these accumulated features substantially increase the model size. Hence, we use a 1 × 1 convolution followed by average pooling to reduce the dimension of the feature map and avoid overfitting, before passing it, along with the energy input, to two fully connected layers of size 128 and 64 with swish activation.
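The dense connectivity pattern of equation (7) can be illustrated with a toy sketch in which each composite block $H_l$ is reduced to a simple callable (in the actual model each $H_l$ is BN, swish and a 1D convolution; the random projections below are placeholders):

```python
import numpy as np

def dense_block(x0, layers):
    """Dense connectivity (equation (7)): each block H_l receives the
    concatenation of the feature maps of all preceding layers, and all
    feature maps are concatenated at the output."""
    features = [x0]
    for h in layers:
        x_l = h(np.concatenate(features, axis=-1))
        features.append(x_l)
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(4)
def make_h(n_in, n_out=32):
    """Toy H_l: a fixed 32-channel random projection with a ReLU."""
    w = rng.normal(scale=0.1, size=(n_in, n_out))
    return lambda x: np.maximum(x @ w, 0.0)

x0 = rng.normal(size=(100, 32))             # 100 grid points, 32 channels
layers = [make_h(32 * (l + 1)) for l in range(5)]   # five composite blocks
out = dense_block(x0, layers)               # 32 * 6 = 192 output channels
```

The growth of the channel dimension (32 per block) is exactly the feature accumulation that motivates the 1 × 1 convolution and pooling described above.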

The output layer in each of the architectures is a single neuron corresponding to the cross section being predicted. Hence, for each architecture discussed above, we need to train three separate models to predict the elastic momentum transfer, ionization and excitation cross sections. This also allows each network to have feature maps pertinent to its cross section type. This separate training could be eliminated by increasing the output layer to three neurons, one for each cross section type. However, this would severely inhibit the network's capability, as it would force the network to work with the same feature maps even for different types of cross section. Thus, we avoid simultaneous prediction of different cross section types.

Purely from the ML perspective, neural networks are trained to improve their predictions by heavily penalizing large errors. Although this seems logical, Stokes et al [21] found that using the L2 loss actually provided worse results than the L1 loss for the inverse swarm problem: due to the inherent uncertainty in the solution of this inverse problem, consistently trying to fit these uncertain cross sections impeded the model's overall performance. Hence, we also choose the mean absolute error (L1-norm), as it is less sensitive to large errors, but make a slight modification to improve the model's performance. As discussed earlier, zero-valued cross sections are replaced with a small threshold value δ = $10^{-50}$ before performing data normalization. This is just an approximate value of δ, and clearly it would be wrong to penalize the network if the predicted value lies in the range [0, δ]. Thus, we use a custom L1 loss function:

$$\mathcal{L} = \frac{1}{N}\sum_{i = 1}^{N}\left|\,\max\left(y_i,\,\Delta\right) - \hat{y_i}\,\right| \qquad (8)$$

where N is the number of training examples, yi is the model's prediction, $\hat{y_i}$ is the target value and Δ is the log-normalized value of δ calculated using equations (4) and (5). This loss function clips the predicted output if it is less than Δ, allowing the network's final prediction to be less than δ without any penalty. This slight modification significantly improves the prediction of the ionization and excitation cross sections.
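The clipping behaviour of equation (8) can be sketched in NumPy (an illustration of the idea, not the Keras training code itself):

```python
import numpy as np

def clipped_l1_loss(y_pred, y_true, delta_norm):
    """Custom L1 loss: predictions below the log-normalized threshold
    delta_norm are clipped up to it, so any prediction in [0, delta]
    for a zero-valued target cross section incurs no penalty."""
    clipped = np.maximum(y_pred, delta_norm)
    return np.mean(np.abs(clipped - y_true))
```

If the target equals the threshold Δ (a zero-valued cross section) and the prediction falls anywhere below it, the loss contribution is exactly zero, as intended.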

The training dataset is divided into batches of 10^3 samples, and all the models are trained by minimizing equation (8) using the Adam optimizer [38] with a learning rate of 10^−4 and exponential decay rates of the first moment (β1) and second moment (β2) of 0.9 and 0.999, respectively. The models were implemented using Keras (2.3.0) [39] with the TensorFlow (2.2.0) backend [40] with GPU support.
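For reference, the Adam update with these hyperparameters can be written out in NumPy (this sketches the optimizer's standard update rule [38]; in practice the Keras built-in optimizer is used):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (t is the 1-based step index)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```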

2.2.3. Determining training duration

During the iterative training of our neural networks, the error on the training set decreases continuously. The same does not hold for the generalization error (the error on unseen data), which actually begins to increase after a point in training (overfitting); ideally, we should therefore stop training when the generalization error is lowest. Since the generalization error cannot be calculated explicitly, we estimate it using k-fold cross-validation, as simply dividing our data into training and validation datasets is not feasible given the limited availability of actual cross sections. Each of the three groups of gas species formed earlier (figure 4) is subdivided randomly into two separate parts, giving a total of six parts. Of these six parts, one is kept as validation data and the remaining five are placed in the training dataset. The synthetic cross sections generated from two cross sections σ1 and σ2 using equation (3) are split according to the following criterion: if both σ1 and σ2 belong to the newly formed training dataset, the synthetic cross section is also added to the training dataset, whereas if both σ1 and σ2 belong to the newly formed validation dataset, it is added to the validation dataset. We then train the networks on this newly formed training dataset and monitor the validation error at each epoch. This process is repeated six times, with each of the six parts used exactly once as validation data. This ensures that no data is wasted and that our models train on multiple train-validate splits. The six validation errors are averaged at each epoch, and this averaged validation error can be regarded as a close substitute for the generalization error. Thus, we take the optimal number of epochs to be the point at which this averaged validation error reaches its minimum.
Later, while testing our models, we retrain them on all 10^6 examples (with no division into training and validation datasets) for this optimal number of epochs.
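The parent-based split criterion can be sketched as a short helper (a minimal illustration; the text does not say how synthetic samples with one parent in each set are handled, so they are simply dropped here):

```python
def split_synthetic(pairs, train_ids, val_ids):
    """Assign each synthetic cross section, built from parent cross
    sections (s1, s2), to train or validation; mixed-parent samples
    are assumed to be discarded for the cross-validation run."""
    train, val = [], []
    for s1, s2 in pairs:
        if s1 in train_ids and s2 in train_ids:
            train.append((s1, s2))
        elif s1 in val_ids and s2 in val_ids:
            val.append((s1, s2))
    return train, val
```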

3. Results

All the architectures (ANN, CNN and DenseNet) were trained to separately predict elastic momentum transfer, ionization and excitation cross sections using a total of 10^6 examples generated by the process described earlier. These trained models were used to predict the unseen cross sections of nitrogen (N2), argon (Ar), helium (He), fluorine (F), methane (CH4), oxygen (O2) and sulphur hexafluoride (SF6). Here, we would like to state explicitly that even though only one cross section type was predicted at a time, no assumption whatsoever was made regarding the values of the other cross section types while predicting a particular one; i.e. while estimating the elastic MTCS of a gas species, we do not provide any details about the values of its ionization or excitation cross sections. We test the trained models on such a wide range of gas species, having different physical and chemical properties, to ensure robust performance.

Figures 8–10 show the comparison between the different architectures' estimates of the elastic momentum transfer, ionization and excitation cross sections, respectively, and the cross sections available on LXCat. Multiple databases on LXCat contain cross section sets for the same gas species, with slight variations among them; so, to assess the accuracy of our model, we select cross sections from only one database for a given gas species. For N2, Ar, He, O2 and SF6, the cross sections ('Actual CS', depicted in figures 8–10 with a black line) are sourced from the Biagi database [25], while those of F and CH4 are taken from the BSR [27] and Hayashi [29] databases, respectively. These cross sections were used to generate the simulated swarm data of these gas species using BOLSIG+, which was then used as the input to our trained models to predict the cross sections. Cross sections sourced from other databases available on LXCat are plotted on the same graphs (labeled 'Other CS data available on LXCat') to give an estimate of the inherent variation in the cross section values reported in the literature from past research works. Figure 11 shows a similar comparison for the total cross section, calculated by summing the elastic momentum transfer, ionization and excitation cross sections. The predicted cross sections are in turn used to calculate the corresponding swarm coefficients using the BOLSIG+ solver, and their comparison with the swarm coefficients calculated using the actual cross sections is shown in figure 12.

Figure 8.

Figure 8. Prediction of the elastic MTCS of various gas species. 'Actual CS' represents the cross section used to generate the swarm data required as input to the trained models. Note that in some cases 'Other CS data available on LXCat' (shown in gray) consists of both elastic momentum transfer and total elastic scattering cross sections. The gray lines simply provide an estimate of the inherent variation in the cross section determinations already available in the literature.

Figure 9.

Figure 9. Prediction of the ionization cross sections of various gas species. 'Actual CS' represents the cross section used to generate the swarm data required as input to the trained models. Note that in some cases 'Other CS data available on LXCat' (shown in gray) consists of both individual ionization processes and sums of all ionization processes. The gray lines simply provide an estimate of the inherent variation in the cross section determinations already available in the literature.

Figure 10.

Figure 10. Prediction of excitation cross sections of various gas species. Actual CS represents the cross section which was used to generate the swarm data required as the input to the trained models.

Figure 11.

Figure 11. Predicted total cross sections of various gas species.

Figure 12.

Figure 12. Comparison of swarm parameters reconstructed using—actual cross sections available on LXCat vs. DenseNet's predicted cross sections.


As evident from figures 8 and 9, the predictions of both the elastic momentum transfer and ionization cross sections from all three neural network architectures agree reasonably well, over the entire energy range, with the cross sections obtained from LXCat. Further, to quantitatively compare the performance of the different architectures, we use three different metrics: mean absolute error (on the log-normalized scale), coefficient of determination (R2) and mean absolute relative percentage difference (MARPD):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y_i}\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{N}\left(\hat{y_i} - y_i\right)^2}{\sum_{i=1}^{N}\left(\hat{y_i} - \bar{\hat{y}}\right)^2}, \qquad \mathrm{MARPD} = \frac{100}{N}\sum_{i=1}^{N}\frac{\left|y_i - \hat{y_i}\right|}{\left(\left|y_i\right| + \left|\hat{y_i}\right|\right)/2} \qquad (9)$$

where N is the number of data points, yi is the predicted value and $\hat{y_i}$ is the true value. The mean absolute error on the log-normalized scale depicts the error as seen by the model (test loss). The coefficient of determination (R2) quantifies the degree of correlation between the actual and the predicted values; its value lies in $(-\infty,\,1]$, with 1 representing perfect agreement between the quantities being compared. MARPD provides a standardized error value, which is not only comparable but also interpretable even to those unfamiliar with the measurement scale of electron cross sections. Together, these three metrics provide a better understanding of the network's performance than any single metric alone.
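The three metrics can be computed directly; a NumPy sketch follows (the MARPD denominator uses the usual symmetric relative-percent-difference convention, which we assume here):

```python
import numpy as np

def metrics(y_pred, y_true):
    """Return (MAE, R^2, MARPD-in-percent) for paired arrays."""
    mae = np.mean(np.abs(y_pred - y_true))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    marpd = 100.0 * np.mean(
        np.abs(y_pred - y_true) / ((np.abs(y_pred) + np.abs(y_true)) / 2.0)
    )
    return mae, r2, marpd
```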

From the performance metrics (shown in table 1), we can safely conclude that the DenseNet architecture performs significantly better than CNN, which in turn yields better results than the ANN architecture, in predicting the elastic momentum transfer cross sections over the entire energy domain for all the gas species considered in our study. A common trend across all gas species in the prediction of the elastic MTCS is that all three architectures predict the cross section with significantly higher accuracy in the range 30–100 eV. To comment further on the accuracy of the architectures, we analyzed the prediction trends for individual gas species in detail. Nitrogen's elastic MTCS has a characteristic peak between 2 and 2.5 eV which is not present in any of the other gas species in the training data, and thus both ANN and CNN fail to predict this peak. It arises from a quantum mechanical effect specific to N2 in this energy range, and it may be difficult to teach the network about it. DenseNet, on the other hand, does notably better in predicting the presence of this peak, yet its estimate of the energy at which it occurs is off by ∼0.5 eV. Likewise, argon has a Ramsauer–Townsend minimum whose value is significantly lower than in all the other gas species in the training data (another quantum mechanical effect). Still, DenseNet is able to predict the presence of the Ramsauer–Townsend minimum at the correct energy value, erring only slightly in its magnitude, whereas both CNN and ANN fail even to detect the presence of this minimum. Similar trends are observed in the prediction of the elastic MTCS of all the other gas species, wherein DenseNet determines the characteristic local maxima/minima and their locations with remarkably higher accuracy than the ANN or CNN architectures. The size of the convolution kernel essentially determines the receptive field of the network.
Our use of convolution kernels of varying sizes allowed the DenseNet architecture to have small as well as large receptive fields, giving the network the capability to train on both short- and long-range correlations, which can be equally important in making predictions about the cross sections. We believe this allowed the DenseNet architecture to better capture the trends in the swarm data, which in turn led to its enhanced performance. Also, layers in the DenseNet architecture receive additional supervision from the loss function through shorter connections, alleviating the vanishing gradient problem and improving the flow of information and gradients throughout the network. This deep supervision provided by the DenseNet could also be one of the reasons for the improved accuracy of the predicted cross sections.
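The additive growth of the receptive field with stacked stride-1 convolutions can be checked with a short helper (the kernel sizes in the example are illustrative, not the exact values used in the network):

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of stacked 1D convolutions:
    rf = 1 + sum over layers of (k_i - 1) * (product of earlier strides)."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf
```

With stride-1 layers, a stack of decreasing kernels such as (11, 9, 7, 5, 3) yields a receptive field of 31 input points, while in DenseNet the concatenated feature maps additionally retain the small receptive fields of the earlier layers.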

Table 1. Performance metrics of all architectures implemented in this study.

Species | Cross section | ANN(a): MAE / R2 / MARPD | CNN: MAE / R2 / MARPD | DenseNet: MAE / R2 / MARPD
N2 | Elastic | 0.0285 / 0.578 / 7.17% | 0.0224 / 0.615 / 5.65% | 0.0186 / 0.637 / 4.67%
N2 | Ionization | 0.0164 / 0.991 / 5.65% | 0.0083 / 0.991 / 3.04% | 0.0080 / 0.996 / 2.99%
N2 | Total | 0.0302 / 0.468 / 7.57% | 0.0239 / 0.504 / 5.99% | 0.0205 / 0.567 / 5.16%
Ar | Elastic | 0.0661 / 0.724 / 15.95% | 0.0584 / 0.662 / 14.23% | 0.0315 / 0.931 / 7.90%
Ar | Ionization | 0.0407 / 0.867 / 14.99% | 0.0165 / 0.968 / 5.52% | 0.0079 / 0.994 / 2.93%
Ar | Total | 0.0597 / 0.722 / 14.46% | 0.0551 / 0.659 / 13.46% | 0.0274 / 0.935 / 6.88%
He | Elastic | 0.0067 / 0.986 / 1.70% | 0.0048 / 0.997 / 1.21% | 0.0032 / 0.999 / 0.81%
He | Ionization | 0.0125 / 0.970 / 4.67% | 0.0081 / 0.989 / 2.62% | 0.0085 / 0.975 / 3.17%
He | Total | 0.0062 / 0.985 / 1.58% | 0.0043 / 0.997 / 1.09% | 0.0045 / 0.998 / 1.15%
F | Elastic | 0.0205 / 0.803 / 5.18% | 0.0143 / 0.931 / 3.62% | 0.0104 / 0.986 / 2.65%
F | Ionization | 8.8971 / −964 / 70.51% | 0.1307 / −10.7 / 35.65% | 0.0411 / 0.836 / 12.97%
F | Total | 0.0189 / 0.814 / 4.78% | 0.0148 / 0.929 / 3.74% | 0.0102 / 0.987 / 2.56%
CH4 | Elastic | 0.0293 / 0.872 / 7.39% | 0.0198 / 0.978 / 5.01% | 0.0165 / 0.980 / 4.17%
CH4 | Ionization | 0.0332 / 0.902 / 11.29% | 0.0139 / 0.995 / 4.87% | 0.0180 / 0.953 / 6.56%
CH4 | Total | 0.0519 / 0.833 / 12.85% | 0.0276 / 0.978 / 6.95% | 0.0183 / 0.978 / 4.64%
O2 | Elastic | 0.0129 / 0.889 / 3.28% | 0.0104 / 0.948 / 2.61% | 0.0079 / 0.946 / 2.01%
O2 | Ionization | 0.0630 / 0.719 / 20.04% | 0.4706 / 0.991 / 7.46% | 0.0252 / 0.980 / 8.11%
O2 | Total | 0.0165 / −0.281 / 4.16% | 0.0137 / 0.619 / 3.47% | 0.0112 / 0.773 / 2.84%
SF6 | Elastic | 0.0178 / 0.973 / 4.50% | 0.0169 / 0.980 / 4.27% | 0.0125 / 0.986 / 3.17%
SF6 | Ionization | 0.0385 / 0.951 / 13.85% | 0.0173 / 0.967 / 6.05% | 0.0156 / 0.975 / 5.78%
SF6 | Total | 0.0154 / 0.980 / 3.90% | 0.0169 / 0.982 / 4.82% | 0.0122 / 0.987 / 3.08%

(a) ANN architecture adopted from [21].

For predicting the ionization cross sections, both DenseNet and CNN give comparably good results, each outperforming the ANN over the entire energy domain according to the performance metrics. Moreover, even though no prior information about the threshold energy of the ionization cross sections was provided to any of the networks, both CNN and DenseNet were able to predict the threshold energy of all gas species to an accuracy of one decimal place. For the specific case of fluorine, all three models struggle somewhat to determine the ionization cross section accurately compared with the other predictions. This is because the ionization cross section of fluorine is unusually low compared with those of the other gas species in the training data, and it can thus be considered an outlier.

As discussed earlier, it is important for the predicted cross sections to not only be accurate but must also be self-consistent. Swarm data provide a useful way to check the self-consistency of the cross sections. Thus, we reproduced the swarm data using the predicted cross sections and found them to be consistent with the swarm data that was calculated using the actual cross sections, as shown in table 2.

Table 2. Performance metrics of reproduced swarm coefficients by predictions of all architectures implemented in this study.

  ANN a CNNDenseNet
SpeciesSwarm coefficient R2 MARPD R2 MARPD R2 MARPD
N2 Mobility0.59216.61%0.9706.38%0.9156.41%
Diffusion0.86819.15%0.9555.71%0.9896.18%
Townsend ionization0.9986.47%0.9986.37%0.9995.05%
ArMobility0.87710.02%0.90110.99%0.9443.44%
Diffusion0.6738.54%0.7268.34%0.9014.77%
Townsend ionization0.96720.93%0.98919.41%0.99914.97%
HeMobility0.9990.78%0.9990.74%0.9990.72%
Diffusion0.9901.85%0.9981.35%0.9971.49%
Townsend ionization0.98814.48%0.98513.12%0.9962.51%
FMobility0.9066.54%0.9963.88%0.9996.043%
Diffusion−0.7517.96%0.35612.03%0.9877.78%
Townsend ionization0.09832.24%0.65826.67%0.98217.58%
CH4 Mobility0.71212.08%0.7168.01%0.9667.26%
Diffusion0.00121.83%0.6629.97%0.9337.41%
Townsend ionization0.98912.43%0.9996.25%0.9959.29%
O2 Mobility0.9078.38%0.9678.26%0.9844.39%
Diffusion0.8777.24%0.8629.58%0.9484.92%
Townsend ionization0.96712.88%0.99812.78%0.9953.13%
SF6 Mobility0.9235.47%0.9723.09%0.9901.71%
Diffusion0.9785.69%0.9903.79%0.9982.60%
Townsend ionization0.9966.02%0.9995.17%0.9980.98%

(a) ANN architecture adopted from [21].

Predictions of the excitation cross sections by all the architectures differ substantially from the actual cross sections. A possible reason is that the swarm data themselves carry less information about the excitation cross sections than about the elastic momentum transfer and ionization cross sections. This assumption is supported by a comparison of two sets of swarm parameters, as depicted in figure 12: the first is calculated from the predicted elastic momentum transfer, ionization and excitation cross sections, while the second is calculated using the actual elastic momentum transfer, ionization and excitation cross sections. Another point to note is that only the lowest-threshold process is used in the training, and for many cases this is far less than the sum of all the excitation cross sections. Although the predicted excitation cross sections differ substantially from the actual ones, this discrepancy is not replicated in the comparison of swarm parameters, whose metrics (table 2) are broadly consistent with those of the elastic momentum transfer and ionization predictions. Thus, we can attribute the lack of information about the excitation cross sections in the swarm coefficients as one possible reason for the inaccuracy of the predicted excitation cross sections. However, this requires a more detailed investigation in future.

3.1. Uncertainty quantification

Solutions obtained using deep learning methods have some inherent uncertainty, and quantifying it helps us assess the reliability of the predictions. Moreover, the mapping from swarm coefficients to cross sections is non-unique: multiple cross section sets map to the same swarm coefficients, and the probability distribution of the cross sections generated by uncertainty quantification (UQ) allows us to sample all these plausible solutions.

Bayesian neural networks (BNNs) [48] predict the complete probability distribution of the output variable and hence are well suited to determining model uncertainty. Yet BNNs are not frequently used due to their high computational cost. Monte Carlo dropout [24] is therefore generally used as an approximate Bayesian inference, and we apply it to quantify the uncertainty in our inverse solution. It is implemented by first replicating the DenseNet architecture outlined previously and then introducing a dropout of 20% in the dense layers; these neurons are disabled randomly during both the training and the testing phases. Therefore, every time an input value is passed to the model, a different value is predicted, sampled from some probability distribution. We deduce this distribution by sampling a total of 10^4 estimated cross sections, and the results are shown in figures 13–15, which depict the confidence intervals in which the cross section value might lie. For all gas species except helium, we observe that the model has higher uncertainty in determining the elastic MTCS at low energies (0.1–0.8 eV) than at high energies. Conversely, the model has higher uncertainty in predicting the ionization cross section at higher energies (>4000 eV). Additionally, we find that the model is highly certain about the predicted ionization threshold energy but less certain in determining the peak value of the ionization cross section, even though it gives nearly accurate results for both quantities. Further, the uncertainty in predicting the excitation cross sections is greater than for both the elastic momentum transfer and ionization cross sections; as suggested earlier, the lack of information about the excitation cross sections in the swarm data could be one possible reason for this higher uncertainty.
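Monte Carlo dropout amounts to keeping dropout active at inference time and aggregating repeated stochastic forward passes. A sketch on a toy Keras regressor (illustrative layer sizes; the actual model replicates the DenseNet architecture with 20% dropout in the dense layers):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy regressor with dropout in the dense layers.
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(10,)),
    layers.Dropout(0.2),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1),
])

def mc_dropout_predict(model, x, n_samples=100):
    """Repeated stochastic passes (training=True keeps dropout on);
    the sample statistics approximate the predictive distribution."""
    samples = np.stack(
        [model(x, training=True).numpy() for _ in range(n_samples)]
    )
    mean = samples.mean(axis=0)
    ci = np.percentile(samples, [2.5, 97.5], axis=0)  # 95% interval
    return mean, ci
```

In the paper's setting, each input swarm-data vector is passed through the dropout-enabled network many times (10^4 samples here) and the spread of the outputs gives the plotted confidence bands.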

Figure 13.

Figure 13. Uncertainty in prediction of elastic cross sections of various gas species.

Figure 14.

Figure 14. Uncertainty in prediction of ionization cross sections of various gas species.

Figure 15.

Figure 15. Uncertainty in prediction of excitation cross sections of various gas species. 'Actual' curves shown here correspond to only a part (only the lowest energy process) of real excitation.


4. Conclusion

We have presented a data-driven approach to obtaining cross sections from the corresponding swarm data using different deep learning models trained on synthetic data generated from cross sections available on LXCat. We have demonstrated the feasibility and robustness of this deep learning based approach by testing the trained networks on predicting the elastic momentum transfer, ionization and excitation cross sections of various gas species with diverse physical and chemical properties, and we found the predicted elastic momentum transfer and ionization cross sections to be consistent with the reference cross sections. Also, the swarm coefficients calculated using the predicted cross sections agree reasonably well with those calculated using the cross section sets for each species from LXCat (considering only the lowest-energy excitation process). We have quantitatively analyzed the performance of three different neural network architectures (ANN, CNN and DenseNet) in solving the inverse swarm problem and found that DenseNet, owing to its ability to effectively extract both long- and short-term trends from the swarm data, significantly outperformed the ANN used in previous works, as indicated by the ensemble of metrics used to assess accuracy. In summary, we have tested our models on a wide range of gas species, used more performance metrics for statistical analysis and determined cross sections over a greater energy range than previous ANN-based works. Finally, the UQ of the model provides a good estimate of the probability distribution of the cross sections, from which all the physically plausible solutions of this inverse swarm problem can be sampled. Based on our results, we can conclude that CNN based models, particularly DenseNet, are better suited than ANN models to the accurate determination of cross sections from swarm data.
Interestingly, unlike the ANN, DenseNet could also predict the characteristic peaks present in specific energy ranges for some gas species such as nitrogen and argon; these peaks are due to quantum mechanical effects, and such analysis otherwise requires domain expertise. These significant improvements in prediction accuracy and pattern recognition with DenseNet should give the LTP community the confidence to adopt such data-driven approaches. However, additional work is needed before actual (experimental) swarm measurements can be used as input to such models: many real gas species have multiple excitation cross sections, all of which affect the corresponding swarm coefficients, whereas our proposed model is trained on swarm data computed using only a single excitation cross section. Future work should address this issue.

The performance of deep learning models depends strongly on the training data fed to them. In this work, we generated synthetic training data by interpolating actual cross sections that had been categorized based on the characteristics of the elastic MTCS. This approach provides the model with a large amount of data to train on but clearly limits new trends in the synthetic data. Thus, we believe the performance of these neural networks would improve further with a more sophisticated synthetic data generation scheme that can provide artificial cross sections which are physically plausible yet have unique trends of their own. One possible approach is to use generative adversarial networks (GANs), an ML framework that extracts complex features from a dataset and, based on them, generates completely new data from random-noise input. Work on improving the quality of the synthetic data with GANs is currently underway. Also, as discussed earlier, swarm data provide a useful way to assess the self-consistency of a cross section set, and we think the networks' performance could be improved further by a custom loss function that also accounts for the errors in the swarm data calculated from the predicted cross sections. This feedback would force the networks to maintain the self-consistency of the predictions while improving their accuracy, and future work could address this potential improvement.

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: www.lxcat.net.

Acknowledgments

The authors would like to thank Dr Leanne Pitchford, emeritus senior research scientist at LAPLACE Laboratory, CNRS, Toulouse, France, for discussions about the inverse problem in the context of swarm parameters, and for her valuable comments after careful reading of this manuscript. The authors would like to gratefully acknowledge DA-IICT, India for providing the computational facilities and kind support to carry out this research work.
