Towards a Machine-Learned Poisson Solver for Low-Temperature Plasma Simulations in Complex Geometries

Poisson's equation plays an important role in modeling many physical systems. In electrostatic self-consistent low-temperature plasma (LTP) simulations, Poisson's equation is solved at each simulation time step, which can amount to a significant computational cost for the entire simulation. In this paper, we describe the development of a generic machine-learned Poisson solver specifically designed for the requirements of LTP simulations in complex 2D reactor geometries on structured Cartesian grids. Here, the reactor geometries can consist of inner electrodes and dielectric materials as often found in LTP simulations. The approach leverages a hybrid CNN-transformer network architecture in combination with a weighted multiterm loss function. We train the network using highly-randomized synthetic data to ensure the generalizability of the learned solver to unseen reactor geometries. The results demonstrate that the learned solver is able to produce quantitatively and qualitatively accurate solutions. Furthermore, it generalizes well on new reactor geometries such as reference geometries found in the literature. To increase the numerical accuracy of the solutions required in LTP simulations, we employ a conventional iterative solver to refine the raw predictions, especially to recover the high-frequency features not resolved by the initial prediction. With this, the proposed learned Poisson solver provides the required accuracy and is potentially faster than a pure GPU-based conventional iterative solver. This opens up new possibilities for developing a generic and high-performing learned Poisson solver for LTP systems in complex geometries.


Introduction
Poisson's equation is an elliptic partial differential equation (PDE) of great relevance.It is used in the theoretical description of many physical systems, such as hydrodynamics, Newtonian gravity, quantum mechanics, and electrostatics/magnetostatics.For example, Poisson's equation describes a gravitational field generated by a massive object in a Newtonian system [1].In a hydrodynamic system of incompressible flows, one solves the equation to obtain the scalar pressure field [2].In this work, we are particularly interested in the use of Poisson's equation for the modeling of low-temperature plasma (LTP) systems.Low-temperature plasmas are one example where the electrostatic approximation is frequently applied [3].Therein the electric potential across a domain arises from electric charges within the domain and the domain boundaries.Moreover, they present physical systems, where the dynamics of the electromagnetic fields and the charged species are strongly coupled.The numerical simulation of such coupled systems requires a solution of Poisson's equation in every simulation time or iteration step.The obtained potential distribution at the current time step is used to calculate the electric field, which directly influences the dynamics of the charged species, thus the outcome of the plasma simulation.Notice that although the subsequent discussion is centered around the electrostatic approximation, it can be easily transferred to the magnetostatic approximation or to electrodynamics using the Helmholtz equation [4].
The repeated calculations for solving Poisson's equation over numerous simulation time steps amount to a significant computational cost of the entire simulation.This is especially true in multi-dimensional simulation settings, where the time for solving the equation represents a large part of the overall time consumption.Many numerical methods have been developed for solving elliptic boundary value problems (BVPs) such as the Poisson equation.These include the multigrid methods, which are currently among the most efficient iterative numerical solvers [5][6][7].Recently, a spectral element method based on the hierarchical Poincaré-Steklov scheme has shown promising results in terms of fast convergence and computational efficiency [8][9][10].Nevertheless, the elliptic nature of Poisson's equation implies information propagation throughout the complete domain, limiting possible performance optimization techniques of the applied numerical methods [10].
In the field of machine learning, artificial neural networks (ANNs) have been continuously improved in solving real-world tasks (e.g., image classifications, object recognitions, machine translations, etc.) since its breakthroughs in the early and mid-2010s, particularly through the use of modern ANN architectures such as convolutional neural networks (CNNs) [11], and the invention of the transformer networks [12].These breakthroughs were facilitated by advances in graphics processing units (GPUs) that have made training deep networks on large datasets computationally feasible [11,13,14], in addition to the increasing availability of data required for the training.
The application of ANNs for solving PDEs dates back at least to the 1990s [15][16][17].Due to the aforementioned recent successes of ANNs, coupled with the everadvancing computing hardware, the research interest in solving PDEs with ANNs has gained further momentum.Speedup in computation time and generalization of illposed problems remain major motivations.Among the influential works in this area are physics-informed neural networks (PINNs) [18][19][20] and neural operators (NOs) [21][22][23][24][25].In [26], Shi et al. have proposed a data-free paradigm for NOs, and Wang et al. have combined ideas from PINNs and NOs for a more data-efficient operator learning in [27].
It is worth noting that solving PDEs using ANNs has sound theoretical foundations, e.g., Hornik et al. [28,29] have shown that ANNs with multiple layers can approximate any continuous functions, and recently Zhou et al. [30] have extended the earlier work to CNNs.In addition to the direct solution of PDEs using ANNs, neural solvers with convergence guarantees have been proposed for fast iterative solutions [31].
On a more specific note, several recent works have applied CNN-based methods for solving specifically the Poisson equation in spatially two-dimensional (2D) domains for fluid dynamic and electrostatic cases [32][33][34].These approaches aim to solve Poisson's equation subject to different source term values, boundary conditions, and domain sizes, without having to retrain the network.
In this paper, we propose a machine learning approach to approximate the solution of Poisson's equation in complex 2D geometries.These geometries are particularly relevant to the simulation of LTPs.The approach employs a hybrid CNN-transformer network architecture combined with a weighted multiterm loss function and a highlyrandomized synthetic data generation scheme.The paper is structured as follows: in section 2, we first briefly review recent works on solving Poisson's equation using ANNs, then we specify the problem definition and the scope of the present work.In section 3, we describe the dataset preparation, objective function formulation, network architecture, and the details of training experiments.Subsequently, we present and discuss the results in section 4, which include the training success, model evaluations, and accuracy of the calculated electric fields from the predicted potential profiles as well as the wall time performance of the learned solvers.Finally, we conclude the work in section 5 and provide the outlook for future developments.

Problem definition and scope
Poisson's equation is often written in the following form: where f (r) is a known source term, r is the position vector, and ϕ(r) is the unknown variable to be solved.Boundary conditions such as Dirichlet, Neumann, and mixed types must be applied to solve the equation.Fundamentally, solving Poisson's equation with machine learning-based methods falls into the category of regression problems due to the continuous nature of the output.Typically for 2D cases, the prediction is done at once for all spatial points in the domain, and the input and output data are co-located and described in the same spatial configuration.To set the scope, we briefly review two recent developments in machine learning-based Poisson solvers for 2D cases using CNNs [32,34].Both referenced works have set up the Poisson problem in this fashion.Özbay et al. [32] have proposed a CNN-based model trained using a novel L p -norm based loss function to solve the Poisson equation in fluid simulations of incompressible flows.The presented model was trained in a supervised learning scheme on a randomly generated dataset defined on Cartesian grids.The input data for the model consist of the source term, the domain size, and the boundary conditions (Neumann or Dirichlet boundary conditions).This expressive input data allows for a wide range of problem configurations that the trained model can solve.Additionally, they have demonstrated that a significant increase in accuracy can be achieved by using the predicted solution as an initial guess for a single iteration of a conventional iterative solver.The presented model, however, is not applicable to use cases with mixed boundary conditions and the presence of objects within the domain.
For electrostatic systems, Cheng et al. [34] have investigated two well-established CNN architectures, and have studied the network hyperparameters well suited for solving Poisson's equation in fluid simulations of low-temperature plasmas.As a first step, the architectures have been trained to solve equation ( 1) with homogeneous Dirichlet boundary conditions and a given source term as input, in a supervised learning method similar to [32].The best-performing architecture, in this case the U-Net architecture [35], has then been trained on simulation-specific datasets to solve the equation in real settings such as plasma oscillation and double-headed streamer simulations.The trained model has been successfully evaluated on the corresponding simulation cases.Since the model uses only the source term as an input parameter, while boundary conditions and domain size are implied in the training data, this limits the range of problems (e.g., different reactor geometries) that the model can solve without retraining.
The learned Poisson solvers discussed above have so far dealt only with Poisson problems in simple geometries and in the absence of objects inside the computational domain.Furthermore, the boundary conditions are defined only on the outer boundaries of the domain and with limited configurations.This limits the usability of the learned solvers for application in LTP simulations without retraining the networks.For example in the simulation of a dielectric barrier discharge (DBD) or packed-bed reactors, one may incorporate inner electrodes and geometrically complex dielectric materials with varying coefficient values within the computational domain.Furthermore, mixed Neumann-Dirichlet boundary conditions are often encountered [36][37][38].In addition, the electrodes may be dynamically driven, meaning that the boundary conditions (value or gradient) change over time [36][37][38][39].
In this work, we aim to investigate the feasibility of developing a 2D machine-learned Poisson solver that is generic enough for most use cases in LTP simulations.Generic here means that the learned solver must be able to handle arbitrary charge density distributions, dielectric material distributions, domain sizes, boundary values, as well as different boundary conditions (Dirichlet, Neumann, and mixed) while avoiding the expensive cost of retraining.At this stage, we still restrict the problems to the use cases in LTP simulations with structured Cartesian grids and fixed numbers of computational nodes to make the task tractable.To account for the above-mentioned requirements, we consider Poisson's equation in 2D Cartesian geometries in the following form: where ε r denotes the spatial distribution of the dielectric constant, ρ is the charge density, ε 0 is the vacuum permittivity, ρ/ε 0 = f is the source term, h is the Dirichlet boundary condition value defined on Γ Dirichlet ⊂ ∂Ω, g is the Neumann boundary condition value defined on Γ Neumann ⊂ ∂Ω, and n is the unit normal to the boundary ∂Ω (both inner and outer boundaries).Lastly, Ω is the solution domain.The solution domain and the boundaries are defined on a rectangular domain with width w and length l.

Methods
In connection with computer vision, discretized 2D Poisson problems defined in structured Cartesian grids can be thought of as pixel-wise regression problems, where input and output data are arranged in the spatial domain equivalent to images (or tensors).Consequently, each pixel in an image can be viewed as a computational node within the domain.Monocular depth estimation tasks are a good example, where a machine learning model must predict the depth value of each pixel in the output image from its input image.Like in [32,34], we tackle this regression problem in a supervised learning manner, i.e., learning from input-output data pairs.

Datasets
One of the expected features of the learned solver presented in this work is that it should generalize well on unseen data, i.e., data excluded from the training process.For the given task, this generalization capability is (among other aspects) indicated by how well the learned solver can handle reactor geometries of arbitrary configurations.It is well known that ANNs thrive on a large and high-quality dataset.However, generating such a dataset from actual numerical simulations using actual reactor geometries is a computationally expensive process and limits the diversity of the data.Therefore, we choose to use a random process to generate highly unique data as done similarly in [23,32,34].In this way, we can generate training data that are distinct from each other in terms of physical quantities and spatial features (shapes of objects within the domain) or geometries.This diverse training data encourage the network to learn the appropriate mapping between the input and the output spaces and prevents the network from overmemorization (overfitting).Thus, improving the overall generalization capabilities of the learned solver.Let X ∈ R H×W ×C be an input data sample and Y ∈ R H×W be the corresponding output data sample, where H is the number of pixels in the y direction (height), W is the number of pixels in the x direction (width), and C denotes the number of channels of the image.For example, a color image is often described in the RGB color space (C = 3), though other types of color space also exist such as CMYK (C = 4).The output data Y corresponds to the electrostatic potential which is a scalar field equivalent to a monochromatic image, e.g., a grayscale image.For solving equation (2), the tensor X is composed of the different scalar input fields required to solve the equation such as ρ/ε 0 , ε r , g, h, ∂Ω and Ω (see equations ( 2)-( 4), detailed in section 3.1.3).
3.1.1.Feature scaling For real applications, the physical quantities of the mentioned parameters are of vastly different orders of magnitude.This may have severe effects on the learning success.Because ideally, every input parameter has to contribute equally in the gradient calculations during training [40,41].Therefore, we use the following feature scaling scheme.For the cases of LTP simulations, we can recast equation (2) into a dimensionless problem where ν is a scaling coefficient, while quantities denoted by * and * are the scaled quantity and the scaling factor of the corresponding variable, respectively.The ∇-operator is scaled using a typical length scale ȓ = (xy) 1/2 as scaling factor.The scaling factors may be obtained from representative LTP discharge parameters, with the goal to obtain quantities of the order of unity.The scaling coefficient ν may be estimated correspondingly.
Consider the example of a two-dimensional low-pressure capacitively coupled radiofrequency (CCRF) discharge with an approximate maximum charge density in the sheath regions of ρ ≈ 5 • 10 −5 C/m 3 , a powered electrode voltage of φ ≈ 250 V, and a plasma reactor of size x × y = w × l = 25 × 50 mm 2 .For the sheath region, it may be estimated that ν ≈ 56.In contrast, the quasi-neutral LTP bulk typically extends over a   [39], packed-bed DBD [37], and surface DBD [38] reactor geometries, which are used to generate , respectively.BCs stands for boundary conditions.large part of the domain.To also accommodate for ρ ≪ 5 • 10 −5 C/m 3 in this region, a global scaling factor of ν = 1 is chosen.While for the sheath regions, this corresponds to typical values for the scaled potential of φ ≈ 56, typical values of φ ≲ 0.1 are expected in the bulk.Note that the remaining scaled quantities * are within the range [−1, 1], which we will use for generating the training data.
We can now define all input parameters h, g, ρ, w, and l without physical dimensions and within the same range (except for ε r which is always dimensionless).Here, we define the lower and upper bounds of the input parameters, h, g, ρ ∈ [−1, 1], and w, l ∈ [0, 1].In the case of ε r , we choose ε r ∈ [2,30].This is because, firstly, the dielectric materials must have ε r > 1, since ε r = 1 is the dielectric constant for free space.Secondly, we consider the practical use cases in LTP, which rarely use dielectric materials that have ε r > 30.Note that the described scaling specification is chosen for the sake of simplicity and may not cover every use case in LTP simulations.However, for practical and specific use cases, the specification can be adjusted accordingly with ease, i.e., by adjusting ν and the upper and lower bounds of the input parameters.

Data generation and normalization
Objects inside the computational domain are made up of ε r or h or a combination thereof.For these two parameters, we generate instances that have random shapes and values within the defined range.Additionally, h can also be defined at the outer domain boundary, and in this case, we randomly define the nodes (equivalent to pixels in an image) at the outer boundary with values of h.This also applies to g, which is exclusively defined at the outer boundary.In the case of ρ, unfortunately, it is not a straightforward task to procedurally generate fake charge density distributions that are physically plausible.To this end, we use 2D random noise generation methods (such as Perlin and Simplex noises) [42], considering that it may capture the features of real physical distributions.Lastly, we randomly define the spatial dimension w and l.We generate the input parameters as images of the same size, X εr , X ρ , X h , X g , X w , X l ∈ R U ×V , where U × V is the size of the images.Here, relevant information in X εr , X ρ , X h , and X g do not overlap with each other.For example, we consider that there are no electric charges within dielectric materials and that dielectric and electrode materials are clearly distinct.Therefore, we set the values of all non-relevant pixels in the images to zero.As for X w and X l , we discretize Ω ∪ ∂ Ω = [0, w] × [0, l] such that x mn w = m∆x w and x mn l = n∆x l , where ∆x w = w/(V − 1), ∆x l = l/(U − 1), x mn w is the pixel value at an (m, n) coordinate point in X w , and similarly for x mn l .Furthermore, each element in X εr , X ρ , X h , X g , X w , and X l is co-located with each other meaning that every element with the same coordinates directly contributes to the prediction outcome that shares the same coordinates.Using the generated input data, we can create the corresponding solution data φ by solving it using a conventional PDE solution method.In this case, we use a second-order finite difference method (FDM) solver implemented in the NumPy (version 1.24) and SciPy (version 1.9.3) libraries [43][44][45][46].Similarly, the solution data is arranged as an image Y ∈ R U ×V .
To further ease the learning process, we apply the min-max normalization to all input data effectively scaling their values to a uniform range of [0, 1].As for the solution data, the min-max normalization is done by using the minimum and maximum values from the corresponding h, since φ is unbounded.In practice, we found that almost all normalized solution data are within [0, 1].Thus, we obtain the normalized data (i.e., images) This final normalization reduces the size of the search space to mostly positive values further increasing learning efficiency.

Input data configuration
We can exploit the co-location and non-overlapping properties of the input parameters to configure the input data efficiently.First, we apply edge-padding operations with pad size p to X ′ h , X ′ g , X ′ ρ , and X ′ εr .This is mainly done to increase the contributions of g and h defined at the outer boundary.Because in the images, this is equivalent to lines at the edges with a thickness of one pixel, which is relatively small.For X ′ w and X ′ l , we apply zero-padding operations with the same p.As mentioned previously, the relevant information in X ′ h , X ′ g , X ′ ρ , and X ′ εr do not overlap with each other.This allows us to arrange them together into one image without losing any important information, and by doing so, we obtain the value image X value ∈ [0, 1] (U +2p)×(V +2p) .Up to this point, it is unlikely for the network to be able to identify the different types of parameters in X value .To this end, we construct a type image , where each type of input parameter is assigned a distinct value in [0,1].And in this case, we choose type ρ = 0.25, type εr = 0.5, type g = 0.75 and type h = 1.0.Finally, we can stack X value , X type , and the padded X ′ l and X ′ w together into one final input data X ∈ [0, 1] (U +2p)×(V +2p)×4 .In fact, U + 2p = H and V + 2p = W is the final size of the input data, thus, X ∈ [0, 1] H×W ×C with C = 4. Figure 1 depicts the general overview of the input data configuration.Figure 1(a) shows an example of input data visualized in the CMYK color space, and figure 1(b) shows the different components of the input data and how it can be stacked together to create the final input data.Lastly, to keep the uniformity throughout the whole process, we apply zero-padding with the same p to Y ′ , which results in the final output data Y ∈∼ [0, 1] H×W .An example of an output image is shown in figure 1(c).Additionally, we prepare six datasets from three reference reactor geometries loosely based on [37][38][39], D for further evaluation of the learned solvers.Each of these sets has N = 10 3 .Note that the reference geometry datasets are not obtained from real numerical simulations.Random processes are still used to generate the charge density and most of the input parameter values following the same scheme as the random datasets.Figure 2 presents the known reactor geometries and their configurations.

Objective function formulation
Now, we assume that there is a function G that can approximate the mapping between X and Y to a certain degree of accuracy.The function G is to be learned using an ANN.G learns from a dataset containing N observations of input-output pairs {X i , Y i } N i=1 obtained from solving X using a conventional method such as FDM.We therefore can formulate the objective function of the training procedure as an optimization problem that minimizes a loss function L where θ denotes trainable parameters of the neural network G θ (a function G parameterized by θ).For a finite number of N observations of training data, equation ( 7) can be written as where Y i and X i are the i-th true output and i-th input data, respectively.Finally, G θ (X i ) := Ŷi denotes the i-th predicted output.
For successful learning, the loss function must be designed appropriately.The most ubiquitous loss function for regression problems is the L 1 -norm loss, which directly measures the distance between the true and predicted values at each pixel from the absolute differences of their coordinates, defined by One property that the predicted output (solution to the Poisson equation) must have is smoothness, both in terms of mathematical and perceived smoothness.Inspired by several works in monocular depth estimation tasks, we consider a single-scale structural similarity index measure (SSIM) [47] loss combined with the disparity smoothness loss [48,49] Contributions of equations ( 9) and ( 10) have been shown to enforce the network to produce predictions that are perceptually similar to the true output and with correct smoothness [48,49].To the best of our knowledge, the use of L SSIM and L smooth have not yet been reported for PDE-related problems.Referring to the parameters of SSIM described in Appendix A, we set the Gaussian window size to 3 × 3 and the dynamic range value λ to 1.0 for equation (9).Note that λ is defined as the difference between the maximum and the minimum of possible output values.In this work, we found that the output values reside mostly within [0, 1] (see section 3.1), which is why we opt for using λ = 1.0.Furthermore, we consider a contextual physics loss, which in this case is defined by the relative L 2 -norm loss between the gradients of the true and predicted potentials It measures the accuracy of the predicted electric field from the predicted potential Ŷ, which is indeed the ultimate goal for the field calculation step in self-consistent electrostatic LTP simulations.It has been shown in [24,50] that a relative loss term such as equation (11) has a positive regularization effect on the training.Finally, we can write the total loss function as a sum of these loss terms where α, β, γ, and δ are the weights or hyperparameters.This weighted multiterm loss function is designed to enforce the network to produce both numerically accurate and smooth predictions.We found that setting α, β, γ, and δ to 0.8, 1.0, 0.9, and 0.8, respectively, gives satisfactory results.

Network architecture
CNNs have been used predominantly for dealing with data that can be represented as images or tensors mainly for computer vision tasks.Recently, transformer networks, originally designed for machine-translation tasks by Vaswani et al. [12], have been widely adopted in computer vision prompted by the work of Dosovitskiy et al. [51], and rapidly replacing state-of-the-art CNN-based models in many computer vision tasks [52][53][54][55].In this work, we employ a slightly modified version of a hybrid CNN-transformer architecture proposed by Chen et al. [56] dubbed TransUNet, which was initially designed for medical image segmentation tasks.
TransUNet follows the u-shaped design with skip connections introduced by Ronneberger et al. in [35].The design consists of encoder and decoder parts, where feature information from encoding stages gets relayed directly to the corresponding decoding stages via skip connections.In TransUNet, the transformer layers are exclusively used in the encoder part in conjunction with the typical convolution and pooling layers, while the decoder part follows the usual CNN-based decoder design.The hybridization of transformers and CNNs supported by the skip connections leads to the ability of TransUNet to encode strong global and low-level features [56].This is one of the primary reasons why we presume such an architecture design is well suited for the given problem, considering its elliptic nature.In the following, we describe in more detail the modified TransUNet architecture used in this work, which we call TransDenseUNet (TDN).The high-level overview of the architecture is depicted in figure 3.

Encoder
The input data of size H × W × C is first downsampled through several convolution and pooling operations.Unlike in the original implementation [56], we employ a block of densely connected convolutional layers following the DenseNet architecture [57] to downsample the input data in place of regular CNN layers.The DenseNet approach offers several advantages such as enabling training deeper layers with a reduced number of parameters and strengthening feature propagation, which enables efficient and powerful representation learning [57].In total, we employ 38 convolutional layers that are densely connected, thus, we call this first encoder part the DenseNet-38 block.Intermediate outputs are employed as skip connections to the decoder network.
The final output of this block is a feature map of size H 8 × W 8 × 512.A high-level structure of DenseNet-38 is presented in Figure 4.For further details regarding the DenseNet architecture, we refer the reader to [57].
From here on, the encoding process is carried out by the transformers.A standard transformer layer requires a 1D input sequence, therefore, the outputted 2D features must be transformed into a sequence of flattened patches x p ∈ R S×512 , where S = HW 8 2 P 2 is the resulting number of patches, and P ×P is the dimension of each patch.Here, we opt for P = 1 resulting in S = HW 8 2 , and at this stage, we obtain the tokenized feature maps (or patches) x p .Following [51,56], we evaluate patch and position embeddings using a trainable linear projection, which transforms x p into an embedded sequence z 0 ∈ R S×D , where D is the embedding dimension.Finally, the embedded sequence z 0 is the input to the transformer block, which contains L transformer layers.Each transformer layer consists of a multihead self-attention (MSA) block with N MSA self-attention heads, a multilayer perceptron (MLP) block, two-layer normalization operations, two skip connections, and two vector addition operations, structured as shown in Figure 5.The MLP block has two layers of ANNs, where the first layer has N neuron neurons and the last layer has the same number of neurons as the embedding dimension D. Finally, from the last transformer layer, we obtain the final latent representation z L ∈ R S×D , which is ready to be reconstructed into a prediction by the decoder.For this task, D = 256, L = 8, N MSA = 4, and N neuron = 1024 provide a good balance between prediction accuracy and computational cost (as well as memory requirement).We refer interested readers to [12,51] for more in-depth descriptions of the transformer architecture.The self-attention mechanism is often regarded as the key factor in the transformer architecture.As described in [12], MSA enables the network to pay attention to different pieces of information in the input sequence and utilize them efficiently.In natural language processing (NLP), MSA may consider different parts of the input text (a text corpus, a page in a book, etc.) and learn the global context.In [24], Shuhao Cao has presented a modified version of the self-attention mechanism and successfully applied it in an operator learning task to solve an elliptic BVP such as the Darcy flow problem.

Skip connections
Skip connections play an important role since a significant number of the training data contains objects inside the domain.For such data the input and the reciprocal output share similar spatial features like shapes of the inner electrodes and dielectric materials, which can occupy a considerable space in the domain.The use of skip connections ensures the preservation of these spatial features from high to low spatial resolutions, connecting different stages of the downsampling process to the corresponding upsampling process (in each of the expansion stages) [35,56].
For the presented architecture, the downsampling process is primarily carried out by the DenseNet-38 block [57].From this process, we extract two feature maps of sizes (referred to as skip connections in Figure 4) and concatenate them to the corresponding upsampled feature maps that have the same spatial sizes.Additionally, we apply (two times) a one-stride 3 × 3 convolution with 32 filters to the input data, followed by a ReLU activation [58,59] and a batch normalization layer [60].This results in a feature map of size H × W × 32, which we use for the concatenation of features that have the original spatial size H × W .Note that this additional process is not connected to the down-/upsampling process, and is solely done for superposition of high-resolution features.In total, this architecture has two levels of skip connections and concatenation processes, plus one direct connection and concatenation process.

Decoder
Ideally, the latent representation z L obtained from the encoder contains all relevant information about the prediction.It is the task of the decoder to reconstruct z L ∈ R S×D into the prediction Ŷ ∈ R H×W .The decoder consists of several expansion stages.At the first step, we reshape z L into a 2D feature map of size H 8 × W 8 × D, similar to the original implementation [56].After that, we run the feature map through three expansion stages, where the k-th expansion stage results in a feature map of size For each expansion stage, we apply a set of operations successively very similar to the implementation in [35].Here, we opt for using a 2 × 2 bilinear-upsampling operation followed by a one-stride 3 × 3 convolution operation and a ReLU activation in the expansion operation (green-blue arrows in figure 3).The concatenation of features takes place directly after the expansion operation, which is then followed by two times of a one-stride 3 × 3 convolution operation and a ReLU activation.Finally at the end of the last expansion stage, we apply a one-stride 1 × 1 convolution operation followed by a linear activation.This operation concludes the decoding process, which transforms a feature map of size H × W × 64 to the reconstructed output prediction Ŷ ∈ R H×W .
272 = D 272 , and use D 528 as it is.In total, we obtain four sub-and datasets for training and validation.We split each set into a training set (80% of the total data) and a validation set (20% of the total data).Four models are trained each using one of the prepared datasets, and then evaluated on the prepared evaluation sets (as described in section 3.1) according to its data resolution.The models and the corresponding datasets as well as the number of trainable parameters are summarized in table 1. TDN528 30k has slightly more trainable parameters than the rest of the models due to the increase in resolution.
For all training experiments, we choose the Adam optimizer [61] and the cosine decay learning rate schedule [62] with a warm-up.For this learning rate schedule, we set the hyperparameters as follows: start learning rate lr start = 0.0, maximum learning rate lr max = 4•10 −4 , epoch = 200, batch size = 8, hold = 10, and warm-up steps = 10•s, where s is the number of steps per epoch.The learning rate schedule is shown in figure 6.During training, we apply a basic (image-)data augmentation process consisting of random flipping and rotation operations.Finally, we use the relative L 2 -norm error ε L 2 as an evaluation metric during and after the training, given by where Y ′ and Ŷ ′ are normalized true and predicted output data, respectively.We implement the TransDenseUNet architecture and all training experiments using the TensorFlow library (version 2.11) [63,64] supported by the NumPy library (version 1.24) [43,45] in the Python programming language (version 3.10.6)[65].All models are trained on a single Nvidia Tesla P100 16GB GPU, except for TDN528 30k which is trained on a single Nvidia A100 80GB GPU.Although the training data are prepared using double precision (FP64), the models are trained using single precision (FP32) for significantly faster training sessions and inferences.The training time required for each model can take from several days to a week, depending on the size of the model and the dataset.

Results and discussion
In this section, we validate the training success and evaluate the performance of the trained models in terms of quantitative and qualitative accuracy.Subsequently, we discuss the electric field accuracy calculated from the predicted potential.Furthermore, the need for refinement using a conventional iterative solver to address the spectral bias problem will also be discussed.Finally, we present and briefly discuss the wall time performance of the learned solvers in comparison with a conventional GPU-based iterative solver.

Training results
Training success can be generally assessed from the evolution of the loss terms on validation data during training.The validation total losses (equation ( 12)) and the relative errors (equation ( 13)) are shown in Figure 7.It decreases gradually over the training epochs and finally plateaus toward the end of training.There is no sign of overfitting observed in any model, i.e., no increase in validation loss at a later epoch.This indicates good learning convergence.We also observe that after around epoch 125, there are smaller fluctuations in the validation losses.This correlates with a small learning rate (1.5 • 10 −4 ) at this stage.From this observation, it follows that the use of gradually decreased learning rates implies a stable learning behavior.However, the choice of a small learning rate must always be taken with caution to avoid underfitting and a long training time.
Furthermore, Figure 7 shows the effects of the number of training data and resolution on the training.In terms of loss, TDN528 30k achieves the lowest loss as presented in Figure 7(a).However, this does not reflect directly on the relative error from equation (13).In Figure 7(b), we can see that TDN272 50k has the lowest relative error, whereas TDN528 30k and TDN272 30k share similar relative error and TDN272 10k is by far the worst in terms of loss and relative error.Relative error Figure 7. Evolution of (a) validation total loss (equation ( 12)) and (b) relative error (equation ( 13)) during training.
The loss and relative error discrepancy of TDN528 30k is mostly attributed to the contributions of individual loss terms as depicted in Figure 8.We can see from Figure 8(a) that L L 1 follows a very similar trend as observed in the relative error.This suggests L L 1 is the main contributor to the numerical accuracy of the models and that outliers do not contribute significantly.As mentioned in section 3.2, L SSIM and L smooth are mainly intended to enforce smooth and perceptually similar predictions.Figure 8(b) implies that the network slightly struggles to learn this property for higher resolutions.To verify this assumption, a hyperparameter tuning for L SSIM is of interest.Nevertheless, the presented SSIM losses still achieved quantitatively satisfying results (≪ 1, recall that we set λ = 1 in L SSIM ).For L smooth , we can see a large discrepancy between the 272 and 528 models as shown in Figure 8(c).Notice though that there is no significant contribution from this loss term following the first few epochs.We further assume that removing this loss term may only negligibly affect the overall performance.Lastly, we can see in Figure 8(d) that L E converges similarly for all models at the end of training.

Model evaluations
We evaluate the raw prediction accuracy of the trained models on validation and evaluation sets using the relative L 2 -norm error ε L 2 from equation (13).From the results shown in table 2, we can infer that the model trained on the largest dataset, in this case TDN272 50k , outperforms other models.To put things into prespective, TDN272 50k achieves ε L 2 = 8.7•10 −3 on its validation set, which is better than the performance of the NO model (ε L 2 = 1.09 • 10 −2 ) in [23] for a similar yet simpler task, i.e., the Darcy flow problem with homogeneous boundary conditions and a constant source term.It is also worth noting that TDN272 50k performs competitively with the latest state-of-the-art NO model in [24] (ε L 2 = 8.44 • 10 −3 on the Darcy flow problem).
Furthermore, we can see that TDN272 30k and TDN528 30k , both trained with the same number of training data, perform similarly on their corresponding validation set Table 2. Averaged raw prediction accuracy of the trained models on the validation and evaluation sets.The scores are the prediction relative error ε L 2 from equation ( 13) with respect to the corresponding ground truth.Here, * means data resolution (which corresponds to the model resolution).The best score from each set is presented in bold.

Model
Validation set D .Slight discrepancies are observed on the reference geometry datasets, whereas the TDN528 30k model performs slightly better than TDN272 30k on average.Figure 9 shows an example of raw prediction from each model for the same random reactor geometry data sampled from D (test) * . We can see from the middle panels of the figure that all models produce qualitatively similar predictions to the ground truth (left most top panel).A closer inspection reveals that TDN272 50k produces a better prediction, which can be verified by the accuracy of the contour lines.This is also well reflected in the relative error, which is the smallest (ε L 2 = 8.98 • 10 −3 ).The bottom panels of Figure 9 show the error maps of the predictions in percentage (the values are truncated at 100).We can see that most of the errors accumulate at the zero crossing, i.e., the value transition from positive to negative (or vice versa).This is mainly due to the division by close to zero (i.e., very small numbers) and the models' inability to predict values with great precision, which is a trade-off of using the FP32 format (i.e., finite precision and round-off errors).

Accuracy of the predicted electric field and prediction refinement
With regard to regression tasks and given the complexity of the problem, TDN272 50k performs exceedingly well both qualitatively and quantitatively.However, predicting or calculating the potential profile is often only halfway through the field calculation step in LTP simulations.The next and final step is to calculate the electric field, and in this case, we use the predicted potential given by Ẽ = −∇ φ.
Figure 10(a) and Figure 10(b) depict the absolute electric fields calculated from the ground truth potential profile and from the potential profile predicted by TDN272 50k (middle panels of Figure 9), respectively.It is clear from the figures that the calculated electric field from the prediction is both qualitatively and quantitatively dissimilar to the ground truth.This is because small fluctuations in the potential profile φ can affect the gradient of the electric field greatly.We can see these small fluctuations better in the 1D line plots of the potential (taken along the x and y directions through the The true and predicted absolute electric fields are calculated from the ground truth in Figure 9 and the predicted potential profile from the TDN272 50k in Figure 9, respectively.The predicted absolute electric field is calculated from the refined predicted potential using 10 iterations of a GPU-accelerated GMRES solver. center of the domain, respectively) as presented in Figure 11(a).Although the profiles match closely, small fluctuations can still be observed, especially in the regions with high spatial dynamics corresponding to the high-frequency features of the potential profile.Further observation on Figure 10(b) reveals that the model struggles to recover the highfrequency features, i.e., spatial details mostly found at the interface between different materials.
Neural networks have been observed to initially learn the low-frequency features of the target function, and very slowly converge on the high-frequency features; this phenomenon is referred to as the spectral bias [66][67][68][69][70]. Unfortunately, the training time required for the networks to learn the high-frequency features properly may not be tractable for practical use cases.To mitigate this problem, we follow the same method as demonstrated in [32,66,70].That is to refine the predicted φ using a conventional iterative solver, i.e., by using the predicted φ as an initial guess for the conventional solver instead of a zero initial guess.To this end, we utilize a GPUaccelerated GMRES solver from the AMGX library [71,72] (wrapped in the PyAMGX library [73] for usage within Python) and refine the prediction for 10 iterations.The GMRES solver is preconditioned using an Algebraic Multigrid (AMG) solver from the same library [71].We set the absolute tolerance for stopping criteria to 10 −10 for all cases in this work.
Figure 10(c) shows the electric field distribution calculated from the refined prediction.As apparent from a comparison with Figure 10(b), the refinement alleviates the high-frequency problem and produces a qualitatively identical electric field to the ground truth (Figure 10(a)).We also observe a considerable decrease in relative error, which reduces from ε L 2 = 8.98 • 10 −3 to ε L 2 = 3.7 • 10 −4 .Figure 11(b) further quantifies this improvement, where the predicted and refined line profiles agree with the true profiles.It is worth noting that for this particular example, the iterative solver with a zero initial guess achieves ε L 2 = 1.034•10 −2 after 10 iterations, and the resulting potential profile remains qualitatively suboptimal (see Figure B1 in Appendix B).Furthermore, the iterative solver with a zero initial guess requires 68 iterations until convergence (with tolerance = 10 −10 ), whereas the solver with a prediction from TDN272 50k as an initial guess requires 57 iterations, which is about 16% fewer.Using this strategy of combining a neural network-based solver (such as TDN272 50k ) and a conventional solver, one can arrive at a reasonable solution of the Poisson equation with fewer iterations and thus potentially faster calculation time.It is worth noting that a highly accurate initial prediction may result in a significantly faster computation time (fewer iterations needed for refinement) and a more accurate final solution.Therefore, it is imperative to address the problem of spectral bias in future developments.At this point, the ability of TDN272 50k to generalize on geometries excluded during the training should be discussed.Figure 12 compares the results for the true (top panels) and predicted (middle panels) potential profiles of reference reactor geometry data (presented in physical units).The agreement suggests that the strategy of training    the networks with randomized geometry data is effective in ensuring the generalizability of the model to various geometry configurations.The bottom panels of Figure 12 show the refined potential profile predictions, where we can observe the apparent corrections in the contour lines.The final result is exceedingly similar to the ground truths.In terms of quantitative comparison, the relative errors of the predicted potential profiles amount to ε L 2 = 1.2 • 10 −2 , 5.5 • 10 −3 , and 9.4 • 10 −3 for asymmetric CCRF discharge, packed-bed DBD, and surface DBD reactors, respectively.This is more or less consistent with the expectations for the test dataset D (test) * in table 2. However, this calls for a similar need for refinement for the electric field calculation.This can be clearly observed in the electric fields as shown in Figure 13 (middle and bottom panels), which again exhibits the problem of a spectral bias.
The relative errors of the refined potential profiles (middle panels of Figure 12), following the same order, are ε L 2 = 4.845 • 10 −7 , 1.04 • 10 −3 , and 2.9 • 10 −4 .This reveals that the existence of dielectric materials and the geometry complexity impact the overall number of iterations needed to achieve satisfying results.For instance, the asymmetric CCRF discharge reactor geometry, the simplest geometry, has the lowest error after 10 iterations of refinement.Whereas the packed-bed DBD reactor geometry, the most complex geometry, has the highest error of the three after the same number of iterations.

Inference times of the learned solvers
Although the computation speedup of the learned solvers is not the primary focus of the present work, it is still worth discussing their wall time performance.Modern machine learning libraries are well-optimized for GPUs.While the training of the models can take several days, the inference times of the trained models only take a fraction of a second on GPUs.For example, TDN272 50k requires 70 ms on average for one inference and 124 ms for TDN528 30k on a single Nvidia RTX 3080 12GB GPU.In comparison, a conventional iterative solver (GMRES) [71] running on the same GPU requires around 333 ms and 503 ms on average to calculate the solutions in 256×256 and 512×512 domain resolutions, respectively.The learned solvers are in fact faster than the conventional solver.Additionally, the inference times of the learned solvers scale nonlinearly with a batch prediction.For instance, for a batch of 8 predictions, the total inference times for TDN272 50k and TDN528 30k are 155 ms and 683 ms, respectively.Typically, the inference time stays the same for any input data as it depends solely on the sizes of the model and input data.The fast and stable inference time (equivalent to solving time) is indeed an attractive feature of a learned solver.
As discussed in section 4.3, using a learned solver alone is not enough to achieve reasonable numerical accuracy for practical use cases, and a prediction refinement must be conducted using a conventional iterative solver.We show earlier in section 4.3 that using a predicted potential from a learned solver can decrease the overall iteration steps of a conventional iterative solver.And greater prediction accuracy results in fewer iteration steps, thus faster calculation time.However, further investigation is necessary to determine the quantitative computation speedup that the learned solvers can provide.Nonetheless, this work demonstrates the potential of the learned solvers to enhance conventional iterative solvers in achieving faster calculation times.

Conclusion and outlook
In this paper, we present a machine learning-based 2D Poisson solver for use cases in low-temperature plasma simulations that can accommodate varying degrees of geometric complexity and various boundary condition configurations (Dirichlet, Neumann, and mixed boundary conditions).The followings are the summary and highlights of the used approach: • Hybrid architecture consisting of CNNs and transformer networks called TransDenseUNet (TDN), which is based on the existing TransUNet architecture developed initially for medical image segmentation tasks.
• Weighted multiterm loss function consisting of L 1 -norm loss term (L L 1 ), loss terms adopted from computer vision such as SSIM and disparity smooth loss terms (L SSIM and L smooth ), and finally, a contextual-physics loss term (L E ), which in this case takes the form of the relative error of the gradient of the predicted potential (electric field).
• Using the dimensionless Poisson's equation as a mean of feature scaling.
• Highly-randomized data generation scheme: random reactor geometries (shapes of electrodes and dielectric materials), charge distributions, and boundary condition configurations.
• Using paddings to improve the contributions of outer boundary conditions in the input data.
• Multichannel input data configuration that exploits co-location properties of the input parameters encoded in several channels: type, value, and (two) position channels.
• Prediction refinement using a conventional iterative solver on a GPU.
From the presented results, the best performing learned solver (TDN272 50k ) is shown to generalize well on new reactor geometries with practically identical (or better) accuracy to the one obtained from the validation and test sets.We also show that better solver accuracy can be attained by increasing the amount of training data.Additionally, increasing the resolution (of both data and model) has little to no effect on the accuracy.The results also qualitatively show that the predicted potential profiles exhibit an exceedingly good level of smoothness.
With regard to the practical application of the developed learned solver, raw solutions (predictions) from the learned solver, although very accurate as far as regression tasks are concerned, are not yet accurate enough to be used directly in LTP simulations.Therefore, we recommend conducting a prediction refinement using a conventional iterative solver.We show that this approach can achieve a numerically accurate and potentially faster solution (fewer iteration steps until convergence) than a conventional solver alone running on a GPU for high-resolution domains.
This work reveals that the learned solvers suffer from the spectral bias problem, wherein the model is unable to effectively learn the high-frequency features (primarily present at the interfaces between distinct materials within the domain) within a reasonable training time.Therefore, it is highly imperative to tackle this problem in future learned solvers.We presume further theoretical work on the network architecture and optimization technique is needed to resolve the spectral bias problem.Presently, this work has addressed problems defined on structured Cartesian grids.In many cases, plasma simulations employ unstructured grids or meshes for more stable and accurate simulations of certain phenomena within plasma physics.Hence, it is also of great interest to consider this requirement in future developments.Preliminary work has been done by Pfaff et al. [74] and Lötzsch et al. [75] that addresses this requirement using graph neural networks (GNNs).In the immediate next steps, we will conduct model optimization and evaluate the developed (and optimized) learned solvers in real simulation settings such as Particle-in-Cell/Monte Carlo Collision (PIC/MCC) or fluid-Poisson simulations.The evaluation will address aspects such as the accuracy of the simulations using the learned solvers in place of pure conventional solvers, and how much practical speedup the learned solvers can provide.
As final remarks, the generalization capability of the learned solver can prevent the high cost of computation incurred from retraining.And it is worth bearing in mind that the proposed approach can be straightforwardly adapted to many different 2D PDE and image-to-image translation problems.Lastly, a pure machine-learned Poisson solver that tackles all the requirements presented in this work may not be readily available in the short term, and a marriage between ML-based and conventional solvers is recommended as far as practical applications are concerned.The proposed approach is the next step further in achieving a generic and high-performing machine learning-based Poisson solver, especially for LTP simulations in complex geometries.
• µ x i is the i-th mean of local area in image X, • µ y i is the i-th mean of local area in image Y, • σ x i is the i-th standard deviation of a local area in image X, • σ y i is the i-th standard deviation of a local area in image Y, • σ xy i is the i-th co-variance of a local area between image X and Y, • N is the number of local image areas processed by the Gaussian window.
Note that the presented formulation of SSIM is for single-channel images.In practice, for multichannel images, one can compute the SSIM of each channel and take the average of all channels.For a more detailed description of SSIM, we refer interested readers to the original publication by Wang et al. [47].The top panels are the solution evolution of an iterative solver using a prediction from TDN272 50k as an initial guess (rTDN GPU ), and the bottom panels are from an iterative solver using a zero initial guess (GMRES GPU ).Note that at iteration 6, rTDN GPU has already achieved significantly better accuracy than GMRES GPU at iteration 10 and produced a qualitatively identical solution to the ground truth presented in Figure 9.

Figure 1 .
Figure 1.Input data configuration.(a) Configured input data presented as a CMYK image, (b) different components of input data, and (c) the corresponding output data (potential distribution).

3. 1 . 4 .
Prepared datasets For a complete dataset, we generate a number of subsets for training, validation, and testing or evaluation of the learned solvers.For training and validation, we generate two main sets D 272 and D 528 .Here, D 272 contains N = 5 • 10 4 input-output data pairs with H × W = 272 2 (U × V = 256 2 and p = 8), and D 528 contains N = 3 • 10 4 input-output data pairs with H × W = 528 2 (U × V = 528 2 and p = 8).For testing or evaluation, we generate two sets, D (test) 272 and D (test) 528 , each with N = 5 • 10 3 .

Figure 3 .
Figure 3. Overview of the TransDenseUNet architecture.The encoder incorporates DenseNet and transformer networks, whereas the decoder is made of the typical CNN blocks.

Figure 4 .
Figure 4. High-level structure of the DenseNet-38 block used in the TransDenseUNet.It consists of 38 densely connected convolutional layers.

Figure 5 .
Figure 5.General structure of a transformer layer used in the TransDenseUNet architecture.

3 EFigure 8 .
Figure 8. Evolution of individual loss terms during training.From left to right: (a) L 1 loss, (b) SSIM loss, (c) disparity smoothness loss, and (d) contextual physics loss terms.

Figure 10 .
Figure 10.Comparison of (a) true, (b) predicted, and (c) refined absolute electric fields.The true and predicted absolute electric fields are calculated from the ground truth in Figure9and the predicted potential profile from the TDN272 50k in Figure9, respectively.The predicted absolute electric field is calculated from the refined predicted potential using 10 iterations of a GPU-accelerated GMRES solver.

Figure 11 .
Figure 11.1D line plots of (a) predicted and (b) refined potential profiles against the corresponding true profiles across the x and the y axes (taken from the middle of each axis).The true and predicted potential profiles are based on the ground truth and the prediction of TDN272 50k in Figure9, respectively.

Figure 12 .
Figure 12.Raw and refined potential profile predictions of reference reactor geometry data in comparison with the ground truths.From left to right: asymmetric CCRF discharge, packed-bed DBD, and surface DBD reactor geometries.From top to bottom: true, predicted, and refined potential profiles.From left to right, the relative errors ε L 2 of the raw predictions (middle panels) are 1.2 • 10 −2 , 5.5 • 10 −3 , and 9.4 • 10 −3 .For the refined predictions (bottom panels), the relative errors are 4.845•10 −7 , 1.04 • 10 −3 , and 2.9 • 10 −4 , following the same order.

Figure 13 .
Figure 13.Raw and refined absolute electric fields of reference reactor geometry data in comparison with the ground truths.From left to right: asymmetric CCRF discharge, packed-bed DBD, and surface DBD reactor geometries.From top to bottom: true, predicted, and refined absolute electric fields.

Figure B1 .
FigureB1.Evolution of iterative solutions.The top panels are the solution evolution of an iterative solver using a prediction from TDN272 50k as an initial guess (rTDN GPU ), and the bottom panels are from an iterative solver using a zero initial guess (GMRES GPU ).Note that at iteration 6, rTDN GPU has already achieved significantly better accuracy than GMRES GPU at iteration 10 and produced a qualitatively identical solution to the ground truth presented in Figure9.

Table 1 .
TDN models and the corresponding datasets for training and validation as well as numbers of trainable parameters.In this work, we train several models primarily to assess the influence of the number and the resolution of training data on the model performance.To this end, we prepare subdatasets D