Data-driven plasma modelling: surrogate collisional radiative models of fluorocarbon plasmas from deep generative autoencoders

We have developed a deep generative model that can produce accurate optical emission spectra and colour images of an ICP plasma using only the applied coil power, electrode power, pressure and gas flows as inputs: essentially an empirical surrogate collisional radiative model. An autoencoder was trained on a dataset of 812 500 image/spectra pairs in argon, oxygen, Ar/O2, CF4/O2 and SF6/O2 plasmas, taken across the entire operating space of an industrial plasma etch tool. The autoencoder learns to encode the input data into a compressed latent representation and then decode it back to a reconstruction of the data. We then learn a mapping from the plasma tool's inputs to the latent space and use the decoder as a generative model. The model is very fast, taking just over 10 s to generate 10 000 measurements on a single GPU. This type of model can become a building block for a wide range of experiments and simulations. To aid this, we have released the underlying dataset of 812 500 image/spectra pairs used to train the model, the trained models and the model code, to help the community accelerate the development and use of this exciting area of deep learning. Anyone can try the model, for free, on Google Colab.


Introduction
Generative models are a type of deep learning model that can produce new, unseen samples when trained on unlabelled data. These types of models have not previously been used in the field of low-temperature plasmas, but have been used to great effect in generating text, images and 3D models. They can offer many benefits: creating synthetic data for modelling and experiment design, replacing parts of computational models with fast surrogate models, and providing a foundation for models that predict expensive and difficult-to-measure parameters from simpler diagnostics.

Background
Synthetic data can be an extremely useful resource in plasma physics for developing experiments, understanding diagnostics and training models and controllers for plasma applications. Synthetic data tools have been used in fusion [1,2,3,4] and laser plasmas [5,6,7,8,9] to aid simulations and experiment design, and to train Machine Learning (ML) and Deep Learning (DL) models. However, such approaches have been used less frequently in low-temperature plasmas [10,11,12].
Methods for generating synthetic data in plasma physics can be split into three main groups: generating synthetic sensor data from simulations or analytic models [1,5,6,7,8,9,4,12], inverting analytic methods for extracting parameters from sensor data [10,2,11], and augmenting existing experimental data to create new data [3]. However, DL generative models have not been used for synthetic data generation in plasma physics. This approach uses DL models, such as autoencoders (AE), generative adversarial networks, diffusion models or transformers, as generative models that can create new synthetic data (see [13] for a recent review of the area). Outside the field, these approaches have been used for improving medical image classification [14], drug design [15], chemical reaction discovery [16], cyber security [17], music generation [18] and image generation [19], among many other applications.
Deep learning approaches have had many successes in the field, including controlling atmospheric pressure plasma jets [20,21], providing a fast replacement for computed tomography of tokamak radiation profiles [22], predicting electron energy distribution functions from optical emission spectra (OES) [23], and creating surrogate models of neutral beam injection [24], sputtering processes [12] and plasma etching [25].
In this work we demonstrate how deep autoencoders can be used to generate synthetic sensor data from large amounts of unlabelled experimental data. We show how to train a deep autoencoder on unlabelled data and then how to train a model to learn to 'map' from an input space of physical variables into the latent space of the autoencoder to produce a generative model.
In the deep learning literature, there has long been interest in developing generative models, such as variational autoencoders (VAE) [26], generative adversarial networks [27] and diffusion models [28]. Earlier work focused on developing models capable of generating good outputs through random sampling; more recent work has focused on how to guide generative models to produce desired outputs. This can be referred to as learning a prompt for generative output or a map to a latent space. Recent examples include generating music [18,29], transforming facial expressions [30], generating energy angle distributions in sputtering processes [12], and high-quality image generation from prompts with models such as DALL•E 2, Parti and Stable Diffusion [19,31,32].

Autoencoders
Autoencoders are an early type of neural network model that learns to copy its input to its output [33]. An autoencoder consists of an encoder, $z = f(x)$, that learns to map input data, $x \in \mathbb{R}^r$, into a latent space, $z \in \mathbb{R}^l$, and a decoder, $\hat{x} = g(z)$, that learns to map the latent representation back to the input space [33] (see figure 1). The model is trained to minimise the reconstruction error between the input data and the reconstructed output, e.g. $\|x - g(f(x))\|^2$. On the face of it this does not seem like a very useful network, but by making the latent space much smaller than the input data ($l \ll r$), the network is forced to learn a low-dimensional representation of the input data by learning relationships and patterns within it.
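As a concrete illustration of the bottleneck, the following is a minimal NumPy sketch of a linear autoencoder; it is not the ConvNeXt model used in this work. For a linear encoder and decoder trained with an MSE loss, the optimal solution spans the same subspace as principal component analysis, so here the optimum is computed directly with an SVD instead of by gradient descent. The synthetic data (r = 50 values driven by 3 hidden factors plus a little noise) is an assumption chosen to show that an l = 3 bottleneck is enough when the data has that much underlying structure.

```python
import numpy as np

def fit_linear_autoencoder(x: np.ndarray, latent_dim: int):
    """Optimal linear autoencoder under MSE, computed via SVD (equivalent to PCA).

    Returns an encoder f: R^r -> R^l and a decoder g: R^l -> R^r.
    """
    mean = x.mean(axis=0)
    # Rows of vt are the principal directions of the centred data, by variance
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    basis = vt[:latent_dim]                     # shape (l, r)

    def encode(data):                           # z = f(x)
        return (data - mean) @ basis.T

    def decode(z):                              # x_hat = g(z)
        return z @ basis + mean

    return encode, decode

# Synthetic data: r = 50 "pixels" driven by only 3 hidden factors (an assumption
# standing in for a few underlying plasma parameters), plus a little noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 50))
x = factors @ mixing + 0.01 * rng.normal(size=(1000, 50))

for l in (1, 3, 10):
    f, g = fit_linear_autoencoder(x, l)
    mse = np.mean((g(f(x)) - x) ** 2)
    print(f"l = {l:2d}  reconstruction MSE = {mse:.2e}")
```

With l = 1 the reconstruction error stays large, while l = 3 and above bring it down to roughly the noise level. A deep nonlinear autoencoder plays the same role for data such as spectra and images, which do not lie on a flat subspace.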
VAEs are an extension of ordinary autoencoders in which an additional training objective, the Kullback-Leibler (KL) divergence, is added to guide the distribution of the latent space towards a standard normal distribution with a diagonal covariance matrix, $p(z) = \mathcal{N}(z; 0, I)$. This gives VAEs a continuous latent space that can easily be sampled to generate new samples, which has led to VAEs being widely used in generative modelling. However, they have had issues since their inception: they are difficult to train and suffer from mode collapse [34,35], and the latent space does not always end up having the desired property of being normally distributed [36], as in figure 4 of [12].
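For comparison with the plain autoencoder, the extra VAE term has a closed form when the encoder outputs a mean $\mu$ and a log-variance $\log\sigma^2$ for each latent dimension: $\mathrm{KL} = \tfrac{1}{2}\sum_j (\sigma_j^2 + \mu_j^2 - 1 - \log\sigma_j^2)$. The NumPy sketch below assumes this common mean/log-variance parameterisation and the reparameterisation trick for sampling; the beta weight is an illustrative addition (as in beta-VAE) and is not something used in this work, which trains plain autoencoders.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def sample_latent(mu, log_var, rng):
    """Reparameterisation trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * np.asarray(log_var)) * eps

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Per-sample MSE reconstruction term plus a beta-weighted KL regulariser."""
    recon = np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2, axis=-1)
    return recon + beta * kl_to_standard_normal(mu, log_var)
```

The KL term is zero only when every latent dimension is exactly standard normal, which is the pressure towards a well-behaved latent space described above; too much of it is also the over-regularisation discussed next.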
Recent work in generative modelling has demonstrated that the VAE objective can actually hamper the ability of the model to learn a useful representation through over-regularisation, and that large autoencoders are good generative models, repeatedly outperforming VAEs [37,38,39]. Autoencoders have also recently been used to learn features for virtual metrology models from OES [40] and for defect detection in semiconductor processing [41]. We use autoencoders in this work as they are easier and more predictable to train than VAEs while providing equal or better performance as generative models, making them more suitable for widespread use in scientific applications.
Our contributions in this work and the structure of the paper are laid out as follows. In section 1 we provide a background to synthetic data generation and deep generative models, and how they have been applied in other fields. In section 2 we describe the experiment we designed to gather 812,500 optical emission spectra and colour images in fluorocarbon plasmas in an industrial plasma etcher. In section 3 we describe how to build and train an autoencoder and how to train a small model to map physical tool inputs to the latent space, turning the decoder into a conditional generative model. In sections 4 and 5 we look at the structure of the latent space produced by the model for different latent space sizes and the difficulty of evaluating generative models. In section 6 we demonstrate using the generative model to carry out synthetic experiments on line ratios in argon and Ar/O2 plasmas, generating 10,000 points across power and pressure in seconds. We discuss the limitations of the approach and future work, and detail the open source release of the code and experimental results, in sections 7 and 8, followed by a conclusion in section 9.
The dataset we have gathered has been released under a Creative Commons license (CC BY 4.0) and can be reused by anyone with appropriate attribution. The model's code and pre-trained models have been released as open source under the MIT License.

Data collection and experimental design
A dataset of 812,500 optical emission spectra (OES) and RGB images of the bulk plasma above the wafer surface was gathered from an Oxford Instruments Plasma Technology PP 100 industrial plasma etcher with a Cobra300 cylindrical ICP source. Quartz windows were used for all optical diagnostics. For OES, an Edmund Optics UV/VIS collimator (88-173) was used to collect light into a Thorlabs round-to-linear fibre bundle consisting of seven 200 µm solarisation-resistant fibres, and an Avantes ULS4096CL-EVO-RM 200-1100 nm spectrometer was used with a 10 µm slit. Optical images were collected with a FLIR Blackfly 0.4 MP colour camera (BFS-U3-04S2M-CS) and a 6 mm focal length lens (SV-0614V).
Data was collected across the entire operating region of the plasma source in argon, oxygen, Ar/O2, CF4/O2 and SF6/O2. The experimental operating space consisted of the power delivered to the ICP source, the power to the table, the chamber pressure and the flow rate of one or two gases. The operating space varied for each gas because of differing lower limits on the power and pressure needed to form a stable plasma, and the requirement to keep the DC bias below 1 kV. The operating space is summarised in table 1.
Our aim was to make measurements at sample points across the operating space and gather as much information as possible within a fixed budget of samples. Naively, we could have used a grid search; however, a 10-point grid across 5 dimensions would require $10^5 = 100,000$ points with very poor space filling, i.e. there would be only 10 unique values in each dimension. The next simplest approach would be to sample randomly. For large numbers of samples this is quite likely to fill the parameter space, but there is no guarantee of how efficiently it does so. How efficiently a set of points fills a space, and how well the points are separated, can be measured by the discrepancy of the set; in particular, we use the L2 discrepancy [42,43]. Quasi-random sequences offer a very effective way to generate sets of sample points with some guarantees on how efficiently they fill a parameter space, while still providing enough random spread to cover the interactions of many variables [42,43], i.e. they have a low discrepancy. Two of the most common quasi-random sampling methods are Latin Hypercube Sampling (LHS) and Sobol sequences. Both have the properties that we desire, but Sobol sequences have the advantage that further elements of the sequence can be generated using the same random seed. This is important if the dataset needs to be extended at a later time. There is no guarantee that the combination of two LHS sets does not have a higher discrepancy than one generated with the combined number of points, and a large LHS cannot be truncated or randomly subsampled while maintaining its low discrepancy. With a Sobol sequence, however, the extended dataset is guaranteed to have the same discrepancy as if the sequence had been generated at that length from the start [42,44].
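The sequence-extension property and the discrepancy comparison can both be checked with SciPy's scipy.stats.qmc module. This is a sketch assuming SciPy 1.7 or later; it is not necessarily the tooling used for this experiment, and the tool bounds at the end are placeholders rather than the ranges in table 1.

```python
import numpy as np
from scipy.stats import qmc

N_DIMS = 5   # ICP power, table power, pressure and two gas flows
SEED = 1234

# Draw 1024 scrambled Sobol points, then extend the same sequence by 1024 more
sampler = qmc.Sobol(d=N_DIMS, scramble=True, seed=SEED)
first_half = sampler.random_base2(m=10)
second_half = sampler.random_base2(m=10)   # continues where the first call stopped

# A fresh sampler with the same seed, asked for all 2048 points at once
full_run = qmc.Sobol(d=N_DIMS, scramble=True, seed=SEED).random_base2(m=11)
print("extension matches full run:",
      np.allclose(np.vstack([first_half, second_half]), full_run))

# Compare space filling with plain random sampling and an LHS of the same size
rng = np.random.default_rng(SEED)
random_pts = rng.random((2048, N_DIMS))
lhs_pts = qmc.LatinHypercube(d=N_DIMS, seed=SEED).random(2048)
disc_sobol = qmc.discrepancy(full_run, method="L2-star")
disc_lhs = qmc.discrepancy(lhs_pts, method="L2-star")
disc_random = qmc.discrepancy(random_pts, method="L2-star")
print(f"L2-star discrepancy: Sobol {disc_sobol:.2e}, "
      f"LHS {disc_lhs:.2e}, random {disc_random:.2e}")

# Map the unit hypercube onto tool ranges (placeholder bounds, NOT table 1)
lower = [400, 0, 5, 0, 0]        # W, W, mT, sccm, sccm
upper = [3000, 300, 100, 50, 50]
setpoints = qmc.scale(full_run, lower, upper)
```

Drawing in powers of two (random_base2) keeps the balance properties of the Sobol points, which is why both the initial run and the extension here are 2^10 points.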
Using a Sobol sequence, we generated 10,000 points each for argon and oxygen, 30,000 points for Ar/O2, 60,000 for CF4/O2 and 70,000 for SF6/O2. To cover each sequence in the experiment, we sorted it such that pressure followed a relatively flat ramp over the whole range while the other variables followed a triangle wave of increasing speed, as shown in figure 2. This enabled us to maintain tool stability between sample points and reduced the settling time after setpoint changes. Setpoints were changed every 5 seconds, and an optical image and OES were taken every second starting at the beginning of each setpoint change. A plain, unpatterned silicon wafer was clamped to the table at all times, and the process was only stopped to replace the wafer when it had become too thin from etching. For each gas mixture, the dataset consists of 5 image/spectra pairs, $[i_{n,0}, \ldots, i_{n,4}]$ and $[s_{n,0}, \ldots, s_{n,4}]$, and setpoint readbacks from the tool, $[t_{n,0}, \ldots, t_{n,4}]$, taken at each setpoint $P_n$ of the sequence $[P_0, \ldots, P_N]$.
The setpoint readbacks consist of the net power (forward minus reflected) on the ICP coil and table, the chamber pressure, the gas flow from each mass flow controller and the DC bias at the table.
The sampled experimental points did not perfectly align with our planned sweeps; some areas had unstable plasmas, could not sustain a plasma or exceeded parts of the tool's operational envelope, such as pressure control. The measured data is summarised in table 2. All of the runs have a small portion of results with momentarily high reflected power, but not for long enough to extinguish the plasma. In CF4/O2, the high-pressure region above 70 mT was unstable due to a combination of reduced plasma stability and the limited control margin of the pressure controller, and the sweeps were not continued above this pressure. In SF6/O2, the minimum power required to sustain a plasma increased with pressure, so the sequence was extended to 70,000 points and the minimum ICP power raised to 1500 W above 40 mT to yield more measurement points. The experiment yielded a total of 812,500 image/spectra pairs at 162,500 unique setpoints in the operational space of the tool.
The data was split into train, validation and test sets with an 80/10/10 split. However, since we take 5 measurements while holding each setpoint, naively splitting the data at random would leak test data into the train split, i.e. some measurements at a single setpoint would be present in more than one split. To avoid this, the data is kept together in blocks of 5 and the blocks are randomly assigned to the three sets. The spectra are processed by subtracting the average of the counts at the dark pixels from each spectrum and removing the data from pixels outside the calibrated range of the spectrometer, which leaves 3072 pixels covering 200-1100 nm. Each spectrum is min-max scaled to between 0 and 1, and a 5-pixel-wide Hann window [45] is used to smooth out noise. The camera produces a 720x540 pixel image with an RGGB Bayer mask. Rather than performing standard Bayer interpolation to produce a 720x540 colour image, we treat the camera like a hyperspectral camera with very poor spectral resolution: we take all the red and blue pixels and one of the two green pixels in each 2x2 cell to form three 360x270 images. These are cropped to the central area of the image, resized and stacked to produce a 128x96x3 image. The pixel intensities are well controlled by the camera's autoexposure algorithm and are all clustered around a 50% grey value, requiring no further normalisation. The camera ADC is set to 10-bit resolution and values are stored as 16-bit integers; all images are divided by $2^{16}$ to rescale their pixel intensities between 0 and 1. The values from the tool's setpoint readbacks are all in the range of 0-10 V or 0-5 V and are simply divided by 10 to rescale them between 0 and 1.
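The block-wise split that avoids leakage between repeated measurements can be sketched in NumPy as below. The row layout (measurement k of setpoint n stored at row 5n + k) is an assumption for illustration, not the layout of the released dataset.

```python
import numpy as np

def grouped_split(n_setpoints, per_setpoint=5, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole setpoints (blocks of repeated measurements) to train/val/test.

    Assumes measurement k of setpoint n is stored at row n * per_setpoint + k,
    so all repeats of one setpoint always land in the same split (no leakage).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_setpoints)
    n_train = int(round(fractions[0] * n_setpoints))
    n_val = int(round(fractions[1] * n_setpoints))
    groups = {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],
    }
    # Expand setpoint indices into the row indices of their individual measurements
    offsets = np.arange(per_setpoint)
    return {name: np.sort((g[:, None] * per_setpoint + offsets).ravel())
            for name, g in groups.items()}

splits = grouped_split(n_setpoints=1000)
print({name: len(rows) for name, rows in splits.items()})
```

Splitting setpoint indices first and only then expanding them to measurement rows is what guarantees that the five repeats at one setpoint can never straddle two splits.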
Rescaling and normalising the inputs is a particularly important step in preparing data for training in any machine learning approach. It speeds up and stabilises convergence during training [46,47], as the gradients in the model stay within the bounds expected by the optimiser and the inputs lie within the expected range of activation functions such as sigmoid and ReLU.
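The preprocessing steps described above can be sketched in NumPy as follows. Several details are assumptions rather than the authors' settings: the dark-pixel and calibrated-pixel indices, the 192x256 central crop, 2x2 block averaging as the resize step, and the use of numpy.hanning(5) for the smoothing window (its two end taps are zero, so whether it matches the 5-pixel window used here is unconfirmed).

```python
import numpy as np

def preprocess_spectrum(raw, dark_idx, keep_idx, window=5):
    """Dark-subtract, crop, min-max scale to [0, 1], then Hann-smooth one spectrum."""
    raw = np.asarray(raw, dtype=float)
    spec = raw[keep_idx] - raw[dark_idx].mean()              # remove dark offset
    spec = (spec - spec.min()) / (spec.max() - spec.min())   # min-max scaling
    kernel = np.hanning(window)                              # symmetric Hann taps
    kernel /= kernel.sum()                                   # keep intensity scale
    return np.convolve(spec, kernel, mode="same")

def preprocess_image(raw, crop_hw=(192, 256)):
    """Split an RGGB Bayer frame into R, G, B planes instead of demosaicing."""
    red = raw[0::2, 0::2]
    green = raw[0::2, 1::2]                                  # one of the two greens
    blue = raw[1::2, 1::2]
    planes = np.stack([red, green, blue], axis=-1).astype(float)  # (270, 360, 3)
    h, w = crop_hw
    top = (planes.shape[0] - h) // 2
    left = (planes.shape[1] - w) // 2
    planes = planes[top:top + h, left:left + w]              # central crop
    # Hypothetical resize: 2x2 block average down to (96, 128)
    planes = planes.reshape(h // 2, 2, w // 2, 2, 3).mean(axis=(1, 3))
    return planes / 2 ** 16                                  # 16-bit counts -> [0, 1)

def preprocess_readbacks(volts):
    """Tool readbacks are 0-10 V (or 0-5 V) analogue signals: divide by 10."""
    return np.asarray(volts, dtype=float) / 10.0

# Example with synthetic data (hypothetical pixel layout and values)
rng = np.random.default_rng(0)
raw_spec = rng.poisson(1000, size=3200)
spectrum = preprocess_spectrum(raw_spec, dark_idx=np.arange(16),
                               keep_idx=np.arange(64, 64 + 3072))
raw_frame = rng.integers(0, 2 ** 16, size=(540, 720), dtype=np.uint16)
image = preprocess_image(raw_frame)
tool = preprocess_readbacks([7.5, 1.2, 3.3, 2.0, 0.0])
print(spectrum.shape, image.shape, tool)
```

Array shapes are (rows, columns), so the 720x540 frame is (540, 720) here and the final 128x96x3 image is (96, 128, 3).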

Building deep generative autoencoders for synthetic data generation
Our model architecture is based on ConvNeXt, a state-of-the-art convolutional neural network architecture [48]. We use the base ConvNeXt blocks and stem, with 1D or 2D convolutions for OES or images respectively, to form our spectra and image encoding branches. In this work we have only used two branches, both based on convolutional networks, but any number of branches can be used, with any kind of network architecture encoding some input data. The decoder is simply the reverse of the encoder and finishes in a 1D or 2D convolution that reconstructs the input.
The encoder learns a function to project each input image and spectrum pair, $i_n, s_n$, into the latent space, $z_n = f(i_n, s_n)$; each decoder branch then learns a function to project the latent vector back into the real diagnostic space, $\hat{i}_n = g(z_n)$ and $\hat{s}_n = h(z_n)$. This overall structure is shown in figure 4. The loss is a reconstruction loss between the inputs, $i_n, s_n$, and the reconstructions, $\hat{i}_n, \hat{s}_n$. This loss can be weighted to favour one input over another, to embed prior assumptions about the relative importance of each diagnostic. The model is trained with the Adam optimiser [49], using a cosine decay learning rate schedule [50] with a linear warmup and Mean-Squared Error (MSE) as the loss, using Keras [51]/TensorFlow [52]. Full details of the training and fine-tuning settings are in table 4. The model was trained on 4 Nvidia A100 GPUs for 100 epochs, taking roughly 20.5 hours.
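Two pieces of this training setup are easy to state independently of the network: the weighted two-branch reconstruction loss and the learning rate schedule. The sketch below writes both in plain Python/NumPy rather than Keras; the branch weights, peak learning rate and step counts are placeholders, not the values in table 4.

```python
import math
import numpy as np

def weighted_reconstruction_loss(img, img_hat, spec, spec_hat, w_img=1.0, w_spec=1.0):
    """Weighted sum of the image-branch and spectrum-branch MSE losses."""
    mse_img = np.mean((np.asarray(img) - np.asarray(img_hat)) ** 2)
    mse_spec = np.mean((np.asarray(spec) - np.asarray(spec_hat)) ** 2)
    return w_img * mse_img + w_spec * mse_spec

def warmup_cosine_lr(step, peak_lr, warmup_steps, total_steps, final_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to final_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))

# Placeholder schedule values (the actual settings are listed in table 4)
schedule = [warmup_cosine_lr(s, peak_lr=1e-3, warmup_steps=500, total_steps=10_000)
            for s in range(10_000)]
```

Recent Keras versions also offer keras.optimizers.schedules.CosineDecay with warmup_target and warmup_steps arguments, which would be the natural choice inside a Keras training loop; the hand-written function simply makes the shape of the schedule explicit.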

Tool to latent model architecture
Our decoder model can be used on its own for generative modelling: by randomly sampling values of $z$ we can generate random output spectra and images. However, this is of limited practical use. To make the model into a synthetic data generator, we need an additional model that learns to map from tool parameters, $t$, to the latent space, $z = f(t)$. This is conceptually similar to text-to-image models such as Stable Diffusion [32], where the model is trained with pairs of text descriptions and images. In this work we train an additional model to produce latent representations, $z$, from tool parameters that match those of the associated image and spectra pair. The parameters used were the net power on the ICP coil, the table power, the gas flows and the pressure.
The model is a multi-layer perceptron, a stack of identical dense neural network layers, trained in a supervised fashion with the latent representations, $z$, as targets. As we do not have a reference architecture for this model, and since its small size and low complexity make it fast to train, we used KerasTuner [53] to carry out a multi-objective Bayesian optimisation of the number of dense layers, the number of neurons and the learning rate for each of the models with l = 4, 16, 32 and 64. We considered using the top 5 models as an ensemble, but did not see a discernible improvement.
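To show what the tool-to-latent regression amounts to, here is a minimal NumPy sketch of a one-hidden-layer tanh MLP trained by full-batch gradient descent on MSE. It stands in for the tuned Keras MLP: the hidden width, learning rate and step count are illustrative assumptions (in this work they were chosen with KerasTuner), and the "latent codes" are synthetic functions of the settings rather than real encoder outputs.

```python
import numpy as np

def train_tool_to_latent(t, z, hidden=32, lr=0.05, steps=3000, seed=0):
    """One-hidden-layer tanh MLP regressing latent codes z from tool settings t."""
    rng = np.random.default_rng(seed)
    n_in, n_out = t.shape[1], z.shape[1]
    w1 = rng.normal(scale=1 / np.sqrt(n_in), size=(n_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=1 / np.sqrt(hidden), size=(hidden, n_out))
    b2 = np.zeros(n_out)
    history = []
    for _ in range(steps):
        h = np.tanh(t @ w1 + b1)              # hidden activations
        pred = h @ w2 + b2                    # predicted latent codes
        err = pred - z
        history.append(float(np.mean(err ** 2)))
        g_pred = 2 * err / err.size           # d(MSE)/d(pred)
        g_w2, g_b2 = h.T @ g_pred, g_pred.sum(axis=0)
        g_h = (g_pred @ w2.T) * (1 - h ** 2)  # back-propagate through tanh
        g_w1, g_b1 = t.T @ g_h, g_h.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def predict(t_new):
        return np.tanh(t_new @ w1 + b1) @ w2 + b2

    return predict, history

rng = np.random.default_rng(1)
t = rng.random((512, 5))                      # normalised tool settings in [0, 1]
# Hypothetical smooth "latent codes" standing in for autoencoder encodings
z = np.column_stack([2 + np.sin(3 * t[:, 0]), 1 + t[:, 1] * t[:, 2],
                     1 + np.cos(2 * t[:, 3]), 1 + t[:, 4] ** 2])
predict, history = train_tool_to_latent(t, z)
print(f"MSE: {history[0]:.3f} -> {history[-1]:.4f}")
```

Because the targets are fixed encoder outputs, this stage is ordinary supervised regression and can be retrained or swapped without touching the autoencoder.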

Evaluating the quality of unsupervised models
It is inherently difficult to evaluate the quality of unsupervised models, as we do not have direct access to the objective that we are optimising for. In this work we trained our models to reduce the MSE between the original images and spectra and their reconstructions. However, this does not tell us whether our latent space contains useful information, i.e. whether the encoding into this space is a useful empirical model of the plasma information contained in the diagnostic data, or whether the latent representations produced by our tool model project back to the correct diagnostic information.
To evaluate this, we have to create surrogate objectives that we believe provide some insight into how well we achieve our underlying objective. The simplest method is to look at the performance of our models on our hold-out test data: if the model has simply memorised the input data and cannot generalise and interpolate between the training data, we will see poor reconstructions of the test data. To evaluate whether our latent representation is useful for generating synthetic data, we can look at the distribution of points in the latent space and make subjective judgements, e.g. large gaps between points are areas that cannot be sensibly interpolated across by our generative decoder. To evaluate the empirical quality of the models, we can examine their behaviour around known mode transitions, such as the E-H transition, and with changes in gas stoichiometry, comparing the trends with previous experimental data.

Properties of the latent space
The overall aim of latent space modelling is to project input data onto a manifold in the latent space while preserving the information and relationships within the data that are physically real and sensible, without overfitting to spurious relationships that are not. To make our latent representation usable, we would like it to have several properties: points should be close to a normal distribution; points that are close in the real space (i.e. two plasmas that are similar to each other) should be close in the latent space, and vice versa; and the latent space should be interpolatable, i.e. we can move smoothly through the latent space from one area to another without sharp discontinuities. Many of these properties can be gained simply by using a large enough deep learning model with enough data. Large neural networks are inherently self-regularising [54] and, with increasing size, reach a point where their outputs become Lipschitz continuous [55]. When training generative models on existing benchmark datasets, it is possible to use measures of image similarity to evaluate the performance of the model, such as the Fréchet inception distance [56]. However, these use pre-trained image classification networks to evaluate the quality of generated images. If our data were similar to the data used to train the classification network these methods could be used, or, with some labelled data, one of these networks could be fine-tuned for this use case. However, an OES of an argon plasma has little similarity to images of planes and cats (typical of the datasets used to pretrain these networks), so we would have no guarantee that these methods would work. This is an area of active research in generative modelling, and new evaluation methods that overcome this issue may appear in time.
Without a quantitative measure of performance, we are left with qualitative evaluations of our generative capabilities. The simplest is to look at the distribution of points in the latent space. If our model and dataset are large enough and the model is well trained, the latent space should be well behaved: close to a normal distribution and interpolatable. In figure 5 we show three examples of the latent space of a trained model: 'bad', 'better' and 'good'. The bad example shows a latent space that is extremely sparse and has significant spikes in the concentration of points. It would be very difficult to interpolate between points in this space, as it has significant discontinuities and no meaningful representation away from the central axis along which the points are stretched. In the better example most of the points are reasonably close, although the distribution is strongly multimodal and has separated into two clusters that would be extremely difficult to interpolate between. The good representation shows what we are looking for: the points are more smoothly distributed and there are no discontinuities within the latent space itself.
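The visual checks described here (gaps that cannot be interpolated across, and spikes where many inputs pile up at one latent value) can be turned into rough per-dimension counts. The heuristic below is a hypothetical NumPy sketch, not the authors' evaluation procedure; the bin count and spike threshold are arbitrary assumptions, and it complements rather than replaces looking at the histograms.

```python
import numpy as np

def latent_diagnostics(z, bins=50, spike_factor=10.0):
    """Per-dimension heuristics for gaps and spikes in latent histograms.

    gaps:   empty bins lying between occupied bins (regions the decoder has
            never seen, so interpolation across them is unreliable)
    spikes: bins holding more than spike_factor times the mean bin count
            (a possible sign of many inputs collapsing onto one latent value)
    """
    report = []
    for d in range(z.shape[1]):
        counts, _ = np.histogram(z[:, d], bins=bins)
        occupied = np.flatnonzero(counts)
        inner = counts[occupied[0]:occupied[-1] + 1]
        gaps = int(np.sum(inner == 0))
        spikes = int(np.sum(counts > spike_factor * counts.mean()))
        report.append({"dim": d, "gaps": gaps, "spikes": spikes})
    return report

rng = np.random.default_rng(0)
z = np.column_stack([
    rng.uniform(0, 1, 5000),                                               # smooth
    np.concatenate([rng.uniform(0, 1, 2500), rng.uniform(2, 3, 2500)]),    # gap
    np.concatenate([np.zeros(2500), rng.uniform(-1, 1, 2500)]),            # spike
])
for row in latent_diagnostics(z):
    print(row)
```

As the next paragraph notes, a multimodal dimension is not automatically a problem; it is the empty regions between modes that limit interpolation.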
Unfortunately, we cannot always expect our data to be as well behaved as our 'good' representation. We cannot rely on the assumption that our data is independent and identically distributed: the conditions of one plasma are affected by the history of plasmas within that tool, and we expect our latent space to encode some physically real multimodal distributions, such as E-H mode transitions, different gas stoichiometries and pressure regimes. Figure 5c shows a 'good' representation: the latent space is smooth and interpolatable, but one dimension has a bimodal distribution. We expect different physical modes in the data to form separate normal distributions in the latent space, and as long as it is physically possible to transition between these modes, and we have data covering the mode transition, the latent space can be used to interpolate between them.

Evaluating the generative model
A summary of the results from training the autoencoder model is given in table 5. The training split was used to train each model, and the validation split was used to independently evaluate model performance during hyperparameter optimisation of the learning rate. The optimal hyperparameters found for the training and fine-tuning steps are summarised in table 4. The test split was kept as a holdout set for final model evaluation and was not used at any time during training or hyperparameter optimisation. The test and train errors are very close for all latent space sizes, indicating that the model has not overfit to the training data. In figures 6 and 7 we show 3 random examples from the test split of the original and reconstructed data, for l = 64 in each of our three gas mixtures and for l = 4 in the Ar/O2 example. The reconstruction error is extremely low for l = 64; as can be seen in table 5 and figure 6a, the reconstruction error decreases significantly with larger latent space size. In particular, figure 6a shows that the small latent space model makes significant errors in reconstructing the relative height of peaks in the spectrum, and at l = 64 these are greatly minimised.
To evaluate the quality of our model's latent space, we can look at the distribution of points encoded into the latent space. In figure 8 we can see the type of [...] all strongly multimodal. For l = 32 and 64 some of our latent dimensions have a unimodal distribution, but the majority have multimodal distributions, and there is some complexity in the distributions. There are spikes present for l = 32, suggesting that some mode collapse has occurred (i.e. multiple measurements mapped to the exact same place in the latent space), but not for l = 64. For l = 64 there are no gaps in the latent space, although there are areas of very low point density between parts of the distribution in a few of the latent dimensions, whereas l = 32 has two areas of nearly zero density, suggesting a gap in the latent space. These qualitative assessments suggest that our l = 64 model can be used for generative modelling, as we can smoothly interpolate between different areas of the latent space without discontinuities, whereas the smaller l = 8 is unsuitable and l = 32 would be suitable for most areas but would struggle around its discontinuities.

Results of synthetic experiments
To carry out a synthetic experiment, we use our tool-to-latent model, $z = f(t)$, to produce latent representations, $z$, and our two decoder branches, $i = g(z)$ and $s = h(z)$, to generate images and spectra. We can generate an image/spectra pair for one experimental point in 0.13 s/0.79 s on GPU/CPU, a batch of 128 points in 0.25 s/51.22 s, and a batch of 1024 in 1.34 s on an A100 GPU. In the simplest form, we can generate the expected spectrum and image at a desired set of powers, pressures and gas mixture. We can also easily perform more complex experiments in which we sweep across parameters in fine steps very quickly. In figure 9a we show the variation in the $I_{811.5}/I_{750.5}$ ratio with power at pressures between 5 and 100 mT; at 10 and 60 mT we also plot the ratio at points in the dataset that are close to the sweep. The points in the data are reasonably close to the generated data and follow the same trend. The overall trend is in agreement with experimental data by Czerwiec and Graves [57], although their reactor had a significantly different geometry: the ratio rises linearly with power up to the E-H mode transition at around 500-600 W and then decreases. Their data is at higher pressures, above 100 mT, and shows no change with pressure, whereas our model shows a strong increase in $I_{811.5}/I_{750.5}$ from 10-40 mT, followed by similar behaviour with little change at higher pressures.
In figure 9b we show the variation in the $I_{844.6}/I_{750.5}$ ratio with power at pressures between 5 and 100 mT; at 20 and 50 mT we also plot the ratio at points in the dataset that are close to the sweep. The measured points show general agreement with the generated trends, but their scatter is quite high. The overall trend in the $I_{844.6}/I_{750.5}$ ratio, a relatively linear rise with applied power, is in good agreement with earlier work by Fuller et al. [58].
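The line ratios in figure 9 only require the intensity of each emission line to be extracted from the generated spectra. The sketch below assumes each intensity is taken as the peak value within ±0.5 nm of the line centre (the extraction method used for figure 9 is not stated) and adds a helper that builds the power-sweep settings; the pressure list is hypothetical. In the full pipeline each sweep row would also carry the fixed table power and gas flows, be normalised like the tool readbacks, and be passed through the tool-to-latent model and the spectra decoder.

```python
import numpy as np

def line_intensity(wavelengths, spectrum, centre_nm, half_width_nm=0.5):
    """Peak intensity within +/- half_width_nm of a line centre."""
    mask = np.abs(wavelengths - centre_nm) <= half_width_nm
    return float(spectrum[mask].max())

def line_ratio(wavelengths, spectrum, num_nm, den_nm, half_width_nm=0.5):
    """Ratio of two emission-line intensities, e.g. I_811.5 / I_750.4."""
    return (line_intensity(wavelengths, spectrum, num_nm, half_width_nm)
            / line_intensity(wavelengths, spectrum, den_nm, half_width_nm))

def power_sweep(pressures_mT, p_min=400.0, p_max=3000.0, steps=1024):
    """Rows of (ICP power [W], pressure [mT]) for a power sweep at each pressure."""
    power = np.linspace(p_min, p_max, steps)
    return np.vstack([np.column_stack([power, np.full(steps, p)])
                      for p in pressures_mT])

# Synthetic check: two Gaussian lines with a known 2:1 amplitude ratio
wl = np.arange(700.0, 900.0, 0.1)
spec = (2.0 * np.exp(-((wl - 811.5) / 0.3) ** 2)
        + 1.0 * np.exp(-((wl - 750.4) / 0.3) ** 2))
ratio = line_ratio(wl, spec, 811.5, 750.4)
print(round(ratio, 3))  # prints 2.0
sweep = power_sweep(pressures_mT=[10, 20, 40, 60, 100])  # hypothetical pressures
print(sweep.shape)
```

Taking the maximum in a small window rather than a single pixel makes the ratio insensitive to small wavelength-calibration offsets; integrating the line area would be an equally reasonable choice.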

Limitations of the model and future work
The encoder model is able to embed any image/spectra pair into the latent space, and the pair can be very accurately decoded back into the real measurement space.
Differences between the real plasma conditions of these measurements are represented by different coordinates in the latent space. When using the encoder model to monitor a plasma, the latent representation will capture dynamic changes in the plasma over time. However, our tool-to-latent model is very simplistic: it can only map a set of powers, gas flows and pressures to their average coordinate in the latent space, and cannot capture any dynamics.
Figure 10 shows an example of this at two pressures in a CF4/O2 plasma: the spectrum generated at the latent coordinate produced by the tool-to-latent model, and the 10 nearest spectra to this point in the dataset. At 40 mT there is high variation in the spectra in this region, as each point will have had a different history and will be at a different point of rising or falling power in the data collection sweep. This is reflected in the latent representations of these plasmas, but our tool-to-latent model finds a latent representation that produces an average of these spectra. At 20 mT there is much less variation in both the measured spectra and the latent representations, so there is close agreement between all the measured and generated spectra.
To overcome this issue we do not need to modify the autoencoder model itself: the latent representation is capable of representing changes in the plasma and does not collapse to a single point for similar plasmas. Instead, we would need to replace our simple tool-to-latent model with a more complex model that accounts for the trajectory of powers and pressures in the experiment. This could be achieved with a sequence-to-sequence model, where the sequence of output latent representations is able to account for previous conditions. This highlights one of the advantages of this approach: unsupervised learning allows us to easily disaggregate different parts of a problem and combine parts of our autoencoder with different models to achieve different goals, and these models can be trained with different data sources, even where data is much more limited or measurements are more difficult.
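The comparison in figure 10, between a generated latent vector and the latents of the nearest measured points, can be sketched in NumPy as below. This assumes "nearest" means Euclidean distance in normalised tool-setting space, which is our reading of the figure rather than a stated detail; the data here is synthetic, and the function names are hypothetical.

```python
import numpy as np

def nearest_indices(query, points, k=10):
    """Indices of the k rows of points closest (Euclidean) to query, nearest first."""
    dist = np.linalg.norm(points - query, axis=1)
    idx = np.argpartition(dist, k)[:k]        # the k smallest distances, unordered
    return idx[np.argsort(dist[idx])]

def latent_spread(t_query, t_data, z_data, z_generated, k=10):
    """Compare a generated latent vector with the latents of the k nearest setpoints."""
    z_near = z_data[nearest_indices(t_query, t_data, k)]
    return {
        # distance of the generated latent from the neighbours' average
        "mean_offset": np.abs(z_near.mean(axis=0) - z_generated),
        # scatter among neighbours, e.g. from differing plasma history
        "spread": z_near.std(axis=0),
    }

rng = np.random.default_rng(0)
t_data = rng.random((200, 5))                 # normalised tool settings (synthetic)
z_data = rng.normal(size=(200, 8))            # their latent codes (synthetic)
# Stand-in for the tool-to-latent output: the neighbours' average latent
z_gen = z_data[nearest_indices(t_data[3], t_data)].mean(axis=0)
result = latent_spread(t_data[3], t_data, z_data, z_gen)
print(result["spread"].round(2))
```

A large spread with a small mean offset is the signature described at 40 mT: the simple tool-to-latent model lands on the average of neighbours whose differences come from their history.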

Open source release of the dataset, trained models and code
The underlying dataset is available at https://doi.org/10.5281/zenodo.7704879, configured as the train/validation/test splits used in the paper, and is released under the Creative Commons Attribution 4.0 International license. The model code and trained models are available here and are released under the MIT license. An example notebook using the model is available here and is released under the MIT license.

Conclusion
We have demonstrated that recent advances in generative modelling can be applied to optical diagnostics in low-temperature plasmas. These approaches require heavily automated experiments so that large amounts of data can be gathered in a reasonable amount of time. Large autoencoder models can be trained using existing open source libraries and model architectures, at low cost on cloud GPUs or in a relatively short time on local GPU clusters.
We have shown that the latent space of autoencoders trained on real plasma diagnostic data is very sensitive to the size of the latent space. Any implicit bias towards producing a model with the smallest number of parameters must be balanced against ensuring that the latent space is smooth and interpolatable, if the model is to be useful or have any capacity for generalisation.
Once trained, these autoencoders provide a low-cost method of generating large volumes of synthetic data for use in other work, such as validating or creating models. This is achieved by training an additional model to sample the latent space in the way required for the synthetic experiment.
We have demonstrated this capability with a simple model to map tool inputs into the latent space and generate synthetic data that shows good agreement with experimental data in argon and Ar/O2 plasmas.
Large autoencoders can become a foundational building block for a wide array of plasma physics experiments and models when trained with large datasets of simple but information-dense diagnostics. The encoder can produce latent representations of diagnostics that are smoothly interpolatable and that sensibly separate similar and dissimilar plasmas. These latent representations can be used for monitoring experiments or as inputs for other predictive models. The decoder can produce realistic and accurate data from latent representations and can be extended with auxiliary models to make a powerful generative model for synthetic experiments, which we aim to exploit in future work.

Figure 6: Measured OES and reconstructions in Ar/O2 for l = 4 and 64, CF4/O2, and SF6/O2 plasmas. The green line is the measured spectrum and the blue line is the reconstructed spectrum. Given the difficulty of telling them apart, the red line below shows the mean squared error at each wavelength.

Figure 8: Histograms of the distribution of points in each latent dimension for all image/spectra pairs in the test set.

Figure 10: Generated spectra and first 10 latent coordinates at 2400 W ICP, 300 W table, 10 sccm O2 and 10 sccm CF4, and spectra of the 10 nearest points in the dataset. In the upper plots, the solid line is the generated spectrum and the dotted lines are the 10 nearest. In the lower plots, the first 10 bars are the latent coordinates of the 10 nearest points, the black line is their average and the red line is the generated latent coordinate.

Table 3: L2 discrepancy of different sampling methods in 5 dimensions (lower is better, bold is best).

Table 4: Settings for autoencoder model training and fine-tuning.

Table 5: Results of autoencoder model training.
Figure 9 shows a simple experiment in which we sweep the power applied to the ICP source from 400-3000 W in 1024 steps, in pure argon and in 8 sccm Ar/50 sccm O2, at different pressures from 5-100 mT. We plot the line ratio of the Ar 811.5 nm and 750.4 nm lines in pure Ar, and the ratio of the O 844.6 nm and Ar 750.4 nm lines in the Ar/O2 mixture.