Optimization of high-temperature superconducting multilayer films using artificial intelligence

We have studied the possibility of utilizing artificial intelligence (AI) models to optimize high-temperature superconducting (HTS) multilayer structures for applications operating in a specific field and temperature range. For this, we propose a new vortex dynamics simulation method that enables unprecedented efficiency in the sampling of the training data required by the AI models. The performance of several different types of AI models has been studied, including kernel ridge regression (KRR), gradient-boosted decision trees (GBDT) and neural networks. Of these, the GBDT-based model was observed to be clearly the best suited for the associated problem. We have demonstrated the use of GBDT for finding the optimal multilayer structure at 10 K under a 1 T field. The GBDT model predicts that simple doped-undoped bilayer structures, where the vast majority of the film is undoped superconductor, provide the best performance under the given environment. The obtained results coincide well with our previous studies, providing further validation for the use of AI in the associated problem. We generally consider the AI models highly efficient tools for the broad-scale optimization of HTS multilayer structures and suggest their use as the foremost method to further push the limits of HTS films for specific applications.


Introduction
The high-temperature superconductors (HTS) possess enormous potential for various applications in a wide range of science and technology. The HTS materials are most prominently applied in the form of thin films, typically referred to as coated conductors, in various power and magnet applications [1]. Moreover, HTS films are considered attractive candidates for various microwave applications associated with quantum technology and dark matter research due to their unequaled tolerance to magnetic fields [2,3].
The quantization of magnetic flux into discrete vortices is a characteristic property of the HTS materials. The presence of vortices has turned out to pose the greatest challenge for the applicability of these materials, as the current applied through the HTS film drives the vortices into motion, which induces power dissipation within the superconducting lattice. As a consequence, and in particular under high magnetic fields, the critical current density (J_c) of the HTS film is limited by this vortex-motion-induced power loss. The overall dynamics of the vortices is also greatly affected by the associated temperature range, making the magnetic field and temperature the most critical parameters affecting the performance of HTS materials in various applications.
The J_c under high-field conditions can be effectively increased by doping the superconductor with non-superconducting impurities forming nano-sized defects. These are able to strongly pin the vortices and thus reduce their unfavorable motion. These structures are typically referred to as artificial pinning centers (APC) and have been an extensive topic of research in the HTS community. The currently known APCs can be divided into two distinct classes. The first class comprises randomly distributed (uncorrelated) isotropic pinning centers, i.e. nanodots, such as Y2O3 [4] and BaCeO3 [5]. The second class comprises correlated defects: materials which, under optimized deposition conditions, self-assemble into a lattice of parallel-oriented nanorods. Most prominently, these materials include BaHfO3 [6], BaZrO3 [7], BaSnO3 [8], Ba2YTaO6 [9] and Ba2YNbO6 [10]. Nanorod-like correlated pinning structures can also be created by irradiating the superconducting films with high-energy ions [11].
The nanorod structures, in particular, have proven to be extremely effective at improving the performance of the HTS films by tremendously increasing the J_c in the high-field range. Different properties of the nanorods, and the resulting vortex pinning performance, can be effectively tuned to fit the necessary requirements. In particular, the nanorod radius has been shown to be solely determined by the elastic strain between the HTS, APC and substrate unit cells [12] and can thus be controlled by the choice of the associated materials. The density of the nanorod lattice, on the other hand, can be independently controlled by changing the initial doping concentration of the target material [13].
Despite increasing the J_c in the high-field range, an increasing amount of APCs within the superconducting lattice correspondingly degrades the overall superconducting properties of the film. This is manifested as a significantly decreased zero-field critical current density and critical temperature. Thus, the pinning landscape has to be carefully optimized for a specific application operating under given magnetic field and temperature ranges. Lately, increasing attention has been put into growing different multilayer (ML) structures [14-19] that would combine the desirable low-field performance of undoped or slightly doped films with the favorable high-field performance of the more heavily doped films. Despite the lack of any rigorous optimization, these ML solutions have already outperformed the corresponding single layer films. Moreover, we have provided strong theoretical evidence for the enhanced superconducting properties of bilayer structures when compared with single layer films in our previous work [20]. As the optimal ML structure can be expected to outperform any single layer film, finding the optimal structure within a set of ML candidates is of particular interest.
The main difficulty associated with MLs is identifying the optimal structure for a specific field and temperature range. In addition to a large variety of different dopants and concentrations, one also has to consider different possibilities for the total number of layers and the associated thicknesses. The extensive amount of different combinations results in a massive search space from which the optimal structure has to be determined. The sampling of such a large search space is evidently to be addressed computationally and ultimately requires the evaluation of both the zero-field (J_c,0) and vortex dynamics-limited critical current (J_c,v) for every plausible ML structure.
In our previous work, we proposed a simplified model to evaluate J_c,0 as a function of APC concentration. This can be done with negligible computational effort by considering the cross-sectional area reduced by the APCs via simple surface geometry. The J_c,v, on the other hand, is much more difficult to evaluate. This is because, in addition to the pinning potentials, one has to consider the vortex-vortex interactions, which result in high computational costs, in particular in the high-field range in the presence of many vortices. Effective broad-scale ML structure optimization requires the J_c,v to be sampled with minimal computational cost. Several approaches have been taken to model the J_c,v, most notably including numerical solutions of the Ginzburg-Landau (GL) equations using various different methods [21-25]. Despite being a low-level model based on the basic assumptions of the free energy associated with superconductivity, and thus physically the most accurate approach, obtaining the numerical solutions to the GL equations is computationally an extremely heavy task. Thus, this approach is not well suited for the extensive sampling required for optimization purposes. To date, molecular dynamics (MD) based upper-level models, where the vortices are modeled as classical objects that interact with the environment based on formulas originating from the underlying GL theory, have turned out to be the most efficient approach to simulate J_c,v [26-31]. In fact, we have previously addressed the optimization of the layer thicknesses in a simple bilayer structure with fixed APC size and concentration using our own MD-based model [20]. However, the MD-based models are still too slow for the sampling of large datasets, preventing their use for effective broad-scale optimization considering a wide range of different ML structures and associated APC configurations.
The number of ML structures to be sampled for the broad-scale optimization problem could be significantly reduced by the use of artificial intelligence (AI) models. The AI models can be trained with a dataset significantly smaller than the associated search space. The AI would then learn to associate an arbitrary ML structure with a certain fitness value corresponding to some combination of J_c,0 and J_c,v.
The objective of this work is to propose a new simulation method that enables the unprecedentedly fast evaluation of J_c,v required for efficient ML optimization. This model is then used to evaluate the J_c,v, along with the J_c,0, for a set of 20 000 randomly generated YBa2Cu3O6+x (YBCO) based ML structures with varying numbers of layers (1-5), layer thicknesses and APC types (nanorod, nanodot) and concentrations (0%-10%). The resulting dataset is then used to train three different types of AI models: kernel ridge regression (KRR), gradient-boosted decision tree (GBDT) and neural network (NN). All of the used models were optimized so that their hyperparameter combinations provided the best overall functional properties. The GBDT was determined to be the best performing model and was further used for feature analysis to establish which specific ML parameters correlate most strongly with the favorable properties. Based on the feature analysis, we have determined that simple bilayer structures are associated with the highest combinations of J_c,0 and J_c,v. A rigorous search of the properties associated with the optimal bilayer structures was performed. Our results further boost the increasing interest towards the use of AI models within the field of applied superconductivity [32].

Simulation of J_c,v
The proposed model for J_c,v is based on calculating the vortex potential energy distribution within a two-dimensional xy-grid oriented along the ab-plane of YBCO, that is, along the surface of a typical YBCO film. The pinning landscape of an arbitrary ML structure is projected onto the two-dimensional grid by randomly placing pinning centers of different radii (R) and lengths (l_z) according to the associated dopant concentrations (ρ) and layer thicknesses (t). The pinning potential at position r was calculated as a sum over the individual wells of a total of N_ps pinning sites, indexed by i (equation (1)) [33], where ε_0 = Φ_0^2/(4πμ_0 λ(T)^2) is an energy constant. The λ(T) and ξ(T) are the temperature dependent magnetic penetration depth and coherence length along the ab-plane of YBCO, respectively [33].
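The grid construction described above can be sketched as follows. This is illustrative Python only: since the exact form of the pinning potential (equation (1)) is given in the paper, a simple Gaussian well of depth ε_0·l_z and width max(R, ξ) stands in for it here, and all names (`pinning_map`, `eps0`, etc.) are ours, not taken from the actual simulation code.

```python
import numpy as np

PHI0 = 2.067e-15          # flux quantum (Wb)
MU0 = 4e-7 * np.pi        # vacuum permeability

def eps0(lam):
    """Energy scale eps0 = Phi0^2 / (4*pi*mu0*lambda^2) per unit length (J/m)."""
    return PHI0**2 / (4 * np.pi * MU0 * lam**2)

def pinning_map(L, n, sites, lam, xi):
    """Pinning potential on an n x n periodic grid of side L (m).
    sites = [(x, y, R, l_z), ...]; a Gaussian well stands in for equation (1)."""
    xs = np.linspace(0, L, n)
    X, Y = np.meshgrid(xs, xs)
    E = np.zeros_like(X)
    e0 = eps0(lam)
    for (x0, y0, R, lz) in sites:
        # periodic minimum-image distances along x and y
        dx = (X - x0 + L / 2) % L - L / 2
        dy = (Y - y0 + L / 2) % L - L / 2
        w = max(R, xi)
        E -= e0 * lz * np.exp(-(dx**2 + dy**2) / (2 * w**2))
    return E

# one nanorod of radius 3 nm spanning a 200 nm thick film
E = pinning_map(150e-9, 150, [(50e-9, 75e-9, 3e-9, 200e-9)],
                lam=150e-9, xi=2e-9)
```

The well depth scales with l_z, so a rod confined to a thin layer automatically pins more weakly than one spanning the whole film, which is how the layer thicknesses enter the projected two-dimensional landscape.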
As presented here, our simulation model only considers a magnetic field oriented perpendicular to the two-dimensional simulation grid, that is, along the YBCO c-axis. Depending on the area of the grid (A) and the desired field (B), a total of N = BA/Φ_0 vortices were placed one by one at the calculated energy minima of the grid. As more vortices are added, one also takes into consideration the vortex-vortex interaction potential

E_vv(r) = 2 ε_0 t_f Σ_i K_0(|r − r_i|/λ(T)),     (2)

where the index i runs through a total of N_v vortices located at positions r_i, t_f is the total thickness of the film and K_0 is the zeroth-order modified Bessel function. It should be pointed out that, since the ξ determines the radius of the vortex core, the form of equation (2) allows the creation of a vortex associated with two flux quanta. However, the energy required for the double flux quanta vortices to appear is so large that this was never observed in the performed simulations. The total potential at position r is the sum of equations (1) and (2). Periodic boundary conditions were applied along the xy-plane (ab-plane) of the grid. The generation of the initial state is further illustrated in figure 1, which presents the evolution of the potential maps as the vortices are inserted one by one into a grid containing a total of nine nanorods.
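The sequential insertion step can be sketched as below: each new vortex is dropped at the current global minimum of the total (pinning plus vortex-vortex) potential, and the K_0 interaction term of equation (2) is then added to the map. The function name and arguments are illustrative, not from the paper's code.

```python
import numpy as np
from scipy.special import k0

def add_vortices(E_pin, L, n_v, lam, e0, t_f):
    """Place n_v vortices one by one at the minima of the running total
    potential; the pair interaction is 2*e0*t_f*K0(r/lambda) (equation (2))."""
    n = E_pin.shape[0]
    xs = np.linspace(0, L, n)
    X, Y = np.meshgrid(xs, xs)
    E = E_pin.copy()
    vortices = []
    for _ in range(n_v):
        j = np.unravel_index(np.argmin(E), E.shape)
        xv, yv = X[j], Y[j]
        vortices.append((xv, yv))
        dx = (X - xv + L / 2) % L - L / 2   # periodic minimum image
        dy = (Y - yv + L / 2) % L - L / 2
        r = np.sqrt(dx**2 + dy**2)
        r[r < 1e-12] = 1e-12                # avoid the K0 divergence at r = 0
        E += 2 * e0 * t_f * k0(r / lam)
    return vortices, E
```

Because K_0 diverges logarithmically at short range, the running total potential becomes strongly repulsive at occupied sites, which is why double-flux-quantum vortices never appeared in practice.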
Once the initial state of the system has been created, the associated potential wells of the vortices along the x-direction (E_i(x)) are calculated. The calculation of the potential well for a vortex located at (x_0, y_0) is illustrated in figure 2(a), where one defines the scanning area via the parameters Δx and Δy. The energy of the vortex at positions x ∈ [x_0 − Δx/2, x_0 + Δx/2] is determined by calculating the energies at (x, y) = {x, y | y ∈ [y_0 − Δy/2, y_0 + Δy/2]} and choosing the corresponding minimum energy position, resulting in the pinning potential E_i(x).
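On a discrete grid, this scan reduces to taking a column-wise minimum over a periodic window of the potential map, as in the following sketch (index conventions and names are ours):

```python
import numpy as np

def well_profile(E, ix0, iy0, half_x, half_y):
    """E: 2D potential map indexed [iy, ix]. Returns E_i(x) over the x-window
    [ix0 - half_x, ix0 + half_x], minimizing over the y-window (periodic)."""
    n_y, n_x = E.shape
    xs = np.arange(ix0 - half_x, ix0 + half_x + 1) % n_x
    ys = np.arange(iy0 - half_y, iy0 + half_y + 1) % n_y
    # for each scanned x column, take the minimum over the scanned y rows
    return E[np.ix_(ys, xs)].min(axis=0)
```

Allowing the vortex to relax in y at each x mimics a vortex sliding sideways around a pinning site rather than being forced straight through it.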
After calculating the individual potential wells, one calculates the effective potential well (E(x)) by averaging the individual E_i(x)s together, as illustrated in figure 2(b). Notice that, as the vortices are placed in the energy minimum positions, the center of the E(x)-curve is also a minimum. The J_c,v is then determined by assigning different values of J in the tilted potential E(x) + J·x, where x has values ranging from 0 to the desired length of the grid along the x-direction and thus J·x represents the applied electric potential manifested as the current through the film.

We have validated the proposed simulation model by studying the evolution of J_c,v as a function of nanorod radius and magnetic field. The exact details and results of the validation are presented in the supplementary data (SD). In summary, the simulation successfully captures the essential experimentally observed features associated with vortex dynamics. However, the proposed model, as it has been presented here, contains some flaws that should be pointed out. These are discussed in detail in the following section.
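The tilt-and-scan step can be sketched as follows. We assume here, as a reading of the washboard picture above, that J_c,v is identified with the smallest tilt J at which the averaged well no longer presents a barrier to the right of its minimum; the proportionality between this tilt and the physical current density is not reproduced in this illustration.

```python
import numpy as np

def jcv_from_well(E_avg, dx, j_grid):
    """Scan trial tilts J (ascending) over the averaged well E_avg sampled
    with spacing dx; return the first J for which the pinning barrier is gone."""
    x = np.arange(E_avg.size) * dx
    i0 = np.argmin(E_avg)            # center of the effective well
    for J in j_grid:
        tilted = E_avg - J * x       # current tilts the washboard potential
        right = tilted[i0:]
        # the barrier is gone once no point right of the minimum lies higher
        if right.max() <= right[0]:
            return J
    return j_grid[-1]
```

For a parabolic test well (x − 1)^2 on x ∈ [0, 2], the barrier vanishes at J = 1, which the scan recovers to within the grid resolution.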

Limitations
The outstanding computational performance of the proposed simulation model naturally comes at the cost of reduced physical accuracy when compared with, e.g. the MD-based models. For example, we have previously postulated that the J_c,v can be considered to be proportional to the probability that a dynamic vortex encounters a pinning center [34]. This probability is evidently correlated with the number of effective pinning centers within the superconducting lattice. The effect of the associated probability is explicitly taken into account by MD-based simulation models, where the vortices are set to a dynamical initial state and the possibility for flux creep is further enabled. On the contrary, this effect is completely neglected in the presented static vortex pinning model. However, one could also argue that the static approach is more appropriate when considering J_c,v. After all, the vortices can be assumed to be rather static in the absence of applied current. Whether the dynamical or static picture of vortex pinning reflects reality better than the other remains a matter of debate.
Another disadvantage associated with the proposed model, as it has been presented, is the lack of accounting for the elastic properties of the vortices. It should be pointed out that the overall elastic behavior of vortices under magnetic fields oriented perpendicular to the surface of the film is not well established. However, elasticity might be relevant for ML structures with a large number of layers in particular, as the possibility for multi-APC pinning would increase the average pinning forces. The effect of vortex bending is expected to be more pronounced for complex ML structures with high densities of APCs. However, these structures can also be associated with more pronounced flux creep, which partly compensates for the lack of vortex bending in the presented model.
Despite the associated weaknesses, the proposed model clearly captures the most essential aspects of vortex dynamics, as shown in the SD. Most importantly, the performance of the simulation, when implemented in Python, clearly outperforms the previously used numerical-GL and MD-based simulations. For a grid of 150 × 150 nm² at 10 K under a 1 T field, corresponding to a total of 10 distinct vortices, the simulation of an arbitrary ML structure takes about 7 min. If one is able to run 10 simulations in parallel, a dataset of size 20 000 can be created in ∼10 days. Such performance is definitely adequate to address the underlying optimization problem. We want to point out that the data generation could be boosted even further by implementing the proposed model in a lower-level programming language, such as C++, and optimizing the associated code more carefully. This might be of interest for future studies.
In conclusion, we consider the proposed simulation model well suited for the extensive sampling required by the broad-scale ML optimization due to its outstanding computational performance and adequate physical accuracy.

Addressing J_c,0
We suggest that the effect of APCs on J_c,0 is addressed by the same model used in our earlier study [34]. The associated model relies on the assumption that the J_c of the superconducting lattice itself is not affected by the presence of the dopants. Thus, the apparent decrease of J_c as a function of increasing dopant concentration is due to the reduced cross-sectional area for the supercurrent to pass through. As one has to consider only basic surface geometry, the evaluation of J_c,0 as a function of dopant concentration is easy and efficient.
In reality, the inclusion of APCs affects the intrinsic J_c of the superconducting lattice via induced strain fields. However, taking these effects into consideration is extremely difficult, making the above described model the only reasonable way, both in terms of physical accuracy and computational cost, to effectively address the J_c,0 of complex ML structures. Most importantly, experimental studies reporting a linear relationship between the dopant concentration and the critical temperature or J_c,0, including [13,35-37], coincide with the predictions of the proposed model presented below.
The critical current in the superconducting community is calculated as J_c = I_c/A_cross, where A_cross is the experimentally evaluated cross-sectional area for the supercurrent and is considered independent of the associated APC morphology. It follows that the APC morphology dependent J_c,0 can be expressed in terms of J_c,0(σ = 0), the zero-field current density of an undoped single layer film, and a(σ, R), which represents the reduced cross-sectional area and is some function of the APC density and radius. The value of J_c,0(σ = 0) has to be measured experimentally, while the approximate form of the function a(σ, R) can be deduced if the average dimensions of the associated APCs are known. In the case of nanorods, which are assumed to form a perfect two-dimensional square lattice along the surface of the superconducting film, a(σ, R) is obtained from the cross-sectional area occupied by the rods, where l is the length of the nanorod along the c-direction, assumed to be equal to the associated layer thickness, and N_nr is the total number of nanorods, which is related to the dopant concentration σ through the surface area of the film, A_ab.
In the case of nanodots, which in contrast are assumed to form a three-dimensional cubic lattice, the function takes an analogous form in which the total number of nanodots, N_nd, is related to the dopant concentration through the overall volume of the film, V_film. The provided formulas are the result of considerable simplifications. In particular, we suggest assigning the numerical value of σ in the above equations with caution when evaluating them based on experimental data. This is because the initial dopant concentration within the target used in the pulsed laser deposition (PLD) process has not been observed to correspond directly to the dopant concentration observed within the resulting film. Direct use of the target dopant concentrations would lead to a major underestimate of J_c,0(σ, R).
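A minimal numerical sketch of the geometric picture for nanorods is given below. The exact equations did not carry over into this excerpt, so the prefactors here are an assumption: we take σ as the volume fraction of APC material and the blocked cross-section as N_nr·πR²/A_ab; with l equal to the layer thickness this reduces to a linear J_c,0(σ) dependence, consistent with the linear relationships cited above.

```python
import math

def jc0_nanorods(jc0_undoped, sigma, R, l, A_ab, t_layer):
    """Zero-field Jc of a layer with nanorods of radius R and length l.
    Assumptions (illustrative, not the paper's exact equations):
      sigma  = volume fraction of APC material in the layer,
      blocked area fraction = N_nr * pi * R^2 / A_ab."""
    n_nr = sigma * A_ab * t_layer / (math.pi * R**2 * l)   # number of rods
    blocked = n_nr * math.pi * R**2 / A_ab                 # blocked fraction
    return jc0_undoped * (1.0 - blocked)
```

For σ = 0.05 and l equal to the layer thickness, the blocked fraction equals σ and J_c,0 drops by 5%, illustrating the linear trend.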

Dataset for AI
The dataset was generated by random sampling of ML structures, as schematically illustrated in figure 3. The total thickness of the ML film was fixed to 200 nm while limiting the number of layers to between 2 and 5. Note that the absolute thickness of the film is completely arbitrary, since the potentials (equations (1) and (2)) are scaled with the thicknesses of the individual layers. Thus, only the relative thicknesses of the layers with respect to each other are relevant. Each layer was either doped or undoped, where a doped layer was allowed to contain either nanorods or nanodots. The parameters of the APCs were chosen so that they corresponded to possible experimental realizations. The nanorods were considered perfectly c-axis oriented with a length equal to the associated layer thickness and a radius between 1 and 6 nm. The concentration of the nanorods varied between 0.01 and 0.12. In the case of nanodots, the associated height is naturally equal to the radius, which varied between 3 and 6 nm. The nanodot concentration was set between 0.01 and 0.07. The considered possibilities for the APC parameters are summarized in table 1.
A total of 23 500 ML structures were sampled, and each structure was assigned a J_c,0 and J_c,v at 10 K under a 1 T field using the above described methods. For training the AI models, each of the generated structures had to be associated with a single number (f) representing the fitness of the associated ML structure. Thus, a fitness function based on the evaluated values of J_c,0 and J_c,v had to be proposed. Here, we obviously want to maximize both of the associated currents, and thus the fitness function should be a sum of J_c,0 and J_c,v. However, depending on the desired field and temperature range, one of the currents might dominate the other. For example, in the high-field range, maximizing the J_c,v should be considered a priority. To take this into consideration, the J_c,0 and J_c,v should be scaled accordingly before the summation. The simplest way to do this is to first normalize the values of J_c,0 and J_c,v to lie between 0 and 1 within their separate sets. That is, the best ML structures associated with the zero-field current and the vortex pinning properties are associated with J_c,0 = 1 and J_c,v = 1, respectively. Then one can evaluate the associated fitness via the function

f = α J_c,0 + β J_c,v,     (4)

where the parameters α and β serve as phenomenological constants, the values of which have to be assigned based on the desired field and temperature range.
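The normalize-then-weight step can be expressed compactly (a sketch; the function name is ours):

```python
import numpy as np

def fitness(jc0, jcv, alpha=1.0, beta=1.0):
    """Min-max normalize Jc0 and Jcv over their separate sets, then combine
    them with the weights alpha and beta as in equation (4)."""
    jc0 = np.asarray(jc0, dtype=float)
    jcv = np.asarray(jcv, dtype=float)
    norm = lambda v: (v - v.min()) / (v.max() - v.min())
    return alpha * norm(jc0) + beta * norm(jcv)
```

With alpha = beta = 1 the best possible structure scores f = 2; changing beta rebalances the weight given to the vortex-pinning-limited current.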
We have described the ML structures for the AI models as 5 × 5 matrices, an example of which is schematically illustrated in figure 4. Each row of the descriptor matrix represents a single layer. The APC type (none, nanorod, nanodot) was encoded in the first two columns of the matrix, while the associated layer thicknesses, APC radii and concentrations were encoded in the following three columns. Zero padding was used to supplement the matrix elements in the case of ML structures with fewer than five layers. The matrices were later concatenated and inserted into the AI models as a single vector. The distribution of the generated data points as a function of the associated overall fitness score f = J_c,0 + J_c,v, corresponding to α = β = 1, is presented in figure 5(a). Moreover, the distributions of f = J_c,0 and f = J_c,v, the convolution of which evidently equals that of f = J_c,0 + J_c,v, are presented in figures 5(b) and (c), respectively. The calculated statistical quantities of the associated datasets are summarized in table 2.
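A descriptor encoding of this kind can be sketched as below; the exact column ordering and the two-column APC-type encoding are assumptions consistent with the description above, not the paper's literal code.

```python
import numpy as np

# two-column one-hot-style encoding of the APC type (assumed layout)
APC = {"none": (0, 0), "nanorod": (1, 0), "nanodot": (0, 1)}

def encode(layers):
    """layers: list of (apc_type, thickness_nm, radius_nm, concentration).
    Returns the 5 x 5 descriptor matrix flattened to a length-25 vector,
    zero-padded for structures with fewer than five layers."""
    M = np.zeros((5, 5))
    for i, (apc, t, r, c) in enumerate(layers):
        M[i] = [*APC[apc], t, r, c]
    return M.flatten()

# a doped-undoped bilayer: 50 nm nanorod-doped layer on 150 nm undoped YBCO
v = encode([("nanorod", 50, 3, 0.06), ("none", 150, 0, 0.0)])
```

The zero-padded rows let a single fixed-length vector represent structures with any number of layers up to five, which is what the fixed-input-size models require.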
The datasets are evidently non-uniform, mainly due to a bias towards lower values of f = J_c,v, as observed in figure 5(c). This bias can be attributed to the randomized sampling of the descriptors, as ML structures with poor pinning properties are more likely to be generated. As reduced pinning performance is associated with an increased critical current under zero field, this is also reflected in the distribution of f = J_c,0 as a slight bias towards higher values. Nevertheless, the attributes of the descriptors are evenly distributed over the whole dataset (SD). Although biased datasets are not optimal for data science, we will show that the associated dataset can be effectively used to carry out supervised regression with reliable results.

AI model preparation and training
To explore the suitability of AI algorithms for this data type, we have trained different supervised learning models associated with different levels of complexity. The models were trained using pairs of input (descriptor matrix) and output (fitness) data corresponding to the previously introduced dataset. First, a set of 20 000 data points was separated from the generated dataset to be used as a training set for the regression models. The remaining 3500 data points were used as a testing set to evaluate the performance of the trained models.
The simplest of the studied models was KRR, which has been observed to perform well in particular in many physics and chemistry related works [38,39]. Here, we have used the implementation of KRR provided by the Scikit-learn library. A detailed description of the KRR model and the associated hyperparameters is presented in the SD. The main advantage of KRR is its simplicity, manifested as a total of only four distinct hyperparameters. Thus, the performance of KRR can be optimized via the associated hyperparameters to a very high extent. In this work, the hyperparameters of the KRR were optimized using a grid search within the search space presented in table 3, which also lists the resulting optimal parameters. Additional information on the associated hyperparameters is given in the SD.
Next, we studied the performance of GBDTs, which are associated with significantly increased complexity when compared with KRR. We have used the GBDT implementation of the XGBoost library. The GBDTs are able to describe more complex relationships between the input and output data but are also more difficult to optimize due to the larger number of hyperparameters. As a result, one has to rely on Bayesian optimization instead of the grid search that was effectively used for KRR. In order to carry out effective hyperparameter optimization, we have strived to limit the search space by identifying the most important hyperparameters (SD). This was done by varying the value of a hyperparameter with the others set to their defaults and observing whether the variation had a remarkable effect on the performance of the trained model. The resulting search space, consisting of the six hyperparameters considered most influential for the associated problem, is presented in table 3. The Bayesian method was then used to optimize over this search space using different numbers of iterations (numbers of sampled parameter settings). The number of iterations was set to 1000, above which the improvement of the GBDT in terms of reduced MAE was insignificant for the underlying task. The optimized hyperparameters are presented in table 3. Again, detailed descriptions of the associated hyperparameters are given in the SD.

Table 3. The optimized hyperparameters for (a) kernel ridge regression (KRR), (b) gradient-boosted decision tree (GBDT) and (c) neural network (NN). In the case of GBDT and NN, all of the additional hyperparameters were kept as defaults. Additional information on the associated parameters is found in the SD and the documentation of the used models.
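A self-contained sketch of the GBDT search is given below. Note the substitutions: Scikit-learn's GradientBoostingRegressor and a randomized search stand in for the XGBoost model and the Bayesian optimizer actually used, so that the example runs without extra dependencies; the search space is illustrative, not the one in table 3.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(1)
X = rng.uniform(size=(400, 25))              # stand-in descriptors
y = X[:, 2] * 0.5 + np.sin(3 * X[:, 3])      # stand-in fitness

# a reduced search space over the few most influential hyperparameters
space = {"n_estimators": [50, 100, 200],
         "max_depth": [2, 3, 4],
         "learning_rate": [0.05, 0.1, 0.2],
         "subsample": [0.7, 1.0]}
search = RandomizedSearchCV(GradientBoostingRegressor(random_state=0), space,
                            n_iter=5, cv=3, random_state=0,
                            scoring="neg_mean_absolute_error")
search.fit(X[:300], y[:300])
mae = -search.score(X[300:], y[300:])        # held-out MAE
```

A Bayesian optimizer replaces the random sampling above with a model-guided choice of the next setting to try, which is what makes 1000 iterations sufficient for the six-dimensional space of table 3.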
Finally, we have tested the performance of NNs, which are considered the most complex and advanced supervised learning models. Here, we used the multilayer perceptron (MLP) implementation provided by the Scikit-learn library, corresponding to a typical feedforward NN. Again, a detailed description of the principles of NNs is given in the SD. An important and difficult set of hyperparameters to optimize is the architecture of the network, that is, the sizes of the hidden layers. Thus, we began the hyperparameter optimization by finding the optimal network architecture together with the learning rate while the rest of the hyperparameters were set to their defaults. We have aimed to limit the total number and sizes of the hidden layers so that the total number of the associated parameters, which includes the number of nodes and their connections (SD), did not exceed 10% of the training set size. This results in the search space presented in table 3. After fixing the network architecture and learning rate to the values presented in table 3, we set the maximum number of epochs to 20 000, a value at which overfitting (SD) was not observed and beyond which a further increase did not provide any significant improvement in the accuracy of the network. Finally, we proceeded to optimize the L2 regularization term within the search space presented in table 3 via a grid search, with the previously addressed hyperparameters fixed to their optimized values and the others kept as defaults.
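The architecture-first stage of this procedure can be sketched as follows (synthetic stand-in data; the two candidate layouts are illustrative, not the search space of table 3):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(3)
X = rng.uniform(size=(400, 25))              # stand-in descriptors
y = X[:, 2] * 0.5 + np.sin(3 * X[:, 3])      # stand-in fitness

# try a few hidden-layer layouts at a fixed learning rate and keep the one
# with the lowest validation MAE; remaining hyperparameters stay at defaults
best = None
for arch in [(32,), (32, 16)]:
    net = MLPRegressor(hidden_layer_sizes=arch, learning_rate_init=1e-2,
                       max_iter=3000, random_state=0).fit(X[:300], y[:300])
    err = mean_absolute_error(y[300:], net.predict(X[300:]))
    if best is None or err < best[1]:
        best = (arch, err)
```

Fixing the architecture first, then tuning the epoch budget and the L2 term, keeps each stage of the search one- or two-dimensional instead of optimizing everything jointly.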
After the comprehensive hyperparameter optimization, the different AI models were compared with each other on the basis of prediction accuracy, measured in terms of the mean absolute error (MAE) of the fitness function. In order to evaluate whether the amount of training data is sufficiently large for the used models to produce good quality predictions in the first place, we have trained the models with different subsets of the original training data. The sizes of these subsets were iteratively increased up to 20 000 and the performances of the used models, in terms of MAE, were evaluated after each iteration. The resulting learning curves are presented in figure 6(a). The learning curves for all of the used models reach a flat plateau before the training set size of 20 000 is reached, indicating that the generated dataset is large enough. The GBDT and NN clearly outperform the KRR while having approximately equal accuracy when trained with the full training set. The performance of GBDT, however, is significantly better for small training set sizes.
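The learning-curve construction can be sketched as below (synthetic stand-in data and subset sizes; in the paper the subsets grow to 20 000 and all three models are evaluated):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(4)
X = rng.uniform(size=(500, 25))              # stand-in descriptors
y = X[:, 2] * 0.5 + np.sin(3 * X[:, 3])      # stand-in fitness
X_test, y_test = X[400:], y[400:]            # fixed held-out test set

# train on growing subsets and record the test MAE after each step;
# a flattening curve indicates the dataset is large enough
sizes = [50, 100, 200, 400]
maes = []
for s in sizes:
    model = GradientBoostingRegressor(random_state=0).fit(X[:s], y[:s])
    maes.append(mean_absolute_error(y_test, model.predict(X_test)))
```

Keeping the test set fixed across subset sizes ensures the curve reflects only the effect of the amount of training data.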
The regression plots of the AI models trained with the whole training set are illustrated in figures 6(b) and (c). All of the models make more accurate predictions for descriptors associated with small f, while for higher f the predicted values are more likely to be underestimated. This can be associated with the bias towards smaller values of f in the dataset, as described in the previous section. Comparing the regression plots in figures 6(b) and (c), the GBDT provides the best predictions over the whole range of f, with an associated R²-score of 0.89 compared to 0.88 for the NN and 0.85 for KRR. We consider the performance of GBDT, with an associated MAE of ∼0.029 a.u., adequate for use as a tool for ML structure optimization considering the uncertainties associated with the experimental realization of such ML structures.
Another advantage of GBDT is its remarkably better computational performance when compared with the other tested models. The GBDT was observed to be 15 times faster to train than KRR and an overwhelming 380 times faster than the NN. The superior computational performance of GBDT enables more efficient hyperparameter tuning, which should be considered of great importance for future works related to the topic.

AI-based optimization of YBCO heterostructures
In this section, we provide a simple example of how to utilize the previously optimized GBDT to deduce the properties of the optimal ML structure. We have trained the GBDT using the same dataset as described in section 3, which corresponds to an environment associated with a 10 K temperature and a 1 T applied field. However, here we have chosen the parameters α and β of the fitness function (equation (4)) differently. Since the values of J_c,0 and J_c,v were normalized to unity independently of each other, the values of α and β should reflect the estimated difference in the corresponding maximum values. Following the results of our previous study [20], the J_c,v associated with a doped single layer film can be considered to be up to ten times higher than the J_c,0 of an undoped single layer film. Thus, in order to balance the significance of J_c,0 and J_c,v, we have set α = 1 and β = 0.1.
The learning curve and regression plot associated with the retrained GBDT are presented in figures 7(a) and (b). Here, the obtained MAEs are significantly lower for all of the used training set sizes when compared with those presented in figure 6(a), corresponding to the GBDT trained with data associated with f = J_c,0 + J_c,v. Moreover, the regression plot shows a significantly better fit, with an associated R²-score of 0.98 compared to the previous score of 0.89. These observations can be explained by the fact that the J_c,v is associated with a higher standard error than the J_c,0. This is evident, since the J_c,v is obtained via simulation while the J_c,0 is analytically estimated. Since we suppressed the J_c,v here by setting β = 0.1, the associated error is scaled down accordingly. Thus, the model seemingly learns better.
First, we studied the importance of the different descriptor attributes using the feature-importance function built into the XGBoost library. The obtained results are presented in figure 8, from which one can draw the general conclusion that the attributes associated with the first and second layers have much greater importance than those of the other layers. In particular, the dopant concentrations and APC radii associated with the first and second layers are considered to have the greatest effect on the fitness score. Moreover, the thicknesses of these layers are also ranked high in the chart of figure 8. These observations suggest that the optimal ML structure could be a simple bilayer film.
In order to investigate the effect of the number of layers further, we have calculated the average fitness scores for descriptors with a given number of layers. This was done by first generating a total of 5000 new random descriptors, for which the associated fitness scores were predicted by the GBDT. Figure 9(a) presents the predicted average values and the corresponding standard errors of the mean for ML structures with a given number of layers. The average fitness score can be observed to increase as the total number of layers is reduced. This behavior in particular motivates us to investigate the properties of bilayer structures further.
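The sampling procedure can be sketched in plain Python; here `predict_fitness` is a hypothetical stand-in for the trained GBDT's prediction call, and the descriptor layout is simplified to three attributes per layer:

```python
import random
from statistics import mean, pstdev

def random_descriptor(n_layers, max_layers=5, n_attrs=3):
    """Random descriptor: one row per layer, zero padding for absent layers."""
    rows = [[random.random() for _ in range(n_attrs)] for _ in range(n_layers)]
    rows += [[0.0] * n_attrs] * (max_layers - n_layers)
    return rows

def average_fitness_by_layers(predict_fitness, n_samples=5000, max_layers=5):
    """Mean predicted fitness and standard error of the mean per layer count."""
    scores = {n: [] for n in range(1, max_layers + 1)}
    for _ in range(n_samples):
        n = random.randint(1, max_layers)
        scores[n].append(predict_fitness(random_descriptor(n)))
    return {n: (mean(s), pstdev(s) / len(s) ** 0.5)
            for n, s in scores.items() if s}
```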
Next, we wanted to know what kind of bilayer structures provide the best fitness scores. For this, we generated a total of 5000 random descriptors with the number of layers fixed to two, corresponding to the bilayer structures of interest. We found that bilayer structures with an undoped layer adjacent to the doped layer provide significantly better fitness scores than structures where both layers contain APCs. This is illustrated in figure 9(b). The obtained result is in line with our previous hypothesis that such a doped-undoped bilayer structure provides the best results in the associated field and temperature range [20]. Thus, we wanted to focus our study further on the properties of this particular type of structure.
After deducing that the doped-undoped bilayer provides the highest fitness scores, we have further investigated how the APC radius and concentration affect the fitness of the associated structures. Again, this was done by generating a new set of 5000 random doped-undoped bilayer structures for which the fitness scores were predicted by the GBDT. Figure 9(c) presents the average predicted fitness scores as a function of APC concentration and radius. Increasing the radius of the APCs can be observed to increase the fitness score, as expected. However, increasing the APC concentration of the doped layer is observed to decrease the fitness score, which at first appears to be a rather counterintuitive result. However, it clearly follows from the fact that Jc,v remains approximately constant for B < Bϕ, as demonstrated by the validation results of the simulation presented in the SD. This condition is mostly fulfilled for the chosen field range even in the case of the lowest dopant concentrations. On the contrary, Jc,0 decreases rapidly as a function of dopant concentration, as discussed in section 2.3. As Jc,v remains constant, the degradation of Jc,0 manifests as the observed decrease in fitness score as a function of APC concentration. This result suggests that the performance of the bilayer films can be substantially improved by tuning the APC concentration so that the corresponding Bϕ is as close to the desired applied field as possible.
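The matching-field condition invoked above follows from flux quantization, Bϕ = n·Φ0, where n is the areal density of the columnar pins. A minimal sketch of the tuning rule, with helper names of our own choosing:

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum (Wb)

def matching_field(areal_density):
    """Matching field B_phi (T) for an areal pin density n (1/m^2)."""
    return areal_density * PHI_0

def density_for_field(b_applied):
    """Areal pin density whose matching field equals the applied field (T)."""
    return b_applied / PHI_0

# Tuning the concentration so that B_phi ~ 1 T (the field considered here):
n_target = density_for_field(1.0)  # roughly 4.8e14 pins per m^2
```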
Finally, we wanted to study the optimal thickness of the doped layer (t_doped) in the doped-undoped bilayer structure in relation to the total thickness of the film, that is t_doped + t_undoped, where t_undoped is the thickness of the adjacent undoped layer. As in the previous paragraph, we have generated 5000 random doped-undoped bilayer structures and plotted the average fitness scores predicted by the GBDT within specified ranges of t_doped/(t_doped + t_undoped). The results are presented in figure 9(d), where the fitness score can be observed to increase as the relative thickness of the doped layer is decreased. This is precisely in line with our previous work [20], where we concluded that for an optimal doped-undoped bilayer structure the thickness of the doped layer should be remarkably smaller than that of the undoped layer. This result can be intuitively understood so that the vortex pinning properties of a bilayer film are sufficient even for very short nanorods along the c-axis. Increasing t_doped thus overshoots the need for efficient vortex pinning, resulting in an unnecessary decrease in Jc,0.
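The thickness sweep reduces to binning predicted scores by the doped fraction; the sketch below illustrates this binning, with function names and bin count as our own illustrative choices:

```python
def doped_fraction(t_doped, t_undoped):
    """Fraction of the total film thickness occupied by the doped layer."""
    return t_doped / (t_doped + t_undoped)

def binned_average(pairs, n_bins=10):
    """Average score per doped-fraction bin; `pairs` = [(fraction, score), ...]."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for frac, score in pairs:
        i = min(int(frac * n_bins), n_bins - 1)  # clamp frac = 1.0 into last bin
        sums[i] += score
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```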

Conclusions
We have investigated the possibility of utilizing AI models to optimize complex HTS ML films in specified field and temperature ranges. For this, we have proposed and validated a new type of simulation model that enables unprecedentedly fast simulation of the vortex-dynamics-limited critical current. We have also addressed the evaluation of the zero-field critical current based on our previous study [20]. The discussed methods were used to generate a dataset for training and testing several different types of AI models, including KRR, GBDT and NN. From these, the GBDT turned out to be the best option for the associated task.
We have demonstrated the use of GBDT for obtaining a general picture of the properties of the optimal ML structure at 10 K temperature and under 1 T applied field. The model suggests that the best performance is achieved with a doped-undoped bilayer structure in which the thickness of the undoped layer is substantially greater than that of the doped one. This result coincides well with the conclusions presented in our previous study [20], which, in addition to sensible and intuitive predictions, further reflects the validity of the proposed technique.
The ability of AI models to make generalized predictions based on a limited amount of training data makes them highly effective tools for various optimization tasks. In our case, the use of AI models allowed us to make reliable predictions based on a relatively small dataset that could be sampled within a reasonable timeframe. We highly encourage researchers within the superconducting community to take advantage of the presented simulation and the optimized GBDT as a guide prior to the experimental optimization of ML films.

Figure 1. A schematic illustration of preparing the initial state corresponding to a magnetic field B = 1.33 · Bϕ, where Bϕ is the matching field at which the number of vortices equals the number of effective pinning centers, for a grid of 9 nanorods. The figure presents the potential energy distributions (color bars) along the xy-plane of the grid, corresponding to the ab-plane of YBCO, as vortices are added one by one to the corresponding energy minima. For the figures corresponding to B < Bϕ, the positions of the nanorods can be seen as the dark blue low-energy regions, while the vortex cores are indicated by the sharp red high-energy regions.

Figure 2. (a) The calculation of the individual potential wells of the vortices. (b) The calculation of the effective pinning potential (E(x)) by averaging the previously calculated individual potential wells together. (c) The calculation of Jc,v by applying a linear potential to the previously calculated E(x). The Jc,v is determined as the J for which the probability of vortex thermal excitation over the potential barrier illustrated in the figure exceeds a given value.

Figure 3. A schematic illustration of an arbitrary multilayer structure penetrated by two vortices. The illustrated structure consists of four distinct layers, each of which is associated with different APC types and concentrations (including an intrinsic layer).

Figure 4. An example of a descriptor matrix used for the multilayer structures. The structural information of the individual layers is encoded into distinct rows of the matrix. Zero padding is used for structures with fewer than five layers.

Figure 5. The number of training and testing data points as a function of the associated fitness functions: (a) f = Jc,0 + Jc,v, (b) Jc,0 and (c) Jc,v. Note that figure (a) is a convolution of (b) and (c).

Figure 6. (a) The calculated mean absolute error (MAE) as a function of training set size (learning curve) for the studied AI models after hyperparameter optimization. The values of the MAEs represent the average of three statistical repetitions, while the error bars represent the associated standard deviation. (b), (c) The associated regression plots for KRR, GBDT and NN trained with the full training set of size 20 000, respectively.

Figure 7. (a) The calculated mean absolute error (MAE) as a function of training set size (learning curve) for the GBDT retrained using training data associated with the fitness function f = Jc,0 + 0.1 · Jc,v (α = 1 and β = 0.1). The used hyperparameters of the GBDT are the same as those optimized in section 4. The values of the MAEs represent the average of three statistical repetitions, while the error bars represent the associated standard deviation. (b) The associated regression plot of the GBDT when trained with the full training set of size 20 000.

Figure 8. The relative importance of the different descriptor attributes obtained using the built-in function of the XGBoost algorithm. The labels of the diagram indicate the corresponding layer (L#) and associated attribute. The APC type is not specified for the concentration and radius importances.

Figure 9. The average (normalized) fitness values predicted by the GBDT trained for the 10 K temperature and 1 T field environment. The average values are calculated from the predicted fitness values of 5000 randomly generated descriptor matrices. The error bars correspond to the associated standard error of the mean. (a) The average fitness score (normalized) for descriptors associated with different numbers of layers. (b) The average fitness scores for bilayer structures consisting of two adjacent doped layers (doped-doped) and of an undoped layer adjacent to a doped layer (doped-undoped). (c) The average fitness score associated with doped-undoped bilayer structures as a function of the APC radius and concentration within the doped layer. (d) The average fitness score (normalized) for doped-undoped bilayer structures as a function of the corresponding fraction of the doped layer. This fraction can be explicitly expressed as t_doped/(t_doped + t_undoped), where t_doped and t_undoped are the thicknesses of the doped and undoped layers, respectively.

Table 1. The possible choices of parameters for the different APC types, including radius (R), length along the z-axis (lz) and concentration (ρ).

Table 2. Statistical properties associated with the differently calculated fitness values (f) for the generated dataset.