Deep learning neural network for approaching Schrödinger problems with arbitrary two-dimensional confinement

This article presents an approach to the two-dimensional Schrödinger equation based on machine learning with neural networks. It is intended to determine the ground state of a particle confined in any two-dimensional potential, starting from the knowledge of the solutions to a large number of arbitrary sample problems. A network architecture with two hidden layers is proposed to predict the wave function and energy of the ground state. Several accuracy indicators are proposed for validating the estimates provided by the neural network. The testing of the trained network is done by applying it to a large set of confinement potentials different from those used in the learning process. Some particular cases with symmetrical potentials are solved as concrete examples, and a good network prediction accuracy is found.


Introduction
The speed of computerized data processing and the ability to analyze large datasets have increased exponentially as a direct consequence of the rapid development of processors and computing techniques. This evolution proved important not only from a quantitative perspective but also led to a paradigm shift in terms of methods and algorithms for solving difficult problems. In recent years, more and more categories of concrete technical problems, but also of problems of fundamental interest, have been addressed by new artificial intelligence methods, such as machine learning (ML) [1]. A concrete way in which this method can be implemented is with the help of neural networks (NNs), logical structures inspired by the functioning of the biological nervous system. With this approach, problems of automatic classification or recognition of shapes that are almost unapproachable by classical algorithms can be solved. However, various more abstract problems from fundamental disciplines have also proved to be approachable from this new perspective. Using NNs to solve problems that already admit more or less sophisticated conventional solutions can also be instructive and presents the potential for further extensions to more complicated contexts. One such relatively recent challenge of interest in nanomaterials science and quantum chemistry is to solve the Schrödinger Equation (SE) in one or more dimensions using artificial intelligence methods. Some advantages of using ML methods compared to existing numerical methods for quantum physics are: obtaining much faster estimates of the energy of particles confined in nontrivial potentials, better approaching many-body problems, and predicting phase transitions and properties of quantum systems in inaccessible physical conditions. ML predictive approaches, generally built upon statistical learning theory, represent a different paradigm from the classical methods for solving the SE, although they can be based on their results in the training stage.

Exact analytical solutions of the SE for nanostructures can be obtained in very few simple cases and are therefore of little practical relevance [2]. Approximate quasi-analytical solutions can be obtained using variational techniques [3] or perturbative methods [4], which have been extensively studied in the last century [5]. However, they are limited in accuracy and impractical for physical systems with nontrivial geometries, dimensionalities, and interactions. Asymptotic iteration methods can be an alternative for solving 1-D Schrödinger-type problems [6,7]. Meshless methods and diagonalization techniques provide good results; however, they are more demanding in terms of computational effort [8,9]. In parallel with the accelerated development of computers, numerical methods based on spatial discretizations and finite-difference approximations of the SE have been increasingly used [10][11][12]. Shooting methods use iterative solving of finite-difference equations with discrete energy values in a search interval and the selection of those that best meet the boundary conditions [13,14]. For all SE dimensionalities, an accurate and versatile mesh-based approach is the finite element method, which uses a weak formulation of the equation and involves large algebraic systems [15][16][17][18][19][20].
So far, there have been a few studies dealing with artificial intelligence methods for solving SE, the most important of which are mentioned below.
Lagaris et al. demonstrated how artificial NNs can be used to solve partial differential equations [21] and applied the concepts for solving the SE in several cases with different dimensionalities [22]. Sugawara proposed a new approach for solving the one-dimensional (1-D) SE by combining a genetic algorithm and an NN [23]. In a remarkable study, Mills et al. introduced a deep learning method for solving the two-dimensional (2-D) SE by calculating the ground and first excited states of an electron in different types of confining potentials (CPs) [24].
Vargas-Hernández et al. presented an ML method based on Gaussian process regression to predict sharp transitions in a Hamiltonian phase diagram by extrapolating the properties of quantum systems [25]. Han et al. solved the many-electron SE using a deep NN with wave function (WF) optimization through a Monte Carlo approach [26]. Using NNs, Mutuk addressed the eigenvalue problem of a 1-D anharmonic oscillator [27].
Manzhos reviewed recent ML techniques used to solve electronic and vibrational SEs, which are typically related to computational chemistry [28]. Hermann et al. proposed PauliNet, which is a deep NN representation of electronic WFs for molecules with up to 30 electrons, and proved that it can outperform variational quantum chemistry models [29]. Pfau et al. introduced a novel deep learning architecture, the Fermionic NN, to approach the many-electron SE [30]. Li et al. used an NN model to solve the SE by computing multiple excited states [31].
Grubišić et al. used a dense deep NN and a fully convolutional NN to approximate eigenmodes localized by a CP [32]. Yüksel et al. applied multilayer perceptron architectures to predict the ground-state binding energies of atomic nuclei [33]. In a study by da Silva Macedo et al., an NN was trained to predict the energy levels and energy-dependent masses as nonparabolic properties of semiconductor heterostructures [34]. The learning ability of a physics-informed proper orthogonal decomposition-Galerkin simulation methodology for QD structures was investigated by Veresko and Cheng [35]. In a recently published paper, we used two different neural architectures to approach the 1-D SE in quantum wells (QWs) with arbitrary CPs [36]. The results were compared and discussed using accuracy indicators and represent the starting point of the 2-D generalization addressed in the present study.
Beyond the theoretical interest, solving a 2-D confinement problem can be useful in practice, mainly for quantum wires [37,38], highly oblate or flat 3-D quantum dots [39,40], and 2-D quantum dots [41,42]. In the first case, quantum confinement occurs along the transverse directions of the wire, which is where the 2-D character of the SE comes from [43]. The 2-D SE energy solutions under the effective mass approximation give the subband edges in the quantum wires. In the second case, the 3-D SE specific to a quantum dot can be adiabatically decoupled into a 1-D problem of strong confinement along the small size of the nanostructure and a transverse 2-D problem with a modified potential [44]. In the third case, calculations of the electronic properties are usually performed by expressing the Hamiltonian in a basis set of atomic orbitals or by using density functional theory [45,46].
In this study, we propose a deep NN with two hidden layers (HLs) and thousands of subnets to estimate the ground state energy and wave function of a particle confined in an arbitrary 2-D QW. The NN was trained using a set of CPs, energies, and WFs previously generated using the finite element method (FEM). The NN can be understood as a set of separately trained subnets, one for each element of the position discretization. This makes the training process more transparent and allows for parallelization. Several accuracy indicators have been proposed for the NN testing. The subnets are trained on a large dataset (DS) using the stochastic gradient descent (SGD) method with variable data batches, and the training is validated with respect to a second, similar DS. The network was then tested with a third DS, prepared using a different algorithm. In addition, several cases of analytical CPs have been solved and discussed.
The contents of the work are as follows: Section 2 contains the statement of the problem and the mathematical principles of our approach; Section 3 presents in detail how the data samples are obtained, how the NN is trained, and how the results are validated and tested based on several accuracy indicators; general conclusions are given in Section 4; examples and technical details concerning the potential samples of the DSs are provided in the Appendix.

Two-dimensional Schrödinger equation and sample data
If a constant effective mass approach is used, the time-independent SE of a particle confined in a 2-D QW is

$$-\frac{\hbar^2}{2m^*}\left(\frac{\partial^2}{\partial x^2}+\frac{\partial^2}{\partial y^2}\right)\Psi(x,y) + V(x,y)\,\Psi(x,y) = E\,\Psi(x,y). \qquad (1)$$

The potential energy is defined such that $0 \le V(x,y) \le V_0$ and the discontinuity domain of $V(x,y)$ has a Lebesgue measure of zero. $V_0$ is the maximum "depth" of the QW, and $R_c$ is the outermost radius of the confinement zone, that is, of the circle including the subdomain of $\mathbb{R}^2$ where $V(x,y) < V_0$. Using $R_c$ as the unit of length and $V_0$ as the unit of energy, the SE can be written in the standardized dimensionless form

$$-\nabla^2\psi(x,y) + \beta\,v(x,y)\,\psi(x,y) = \beta\,\varepsilon\,\psi(x,y), \qquad (2)$$

where $v \equiv V/V_0$, $\varepsilon \equiv E/V_0$, and the scale factor is

$$\beta = \frac{2m^*V_0R_c^2}{\hbar^2}. \qquad (3)$$

We denote by $\{v_n\}_{n=1,\dots,N}: \mathbb{R}^2 \to [0,1]$ a set of $N$ dimensionless confinement functions defined in such a way as to ensure the existence of the ground bound state $\{\varepsilon_n, \psi_n\}$ for each of them. In any case of a bound state in the QW, the WF decreases exponentially towards zero outside the geometric confinement domain, such that $\lim_{r\to\infty}\psi = 0$, where $r(x,y) \equiv \sqrt{x^2+y^2}$ is the radial position. An FEM numerical solver using the mesh described in Section 3.1 was employed to compute the reference ground-state solutions $\{\varepsilon_n, \psi_n\}$ for all sample CPs.

Neural network architecture and underlying functions
Since the mesh used by the FEM can be extremely fine, the data sampling used for the NN may be done in a subset $\Xi_Q \equiv \{(x_q, y_q)\}_{q=1,\dots,Q}$ of the mesh, provided that it sufficiently covers the bounded domain on which the standardized SE is defined. The same mesh subset was used for sampling the input data (values of the CP functions) and estimating/predicting the output data (values of the ground state WFs). A deep NN with two HLs is proposed, as shown in Fig. 1. The mesh subset determines the number $Q$ of neural nodes in both the input and output layers. Figure 1(a) shows that the NN can be decomposed into $Q$ similar separate subnets, which allows easy parallelization of the calculations. Each subnet receives the same data input from all nodes in the input layer (IL) and has a single output node, which is the estimate of the WF at a single mesh point. The subnets are identical in their internal structure but differ in functionality: their neurons are not equivalent after training the network. Figure 1(b) shows the neural architecture of a single subnet enclosing two HLs, each with $K$ neurons.
In a subnet, all nodes of a layer are interconnected with all the nodes of the previous layer, and each neural connection is mathematically coded using a weight coefficient. The number of neural connections in a subnet is $K(Q+K+1)$, such that the total number of weights in the NN is $QK(Q+K+1)$.
The activation function of the neurons of the output layer (OL) is the widely used standard logistic sigmoid (sigm), whose codomain fits the interval into which the values of the range-normalized WFs fall. The neurons of the HLs have the hyperbolic tangent (tanh) as an activation function. We selected these related functions because they are continuously differentiable in $\mathbb{R}$ and have simple derivatives. Because the functioning of the entire NN can be reduced to the mathematics of its subnets, we explain the flow of data in a single generic subnet $q$. In the following expressions, all matrix operations are element-wise, except for the matrix product, explicitly denoted by "$\cdot$". Given a particular CP function $v_n$, the data sent by HL1 to HL2 is

$$(h_1)_q = \tanh\!\left(c_1\,\Lambda_1^{(q)} \cdot v_n(\Xi_Q)\right), \qquad (4)$$

where $(h_1)_q$ is a $K \times 1$ column vector, $c_1$ is a scale coefficient, $\Lambda_1^{(q)}$ is the $K \times Q$ weight matrix of HL1 of subnet $q$, and $v_n(\Xi_Q)$ denotes the $Q \times 1$ column vector of the sample CP data in the IL, that is, $[v_n(x_q, y_q)]_{q=1,\dots,Q}$.
The data sent by HL2 to the output neuron is

$$(h_2)_q = \tanh\!\left(c_2\,\Lambda_2^{(q)} \cdot (h_1)_q\right), \qquad (5)$$

where $(h_2)_q$ is a $K \times 1$ column vector, $c_2$ is a scale coefficient, and $\Lambda_2^{(q)}$ is the $K \times K$ weight matrix of HL2 of subnet $q$.
The estimated WF value at node $q$ of the submesh $\Xi_Q$, for the CP $v_n$, that is, the subunitary output of subnet $q$, is

$$\tilde{\psi}_n(x_q, y_q) = \mathrm{sigm}\!\left(\Lambda_3^{(q)} \cdot (h_2)_q\right), \qquad (6)$$

where $\Lambda_3^{(q)}$ is the $1 \times K$ weight-row vector of the OL of subnet $q$.
Combining Eqs. (4)-(6) into a compact expression, we get

$$\tilde{\psi}_n(x_q, y_q) = \mathrm{sigm}\!\left(\Lambda_3^{(q)} \cdot \tanh\!\left(c_2\,\Lambda_2^{(q)} \cdot \tanh\!\left(c_1\,\Lambda_1^{(q)} \cdot v_n(\Xi_Q)\right)\right)\right). \qquad (7)$$

An extra single subnet can be used for ground-level energy estimation. Formally replacing $\tilde{\psi}_n(x_q, y_q)$ with $\tilde{\varepsilon}_n$ and the index $q$ by $e$, we obtain

$$\tilde{\varepsilon}_n = \mathrm{sigm}\!\left(\Lambda_3^{(e)} \cdot \tanh\!\left(c_2\,\Lambda_2^{(e)} \cdot \tanh\!\left(c_1\,\Lambda_1^{(e)} \cdot v_n(\Xi_Q)\right)\right)\right). \qquad (8)$$

Further, we will use the notation $\tilde{\psi}_n(\Xi_Q) \equiv \tilde{\psi}_n$, i.e., the neural estimation of the FEM solution $\psi_n(\Xi_Q)$.
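For concreteness, the forward pass of a single subnet (Eqs. (4)-(7)) can be sketched in a few lines of numpy; the array names, the random initialization, and the unit scale coefficients below are illustrative assumptions, not the actual implementation used in this work.

```python
import numpy as np

def sigm(z):
    """Standard logistic sigmoid used in the output layer."""
    return 1.0 / (1.0 + np.exp(-z))

def subnet_forward(f_sample, L1, L2, L3, c1=1.0, c2=1.0):
    """Forward pass of one subnet (sketch of Eqs. 4-7).

    f_sample : (Q,) vector of CP values sampled on the mesh subset
    L1       : (K, Q) weight matrix of the first hidden layer
    L2       : (K, K) weight matrix of the second hidden layer
    L3       : (K,)   weight row vector of the output layer
    c1, c2   : scale coefficients of the hidden layers (assumed values)
    """
    h1 = np.tanh(c1 * (L1 @ f_sample))   # Eq. (4): first hidden layer
    h2 = np.tanh(c2 * (L2 @ h1))         # Eq. (5): second hidden layer
    return sigm(L3 @ h2)                 # Eq. (6): range-normalized WF estimate

# example with the sizes used in this work: Q = 2764 input nodes, K = 53 neurons
rng = np.random.default_rng(0)
Q, K = 2764, 53
f_sample = rng.random(Q)                 # stand-in for a sampled confinement function
L1, L2, L3 = rng.normal(size=(K, Q)), rng.normal(size=(K, K)), rng.normal(size=K)
print(subnet_forward(f_sample, L1, L2, L3))  # scalar estimate of the WF at node q
```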

Loss function, weight optimization and network training
Training the NN involves approaching the optimal values of the weight matrices such that the network estimates are as close as possible to the solutions expected from the training DS; that is, the loss function is minimized. Optimization is performed iteratively using the gradient descent (GD) method, starting from the initial values $\Lambda_1^{(q)}(0)$, $\Lambda_2^{(q)}(0)$, $\Lambda_3^{(q)}(0)$, $q \in \{1, 2, \dots, Q\}$. The updated weight matrices and the neural estimation of the solution after $\tau$ iterations will be denoted by $\Lambda_1^{(q)}(\tau)$, $\Lambda_2^{(q)}(\tau)$, $\Lambda_3^{(q)}(\tau)$, $q \in \{1, 2, \dots, Q\}$, and $\tilde{\psi}_n(\Xi_Q)(\tau)$, respectively.
In this study, the global NN loss function corresponding to the training DS is defined as

$$F(\tau) = \sum_{q=1}^{Q} F_q(\tau), \qquad (9a)$$

where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^N$ and $F_q(\tau)$ is the local subnet-$q$ loss function,

$$F_q(\tau) = \left\| \left[\tilde{\psi}_n(x_q, y_q)(\tau) - \psi_n(x_q, y_q)\right]_{n=1,\dots,N} \right\|^2 .$$

Additionally, the energy loss function can be defined in a similar way,

$$F_e(\tau) = \left\| \left[\tilde{\varepsilon}_n(\tau) - \varepsilon_n\right]_{n=1,\dots,N} \right\|^2, \qquad (9b)$$

where $\tilde{\varepsilon}_n(\tau)$ and $\varepsilon_n$ are the neural estimation of the energy after $\tau$ iterations and the expected energy, respectively.
Because each subnet is trained independently, the loss function can be minimized separately for each output node $q$, as presented below. The gradient components of the loss function $F_q(\tau)$ with respect to the weights are the $K \times Q$, $K \times K$, and $1 \times K$ matrices

$$\nabla_1^{(q)} F_q(\tau) = \frac{\partial F_q(\tau)}{\partial \Lambda_1^{(q)}}, \quad \nabla_2^{(q)} F_q(\tau) = \frac{\partial F_q(\tau)}{\partial \Lambda_2^{(q)}}, \quad \nabla_3^{(q)} F_q(\tau) = \frac{\partial F_q(\tau)}{\partial \Lambda_3^{(q)}},$$

respectively, and the gradient of the energy loss function $F_e(\tau)$ is defined analogously with respect to the weights of the energy subnet. The starting values of all the weight coefficients are randomly chosen and determine the initial values $F_q(0)$ and $F_e(0)$ of the subnet loss functions. The weight matrices and, implicitly, the loss functions are iteratively updated by a first-order approximation to minimize the losses in Eqs. (9a)-(9b), that is, the GD method:

$$\Lambda_i^{(q)}(\tau+1) = \Lambda_i^{(q)}(\tau) - \lambda\,\nabla_i^{(q)} F_q(\tau), \quad i \in \{1, 2, 3\}.$$

Here, $\lambda$ is the learning rate and $\tau$ is an integer index such that $0 \le \tau < T$, where $T$ denotes the maximum number of iterations. Theoretically, if the loss function decreases monotonically, the value of $T$ can be established based on the criterion imposed by the maximum allowed variation in the loss function from one iteration to another. Because working with DSs of hundreds of thousands of samples is, in practice, a very expensive computational burden, one may opt for the SGD variant of the GD method. This method replaces the gradient calculation based on the complete DS containing $N$ samples by an estimate calculated from a randomly selected batch of $N'$ samples ($N' \ll N$), which can be totally or partially changed at each iteration. The SGD method achieves faster iterations; however, the convergence has a lower rate and fluctuating behavior. Because the loss function has fluctuations that overlap with the average decreasing trend, $T$ should be imposed by the average behavior of the loss function and/or by the available computing resources.

The main accuracy indicator is the relative difference between the NN-estimated WF $\tilde{\psi}_n$ and the FEM-calculated solution $\psi_n$, that is, the WF relative deviation

$$\sigma_n = \frac{\left\|\tilde{\psi}_n - \psi_n\right\|}{\left\|\psi_n\right\|}.$$
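A minimal sketch of one SGD iteration for a single subnet is given below, assuming a squared-error loss averaged over the batch and computing the gradients by standard backpropagation through the tanh and sigmoid layers; the function names, the batch layout, and the learning-rate value are illustrative assumptions, not the code used for the reported results.

```python
import numpy as np

def sgd_step(F_batch, psi_batch, L1, L2, L3, lam=0.01, c1=1.0, c2=1.0):
    """One stochastic gradient descent step for a single subnet (sketch).

    F_batch   : (B, Q) batch of sampled CP functions
    psi_batch : (B,)   corresponding FEM wave-function values at node q
    Returns the updated weight matrices L1 (K, Q), L2 (K, K), L3 (K,).
    """
    B = F_batch.shape[0]
    # forward pass (Eqs. 4-6), intermediate activations kept for the backward pass
    h1 = np.tanh(c1 * (F_batch @ L1.T))                    # (B, K)
    h2 = np.tanh(c2 * (h1 @ L2.T))                         # (B, K)
    out = 1.0 / (1.0 + np.exp(-(h2 @ L3)))                 # (B,)
    # mean squared-error loss over the batch and its backpropagated gradients
    d_out = 2.0 * (out - psi_batch) * out * (1.0 - out) / B        # (B,)
    gL3 = d_out @ h2                                               # (K,)
    d_h2 = np.outer(d_out, L3) * (1.0 - h2**2) * c2                # (B, K)
    gL2 = d_h2.T @ h1                                              # (K, K)
    d_h1 = (d_h2 @ L2) * (1.0 - h1**2) * c1                        # (B, K)
    gL1 = d_h1.T @ F_batch                                         # (K, Q)
    # first-order (gradient descent) update with learning rate lam
    return L1 - lam * gL1, L2 - lam * gL2, L3 - lam * gL3
```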

Testing the network. Accuracy indicators
Several quantitative indicators for testing the trained NN are proposed, based on the dispersion from the expected values of the WF, the predicted energy, and the average position of the particle. The indicators are calculated for each sample $n$ of a DS ($1 \le n \le N$), and the network efficiency is determined by analyzing and comparing the distributions of these indicators over the DSs involved.
The spatial overlap of the exact and estimated WFs may be another indicator of the NN accuracy. The estimated WF relative spatial overlap is defined as

$$O_n = \frac{\left|\left\langle \tilde{\psi}_n, \psi_n \right\rangle\right|}{\left\|\tilde{\psi}_n\right\| \left\|\psi_n\right\|},$$

where $\langle\cdot,\cdot\rangle$ denotes the scalar product of the WF values over the mesh subset. The NN-estimated average positions and FEM-calculated average positions are compared by calculating the deviations of the average $x$ and $y$ positions, respectively:

$$\Delta\langle x\rangle_n = \langle x\rangle_{\tilde{\psi}_n} - \langle x\rangle_{\psi_n}, \qquad \Delta\langle y\rangle_n = \langle y\rangle_{\tilde{\psi}_n} - \langle y\rangle_{\psi_n},$$

where the average positions are computed with the squared WFs as weights. As the estimated WF approaches the exact WF, the limits of $\sigma_n$, $O_n$, $\Delta\langle x\rangle_n$, and $\Delta\langle y\rangle_n$ are 0, 1, 0, and 0, respectively.
The deviation of the NN-estimated energy with respect to the FEM-calculated value is

$$\Delta\varepsilon_n = \tilde{\varepsilon}_n - \varepsilon_n,$$

or, in relative expression, $\Delta\varepsilon_n / \varepsilon_n$.
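The indicators of this section can be evaluated per sample with a short routine such as the following sketch, in which the overlap and the average positions are computed as discrete sums over the node subset; the exact normalizations used in the paper may differ.

```python
import numpy as np

def accuracy_indicators(psi_nn, psi_fem, e_nn, e_fem, x, y):
    """Accuracy indicators for one sample (sketch of Section 2.4).

    psi_nn, psi_fem : (Q,) estimated and FEM wave-function values on the node subset
    e_nn, e_fem     : estimated and FEM ground-state energies
    x, y            : (Q,) node coordinates
    """
    # relative deviation of the estimated WF (main indicator)
    sigma_psi = np.linalg.norm(psi_nn - psi_fem) / np.linalg.norm(psi_fem)
    # relative spatial overlap of the two WFs (1 for identical functions)
    overlap = np.abs(psi_nn @ psi_fem) / (np.linalg.norm(psi_nn) * np.linalg.norm(psi_fem))
    # deviations of the average positions, computed with |psi|^2 as weight
    def mean_pos(c, psi):
        w = psi**2
        return (c @ w) / w.sum()
    d_x = mean_pos(x, psi_nn) - mean_pos(x, psi_fem)
    d_y = mean_pos(y, psi_nn) - mean_pos(y, psi_fem)
    # energy deviation and relative energy deviation
    d_e = e_nn - e_fem
    return sigma_psi, overlap, d_x, d_y, d_e, d_e / e_fem
```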

Calculations and Results
In the following, we use the abbreviation ksamp (kilosample) to denote 1000 samples. To train, validate, and test the NN, three DSs have been prepared (hereinafter referred to as DS1, DS2, and DS3), each of them containing $N = 100$ ksamp. DS1 is used for training, DS2 for validation, and DS3 is reserved for testing.
Various algorithms have been implemented to generate arbitrary CPs for DSs, as explained in the Appendix.
DS1 and DS2 are distinct but they were prepared using the same randomization method, whereas DS3 was prepared using a different algorithm.When we refer to the output provided by the NN, we usually use the word "estimate" but if we want to emphasize that the input was not the training DS, we may use the word "prediction" instead.

Two-dimensional finite element calculation and sample preparation
COMSOL Multiphysics® FEM software was used to generate the samples [47]. To compute the expected ground state WFs and energies, the FEM model was built to solve the SE with the border radius $R_b = 5$ and the scale factor $\beta = 26$ for all CP functions in the DSs. A very small value of the border radius ($R_b < 2$) may lead to an artificial increase in the confinement, whereas an excessively large value (under the conditions of a prefixed number of nodes) will lead to an important decrease in the density of nodes in the confinement area, which will further negatively influence the accuracy of the calculation. The reference scale factor $\beta$ was reasonably chosen in accordance with the typical values of real confining systems, so that several bound energy levels exist for all CPs [37,48]. It is noteworthy that the scale factor introduced in Eq. (3) is proportional to what we could call the "confining volume" $\pi R_c^2 V_0$, that is, a cylindrical pseudovolume with the real confining base area $\pi R_c^2$ and the energetic "height" $V_0$. Intuitively, the NN is trained on a set of problems with confining volumes distributed around the chosen reference value. The variance is introduced through the very algorithm that generates the CPs of the DS, by randomly changing the confining perimeter and the variations of the potential inside the confining zone. The numerical choice of $\beta$ therefore does not mean drastic particularization, but rather offers a plausible reference value in relation to the physical systems of interest. For example, for cylindrical confinement in semiconductor wires based on GaAs/AlGaAs, the value of $\beta$ previously defined is given by a realistic confinement radius of approximately 8 nm. For any given material, a scale factor that is too small corresponds to an extreme confinement regime, which is difficult to achieve in real semiconductor structures. In addition, for the large values of the kinetic term of the Hamiltonian that are reached in very small semiconductor structures, the effective mass approximation is questionable [2]. Too high values of $\beta$ correspond to systems with a high density of energy levels, in which the quantization fades and the interest in solving the SE is limited.

We define a user-controlled mesh of nodes $\Xi \equiv \{(x_i, y_i)\}$, unevenly distributed inside a circle of radius $R_b$. Nodes are relatively rare close to the boundary, densely distributed in the vicinity of the QW perimeter, and very dense in the confining zone. When solving a 2-D differential problem, the FEM software considers the standard physics-controlled mesh with approximately 13000 nodes to be "extremely fine" [47]. Indeed, for typical CPs, the spatial element size of this mesh is sufficiently small such that the accuracy of solving the SE is sufficiently high. However, in this study, we allowed the random CPs of the DSs to exhibit fast variations and sometimes several points or lines of discontinuity. Therefore, we chose a user-controlled mesh with a larger total number of nodes, that is, 18724. Given that, for any bound state, the WF decreases exponentially to zero sufficiently far from the confinement zone, the Dirichlet boundary condition $\psi(R_b) = 0$ may be assumed. We determined that the mesh was sufficiently refined and the chosen value of $R_b$ was sufficiently high to ensure an accuracy better than $5 \times 10^{-5}$ for the calculation of the ground state energy. The typical error can be estimated by comparing the FEM result $\varepsilon_{\mathrm{FEM}} = 0.154021$ with the exact semi-analytical solution $\varepsilon = 0.154005$, in the particular case of a finite-wall cylindrical confinement with $v(x,y) = \Theta(r(x,y) - 1)$, where $\Theta$ denotes the Heaviside step function [49].
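As a quick order-of-magnitude check of the chosen reference value, the scale factor of Eq. (3) can be evaluated for GaAs-like parameters; the effective mass and well depth used below are assumed illustrative values, with only the ~8 nm radius taken from the text.

```python
from scipy import constants as c

m_eff = 0.067 * c.m_e        # GaAs conduction-band effective mass (assumed)
V0 = 0.23 * c.e              # confinement depth in joules (~0.23 eV, assumed)
Rc = 8e-9                    # confinement radius of about 8 nm

beta = 2 * m_eff * V0 * Rc**2 / c.hbar**2
print(f"beta = {beta:.1f}")  # about 26 for these parameters
```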
Figure 2 shows the graphical details of the mesh used, at various degrees of image magnification. The mesh is triangular, has circular symmetry, and contains four distinct concentric zones, colored blue in the figures. The outer area (we call it the "far mesh" in Fig. 2a) has a low density of nodes, because in this region the values of the WFs are very small and slowly variable. This is the circular crown between the circle of radius $R_f \cong 2$ and the border of radius $R_b$. Its inner radius $R_f$ was calculated such that none of the irregular confinement zones generated by the CP randomization algorithms would enter this zone ($r < R_f$ for all samples). Another region in the shape of a circular crown, of inner radius $R_i \cong 1.5$ and outer radius $R_f$, follows inwards (the "intermediate mesh" in Fig. 2b). Its node density is higher than that of the outer mesh but remains relatively low. This mesh domain ensures the adaptation between the low-density outer mesh and the highly refined mesh of the confining perimeter, and only a small number of samples with a highly eccentric confining perimeter marginally penetrate this zone. The next smaller circular crown of the mesh (the "near mesh" in Fig. 2c) is highly refined and corresponds to the region where the CP can already exhibit large variations from the external constant value to lower values; that is, it contains large portions of the confining perimeter. The main zone of the mesh (the "central mesh" in Fig. 2d), shaped like a circular disk of radius 1.1, is extremely refined and corresponds to the confinement zone in which the most important variations in the CP and WF occur. Only a small number of samples (those with reduced eccentricity, close to a circular confinement perimeter) had confining zones that were completely contained in the central mesh domain.

The SE was solved for all $3 \times 10^5$ randomized CPs, and the energies $\varepsilon_n$ and WFs $\psi_n$ of the ground level were stored, forming, together with the corresponding CPs, the data sets DS1, DS2, and DS3. In the following, we consider the FEM results to be the true (exact) values.

Figure 3 shows the ground-state energies obtained for each of the three sets of random CPs. The mean energy over the entire set, the most frequent (probable) energy, and the standard energy deviation are indicated, and the color scale illustrates the deviation of the energy of each sample from the mean value of the set. The occurrence-frequency histograms were created by dividing the energy interval into equal bins and counting the results in each bin. As expected, the energy distributions obtained for DS1 and DS2 are very similar (Figs. 3a and 3b, respectively) and confirm that the sets are sufficiently large to be representative of their common randomization algorithm. In these cases, the positive skewness of the frequency histograms indicates that the energies are unevenly distributed around the mean: energies higher than the most frequent value are more likely to occur. Figure 3c shows the results obtained with the CPs of the third set, generated using a different randomization algorithm. The mean energy value is significantly higher and almost equal to the most frequent value. The appearance of the distribution is quite different, almost symmetrical, and shows aspects of a normal distribution. The standard deviation of the energy is also much smaller than in the previous cases.

Defining the input/output layers of the neural network

The SE was solved numerically using a mesh with a relatively large number of nodes, which was justified by the need to obtain reliable results in cases with rapid variations in the CP or many discontinuities. However, it is impractical to associate each computational node with an input/output node in the NN, because the ground state WF generally exhibits a slow variation with position. Nevertheless, if the computational cost had not been a limiting factor, we would have opted for a full representation of the FEM mesh in the IL and OL of the NN: we are aware that a better effectiveness of the NN would be obtained if the IL corresponded to an even denser representation of the nodes of the original FEM calculation mesh. In this work, we chose a subset $\Xi_Q \equiv \{(x_q, y_q)\}_{q=1,\dots,Q}$ of the mesh to represent both the IL and OL of the NN. The manner in which we selected the nodes is illustrated in Fig. 4 and is based on the intention of approximately uniform coverage of a circular region that is representative of the most important variations of the CP and WF. The disk containing all the nodes of the selection was chosen to be of radius 1.75, which is intermediate between $R_i$ and $R_f$ (Fig. 4a). Because this domain contains regions with various mesh refinements and the original mesh is not regular, we selected the original nodes that are closest to the vertices of the triangular tiling {3,6} (Fig. 4b); these vertices are the centers of the densest possible circle packing in the plane. The number of nodes in the subset is then controlled by a single parameter, the tiling edge length. If we choose this parameter to be 0.06, we get $Q = 2764$ nodes. The CP values in these nodes represent the input data of the NN that estimates the WF, as well as the input of the subnet that estimates the energy of the ground level.
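The node-selection procedure described above can be sketched as follows: generate the vertices of a triangular {3,6} tiling of a given edge length inside a disk and, for each vertex, keep the closest node of the irregular FEM mesh. The function and variable names are hypothetical, and the deduplication step is an assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def select_subset(mesh_xy, radius=1.75, edge=0.06):
    """Pick FEM mesh nodes closest to the vertices of a triangular tiling (sketch).

    mesh_xy : (M, 2) coordinates of the original FEM mesh nodes
    radius  : radius of the disk covered by the subset
    edge    : tiling edge length (controls the number of selected nodes)
    """
    # vertices of the {3,6} triangular tiling: rows spaced edge*sqrt(3)/2 apart,
    # every other row shifted by half an edge
    dy = edge * np.sqrt(3) / 2
    verts = []
    for j, yv in enumerate(np.arange(-radius, radius + dy, dy)):
        shift = 0.5 * edge if j % 2 else 0.0
        xs = np.arange(-radius, radius + edge, edge) + shift
        verts.extend((xv, yv) for xv in xs if xv**2 + yv**2 <= radius**2)
    verts = np.array(verts)
    # for each tiling vertex, keep the closest original mesh node (duplicates removed)
    tree = cKDTree(mesh_xy)
    _, idx = tree.query(verts)
    return np.unique(idx)

# usage: subset = select_subset(mesh_xy)  -> roughly 2.8e3 node indices for edge = 0.06
```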

Subnet training and energy evaluation
Heuristically optimizing the NN architecture is prohibitively difficult because of the large time required for multiple calculations with different combinations of node numbers. Concerning the number of neurons in the HLs, there is no strict or clear rule in the literature for setting the optimal value, so we made the empirical choice of taking it close to the geometric mean of the numbers of nodes in the IL and OL. Thus, all subnets in this study (the energy subnet included) have $Q = 2764$ nodes in the IL and $K = 53$ neurons in each of the two HLs. The training of all subnets is done with the SGD method, using batches of $N' = 2$ ksamp random CPs. To achieve this, the training set DS1 of 100 ksamp was divided into subsets of 1 ksamp and, at each iteration, the current batch is rebuilt from two different subsets chosen randomly and independently.

Figure 5 shows the SGD learning graph for the energy neural subnet. Because there is only one subnet for energy estimation, we were able to perform a longer training than for the subnets used to estimate the WF, for which the training had to be restricted to a smaller number of iterations. The relatively small fluctuations that appear along the learning graph are caused by the stochastic variance of the different batches.
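A possible implementation of the batch construction described above is sketched below: at each iteration, two of the one hundred 1-ksamp subsets of DS1 are drawn without replacement and concatenated into the current 2-ksamp batch; the array layout is an assumption.

```python
import numpy as np

rng = np.random.default_rng()

def make_batch(F_all, psi_all, subset_size=1000, n_subsets=100):
    """Rebuild the current SGD batch from two randomly chosen 1-ksamp subsets (sketch).

    F_all   : (N, Q) all sampled CP functions of the training DS (N = 100 ksamp)
    psi_all : (N, Q) corresponding FEM wave functions on the node subset
    """
    i, j = rng.choice(n_subsets, size=2, replace=False)   # two different subset indices
    rows = np.r_[i * subset_size:(i + 1) * subset_size,
                 j * subset_size:(j + 1) * subset_size]
    return F_all[rows], psi_all[rows]                     # batch of 2 ksamp samples
```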
The energy neural subnet was then applied to the three DSs in two scenarios: after T = 2000 and after T = 10⁴ training iterations.
The results are presented in Fig. 6 in the form of bivariate histogram plots with 2-D bins. The bins are small squares of equal size; the true energy (FEM solution) is represented on the horizontal axis and the estimated/predicted energy (NN solution) on the vertical axis. The number of samples in each bin is coded by color, and the numerical labels on the color bars give the bin counts (which, divided by the bin area and the total number of samples, provide the surface probability density). The diagonal lines represent the ideal estimation, for which the predicted energy equals the true one. It can be observed that the distributions of the estimated energies for the training set (a,b) and the validation set (c,d) are extremely similar. This is proof of the validity of the network training: the number of samples in the training DS1 is large enough to ensure neural learning related to the SE itself, and possibly to the algorithm generating the DS1 and DS2 random CPs, but not to a particular group of samples. Figures 6(e) and 6(f) show that the predictions are also very good for DS3, which is very different from the other two. Therefore, the subnet is efficient in correctly predicting the energy of the ground state for CPs that are very dissimilar to those with which it was trained. The in-plane distribution density of the samples was significantly affected by the number of iterations used: the scattering of bins with non-zero counts is greater after only 2000 iterations (a,c,e) than after 10⁴ iterations (b,d,f). Therefore, the representative points of the graphs tend to accumulate near the diagonal lines as the subnet training improves.
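Bivariate histograms of the type shown in Fig. 6 can be reproduced in principle with a standard 2-D histogram of true versus estimated energies, as in the following sketch; the bin side and the plotting details are illustrative choices, not those of the original figures.

```python
import numpy as np
import matplotlib.pyplot as plt

def energy_histogram(e_true, e_nn, bin_side=0.01):
    """Bivariate histogram of estimated vs. true ground-state energy (sketch of Fig. 6)."""
    lo, hi = min(e_true.min(), e_nn.min()), max(e_true.max(), e_nn.max())
    edges = np.arange(lo, hi + bin_side, bin_side)       # square bins of fixed side
    H, xe, ye = np.histogram2d(e_true, e_nn, bins=[edges, edges])
    plt.pcolormesh(xe, ye, H.T, cmap="viridis")          # bin counts coded by color
    plt.plot([lo, hi], [lo, hi], "w--")                  # diagonal: ideal estimation
    plt.xlabel("true energy (FEM)"); plt.ylabel("estimated energy (NN)")
    plt.colorbar(label="samples per bin"); plt.show()
```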
It is worth mentioning that the energy can be estimated with this method without the involvement of the WF, which can be an advantage in applications where a fast response is required.For example, with the spatial discretization used in this work, the NN provides the energy approximately 60 times faster than the FEM.

Training and testing neural subnets for ground state WF estimation in particular nodes
To illustrate the training and predictive efficiency of the individual subnets, we select four particular nodes of the mesh subset $\Xi_Q$. These points are illustrated in Fig. 4a: A is the node closest to the origin of the coordinates, where in general the WF has relatively large values; B is approximately the midpoint of the radius of a cylindrical confinement, where there is a large dispersion of possible values; C is close to the perimeter of the confinement zone, where there are generally fast variations of the WF; and D is found outside the confinement, in a position where the values of the WFs are relatively small. There is practically no difference in the settings between the subnets used to estimate the WF and the previously described subnet for energy estimation. All $Q = 2764$ subnets used for the WF estimation have the same structure and IL; they differ only in the output node. Figure 7 shows the learning graphs of the subnets related to output nodes A, B, C, and D. Subnet A learns relatively quickly, reaching a plateau value after about 100 iterations, but this value of the loss function is rather high.
Between iterations 100 and 2000, there was an additional decrease of approximately 5%. Based on these observations, we can anticipate that in the central area of the confinement zone, where WF maxima are usually found, underestimates of the true values are often obtained. Subnet B learns more slowly and with greater fluctuations in the relative loss, precisely because the dispersion of the possible values of the WF in node B is greater than in the other cases. After 55 iterations, there is an instability in the learning graph, with a sudden increase in the loss function. In the next 50 iterations, the learning stabilizes, and around the 200th iteration there is a transition to a 5 times lower value of the learning rate, after which the fluctuations remain relatively small. The learning of subnets A and B shows, by the slope of the graphs between iterations 1000 and 2000, that they would still have the potential to decrease if a larger number of iterations were practically feasible. The behavior of the learning graph of subnet C shows some notable differences compared to B: no instability occurs, it seems to reach a plateau value after 1000 iterations, and the value of the loss function after 2000 iterations reaches 5% of the initial value, which is much lower than in the previous cases. Finally, subnet D learns very quickly at the beginning, similar to A, but the learning curve decreases more. After 1000 iterations, the loss function becomes less than 1% of the initial value, and the slope seems to show that the subnet still has a slight learning potential. The partial similarity between the graphs of subnets A and D comes from the fact that, in both cases, the marginal codomain of the logistic activation function is involved, close to 1 and 0, respectively. When the values of the WF approach the ends of this interval, learning reaches the plateau regime faster, because the activation-function derivative of the output neurons decreases rapidly. After the same number of iterations, the loss function decreased, from the center to the outside, to approximately 7% (A), 12% (B), 5% (C), and 0.7% (D) of its initial value.

The trained neural subnets A, B, C, and D were then applied to the training and testing DSs, and the resulting bivariate histograms are shown in Fig. 8. The bins are squares; the true value of the WF (FEM solution) is on the horizontal axis, and the estimated/predicted value (NN solution) is on the vertical axis. The values on the color bars indicate the number of samples in the bins. Observing histograms (a) and (b), corresponding to subnet A, it is found that the spread of the results is greater in the case of DS3, which translates into a lower prediction efficiency than the estimation efficiency for DS1. In addition, for WF values close to the maximum, the predictions for DS3 considerably underestimate the true values. Regarding subnets B (c,d) and C (e,f), similar behaviors were found: slightly better for DS1 in the case of B and, surprisingly, slightly better for DS3 in the case of C. The histograms (g,h) of subnet D, magnified four times in the insets of the figures, show a higher concentration and a lower dispersion of the samples from DS3.
As mentioned in Section 2.3, the subnets are trained independently of each other. A further improvement of the method would require the encoding of the SE itself into the training, in the manner of a physics-informed NN [50]. Because the SE correlates the energy and the values of the WF at neighboring positions, it is clear that such an approach must correlate the training of the energy subnet and of the subnets corresponding to neighboring nodes. Presumably, this would bring an increase in accuracy and a better extrapolation capability of the method. However, these improvements would come at the cost of a considerably longer training time.

Training and testing the neural network. Calculation of the accuracy indicators
After training on DS1 the entire NN estimating the WF, together with the subnet estimating the energy, the accuracy indicators can be calculated and compared for the three DSs.
The trained network was provided with all input data (sample CPs) of the three DSs, and the resulting indicator values were statistically analyzed. Figure 9 shows the results obtained in the form of superposed histograms, one of which is plotted as a contour line to keep all three distributions visible. It should be noted that, although individually different, the CPs of DS2 are statistically similar to those in DS1 by the nature of their generating algorithm, whereas the CPs of DS3 are dissimilar to those of DS1 and DS2. The comparison in Fig. 9 shows that, even though the accuracy is slightly lower for DS3, the NN still behaves very well for a different CP type. For DS2, the estimation is better than for DS3; since the DS3 generating algorithm is absolutely "arbitrary", this indicates that the NN not only "solves" the Schrödinger problem, but also "learns", to some extent, the algorithm generating the training CPs.

The difference between the predicted and true WFs is somewhat larger for DS3, which shifts the histogram profile to the right as compared to the DS1 histogram. Thus, the maximum of the DS3 histogram is located at approximately 3%, but it has approximately the same height. From the insets of Fig. 9a, it can be seen that for DS3 the energy subnet has a greater tendency to underestimate. This is intuitive, because the mean energy of DS3 was considerably higher than those of DS1 and DS2, as shown in Fig. 3.
Figure 9b demonstrates that the true and estimated WFs overlap spatially in a proportion of over 98% for the vast majority of samples, and to an even greater extent for DS3. This can be explained by the stronger confinement specific to DS3, as demonstrated by the higher energies in this set. Stronger confinement means that the WFs are constrained in a narrower spatial region, resulting in better overlap. The insets of Fig. 9b show the histograms of the average-position deviations along x and y; they are almost identical, which was expected given that the algorithms generating the randomized CPs do not favor any particular direction.
Neural network predictions for several symmetric confinement potentials

The NN accuracy indicators were calculated for several particular cases of symmetrical CPs (which cannot be found in the randomized DSs). These cases were also used to plot and compare the aspect of the WFs predicted by the NN with the expected WFs calculated by FEM. Five potential wells with analytically defined CPs were considered (Fig. 10): (a) and (b) are cylindrical confinements of depths 1 and 3/4, respectively; (c) has an elliptical confinement perimeter; (d) is a confinement with variable parabolic depth; and (e) has a square perimeter of confinement. A quantitative assessment of the prediction accuracy was made by calculating, for each case, the accuracy indicators defined in Section 2.4.

Figure 1. (a) Neural network composed of Q similar separate subnets; (b) subnet with Q nodes in the input layer, two hidden layers with K neurons each, and a single output node.

Figure 2. Mesh domains: (a) far mesh; (b) intermediate mesh; (c) near mesh; (d) central mesh. The complete technical details of the mesh used are presented in Table I.

Figure 3. Energy distribution of the ground level corresponding to the three sets of confinement potentials: (a) DS1, training set; (b) DS2, validation set; (c) DS3, testing set. The lateral histograms illustrate the occurrence frequency of the results as a function of energy, and the color scale illustrates the deviation of the energy of each sample from the mean value of the set.

Figure 4. (a) The mesh subset Ξ_Q ≡ {(x_q, y_q)} intended for neural network training: blue dots are the original mesh nodes of the finite element method and red dots are the Q = 2764 selected nodes. The particular nodes marked A, B, C, and D were chosen to further illustrate the training of the corresponding neural subnets and the distribution of the wave function estimation errors. (b) Details of the triangular tiling and the selected nodes in the vicinity of the central node A.

Figure 5. SGD learning graph of the energy neural subnet.

Figure 6. Bivariate histograms of the estimated/predicted ground-state energy (NN solution, vertical axis) versus the true energy (FEM solution, horizontal axis) for DS1 (a,b), DS2 (c,d), and DS3 (e,f), after a shorter (a,c,e) and a longer (b,d,f) training.


Figure 7. Learning graphs of the subnets corresponding to the nodes A, B, C, and D marked in Fig. 4a.

Figure 8. Bivariate histograms of the estimated/predicted WF values (NN solution, vertical axis) versus the true values (FEM solution, horizontal axis) at nodes A (a,b), B (c,d), C (e,f), and D (g,h), for the training set DS1 and the testing set DS3.


Figure 9. Multiple histograms comparing the neural network accuracy indicators for the training set DS1, the validation set DS2, and the testing set DS3: (a) relative deviation of the estimated WF with respect to the true solution (upper inset: energy deviation; lower inset: energy relative deviation); (b) relative spatial overlap of the WFs (insets: deviations of the average x and y positions).

"
Figure9bdemonstrates that the true and estimated vast majority of samples, and to an even greater extent for DS3.This can be explained by the stronger nature of the specific confinement of DS3, as demonstrated by the that functions are constrained in a narrower spatial region, resulting in histograms of the average position differences and are algorithms generating the randomized CP3.6.Neural network predictionsThe NN accuracy indicators in particular cases of DSs) were calculated.These cases were NN with the expected WFs calculated by potential wells were considered, all with

Figure 10. Confinement potentials for the particular cases with analytically defined CPs; the viewing angle allows the contour plots of isolines to be observed: (a), (b) cylindrical confinements of depths 1 and 3/4, respectively; (c) elliptical confinement perimeter; (d) confinement with variable parabolic depth; (e) square perimeter of confinement.

Figure 11. Ground-state WFs predicted by the neural network for the confinement potentials in Fig. 10, compared with the FEM solutions.

Table I. Concentric domains of the finite element method mesh and their main geometric characteristics.