Neural ODE and Holographic QCD

The neural ordinary differential equation (Neural ODE) is a novel machine-learning architecture whose weights are smooth functions of the continuous depth. We apply the Neural ODE to holographic QCD by regarding the weight functions as a bulk metric, and train the machine with lattice QCD data of the chiral condensate at finite temperature. The machine finds a consistent bulk geometry at various values of the temperature and automatically discovers the emergent black hole horizon in the holographic bulk. The holographic Wilson loops calculated with the emergent machine-learned bulk spacetime show a consistent temperature dependence of confinement and Debye screening. In machine-learning models with physically interpretable weights, the Neural ODE frees us from discretization artifacts, which would otherwise require delicate tuning of hyperparameters, and improves the numerical accuracy, making the model more trustworthy.


I. INTRODUCTION
Applying machine learning to solve physics problems [1,2] has generated growing research interest in recent years. Machine learning holography is an emerging direction in this field, which introduces artificial intelligence to discover the holographic bulk theory behind generic quantum systems on the holographic boundary. Multiple approaches have been developed to capture different aspects of the holographic duality [3][4][5][6][7][8][9][10]. For example, the entanglement feature learning (EFL) [6] can establish the emergent holographic spatial geometry simply from the entanglement entropy data on the holographic boundary. The anti-de Sitter / deep learning (AdS/DL) correspondence takes a different approach [4,5,8,10] by implementing the holographic principle [11][12][13] in a deep neural network, where the neural network is regarded as the classical equation of motion for propagating fields on a discretized curved spacetime. Further progress has been made by the neural network renormalization group (Neural RG) [7], which learns to construct the exact holographic mapping between the boundary and the bulk field theories at the partition function level. All these approaches share a common theme: the emergent dimension of the holographic bulk corresponds to the depth dimension of the deep neural network, and the neural network itself is regarded as the bulk spacetime. As the neural network learns to interpret the holographic boundary data fed to its input layer, the network weights in deeper layers get optimized, which then leads to the optimal holographic bulk description for the boundary data.
However, the development so far has been based on a discretization of the holographic bulk dimension, because the neural network layers are intrinsically discrete in typical deep learning architectures. It is desirable to make this dimension continuous, as a smooth holographic spacetime is physically required in the classical limit. In this work, we explore this possibility, based on the recent development of the neural ordinary differential equation (Neural ODE) [14] approach. The Neural ODE is a generalization of the deep residual network [15] to a continuous-depth network with the network weights replaced by a continuous function. It provides a trainable model of differential equations that can evolve the initial input to the final output continuously. The Neural ODE is particularly suitable for the AdS/DL approach because the goal here is precisely to infer the differential equation that describes the propagation of the bulk field in a continuous spacetime with smooth geometry. In this context, the continuous network weights of the Neural ODE have a physical interpretation related to the metric function that characterizes the curved spacetime in the holographic bulk. An interpretable spacetime geometry emerges as the neural network is trained, which demonstrates a scenario of machine-assisted discovery in theoretical physics, where the artificial intelligence plays a more active role in the scientific process than a mere tool for data processing.
The AdS/DL approach applied to holographic QCD is a good testing ground for the effectiveness of the Neural ODE in physics applications. The Neural ODE brings two advances: the removal of artificial regularizations and the improvement of accuracy. In previous works [4,5,10,16], due to the discrete nature of the neural network, technical regularization terms were introduced to remove the discretization artifacts and to ensure the smoothness of the network weights [17]. Such regularization is no longer needed in the Neural ODE approach. Furthermore, for the network to be identified with a field equation in the curved spacetime, the Euler method for the ordinary differential equation was adopted for simplicity, though Euler integration generically suffers from large numerical errors. Replacing the discrete neural network with the Neural ODE provides a natural interpretation of the metric function in the smooth spacetime and, at the same time, greatly enhances the accuracy. The improved accuracy of the Neural ODE is simply due to the advanced ODE solver equipped in the Neural ODE framework: the discretization along the integrated coordinate is optimized adaptively, rather than given ad hoc as hyperparameters. This is especially useful when the metric function contains a coordinate singularity at the black hole horizon. The required accuracy depends on the purpose and the method of how machine learning is applied [18]. In our present case of the AdS/DL, as is explicitly shown, the accuracy improvement is sufficient for exploring emergent geometries at various values of the temperature.
In this paper, following the holographic QCD framework of Ref. [4], we use the Neural ODE to find bulk spacetimes emergent out of the given lattice QCD data of the chiral condensate. The Neural ODE not only discovers a spacetime which is consistent with that of Ref. [4], but also greatly enhances the power of the machine learning method. The emergent geometry turns out to incorporate the presence of the black hole horizon automatically, and the Neural ODE enables us to further explore geometries for different values of the temperature, with improved accuracy. The temperature dependence of holographic Wilson loops, calculated with the emergent geometry trained by the Neural ODE, turns out to coincide qualitatively with the known lattice QCD results for the Wilson loops. Interestingly, we find that the radial derivative of the volume factor of the emergent geometry does not depend on the temperature, and the temperature dependence of the chiral condensate solely stems from that of the bulk scalar coupling constant.
The organization of this paper is as follows. In Sec. II, we briefly review the holographic QCD framework adopted in Ref. [4] and the Neural ODE [14]. In Sec. III, we apply the Neural ODE to train the machine (which is equivalent to the holographic QCD system) and find the emergent geometry for various values of the temperature. In Sec. IV, we introduce a way to calculate consistent full components of the metric from the emergent volume factor, with which we calculate holographic Wilson loops. They qualitatively agree with Wilson loops evaluated in lattice QCD. Sec. V provides a summary and discussions. Appendix A gives details of the Neural ODE.

II. REVIEW: ADS/CFT MODEL AND NEURAL ODE
A. Bulk field theory

The holographic principle [11][12][13], also known as the AdS/CFT correspondence, is a profound relation between a d-dimensional quantum field theory (QFT) and a (d + 1)-dimensional gravity theory. It has been successfully applied to a large class of strongly coupled QFTs in high energy theory and condensed matter theory. Despite its success, a constructive way of finding the holographic gravity dual theory for a given QFT is lacking. If we have the experimental response data of a quantum system under external probing fields, can we model it holographically by a classical field theory in some curved geometry? The entanglement feature learning [6,19] and the AdS/DL correspondence [4,5] can answer that question in a concrete setup. Here we briefly review the setup of Ref. [4], to which we apply the Neural ODE method in later sections.
We assume a (d + 1)-dimensional bulk spacetime with coordinates (t, η, x_1, ..., x_{d−1}), including the time dimension t, the space dimensions x_i and the holographic bulk dimension η. We assume translation symmetry except in the η direction, and spatial homogeneity in (x_1, ..., x_{d−1}); then, in the gauge g_ηη = 1, the holographic bulk spacetime can be described by the following metric (we will consider d = 4 specifically):

ds² = −f(η) dt² + dη² + g(η) Σ_i (dx_i)² .    (1)

The dual quantum field theory lives in a d-dimensional flat spacetime spanned by (t, x_1, ..., x_{d−1}) on the holographic boundary. We call η the radial coordinate and the others angular directions. The spacetime volume factor is

√(−det g_{μν}) = √(f(η) g(η)^{d−1}) .    (2)

A scalar field φ in this curved spacetime is described by the action

S = ∫ d^{d+1}x √(−det g_{μν}) [ −(1/2)(∂φ)² − (1/2)m²φ² − (1/4)λφ⁴ ] .    (3)

The saddle point equation (the classical equation of motion) δS/δφ = 0 reads

□φ − m²φ − λφ³ = 0 .    (4)

Since we are interested in a homogeneous static condensate in the dual quantum field theory, we assume that φ is only a function of η. Then Eq. (4) becomes

∂_η²φ + h(η) ∂_ηφ − m²φ − λφ³ = 0 ,    (5)

or equivalently, we could write it as the first-order system

∂_ηφ = π ,  ∂_ηπ = −h(η) π + m²φ + λφ³ ,    (6)

where the metric function is (with d = 4)

h(η) ≡ ∂_η log √(f(η) g(η)³) .    (7)

The input data is the pair (φ(η ∼ ∞), π(η ∼ ∞)) near the AdS boundary, and the field propagates following the classical equation of motion Eq. (6). On the other hand, there is a black hole horizon at η ∼ 0.
The on-shell static scalar field satisfies the black hole boundary condition

φ, π regular at the horizon η ∼ 0 ,    (8)

or equivalently, using the near-horizon behavior h(η) ≈ 1/η, we could require

[ (2/η) π − m²φ − λφ³ ]|_{η = η_fin} = 0 .    (9)

The mapping between the asymptotic value of the scalar field φ(η ∼ ∞) and the data of the dual quantum field theory is given by the AdS/CFT dictionary for the asymptotically AdS spacetime with AdS radius L [4]. For an operator O whose dimension is three, corresponding to a bulk scalar field φ with mass m² = −3/L², the near-boundary expansion reads

φ(η) ≃ α e^{−η/L} + β e^{−3η/L}  (η ∼ ∞) ,    (10)

where the coefficient α of the non-normalizable mode is identified with the source m_O, and the coefficient β of the normalizable mode with the condensate ⟨O⟩, up to a normalization involving N_c. Here, m_O is the source for the operator O of the quantum field theory, and N_c denotes the number of colors in the QFT; we set N_c = 3 as we focus on QCD later. Therefore, once the one-point function data {m_O, ⟨O⟩} of the quantum field theory is given, it is mapped to the on-shell configuration of φ(η) and π(η) (by taking the derivative of both sides of Eq. (10)) near the holographic boundary η ∼ ∞.
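To make the propagation concrete, here is a minimal sketch (in Python; the function names `rk4_step` and `propagate` are our own, not from the paper) of integrating the first-order system Eq. (6) from the boundary toward the horizon. As a sanity check we use pure AdS5, where h(η) = 4/L is constant and, for λ = 0 and m² = −3/L², the mode φ = e^{−η/L} solves the equation exactly.

```python
import math

def rk4_step(f, y, t, dt):
    # One classical Runge-Kutta step for y' = f(t, y).
    k1 = f(t, y)
    k2 = f(t + dt/2, [yi + dt/2*ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt/2, [yi + dt/2*ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt,   [yi + dt*ki   for yi, ki in zip(y, k3)])
    return [yi + dt/6*(a + 2*b + 2*c + e)
            for yi, a, b, c, e in zip(y, k1, k2, k3, k4)]

def propagate(phi0, pi0, h, m2, lam, eta_ini, eta_fin, n=1000):
    """Integrate Eq. (6): phi' = pi, pi' = -h(eta)*pi + m2*phi + lam*phi**3,
    from the boundary cutoff eta_ini down toward the horizon at eta_fin."""
    def rhs(eta, y):
        phi, pi = y
        return [pi, -h(eta)*pi + m2*phi + lam*phi**3]
    deta = (eta_fin - eta_ini)/n
    y, eta = [phi0, pi0], eta_ini
    for _ in range(n):
        y = rk4_step(rhs, y, eta, deta)
        eta += deta
    return y  # (phi, pi) at eta_fin

# Pure-AdS5 check: h = 4/L, m^2 = -3/L^2, lambda = 0, phi = exp(-eta/L).
L = 1.0
phi, pi = propagate(math.exp(-3.0), -math.exp(-3.0)/L,
                    h=lambda eta: 4.0/L, m2=-3.0/L**2, lam=0.0,
                    eta_ini=3.0, eta_fin=1.0)
```

Propagating the exact boundary data (φ, π) = (e^{−3}, −e^{−3}) from η = 3 to η = 1 reproduces φ(1) = e^{−1} to numerical precision, confirming that the ODE and the dictionary expansion are mutually consistent.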
The experimental data pairs (φ(η ∼ ∞), π(η ∼ ∞)) can be viewed as the positive data, which satisfy the black hole boundary condition Eq. (9) after being propagated by the classical equation of motion Eq. (6). We could also view pairs (φ(η ∼ ∞), π(η ∼ ∞)) that do not lie on the experimental curve as negative data; we expect the negative data not to satisfy the black hole boundary condition. Therefore, this becomes a binary classification problem, with the propagation given by Eq. (6).
Here, for given data of the condensate, the parameters in the differential equation to be learned, namely the continuous metric function h(η), the AdS radius L, and the interaction coupling λ, are in general unknown.
We regard Eq. (6) as a neural network, whose weights are the metric function and the other parameters. For that purpose, the numerical method known as the Neural ODE is a perfect framework to find the optimal estimate of those unknown parameters. In the following, we briefly review the Neural ODE method.

B. Neural ODE
The Neural ODE [14] is a novel framework of deep learning. Instead of mapping the input to the output by a set of discrete layers, the Neural ODE evolves the input to the output by a differential equation, which is trainable. The general form of the differential equation reads

dz(t)/dt = f_θ(z(t), t) ,    (12)

where the vector z denotes the collection of hidden variables and θ denotes all the trainable parameters (which could also be t-dependent) in the neural network. Without loss of generality, suppose we have observations at the beginning and the end of the trajectory: {(z_0, t_0), (z_1, t_1)}.
One starts the evolution of the system from (z_0, t_0) for time t_1 − t_0 with the parameterized velocity function f_θ(z(t), t), using any ODE solver. The system then ends up at a new state (z_1, t_1). Formally, we could consider optimizing a general loss function L, which explicitly depends on the output z_1 as

L = L(z(t_1)) .    (13)

To back-propagate the gradient with respect to the parameters θ, one introduces the adjoint variables a(t) = ∂L/∂z(t) and their corresponding backward dynamics,

da(t)/dt = −a(t)^T ∂f_θ(z(t), t)/∂z .    (14)

After solving Eqs. (12) and (14) jointly, the parameter gradient can be evaluated from

dL/dθ = −∫_{t_1}^{t_0} a(t)^T ∂f_θ(z(t), t)/∂θ dt .    (15)

The derivation of Eq. (15) can be found in the Appendix.
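As an illustration, the following self-contained sketch (a toy problem of our own, not the paper's code) implements the forward solve, the backward adjoint solve Eq. (14), and the gradient accumulation Eq. (15) for the one-dimensional linear ODE dz/dt = θz with loss L = z(t_1), for which the exact gradient dL/dθ = (t_1 − t_0) z_0 e^{θ(t_1 − t_0)} is known.

```python
import math

def rk4_solve(f, z0, t0, t1, n=4000):
    """Fixed-step RK4 for scalar z' = f(t, z); returns the (t, z) trajectory."""
    dt = (t1 - t0)/n
    ts, zs = [t0], [z0]
    for _ in range(n):
        t, z = ts[-1], zs[-1]
        k1 = f(t, z);                  k2 = f(t + dt/2, z + dt/2*k1)
        k3 = f(t + dt/2, z + dt/2*k2); k4 = f(t + dt, z + dt*k3)
        ts.append(t + dt)
        zs.append(z + dt/6*(k1 + 2*k2 + 2*k3 + k4))
    return ts, zs

theta, z0, t0, t1 = 0.7, 1.3, 0.0, 2.0
f = lambda t, z: theta*z        # f_theta; df/dz = theta, df/dtheta = z

# Forward pass, Eq. (12); loss L = z(t1), Eq. (13).
ts, zs = rk4_solve(f, z0, t0, t1)
z1 = zs[-1]

# Backward pass, Eq. (14): a' = -a*(df/dz), with a(t1) = dL/dz1 = 1.
a = [0.0]*len(ts)
a[-1] = 1.0
for i in range(len(ts) - 1, 0, -1):
    dt = ts[i] - ts[i-1]
    g = lambda av: -av*theta
    k1 = g(a[i]);           k2 = g(a[i] - dt/2*k1)
    k3 = g(a[i] - dt/2*k2); k4 = g(a[i] - dt*k3)
    a[i-1] = a[i] - dt/6*(k1 + 2*k2 + 2*k3 + k4)

# Gradient, Eq. (15): dL/dtheta = -int_{t1}^{t0} a*(df/dtheta) dt
#                               = +int_{t0}^{t1} a(t)*z(t) dt  (trapezoid rule).
grad = sum((ts[i+1] - ts[i])*0.5*(a[i]*zs[i] + a[i+1]*zs[i+1])
           for i in range(len(ts) - 1))
```

For this linear model the adjoint is a(t) = e^{θ(t_1−t)}, so the integrand a(t)z(t) is constant along the trajectory and the numerically accumulated gradient reproduces the analytic value (t_1 − t_0) z_1.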

III. EMERGENT SPACETIME FROM NEURAL ODE
A. Learning architecture

Neural ODE and bulk equation
In the form of a first-order differential equation, the equation of motion for the bulk field, Eq. (6), can be translated to the Neural ODE Eq. (12) by the identifications

z(t) → (φ(η), π(η)) ,  t → η ,  f_θ → (π, −h(η)π + m²φ + λφ³) .    (16)

The bulk metric function h(η) corresponds to the neural network weights θ. To make the network depth finite, we introduce the UV and IR cutoffs for the metric as η_ini = 1 and η_fin = 0.1 in units of the AdS radius L.
There are two big advantages of using the Neural ODE. First, the metric function is smooth, and we do not need to add penalty terms for smoothness; therefore, we can largely reduce the number of hyperparameters needed in the network. Second, our Neural ODE uses an adaptive ODE solver called "dopri5." This gives us much higher accuracy in the integration, and it turns out that the equation of motion in the curved geometry is sensitive to the discretization in some regions of η. The adaptive method provides accuracy and efficiency simultaneously.
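The gain from adaptivity can be illustrated with a toy version of the solver (dopri5 uses the more sophisticated Dormand-Prince 5(4) embedded pair; here we sketch simple step-doubling RK4 in plain Python, our own illustration). We integrate φ″ + φ′/η = 0, whose coefficient 1/η mimics the near-horizon behavior h(η) ∼ 1/η; the exact solution through (φ, π) = (0, 1) at η = 1 is φ = ln η.

```python
import math

def rk4(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h/2, [a + h/2*b for a, b in zip(y, k1)])
    k3 = f(t + h/2, [a + h/2*b for a, b in zip(y, k2)])
    k4 = f(t + h,   [a + h*b   for a, b in zip(y, k3)])
    return [a + h/6*(p + 2*q + 2*r + s)
            for a, p, q, r, s in zip(y, k1, k2, k3, k4)]

def adaptive_solve(f, y0, t0, t1, tol=1e-10, h0=1e-2):
    """Step-doubling adaptive RK4: one step of size h is compared against two
    steps of size h/2, and h shrinks or grows to keep the local error < tol."""
    t, y = t0, list(y0)
    h = -abs(h0) if t1 < t0 else abs(h0)
    while abs(t1 - t) > 1e-12:
        if abs(h) > abs(t1 - t):
            h = t1 - t
        y_big = rk4(f, t, y, h)
        y_half = rk4(f, t + h/2, rk4(f, t, y, h/2), h/2)
        err = max(abs(a - b) for a, b in zip(y_big, y_half))
        if err <= tol:                     # accept the more accurate half-steps
            t, y = t + h, y_half
        h *= min(4.0, max(0.1, 0.9*(tol/max(err, 1e-16))**0.2))
    return y

def euler_solve(f, y0, t0, t1, n):
    h, t, y = (t1 - t0)/n, t0, list(y0)
    for _ in range(n):
        y = [a + h*b for a, b in zip(y, f(t, y))]
        t += h
    return y

# phi' = pi, pi' = -pi/eta: a 1/eta coefficient as at a horizon; phi = ln(eta).
rhs = lambda eta, y: [y[1], -y[1]/eta]
phi_a, _ = adaptive_solve(rhs, [0.0, 1.0], 1.0, 0.01)
phi_e, _ = euler_solve(rhs, [0.0, 1.0], 1.0, 0.01, n=500)
```

The adaptive solver concentrates its steps near the singular end and reaches φ(0.01) = ln 0.01 to high precision, while fixed-step Euler with hundreds of steps is off by a few percent: exactly the situation described above, where the discretization near the horizon must be refined automatically rather than tuned by hand.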

Bulk metric parameterization
To make the integration variable monotonically increase from the AdS boundary to the black hole horizon, we make the change of variable η̃ = 1 − η for the metric function, and we model the metric function h(η̃) using the following two ansätze:

h(η̃) = Σ_{n=0}^{N} a_n η̃^n ,    (17)

h(η̃) = a_{-1}/(1 − η̃) + Σ_{n=0}^{N} a_n η̃^n .    (18)

The first one is a Taylor series around the AdS boundary. The second choice explicitly encodes the divergent behavior of the metric function near the black hole horizon at η = 0. Any black hole horizon with a nonzero temperature has f(η) ∝ η² with g(η) a nonzero constant; hence, Eq. (7) leads to h(η) ∼ 1/η as the generic behavior of h(η) near the horizon η = 0, i.e., η̃ = 1. The second ansatz, Eq. (18), explicitly encodes this prior knowledge.

We use the lattice QCD data of the RBC-Bielefeld collaboration [21] as our input data. The data is the chiral condensate ⟨O⟩ = ⟨q̄q⟩ as a function of its source, the quark mass m_q. A plot is given in Fig. 1 Left. We take the T = 0.208 [GeV] data (the black line in Fig. 1 Left); the details of the data are listed in Tab. I [22]. We generate positive and negative data in such a way that if a data point's vertical distance to the experimental curve is less than 0.005, it is labeled as positive (label 0); otherwise, it is labeled as negative (label 1). We collected 10000 positive and 10000 negative data points for training, as shown in Fig. 1 Right. Our goal is to obtain a holographic description of our QCD data using the Neural ODE method. The variational parameters are λ, L and h(η).
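In code, the two parameterizations Eqs. (17) and (18) amount to the following (here `eta_t` stands for η̃ = 1 − η, and the coefficient values in the checks are placeholders, not trained results):

```python
def h_taylor(eta_t, coeffs):
    """Ansatz Eq. (17): Taylor series in eta_t = 1 - eta around the boundary."""
    return sum(a*eta_t**n for n, a in enumerate(coeffs))

def h_pole(eta_t, coeffs, a_pole):
    """Ansatz Eq. (18): the same series plus an explicit simple pole at the
    horizon eta_t = 1, encoding h ~ 1/eta for eta = 1 - eta_t -> 0."""
    return a_pole/(1.0 - eta_t) + h_taylor(eta_t, coeffs)
```

Near the horizon, (1 − η̃) h(η̃) → a_pole, so the pole coefficient directly measures the horizon residue that the first ansatz has to build up order by order in the Taylor series.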

Loss function
For the loss function L, we use

L = ⟨ [ T(F; ε, σ) − l ]² ⟩_data + β E_AdS ,    (19)

where F denotes the deviation of the output data from the black hole boundary condition Eq. (9), so that the first term is the mean square error of the classifier with respect to the true result, Eq. (9). The function T(x; ε, σ) is a specific differentiable nonlinear activation function that maps the region [−ε, ε] to 0 and the outside to 1, in a fuzzy manner; the parameter σ controls the slope at the edge of the region, as shown in Fig. 2. In the mean square error, l is the label of the data (l = 0 for positive data and l = 1 for negative data). The second term in Eq. (19), the β penalty term E_AdS, imposes the condition that the emergent metric be asymptotically AdS near the boundary η = η_ini. Due to the nonlinear nature of the ODE and the sensitivity of the Neural ODE, one may need to adjust the hyperparameters (ε, σ) to ensure a nonzero gradient during the training.
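A minimal sketch of such a loss follows (the sigmoid realization of T below is our own illustrative choice, not necessarily the paper's exact form, and the β penalty term is omitted):

```python
import math

def T(x, eps, sigma):
    """Fuzzy indicator: ~0 for |x| below eps, ~1 well outside, with sigma
    controlling the steepness of the crossover (illustrative choice)."""
    return 1.0/(1.0 + math.exp(-(abs(x) - eps)/sigma))

def classifier_loss(outputs, labels, eps=0.1, sigma=0.02):
    """Mean square error term of Eq. (19): outputs are the deviations from the
    horizon condition Eq. (9); labels l are 0 (positive) or 1 (negative)."""
    return sum((T(x, eps, sigma) - l)**2
               for x, l in zip(outputs, labels))/len(outputs)
```

A very small σ makes T nearly a step function, whose gradient vanishes almost everywhere; this is the practical reason, noted above, why (ε, σ) may need retuning to keep the training gradient nonzero.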

B. Emergent metric
With the architecture described above, we perform the training. We first choose the ansatz Eq. (17) for the metric function h(η) and randomly initialize the training parameters. The initial configuration of the metric function is given in subplot (c) of Fig. 3. As shown in subplot (a) of Fig. 3, the machine with the initial metric judges all the orange+green data as positive.
After training for 13000 epochs, the loss is reduced to 0.02. The result is shown in subplots (b) and (d) of Fig. 3. As we can see, the predicted data agree well with the original positive data, and the emergent metric is a smooth function. The machine also finds the optimal values of the coupling constant λ and the AdS radius L. As we can see in subplot (d) of Fig. 3, the trained metric function grows significantly near η ∼ 0. This is indeed the black hole horizon behavior, and it is quite intriguing that the machine automatically captures the divergent behavior of the metric function h(η) near the black hole horizon.
As a check, we also perform the training with the second ansatz for the metric function h(η), i.e. Eq. (18), which encodes the prior knowledge about the black hole horizon. As shown in Fig. 4, the result looks almost the same as that of the first ansatz, which does not use the prior knowledge. Therefore, a regularization implementing the black hole horizon in h(η) is not necessary. This result indicates that the Neural ODE can automatically discover the black hole geometry in the holographic bulk and recover the near-horizon metric behavior without prior knowledge. For convenience, we use the training results of the second ansatz to calculate a physical observable (the Wilson loop) in the next section.

C. Multi-temperature result
We also applied the above method to the multi-temperature QCD data given in Tab. II. During the training, we require the different neural networks to share the same value of the AdS radius L; the training results are summarized in Tab. III. The model discovers the optimal emergent metric as well as the coupling constant λ at each temperature.
We have two observations on the trained results shown in Tab. III. First, the obtained metric h(η) and the AdS radius L do not depend on the temperature T. Second, the dependence on the temperature is encoded solely in the coupling constant λ of the scalar field theory.
The former sounds counter-intuitive, since normally the metric itself should depend strongly on the temperature, and the change in the metric modifies the gravitational fluctuations, which correspond to the gluon physics. This issue is easily resolved: the obtained function is h(η), not the full metric components f(η) and g(η). Even for the AdS-Schwarzschild geometry, in which the metric is temperature-dependent, we find h(η) = (4/L) coth(4η/L), which is temperature independent. In the next section, to compute physical quantities from the emergent h(η), we assume a functional form of g(η) and discuss the temperature dependence of the metric components.
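This statement can be checked directly against the standard AdS5-Schwarzschild solution written in the z coordinate, ds² = (L/z)²(−F dt² + dx² + dz²/F) with F = 1 − (z/z_h)⁴: one computes η(z) as the proper distance from the horizon and compares h = ∂_η log √(f g³) with (4/L) coth(4η/L). A sketch of the numerical check (function names are ours; the quadrature uses a substitution that tames the 1/√F endpoint):

```python
import math

def eta_of_z(z, zh=1.0, L=1.0, N=20000):
    """Proper radial distance from the horizon z = zh up to z, in the
    g_etaeta = 1 gauge: eta = L * int_z^zh dz'/(z'*sqrt(F)).
    The substitution z' = zh*(1 - u^2) removes the endpoint singularity."""
    umax = math.sqrt(1.0 - z/zh)
    total = 0.0
    for i in range(N):
        u = umax*(i + 0.5)/N
        r = 1.0 - u*u                      # r = z'/zh
        total += 2.0/(r*math.sqrt((1.0 + r)*(1.0 + r*r)))
    return L*total*umax/N

def h_of_z(z, zh=1.0, L=1.0):
    # h = d/deta log sqrt(f g^3), expressed directly in the z coordinate.
    F = 1.0 - (z/zh)**4
    return 2.0*z**4/(L*zh**4*math.sqrt(F)) + 4.0*math.sqrt(F)/L
```

At any z between the boundary and the horizon, `h_of_z(z)` agrees with (4/L) coth(4 η(z)/L), even though f and g separately depend on z_h and hence on the temperature.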
What the machine found is that the reproduction of the input data mainly relies on the temperature dependence of the coupling constant λ in the holographic bulk theory. For lower temperatures, we find a stronger nonlinear interaction, i.e. a larger λ. The value of λ is directly related to the self-coupling of the sigma meson. Although we cannot compare our trained results with experiments, since the self-coupling has never been precisely measured due to the broad width of the sigma meson, our result provides a unique view of the QCD phase transition, in particular of the mysterious relation between the chiral transition and the deconfinement transition.

IV. PHYSICAL INTERPRETATION OF THE EMERGENT SPACETIME
A. Reconstruction of the metric

Since in our case the machine learns only h(η), to compute physical quantities such as the Wilson loop we need to assume the form of g(η) to obtain f(η). Here we assume the functional form of the AdS-Schwarzschild configuration for g(η) [Eq. (24)], where A and a are temperature-dependent constants. In particular, the constant a encodes the dimensionality of the AdS_{d+1}-Schwarzschild geometry as a = d/4, and here we treat it as a free parameter. The ansatz Eq. (24) also satisfies the criterion that g is a monotonic function of η, which is normally required for spacetimes without a bottleneck. The Hawking temperature T constrains the near-horizon behavior of f(η) as f(η) ≈ (2πT)² η² for η → 0, so for our calculation we define a new function F(η) ≡ f(η)/(2πT)² [Eq. (26)], which satisfies the boundary condition F(η) → η² as η → 0 [Eq. (27)]. Substituting Eqs. (24) and (26) into Eq. (7) and performing the integration over η, with the integration constant fixed by Eq. (27), we obtain F(η) in terms of h(η) and g(η) [Eq. (28)]. The overall factor A of g(η) in Eq. (24) can be fixed by the asymptotically AdS5 constraint at η ≫ L,

f(η) ≃ g(η) ∝ e^{2η/L + const.} ,    (29)

which implies h(η) ≃ 4/L according to Eq. (7). To determine this constant, which we require to be temperature independent, we expand Eq. (28) around η ≫ L and extract a constant c(a) [Eq. (30)]. Using this constant c(a), the constraint Eq. (29) determines the normalization of g(η) [Eq. (31)]. Now, since we require that the constant in Eq. (29) be temperature independent, we obtain a condition relating c(a) at different temperatures [Eq. (32)]. Up to an integration constant, we can solve this equation numerically. Assuming that a = 1 at T = 0.208 [GeV], we find numerically c(a = 1) = 11.1952; the condition then leads to c(a(T = 0.188 [GeV])) = 11.3984 and a(T = 0.188 [GeV]) = 1.098. We use g(η) given by Eq. (31) and f(η) given by Eq. (26) with Eq. (28) for the calculation of physical quantities below.

B. Wilson loop
Following the standard method [23][24][25] for calculating the expectation value of the Wilson loop holographically, we evaluate the Wilson loop for a quark and an antiquark separated by a distance d, using our emergent spacetime. The logarithm of the Wilson loop W, which is proportional to the quark potential V, is given by the area of the Euclidean worldsheet of a string hanging down from the AdS boundary. The string reaches η = η_0 at its deepest point, and both the quark potential V(d) and the quark distance d are functions of η_0 [Eqs. (33) and (34)]. Here 1/(2πα′) is the string tension, which is undetermined in this work. Eliminating η_0 from these expressions implicitly defines V(d). Note that the integration in V(d) diverges at η = ∞, and we need to introduce a cutoff at the asymptotic AdS boundary for the calculation. The quark potential has another saddle, V_Debye, which consists of two straight strings connecting the black hole horizon and the asymptotic boundary. We need to adopt V(d) in Eq. (34) or V_Debye, whichever is smaller.
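Before turning to the trained geometry, the recipe can be sanity-checked in the zero-temperature limit, i.e. pure AdS5, where in the z coordinate the two quantities reduce to the textbook integrals d(z_0) = 2 z_0 ∫_0^1 dy y²/√(1−y⁴) and V(z_0) ∝ (1/z_0)[∫_0^1 dy y^{−2}(1/√(1−y⁴) − 1) − 1], giving the Coulombic V ∝ −1/d. A sketch of the numerical evaluation (our own code; it checks the method, not the emergent metric):

```python
import math

def _midpoint(g, N=100000):
    # Composite midpoint rule on (0, 1); avoids both endpoints.
    return sum(g((i + 0.5)/N) for i in range(N))/N

def sep_integral():
    # int_0^1 y^2/sqrt(1-y^4) dy, with y = 1 - s^2 taming the y = 1 endpoint.
    def g(s):
        y = 1.0 - s*s
        return 2.0*y*y/math.sqrt((1.0 + y)*(1.0 + y*y))
    return _midpoint(g)

def pot_integral():
    # Regularized int_0^1 (1/y^2)(1/sqrt(1-y^4) - 1) dy, same substitution;
    # sqrt(1-y^4) = s*sqrt((1+y)(1+y^2)) is numerically stable near y = 1.
    def g(s):
        y = 1.0 - s*s
        root = s*math.sqrt((1.0 + y)*(1.0 + y*y))
        return 2.0*s*(1.0/root - 1.0)/(y*y)
    return _midpoint(g)

def separation(z0):
    return 2.0*z0*sep_integral()

def potential(z0):
    # In units of 1/(pi*alpha'); negative (attractive) and ~ 1/z0 ~ 1/d.
    return (pot_integral() - 1.0)/z0
```

One finds sep_integral() ≈ 0.59907 = √2 π^{3/2}/Γ(1/4)², and the exact identity pot_integral() = 1 − sep_integral() makes V(d) ∝ −1/d with the known negative coefficient, i.e. the Coulombic phase that should be recovered at short d.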
Using the metric obtained in the previous subsection, we calculate the quark potential at each temperature. In Fig. 5, we present the quark potential for the T = 0.188 [GeV] data and the T = 0.208 [GeV] data. They exhibit three phases: at short d, the potential is Coulombic; at large d, the potential is flat and Debye-screened; and in the middle range of d, the potential is linear, signifying quark confinement. This set of features is well known in lattice QCD simulations (see Fig. 6), and, interestingly, our holographic results reproduce them [26]. This reproduction was reported in Ref. [4]; here we further investigate the temperature dependence. As we see in Fig. 5, the two plots are identical to each other except for the height of the Debye-screened parts. The higher temperature corresponds to the lower height of the flat potential, which is qualitatively consistent with the lattice QCD result shown in Fig. 6.

V. SUMMARY AND DISCUSSION
In this paper, we applied the Neural ODE to the AdS/DL correspondence, in which the emergent spacetime on the gravity side of the AdS/CFT correspondence is regarded as a deep neural network. Since the classical spacetime is continuous and smooth, the weights of the network need to be interpreted as a smooth function of the depth; thus the Neural ODE provides a very natural scheme for training the bulk geometry. We followed the setup of Ref. [4], using the lattice QCD data of the QCD chiral condensate to train the neural network. We demonstrated that the Neural ODE indeed works well to discover a bulk geometry which is holographically consistent with the lattice QCD data. Even without including the black hole boundary condition in the ansatz function of the Neural ODE, the machine automatically found the black hole horizon behavior. This proves the ability of the Neural ODE to automate the proposal of the holographic bulk theory from the holographic boundary data in the AdS/CFT setup.
We performed the training with lattice QCD data at various temperatures and found that the optimal volume factor of the emergent geometries shares the same radial dependence except for the overall normalization. The temperature dependence in the behavior of the QCD chiral condensate simply comes from the bulk scalar coupling constant, which corresponds to the meson couplings. The Wilson loops holographically calculated with the machine-trained emergent geometries turn out to have the correct temperature dependence, as in Fig. 6.
For a more quantitative evaluation of the emergent spacetime, we argue that the slope of the linear part of the quark-antiquark potential plots, given in Fig. 5, corresponds to the QCD string tension σ. Since in our formulation the overall normalization 2πα′ is not given, we only look at the ratio of the slope at T = 0.208 [GeV] to the slope at T = 0.188 [GeV]. A numerical fit of Fig. 5 gives σ_{T=0.208 GeV}/σ_{T=0.188 GeV} ≃ 1.0. In lattice QCD simulations, this number is expected to be smaller than 1, because the deconfinement transition (which is not a first-order phase transition) occurs when the QCD string tension goes to zero. So our value ≃ 1.0 rather follows the tendency of large-N gauge theories, where the deconfinement transition is expected to be of first order.
In addition, we notice that the string-breaking distance, the value of d at the kink in Fig. 5, is around d ∼ 10⁻³ in units of L ∼ 5 [GeV⁻¹], which is too small compared to the expected QCD value d ∼ O(1) [fm]. This quantitative discrepancy would be largely due to our assumed functional form of the metric component g(η) in Eq. (24). In this paper we have seen that the qualitative features of the temperature dependence of the Wilson loops are consistent with lattice QCD results [28]; a further quantitative match will need some different observable data to train f(η) and g(η) independently.
The Neural ODE is quite effective for physics applications of machine learning in which the neural network weights have physical meanings. Any physical observable, if examined closely enough, should be a continuous function of space and time. To identify the weights of standard deep neural networks with physical quantities, regularizations making them a smooth function on the discrete network are necessary; these are rather artificial and often still cannot remove the discretization artifacts fully. In Neural ODEs, the weights are continuous functions in the first place, which removes the need for such contrived regularizations. One of the main improvements over Ref. [4], although the physical setup is the same, is that we could remove the artificial regularizations used there and, at the same time, largely improve the prediction accuracy of the emergent bulk metric.
Since we obtained the emergent volume factor for each temperature, it is possible to ask what kind of bulk action allows such a metric as a solution of its equation of motion. There is a lot of work elaborating possible bulk systems dual to QCD; a major example is the Einstein-dilaton system [29,30]. We plan to revisit this question in future publications.

q〉GeV 3 
FIG. 1. Left: The lattice QCD data for the chiral condensate as a function of the quark mass, for various values of the temperature (excerpt from [20]). The horizontal axis is the light quark mass normalized by the strange quark mass. Right: Data used for training. The orange dots are negative data, and the blue dots are positive data.

FIG. 3. Subplot (a): The prediction of the machine before the training. Green dots are positive data. Orange dots are negative data judged as positive by the machine (false positives). Subplot (b): The prediction of the machine after the training. Blue+green dots are positive data. Green dots are data judged as positive by the machine. Orange dots are the false positive data, which almost disappear after the training. Subplot (c): The metric function h(η̃) before the training. The horizontal axis is η̃ = 1 − η. The black hole horizon is on the right side, η̃ = 1, and the AdS boundary is on the left side, η̃ = −∞. Subplot (d): The emergent metric h(η̃) after the training. As we can see, the machine figures out the divergent behavior near the black hole horizon during the training.

FIG. 5. The quark-antiquark potential V(d). Left: calculated in the emergent spacetime with the metric functions trained with the data at T = 0.188 [GeV]. Right: the same at T = 0.208 [GeV]. In these figures, the zeros of the vertical axis should be ignored, as they depend on the cutoff at the asymptotic AdS boundary.

FIG. 6. Left: calculated quark-antiquark potential for T = 0.188 [GeV] (blue line) and for T = 0.208 [GeV] (red line). The two lines overlap with each other except for the flat parts. Right: lattice QCD result for the quark-antiquark potential at different values of the temperature, taken from Ref. [27].