Physics-informed neural networks for modeling astrophysical shocks

Physics-informed neural networks (PINNs) are machine learning models that integrate data-based learning with partial differential equations (PDEs). In this work, for the first time we extend PINNs to model the numerically challenging case of astrophysical shock waves in the presence of a stellar gravitational field. Notably, PINNs suffer from competing losses during gradient descent that can lead to poor performance especially in physical setups involving multiple scales, which is the case for shocks in the gravitationally stratified solar atmosphere. We applied PINNs in three different setups ranging from modeling astrophysical shocks in cases with no or little data to data-intensive cases. Namely, we used PINNs (a) to determine the effective polytropic index controlling the heating mechanism of the space plasma within 1% error, (b) to quantitatively show that data assimilation is seamless in PINNs and small amounts of data can significantly increase the model’s accuracy, and (c) to solve the forward time-dependent problem for different temporal horizons. We addressed the poor performance of PINNs through an effective normalization approach by reformulating the fluid dynamics PDE system to absorb the gravity-caused variability. This led to a huge improvement in the overall model performance with the density accuracy improving between 2 and 16 times. Finally, we present a detailed critique on the strengths and drawbacks of PINNs in tackling realistic physical problems in astrophysics and conclude that PINNs can be a powerful complimentary modeling approach to classical fluid dynamics solvers.


Introduction
Shocks are ubiquitous throughout multiple astrophysical systems. In this study we focus on the solar termination shock. The termination shock marks the boundary between the supersonic and the subsonic solar wind outflow, as it is the region at which the solar wind speed decreases sharply as a result of its interaction with the interstellar medium. NASA has launched an entire fleet of satellites 3 that measure the solar wind and its impact on the heliosphere. The heliophysics satellites make in-situ measurements at specific locations around the Sun. There have been direct in-situ measurements of the plasma at the Heliopause, see e.g. [1]. The termination shock is a standing shock wave at the distance of about ∼100 AU as measured by Voyager 1 and Voyager 2 space probes [e.g. [2][3][4]. A complete 3D coverage of the entire heliosphere is impossible and especially for the outer parts of the Solar System the observational coverage is very limited. Thus, numerical modeling and simulations are often necessary to study the solar wind evolution throughout the entire heliosphere [e.g . 5].
Machine learning and neural network models have been revolutionizing pretty much any field that deals with data. In their paper [6] first showed that given sufficient expressivity and a deterministic relationship between input and output variables, feedforward neural networks can be used as universal function approximators. Machine Learning and deep learning (neural network-based) methods have already been employed in a number of data-driven solar and space weather applications for tasks that range from principal component analysis, to classification, regression and forecasting [7][8][9][10][11][12][13][14][15]. We refer to [16] and references therein for an in depth review regarding machine learning advancements in space physics.
Physics-informed neural networks (PINNs) are a special category of deep learning models that combine the power of partial differential equations (PDEs) and neural networks to solve a wide range of problems ranging from cases with no data available to data-driven approaches [17][18][19][20][21][22]. PINNs leverage the power of automatic differentiation-a property of deep learning models [see e.g. 23, and references therein]-to combine classic physics laws with neural networks. In general, neural networks learn by example and can reach high accuracy when a lot of data are available. If we have sufficient amounts of observations, PINNs can be used to discover the underlying physics by e.g. determining the parameters of PDEs from observational data in a series of problems known as inverse problems [19,[24][25][26][27][28][29][30][31]. When observational data are difficult to obtain or sparse, PINNs can leverage the power of physics laws expressed as PDEs together with initial and boundary conditions to solve what is known as the forward problem-a process that resembles closely the classic forward modeling from computational fluid dynamics (CFD) solvers for example.
Over the last decade, a number of papers have been published on the applicability of PINNs forward problems on fluid dynamics use-cases [17,18,[20][21][22]32]. While PINNs have been successful in the forward modeling of simpler physical scenarios, they still suffer from (a) low accuracy even with many training points, (b) local minima in cases of non-convex solution spaces and (b) lack of general applicability and high costs as they require extensive hyper-parameter tuning especially for complex physical setups such as compressible fluids and discontinuities [33,34, see e.g. discussions in]. A number of approaches trying to tackle these issues have started to get explored by e.g. iteratively training the neural networks in sub-domains [33,[35][36][37][38], introducing better sampling methods [20,39], adding extra gradient boosted terms [34] or weighting terms in areas of high gradients [22], without a generally acceptable solution existing that tackles all three issues, i.e. low accuracy and local minima, discontinuities and high hyperparameter search.
In a number of recent works PINNs have been used for modeling simpler and classic shock wave problems [17,[20][21][22]. While there have been machine learning studies on interstellar shock prediction or Coronal Mass Ejection transit times [40,41], there have not been any PINNs studies investigating astrophysical shocks in the presence of a gravitational field. In this paper we will use numerical simulation results for benchmarking our PINN model of the evolution of the solar wind from the solar corona to capturing the termination shock at the edge of the heliosphere. This is the first paper where PINNs are applied to an astrophysical shock wave problem. Specifically, PINNs are used to model the standing termination shock created at the edge of the heliosphere.
In section 2, we describe the fluid dynamic based shock wave theory (section 2.1), the core CFD setup that we used to simulate the benchmarking PLUTO dataset (section 2.2) and the PINNs approach (section 2.3). In section 3, we present the main results of this work organized in three subsections, namely the inverse problem in section 3.1, data assimilation with PINNs section 3.2 and the forward problem in section 3.3. Then we move on the Discussion section 4, where we dive deeper into the implications of the present work in the current heliophysics context and discuss the advantages and drawbacks of PINNs. Finally, we finish with our Conclusions in section 5.

Analytic solution
The fluid dynamics conservation laws are a system of hyperbolic PDEs and can be written in the following general form where U is the density of the physical quantity, F is its flux and S are denoting any source terms. Following this formulation, the conservative form of the set of equations describing the hydrodynamics of a compressible fluid in the gravitational field of the Sun can be written as follows ∂ ∂t where ρ is the fluid density, u is the velocity, P is the pressure, r is the radial distance from the Sun, M ⊙ is the solar mass and G is the gravitational constant.
The above system of equations describes the acceleration of the solar wind near the Sun where the flow becomes supersonic, as well as the sharp deceleration near the edge of the heliosphere caused by the interaction with the interstellar medium. As discussed in the introductory section 1, a termination shock is formed, and marks the boundary between the supersonic and the subsonic solar wind outflow. In this paper, we make the assumption of a spherically symmetric solar wind outflow to reduce the problem dimensionality from 3D to 1D and simplify the problem.
There are no explicit heat source terms in the above set of equations (2)-(4). However, in order to properly simulate the solar wind acceleration we need to include some form of energy input into the system. In order to avoid the numerical complexity of explicitly including heating and cooling terms in the analytic and numerical models we approximate the role of the heating through an effective polytropic index γ. The possible range of values for the polytropic index of a natural fluid is γ is 1 ⩽ γ ⩽ 5/3. In particular, for ideal gases where there is no exchange of energy between the gas and the environment the polytropic index is γ = 5/3. We can however use an effective polytropic index (γ < 5/3) that mimics the effects of adding heating in the system. In our case we use an effective polytropic index of γ = 1.1. A more in depth discussion regarding the choice of polytropic index γ can be found in section 4.1.
The solar wind corresponds to a steady-state solution of the above system of differential equations (2)-(4). In fact, we can analytically integrate these equations to find three integrals. The first is the conservation of mass ρur 2 = ρ 0 u 0 R 2 ⊙ , which gives a relation between the mass flux at an arbitrary distance r as a function of the mass flux at the solar surface R ⊙ (quantities at the solar surface are denoted with the subscript 0). The second is the entropy conservation P/ρ γ = P 0 /ρ γ 0 that holds as long as the flow is smooth (without discontinuities). The third is the Bernoulli integral With some algebraic manipulations we can obtain the following differential equation for the radial profile of the velocity, where c s = √ γP/ρ is the sound speed. This equation shows that as the wind accelerates it crosses a sonic critical point (where u c = c s ) at distance r c = GM ⊙ /2c 2 s . We can find parametric solutions of the above equation as a function of the value of the Bernoulli integral. Using the expressions of the integrals the parametric solutions can be computed and visualized as isocontours of the following equation where M = u/c s0 is the Mach number (the velocity in units of the sound speed at the surface of the Sun), M 0 its value at the solar surface, x = r/R ⊙ is the distance from the center of the Sun in units of the solar radius R ⊙ , and λ = GM⊙ is a dimensionless constant. This is essentially a mathematical reformulation of the dynamic solar wind solution [see e.g. [42][43][44][45] and we can use it to obtain all the mathematically acceptable families of steady state solutions of the equation (6). The isocontours and analytic solutions of the PDE system (2)-(4) can be seen in figure 1.
The only physically acceptable solution that connects the subsonic wind near the Sun with infinity is the one that passes through the critical point, since this is the only one that becomes supersonic [42][43][44][45] and can be matched through a standing shock with a low pressure medium at high distances. The conditions at the critical point give the velocity at this point u c /c s0 = (4M 0 /λ 2 ) (γ−1)/(5−3γ) , its position r c = GM ⊙ /2u 2 c , and the corresponding value of the Bernoulli constant E c = 5−3γ 2(γ−1) u 2 c . Combining these three equations with the equation for the isocontours of equation (7), we find that for the critical solution the speed at the solar radius is the solution of the following equation Since M 0 is significantly smaller than unity we can ignore the first term and get the approximate solution Now, if we consider that the solar wind cannot flow to infinity unobstructed, but instead will at some point encounter the interstellar medium that has some pressure of its own P rmax , we get the following. For a given pressure of the environment at distance from the Sun r = r max , i.e. the interstellar medium, P rmax a shock will be formed at the position where the ram pressure of the wind P ram = ρu 2 roughly equals the pressure of the interstellar medium P ram = P rmax . This is expected to happen at a large distance where the wind speed is close to its terminal value u ∞ = √ 2E c and using the conservation of mass we get which is a rough estimation of the location of the standing shock. Using the approximate solution for M 0 we get which is roughly the expected location of the standing shock as a function of the conditions at the solar surface and the pressure at infinity. The shock jump conditions state that the fluxes of mass, momentum and energy before and after the shock remain the same where the square parenthesis represent the mathematical difference between the fluxes in the parenthesis before and after the shock. Coming back to figure 1 of the solution-space for the solar wind and considering the jump conditions, then we expect that the steady state solution of the termination shock is reached once the pressure balance between the solar wind and the interstellar medium has been reached and the new state still respects the conservation laws. Given that the Bernoulli constant remains the same, thus the shock solution at some point r = r shock after the critical point moves to the decreasing branch of the solution space shown in figure 1 with energy E c , if the shock is standing. This solution is represented with the red points in figure 1 and it was obtained numerically using a simulation algorithm as we will describe in the section below.

Synthetic dataset
In the previous section, we determined analytically the characteristics of the expected solution for a standing shock. The complete solution follows the solar wind accelerating branch, that passes through the critical point, and at position r ∼ r shock jumps to the decreasing-with-distance family of solutions as it feels the effects of the interstellar medium's pressure P rmax . We cannot determine analytically the precise location of the shock jump, and for that we need to run a numerical simulation. In order to obtain benchmarking data for training a Neural Network we use the astrophysical code PLUTO [46]. PLUTO is used for simulating the solar wind as a single fluid under the ideal hydrodynamics approximation. PLUTO, similar to other widely used CFD codes, solves the hydrodynamics equations in their non-dimensional form. For that non-dimensional 'code' units are used to avoid numerical issues due to extremely low or large numerical values of the quantities of interest. All physical quantities are normalized with respect to the characteristic scales of the system in question. For this non-dimensionalization process we need to specify three units and express all other quantities with respect to the chosen three. For convenience we chose the following characteristic units (a) the length unit is chosen to be the radius of the Sun r 0 = R ⊙ = 6.957 × 10 10 cm, (b) the density unit is taken as the density at the base of the solar corona (approximately at the solar radius) ρ 0 = 0.2 × 10 −6 g cm −3 , and (c) the velocity unit as the sound speed at the base of the corona u 0 = c s0 . Then if we measure time in units t 0 = r 0 /u 0 , and pressure in P 0 = ρ 0 u 2 0 , the above equations (2)-(4) become dimensionless and the GM ⊙ term should be replaced with the dimensionless constant λ = GM⊙ r0u 2 0 . It's worth noting that the code units used for time in PLUTO are given as derivative units, i.e. as a function of the length and velocity units t 0 = r 0 /u 0 . For our case r 0 is chosen as the solar radius R ⊙ , and u 0 is the sound speed at the solar surface. This dimensionalization of fluid dynamic equations allows us to apply our solution to different stars provided they have the same non-dimensional parameters, which in our case is the parameter λ. In our case, the parameter λ used corresponds to a value of λ = 5.
For this simulation, we use 1D spherical geometry assuming a spherically symmetric solar wind. The geometric domain extends between [1,20] R ⊙ and there is a mesh of 1500 points therein. The time domain covered extends between [0, 200] t 0 until the solution convergences to the steady state. The PDEs are solved with an adaptive time-step computed through the Courant condition and an adaptive mesh refinement of 4 levels with equal weights between velocity, density and pressure u : ρ : p, i.e. 1 : 1 : 1. The initial conditions at r 0 are u i = 2 u 0 , ρ i = 10 −3 ρ 0 and p i = 10 −3 p 0 and the boundary conditions at r min = R ⊙ and r max = 20 R ⊙ are set to ( ∂u ∂r and ( ∂u ∂r The PLUTO simulation starts at t = 0 and after several crossing-times pass and nearly 75 000 solver iterations the propagated disturbances from the boundary conditions settle and we finally arrive at a steady state solution with t final = t ss = 200 t 0 . At the end of the simulation the shock stops travelling and relaxes at its standing location at a distance of r shock ∼ 6 R ⊙ as shown by the red line of figure 1.

PINN models
The most common way to include physics laws in a PINN is to add to the regular neural network data loss L data , the PDE residuals L pde , together with the initial condition and boundary condition losses L icbc , as extra loss terms in the total loss function L PINN . Then, the total PINNs loss becomes where in our case L pde refers to the residual losses of PDE system (2)-(4) or in other words the error within which the model satisfies the PDEs. This formulation allows for the model to leverage the powerful back-propagation and loss minimization methods of modern neural networks to learn a model that not only can match the observations, but also adheres to the fundamental physics principles.
In this work, we employ PINNs [17,18] for the study of astrophysical shocks and in particular for the heliospheric termination shock. For our neural network models we use DeepXDE, a Python-based library for PINNs, which was first introduced and open sourced by [39]. We chose DeepXDE because it is user-friendly, while allowing for multiple standard machine learning backends, such as TensorFlow or PyTorch. For our experiments, we used TensorFlow 2.10 as the backend and hard boundary conditions [39] for all three physical quantities, namely velocity u, density ρ, and pressure P, to match the PLUTO setup as discussed in section 2.2. In terms of the neural network architecture, we have used fully connected neural networks. Finally, we chose a hyperbolic tangent as our activation function, a Glorot uniform initializer and an Adam optimizer [47] with a cosine learning rate scheduler for all the experiments presented in this work. The best network architecture, size and parameters were determined after experimentation.
In this work, we present three different ways PINNs can readily contribute to the study of the solar termination shock. Those three ways are (a) the inverse problem, where PINNs are used to discover the underlying physics from data (section 3.1), (b) the data assimilation, where PINNs can seamlessly incorporate ad-hoc observations to increase their predictive accuracy (section 3.2), and (c) the forward problem, where PINNs get to solve PDEs similar to classic numerical solvers in the forward modeling (section 3.3).

Results
In this study we discuss different applications of PINNs models on astrophysical shock waves. All results referenced in this study can be reproduced using the notebooks provided at the following open sourced repository [48].

Inverse problem
In this section, we use PLUTO data to determine the effective polytropic index value that is a parameter in the PDE equations (2)-(4). Determining an effective polytropic index is equivalent to determining the nature of the heating process present in the solar wind directly from data. In the following experiments, we aim to solve the inverse problem, wherein we use a PINN to model the underlying physics and infer the polytropic index γ in a supervised fashion, i.e. from observations or PLUTO data in this case.
For our first experiment on the inverse problem (i.e. inferringγ from data), we train a fully connected neural network with an input layer with 2 neurons (t, r), ten hidden layers consisting of 64 neurons each and a output layer with 3 neurons (u, P, ρ). We use the mean squared error (MSE) loss function and train the model using the ADAM optimizer over 60 000 iterations. To train this model, we sample 40 000 data points from the PLUTO synthetic dataset and 10 000 data points that are randomly sampled from the entire spatiotemporal domain from t > 0 to the end of the simulation and until the steady state is reached t = t ss . The results for this experiment can be seen in figure 2, where the top panel represents the PDE losses, the middle panel represents the data losses, and the third and bottom panel represents the convergence ofγ to the ground truth value γ = 1.1. During training, both data and PDE losses show a decreasing trend with the predictedγ reaching as low asγ = 1.20. This results in a percentage error of the order of ∼9% deviation from the ground truth γ = 1.1. This experiment corresponds to experiment No. 1 in table 1 and the accuracy of the model's learnt intermediate mapping between input and output variables can be visually inspected in figure 3.
To further improve the model and since neural networks are typically more accurate if trained with more training data, we run another experiment corresponding to No. 2 in table 1 with the same hyperparameters but with 20 000 more data points sampled from the entire t > 0 PLUTO dataset. The experiment did not result in significant improvements to the predictedγ value and instead we once again arrived at an error of about 9%. This average error is computed at the final time-step, i.e. at the steady state, but it is a measure of the ability of the PINN to predict the time-dependent evolution of the fluid throughout the entire temporal domain t > 0. The accuracy of the model's predicted polytropic indexγ depends on the accuracy of the intermediate mapping of the input (t, r) to the output variables (u, ρ, P). We will explore this further in the rest of this section.  As part of our next set of experiments, we use the same model architecture and training hyperparameters, but now we sample 40 000 points from the second half of the transient phase until the steady state is reached from the PLUTO data, i.e. from t > (t ss /2) to t = t ss . The intuition behind choosing a later timestep is to train the model with data when the system has approached closer to the steady state, thus leading to less variance in the training data distribution. This corresponds to experiment No. 3 in table 1 and the visual representation of the results can be seen in figure 2. Clipping half of the time-domain to exclude the first half of the transient phase led to better model performances with our losses and the predicted polytropic indexγ converging significantly faster and falling within less than < 1% error atγ = 1.11 for the majority of these experiments in the data sampling ranges explored, see experiments No. 3-7 in table 1.
As a final experiment, we use the same architecture and training hyperparameters, but this time we sample 1.5 k datapoints from PLUTO where t = t ss as can be seen in experiment No. 8 in table 1. For this experiment we consider a single timeframe when the system has reached the steady state which means there is no time-variability or in other words this corresponds to the least variance experiment. Even with significantly less data (2.5% of previous experiments) our model converges to less that <1% error at γ = 1.09 within the same number of iterations. The faster convergence of this model to the predicted polytropic index valueγ = 1.09 can be seen in figure 2.
The above results lead us to conclude that the PINN has an easier time learning an accurate mapping between (t, r) and (u, ρ, P) in physical situations where the variance in the predicted quantities is lower. A smoother semi-steady state or slowly changing with time flow leads to a more accurate prediction. This is further substantiated and demonstrated in figure 3. The experiment with data sampled from the full time-domain t > 0 (red curve) has a larger error in the predicted output variables (û,ρ,P) than the experiment with data sampled from the second half of the PLUTO simulation (blue curve) for the same data, i.e. 40 000 sample points. Further evidence for the same trend is the fact that the last experiment (No. 8 in table 1) with only 1500 sample points all sampled from the last timeframe of the PLUTO data corresponding to the steady state has the lowest error of predicting the output variables (û,ρ,P).

Data assimilation
In this section, we focus on data assimilation use-cases where we use a small number of training points to help the PINN converge faster in forward modeling. In the rest of this section, we present a few experiments demonstrating the importance of the assimilated data sampling on the PINNs performance in predicting the PLUTO ground truth steady state solution. The setups and the results of these experiments are shown in table 2.
For all the experiments of this section, since we are only interested in the steady state solution we take the time-independent version of the (2)-(4) PDE system, i.e. we ignore the time derivatives. On top of that, we set the r min and r max boundary conditions to hard constraints instead of including them in the total loss function [39]. Hard constraints are a last linear layer of the neural network that enforces the boundary conditions. The reason for setting the boundary conditions into hard constraints is twofold (a) we want to imitate the way classic solvers solve the PDEs as close as possible and in PLUTO for example the boundary conditions are given and they drive the entire solution dynamics and thus variations are not permitted and (b) we want to relieve the PINN from the added difficulty of optimizing another six losses, i.e. losses for all three (u, P, ρ) at r min and r max .
For all the experiments on data assimilation, we train a fully connected neural network with an input layer with one neuron (r), five hidden layers consisting of 100 neurons each and a output layer with 3 neurons. We use the MSE loss function and train the model using the ADAM optimizer over 60 000 iterations. To train this model, we sample 40 000 data points that are randomly sampled from the spatial domain.

Experiment A
In the first experiment we used exactly two data points from the PLUTO steady state data, one just before the shock and one just after the shock. This is experiment A in table 2. The intuition being that giving specifically these two data points should help the network to accurately capture the standing shock region. The results for this experiment can be seen in figure 4. As shown in table 2, the overall percentage errors for the predicted fluid quantities for experiment A are E u = 8.83%, E p = 6.66%, E ρ = 10.56%, where E u , E p , and E ρ are the velocity, pressure and density percentage errors respectively.
In the top left panel of figure 4 we see the velocity which seems to be predicted correctly near both ends of the computational domain, but the predictedû peak is higher than the ground truth and the predicted curve shows some small divergence in the central 50% of the computational domain around the shock area. The shock jump is captured accurately in the narrow discontinuity area. The pressure is shown in the middle left panel of figure 4 and is the one with the lowest error. The pressure profile is predicted accurately in the vast majority of the computational domain including the shock area with only diverging in the near areas before and after the shock. Finally, in the middle right panel of figure 4 we see that the densityρ is accurately predicted from the beginning of the computational domain up until a short distance before the shock wave. The shock jump is captured accurately thus verifying our hypothesis that data sampling near the shock area helps capture the shock dynamics. Further from the shock area, we see that density does a couple of oscillations around the ground truth value from the shock to the end of computational domain at r = 20R ⊙ .
In conclusion, with only 2 training data points and hard boundary conditions the PINN model captured the steady state solution with about 10% error in the fluid quantities after 60 000 epochs. In this case we carefully chose the data points near the shock wave in order to help the neural network to capture the discontinuity.

Experiment B
While it is a good idea to choose the training data points carefully and place them at the areas of interest, i.e. the shock wave in our case, in the general case we might not know where the shock is. In fact, in a real world scenario where we want to model the termination shock, we would use as much satellite data as possible from all available missions in the Solar System without knowing in advance the position of the shock and without having the control over the spacecraft coverage of the Solar System space.
In this experiment, we pick ten data points from the PLUTO steady state data linearly spaced throughout the entire computational domain. Five equally spaced points before and after the shock. This is experiment B in table 2 and the results for this approach can be seen in the left column panels of figure 5. Providing more points helped the network converge closer to the ground truth solution obtained by PLUTO as can be seen in figure 5. We also see a reduction in the percentage error of the velocity and an increase in the percentage error of the pressure as we can see in table 2. The overall percentage errors for the predicted fluid quantities for experiment B are E u = 2.71%, E p = 8.93%, E ρ = 10.64%.
The velocity at the top left panel of figure 5 seems to be predicted correctly at the bigger part of the computational domain including the shock region, but the predictedû peak is marginally higher than the ground truth, but lower than the experiment A predicted velocity. The only area diverging from the ground truth are the regions between the shock points and the neighboring points on both sides of the shock. In the middle left panel of figure 5 we can see the pressure, which has a very similar behavior like that one of experiment A. Finally, in the bottom left panel of figure 5 we see that the densityρ is accurately predicted from the beginning of the computational domain up until a short distance before the shock wave. The shock jump is captured accurately thus verifying once more that appropriate data sampling near the shock area helps capture the shock dynamics. Further from the shock area, we see that density shows several oscillations around the ground truth value passing through all 5 data points that are between the shock and the end of the domain.
The overall retrieval of the ground truth is better in this case where we have sampled data points throughout the computational domain compared to the previous experiment. However, we still see that the losses in pressure and density are around 10%.

Experiment C
This is experiment C in table 2. In experiment B the density showed oscillations at larger distances from the Sun, as it dropped in magnitude with inverse square law and the MSE loss had trouble predicting accurately quantities like the density and the pressure with values much smaller than unity and as low as ρ rmax ∼ 10 −3 . Given that, in experiment C we decided to use ten points again, but to rearrange them so that we now place the data points in the areas where experiment B gave higher errors. In this case, we give more weight in the areas before and after the shock by adding more points and we also upsample the area with the lower density values to rectify for the oscillations at larger distances. This way only 3 points are sampled before the shock Table 3. Forward problem with PINNs. The first four experiments correspond to the forward time-dependent problem with output target variables (u, P, ρ), whereas the bottom three cases correspond to the reformulated (u, y1, y2), with y1 = r 2γ P and y2 = r 2 ρ. and 7 points are sampled after the shock. The results for this approach can be seen in the right column panels of figure 5. As shown in table 2, the overall percentage errors for the predicted fluid quantities for experiment C are E ρ = 4.10%, E u = 1.53%, E p = 5.94%, which are the lowest errors and best results for all the experiments presented so far.
The velocity as seen in right top panel of figure 5 seems to be predicted correctly everywhere apart from the area around the r = 5R ⊙ distance. The velocity has the lowest relative error compared to all three predicted profiles. The predicted pressure is shown in the middle right panel of figure 5 and it appears to follow the PLUTO solution closely throughout the entire domain of interest including the shock region. Finally, the densityρ as seen in the bottom right panel of figure 5 is the most accurate prediction and very close to the ground truth solution everywhere apart from a small area near the r = 10R ⊙ distance. The shock jump is also captured accurately similar to the previous experiments.

Forward problem
The forward modeling refers to using a set of PDEs and initial and boundary conditions in order to obtain an accurate solution for the dynamics of the fluid. Similar to classic CFD solvers, PINNs can also be used for forward modeling [e.g. 17,39].
In this case, we are modeling a shock wave, which appears as a discontinuity or jump in the radial profiles for all the physical quantities (u, P, ρ). Neural networks have numerical trouble predicting discontinuous functions and regions of high gradients. For that reason, we introduced a PINNs weight term similar to what was done in [22] in order to capture the shock more accurately and to allow for faster convergence. The [22] weight λ WE has the following form, which we mention here for completeness whereû is the predicted velocity and ε is the parameter that controls the strength of the down-weighting. The weighting term reduces the importance of the PDE residuals near the shock compression area (∇ ·û < 0), thus allowing the neural network to gradually learn to predict the smooth regions before and after the shock with high accuracy and shrinking their distance until a thin discontinuity region is formed. In this series of experiments, we train a fully connected neural network with an input layer with 2 neurons (t, r), five hidden layers consisting of 100 neurons each and a output layer with 3 neurons (u, P, ρ). We use the MSE loss function and train the model using the ADAM optimizer over 60 000 iterations. To train this model, we sample 40 000 data points that are randomly sampled from the spatial domain. Finally, we use hard constraints for the boundary conditions. The setups and the results of these experiments are shown in table 3.
During this study we perform a number of forward experiments in the time-dependent case. In the left column panels of figure 6 we see the results of these experiments for all three dependent variables (u, P, ρ). In particular, solving the forward problem with PINNs is a computationally challenging task, especially for cases like this one, with complex physics, such as discontinuities in the presence of a gravitationally stratified atmosphere. The problems are especially evident when the time horizon we need to integrate the PDEs towards is long, which is the case here. This is quantitatively showcased in the left column panels of figure 6, where we see a collection of experiments each of which uses as initial condition one of the intermediate steps computed by PLUTO and solves the forward problem until t = t ss = 200. As we can see in those plots and in table 3, we start with a very poor prediction of the dependent variables (u, P, ρ) for the first experiment with t init = 100 and gradually as we use an initial condition closer and closer to the steady state, we approach closer to the ground truth solution.
While the velocity is showing a qualitatively and quantitatively improved convergence to the ground truth data (see figure 6 and table 3) as we get closer to the steady state t ss , the density and the pressure show oscillatory behavior with large relative amplitudes locally. This is due to the fact that density and pressure drop drastically with distance roughly following an inverse square law ρ ∝ r −2 due to the gravitational field of the Sun. In order to boost the model performance we define two new quantities y 1 and y 2 and reformulate the PDEs (2)-(4) to solve for these new quantities, with y 1 = r 2γ P and y 2 = r 2 ρ. The results of these experiments can be seen in the right column panels of figure 6 and the bottom 3 rows of table 3. For most experiments with this reformulation we got a huge performance improvement in the overall relative errors which ranges from a factor 2 for the t init = 190-couple of experiments to a factor 16 for the t init = 100 case.

Discussion
In recent years, PINNs have been successfully used for tackling a range of fluid dynamics forward and inverse problems amongst which, there has also been a handful of studies modeling shock waves [20][21][22]. This is the first paper that uses PINNs to model the heliospheric termination shock from the base of the solar corona to the edge of the heliosphere including the gravitational force and a heating proxy.

Generality preserving assumptions
Accurately capturing the time evolution of the solar wind from the solar corona to the edges of the heliosphere requires a complex multi-physics multi-scale model, combined with multi-messenger observations and access to immense computational resources. There is no such universally accepted model to date and in order to carry out this research we need to make a number of assumptions. Namely, we focus on a hydrodynamic model wherein (a) the solar wind outflow has been assumed to be spherically symmetric which reduces the dimensionality of the problem from 3D to 1D, (b) the boundary and initial conditions are not changing with time, which leads to a steady state solution with a standing termination shock, (c) the solar wind acceleration and heating can be approximated by an effective polytropic index. These assumptions still capture the relevant fluid dynamics processes from the outflow of the solar wind to the creation of the shock as described by equations (2)- (4).
The termination shock is located at a distance of about ∼100 AU [e.g. [2][3][4]. In order to reduce the numerical cost of both the CFD and PINN models and without loss of generality, we have reduced the numerical domain to r ∈ [1, 20] R ⊙ for both PLUTO and PINNs models. As explained in detail in section 2, for a given pressure of the environment P rmax , a shock will be formed at the position where the ram pressure of the wind ρu 2 roughly equals P rmax . In other words, we have ensured that there is no loss of generality and all the relevant shock wave physics are captured properly. Including the full scales, i.e. shock at a distance of ∼100 AU, would only require that we increase the computational domain by a factor 10 3 and set the interstellar pressure P rmax to a reduced value approximately scaling as ∝ r −2 , i.e. reduced by a factor of 10 6 . Such a setup would require thousands if not millions of times more computational resources than what we have used. Indicatively, we mention that the forward experiments took roughly ∼1 h each on a single V100 NVIDIA GPU on the reduced domain. Thus we would need months to years of V100 NVIDIA GPU time to solve the problem on a domain r ∈ [1 R ⊙ , 100 AU] with a similar accuracy.
For all the experiments presented in this work we have used synthetic PLUTO data produced with a polytropic index (γ) of 1.1. The inverse PINNs model was able to retrieve the correct polytropic index used for the synthetic data with high accuracy. In general, the polytropic index (γ) is used in the hydrodynamic PDE system to approximate the solar wind heating and acceleration mechanisms. As shown in a recent paper, this formulation cannot be deemed as a realistic approximation of the solar coronal heating and wind acceleration for γ ⩾ 1.25 [49], which further substantiates the results presented in the classical paper by [42], who showed that transonic winds exist provided that γ < 3/2. Motivated from the above physical reasons we chose a polytropic index gamma of 1.1, which is right in the middle of the range 1 < γ < 1.25. In [50], the authors show in figure 1 therein that larger polytropic indexes (γ) lead to solar winds that become supersonic further away from the Sun. For example for a γ = 1.2 (and λ = 3) the standing shock would form at a distance of about 22 R ⊙ and we would need to extend our computational domain to properly capture that solution. It's worth noting here that we also tested the inverse model for the case of γ = 1.2 and we were able to retrieve the correct valueγ = 1.19, which corresponds to a 0.8% error. While the computational cost for larger values of polytropic indexes γ increases as we need to extend the computational domain, the inverse PINNs model is able to retrieve the right γ-value with the same accuracy.

Strengths of PINNs
In this work for the first time we applied PINNs for modeling astrophysical shocks in a gravitationally stratified environment. PINNs operate in fundamentally different ways than classic physics solvers as they learn by example through back-propagation and gradient descent minimization of multiple losses, while classic solvers have evolved through decades of research into robust solvers with well studied behaviors regarding convergence. The inverse problem is based on utilizing observational data to discover unknown physics and as such it is an area that PINNs-that have been constructed to learn from such data-thrive at, as we showed earlier in this work (see e.g. section 3.1). In particular, we showed that PINNs are able to retrieve the correct γ = 1.1 with a relative error ∼1% even in cases when we use a fraction of the available data (e.g. 30 000 out of 150 000 as done in section 3.1).
Moreover, PINNs are uniquely equipped to deal with data assimilation problems as they have been designed to solve PDE systems alongside observational data, which they can incorporate and learn from in a seamless and native fashion. This scenario has special importance and relevance in heliophysics, as there is a plethora of satellites available taking in-situ measurements throughout the heliosphere. In contrast to PINNs, data assimilation cannot be handled by classic CFD codes, apart from the case of determining initial and boundary conditions. In section 3.2 we showed that even with minimal amounts of data, PINNs can solve the PDE system for the physical quantities of interest with increasing accuracy as we gradually provide them with more observational data.
Another advantage of PINNs is the fact that they are very user-friendly [39] and as such they allow for setting up a new problem within a few lines of code which in turn allows for very quick experimentation cycles. We indicatively mention here the forward problem results that we presented in section 3.3. We were able to tackle the poor accuracy in density and pressure due to their multi-scale variability by simply defining new (y 1 , y 2 ) variables and reformulating the PDE system to solve for (u, P, ρ) in a straightforward manner. The PDE solvers in CFD codes while robust, they do not necessarily provide equally easy ways for customization.

Limitations of PINNs
While PINNs are arguably powerful in adding value when observational data are abundant, they still face several challenges and significant improvements need to be made in order for PINNs to get widely adopted for forward modeling realistic heliophysics problems. First-generation neural network architectures such as multi-layer perceptrons cannot match the robustness and accuracy of CFD codes, such as PLUTO, in integrating through long time-periods [39]. In particular, as explained in [39] currently PINNs are slower than finite element codes, they predominantly operate on a single GPU, their neural architectures are relatively simple, a lot of trial and error fine-tuning is required, and they perform poorly in multi-physics and multi-scale problems.
Furthermore, as argued in [51] fully connected PINNs architectures can lead to unstable training and inaccurate results, in particular in complex physical scenarios with multi-scale variability similar to the case examined in this paper [19,52,53]. Wang et al [54] argued that the reason for these challenges is the fact that PINNs attempt to minimize multiple loss terms at once through gradient descent. The neural network optimizer might have to deal with losses that differ by several orders of magnitude which makes the minimization task and reaching a unique solution challenging. To address this issue [32,54] suggested adaptive weighting, which improves results, but does not address the fundamental problems in the PINNs optimization procedure [53].
Some recent studies have tried to address these challenges in a number of ways. Specifically, [33,38] tried to tackle the poor accuracy of PINNs by dividing the computational domain and solving the PDE system in segments. In section 3.3, we show-cased that integrating over segments does indeed improve the model convergence and accuracy. For a more detailed discussion on the issue of PINNs when long-time integration is necessary and possible solutions we refer to [55] and references therein. Finally, operator neural networks as well as more advanced neural architectures both show promising results [55][56][57][58][59][60].

Conclusions
This is the first paper using PINNs for modeling shocks while accounting for the gravitational force. In particular, we used PLUTO simulated data as benchmark to train PINNs in different training setups. We showed that Neural Networks and their powerful back-propagation and gradient descent learning technique can be combined with the physics PDEs to tackle problems from no or little data to data-intensive cases for astrophysical shocks. We successfully used PINNs to solve the inverse problem and retrieve the correct effective polytropic index that was used to produce the PLUTO data with an astounding accuracy of less than 1%. This demonstrates that PINNs can be used to discover underlying physics as captured in observations. In this case we recovered the correct effective γ-index which is a proxy of the solar wind heating and acceleration mechanism. Classic CFD models have been through many decades of maturity and robust benchmarking, but still to this day they are not able to seamlessly incorporate observational data beyond their initial and boundary conditions. We have showcased that PINNs can be a valuable complimentary modeling approach that can utilize all the available observations in a native fashion and inform the final prediction to improved accuracy. Finally, we showed that PINNs are still in the early phases of their maturity and cannot match the robustness and accuracy of classic physics PDE solvers in forward modeling, but they can still produce acceptable results if used for short time horizon integration. In particular, PINNs suffer from competing losses during gradient descent that can lead to poor performance especially when there is multi-scale variability in the fluid variables, as is the case for the density and pressure in this study due to the gravitational stratification of the solar atmosphere. In order to rectify for this we reformulated the conservative PDE system to absorb the inverse square law r −2 decrease of the fluid pressure and density, which vastly improved the convergence and retrieval of the ground truth solution by 2 to 16 times in some cases. PINNs and in general machine and deep learning have been started to get adopted for a number of use-cases in the heliophysics regime.
There is still a lot of work that remains to be done until the maturity of CFD models is achieved by PINNs, but recent results are promising about PINNs acting as an alternative modeling approach in astrophysical fluid dynamics. The 1D solar wind model we used for this study compared to higher dimensionality models has the added benefit that it accepts an analytic solution. This is an important step in the model development and benchmarking process before one moves towards a more realistic and complex multidimensional solar wind model, for which only numerical solutions exist, which in turn come with their own computational errors. In a future study, we plan to generalize this model to 2D including the following most prominent physical considerations, i.e. (a) of the solar rotation and the (b) solar magnetic fields similar to what was done in [50]. Eventually, the goal would be to show that PINNs are able to stand alongside the realistic state-of-the-art 3D solar wind models that include more sophisticated heating and solar wind acceleration mechanisms through e.g. Alfvén wave heating, [see e.g. 61, and references therein].

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://gitlab. qdatalabs.com/open_source/applied_research_papers/pinns_astrophysical_shocks.