Polynomial differentiation decreases the training time complexity of physics-informed neural networks and strengthens their approximation power

We present novel approximations of variational losses applicable to the training of physics-informed neural networks (PINNs). The formulations reflect classic Sobolev space theory for partial differential equations (PDEs) and their weak formulations. The loss approximations rest on polynomial differentiation, realised by an extension of classic Gauss-Legendre cubatures which we term Sobolev cubatures, and serve as a replacement for automatic differentiation. We prove the training time complexity of the resulting Sobolev-PINNs with polynomial differentiation to be less than that required by PINNs relying on automatic differentiation. On top of the one-to-two orders of magnitude speed-up, the Sobolev-PINNs are demonstrated to achieve closer solution approximations for prominent forward and inverse, linear and non-linear PDE problems compared to established PINNs.


Introduction
Machine learning methods are the state of the art in many areas of scientific computing, covering applications such as unmanned vehicle manoeuvring with recurrent neural nets (Liu et al 2022), generating heterogeneous aquifer structures with GANs (Zhan et al 2022), self-supervised encoding of protein sub-cellular localisation features with the cytoself model (Kobayashi et al 2022), or the very popular AlphaFold (Jumper et al 2021) for protein structure prediction.
Partial differential equations (PDEs) are omnipresent mathematical models governing the dynamics and (physical) laws of complex systems (Jost 2002, Brezis 2011). However, analytic PDE solutions are rarely known for the systems at the centre of current research, which then requires the use of numerical methods for solving such problems.
Applying neural networks (NNs) to PDEs aims at solving an extended spectrum of problems for which the current classic methods are limited. For example, Supekar et al (2023) learn the PDE operators describing the hydrodynamics of active matter; Bakarji and Tartakovsky (2021) apply sparse regression for learning coarse-grained equations; and Li et al (2017), Lusch et al (2018), Pan et al (2023) address reduced order modelling by learning lower dimensional system representations. Recently, methods such as neural operators approximate infinite dimensional operators by NNs, exploiting the integral kernel representation of the operators (Lu et al 2021, Kovachki et al 2022). For an excellent survey on these subjects we refer to Brunton and Kutz (2023).
Here, we focus on NN applications solving forward and inverse (linear and non-linear) PDE problems. Prior approaches addressing the task include: physics-informed GANs (Arjovsky et al 2017), the deep Galerkin method (Sirignano and Spiliopoulos 2018), and physics-informed NNs (PINNs) (Raissi et al 2019). In contrast to classic solvers, PINNs provide a NN surrogate model parametrising the solution space of the PDEs. PINN learning is given by minimising a variational problem, which is typically formulated in L²-loss terms of the form

$$L(u) = \|\mathcal{P}[u] - f\|^2_{L^2(\Omega)} + \|u - g\|^2_{L^2(\partial\Omega)}, \qquad (1)$$

for a PDE operator 𝒫, right hand side f, and boundary condition g, being approximated by mean square errors (MSEs) in random nodes P (Long et al 2018, Yang et al 2020).
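For orientation, the following is a minimal PyTorch rendering of such an MSE-approximated loss for a hypothetical 1D Poisson problem u'' = f with f(x) = −π² sin(πx) and zero boundary values; the residual derivatives are obtained by A.D., and all names, sizes, and hyper-parameters are illustrative rather than the paper's exact settings:

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 50), torch.nn.Tanh(),
                          torch.nn.Linear(50, 1)).double()
xs = (2 * torch.rand(128, 1, dtype=torch.float64) - 1).requires_grad_(True)

u = net(xs)
du = torch.autograd.grad(u.sum(), xs, create_graph=True)[0]    # u'  via A.D.
d2u = torch.autograd.grad(du.sum(), xs, create_graph=True)[0]  # u'' via A.D.
f = -torch.pi**2 * torch.sin(torch.pi * xs)
xb = torch.tensor([[-1.0], [1.0]], dtype=torch.float64)
loss = ((d2u - f) ** 2).mean() + (net(xb) ** 2).mean()  # interior MSE + boundary MSE
```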
Our contribution extends these loss formulations to general Sobolev norms and provides efficiently computable approximations due to Sobolev cubatures, exceeding the MSE approximation quality.

Contribution: replacing automatic differentiation by polynomial differentiation
The concept of Sobolev training (Czarnecki et al 2017) is known to have strong regularisation properties when solving variational problems (Zhu et al 2022), especially when it comes to filtering low or high frequency noise due to positive- or negative-order Sobolev losses. The concept rests on using approximations of the Sobolev H^k-norms, which generalise the L²-loss from equation (1) by including information on the derivatives of the residual in the loss. The derivatives can be computed when the analytical derivative of the ground truth is known. Apart from this rarely occurring case, computations are commonly realised by automatic differentiation (A.D.) (Baydin et al 2018) or with finite differences (Kissel and Diepold 2020).
We introduce polynomial differentiation (P.D.), realised as Sobolev cubatures, providing close approximations of the Sobolev norms. P.D. can be applied to general residuals without requiring any further information and serves as a replacement for A.D.
We prove and demonstrate that, by exploiting the Sobolev cubatures, computing the PDE loss terms required for PINN training results in a reduction of training time complexity and strengthens the approximation power of the Sobolev-PINNs (SC-PINNs) relying on P.D. compared to A.D.-based PINNs. Additionally, we address weak PDE formulations by extending the loss approximations accordingly, approaching a broader class of PDEs than is accessible by exploiting classic PINN losses.
We want to stress that the SC-PINNs do not rely on changing the underlying NN architecture, but are purely given by replacing the loss formulation and its computation. This makes SC-PINNs realisable for any choice of NN setup, including optimised choices of activation functions, learning strategies, or other hyper-parameter settings. Consequently, former PINN approaches, some of which we mention below, might benefit from the novel P.D.-based loss formulations condensed into the SC-PINNs.

Related work
The importance of the present computational challenge is manifest in the large number of previous works. Consequently, an exhaustive overview of the literature cannot be given here. Instead, we restrict ourselves to mentioning those contributions that directly relate to or inspired our work.

Classic numerical methods
The main classic numerical solvers divide into: finite elements (Ern and Guermond 2004); finite differences (LeVeque 2007); finite volumes (Eymard et al 2000); spectral methods (Bernardi and Maday 1997, Canuto et al 2007, Trefethen 2019); and particle methods (Li and Liu 2007). These classes of methods provide solutions with high accuracy, but come with the drawback of limited flexibility with respect to the type of problems they can solve. This especially includes inverse problems, such as PDE parameter inference, which pose a hard challenge for the aforementioned methods. In contrast, the variational formulation defining the PINN training provides the desired flexibility, but again at the cost of lower achievable accuracy.
We mention the following recent approaches addressing the stability and accuracy of PINNs.

Variational PINNs (VPINNs)
The approach, introduced by Kharazmi et al (2019), relies on exploiting analytic integration and differentiation formulas of shallow NNs with specified activation functions. The method was extended by using cubatures and automatic differentiation for computing the losses and complemented by a domain decomposition approach.

Inverse Dirichlet loss balancing
The method was introduced by Maddu et al (2021) and shown to increase the numerical stability of PINNs by dynamically balancing the occurring gradient amplitudes, which, if unbalanced, cause numerical stiffness phenomena (Wang et al 2021). The PINN formulation, however, rests on classic MSE losses.

Sobolev training
As aforementioned, the concept was previously considered in Czarnecki et al (2017), Kissel and Diepold (2020), Zhu et al (2022). While our contribution can be seen as part of this strategy, we pointed out the novelties we provide here, complementing these former approaches, in section 1.1.

Sobolev cubatures
Closer approximations of L²-integrals than reachable by prominent MSE approaches can be derived by applying classic Gauss-Legendre cubatures (see e.g. Stroud 1966, 1971, Sobolev and Vaskevich 1997, Trefethen 2017a, 2017b, 2019), or by an even further extension to what we call Sobolev cubatures, presented herein. The notions of this section follow classic concepts of spectral methods (Canuto et al 2007, Trefethen 2019). We start by fixing the notation used in the article.
Throughout this article we denote by Ω = (−1, 1)^m the m-dimensional standard hypercube and by ‖x‖_p = (Σ_{i=1}^m |x_i|^p)^{1/p}, 1 ≤ p < ∞, the l^p-norm on ℝ^m, with ‖x‖_∞ = max_{1≤i≤m} |x_i|. We recommend Adams and Fournier (2003), Brezis (2011) for an excellent overview of functional analysis and the Sobolev spaces H^k(Ω, ℝ), k ∈ ℕ, given by all L²-integrable functions f : Ω → ℝ with existing L²-integrable weak derivatives ∂^α f, ‖α‖₁ ≤ k. From the Trace Theorem (Adams and Fournier 2003) we find, furthermore, that whenever H ⊆ ℝ^m is a hyperplane of co-dimension 1, the induced restriction satisfies

$$f|_{H\cap\Omega} \in H^{k-1/2}(H\cap\Omega,\mathbb{R}) \quad \text{for all } f \in H^{k}(\Omega,\mathbb{R}). \qquad (2)$$

Π_{m,n} denotes the ℝ-vector space of all real polynomials in m variables spanned by all monomials x^α = Π_{i=1}^m x_i^{α_i} of maximum degree n ∈ ℕ, and A_{m,n} = {α ∈ ℕ^m : ‖α‖_∞ ≤ n} the corresponding multi-index set with |A_{m,n}| = (n + 1)^m. We order A_{m,n} with respect to the lexicographic order ⪯. Let D = (d_{i,j})_{1≤i,j≤|A_{m,n}|} ∈ ℝ^{|A_{m,n}|×|A_{m,n}|} be a matrix. We slightly abuse notation by writing D = (d_{α,β})_{α,β∈A_{m,n}}, with d_{α,β} being the (α, β)th entry of D.

Orthogonal polynomials
Let m, n ∈ ℕ. We consider the m-dimensional Legendre grids P_{m,n} = ⨉_{i=1}^m Leg_n ⊆ Ω, where Leg_n = {p_0, ..., p_n} are the n + 1 Legendre nodes given by the roots of the Legendre polynomial of degree n + 1 (see e.g. Gautschi 2011), and denote p_α = (p_{α_1}, ..., p_{α_m}) ∈ P_{m,n}. It is a classic fact (Stroud 1966, 1971, Sobolev and Vaskevich 1997, Trefethen 2017a, 2019) that the Lagrange polynomials L_α ∈ Π_{m,n}, α ∈ A_{m,n}, given by L_α(p_β) = δ_{α,β}, where δ denotes the Kronecker delta, span Π_{m,n}, i.e.

$$Q = \sum_{\alpha \in A_{m,n}} Q(p_\alpha)\, L_\alpha \quad \text{for all } Q \in \Pi_{m,n}, \qquad (3)$$

with the efficiently computable Gauss-Legendre cubature weights ω_α = ∫_Ω L_α dΩ > 0. Consequently, for any polynomial Q ∈ Π_{m,2n+1} of degree 2n + 1 the following cubature rule applies:

$$\int_\Omega Q \, d\Omega = \sum_{\alpha \in A_{m,n}} Q(p_\alpha)\, \omega_\alpha.$$

Summarising: polynomials of degree 2n + 1 can be (numerically) integrated exactly when sampled on the Legendre grid P_{m,n} of order n + 1. Thanks to |P_{m,n}| = (n + 1)^m ≪ (2n + 1)^m this makes Gauss-Legendre integration a very powerful integration scheme, yielding

$$\|Q\|^2_{L^2(\Omega)} = \sum_{\alpha \in A_{m,n}} Q(p_\alpha)^2\, \omega_\alpha \qquad (4)$$

for all Q ∈ Π_{m,n}.
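As a quick sanity check (our own minimal numpy illustration, not part of the paper's benchmarks), the 1D rule with n + 1 = 4 nodes integrates an arbitrary degree-7 polynomial exactly:

```python
import numpy as np
from numpy.polynomial import legendre as leg

n = 3
x, w = leg.leggauss(n + 1)            # 4 nodes: exact up to degree 2n + 1 = 7
q = lambda t: 5 * t**7 - t**4 + 2     # illustrative degree-7 polynomial
exact = -2/5 + 4                      # ∫(5t^7 - t^4 + 2) dt over (-1, 1)
assert np.isclose(w @ q(x), exact)    # cubature matches the analytic integral
```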

Differential operators on Π m,n
We follow Bernardi and Maday (1997), Canuto et al (2007), Trefethen (2019) and derive exact matrix representations of differential operators on the polynomial spaces Π_{m,n}, based on equation (3). This will allow us to extend equation (4) to Sobolev cubatures, approximating the Sobolev norms of general Sobolev functions. Differentiating the Lagrange basis in the ith coordinate yields the matrices D_i = (∂_{x_i} L_β(p_α))_{α,β∈A_{m,n}}. Consequently, the matrix

$$D^\beta := \prod_{i=1}^m D_i^{\beta_i}, \qquad \beta \in A_{m,n}, \qquad (6)$$

represents the finite dimensional truncation of the differential operator ∂^β, in the sense that ∂^β Q(P_{m,n}) = D^β Q(P_{m,n}) for all Q ∈ Π_{m,n}. As a direct consequence of equation (4) the following statement (theorem 1) applies: with D^β from equation (6), let

$$W^k_{m,n} := \sum_{\|\beta\|_1 \le k} (D^\beta)^T\, W_{m,n}\, D^\beta, \qquad W_{m,n} := \mathrm{diag}(\omega_\alpha)_{\alpha\in A_{m,n}}, \qquad (7)$$

be the Sobolev cubature matrix; then $\|f\|^2_{H^k(\Omega)} = f(P_{m,n})^T\, W^k_{m,n}\, f(P_{m,n})$ is an exact approximation whenever f ∈ Π_{m,n}.
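To illustrate, a minimal 1D numpy sketch of such a differentiation matrix, built here via the Legendre Vandermonde matrix (our own construction for illustration, not MINTERPY's implementation):

```python
import numpy as np
from numpy.polynomial import legendre as leg

def legendre_diff_matrix(n):
    """Return Gauss-Legendre nodes x, weights w and the nodal differentiation
    matrix D with (D f(x)) = f'(x) exactly for polynomials f of degree <= n."""
    x, w = leg.leggauss(n + 1)          # n+1 Legendre nodes and cubature weights
    V = leg.legvander(x, n)             # Vandermonde matrix in the Legendre basis
    Dc = np.zeros((n + 1, n + 1))       # differentiation in coefficient space
    for j in range(n + 1):
        e = np.zeros(n + 1); e[j] = 1.0
        Dc[:n, j] = leg.legder(e)       # derivative coefficients of the j-th basis
    return x, w, V @ Dc @ np.linalg.inv(V)

x, w, D = legendre_diff_matrix(10)
assert np.allclose(D @ x**3, 3 * x**2)  # exact differentiation of a polynomial
```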
For the sake of simplicity we will denote the cubature weight matrix on the domain Ω of degree n as W_Ω := W_{m,n} and denote by W^k_Ω := W^k_{m,n} the kth Sobolev cubature matrix on the domain Ω. Likewise, for the boundaries we shortly denote W_∂Ω and W^k_∂Ω. We conclude that theorem 1 enables us to control the uniform distance ‖f − g‖_{C⁰(Ω)} on the whole domain Ω.
Corollary 2. Let the assumptions of theorem 1 be fulfilled and let f, g ∈ H^k(Ω, ℝ), k > m/2, be two Sobolev functions. Then there exists an n ∈ ℕ (large enough) and a polynomial r_θ ∈ Π_{m,n} such that for the residuum r := f − g we have ‖r − r_θ‖_{H^k(Ω)} ≤ δ. Furthermore, let us assume that for the approximation of the H^k-loss we have $r_\theta(P_{m,n})^T\, W^k_{m,n}\, r_\theta(P_{m,n}) = \epsilon^2$. Then ‖f − g‖_{C⁰(Ω)} ≤ ϵ + δ.

Proof. Due to theorem 1 we deduce that ‖r_θ‖_{H^k(Ω)} = ϵ, and by the triangle inequality we have ‖r‖_{H^k(Ω)} ≤ ‖r_θ‖_{H^k(Ω)} + ‖r − r_θ‖_{H^k(Ω)} ≤ ϵ + δ. The Sobolev embedding theorem (Adams and Fournier 2003) yields the continuous inclusion H^k(Ω, ℝ) ⊆ C⁰(Ω, ℝ) for k > m/2, which implies that ‖f − g‖_{C⁰(Ω)} ≤ ϵ + δ, proving the claim. The existence of r_θ follows from the density of polynomials in H^k(Ω, ℝ).
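For concreteness, the 1D Sobolev cubature matrix of equation (7) and the exactness used above can be checked numerically (continuing the numpy sketch; legendre_diff_matrix is defined there):

```python
import numpy as np

# Continues the earlier sketch: legendre_diff_matrix is defined there.
x, w, D = legendre_diff_matrix(10)
k = 2
Wk = sum(np.linalg.matrix_power(D, j).T @ np.diag(w) @ np.linalg.matrix_power(D, j)
         for j in range(k + 1))        # W^k = sum_{j<=k} (D^j)^T diag(w) D^j

q = x**2                               # ||q||_{H^2}^2 = 2/5 + 8/3 + 8 on (-1, 1)
assert np.isclose(q @ Wk @ q, 2/5 + 8/3 + 8)   # exact for polynomials in Π_n
```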
In light of the provided perspectives, we propose the following formulations of classic PDE problems.

Strong and weak PDE formulations
We follow Jost (2002), Brezis (2011) to restate classic (weak) PDE formulations and their Sobolev cubature approximations. For the sake of simplicity, we focus on the example of the classic Poisson problem.

Poisson equation
The classic (strong) Poisson problem seeks u ∈ C²(Ω, ℝ) satisfying

$$\Delta u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega. \qquad (9)$$

We can weaken the initial regularity assumptions by demanding u ∈ H²(Ω, ℝ) to satisfy the PDE in the integral sense, yielding the strong variational formulation:

$$\int_\Omega (\Delta u - f)\,\phi \, d\Omega = 0, \qquad u|_{\partial\Omega} = g,$$

for all test functions ϕ ∈ C^∞(Ω, ℝ), denoting smooth functions in Ω with restriction ϕ|_∂Ω smooth almost everywhere. We can weaken the regularity assumptions even more by imposing u ∈ H¹(Ω, ℝ) to satisfy the following weak variational formulation:

$$-\int_\Omega \nabla u \cdot \nabla\phi \, d\Omega + \int_{\partial\Omega} \phi\, (\nabla u \cdot \eta)\, dS = \int_\Omega f\,\phi \, d\Omega, \qquad u|_{\partial\Omega} = g,$$

where we applied integration by parts and η denotes the (piecewise smooth) normal field of ∂Ω.
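The integration-by-parts step can be verified numerically on the Legendre grid (continuing the earlier numpy sketch; u and ϕ are illustrative polynomials, with ϕ(±1) = 0 so that the boundary term vanishes):

```python
import numpy as np

# Continues the earlier sketch: legendre_diff_matrix is defined there.
x, w, D = legendre_diff_matrix(10)
u   = x**3 + x**2                      # illustrative smooth u on the Legendre grid
phi = 1 - x**2                         # test function with phi(+-1) = 0
lhs = w @ ((D @ D @ u) * phi)          # cubature of  ∫ u'' phi dx
rhs = -(w @ ((D @ u) * (D @ phi)))     # cubature of -∫ u' phi' dx
assert np.isclose(lhs, rhs)            # boundary term vanishes since phi(+-1) = 0
```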

Residual loss in terms of Sobolev cubatures
We translate the introduced PDE formulations into variational optimisation problems to be minimised by the PINN framework. In addition to the Sobolev cubature matrices W^k_{m,n} from equation (7), following the result of theorem 1, we introduce the matrices R_Ω, S_∂Ω for the strong variational formulation, equation (10). Again we introduce the simplified notations

$$L_{\mathrm{strong},k}(u) = r_{\mathrm{strong},k}(u) + s_{\mathrm{strong},k}(u),\qquad r_{\mathrm{strong},k}(u) := \|\Delta u - f\|^2_{H^k(\Omega)},\qquad s_{\mathrm{strong},k}(u) := \|u - g\|^2_{H^{k+3/2}(\partial\Omega)},$$

where the H^{k+3/2}-metric of the second residual s_{strong,k} reflects the Trace Theorem, equation (2). We propose to approximate L_{strong,k}(u) ≈ r_{strong,n,k}(u) + s_{strong,n,k}(u) by the following Sobolev cubatures.

Definition 1 (strong Sobolev loss). The strong Sobolev loss L_{strong,n,k} : H^k(Ω, ℝ) → ℝ is given by

$$L_{\mathrm{strong},n,k}(u) := \big((\Delta_n u - f)(P_{m,n})\big)^{T} W^{k}_{\Omega}\, (\Delta_n u - f)(P_{m,n}) + \big((u - g)(P_{\partial\Omega,n})\big)^{T} W^{k+3/2}_{\partial\Omega}\, (u - g)(P_{\partial\Omega,n}), \qquad (11)$$

where Δ_n := Σ_{i=1}^m D_i² denotes the discrete Laplacian induced by equation (6) and P_{∂Ω,n} the Legendre grids of the boundary faces.

Definition 2 (strong variational loss). The strong variational loss L^var_{strong,k} : H^k(Ω, ℝ) → ℝ and its cubature approximation L^var_{strong,n,k}, equation (13), are given analogously by testing the strong variational formulation against the Lagrange basis, where R_Ω, S_∂Ω are as in equations (10) and (12).

Definition 3 (weak variational loss).
The weak variational loss L^var_{weak,1} : H¹(Ω, ℝ) → ℝ, given by L^var_{weak,1}(u) = r^var_{weak,0}(u) + s^var_{strong,1}(u), only differs from L^var_{strong,k} in the first residual r^var_{weak,0}(u), which replaces the strong residual by the weak variational formulation. This yields the approximate r^var_{weak,0}(u) ≈ r^var_{weak,n,0}(u), obtained by fixing the test functions ϕ = L_α, α ∈ A_{m,n}, in the weak formulation and evaluating the occurring integrals by the Gauss-Legendre cubature. Furthermore, we define H_∂Ω := TH_Ω as the restriction u|_∂Ω to the boundaries, given by the discrete trace operator T : Π_{m,n} → Π_{m−1,n}. Summarising, the given notions allow us to extend corollary 2 in order to state the following result.
Theorem 3 (Strong solution approximation for Poisson problems). Let u ∈ H^k(Ω, ℝ), k ≥ max{m/2, 2}, m ∈ ℕ, be a regular m-variate Sobolev function. Then the following holds:

(i) The chain of implications L_{strong,k}(u) = 0 ⇒ L^var_{strong,k}(u) = 0 ⇒ L^var_{weak,1}(u) = 0 applies.

(ii) Under additional regularity assumptions on f, e.g. f ∈ C^∞(Ω, ℝ), and if u ∈ H¹(Ω, ℝ), then all the losses are equivalent, i.e. L_{strong,k}(u) = 0 ⇔ L^var_{strong,k}(u) = 0 ⇔ L^var_{weak,1}(u) = 0.

(iii) Denote by L_{strong,n,k}, L^var_{strong,n,k}, L^var_{weak,n,1} the loss approximations of degree n ∈ ℕ, definitions 1-3. Then for all ε > 0 there is n = n(ε) and u' ∈ Π_{m,n} with ‖u − u'‖_{H^k(Ω)} < ε on which the loss approximations are exact.

Proof. Without loss of generality, let us assume that g = 0. Statement (i) follows from the fact that if u ∈ C^∞(Ω, ℝ) and L_{strong,k}(u) = 0, then u solves the strong Poisson problem, equation (9). Hence, Δu = f almost everywhere in Ω, which implies that L^var_{strong,k}(u) = 0 and L^var_{weak,1}(u) = 0. Statement (ii) follows from classic PDE theory (Evans 2010) and the Sobolev embedding theorem (Adams and Fournier 2003): for a regular right hand side, any weak solution is also a strong solution. Thus, L^var_{weak,1}(u) = 0 ⇒ L_{strong,k}(u) = 0. Statement (iii) follows from the Stone-Weierstrass theorem (Weierstrass 1885, De Branges 1959), stating that polynomials are dense in C^r(Ω, ℝ) with respect to the corresponding norms (Adams and Fournier 2003). Hence, there exist polynomial approximations ‖u − u'‖_{H^k(Ω)} < ε, u' ∈ Π_{m,n}, for n = n(ε) ∈ ℕ large enough. Due to theorem 1, the Sobolev cubature losses L_{strong,n,k}(u'), L^var_{strong,n,k}(u'), L^var_{weak,n,1}(u') are exact for u' ∈ Π_{m,n}, proving the statement.

Remark 1. We conclude that for sufficiently regular u : Ω → ℝ and regular right hand side, e.g. f ∈ C^∞(Ω, ℝ), theorem 3 theoretically guarantees the existence of uniform approximations of the strong solution of the Poisson problem, given by the minimum of any of the losses L_{strong,n,k}(u'), L^var_{strong,n,k}(u'), L^var_{weak,n,1}(u'). This argumentation can be extended to more general well-posed regular elliptic and hyperbolic PDEs (Jost 2002, Brezis 2011).

Gradient flow of PINN training
For a given PINN u = û_θ(·) ∈ Ξ_{m,1} of fixed architecture, (optimally) adjusted weights θ ∈ Υ_{Ξ_{m,1}} are derived by NN training minimising a given PINN loss L : Υ_{Ξ_{m,1}} → ℝ. Formally, NN training is realised as a gradient descent, solving the gradient flow ODE

$$\partial_t \theta(t) = -\nabla_2 L(\theta(t)), \qquad \theta(0) = \theta_0,$$

where θ_0 is given by the NN initialisation and ∇_2 denotes the Euclidean gradient. For the proposed Sobolev loss approximations, definitions 1-3, the gradients simplify due to identities involving the adjoint operators (D^β)^T and the Sobolev cubature matrices W^k_Ω, W^k_∂Ω from equations (7) and (10). An example deriving this notion in detail is given in the appendix.
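For concreteness, a minimal SC-PINN training loop for the 1D Poisson problem u'' = f with f(x) = −π² sin(πx) and u(±1) = 0 is sketched below. It is a sketch under simplifying assumptions: the k = 0 cubature loss, a plain penalty at x = ±1 instead of the boundary Sobolev cubature, and illustrative hyper-parameters (not the paper's settings); it reuses legendre_diff_matrix from the earlier numpy sketch.

```python
import numpy as np
import torch

# Continues the earlier numpy sketch: legendre_diff_matrix is defined there.
x, w, D = legendre_diff_matrix(30)
Dt, wt = torch.tensor(D), torch.tensor(w)
xs = torch.tensor(x).unsqueeze(1)                        # interior Legendre nodes
xb = torch.tensor([[-1.0], [1.0]], dtype=torch.float64)  # boundary points
f = -torch.pi**2 * torch.sin(torch.pi * xs.squeeze(1))   # RHS of u'' = f

net = torch.nn.Sequential(torch.nn.Linear(1, 50), torch.nn.Tanh(),
                          torch.nn.Linear(50, 50), torch.nn.Tanh(),
                          torch.nn.Linear(50, 1)).double()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(5000):
    opt.zero_grad()
    u = net(xs).squeeze(1)
    r = Dt @ (Dt @ u) - f               # PDE residual via P.D. (no autograd in x)
    loss = r @ (wt * r) + (net(xb) ** 2).sum()  # k = 0 cubature loss + boundary penalty
    loss.backward()                     # autograd only w.r.t. the weights theta
    opt.step()
```

Note that the differentiation matrices are constant tensors, so backpropagation only traverses the network itself, which is exactly the source of the complexity reduction analysed next.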
In the following we investigate the efficiency of the resulting SC-PINNs.

Polynomial differentiation vs automatic differentiation
We present the complexity analysis for both PINNs based on A.D. and PINNs based on P.D.
Theorem 4. For a given NN û_θ : Ω → ℝ with architecture Ξ_{m,1}, consisting of l hidden layers and q neurons per layer, the complexity per epoch for computing the kth derivative ∂^k_x û_θ in s ∈ ℕ training points is given by

(i) O(2^{k−1} l s q²) for a PINN resting on A.D., scaling exponentially with the order of the derivative;

(ii) O(l s q² + s²) for an SC-PINN resting on P.D., with s = |P_{m,n}|.

Proof.
For proving (i) we use the fact that a single backward pass, required for computing the derivatives, has the same complexity as a forward pass, which is O(lsq²) due to Baydin et al (2018). Thus, for computing the first derivative at s points we need O(lsq²) operations. Due to the chain rule, computing the second order derivatives causes the size of the A.D. dependency graph to double. By recursion of this fact, the factor 2^{k−1} appears as claimed.
For proving (ii) we recall that the SC-PINN computes the kth derivative by applying the pre-computed differential matrix D^k := Π_{l=1}^k D, equation (6). Hence, for each epoch the cost is equivalent to the evaluation of û_θ(p) at each cubature (Legendre) node p ∈ P_{m,n}, with s = |P_{m,n}|, plus the matrix-vector multiplication D^k û_θ(P_{m,n}). The product has O(s²) complexity per epoch and the evaluation û_θ(P_{m,n}) has O(lsq²) complexity, yielding the result.
Example 1. We consider a NN with 2D input architecture Ξ_{2,1} = {2, 50, 50, 50, 50, 1}, given by four hidden layers of width 50. For s = 31 training points in each direction, the computation of a 4th order derivative with A.D. has a theoretical computational cost of 7.6 × 10⁷ operations, while P.D. requires 9.2 × 10⁵ operations. In fact, the predicted factor-100 speed-up is achieved for the experiment given in experiment 8 in section 6.8.
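These operation counts follow directly from theorem 4 and can be reproduced by a few lines of arithmetic (counts only, no timing):

```python
l, q, k = 4, 50, 4          # hidden layers, width, derivative order (example 1)
s = 31**2                   # 31 training points per direction in 2D
ad_ops = 2**(k - 1) * l * s * q**2   # theorem 4 (i):  ~7.7e7 operations
pd_ops = s**2                        # theorem 4 (ii), matrix-vector part: ~9.2e5
print(ad_ops, pd_ops, ad_ops / pd_ops)  # ratio ~83, of the order of the quoted speed-up
```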
In addition to the derivative computation complexity, the SC-PINN formulation exploits the approximation power of the Gauss-Legendre cubature, as becomes observable in the numerical experiments, section 6. Here, we want to point out the following insight.
Remark 2. We consider the m-dimensional integral I[f²] := ∫_Ω f² dΩ of a k-times differentiable function f, once approximated by the Gauss-Legendre cubature of degree n, I^g_n[f²] := Σ_{α∈A_{m,n}} f²(p_α) ω_α, and once by the Monte Carlo approximation I^mc_s[f²] := (|Ω|/s) Σ_{i=1}^s f²(x_i) in s uniformly distributed random points x_i ∈ Ω. Due to Trefethen (2017a) the approximation rates scale as |I[f²] − I^g_n[f²]| ∈ O(n^{−k}), whereas |I[f²] − I^mc_s[f²]| ∈ O(s^{−1/2}). Hence, for regular functions, k ≫ m, achieving a similar accuracy with the SC-PINNs requires fewer points than applying PINNs with MSE losses. The limitations of the Sobolev cubatures appear once the dimension m ≫ 1 causes the complexity in theorem 4 (ii) to be dominated by the O(s²), s = (n + 1)^m, term.
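These rates are easy to observe in 1D; the following small self-contained comparison (our own illustration, with an arbitrarily chosen smooth integrand) contrasts the spectral decay of the Gauss-Legendre error with the slow Monte Carlo decay:

```python
import numpy as np
from numpy.polynomial import legendre as leg

rng = np.random.default_rng(0)
f = lambda t: np.exp(np.cos(np.pi * t))     # smooth 1D integrand (illustrative)
xr, wr = leg.leggauss(200)
ref = wr @ f(xr)                            # high-order reference value
for s in (4, 8, 16, 32):
    xg, wg = leg.leggauss(s)
    gl_err = abs(wg @ f(xg) - ref)          # decays spectrally in s
    mc_err = abs(2 * f(rng.uniform(-1, 1, s)).mean() - ref)  # ~ s^(-1/2)
    print(s, gl_err, mc_err)
```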
Continuing this analysis empirically is part of the next section.

Numerical experiments
We designed the following experiments to validate and demonstrate our theoretical findings. All experiments were executed on the NVIDIA V100 cluster at HZDR. For reproducibility, the code of the experiments used in the benchmarks is available at (Suraz Cardona 2022), resting on the open source package MINTERPY (Hernandez Acosta et al 2021), which is based on our recent work (Hecht et al 2017, 2018, 2020, Hecht and Sbalzarini 2018). For comparison we benchmark the schemes specified in experiment 1 below. For a given ground truth function g : Ω → ℝ and a NN approximation û_θ we measure the l¹-, l^∞-errors ϵ_1 := ‖g − û_θ‖_1, ϵ_∞ := ‖g − û_θ‖_∞ by sampling on equidistant grids of size N = 100² in 2D. For inverse parameter inference problems we denote the parameter error by ϵ_λ = |λ − λ_gt|.
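Concretely, the error evaluation amounts to the following hypothetical snippet, where g and u_hat are placeholders for the ground truth and the trained network's prediction (the l¹ error is assumed to be grid-averaged):

```python
import numpy as np

# Hypothetical 2D error evaluation; g and u_hat are illustrative placeholders.
g     = lambda X, Y: np.sin(np.pi * X) * np.cos(np.pi * Y)
u_hat = lambda X, Y: g(X, Y) + 1e-3 * X * Y
xg = np.linspace(-1.0, 1.0, 100)
X, Y = np.meshgrid(xg, xg)                  # equidistant grid of size N = 100^2
err = np.abs(g(X, Y) - u_hat(X, Y))
eps_1, eps_inf = err.mean(), err.max()      # grid-averaged l1 error and max error
```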
Experiment 1. We compare the performance of the following introduced PINN implementations: (I) PINN and ID-PINN, relying on the strong mean square error (MSE) loss L_MSE, equation (1).
(II) VPINNs, relying on the strong variational PDE loss and a domain decomposition specified by its number of elements N_el ∈ ℕ. (III) SC-PINN with strong Sobolev loss L_{strong,n,k}, equation (11), or strong variational Sobolev loss L^var_{strong,n,k}, equation (13), both with k = 0, degree n = 30 for the domain Ω and n = 100 for the boundary ∂Ω, given in two versions: once resting on A.D. and once on P.D.
All methods were implemented for fully connected feed-forward NNs, û_θ ∈ Ξ_{2,1}, of 4 hidden layers, each of width 50, unless specified otherwise. Activation functions were chosen as σ(x) = sin(x), which performed best compared to trials with ReLU, ELU, or σ(x) = tanh(x). All PINNs were trained by applying the Adam optimiser (Kingma and Ba 2014) for 30 000 iterations, with training set size T_s = |P_{m,n}| for all PINNs and a learning rate of lr = 1 × 10⁻³, whereas ID-PINN applies its dynamic gradient balancing scheme.
Approximation errors and CPU training times t are reported in figures 1-5. Since the classic PINN approach failed to converge (to reach reasonable approximations), its results are not reported here.
In comparison, SC-PINN reaches several orders of magnitude better approximations: SC-PINN with L_strong and A.D. improves by one order, while replacing A.D. with P.D. increases the accuracy by one further order. In addition, the CPU runtime is reduced by a factor of 3 when executing SC-PINN with P.D. instead of A.D. The choice of L^var_strong improves the ε_1 error by one order of magnitude compared to SC-PINN with L_strong and A.D., but requires more CPU runtime.
We address further prominent PDE problems to continue our empirical investigations.

Quantum harmonic oscillator (QHO)
The time independent QHO in 2D, m = 2, corresponds to the Schrödinger equation with the harmonic oscillator potential V, see e.g. Liboff (1980), Griffiths and Schroeter (2018), given by −Δu + Vu = λu in Ω. We consider the eigenvalue problem whose analytic eigenfunctions are given in terms of products of the Hermite polynomials and a Gaussian, whereas H_n denotes the nth Hermite polynomial.
Experiment 2. We keep the experimental design from experiment 1, and set λ = 15.
Results are reported in figures 6 and 7. As in experiment 1, the classic PINN failed to converge to reasonable solutions; thus, we again skip reporting its results. SC-PINNs improve the accuracy by up to 2 orders of magnitude in 4 times less runtime. SC-PINN with strong variational Sobolev loss L^var_{strong,n,k} and P.D. performs best, reaching ϵ_∞ = 7.27 × 10⁻³, ϵ_1 = 8.16 × 10⁻⁴, t ≈ 167, see figure 7, while the results slightly worsen for the strong Sobolev loss.

Poisson problems with hard transitions
We re-investigate PINN solutions of the Poisson problem in dimension m = 1 whose analytic solutions include hard transitions. That is, we choose the right hand side f := ∂²_x g with boundary condition given by g(x) := C(A sin(ωx) + tanh(βx)), yielding u(x) = g(x) as the analytic solution.
Experiment 3. We consider two scenarios, S_1 and S_2, differing in the choice of the parameters C, A, ω, β. Apart from choosing û_θ ∈ Ξ_{1,1} with 4 hidden layers and 20 neurons each, we keep the settings of experiment 1 for all methods. The degree n ∈ ℕ of the Sobolev losses is set according to the training size, n = T_s − 1. The VPINNs are executed in two configurations: one with N_el = 1, which has T_s = 100 training points, as many as the SC-PINN; the second with N_el = 3, which has T_s = 300 training points, 100 in each element. Similarly, we consider two versions of the ID-PINN for the second scenario S_2, one with T_s = 100 and one with T_s = 5000.
The comparison of SC-PINNs and VPINNs for scenario S_1 is reported in figures 8 and 9: while SC-PINN with strong Sobolev loss fails to capture the solution, SC-PINN with variational loss reaches a similar order of approximation compared to VPINN with a 1-element decomposition, N_el = 1.
For the second set of parameters, S_2, results are reported in figure 10: we observe a similar behaviour as in scenario S_1. However, SC-PINN with strong variational loss reaches a two orders of magnitude better ε_1-error compared to VPINN with N_el = 3. ID-PINN and PINN both fail to converge for scenario S_1. Consequently, we present the results only for scenario S_2 in figures 10 and 11. We observe that even after increasing the training size by a factor of 50, ID-PINN still fails to converge, reaching ε_1- and ϵ_∞-errors two orders of magnitude worse than SC-PINN.
In the following, we extend our investigations by addressing inverse problems.

2D Poisson inverse problem
We consider the inverse problem for the 2D Poisson equation, seeking to infer the parameter λ of the right hand side f(x, y) = λ cos(ωx) sin(ωy) for ω = π.
Experiment 4. We keep the experimental design from experiment 1, but skip the VPINNs due to their worse performance, figure 2.
As figure 12 shows, the SC-PINN recovers the parameter λ with one order of magnitude higher accuracy while requiring almost two orders of magnitude less runtime. This shows the superiority of SC-PINN in both approximation and computational performance for this inverse problem; a minimal sketch of how λ enters the training as a free parameter is given below. To support this result, we consider a further inverse problem for the time independent 2D QHO.
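In this framework, parameter inference only requires registering λ as an additional trainable scalar; the following hypothetical wiring (all names and values illustrative, not the benchmarked implementation) indicates the idea:

```python
import torch

# Hypothetical wiring of the inverse problem: lambda is a trainable scalar that
# enters the RHS f = lam * cos(w x) * sin(w y) inside the training loop.
net = torch.nn.Sequential(torch.nn.Linear(2, 50), torch.nn.Tanh(),
                          torch.nn.Linear(50, 1)).double()
lam = torch.nn.Parameter(torch.tensor(1.0, dtype=torch.float64))  # initial guess
opt = torch.optim.Adam(list(net.parameters()) + [lam], lr=1e-3)
# In each epoch the residual r = Laplacian_n(u) - lam * cos(w*X) * sin(w*Y) and a
# data-misfit term drive both the network weights and lam toward lambda_gt.
```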

2D QHO inverse problem
We pose the QHO eigenvalue problem for unknown eigenvalue λ_gt = 5 and seek to find the eigenvalue and the PDE solution simultaneously.
Experiment 5. We repeat experiment 4 for this setting, but apply the SC-PINN with cubature degrees n = 50 for the domain Ω and n = 200 for the boundary ∂Ω. Figure 13 shows that SC-PINN outperforms the ID-PINN and the PINN by inferring λ with four orders of magnitude better accuracy and recovers the PDE solution with one order of magnitude higher accuracy. On top, SC-PINN requires two orders of magnitude less runtime. This demonstrates once more the superiority of SC-PINN when applied to forward and inverse problems. Non-linear problems are addressed below.

2D incompressible, non-linear Navier-Stokes equation
We consider the 2D non-linear incompressible Navier-Stokes flow. Let u = (u_1, u_2), u ∈ C²(Ω, ℝ²), be the velocity field and p ∈ C¹(Ω, ℝ) the pressure field; then the equations are given by

$$(u \cdot \nabla)u = \nu\Delta u - \nabla p, \qquad \nabla \cdot u = 0 \quad \text{in } \Omega.$$

Experiment 6. For solving the Navier-Stokes problem, we set the viscosity to ν = 1/40, the parameter

$$\lambda := \frac{1}{2\nu} - \sqrt{\frac{1}{(2\nu)^2} + 4\pi^2},$$

and the analytic pressure field to p = ½(1 − exp(2λx)), and repeat experiment 1 for the SC-PINN with strong variational Sobolev loss L^var_{strong,n,k} resting on P.D. As shown in figure 14, the SC-PINN reaches two orders of magnitude better accuracy than standard PINNs and slightly better accuracy than ID-PINN in more than 4 times less runtime, reinforcing the argument that SC-PINNs increase the performance for learning even general, non-linear PDE solutions.
We further extend the demonstrations validating this observation by the example below.

Non-linear Burgers' equation in 1D
We consider the time independent Burgers' equation in conservative form with Dirichlet boundary conditions, given by

$$\partial_x\!\left(\tfrac{1}{2}u^2\right) - \partial_x^2 u = f \ \text{in } \Omega, \qquad u = g \ \text{on } \partial\Omega,$$

with f(x) := (ω/2) sin(2ωx) + ω² sin(ωx), yielding u(x) = sin(ωx) as the analytic solution. Experiment 7. We again use the training setting of experiment 1 with û_θ ∈ Ξ_{1,1}, 4 hidden layers and 50 neurons each. We solve the PDE with a degree n = 100 cubature, ω = 14π, and the strong variational loss for the SC-PINN. We test it against the PINN and the ID-PINN with the same number of (random) training points.
According to the results presented in figure 15, SC-PINN again performs better than the other PINN formulations, reaching a 2 orders of magnitude smaller ϵ_1-error in half the runtime. In regard to the complexity analysis, theorem 4, the difference in runtime between A.D. and P.D. is moderate for first order derivatives, as reflected by this result. In any case, experiment 7 shows again that the benefits of SC-PINNs with P.D. compared to PINNs with A.D. extend to non-linear PDE problems.
We specifically focus on investigating the runtime performance below.

Polynomial differentiation vs automatic differentiation
We close our empirical investigations by directly comparing the runtime performance of PINNs relying on A.D. with that of SC-PINNs relying on P.D., both of the same architecture.
To do so, we consider (as experiment 8) the set of forward problems for the following kth order PDEs with Dirichlet boundary conditions,

$$\partial_x^k u + \partial_y^k u = f_k \ \text{in } \Omega, \qquad u = u_{gt} \ \text{on } \partial\Omega, \qquad (21)$$

given the analytic solution u_gt(x, y) = sin(ωx) cos(ωy). Analytic expressions of the RHS f_k(x, y) = ∂^k_x u_gt(x, y) + ∂^k_y u_gt(x, y) are computed straightforwardly. As reported in figure 16, the SC-PINN with P.D. outperforms the standard PINN with A.D. both in accuracy and in runtime. The improvement in runtime of P.D. over A.D. increases up to a factor of 200, validating the predictions made in our former discussion in section 5, example 1.
Figure 17 reports the computed losses on a test set of 100 equidistant points, demonstrating the far better loss convergence achieved by the SC-PINN with P.D., for all derivative orders, than is achievable by the PINN resting on A.D.
We discuss these and our prior observations and results, with regard to the impact our contribution may have on the field of 'PDE learning', in our concluding thoughts.

Conclusion
We introduced the notion of P.D. due to Sobolev cubatures and gave theoretical insights in order to set up the novel Sobolev PINNs (SC-PINNs). Hereby, the SC-PINNs do not rely on changing the underlying NN architecture, but only on their specific loss formulation. As predicted by the runtime complexity analysis, theorem 4, for low dimensional problems SC-PINNs with P.D. result in a faster PDE learning scheme than PINNs relying on A.D.
The numerical demonstrations addressed prominent linear, non-linear, forward, and inverse PDE problems and clearly showed the superiority of the SC-PINNs, reaching significantly higher accuracy in far less runtime compared to classic PINNs (Raissi et al 2019) and recent enhancements such as ID-PINNs (Maddu et al 2021) or VPINNs (Kharazmi et al 2019).
Depending on the numerical experiment, the choice of the Sobolev cubature differed. A deeper theoretical understanding of optimal choices of the norm is out of the scope of this article and part of a follow-up study, including potential further extensions such as a relaxation of the Sobolev cubatures, mitigating the curse of dimensionality when addressing higher dimensional problems or data given on unstructured grids.
Apart from these potential enhancements, the class of low dimensional, dim = 1, ..., 4, real world problems addressable by the present SC-PINNs is huge. The demonstrations presented here make us believe that applying the SC-PINNs might be beneficial for many scientific applications across all disciplines.
Appendix. Gradient flow of the Sobolev cubature losses

By exploiting the definition of the transpose, we obtain the adjoint formulation of the loss gradients. Note that the function u is, in our setting, parameterised by θ. Therefore, we want to solve the gradient flow with respect to the parameters θ, yielding the gradient ∇_θ L(θ). Hence, the (vectorial) gradient ODE flow becomes ∂_t θ(t) = −∇_θ L(θ(t)), providing the optimal parameters θ at its minima. Note that if we discretise the gradient flow from equation (23), we recover the standard (vectorial) gradient descent θ_{i+1} = θ_i − η ∇_θ L(θ_i), with learning rate η > 0.

Figure 12. Solution and errors for the 2D Poisson inverse problem with λ = π; SC-PINN with strong variational Sobolev loss and P.D.

Figure 14. Solution and errors for the 2D Navier-Stokes problem; SC-PINN with strong variational Sobolev loss and P.D.

Figure 15. Solution and errors for the 1D Burgers' forward problem; SC-PINN with strong variational Sobolev loss and P.D.

Figure 16. Computational cost and errors of the SC-PINN with P.D. and the PINN with A.D., executing experiment 8.

Figure 17. Loss behaviour for the PDEs in equation (21) with A.D. vs P.D. due to Sobolev cubatures.