Analytic auto-differentiable ΛCDM cosmography

I present general analytic expressions for distance calculations (comoving distance, time coordinate, and absorption distance) in the standard ΛCDM cosmology, allowing for the presence of radiation and for non-zero curvature. The solutions utilise the symmetric Carlson basis of elliptic integrals, which can be evaluated with fast numerical algorithms that allow trivial parallelisation on GPUs and automatic differentiation without the need for additional special functions. I introduce a PyTorch-based implementation in the phytorch.cosmology package and briefly examine its accuracy and speed in comparison with numerical integration and other known expressions (for special cases). Finally, I demonstrate an application to high-dimensional Bayesian analysis that utilises automatic differentiation through the distance calculations to efficiently derive posteriors for cosmological parameters from up to 106 mock type Ia supernovæ using variational inference.

All of the aforementioned cosmological simulators require calculating cosmographic distances, which are also a key ingredient in the modelling and data analysis of standard candles, sirens, and rulers, volumetric rates and densities, the cosmic microwave background radiation, Ly α forests in quasar spectra, as well as in the studies of galaxy properties and evolution.In the general case, cosmographic calculations require evaluating integrals numerically, which is both slow and not trivially parallelisable, while requiring a further numerical integration for the gradient calculation.1

JCAP07(2023)065
Here I describe a unified analytic approach to distance calculations in a general ΛCDM+radiation cosmology, formulated in terms of the Carlson elliptic integrals.These special functions were introduced by [7] as an alternative to Legendre's solutions to elliptic integrals.They are more suited to analytical rather than geometrical problems and offer two further advantages: first, fast and precise numerical algorithms for their evaluation exist [10], which makes their implementation for GPUs straightforward; and second, their derivatives can also be evaluated analytically without the use of additional special functions.Furthermore, a general reduction scheme [11] allows all cosmographic distances to be calculated within the same framework.
This paper is structured as follows.Section 2 describes cosmographic distances in the ΛCDM+radiation setting and discusses the Carlson elliptic integrals and their derivatives.The explicit solutions are presented in section 3, with simplified expressions for the radiationless case given in appendix B.1.A GPU-enabled automatically differentiable implementation in the phytorch.cosmologypackage is introduced in section 3.2, and its accuracy and speed are briefly examined.Finally, an application to gradient-based cosmological inference from standard candles is demonstrated in section 4, and conclusions are drawn in section 5.

ΛCDM cosmology
In the homogeneous and isotropic cosmological model of Friedmann, Lemaître, Robertson, and Walker (FLRW), the evolution of a universe is described by the dimensionless Hubble parameter E(t), whose form is determined by the Friedmann equations based on the universe's composition.
From a few seconds after the Big Bang onward, the large-scale evolution of the Universe is well described by a homogeneous non-interacting mixture of radiation,2 matter, and dark energy in the form of a cosmological constant (Λ), constituting the ΛCDM+radiation cosmological model.The amounts present of each component (including the effective contribution of spatial curvature) are quantified by the dimensionless density parameters {Ω i (z)} i∈{r,m,Λ,k} , which have different evolution laws: with the time axis parametrised by the cosmological redshift3 z and Ω i0 the dimensionless density parameter of component i at the present time (z = 0). 4The parameters are defined to sum to one at all times, which is often used to express the curvature parameter as Ω k = 1 − i =k Ω i .Measured values for the current epoch are Ω m0 and Ω Λ0 of order unity with a dominance of dark energy, Ω r0 ≈ 10 −5 , and Ω k0 ≈ 0 consistent with a spatially flat universe [60].By the first Friedmann equation ( [59], eq. ( 13.3)), (2.2) JCAP07(2023)065

Cosmography
Three fundamental distance integrals: the radial comoving distance, time coordinate, and absorption distance, are defined, respectively, as ) Analytical solutions to eqs.(2.3) to (2.5) in terms of special functions, which can be evaluated to arbitrary precision by numerical algorithms that do not (directly) reference the defining integral, have been discussed in the literature only for certain special cosmologies.For example, the comoving distance in a flat ΛCDM universe can be expressed using the Gauss hypergeometric function [2], the Legendre elliptic integrals [22,55,77,78], which are also applicable in the non-flat case ( [23,69], see also references therein), and the Carlson elliptic form [49]. 5 Ref. [20] presented a solution valid also in the presence of radiation and curvature that makes use of the Weierstrass elliptic function, 6 and finally, ( [70], appendix B) used Carlson's basis to solve for the time coordinate. 7he formulae presented in this work are the first to unify the different distance calculations in one framework applicable in the most general case of a ΛCDM cosmology with non-zero curvature and in the presence of radiation.They also have the advantages of fast and parallelisable numerical evaluation and easy to compute gradients with respect to the cosmological parameters.

The Carlson symmetric elliptic integrals
The integrals eqs.(2.3) to (2.5) cannot be expressed for general {Ω i0 } as elementary functions.However, if E 2 (z) is a polynomial of degree up to four, they are instances of elliptic integrals, which can be reduced to a linear combination of a small set of basis integrals, e.g.Legendre's JCAP07(2023)065 elliptic integrals [6,45].The resulting expressions are present in most comprehensive tables of integrals (e.g.[6,27]). 8hile Legendre's formulation is well suited to geometric problems, an alternative basis, the Carlson symmetric form ( [7], see also [12], section 19), is the natural choice when dealing with rational functions.By preserving the original permutation symmetry in the polynomial roots, it unifies the different cases that select the correct branches of the square roots involved.Furthermore, for the Carlson basis integrals there are efficient numerical algorithms with guaranteed fast convergence to an arbitrary precision in the whole complex plane9 [10].In fact, these algorithms are the basis for some numerical implementations of Legendre's integrals ( [62], section 6.11).
The Carlson integrals of the first and third kind, which form the basis for reduction, are defined by: It is useful to define also the degenerate versions: ) The functions are well-defined for all complex x 1 , x 2 , x 3 except the non-positive reals (for which the integrand has poles along the integration path) and for all non-zero w (the Cauchy principal value is assumed if Computing derivatives of the Carlson integrals is closed, i.e. does not require any other special functions: (ref.[12], chapter 19, eq.(19.18.1)), and ) ref. [75], where R F and R D are evaluated at (x 1 , x 2 , x 3 ), and R J at (x 1 , x 2 , x 3 , w).Derivatives with respect to x 1 and x 2 are obtained by symmetry, while those of R C and R D via their respective definitions and the chain rule.In eqs.(2.11) and (2.12) the limits have to be explicitly implemented when arguments are repeated (e.g.w = x i or x i = x j : see [75] for details).10

JCAP07(2023)065
Reductions of particular elliptic integrals to the Carlson form were tabulated by [8,9], while general reduction schemes, which can be easily implemented in computer algebra systems or purely numerically, were later presented by [11] and [28].Since the Carlson basis involves integrals with fixed bounds, the associated reduction scheme applies to definite integrals, while reduction to Legendre's form usually considers the indefinite version, thus requiring two evaluations of the resulting expression or additional manipulation.

A unified analytic approach to distance calculations
In Carlson's tables [8] elliptic integrals are standardised as parametrized by the n-tuples (i.e.ordered sets) p ≡ [p i ] n i=1 and a ≡ [a i ] n i=1 .The p i must be integers, and exactly m = 3 or m = 4 of them must be odd (corresponding to E 2 being a degree-3 or -4 polynomial).Each of the factors z + a i is assumed to be positive on the interval (z 1 ; z 2 ). 11o make use of the Carlson basis, one must, therefore, factorise E 2 (z).Since in the ΛCDM+radiation framework described above it is a degree-four polynomial (cf.eq.(2.2)), this can be achieved using well-known explicit algebraic formulae ( [64], section 1.11(iii)) or established numerical methods, e.g.via the companion matrix12 ([62], section 9.5).We write the resulting factorisation as where m = 4, α m = Ω r0 is the leading coefficient of the polynomial, 13 and [r j ] m j=1 are the, generally complex, roots of E 2 (z) = 0. Since this effectively re-parametrises the problem with {Ω i } → {r i }, one needs to also compute the gradient of the root calculation, which is discussed in appendix A.
Using eq.(3.2), the integrands of eqs.(2.3) to (2.5) can be recast in the form with p m+1 ∈ {0, −2, 4} respectively.The factorisation of the square root into a product of square roots is valid since E 2 is real.
For physical solutions, we require that no root lie on the integration path, so that the integrand does not diverge, i.e. that z = 0 is smoothly connected to the Big Bang at z → ∞.Parameter values for which this is not satisfied, exclusively with Ω m0 < 0 or Ω Λ0 > 1, are depicted in red in figure 1. : when all roots have negative real part (green region), distance calculations are numerically stable; if there is a complex-conjugate pair with positive real part (yellow region), the numerical stability of evaluating eqs.(3.8) and (3.9) is affected by the choice of permutation of the roots; finally, the red region comprises the parameters of unphysical universes without a Big Bang, for which E 2 has two positive real roots, and so the Hubble parameter diverges at finite redshift.

JCAP07(2023)065
where {i, j, k, l} is any permutation of {1, 2, 3, 4}, ∆z ≡ z 2 − z 1 , d ij ≡ r j − r i (with the special cases d i0 ≡ −1 and d i5 = −(1 + r i )), and (The quantities u i0 , s i0 , q i0 are calculated as u i5 , s i5 , q i5 but with all factors like + 1 → 1, which corresponds to treating (z + a 5 ) as identically one instead of (1 + z).Note also that I have modified slightly the expressions from [8] by taking factors of ∆z outside the function arguments where possible via the homogeneity relations ([8], eq.(1.4)).)Notice that eqs.(3.8) and (3.9) are not explicitly symmetric in {i, j, k, l}, i.e. in the ordering of the polynomial roots.Even though the result does not in principle depend on the exact chosen permutation, owing to the properties of the Carlson integrals, numerical evaluation might still be affected. 14After some experimentation, I have found that simply picking r i and r j to be the biggest roots 15 by absolute value gives good results across parameter space and redshift range for all three distances and so defer an in-depth investigation to future work.

Implementation: phytorch.cosmology
The expressions eqs.(3.7) to (3.9) have been implemented in the python package pytorch.cosmology,a sub-library of phytorch, 16 which includes CUDA kernels for the Carlson elliptic integrals, interfaced through PyTorch [58], along with the respective gradient expressions (eqs.(2.10) to (2.12)) for use with PyTorch's automatic-differentiation engine. 17he Legendre integrals are also available through their relations to the Carlson form, which enables implementations of the other known cosmographic distance formulae valid in special cosmologies.Finally, the general reduction schemes of [11] and [28] are implemented in phytorch for completeness.

Accuracy
I briefly verify the phytorch.cosmologyimplementation against numerical integration18 for a range of redshifts, considering two setups: a flat radiation-less universe and a non-flat one with a small amount of radiation present.As shown in figure 2, there is generally good JCAP07(2023)065 agreement in calculating the comoving distance and lookback time using expressions from this paper (both in the Legendre and Carlson formulations). 19Because of the sheer size of eq.(3.9), the absorption distance calculation is prone to small roundoff errors, which also strongly depend on the permutation of roots chosen.Lastly, the accuracy of the root calculation itself, which is not trivial when the coefficients span different orders of magnitude, affects the distance estimates, and in that respect numerical root-finding methods worked better than the algebraic formulae.

Speed
Although speed is not the primary focus of the current study, I examine the performance of the analytic formulae and of numerical integration in the same two setups as in subsection 3.2.1.
In the radiation-less, and more importantly, flat case, I also include the hypergeometric solution20 [2].
All tests were performed in a single-threaded setting on a CPU.While analytic implementations are trivially parallelisable on multi-threaded CPUs and on GPUs, parallel numerical integration is easy for different limits (z 1 , z 2 ) but harder for differing cosmological parameters since it requires multiple evaluations on the integration grid, and this is why it usually does not benefit much from GPU acceleration.
The results are summarised in figure 3. Carlson's formulation exhibits a step-like speedup for small ranges of integration due to the decreasing number of iterations needed in the numerical algorithm.The bahaviour of Legendre's formulation is, instead, reversed, and JCAP07(2023)065 ), corresponding to the solution of [55])." 2 F 1 " is equivalent to the formula of [2], and "quad" stands for numerical integration using SciPy.All experiments use 1 CPU thread.
the speed-up occurs for large upper limits.Numerical integration behaves similarly and, logically, is highly performant for small ranges when the integrand is approximately constant.For high redshifts, though, it can be more than an order of magnitude slower than analytic formulae; however, this difference may well be due to the very different implementations used.Finally, as [2] observed, the hypergeometric solution has complicated behaviour with a drop in performance around z = 1 due to the nature of the numerical algorithm. 21he same conclusions apply also when the radiation term is included in E 2 (z) (see the dashed lines in figure 3) with almost no noticeable difference in the evaluation of the comoving distance, a factor < 2 slowdown for the time coordinate and less than an order of magnitude for the absorption distance.Since eqs.(3.8) and (3.9) are not symmetric in the roots of E 2 (z), even though the final numerical result is the same, using different permutations significantly affects the runtime since the required iterations to calculate R J are different.

Application: Bayesian inference of standard candles
Probably the most prominent application of cosmographic distances is in inference using standard(isable) candles (a "neoclassical test of cosmology" according to [59]).A standardisable candle is an object whose intrinsic brightness (absolute magnitude) can be derived from other observables.Its observed brightness, then, can be used to derive the luminosity distance to it via the inverse square law. 22By obtaining in addition the cosmological redshift, one can in principle constrain the parameters of the cosmological model.

JCAP07(2023)065
Current analyses of type Ia supernovae (SNae Ia), the most widely used standardisable candle, rely on sophisticated Bayesian hierarchical models (BHMs) to infer, on one hand, global properties of the SN Ia population, and, on the other, intrinsic characteristics of individual objects [33, 52-54, 65, 68].While applying these models to current datasets (e.g.[67]), containing ∼ 1000 SNae Ia, is still feasible with sampling-based methods like Gibbs sampling and HMC, 23 the necessity to explore the whole parameter space, whose size scales with the number of observed SNae, makes traditional inference techniques prohibitively expensive for analysis of the up to 10 6 SNae Ia expected in the near future from LSST [37,44].
The burden of high dimensionality can be alleviated24 by using modern gradient-based Bayesian techniques like variational inference (VI), in which the posterior is approximated by a tractable distribution, commonly referred to as the variational proposal or guide, implemented either in a simple analytic form, or with a neural network-based density estimator.Parameters of the proposal are determined by maximising a specifically designed function, called the evidence lower bound (ELBO), which includes the model likelihood conditioned on the data.Automatic differentiation through the model is, thus, the key to making VI a viable technique for high-dimensional inference since it enables optimisation with stochastic gradient descent.Even though one is still required to infer all model parameters jointly, it is possible to purposefully ignore some correlations and/or derive only point or Gaussian a posteriori estimates for certain parameters, thus simplifying the structure of the proposal and the learning task, if one is careful not to introduce biases (or is at least conscious of them).For a recent review of VI and the mathematical formalism, refer to [81]; see also ([3], subsection 11.3.2) for further discussion.
In this demonstration, I use a very simplified model for supernova cosmology, presuming to have measured the redshifts ẑs and derived standardised distance moduli μs , of N type Ia supernovae. 26For the latter I assume Gaussian likelihoods with equal and independent uncertainties: μs |µ s ∼ N (µ s , σ 2 µ ), where σ µ is the combined uncertainty due to residual scatter after standardisation and measurement noise.For the redshifts I adopt the toy model of [63], which also has a Gaussian likelihood albeit with a redshift-dependent variance: ẑs |z s ∼ N (z s , (1 + z s ) 2 σ 2 z ), imitating a photometric estimate, and uses as prior a physically motivated and simple to calculate gamma distribution with rate parameter β.In this example, I fix β = 3, σ µ = 0.14, σ z = 0.04 and assume radiation-less ΛCDM, since the effect of Ω r is negligible at low redshifts.This leaves Ω m0 and Ω Λ0 as the free parameters in addition to the SNae's latent redshifts {z s }.The model is depicted graphically in figure 4b.I analyse mock datasets with number of SNae Ia ranging from 1000 to 10    Results from HMC sampling and VI fits to the mock datasets are presented in figure 5.Because of the computational cost of HMC (generating 1000 posterior samples with 10000 SNae Ia took ≈ 2 h on a high-end workstation), it was only applied to the datasets with 1000 and 10000 SNae Ia.In contrast, VI can analyse quickly (in ≈ 1 h on the same workstation with an NVIDIA A-100 GPU) up to 10 6 SNae Ia.To enable this, I use a partial multivariate normal (PMVN) proposal distribution, which accounts for correlations among the two cosmological (global) parameters and between them and each individual SN's latent redshift but ignores additional posterior correlations between different SNae: full details are given in ( [41], subsection 4.2).Due to the conditional structure of the model, a PMVN is -11 -

JCAP07(2023)065
sufficient in this case, especially for large observed samples when the cosmological posteriors do approach Gaussianity.
While the particular results of figure 5 are not a focus of this work, we note that VI is successful and efficient for this simple model, with the posterior size shrinking in each dimension as 1/ √ N , as expected, and covering the parameter values used to produce the mock data.To the best of my knowledge, this is the first application of VI to cosmological inference with standard candles, albeit with a toy model and mock data.Extending the inference to more realistic models and real datasets, however, requires significant improvements to the guide so that correlations in high dimensions are properly accounted for. 27

Conclusion
I have presented analytic expressions for distance calculations in a general (possibly curved) ΛCDM+radiation cosmology, which utilise the Carlson elliptic integrals, and so can be evaluated using fast numerical algorithms with guaranteed precision.Furthermore, differentiating the Carlson integrals analytically does not require any additional functions, which makes them easy to include in high-performance libraries with automatic differentiation.The formulae have been implemented in the phytorch.cosmologypackage, part of phytorch, which provides PyTorch-interfaced GPU kernels for various special functions.Their speed and accuracy have been briefly examined in comparison with numerical integration and other known analytic expressions (applicable only in certain special cases like zero curvature and/or no radiation).Finally, I have demonstrated an example application to cosmological analysis of up to 10 6 mock type Ia supernovae using high-dimensional variational inference, which relies on automatically computing gradients through the distance calculations.

Figure 1 .
Figure 1.Nature of the roots of E 2 (z) = 0 for different values of the leading coefficient Ω r0 : when all roots have negative real part (green region), distance calculations are numerically stable; if there is a complex-conjugate pair with positive real part (yellow region), the numerical stability of evaluating eqs.(3.8) and (3.9) is affected by the choice of permutation of the roots; finally, the red region comprises the parameters of unphysical universes without a Big Bang, for which E 2 has two positive real roots, and so the Hubble parameter diverges at finite redshift.

− 5 Figure 2 .
Figure 2.Relative difference between distance calculations (from z 1 = 0 to the redshift on the abscissa) using the formulae from this paper and using numerical quadrature.For comparison, the dashed red lines show the reported error estimate of quad itself (divided by the actual value).The setup for the top row is a flat radiation-less universe with Ω m0 = 0.3 and Ω Λ0 = 0.7, while for the bottom row an additional radiation component with Ω r0 = 10 −5 is present.Expressions labelled "Carlson" are those in eqs.(B.1) to (B.3) (top row) and eqs.(3.7) to (3.9) (bottom row), while "Legendre" labels eqs.(B.4) and (B.5) (top row) and eqs.(B.6) and (B.7) (bottom row).

Figure 4 .
Figure 4. Mock SN Ia data and the model used to generate and analyse it.
Example mock data with 1000 SNae Ia.Top: histograms of the latent and observed redshifts, z s and ẑs , in comparison with the z prior, also used to draw z s in the simulator.Bottom: observed Hubble diagram as dots and the underlying true relation as a line.Graphical representation of the SN Ia inference model, also used to generate mock data.Shaded boxes indicate parameters, and shaded circles the observed data.β, σµ, σz are the model inputs (fixed parameters) in this work.The plate labelled "SN" indicates the conditional independence of each supernova (with index label s).
Λ0 from mock data with increasing numbers of observed type Ia supernovae.Blue ellipses are results of a VI fit as described in the text, while the red contours were derived with HMC (only performed up to 10000 SNae Ia and only up to 2-sigma shown).Note the different scales for each plot.The values used to produce the mock data are indicated with a star.