Reconstructing the Hubble parameter with future Gravitational Wave missions using Machine Learning

We study the prospects of Gaussian processes (GP), a machine learning (ML) algorithm, as a tool to reconstruct the Hubble parameter $H(z)$ with two upcoming gravitational wave missions, namely the evolved Laser Interferometer Space Antenna (eLISA) and the Einstein Telescope (ET). Assuming various background cosmological models, the Hubble parameter has been reconstructed in a non-parametric manner with the help of GP using realistically generated catalogs for each mission. The effects of early-time and late-time priors on the reconstruction of $H(z)$, and hence on the Hubble constant ($H_0$), have also been focused on separately. Our analysis reveals that GP is quite robust in reconstructing the expansion history of the Universe within the observational window of the specific missions under consideration. We further confirm that both eLISA and ET would be able to provide constraints on $H(z)$ and $H_0$ which would be competitive to those inferred from current datasets. In particular, we observe that an eLISA run of $\sim10$-year duration with $\sim80$ detected bright siren events would be able to constrain $H_0$ as good as a $\sim3$-year ET run assuming $\sim 1000$ bright siren event detections. Further improvement in precision is expected for longer eLISA mission durations such as a $\sim15$-year time-frame having $\sim120$ events. Lastly, we discuss the possible role of these future gravitational wave missions in addressing the Hubble tension, for each model, on a case-by-case basis.


INTRODUCTION
After the profound success of the LIGO Scientific and Virgo Collaborations in paving the way for gravitational wave (GW) astronomy (Abbott et al. 2016(Abbott et al. , 2017a(Abbott et al. ,b, 2019(Abbott et al. , 2021b,c),c), a number of ground-and space-based GW missions are currently being planned, most of which are expected to be functional within a couple of decades.The space-based Laser Interferometer Space Antenna (LISA) (Amaro-Seoane et al. 2017;Klein et al. 2016;Tamanini et al. 2016) and the ground-based Einstein Telescope (ET) (Maggiore et al. 2020;Sathyaprakash et al. 2010) are particularly promising future missions in this regard.Alongside their conventional prospect of contributing to GW astronomy, their roles in cosmology and cosmography are also being investigated to a thorough extent.Designed to explore farther into the redshift space than currently available experiments, these missions are potentially interesting probes of the Hubble parameter as they will be capable of providing an independent measurement of the luminosity distance based on the GW waveforms of detected standard siren events (Schutz 1986).In fact, the pressing need for constraining the Hubble constant ( 0 ) through multi-messenger observations at intermediate redshifts is driven by a fundamental inadequacy of the current cosmological paradigm.
The concordance ΛCDM model of cosmology is, of late, facing tensions on multiple fronts when subjected to observational tests.The Hubble tension, being the most severe one among them at ∼ 5, has lately been a matter of much deliberation.The inconsistency between the measurements of  0 from Planck 2018 (67.36 ± 0.54 km s -1 Mpc -1 ) by Aghanim et al. (2020a) and from direct observations such as SH0ES (73.30 ± 1.04 km s -1 Mpc -1 ) by Riess et al. (2022) warrants a deep investigation into the systematics and methodologies of the missions, as well as into the assumed cosmological models.On closer inspection, the conflict extends beyond a Planck-vs-SH0ES dichotomy and appears to be one between various early time measurements of  0 on one hand and late time measurements on the other.While the Planck 2018 best-fit value of  0 sits well with DES clustering and weak lensing data combined with BAO spectroscopic surveys (Abbott et al. 2018b), a number of complementary direct measurement probes besides SH0ES consistently report a higher value (Jang & Lee 2017;Freedman et al. 2019Freedman et al. , 2020;;Yuan et al. 2019;Huang et al. 2010;Pesce et al. 2020;Wong et al. 2020).With the systematics being an unlikely culprit (Aghanim et al. 2017;Kevin et al. 2019;Vagnozzi 2020), a plethora of alternatives to the vanilla ΛCDM model have been proposed to resolve this tension (Knox & Millea 2020;Di Valentino et al. 2021;Shah et al. 2021;Efstathiou 2021;Schöneberg et al. 2022;Dainotti et al. 2021Dainotti et al. , 2022)).Furthermore, on the basis of some generalized classes of cosmological models, it has also been shown that the Hubble tension might in fact be quite generic to current datasets (Bhattacharyya et al. 2019).In recent years, these findings have motivated the community to look forward to future GW missions as an alternative probe of  0 at intermediate redshifts, distinct from both early and late-time measurements.
The quantitative scope of constraining  0 using GW data is under active investigation and has shown promise based on existing datasets (Fishbach et al. 2019;Soares-Santos et al. 2019;Abbott et al. 2021aAbbott et al. , 2017cAbbott et al. , 2023;;Fung et al. 2023), although the limited number of events at present entails wide error bars.Such prospects at present are also, quite expectedly, restricted to fairly low redshifts.For example, Borhanian et al. (2020) show that the upgraded LIGO and Virgo facilities would be able to constrain the Hubble constant to ≲ 2% accuracy via a simultaneous measurement of the waveform, and host galaxy redshift from the GW waveform, using dark sirens within  = 0.1 in five years.On the other hand, Chen et al. (2018) conclude that a 2% measurement of  0 would be possible in five years, provided the events have reliable electromagnetic (EM) counterparts, or the host galaxy redshifts can be determined using statistical approaches, up to redshifts of  = 0.5.In order to tackle the degeneracy between source masses and redshifts of binary black hole systems, Farr et al. (2019) propose a novel method of determining the redshifts of such mergers relying on pair-instability supernova processes.They predict that  () may be reconstructed at 6.1% in one year, and to 2.9% in five years, at the pivot value of  = 0.8.For cases where EM counterparts are absent, Gray et al. (2020) study the effects on inferring redshifts of GW events using incomplete galaxy catalogs.Such lines of inquiry, alongside the inherent limitations arising from sensitivity bounds of the present observatories, naturally lead to the question of how efficient next-generation GW missions might be when it comes to the question of constraining the expansion history of our Universe.Constraints based on future GW data are expected to be much more precise due to significantly higher numbers of detectable events up to higher redshifts, whose prospects have been explored by multiple authors (Messenger & Read 2012;Jin et al. 2023;Zhang et al. 2023;Califano et al. 2023;Seymour et al. 2022;Liu et al. 2022;Ghosh et al. 2022;Chen et al. 2022;Cao et al. 2022;Feeney et al. 2021;Mortlock et al. 2019;Kyutoku & Seto 2017;Jana et al. 2023).However, the predicted results show dependence on the choice of cosmological parameters when real data is absent and forecasts are carried out using mission-specific mock standard siren catalogs (Shah et al. 2023).
In the absence of a consensus model for cosmic acceleration, there have been attempts to reconstruct the evolutionary history that directly fits with observations without assuming any particular model parametrization (see Bilicki & Seikel (2012); Ishida & de Souza (2011);Gómez-Valent (2019); Arjona & Nesseris (2020); Bengaly et al. (2020); Wang et al. (2020); Mukherjee & Banerjee (2021a); Mehrabi & Rezaei (2021) and references therein).So, a non-parametric reconstruction of the Hubble parameter as a function of redshift using simulated future GW mission data may serve as a powerful predictive tool at this stage, until we have direct data from those missions at hand.Machine learning (ML) algorithms like Gaussian processes (GP) may play an important role for this purpose.
Gaussian processes (Rasmussen & Williams 2005) are a generic supervised learning algorithm designed to solve regression and probabilistic classification problems, depending on whether the output is continuous or discrete.Over the last decade, GP has become particularly popular in cosmology for reconstructing dark energy (Holsclaw et al. 2011;Seikel et al. 2012;Zhang & Li 2018) and modified theories of gravity (Zhou et al. 2019;Belgacem et al. 2020;Yang 2021;Bernardo & Levi Said 2021), exploring the interaction between dark matter and dark energy (Yang et al. 2015;Mukherjee & Banerjee 2021b;Cai et al. 2017), testing the standard concordance model and distance duality relation (Nair et al. 2015;Rana et al. 2017;Mukherjee & Mukherjee 2021;Mukherjee et al. 2023), constraining spatial curvature (Yang & Gong 2021;Dhawan et al. 2021;Mukherjee & Banerjee 2022) and in cosmographic studies (Shafieloo et al. 2012;Mukherjee & Banerjee 2021a;Jesus et al. 2022).For a pedagogical introduction to GP, one can refer to the Gaussian process website1.
In this paper, we propose a novel method to reconstruct the Hubble parameter using Gaussian process regression (GPR) from future gravitational wave mission data.This exercise will further help us investigate the status of the Hubble tension, complementary to the latest available datasets.We take eLISA (Amaro-Seoane et al. 2017;Klein et al. 2016;Tamanini et al. 2016) and ET (Maggiore et al. 2020;Sathyaprakash et al. 2010) as two specific examples of next-generation missions to focus on.Taking each mission's specifications into account, we generate mock catalogs of standard sirens for a few representative cosmological models, and employ GPR to reconstruct  () and thereby infer  0 directly in a non-parametric manner.Owing to our focus on luminosity distance data, we have specifically chosen models which encapsulate late-time modifications to ΛCDM.For each of the models, we carry out our analysis on two different sets of catalogs based on two distinct sets of fiducial parameter values, which are obtained by running MCMC for that model on early-time and late-time datasets separately.This disparate treatment is warranted due to the persistent tension between the values of  0 inferred from early and late-time data.In addition to helping us avoid any potential bias which might otherwise result from choosing any particular dataset as sacrosanct in an a priori fashion, this methodology allows us to study the dependence of the reconstruction procedure on the chosen priors in a transparent manner.Our study shows that GP is a promising tool which can be used to reconstruct the Hubble parameter  (), and hence also constrain the Hubble constant  0 , using next-generation GW mission data, where 10-years of eLISA results would be at par with 3-years of ET results as far as the precision level is concerned.Besides, the reconstruction results obtained by us also give a concrete estimate of how much the constraints might improve if the eLISA mission runs for a longer duration, say for 15-years.
The paper is organized as follows.In Sec. 2, we outline our adopted methodology, present the latest relevant constraints from current datasets that are used as fiducial values, and discuss the procedure for generating the mock standard siren catalogs.In Sec. 3, we give a brief description of the Gaussian process regression technique and provide the details regarding the implementation.Sec. 4 contains our obtained reconstruction results, followed by a detailed analysis of the same.Finally, we summarize our key findings and make some concluding remarks in Sec. 5.

STEPS TOWARDS RECONSTRUCTION
In pursuit of our goal to obtain a non-parametric reconstruction of the Hubble parameter, we implement the steps outlined in the ensuing subsections, taking into consideration the following arguments: 1. We generate realistic mock catalogs that incorporate instrumental specifications and account for all possible sources of error for two future GW missions, eLISA and ET separately, to reconstruct the Hubble diagram along with its associated errors.
2. In order to simulate a mock catalog, a set of fiducial parameter values must be specified.However, there is no unique choice of fiducial values and this choice may influence the results.Therefore, to select fiducial values that are most realistic, we have taken into consideration the constraints from present datasets on each model.
3. To ensure that our analysis is unbiased towards any specific realization, as the results may be sensitive to the initial dataset provided, we assemble multiple realizations of these mock catalogs for a particular mission.
4. The choice of fiducial values may partially draw the mock catalog and hence the reconstructed values towards that particular dataset from which the fiducials were inferred, leading to potential bias in the subsequent analysis.In view of the rising discrepancy between early and late time observations of the Hubble parameter, two different sets of fiducials are considered: one using constraints from early time observations, and the other from late time observations.5. Finally, the results obtained for two different future GW missions, eLISA and ET have been comparatively analyzed.
By doing so, we believe, the entire analysis would be considered free from any potential bias due to any particular mission or choice of mock catalog, thereby yielding more reliable and robust results for the GW missions under consideration.

Fiducial values from current constraints
We consider six representative cosmological models for the purposes of this study, as given in Table 1.By identifying the vanilla ΛCDM model as the baseline 6-parameter model, these can be classified into 6, (6 + 1), and (6 + 2) -parameter scenarios.Besides ΛCDM, five other models/parametrizations have been chosen based primarily on their potential in addressing the Hubble

−
Redshift-dependent DE density parameter to resolve the  0 tension: Vacuum Metamorphosis (VM) (Parker & Raval 2000;Parker & Vanzella 2004;Caldwell et al. 2006) related to Ω 0 Motivated by non-perturbative effects of quantum gravity where a gravitational phase transition occurs at some critical redshift Extension of original VM model allowing a nonvanishing DE component at  >   , i.e., before the gravitational phase transition.tension in view of present data (Di Valentino et al. 2021).A brief discussion on each model and their tension-resolving potential in light of future missions have been explored in Shah et al. (2023).The present article focuses on a nearly unbiased investigation of GP as a tool to reconstruct  () from next-generation GW data.However, the value of  0 is obtained from the analysis as a natural consequence.Given the discrepancies in the constraints on the parameters of the concordance model when subjected to late-time experiments versus early-time experiments, we choose not to assume either set of constraints to be the absolutely correct one.We rather split the currently available experimental data into two compilations -namely, CSB, from the early-time probes and RSH, from observations in the late Universe.Table 2 gives a summary of the existing datasets and future GW missions under consideration.Using these two combinations of datasets, we first bring all six cosmological models under consideration to an equal status when it comes to their parameter constraints from the latest observational probes.
The MCMC constraints for each model corresponding to CSB data have already been investigated partially by the present authors (Shah et al. 2023) and partially by others (Di Valentino et al. 2020).We run the Boltzmann code CLASS (Lesgourgues 2011;Blas et al. 2011) and the Markov chain Monte Carlo (MCMC) parameter estimation code MontePython (Brinckmann & Lesgourgues 2019;Audren et al. 2013) to obtain the constraints for the same models with RSH data.Table 3 gives a summary of the above parameter constraints.We consider the following uniform priors for each cosmological model parameter when running MCMC: We consider the two sets individually in our analysis to focus on both ends of the spectrum when it comes to the tension between early-time and late-time inferences and implement the constraints thus obtained as fiducial values for the mock standard siren catalogs which we generate at the next step.

Generation of mock GW catalogs
ET, a proposed third-generation ground-based European GW detector, is an evolution of 2G detectors such as Advanced LIGO, Advanced Virgo, and KAGRA (Maggiore et al. 2020).As demonstrated by Belgacem et al., ET is expected to observe ∼ 10 3 binary neutron star (BNS) merger events with electromagnetic counterparts from associated short-hard -ray bursts over an observation period of 3-years.Synergies with instruments like the Transient High-Energy Sky and Early Universe Surveyor (THESEUS) (Amati et al. 2018), the Extremely Large Telescope (ELT) (Kissler-Patig & Lyubenova 2009), the Advanced Telescope for High Energy Astrophysics (ATHENA+) (Piro et al. 2022), and the Daksha satellites (Bhalerao et al. 2022) can help in rapid detection of the EM counterpart transients and subsequent determination of the redshifts of the host galaxies.Alternative methods include employing advanced statistical techniques to estimate the redshifts of associated galaxies (Mukherjee & Wandelt 2018), or crosscorrelating GW catalogs with large-scale structure observations (Scelfo et al. 2018).ET is expected to observe events from a minimum of  = 0.07 to a maximum of  ∼ 2 with SNR > 8.The lower cutoff is maintained to exclude sources which require modelization of the local Hubble flow before including them in the analysis (Zhao et al. 2011).To generate our event catalog, we use the standard expression for the number density of the observed events within some redshift interval (Sathyaprakash et al. 2010), which is a function of the coalescence rate at a given redshift.This rate is obtained via a fit to the observationally determined star formation history (Schneider et al. 2001).We generate   samples according to these distributions.To get an estimate of the uncertainties on   , we add in quadrature the instrumental error (Zhao et al. 2011) and the error due to lensing (Sathyaprakash et al. 2010).
The space-based eLISA mission is expected to revolutionize our understanding of cosmic history by probing massive black hole binaries (MBHBs).High SNR detection of MBHB coalescence will yield accurate direct measurements of their luminosity distance.Additionally, simultaneous detection of EM counterparts would be possible for events with SNR > 8 and a sufficiently small sky localization error (Tamanini et al. 2016;Dotti et al. 2012;Antonini et al. 2015).Such EM counterparts may be detected either by the Large Synoptic Survey Telescope (LSST) (Zhan & Tyson 2018) in the optical range, or through the joint efforts of the Square Kilometre Array (SKA) (Bacon et al. 2020) and ELT in the radio frequency range.The former would be possible for quasar-like luminosity flares which might be triggered during mergers due to the presence of sufficient amounts of surrounding gas, whereas the latter would be the way to go for radio flares and jets due to mergers occurring in the presence of sufficiently strong galactic magnetic fields.Tamanini et al. have shown that the eLISA mission (in the L6A2M5N2 configuration Amaro-Seoane et al. 2017) is expected to observe ∼ 25 MBHB coalescence events for 'No Delay' (Koushiappas et al. 2004;Barausse 2012;Klein et al. 2016) sources over a mission duration of 3-years up to redshifts  ∼ 7 − 8.A comparative study of the number of detectable sources with redshift measurable counterparts to populate the   −  diagram is given in Tables 9 and 10 of Tamanini et al. (2016).To generate mock catalogs, we draw plausible redshift values for MBHB mergers from an interpolated  distribution, based on the data presented in Fig. 1 of Tamanini et al. (2016).The lower bound is introduced at  < 0.1, to match our expectations of not finding MBHBs at very low redshift (Speri et al. 2021).To estimate a realistic 1 error on   to each MBHB merger event, we combine the uncertainties contributed by lensing (Tamanini 2017;Hirata et al. 2010;Shapiro et al. 2010), peculiar velocity corrections (Tamanini 2017;Kocsis et al. 2006), instrumental precision (Marsat et al. 2021;Speri et al. 2021), and redshift error associated with photometric measurements (Dahlen et al. 2013;Ilbert et al. 2013).
Keeping these assumptions and arguments in mind, we generate mock catalogs so as to give a reliable number of data points for each mission.In order to get rid of any potential bias from a particular mock catalog, we generate 500 mock catalogs of observable bright siren events for each case in our study (see Table 2).We also make some modifications to the methodology outlined in Ferreira et al. (2022) to make the catalogs more realistic as mentioned in Shah et al. (2023).Here is a brief outline of our adopted methodology: • Obtain the set of redshifts of bright siren events for both eLISA and ET by sampling from the theoretical redshift distribution based on the specifications of each mission.
• Choosing a particular cosmological model, calculate the theoretical luminosity distance  { ℎ}  () at the sampled redshifts.
• Estimate the total error Δ  () by taking into account the different sources of error which affect the measurement of the luminosity distance for each mission.
This gives us a set of event redshifts {}, the corresponding luminosity distances {  ()}, and the observational errors in determining the latter {Δ  ()}.Such a set constitutes an individual catalog corresponding to each combination of mission, cosmological model and observational prior.

GAUSSIAN PROCESS RECONSTRUCTION
Let us begin with a brief description of the ML tool, namely the Gaussian process (Rasmussen & Williams 2005;Seikel et al. 2012;Shafieloo et al. 2012), that we are going to use in our analysis.A Gaussian process involves an indexed collection of random variables having a multivariate normal (MVN) distribution, used to infer a distribution over functions directly within a continuous domain.For a given set of observational data, one can use GP to reconstruct the most probable underlying continuous function describing that data and the associated confidence levels, without limiting to any particular parametrization ansatz.Thus, GP serves as a powerful, non-parametric tool for statistical modeling.
For undertaking GPR, our assumption is that the observational data with Gaussian noise, as well as the predicted function, jointly underlie an MVN distribution, described solely by a mean vector and a covariance matrix.Thus, given a mock   vs  catalog, we can employ GP to reconstruct   () and    () directly from data, without assuming any model parametrization.
The reconstructed function at some redshift  is characterized by a mean () = E[  ()] and covariance function (also called "kernel") cov(  (),   ( z)) = (, z), which correlates values of the function between two redshifts,  and z, respectively.As prior information for the predictions, we assume a zero mean function to ensure a cosmological model-independent analysis.Hence, the choice of kernel determines almost all the generalization properties of our GPR.
A wide range of possible covariance functions is available in the literature (Rasmussen & Williams 2005;Seikel et al. 2012).The most general and standard choice is the squared exponential covariance, where,  and   are the characteristic length scale and noise variance, known as the hyperparameters of our GP model.Note  roughly corresponds to the distance one needs to move along  before   significantly changes, whereas   describes typical changes in the value of   .
Another possible choice is the Matérn covariance, with  ≡  + 1 2 , given by, where  denotes the order.The squared exponential kernel is indefinitely differentiable and for a reconstruction involving an th order derivative, the Matérn  covariance works well if  > .Note that the reconstruction kernel plays a significant role and the final results are sensitive to this choice.Imposing greater differentiability, hence a greater degree of smoothness, leads to strong correlations in the reconstructed functions and their derivatives, resulting in tighter bounds on the uncertainties.As the present work requires the use of second-order kernel derivatives, in this work, we choose the Matérn  = 9/2 covariance, as suggested in Seikel & Clarkson (2013).Moreover, it has been shown in Ó Colgáin & Sheikh-Jabbari (2021) that the reconstructed errors decrease as  → ∞ (squared exponential kernel), but the difference between  = 5/2 and ∞ is not highly striking.With the marginalized covariance function, the data can be extended to the redshift range , specific to a particular mission, for obtaining a smooth continuous function for   with respective uncertainties    .As a demonstrative example, consider a single eLISA mock catalog generated assuming the baseline ΛCDM model.We have used a python-implementation for GP, being simple linear algebra.In the next section, we will extensively make use of this algorithm for other catalogs.Given a set of training inputs  = {  } from the   () vs  GW mock catalog (Sec.2.2) with noise    (), we use GPR to make predictions of   ( ★ ) on a test set of redshifts  ★ =  ★  .The prior covariance matrix between the training points (, ) is defined as [(, )]   =(  ,   ).Similarly, we also obtain the prior covariances between the training vs test and test vs test redshifts as (,  ★ ) and ( ★ ,  ★ ) respectively.So, the joint prior distribution of the training data and test predictions of   is given by ∼ N (0, Σ), where, and C is the covariance matrix of the mock data.The predictive distribution is given by ∼ N (  ( ★ ), cov[  ( ★ ),   ( ★ )]), where the predicted mean vector,   ( ★ ) and the covariance matrix cov[  ( ★ ),   ( ★ )] are, Again, GP can also be used to reconstruct the derivatives of   , assuming these derivatives also follow a joint MVN with the observed data.The mean vector and the covariance matrix corresponding to the first derivative   ′ are given by,

and, cov[𝑑
Furthermore, the covariance between the reconstructed functions   ( ★ ) and   ′ ( ★ ) can be obtained as, Here, a prime denotes a derivative of the prior covariance function with respect to redshift, In what follows we make use of the above-mentioned algorithm for non-parametric reconstruction of the Hubble parameter.We also need to know the hyperparameters   and , which are obtained by optimizing the log-marginal likelihood, which in this case is given by, For a complete Bayesian analysis, we marginalize the log-likelihood (Eq.( 10)) over the hyperparameters instead of optimizing them.So, the posterior predictive distribution is, where (  , ) is the 'prior' function that we take as uniform.We adopted a python implementation of the ensemble sampler for MCMC, emcee, introduced by (Foreman- Mackey et al. 2013).
Using the best-fit samples from the posterior distribution of hyperparameters, we make predictions for the functions   and  ′  at the test redshifts  ★ ≡ {} with the help of Eq. ( 5) and ( 7) assuming 1000 reconstruction bins along redshift range 0 <  <  max .
We plot posterior covariance matrices cov[  ,   ], cov[ ′  ,  ′  ] and cov[  ,  ′  ] in Fig. 1.The reconstructed functions   () and   ′ () also shown in Fig. 2. Finally, with these reconstructed   () and   ′ (), we derive the evolution of the Hubble parameter as, and infer  0 ≡  ( = 0) directly in a non-parametric way.The uncertainties associated with the reconstructed  0 are estimated via the Monte Carlo error propagation technique.We show the posterior distribution of the Hubble parameter at present epoch  0 for a given sample of hyperparameters, i.e., ( 0 |  , ) with respect to  0 in the right panel of Fig. 2. The best-fit  0 is shown with a blue dashed line.The shaded region in blue is the standard deviation.The mean  0 along with the 1 uncertainties are plotted in red with dash-dot and dotted lines respectively.To validate our results, we compared the plots shown in Fig. 2 to those obtained from using the GaPP (Seikel et al. 2012) code.Now, for each of the representative models mentioned in Table 2, we employ GP to directly reconstruct   (),   ′ () and thereby obtain the evolution of  () as a function of redshift using the respective 500 mock catalogs, generated in Sec.2.2, separately for both early-time (CSB) and late-time (RSH) fiducial values.Henceforth, we shall refer to them as 'Set-I' and 'Set-II' respectively.For this exercise, we take into account two GW missions under consideration, namely, eLISA and ET, separately, and finally compute the averaged reconstructed  () functions from the respective mock catalog compilations.Finally, to put down the results obtained by this reconstruction exercise in a convenient way, we plot the reconstructed Hubble parameter  () as a function of redshift.Fig. 3 shows the evolution of the Hubble parameter over a redshift range of 0 <  < 2 detectable by ET for a ∼ 3-year mission duration for each of the six cosmological models under study.Fig. 4 shows the reconstructed  () from simulated events detectable by eLISA with the "No Delay" MBHB source population for a ∼ 10-year mission duration, in the redshift range 0 <  < 5.Each of the  () plots have insets magnifying the reconstruction in the redshift range 0 <  ≲ 0.3.For each plot, the dashed and solid lines correspond to the averaged reconstructed  () curve from Set-I and Set-II mock catalogs.The shaded regions with × and + hatches show the 1 and 2 confidence levels for Set-I and Set-II respectively.The fiducial Hubble function constrained using the CSB and RSH dataset combinations are plotted with dotted and dash-dot lines.We also plot the comparative constraining power on  () between the two missions under consideration in Fig. 5.
We further investigate the effects of mission duration, and hence the number of detected events, on the mission's potential in constraining  0 .We show how the reconstructed  0 varies with the number of detected events by ET in Fig. 6, and eLISA in Fig. 7.Such a demonstration helps in adding perspective to whether and how these two upcoming future surveys will help in improving the constraints on  0 , in comparison to the latest constraints from currently available datasets.The square and circular markers with error bars represent the respective mean values and 1 uncertainties of the reconstructed  0 from eLISA and ET, assuming both Set-I and Set-II mocks.We further indicate the early-time constraints from CSB datasets in blue color with × hatches, whereas the late-time ones using RSH data are denoted in red color with + hatches respectively for the different models under consideration.

ANALYSIS AND DISCUSSION
The present analysis reveals that GP can be a useful tool to reconstruct the Hubble parameter for future GW missions, at least to a moderately good redshift range.Our results also demonstrate the level of precision at which these next-generation missions should be able to constrain both the redshift evolution of the Hubble parameter  () and its present value  0 .
The 1 error of the reconstructed value of  0 shown in Fig. 6 and 7 generically decreases with an increase in the number of events, irrespective of the choice of the background cosmological model, the GW mission, and the set of fiducial parameter values used to generate the catalogs.It reveals the constraining power of these two missions for  0 with relatively less error than current observations.Besides, the mean value of the reconstructed  0 typically falls within its corresponding fiducial bounds in a stable manner for a sufficiently large number of events and beyond.These results indicate a faithful reconstruction.The saturation of the error with the number of events is, of course, set by the specifications of the individual mission.

Comparison between eLISA and ET
Coming to a comparison between these two particular GW missions, we first note that while ET is expected to detect a significantly higher number of bright sirens than eLISA up to  ∼ 2, a typical ∼ 10-year eLISA mission is predicted to constrain  0 almost as precisely as ET, with longer mission durations generically tending to perform better than ET.This trend holds in a robust fashion for both sets of mock catalogs.The superior performance of eLISA can be attributed partly to its enhanced instrumental sensitivity, and partly to its ability of probing higher redshifts up to  ∼ 8 in search of standard sirens.The impact of the former factor is further visible in the plots of the reconstructed redshift evolution of  () in the range 0 <  < 2 for ET and eLISA (from Figs. 3 and 4), where the reconstruction in case of eLISA is seen to be much tighter compared to the one in case of ET (Fig. 5).Nonetheless, a direct comparison between the expected number of events, detectable by ET (∼ 1000) and eLISA (∼ 25), for a ∼ 3-year mission duration (from Figs. 6 and 7) shows better performance of the former compared to the latter.

Comparison among cosmological models
on the results obtained, we notice a few characteristic trends from Figs. 6 and 7.For ΛCDM, CPL, and JBP, the reconstructed mean  0 alongside its 1 uncertainty tends towards a marginally higher value compared to the early-time prior in the case of ET, while the reconstruction corresponding to the late-time prior shows no significant shift.This is somewhat interesting from the perspective of the Hubble tension, although further studies with alternative ML algorithms are required in order to make any strong comments in that direction.The same mean-shifting tendency is visible to some extent in the case of eLISA as well for a realistic 10−15 year mission duration, albeit with tighter error bars which make the reconstructed values more consistent with the fiducials than in the case of ET.For the PEDE model, the opposite tendency is observed in the case of both ET and eLISA with an increasing number of events, as the reconstructed  0 shifts marginally towards lower values for both early and late-time priors.This effect is not very significant though, as it still remains largely consistent with the fiducial which also includes the reconstructed mean.The VM model displays mean-shifting characteristics quite similar to PEDE, with the only difference being that the early-time fiducial value of  0 in the case of VM is higher than the late-time one, as given by the MCMC results.Finally, for the VM-VEV model, the reconstructed mean value of  0 falls almost outside the 1 bounds of the corresponding early-time fiducial but remains consistent with the latter within 1.However, as the early-time prior itself falls entirely within the bounds of the late-time one with which both of the reconstructed  0 values are entirely consistent, there is no statistically significant surprise in the outcome.We summarize the status of the  0 tension for the different models with respect to present and mock GW datasets in the right panel of Fig. 8, where  = Δ 0 /  0 is the 1-dimensional Gaussian tension metric.

Comparison between fiducials
For a given background model, the two separate reconstructed trajectories of  () are unsurprisingly distinct and clearly dependent on the choice of the fiducial values used to generate the two different sets of catalogs.However, the reconstructed uncertainties in  0 do not depend on or distinguish between the choice of the late-time or early-time fiducials for any given model, as shown in the left panel of Fig. 8. Besides, under the same choice of fiducials, the reconstructions obtained for the two different missions are mutually consistent for each of the models considered.This provides another reality check on the fidelity of the GP reconstruction technique.
Based on the reconstructed  () plots in Figs. 3 and 4, a few more model-specific features can be summarized.A unique feature of the VM model appears in the form of a crossing between the two mean trajectories of  () for the early and late-time fiducials.This occurs in the approximate redshift region 0.1 <  < 0.2, beyond which the reconstruction based on the late-time prior gives a higher value of  () compared to the early-time one.While the VM model is capable of resolving the  0 tension at present within 1, the reconstructed history thus obtained indicates a tension in  () at higher redshifts.This tension persists up to  ∼ 1.5 in the case of ET and  ∼ 2 in the case of eLISA, beyond which the evolutionary histories of  () are once again consistent with each other within 1 but solely on the basis of widened error bars (the mean curves continue to be significantly apart).None of the other models shows such a pronounced discrepancy at higher redshifts.While the mean  () curves for PEDE and VM-VEV are also somewhat distant from each other, the difference is not statistically significant due to the large uncertainties involved.For ΛCDM, CPL, and JBP, the two reconstructed trajectories are comparatively more concordant with each other, in spite of the worse status of these models as far as the present  0 tension is concerned.

CONCLUSIONS
In this article, we analyze the robustness of Gaussian processes (GP) in reconstructing the Hubble parameter in light of future gravitational wave (GW) missions.We employ the following methodology, which enables us to assess the efficacy of GP in a more or less robust and unbiased manner: • We focus on the ground-based ET and the space-based eLISA as two representatives of next-generation GW detectors.
These missions aim to detect gravitational waves originating from different astrophysical phenomena by probing different redshift ranges.This helps mitigate mission-specific dependencies within our general conclusions which are based on a comparative study between the individual reconstruction results for both missions.
• We consider six different background cosmologies (including the concordance ΛCDM model as the benchmark).The chosen scenarios incorporate late-time modifications on top of the vanilla 6-parameter picture that subsequently leads to considerable variations in the theoretical   vs  behavior.This, in turn, allows us to study various representative cases of the same and arrive at conclusions which hold generically across a variety of cosmological models (Table 1).
• For each mission and model under consideration, we generate two sets of mock catalogs (Set-I and Set-II) using the constraints obtained from currently available early-time (CSB) versus late-time (RSH) datasets as fiducials (Table 2).
• We examine their relative effect on the reconstructed results, owing to the tension between early and late-time constraints that persists particularly in the value of the Hubble constant.We have resorted to this disparate analysis involving two distinct fiducials, assuming neither set to be indisputably correct in an a priori fashion.
The reconstructed plots of the Hubble parameter  () are strongly dependent on the choice of the fiducials used to generate the mock catalogs.However, under the same choice of fiducials, the reconstructions corresponding to the two different missions are consistent with each other for each of the models considered, which indicates a faithful reconstruction process.Upon varying the number of detectable events or (equivalently) the mission duration, the presently observed mean value of the reconstructed  () and its associated 1 error also show generic trends for both missions in accordance with their specifications.In particular, our results confirm the following: • GP is a powerful and efficient non-parametric tool which can be used to reconstruct the Hubble diagram based on standard siren datasets from multiple next-generation GW detectors, and hence constrain the value of the Hubble constant based on GW luminosity distance data at intermediate redshifts.
• Concerning precision, the results obtained from a ∼ 10-year eLISA mission are likely to be at par with those from a ∼ 3-year ET run.Thus, these future missions have the potential to constrain cosmological parameters and provide bounds that would be competitive to present-day constraints.
• If the missions run for a longer duration, the reconstruction results show further improvement.For example, the relatively tighter constraint on  0 from a ∼ 15-year eLISA mission can be readily compared with those from shorter runs in Fig. 7.
However, in this work, we have chosen not to combine the simulated data from eLISA and ET because of two primary sources of uncertainty: the dissimilar systematics of the missions, and the questionable physical validity of combining luminosity distance data obtained with distinct instruments.We plan to explore this avenue in future work.
Moreover, a natural question that arises here is: What about other future GW missions, such as the DECi-hertz Interferometer Gravitational-wave Observatory (DECIGO) (Mandel et al. 2018;Kawamura et al. 2021;Zhang et al. 2022), the Big Bang Observer (BBO) (Cutler & Harms 2006;Corbin & Cornish 2006), TianQin (Luo et al. 2016), Cosmic Explorer (Reitze et al. 2019), Taĳi (Ruan et al. 2020) as well as ongoing GW missions such as the LIGO-Virgo-KAGRA (Abbott et al. 2018a) network and the Pulsar Timing Array (Hobbs et al. 2010)?We have already carried out some initial checks with DECIGO, which proposes a large number of detections in the redshift range 0 <  < 5 with a very high signal-to-noise ratio (SNR).We found that the reconstructed  (), and hence  0 , obtained from a simulated catalog strongly mimic the fiducial values, albeit with increased precision.This shows the reconstruction efficacy of GPs.At the same time, averaging over other realizations of the catalogs for DECIGO is not expected to give much variability.However, given the success of GP for eLISA and ET, it would be intriguing to find out the prospects of other GW missions that plan to probe events at relatively higher redshifts.
Furthermore, in the present study, we have exclusively focused on bright standard sirens, i.e. those with detectable electromagnetic counterparts.This choice is reasonable because upcoming optical and radio observatories like LSST, SKA+ELT, and THESEUS/ATHENA are expected to identify transient EM emissions accompanying gravitational wave (GW) events.This would aid in pinpointing the source galaxies' locations and redshifts even at high redshifts.Recent reports from the operational James Webb Space Telescope (JWST) show promise in probing luminous sources beyond  ∼ 10 via EM observations, determining their redshifts through spectroscopy or photometry (Curtis-Lake et al. 2023;Robertson et al. 2023;Larson et al. 2023;Juodžbalis et al. 2023;Maiolino et al. 2023).Future observatories' improved capabilities are anticipated to enhance collaboration between GW and EM facilities, enabling real-time study of high-redshift multi-messenger events and subsequent follow-ups of host galaxies.If, on the other hand, dark sirens could also be localized with sufficient precision using galaxy cross-correlation or any other viable technique(s) (Oguri 2016;Mukherjee et al. 2022;Gray et al. 2023), their addition would significantly increase the number of detectable events at higher redshifts, thus bolstering the fidelity of the results obtained through our adopted methodology.Such prospects have attracted considerable attention in recent years, in light of a number of proposed ground-based and space-based GW missions (Seymour et al. 2022;Wang et al. 2022;Jin et al. 2023;Muttoni et al. 2023;Zhu & Chen 2023).As far as the reconstruction methodology demonstrated in this work is concerned, these are a couple of other challenging directions that would be interesting to pursue in near future.
Finally, the applicability of other machine learning techniques besides GP to the reconstruction of  (), based on standard siren datasets either from eLISA/ET or from some of the other future GW missions mentioned above, is another potentially interesting field to shed light on.

Figure 2 .
Figure 2. Plots showing the reconstructed functions   (left panel) and  ′  (middle panel) in units of Gpc with respect to the test redshifts.The right panel shows the reconstructed posterior distribution of  0 , ( 0 |  , ), for the best-fit values of the hyperparameters.

Figure 3 .
Figure 3. Plots for the reconstructed  () vs redshift  from ∼ 1000 GW events detectable by ET for a ∼ 3-year mission duration in the redshift range 0 <  < 2.Here 'CSB' & 'RSH' denotes the fiducial Hubble function obtained using respective dataset combinations.Set-I & Set-II represent the reconstructed functions with ET mock catalogs generated from the 'CSB' and 'RSH' constraints as fiducials.

Figure 4 .Figure 5 .
Figure 4. Plots for the reconstructed  () vs redshift  using ∼ 80 GW events from the "No Delay" source population detectable by eLISA for a ∼ 10-year mission duration in the redshift range 0 <  < 5.

Figure 6 .Figure 7 .
Figure 6.Values of the reconstructed  0 considering a variable number of GW events with EM counterparts detectable by ET.

Figure 8 .
Figure 8. Plots showing a comparison of the relative uncertainty on reconstructed  0 in the left panel.The right panel presents the resulting  0 tension in light of eLISA and ET compared to the existing tension for different models.

Table 1 .
Brief summary of the models/parametrizations under consideration.Here "Additional parameters" means parameters on top of 6.

Table 2 .
Table showing a summary of the observational datasets utilized to obtain the latest constraints on the different models/parametrizations, and the gravitational wave missions considered for the analysis.