Importance nested sampling with normalising flows

We present an improved version of the nested sampling algorithm nessai in which the core algorithm is modified to use importance weights. In the modified algorithm, samples are drawn from a mixture of normalising flows and the requirement for samples to be independently and identically distributed (i.i.d.) according to the prior is relaxed. Furthermore, it allows for samples to be added in any order, independently of a likelihood constraint, and for the evidence to be updated with batches of samples. We call the modified algorithm i-nessai. We first validate i-nessai using analytic likelihoods with known Bayesian evidences and show that the evidence estimates are unbiased in up to 32 dimensions. We compare i-nessai to standard nessai for the analytic likelihoods and the Rosenbrock likelihood; the results show that i-nessai is consistent with nessai whilst producing more precise evidence estimates. We then test i-nessai on 64 simulated gravitational-wave signals from binary black hole coalescence and show that it produces unbiased estimates of the parameters. We compare our results to those obtained using standard nessai and dynesty and find that i-nessai requires 2.68 and 13.3 times fewer likelihood evaluations to converge, respectively. We also test i-nessai on an 80 s simulated binary neutron star signal using a reduced-order-quadrature basis and find that, on average, it converges in 24 min, whilst only requiring 1.01×10^6 likelihood evaluations compared to 1.42×10^6 for nessai and 4.30×10^7 for dynesty. These results demonstrate that i-nessai is consistent with nessai and dynesty whilst also being more efficient.


Introduction
John Skilling proposed nested sampling in [1,2] and it has since seen widespread use in astronomical data analysis, including but not limited to the analyses of gravitational waves [3,4], asteroseismology [5] and cosmology [6].
Nested sampling is a Monte Carlo algorithm that approximates the Bayesian evidence Z ≡ p(d|H) = ∫ p(d|θ, H) p(θ|H) dθ, for some observed data d with an assumed model H over the parameters θ, where L(θ) ≡ p(d|θ, H) is the likelihood. This is usually considered in the context of Bayes' theorem, p(θ|d, H) = p(d|θ, H) p(θ|H) / p(d|H), where π(θ) ≡ p(θ|H) is the prior and p(θ|d, H) is the posterior. Samples from the latter are a by-product of approximating the evidence. When implementing nested sampling, the main challenge is drawing new points from the likelihood-constrained prior at a given iteration. There are different approaches to this, such as using Markov Chain Monte Carlo (MCMC), slice sampling or sampling from bounding distributions [7]. There have also been efforts to incorporate machine learning into nested sampling for approximating the likelihood [8], in the proposal process [9,10] and for sampling from arbitrary priors [11].
In Williams et al. [10], we proposed nessai, a nested sampling algorithm that uses normalising flows to approximate the likelihood-constrained prior at different iterations. We showed that this approach could speed up convergence and allowed for natural parallelisation of the likelihood. However, we noted that a significant portion of compute time was being spent performing rejection sampling to ensure points were distributed according to the prior, and this, alongside the inherently serial nature of nested sampling, set a lower limit on how fast the algorithm could be.
In this work, we present a modified nested sampling algorithm based on importance sampling that addresses the aforementioned bottlenecks. In particular, this modified algorithm:
• incorporates normalising flows in a similar fashion to Williams et al. [10],
• removes the requirement for samples to be independently and identically distributed (i.i.d.) and distributed according to the prior,
• allows samples to be added in any order independent of a likelihood constraint,
• allows the evidence to be updated for batches of samples.
Taken together, these changes improve the efficiency of the algorithm, reducing the number of required likelihood evaluations by up to an order of magnitude over our previous version, and greatly increasing the scalability of the algorithm. This is especially relevant in the context of gravitational-wave data analysis, where nested sampling is the de facto analysis algorithm [3,4]. As of the last LIGO-Virgo-KAGRA [12-14] observing run, there are 90 confirmed detected compact binaries [15-17] and this number is expected to increase by a factor of ∼3.3 in the fourth observing run [18]. This presents a significant computational challenge since typical analyses take of order days to weeks. Furthermore, a subset of these analyses are currently only possible at great computational cost [19,20]. The algorithm we present brings the possibility of tackling these challenging analyses and dramatically reduces the wall-time required to complete an analysis.
This paper is structured as follows: in section 2 we present background theory on nested sampling and various alternative formulations that this work builds upon. We then describe a simplified version of our modified algorithm and validate it in section 3. This is followed by a description of the complete method and algorithm in section 4. Finally, we present results in section 6 and discuss them in section 7.

Nested sampling
Nested sampling [1,2] is a stochastic sampling algorithm in which the Bayesian evidence (p(d|H) or Z) is rewritten as a one-dimensional integral in terms of the prior volume X, Z = ∫₀¹ L(X) dX, where L(X) is the likelihood at a given prior volume X. If the likelihood L(X) is a well-behaved function, then this formulation allows for the evidence to be approximated using an ordered sequence of decreasing prior volumes X_i such that Z ≈ Σ_i L_i w_i, where L_i = L(X_i) is the likelihood at X_i and the weights w_i are, for example, given by w_i = (1/2)(X_{i−1} − X_{i+1}). The prior volume at a given iteration X_i is computed in terms of the previous prior volume X_{i−1}, the number of points within the likelihood-constrained prior N_live and the shrinkage factor t_i, which is a random variable in (0, 1) with probability density function P(t) = N_live t^(N_live − 1). The mean and standard deviation of log t are therefore E[log t] = −1/N_live and σ[log t] = 1/N_live. Since each draw of log t_i is independent, the prior volume at a given iteration i is approximately X_i ≈ exp(−i/N_live). We can express this as a recursive relationship in terms of t_i, where X_i = t_i X_{i−1}. The overall nested sampling algorithm can then be summarised as follows:
(i) Draw N_live points {θ_i}_{i=1}^{N_live} ∼ π(θ) and compute the likelihood L_i = L(θ_i) of each point,
(ii) Choose the point θ* with the lowest likelihood L* ≡ L(θ*),
(iii) Draw new points θ̃ until L(θ̃) > L*,
(iv) Replace θ* with the new point θ̃ and add θ* to the nested samples,
(v) Update the evidence estimate via eq. (4),
(vi) Repeat steps 2-5 until a stopping criterion is met.
The algorithm returns a set of nested samples, with corresponding prior volumes and likelihoods, and an evidence estimate with a corresponding error. The stopping criterion is typically related to the fractional change in the evidence between iterations [7].
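As a concrete illustration of steps (i)-(vi), the minimal sketch below runs the single-replacement loop on a toy one-dimensional problem. The likelihood-constrained draws use naive rejection from the prior, which is only viable for trivial problems and is precisely the step that practical implementations replace with MCMC, slice sampling or bounding distributions; a fixed iteration count stands in for a proper stopping criterion.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_likelihood(theta):
    # One-dimensional unit Gaussian likelihood centred at zero
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

def sample_prior(n):
    # Uniform prior on [-10, 10], so pi(theta) = 1/20
    return rng.uniform(-10, 10, size=n)

def sample_constrained(log_l_min):
    # Naive rejection sampling from the likelihood-constrained prior; the
    # acceptance rate shrinks with the prior volume, hence the need for
    # smarter proposal methods in practice.
    while True:
        theta = sample_prior(1)[0]
        if log_likelihood(theta) > log_l_min:
            return theta

n_live = 500
n_iterations = 3000          # fixed for brevity; real runs use a stopping criterion
live = sample_prior(n_live)
log_l = log_likelihood(live)

log_z = -np.inf              # running log-evidence
log_x = 0.0                  # current log prior volume
for i in range(n_iterations):
    worst = np.argmin(log_l)
    log_x_new = -(i + 1) / n_live                       # X shrinks by ~exp(-1/N_live)
    log_w = np.log(np.exp(log_x) - np.exp(log_x_new))   # simple rectangle-rule weight
    log_z = np.logaddexp(log_z, log_l[worst] + log_w)
    log_x = log_x_new
    # Replace the worst point with a new likelihood-constrained draw
    live[worst] = sample_constrained(log_l[worst])
    log_l[worst] = log_likelihood(live[worst])

# Add the contribution of the remaining live points
log_z = np.logaddexp(log_z, np.log(np.mean(np.exp(log_l))) + log_x)
print(f"log Z ~ {log_z:.3f} (analytic: {np.log(1 / 20):.3f})")
```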
Given a completed nested sampling run, posterior samples can be drawn by computing the posterior weights for each nested sample and then, for example, using rejection sampling to obtain samples from the posterior distribution. This formulation has been extended and modified in various works, such as to allow for a varying number of live points [21], to use different proposal methods [6,10,22], or even to use different definitions of the weights w_i in eq. (4) [23-25], which is the focus of this work.
As mentioned previously, the main challenge when implementing a nested sampling algorithm is drawing live points that are i.i.d. according to the prior and satisfy the likelihood constraint at the current iteration. There are various different approaches to this. In the original paper [2], Skilling proposes using MCMC over the prior and accepting only those points for which L(θ) > L* until the correlation with the starting point (one of the existing samples) has been lost. This method requires a random walk that can adapt to the continuously shrinking likelihood-constrained prior and a method for determining the number of steps to take [7]. Further modifications are often needed to handle multi-modality and complex correlations between parameters, for example, as implemented in Veitch et al. [3]. Similarly, slice sampling [26], where samples are drawn from a randomly oriented line within the likelihood-constrained prior, has also been used [6]. The challenge in this case is choosing the direction of the line and how to sample from it. Another approach is to sample from a bounding (or proposal) distribution that directly approximates or contains the likelihood-constrained prior, such as ellipsoids [22,25] or mixtures of these to handle, for example, multi-modality. Finally, there are algorithms that use a mix of the aforementioned methods [27,28].
One limitation of nested sampling is its inherently sequential nature. This is addressed in part by dynamic nested sampling [21], where an initial exploratory run is retroactively improved upon by adding samples in regions of interest. However, the core algorithm is still sequential. Diffusive nested sampling [23] tackles this by using a multi-level exploration method which allows returning to lower likelihoods. We draw from this variant of nested sampling when developing our modified algorithm.
Machine learning has also been incorporated into nested sampling algorithms to address some of the limitations and accelerate inference. In Graff et al. [8], the likelihood is approximated using a neural network which, for computationally expensive likelihoods, can reduce the overall computational cost. In Alsing and Handley [11], normalising flows are used to allow for arbitrary priors which could otherwise not be used, for example, when using a posterior distribution as the prior for subsequent inference. Normalising flows have also been applied specifically to the proposal process. The algorithm proposed in Moss [9] improves MCMC efficiency by transforming the sampling parameter space to a simpler space using a normalising flow, and in Williams et al. [10], we proposed nessai, which uses normalising flows to directly approximate the likelihood-constrained prior and avoid the need for MCMC, greatly improving sampling efficiency. We discuss nessai in detail in section 2.2.

nessai: Nested sampling with normalising flows
In Williams et al. [10], to address the aforementioned challenge of proposing new live points from the likelihood-constrained prior, we introduced nessai, a nested sampling algorithm that incorporates normalising flows in the proposal process. We now review the core aspects of nessai.
Normalising flows are a family of parameterised invertible transforms that can be trained via an optimisation process to map from a simple distribution p_Z(z) in the latent space (Z) to a complex distribution p_X(x) in the data space (X). They were first proposed in [29,30] and have since been applied to a range of problems including image synthesis, noise modelling, physics and simulation-based inference [31-33].
One property that distinguishes normalising flows from other generative models, such as Variational Autoencoders [34] and Generative Adversarial Networks [35], is that their construction allows for an explicit expression for the learnt distribution, p_X(x) = p_Z(f(x)) |det(∂f(x)/∂x)|, where f is the normalising flow and |det(∂f(x)/∂x)| is the Jacobian determinant. The normalising flow f must be constructed such that the mapping is invertible and has a tractable Jacobian determinant. Depending on how the mapping is constructed, flows fall into two main categories: autoregressive flows and coupling flows. The former have more expressive power at the cost of being more computationally expensive to train and evaluate, whereas the opposite is true for the latter [32]. In Williams et al. [10] and in this work, we use coupling flows based on RealNVP [36]. For a complete review of normalising flows, see Kobyzev et al. [31] and Papamakarios et al. [32].
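The expression above can be made concrete with a toy invertible map. The sketch below uses a single fixed affine transform rather than a trained RealNVP (which stacks many learnt coupling transforms), purely to show how the learnt density p_X(x) is evaluated from the latent density and the Jacobian determinant; the values of mu and sigma are illustrative.

```python
import numpy as np

def latent_log_prob(z):
    # Standard Gaussian latent distribution p_Z(z)
    return -0.5 * np.sum(z**2, axis=-1) - 0.5 * z.shape[-1] * np.log(2 * np.pi)

# Toy invertible map f: x -> z = (x - mu) / sigma (element-wise affine).
# A RealNVP flow composes many learnt, input-dependent transforms of this
# kind, but the density evaluation has exactly the same structure.
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 2.0])

def f(x):
    return (x - mu) / sigma

def flow_log_prob(x):
    # log p_X(x) = log p_Z(f(x)) + log |det(df/dx)|
    log_abs_det_jacobian = -np.sum(np.log(sigma))
    return latent_log_prob(f(x)) + log_abs_det_jacobian

x = np.array([[1.0, -2.0], [0.0, 0.0]])
print(flow_log_prob(x))
```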
In nessai, at a given iteration, a normalising flow is trained using the current live points. The trained flow maps the live points from the sampling space X to samples in the latent space Z. New samples are then drawn by sampling from a truncated latent distribution and applying the inverse mapping f^−1. Finally, rejection sampling is used to ensure that the samples are distributed according to the prior. The benefit of this approach is that all the samples are i.i.d., removing the need for MCMC sampling. Furthermore, since the points are drawn in parallel, the likelihood evaluation can also be parallelised, further reducing the time taken for the algorithm to converge.
However, we found that the rejection sampling step can be inefficient and lead to many samples being discarded. In particular, for the results we presented in Williams et al. [10], this rejection sampling accounted for approximately 40% of the total sampling time and, unlike the likelihood evaluation, this time cannot be significantly reduced via parallelisation. Additionally, we found it was necessary to reparameterise certain parameters that would otherwise be difficult to sample or make the rejection sampling inefficient. For example, parameters with posterior distributions that rail against the prior bounds could be under-sampled when the latent space is truncated. Whilst reparameterising these problematic parameters does address these issues, it requires prior knowledge of the parameter space.

Alternative formulations of nested sampling
In this section, we highlight alternative formulations of nested sampling that will be built upon in this work.

Diffusive nested sampling
Diffusive nested sampling [23] uses a multi-level exploration method where a mixture of constrained distributions is sampled from at each iteration using MCMC. The constrained distributions are added sequentially and each contains approximately e^−1 of the prior volume of the previous. In contrast to standard nested sampling approaches, all the samples from the MCMC chain are kept and those that do not meet the current likelihood criteria are added to the previous level. The values for the prior volume X are estimated using the fraction of samples above the likelihood threshold compared to the total number of samples. This variation of nested sampling avoids the strict likelihood constraint and utilises all the samples drawn at a given iteration, but still requires that new points be sampled from the prior.

Importance nested sampling
Importance nested sampling was proposed in Cameron and Pettitt [24] and expanded upon in Feroz et al. [25]. In this version of nested sampling, the evidence integral is approximated in terms of a pseudo-importance sampling density Q(θ), Ẑ = (1/N_Total) Σ_{i=1}^{N_Total} L(θ_i)π(θ_i)/Q(θ_i), where N_Total is the total number of nested samples. Posterior weights are then computed using p_i ∝ L(θ_i)π(θ_i)/Q(θ_i), and these can be used to obtain posterior samples via rejection sampling, or used directly in weighted histograms or kernel density estimates to approximate marginal distributions.
In standard importance sampling, the unbiased estimator for the variance of the evidence is given by σ²(Ẑ) = [1/(N_Total(N_Total − 1))] Σ_{i=1}^{N_Total} [L(θ_i)π(θ_i)/Q(θ_i) − Ẑ]²; however, this does not apply when using a pseudo-importance sampling density, which is the case in multinest [25].
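A minimal sketch of how these estimators translate into code is given below, assuming the log-likelihood, log-prior and log meta-proposal density have already been evaluated for each sample drawn from Q(θ); the function names are illustrative.

```python
import numpy as np

def evidence_and_error(log_l, log_prior, log_q):
    """Importance-sampling estimate of Z and its standard error.

    log_l, log_prior and log_q are arrays of the log-likelihood, log-prior and
    log meta-proposal density evaluated at samples drawn from Q(theta).
    """
    ratio = np.exp(log_l + log_prior - log_q)   # L(theta) pi(theta) / Q(theta)
    n = len(ratio)
    z_hat = np.mean(ratio)
    # Unbiased estimator of the variance of the sample mean
    var_z = np.sum((ratio - z_hat) ** 2) / (n * (n - 1))
    return z_hat, np.sqrt(var_z)

def posterior_weights(log_l, log_prior, log_q):
    # Normalised posterior weights p_i proportional to L pi / Q
    log_w = log_l + log_prior - log_q
    log_w -= np.max(log_w)                      # for numerical stability
    w = np.exp(log_w)
    return w / np.sum(w)
```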
In multinest [22,25], one or more ellipsoidal distributions are used to construct an approximation of the current likelihood contour defined by L*. New points are then drawn from within this proposal distribution and their likelihood evaluated until L(θ̃) > L* and, similarly to diffusive nested sampling, all of these points are used in the evidence summation and define the number of points within a level, n_i. The pseudo-importance sampling density for each point is given by Q(θ) = (1/N_tot) Σ_{i=1}^{N_iter} n_i E_i(θ)/V_tot,i, where V_tot,i is the volume of the bounding distribution, E_i is an indicator function that is 1 if the point lies within the i'th ellipsoidal decomposition and 0 otherwise, N_iter is the number of iterations, where an iteration is an instance of the ellipsoidal decomposition, and N_tot is the total number of points, N_tot = Σ_{i=1}^{N_iter} n_i. This formulation of the evidence removes the requirement that samples are distributed according to the likelihood-constrained prior so long as the exact distribution of nested samples Q(θ) can be written down. However, only a single point is removed and updated between each update of the ellipsoidal decomposition, therefore convergence will require computing the decomposition hundreds or thousands of times. This makes it ill-suited to use with normalising flows, which are, in comparison, slow to train.

Nested Sampling via Sequential Monte Carlo
Sequential Monte Carlo (SMC) is a general extension of importance sampling where random samples with corresponding weights are drawn from a sequence of probability densities such that they converge towards a target density [37]. These algorithms are typically comprised of three main steps: mutation, in which the samples are moved towards the target density via a Markov kernel; correction, where the weights of the samples are updated; and selection, where the samples are resampled according to their weights.
In Salomone et al. [38], the authors draw parallels between nested sampling and SMC and show that nested sampling is a type of adaptive SMC algorithm where weights are assigned suboptimally. They also highlight several limitations of the standard nested sampling algorithm, including the assumption of independent samples. They propose a new class of SMC algorithms called Nested Sampling via Sequential Monte Carlo (NS-SMC) and demonstrate that it is equivalent to nested sampling but addresses the aforementioned limitations. This formulation bears similarities to importance nested sampling [24,25] but removes batches of live points at each iteration and includes the mutation and selection steps that are typical in SMC.
A downside of this formulation is that, since the points are resampled at each iteration, some samples for which the likelihood has been evaluated are discarded and not used in the final evidence estimate or output. In this work, we aim to avoid this by not including the resampling step and instead directly using the weights of the samples when constructing the next level.

Core importance nested sampling algorithm
In this section, we motivate and present the core importance nested sampling algorithm used in nessai. We extend the formulation of importance nested sampling described in section 2.3.2 to allow the use of normalising flows instead of ellipsoidal bounding distributions. We also draw on the design of diffusive nested sampling, where the likelihood constraint is relaxed such that samples are not rejected based on their likelihood.
We start by considering the definition of the evidence from eq. (9). In importance nested sampling, the aim is to construct an importance sampling density Q(θ), which we will call the meta-proposal, from which samples can be drawn and used to estimate the evidence. The error on this estimate is given by eq. (11) and depends on the number of samples N_Total and on Q(θ). If we consider a fixed number of samples, the meta-proposal that maximises the effective sample size (ESS) of the set of summands L(θ_i)π(θ_i)/Q(θ_i), and therefore provides the most precise evidence estimate, will be Q(θ) ≡ L(θ)π(θ)/Z, i.e. when Q(θ) is equal to the target posterior. Since the evidence is unknown a priori, the aim is to construct the meta-proposal such that Q(θ) ≈ L(θ)π(θ)/Z. This formulation of nested sampling is closely related to Variational Inference [39], where the goal is to approximate a target probability density. In this case, the target density is L(θ)π(θ) and the approximate distribution is the meta-proposal Q(θ). The difference is in how the approximate distribution is obtained. In variational inference, the approximate distribution is optimised by minimising a variational objective, whereas in this algorithm the distribution is constructed by progressively sampling and adding proposal distributions.
We now consider how to construct the meta-proposal using normalising flows. An important difference between the ellipsoidal bounds used in multinest and normalising flows is the space over which they are defined. For a normalising flow, this depends on the domain of the latent distribution p_Z. For the typical case of an n-dimensional Gaussian, the mapping is defined such that f : R^n → R^n, so the flow will have infinite support. We need the meta-proposal to have the same support as the prior, so we include an additional invertible transform that maps from R^n to a bounded space, such as the sigmoid s(x) = [1 + exp(−x)]^−1. We denote the bounded space X and the unbounded space X′. Therefore, instead of considering a series of bounded distributions, we consider a set of N normalised proposal distributions (normalising flows) {q_1, ..., q_N}, all defined over the entire prior volume, with corresponding weights α_j defined such that Σ_{j=1}^{N} α_j = 1. The overall proposal density as a function of θ is given by Q(θ) = Σ_{j=1}^{N} α_j q_j(θ). In practice, in order to sample from Q(θ) we first draw a proposal index k ∈ {1, ..., N} from a categorical distribution with category weights {α_1, ..., α_N}, and then draw a sample from the sub-proposal q_k(θ).
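The construction, sampling and evaluation of such a mixture can be sketched as follows, with isotropic Gaussians standing in for the trained normalising flows; the specific widths and weights are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-ins for trained normalising flows: each sub-proposal q_j is an
# isotropic Gaussian defined over the whole (unbounded) space.
proposals = [stats.multivariate_normal(mean=[0, 0], cov=s**2) for s in (4.0, 2.0, 1.0)]
alphas = np.array([0.4, 0.3, 0.3])   # must sum to one

def sample_meta_proposal(n):
    # First draw which sub-proposal to use, then draw from it
    idx = rng.choice(len(proposals), size=n, p=alphas)
    return np.array([proposals[k].rvs(random_state=rng) for k in idx])

def log_meta_proposal(theta):
    # log Q(theta) = log sum_j alpha_j q_j(theta)
    log_q = np.stack([q.logpdf(theta) for q in proposals], axis=0)
    return np.logaddexp.reduce(np.log(alphas)[:, None] + log_q, axis=0)

samples = sample_meta_proposal(1000)
print(log_meta_proposal(samples[:5]))
```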
With this formulation, we can compute an estimate of the evidence for a set of samples drawn from Q(θ) using eq. (9) and, as noted in Feroz et al. [25], we no longer require new samples to have monotonically increasing likelihood values. Furthermore, as described in Salomone et al. [38], we do not require that new samples be i.i.d. or distributed according to the likelihood-constrained prior. This removes the need for the rejection sampling that was a bottleneck in the version of nessai we described in Williams et al. [10].
We now outline a simplified importance nested sampling algorithm, which we build upon in later sections. The main changes are to steps 2-5 of the standard nested sampling algorithm outlined in section 2. Instead of removing a point and finding a single replacement point, we construct a proposal distribution q_j(θ) based on the points sampled thus far and draw a set of N_j new points Θ_j = {θ_i}_{i=1}^{N_j} which are added to the overall set of points {Θ_1, ..., Θ_{j−1}}. The meta-proposal Q(θ) is then updated to include q_j(θ) and the evidence is updated. The new importance nested sampling algorithm therefore consists of the following steps:
(i) draw an initial set of points from the prior and compute their likelihoods,
(ii) construct a new proposal distribution q_j(θ) from the points sampled thus far,
(iii) draw N_j new points Θ_j = {θ_i}_{i=1}^{N_j} ∼ q_j(θ) and compute the corresponding likelihoods,
(iv) update the meta-proposal Q(θ) to include q_j(θ),
(v) compute the evidence and the corresponding error via eqs. (9) and (11),
(vi) repeat steps 2-5 until a stopping criterion is met,
(vii) redraw independent samples from the final meta-proposal,
(viii) compute the final evidence and posterior weights using the independent samples and eqs. (9) and (10).
This includes an additional step not present in standard nested sampling: redrawing independent samples from the final meta-proposal. Since subsequent proposals are constructed using samples from the previous iterations, new samples are not i.i.d. and eqs. (9) to (11) do not strictly apply. However, once the meta-proposal is finalised, i.i.d. samples can be drawn and used to compute unbiased estimates of the evidence and posterior weights.
The design of the algorithm hinges on how the next proposal distribution is added, how the number of samples drawn from each proposal (N_j) is determined and how the weights in the meta-proposal Q(θ) are determined. Note that the first proposal distribution q_0(θ) will typically be the prior. We now apply this simplified algorithm to a toy example.

Toy example
In this toy example, we consider a simple problem with an analytic evidence and posterior distribution. We apply the algorithm described in section 3 but with some simplifications. This allows us to validate the core algorithm.
We use a 2-dimensional Gaussian likelihood with mean µ_L = 0 and standard deviation σ_L = 1, and a Gaussian prior with mean µ_π = 0 and standard deviation σ_π = 2. The posterior distribution is therefore another Gaussian with mean µ_Post = 0 and standard deviation σ_Post = σ_L σ_π / √(σ_L² + σ_π²). The evidence is given by a Gaussian distribution with mean µ_π and standard deviation √(σ_L² + σ_π²) evaluated at µ_L, so Z_Analytic = 0.03183.
To make the comparison between the true and sampled posterior distributions easier, we express the posterior distribution in terms of the log-likelihood, p(ln L). To do this, we note that the posterior distribution defined in terms of the radius squared is p(r²) = χ²_2(r²/σ²_Post)/σ²_Post, where χ²_2 is a chi-squared distribution with two degrees of freedom. The distribution p(ln L) then follows from a change of variables from r² to ln L, where r² = −2σ_L² [ln L + ln(2πσ_L²)], which is defined on [0, ∞) since the maximum possible value of the log-likelihood is ln L = −ln(2πσ_L²). The four steps we must define for the simplified algorithm are: how to construct each proposal distribution, how to determine the number of samples to draw from each proposal, how to determine the weights for each proposal in the meta-proposal, and a stopping criterion. For the proposals, instead of normalising flows, we use 2-dimensional Gaussian distributions q_j(θ) with mean zero and different standard deviations. We determine the standard deviation of each proposal by setting a likelihood threshold L_t such that 50% of the points from the previous iteration are discarded and then computing the standard deviation of the remaining points. We set the number of samples drawn from each proposal to a constant, N_j = N_live = 500, and set the weights for the meta-proposal α_j to be equal. This means that each proposal will contribute equally to the meta-proposal. Finally, instead of using a stopping criterion, we define a fixed number of proposal distributions (iterations) N = 4, where the first is the prior distribution q_0(θ) ≡ π(θ). This is akin to fixing the number of iterations in a normal nested sampling algorithm. Once the final proposal has been added, we draw i.i.d. samples from the finalised meta-proposal and compute the final unbiased evidence estimate and posterior weights.
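A minimal numerical sketch of this toy problem is given below. It uses four equally weighted Gaussian sub-proposals with hand-picked widths rather than the 50% discard rule, purely to show that the importance-sampling evidence estimate recovers the analytic value Z = 0.03183.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

sigma_l, sigma_pi = 1.0, 2.0
likelihood = stats.multivariate_normal(mean=[0, 0], cov=sigma_l**2)
prior = stats.multivariate_normal(mean=[0, 0], cov=sigma_pi**2)
z_analytic = stats.multivariate_normal(mean=[0, 0], cov=sigma_l**2 + sigma_pi**2).pdf([0, 0])

# Four equally weighted sub-proposals: the prior plus three progressively
# narrower Gaussians (illustrative widths, not those chosen by the 50% rule).
widths = [sigma_pi, 1.5, 1.2, 1.0]
proposals = [stats.multivariate_normal(mean=[0, 0], cov=w**2) for w in widths]
alphas = np.full(len(proposals), 1 / len(proposals))

n_per_proposal = 500
samples = np.vstack([q.rvs(size=n_per_proposal, random_state=rng) for q in proposals])

# log Q(theta) for the equally weighted mixture
log_q = np.logaddexp.reduce(
    np.stack([np.log(a) + q.logpdf(samples) for a, q in zip(alphas, proposals)]), axis=0
)
ratio = np.exp(likelihood.logpdf(samples) + prior.logpdf(samples) - log_q)
z_hat = ratio.mean()
z_err = ratio.std(ddof=1) / np.sqrt(len(ratio))
print(f"Z_hat = {z_hat:.5f} +/- {z_err:.5f} (analytic {z_analytic:.5f})")
```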
We present the results obtained with this algorithm in fig. 1. This shows the samples and the 1-σ contours for each of the proposal distributions, along with the corresponding distribution of log-likelihoods. We compute two evidence estimates: one with the initial samples, which are not i.i.d., Ẑ = 0.03177 ± 0.00042, and the other with the final i.i.d. samples, Ẑ = 0.03191 ± 0.00042. We find that both are in agreement with the analytic value, Z = 0.03183, but, as we will see in section 6.1, the initial estimate will be biased; the bias is simply very small in this simple example. This demonstrates that the underlying algorithm can reliably estimate the evidence. We also compute the posterior weights using eq. (10) and plot the weighted histogram in log-likelihood space, which shows good agreement with the analytic expression from eq. (14). Overall, these results demonstrate the principles of the proposed algorithm and show that, for a simple toy example, it converges to the expected result.

Method
Having outlined the underlying algorithm, we now describe each of the steps in the complete algorithm in detail.

Constructing proposal distributions
With this formulation of nested sampling, the main design choice is how to construct the proposal distribution q_j(θ) at each iteration (step 2). This is akin to drawing new samples in standard nested sampling; however, since we no longer require an ordered sequence of points with decreasing prior volume, new points no longer need strictly increasing likelihood values.
The new proposal q_j(θ) at each iteration is defined in terms of a likelihood threshold L_t: of the current N_live points, M_j are discarded based on the likelihood threshold and the remaining N_live − M_j points are used to construct the next proposal distribution q_j(θ). In our implementation, this is done by training a normalising flow. The result is a series of increasingly dense proposal distributions, which is equivalent to the distributions becoming narrower in the log-likelihood space. This is shown in fig. 1.
We therefore require a method for determining the likelihood threshold L_t that is used to decide how many points will be discarded before constructing the next proposal distribution. We consider two methods, both of which use weights that quantify the relative importance of each sample θ_i compared to the prior. Additionally, one could include the likelihood in the weights; however, we leave this for future work.
In the first method, the threshold L_t is determined using the (1 − ρ) quantile of the likelihood values of the samples from the previous iteration, where ρ is set by the user. To account for the non-prior-distributed samples used in our algorithm, we use a weighted quantile, where the weights are given by eq. (16). This method is based on the standard method used in SMC [38] and diffusive nested sampling [23], but with the addition of the weighted quantile.
The second method we consider is closely related to the first but uses log-weights log w_i instead of w_i. We consider the normalised cumulative sum of log w_i for the set of N_live points ordered by increasing likelihood, λ(M) = Σ_{i=1}^{M} log w_i / Σ_{i=1}^{N_live} log w_i, where M is the number of live points to be discarded. We then determine the value of M at which λ(M) ≥ ρ, for ρ ∈ [0, 1], and set L_t ≡ L(θ_M). This is analogous to shrinking the log-prior volume by a factor ρ at each iteration whilst also accounting for the different weights of each sample. In practice, since the normalising flows have support over the entire prior volume, this results in increasing the entropy of q_j(θ). We therefore denote this as the entropy-based method to distinguish it from the quantile-based method.
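A simplified sketch of the entropy-based level construction follows; it assumes per-sample log-weights log w_i have already been computed and that λ(M) is well enough behaved for the first crossing of ρ to be well defined.

```python
import numpy as np

def entropy_based_threshold(log_l, log_w, rho=0.5, min_remove=1, max_remove=None):
    """Pick the likelihood threshold L_t for the next level.

    log_l: log-likelihoods of the current live points.
    log_w: log-weights of the same points.
    rho:   target fraction of the normalised cumulative log-weight to discard.
    """
    order = np.argsort(log_l)            # order points by increasing likelihood
    cum = np.cumsum(log_w[order])
    lam = cum / cum[-1]                  # lambda(M), normalised cumulative sum
    m = int(np.argmax(lam >= rho)) + 1   # first M with lambda(M) >= rho
    if max_remove is not None:
        m = min(m, max_remove)           # keep enough points to train the next flow
    m = max(m, min_remove)
    return log_l[order][m - 1], m        # log L_t and number of points removed
```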
For both methods, we employ a maximum number of live points that can be removed; this prevents the remaining live points from being too few to robustly train the next normalising flow. This maximum, together with the value of ρ, will determine the total number of samples used in the algorithm. We also employ a minimum number of samples to ensure a minimum change in the distribution of training data between subsequent proposals. We discuss the advantages and disadvantages of both methods in Appendix B.

Training normalising flows with weights
As discussed in section 2.3.3, it is common practice in SMC to resample at each iteration prior to the mutation step. Different sampling methods can be used, but they all keep the total number of samples constant by including repeated samples. This works when the mutation step is a Markov kernel, but in this work we use a normalising flow to perform the equivalent of the mutation step and, when training a normalising flow, duplicates in the training data can be problematic. In extreme cases, where only a few samples are representative, the training data could contain tens of copies of the same sample, which will make training unstable.
Without a step that is equivalent to resampling, deficiencies in training can have a cumulative effect. For example, if the mapping learnt by the normalising flow q_j(θ) under-samples a region of the space compared to the target, and another normalising flow q_{j+1}(θ) is trained with samples drawn using q_j(θ), then q_{j+1}(θ) will also under-sample the same region. To counteract this effect, we include weights in the approximation of the Kullback-Leibler divergence (KL divergence) used to train the normalising flow. We describe this in detail in Appendix A. To train the j-th flow, we use all samples from the current meta-proposal Q_{j−1}(θ) that satisfy the likelihood constraint L(θ) > L_t and then minimise the weighted negative log-likelihood, −Σ_i w_i ln q_j(θ_i), where q_j(θ) is given by eq. (8) and w_i are the weights for each sample. In principle, these weights could include the likelihood; however, in this work we use the weights given by eq. (16), which are proportional to the ratio of the likelihood-constrained prior and the likelihood-constrained meta-proposal.
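Keeping only the terms that depend on the flow parameters, the weighted objective reduces to a weighted negative log-likelihood of the training samples under the flow; a minimal sketch, written as a pure function of the flow's log-density values and the weights, is:

```python
import numpy as np

def weighted_flow_loss(log_q_train, weights):
    """Weighted Monte Carlo estimate of the training objective.

    log_q_train: log q_j(theta_i) evaluated on the training samples.
    weights:     importance weights w_i for the same samples; only the terms
                 that depend on the flow parameters are kept, so minimising
                 this is equivalent to minimising the weighted KL divergence.
    """
    weights = np.asarray(weights, dtype=float)
    return -np.sum(weights * log_q_train) / np.sum(weights)
```

In practice, log q_j(θ_i) is evaluated by the normalising flow itself and this loss is minimised with stochastic gradient descent over the flow parameters.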

Drawing samples from the proposal distributions
At a given iteration j, once the normalising flow q_j(θ) has been trained (step 2), we sample from the flow (step 3) and evaluate the likelihood for each new sample. This involves sampling from the latent distribution p_Z(z) and then applying the inverse flow mapping f^−1 to obtain samples in X′. These samples must then be mapped back to the original space X, where the likelihood can be computed.
The number of samples drawn at a given iteration, N_j, should be determined by drawing from a multinomial distribution with N possible outcomes (the number of proposal distributions) and N_Total = Σ_{j=1}^{N} N_j trials; however, the weights for each outcome are not known prior to sampling. Instead, we set N_j and determine the weight for the current iteration α_j based on its value. We allow N_j either to be equal to the number of samples removed at that iteration (M_j) or to be kept constant (N_j = N_live). The former will maintain a fixed number of live points N_live throughout the run, whereas the latter allows N_live to vary. We discuss the consequences of this approximation in sections 4.4 and 4.7.
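The sampling step can be sketched as follows. The inverse flow here is a placeholder identity map (so its log-Jacobian is zero), purely to show how latent samples are pushed through the inverse flow and the sigmoid, and how the two Jacobian terms enter the proposal density q_j in the bounded space; a further linear rescaling from (0, 1)^n to the prior bounds is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

def inverse_flow(z):
    # Placeholder for the trained flow's inverse mapping f^{-1}: Z -> X'.
    # The identity is used purely so the sketch runs end to end; its
    # log |det Jacobian| is therefore zero for every sample.
    return z, np.zeros(len(z))

def to_bounded(x_prime):
    # Element-wise sigmoid maps the unbounded space X' to (0, 1)^n
    x = 1.0 / (1.0 + np.exp(-x_prime))
    # log |det dx/dx'| for the element-wise sigmoid
    log_j = np.sum(np.log(x) + np.log1p(-x), axis=-1)
    return x, log_j

def sample_proposal(n, dims):
    z = rng.standard_normal((n, dims))
    log_p_z = -0.5 * np.sum(z**2, axis=-1) - 0.5 * dims * np.log(2 * np.pi)
    x_prime, log_j_flow = inverse_flow(z)
    x, log_j_sigmoid = to_bounded(x_prime)
    # log q_j(x) in the bounded space X, accounting for both Jacobians
    log_q = log_p_z - log_j_flow - log_j_sigmoid
    return x, log_q
```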
Similarly to diffusive nested sampling, all the samples are kept irrespective of their likelihood, which means that samples can "leak" below the current likelihood threshold.

Updating the meta-proposal
Having drawn samples from the current proposal distribution, the meta-proposal Q(θ) must be updated. The overall form of Q(θ) will depend on the weights α_j that are assigned to each proposal. Whilst adding proposals, we approximate the weights as α_j ∝ N_j and normalise them such that they sum to one. This approximation can be corrected for once the sampling has been terminated by fixing the weights to their values from sampling, recomputing each N_j by sampling from a multinomial distribution with weights {α_0, ..., α_N}, and drawing new samples from each q_j(θ) according to N_j. However, in practice, we find the error introduced by this approximation to be significantly smaller than the overall error of the estimated evidence.

Stopping criterion
We define the stopping criterion to be the ratio of the evidence contained in the live points to the current evidence estimate, Ẑ_LP/Ẑ, where Ẑ_LP is computed using eq. (9) but including only the live points in the sum. The algorithm terminates when this ratio is less than a user-defined threshold τ. This is more suitable than the fractional change in the evidence between iterations, which is used in standard nested sampling algorithms, because multiple points are removed simultaneously at each iteration, the number of points can vary between iterations, and points can leak below the current L_t, all of which mean the fractional change does not decrease smoothly and can instead fluctuate significantly between iterations.
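A minimal sketch of this criterion, assuming the ratios L(θ)π(θ)/Q(θ) are stored in log-space for the live points and for all samples, is:

```python
import numpy as np

def should_stop(log_ratio_live, log_ratio_all, tau=0.01):
    """Ratio-based stopping criterion Z_LP / Z_hat < tau.

    log_ratio_live: log[L(theta) pi(theta) / Q(theta)] for the current live points.
    log_ratio_all:  the same quantity for all samples drawn so far.
    Both evidence estimates share the same 1/N_Total factor, so it cancels here.
    """
    log_z_live = np.logaddexp.reduce(log_ratio_live)
    log_z_hat = np.logaddexp.reduce(log_ratio_all)
    return (log_z_live - log_z_hat) < np.log(tau)
```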

Posterior samples
Similarly to SMC and multinest, our algorithm returns samples {θ_i}_{i=1}^{N_Total} and their corresponding posterior weights p_i given by eq. (10). Different methods can then be employed to draw posterior samples. The standard approach in nested sampling is to use rejection sampling [10] or multinomial resampling [28] to resample the nested samples using the posterior weights. Alternatively, the weights can be used directly in weighted histograms or kernel density estimates.
When using multinomial resampling or the weights directly, the posterior samples are not statistically independent, so it is informative to compute Kish's ESS [40], ESS = (Σ_i p_i)² / Σ_i p_i², where p_i is given by eq. (10). This gives an indication of the effective number of samples in the posterior and allows for comparing results obtained via different sampling methods. It can also be used to diagnose poorly converged runs, since a low ESS is an indication that the samples and their corresponding weights are a poor match for the true posterior distribution.
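For reference, Kish's ESS is computed directly from the posterior weights:

```python
import numpy as np

def kish_ess(posterior_weights):
    # Kish's effective sample size: (sum w)^2 / sum(w^2)
    w = np.asarray(posterior_weights, dtype=float)
    return np.sum(w) ** 2 / np.sum(w ** 2)
```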

Post-processing
Once sampling is complete, we correct for the approximation of the meta-proposal Q(θ) discussed in section 4.4 by redrawing N_Final samples from the meta-proposal according to the draws from the multinomial distribution. The number of samples can be equal to N_Total or can be increased or decreased depending on the desired output. This has the additional benefit of allowing more samples to be drawn after sampling has completed, which can be used to obtain more posterior samples or to decrease the estimated error on the evidence.
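A sketch of this redrawing step is given below; the proposal objects are assumed to expose an rvs(size) sampling method (for example, trained flows wrapped to sample in the bounded space), which is an assumed interface rather than the package's actual API.

```python
import numpy as np

rng = np.random.default_rng(3)

def redraw(proposals, alphas, n_final):
    """Redraw i.i.d. samples from the finalised meta-proposal.

    proposals: list of objects with an .rvs(size) method (assumed interface).
    alphas:    final, normalised meta-proposal weights.
    n_final:   total number of samples to draw.
    """
    counts = rng.multinomial(n_final, alphas)   # N_j for each proposal
    return np.vstack([q.rvs(size=int(n)) for q, n in zip(proposals, counts) if n > 0])
```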

Complete algorithm
We can now combine all these elements into a complete algorithm, which is shown in alg. 1. The algorithm incorporates normalising flows but no longer requires that samples drawn from them be i.i.d. according to the prior. Furthermore, samples are drawn and their likelihoods evaluated in batches, and all the samples are kept irrespective of their likelihood. Finally, the evidence is a simple sum, so it can be updated for batches of samples. Thus, this algorithm meets all the criteria that were initially set out.

Algorithm 1: Overview of i-nessai
Input: Likelihood L, Prior π, Tolerance τ, Method for determining N_j, N_Final
Output: Evidence Ẑ, samples {Θ_1, ..., Θ_j} and posterior weights W
Final step: redraw N_Final samples from the final meta-proposal and compute the final evidence estimate and posterior weights.

Biases
In our algorithm, the proposal distributions (normalising flows) are trained and then sampled from, rather than being constructed post sampling. This means that, unlike in multinest, the meta-proposal distribution is an importance sampling density and eq. (11) should give a reliable estimate of the evidence error. We verify this in section 6.1.
We also note that a different bias in the evidence arises from evaluating each normalising flow with samples that were also used to train it. This is necessary since the meta-proposal requires evaluating each normalising flow on every sample. This is a side effect of the small amount of training data available to each flow and the difficulty in setting the hyperparameters for N different normalising flows prior to sampling. This bias is corrected for when the samples are redrawn, as described in section 4.7, which we demonstrate in section 6.

Related work
As described in section 2, the proposed method draws from existing variations of nested sampling: the soft likelihood constraint from diffusive nested sampling [23], the formulation of importance nested sampling used in multinest [25] and the use of normalising flows as described in Williams et al. [10] and Moss [9]. However, it also has parallels to standard importance sampling and the methods derived from it.
Considering the use of a sequence of normalising flows to approximate a target (or posterior) distribution, the most closely related works are Nested Variational Inference [41], Annealed Flow Transport Monte Carlo [42] and Preconditioned Monte Carlo [43]. The first is a hybrid between Variational Inference and SMC where a series of parameterised distributions are simultaneously optimised using an annealed version of the target distribution. In the latter two works, the standard SMC algorithm is modified to include an additional step that uses a normalising flow. Additionally, in Karamanis et al. [43] the authors apply their algorithm to gravitational-wave inference; however, only a single simulated event is analysed rather than a set of events.
As with any stochastic sampling algorithm for Bayesian inference, this work can also be compared to simulation-based or likelihood-free inference [33], where the posterior distribution is approximated using repeated simulations of the data instead of evaluating the likelihood. This technique has been applied to data analysis in physics and astrophysics, including but not limited to gravitational-wave data analysis [44-47], cosmology [48,49] and particle physics [50]. The approach used in these methods involves training on a dataset that is representative of the entire parameter space and then being able to perform inference for any given point in that space. This is the opposite of the approach employed in this work, where the algorithm is general purpose and is not trained for a specific task but instead is trained on the fly, removing the need for expensive initial training at the cost of being slower when performing inference.

Results
We present results obtained using the algorithm described in section 4.8 on a range of problems. We implement the algorithm in the nessai software package, which is available at [51]. To distinguish it from the version of nessai described in Williams et al. [10], we will refer to it as i-nessai.
We run all our experiments using normalising flows based on RealNVP [36], as we find that more complex flows, such as Neural Spline Flows [52], over-fit to the small amount of data available ‡ and, compared to the other components of the algorithm, are too computationally expensive to justify using. Furthermore, i-nessai requires storing the normalising flow for each level, so using a flow with more parameters can significantly increase the memory footprint of the algorithm.
We start with a series of tests using analytic likelihoods, followed by a test using a more challenging likelihood, and compare these results to those obtained with nessai. We then apply i-nessai to two different gravitational-wave analyses. Finally, we investigate parallelisation of the algorithm and how it scales with the number of live points.

Figure 2. Mean estimated log-evidence before (blue cross) and after (orange dot) the resampling step described in section 4.7 for an n-dimensional Gaussian and Gaussian Mixture. The error-bars show the mean estimated error for the log-evidence. The estimated evidence has been rescaled using the true value such that the distributions of log-evidences should be centred around zero. The number of samples drawn during the resampling step is set such that it is equal to the number of samples accumulated during the initial sampling.
For all experiments, we use the entropy-based method for constructing each proposal distribution described in section 4.1 with ρ = 0.5. We discuss this choice in Appendix B. We also set the number of samples per flow to a constant, N_j = N_live. Code to reproduce all the experiments is available at https://doi.org/10.5281/zenodo.8124198 [53].

Validation using analytic likelihoods
We start by validating i-nessai using likelihoods for which the evidence can be computed analytically in n dimensions. We choose to analyse the simple case of an n-dimensional Gaussian. For a more complex case, we employ the n-dimensional M-component Gaussian mixture likelihood described and used in Moss [9] and Higson et al. [21], L(θ) = Σ_{m=1}^{M} W^(m) Π_{k=1}^{n} (2πσ^(m)²)^{−1/2} exp[−(θ_k − µ^(m))²/(2σ^(m)²)], where µ^(m) and σ^(m) are the mean and standard deviation of each component in all dimensions and Σ_{m=1}^{M} W^(m) = 1. We use the same hyperparameters as in [9,21], with σ^(m) = 1 ∀ m ∈ {1, ..., M}.
For both likelihoods, we consider n = {2, 4, 8, 16, 32} and use uniform priors on [−10, 10]^n. The analytical log-evidence for both models is ln Z = −n ln 20. We analyse each likelihood 50 times, including redrawing the samples as described in section 4.7, and examine the distribution of the log-evidence estimates and the corresponding estimated errors. In fig. 2, we include the result of redrawing the samples and recomputing the final log-evidence estimate. This shows that without redrawing the samples there is a bias in the estimated log-evidence; however, this bias is small compared to the value of the log-evidence: for example, for the 32-dimensional Gaussian and Gaussian Mixture the true log-evidence is -95.86 and the average biases are 0.6% and 0.9% respectively. After redrawing the samples, i-nessai reliably estimates the evidence for both models for all values of n. We also compare the distribution of the re-computed log-evidences with the expected distribution computed using eq. (11) in Appendix C and observe that the estimated log-evidence errors agree with the observed distributions.

Figure 3. Comparison of results produced using nessai (orange) and i-nessai (blue) when applied to the n-dimensional Gaussian, Gaussian Mixture and Rosenbrock likelihoods as described in sections 6.1 and 6.3. From top to bottom, results are shown for the final estimated log-evidence rescaled by a reference evidence (the true value for the Gaussian and Gaussian Mixture and the mean value obtained with i-nessai for the Rosenbrock), the estimated log-evidence error, the total number of likelihood evaluations, the total wall time in seconds and the ESS of the posterior distribution. Results are averaged over 50 runs with different random seeds for both samplers and the error bars show the standard deviation.

Comparison with standard nested sampling
We now compare i-nessai with standard nested sampling, in particular the standard version of nessai. This allows us to verify the results obtained with i-nessai, and to compare the observed and estimated evidences and evidence errors, the number of likelihood evaluations, the wall time and the ESS of the posterior distribution. We repeat the analyses described in section 6.1 using nessai and present the results for both likelihoods in fig. 3.
Figure 3 shows that i-nessai produces estimates of the log-evidence for the Gaussian and Gaussian Mixture that are consistent with nessai but have significantly lower variances, and the corresponding estimates of the error are correspondingly smaller. We explore how the error on the log-evidence estimate scales in section 6.7. Furthermore, fig. 3 shows that i-nessai requires a comparable number of likelihood evaluations in lower dimensions but more than an order of magnitude fewer in higher dimensions, and a similar trend is seen with the wall time. However, this behaviour is highly dependent on the user-defined settings, which in these experiments were set based on the requirements for the high-dimensional analyses. The ESS of the posterior distribution highlights a notable difference between the two samplers; with nessai the ESS increases as the number of dimensions increases for both likelihoods, whereas with i-nessai, for the Gaussian Mixture likelihood, it decreases in higher dimensions but is still of order 10^4. Since in importance nested sampling the ESS depends on how well the meta-proposal approximates the likelihood times the prior, a lower ESS indicates a "worse" approximation. In contrast, in standard nested sampling, and therefore nessai, the ESS does not depend on the convergence of the sampler and an under- or over-constrained result can still have a large ESS.

Testing on more challenging likelihoods
To further test i-nessai, we consider the n-dimensional Rosenbrock likelihood [54], which has highly correlated parameters and is recognised as a challenging function to sample. We use the more involved variant [55,56], where the log-likelihood is defined as ln L(θ) = −Σ_{i=1}^{n−1} [100(θ_{i+1} − θ_i²)² + (1 − θ_i)²], with a uniform prior on [−5, 5]^n. We test for n = {2, 4, 8} and run i-nessai 50 times for each n. Above n = 2 there is no analytical solution for the log-evidence of the Rosenbrock likelihood, so we compare results to those obtained with nessai. We present these results in fig. 3. We observe that i-nessai is consistent with nessai for n = 2, but for n = {4, 8} it predicts a lower evidence than nessai; however, the relative difference is less than 1%. The number of likelihood evaluations and wall times are comparable between both samplers, but i-nessai has a larger ESS for n = {2, 4} and a lower one for n = 8.
To better understand these differences, we inspect the results obtained with nessai and find that the insertion indices [10,57] are consistent with the results being over-constrained (see Appendix D). This corresponds to the log-evidence being marginally over-estimated, which agrees with the differences in estimated log-evidence observed in fig. 3.

Probability-probability test with binary black hole signals
As a more practical test for i-nessai, we repeat the analysis used to validate nessai in Williams et al. [10], where we used bilby [4] and nessai to analyse simulated signals from compact binary coalescence of binary black holes injected into 4 seconds of data sampled at 2048 Hz in a three-detector network. For this analysis, we use the same priors (described in Appendix C of Williams et al. [10]) and enable phase, distance and time marginalisation in the likelihood. This reduces the parameter space to 12 parameters. We analyse 64 injections simulated from the same priors and produce a probability-probability (P-P) plot and corresponding p-values using bilby. This analysis includes the resampling step described in section 4.7 and we re-draw the same number of samples that were used in the initial sampling, doubling the number of likelihood evaluations. The probability-probability plot is presented in fig. 4 with individual and combined p-values. The combined p-value is 0.3798, which demonstrates that i-nessai reliably recovers all 12 parameters. Furthermore, these results are obtained without introducing any of the reparameterisations used in standard nessai [10] to handle, for example, angles and spin magnitudes.
In fig. 5, we show the sampling time and the number of likelihood evaluations required to reach convergence. The median number of likelihood evaluations is 6.5 × 10^5 and the median wall time is 119 minutes. We also include results obtained using nessai and dynesty [28]§, which has been used extensively for gravitational-wave inference [15-17,58]. Probability-probability plots for both samplers are shown in Appendix E. We observe that the median reductions in the number of likelihood evaluations are 2.68 and 13.3 for nessai and dynesty respectively. These equate to reductions in the total wall time of 4.2 and 17.2 times.

Binary neutron star analysis with reduced order quadrature bases
We simulate the signal from a binary neutron star merger similar to GW190425 [59] at a distance of 45 Mpc using IMRPhenomPv2_NRTidalv2 [60] and inject it into 80 seconds of simulated noise from a two-detector network with aLIGO noise spectral density sensitivity [61] sampled at 8192 Hz. The resulting signal has an optimal network SNR of 30.12.
To analyse the signal, we use IMRPhenomPv2 [62-64] with a Reduced-Order-Quadrature (ROQ) basis [65] to reduce the cost of evaluating the likelihood∥. We also limit the analysis to assume aligned spins and use the low-spin prior described in Abbott et al. [59]. We run the analysis using i-nessai, nessai and dynesty. We repeat each analysis with four different random seeds and combine the posterior distributions for each seed into a single distribution. We use 16 cores for each analysis to decrease the overall wall time. The settings for i-nessai are tuned to ensure that the effective number of posterior samples is comparable to the other samplers.
§ We use dynesty version 1.0.1 with the custom random walk implementation included in bilby version 1.2.1 [4,58].
∥ We use the ROQ data available at https://git.ligo.org/lscsoft/ROQ_data.
In fig. 6, we show how the meta-proposal evolves as more proposal distributions (normalising flows) are added over the course of sampling. This shows how the proposals converge around the parameters of the injected signal, which correspond to the region with the highest log-likelihood.
To quantify the differences between the results, we compute the Jensen-Shannon divergence (JS divergence) between the marginal posterior distributions for each parameter, as described in Romero-Shaw et al. [58]. We use the threshold described in Ashton and Talbot [66] to determine whether the JS divergences indicate significant statistical differences between the results. We find that all the divergences are below the threshold, except for the in-plane spin χ_1, for which i-nessai and nessai agree but dynesty marginally disagrees with both. We include the complete set of JS divergences in Appendix F and a corner plot comparing the distributions in Appendix G. We also compare the total number of likelihood evaluations and wall time for each sampler in table 1. From these results we see that, on average, i-nessai requires 1.4 and 42.5 times fewer likelihood evaluations than nessai and dynesty respectively.

Parallelisation
As mentioned previously, the formulation of nested sampling used in this work does not have the same serial limitations as standard nested sampling. The algorithm we present is designed around drawing new samples and evaluating their likelihood in parallel. This leverages the inherently parallelised nature of the normalising flows. However, the process of training subsequent proposals to add to the meta-proposal is still a serial process.

Figure 6. Evolution of the proposal distributions q_i(θ) included in the meta-proposal when performing inference on the binary neutron star injection described in section 6.5. Brighter colours indicate later iterations in the algorithm. Left: the 90% contours for each of the proposal distributions in the chirp mass-mass ratio space. Only a small region of the parameter space around the highest likelihood is shown. The cross-hair indicates the injected value. Right: the distribution of log-likelihoods for each of the proposal distributions.
In standard nessai, the costs of rejection sampling and training set an upper limit on the reduction in wall time that can be achieved by parallelising the likelihood evaluation. However, the total cost of training typically accounted for less than 8% of the total wall time [10]. In i-nessai, the rejection sampling step is no longer necessary, so training is now the main limiting factor and the potential reduction in wall time is far greater. In fig. 7, we present results showing how the wall time decreases for an increasing number of cores for one of the binary black hole injections used in section 6.4. This shows how, initially, the wall time is dominated by the cost of evaluating the likelihood, but as more cores are added the inherent cost of sampling, which includes training the flows and drawing new samples, becomes the dominant cost. However, in this example, it only accounts for 13% of the total wall time when running on a single core.

Algorithm scaling
In i-nessai, the number of live points has a different function to that in a typical nested sampler since, in combination with the method used to determine new levels, it determines how many points are removed at an iteration and how many remain to train the normalising flow. We previously noted that, for nessai, 2000 points were needed for reliable results [10]. We now test i-nessai with different values of N_live and set the number of samples per flow to N_j = N_live. We evaluate the scaling of i-nessai as a function of N_live and present the results in fig. 8 for a 16-dimensional Gaussian likelihood sampled with N_live = {100, 500, 1000, 2000, 4000, 6000, 8000, 10000}. The estimated log-evidence is consistent with the true value for all values of N_live, and both the observed and estimated standard deviations decrease as N_live increases, which is consistent with eqs. (9) and (11). We observe that the number of likelihood evaluations scales approximately linearly with the number of live points. This contrasts with the wall time which, for a 100 times increase in the number of live points, only increases by ∼22 times. This is the result of using a likelihood that has a low computational cost, so the cost of running the sampler is dominated by the operations related to the normalising flow: training, drawing new samples and computing the meta-proposal probability as given by eq. (13). In practice, most likelihoods will have a higher computational cost and the wall time will scale approximately linearly with N_live.

Figure 8. Scaling of i-nessai as a function of the number of live points N_live for a 16-dimensional Gaussian likelihood, as described in section 6.1. Results are averaged over 10 runs and the error-bars show the observed standard deviation. From top to bottom, the results show the mean estimated log-evidence rescaled by the true value, the mean estimated standard deviation for the log-evidence, the total number of likelihood evaluations, the total wall time and the ESS of the posterior distribution as defined in eq. (20).

Discussion and conclusions
In this work, we present an importance sampling-based nested sampling algorithm, i-nessai, that builds on existing work [23,25,38] to incorporate normalising flows and overcome the main bottlenecks in nessai described in Williams et al. [10]. The resulting algorithm is a hybrid between standard nested sampling and SMC, where normalising flows are successively trained and added to an overall meta-proposal that describes the distribution of samples.
We demonstrate that i-nessai reliably estimates the log-evidence and associated error for Gaussian and Gaussian Mixture likelihoods in up to 32 dimensions. When we compare these results to those obtained with standard nessai, we observe that i-nessai converges significantly faster and requires fewer overall likelihood evaluations. Furthermore, the observed variance in the estimated log-evidence is consistently less than for nessai. This demonstrates that i-nessai produces consistent evidence estimates at a fraction of the computational cost while also being more precise.
We perform inference on 64 simulated gravitational-wave signals from binary black hole coalescence using i-nessai and show that it passes a probability-probability test (fig. 4), which indicates that it produces unbiased estimates of the system parameters. Furthermore, these results are obtained without introducing problem-specific reparameterisations. Similarly to the analytic likelihoods, we compare these results to those obtained with nessai and dynesty and observe a median reduction in the number of likelihood evaluations of 2.68 and 13.3 times respectively, which equates to a 4.2 and 17.2 times reduction in the total wall time.
To further demonstrate the advantages of i-nessai compared to standard samplers, we perform inference on a simulated GW190425-like binary neutron star merger using ROQ bases [65] and aligned low-spin priors. The inference completes in just 24 minutes, 2.4 and 15.5 times faster than nessai and dynesty respectively, while also producing consistent posterior distributions and requiring only 1.01 × 10^6 likelihood evaluations compared to 1.42 × 10^6 and 4.30 × 10^7 respectively.
We also show how the likelihood evaluation can be parallelised in i-nessai and find that, once the cost of evaluating the likelihood becomes negligible, training the normalising flows and drawing new samples are the main limiting factors. This is in contrast to nessai, where performing rejection sampling is the main limiting factor, accounting for approximately 40% of the time when running on a single core. In i-nessai, training and drawing new samples account for significantly less of the total time. It therefore has improved scaling with respect to the number of cores compared to nessai, as shown in fig. 7.
A downside of this approach compared to nessai is that the order-statistics-based tests proposed in Fowlie et al. [57] and included in nessai are no longer applicable, since we no longer require points to be distributed according to the likelihood-constrained prior. It is therefore harder to identify under- or over-constraining in i-nessai. The ESS (eq. (20)) can be used to diagnose issues during sampling; however, it is not always a reliable diagnostic.
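For reference, a common definition of the ESS for a set of unnormalised importance weights is ESS = (Σ_i w_i)² / Σ_i w_i²; assuming eq. (20) takes this form, a minimal sketch of the computation from posterior log-weights is:

```python
import numpy as np

def effective_sample_size(log_weights):
    # Kish effective sample size computed from unnormalised log-weights:
    # ESS = (sum_i w_i)^2 / sum_i w_i^2, with a shift for numerical stability.
    log_weights = np.asarray(log_weights) - np.max(log_weights)
    w = np.exp(log_weights)
    return np.sum(w) ** 2 / np.sum(w**2)

# Nearly uniform weights give an ESS close to the number of samples,
# whereas a few dominant weights give a much smaller ESS.
rng = np.random.default_rng(0)
print(effective_sample_size(rng.normal(0.0, 0.1, size=1000)))  # close to 1000
print(effective_sample_size(rng.normal(0.0, 5.0, size=1000)))  # much smaller
```

A low ESS signals that a handful of samples dominate the posterior weights, but a high ESS does not by itself guarantee that the sampler has explored all of the posterior mass, which is why it is not always a reliable diagnostic.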
In future work we will consider alternative methods for constructing the meta-proposal which do not rely on discarded samples, for example using only the weights in eq. (18), and we will explore optimising the meta-proposal weights after sampling. We will also explore applications of i-nessai to more complete gravitational-wave analyses, such as those described in [15-17], which include calibration uncertainties and waveforms with higher-order modes. Another possible application to explore is model comparison: typically, if we want to obtain a posterior distribution for a different prior than that used for the sampling, the existing posterior samples must be re-weighted using the alternative prior. However, the formulation of nested sampling used in this work would allow the prior to be changed post-sampling and the evidence recomputed by updating eq. (4), so long as the new prior does not extend beyond the boundaries of the prior used during the initial sampling.
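To make the last point concrete, the sketch below assumes the evidence takes the usual importance-sampling form Ẑ = (1/N) Σ_i L(θ_i) π(θ_i)/Q(θ_i), which is presumably the form of eq. (4). Swapping in a new prior density then only changes the π(θ_i) term; the samples, likelihood values and meta-proposal densities are reused. The numerical values below are arbitrary placeholders.

```python
import numpy as np
from scipy.special import logsumexp

def log_evidence(log_l, log_prior, log_meta_proposal):
    # Importance-sampling evidence estimate using the meta-proposal Q:
    # Z ~ (1/N) * sum_i L(theta_i) * pi(theta_i) / Q(theta_i).
    log_l = np.asarray(log_l)
    return logsumexp(log_l + log_prior - log_meta_proposal) - np.log(len(log_l))

# Toy example with three samples (placeholder values). Recomputing the
# evidence for a new prior is only valid if the new prior does not place
# mass outside the support of the original prior.
log_l = np.array([-1.0, -2.0, -0.5])
log_prior_old = np.full(3, -3.0)
log_prior_new = np.full(3, -2.5)
log_q = np.array([-2.0, -1.5, -1.8])
print(log_evidence(log_l, log_prior_old, log_q))
print(log_evidence(log_l, log_prior_new, log_q))  # same samples, new prior
```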
In summary, we have introduced an importance nested sampling algorithm, i-nessai, that leverages normalising flows and addresses the bottlenecks in nessai [10].
We have demonstrated that i-nessai produces results that are consistent with standard nested sampling for a range of problems, whilst requiring up to an order of magnitude fewer likelihood evaluations and having improved scalability. Similarly to nessai, i-nessai is a drop-in replacement for existing samplers, meaning it can easily be used to accelerate existing analyses.

Figure 1. Results for the toy example described in section 3.1. Top: the final samples are shown in grey, the solid lines show the 1-σ contour for each proposal distribution starting with the prior, lighter colours indicate later iterations. The orange dashed line shows the 1-σ contour for the analytic posterior distribution. Bottom left: distribution of log-likelihoods for the final samples drawn from each proposal distribution. Bottom right: distribution of the log-likelihoods of the final samples weighted by their corresponding posterior weights. The orange dashed line indicates the analytic posterior distribution computed using eq. (14).

Figure 3. Comparison of results produced using nessai (orange) and i-nessai (blue) when applied to the n-dimensional Gaussian, Gaussian Mixture and Rosenbrock likelihoods as described in sections 6.1 and 6.3. From top to bottom, results are shown for the final estimated log-evidence rescaled by a reference evidence (the true value for the Gaussian and Gaussian Mixture and the mean value obtained with i-nessai for the Rosenbrock), the estimated log-evidence error, the total number of likelihood evaluations, the total wall time in seconds and the ESS of the posterior distribution. Results are averaged over 50 runs with different random seeds for both samplers and the error bars show the standard deviation.

Figure 4. Probability-probability plot for 64 simulated signals from binary black hole coalescence analysed using i-nessai. The shaded regions indicate the 1-, 2- and 3-σ confidence intervals. Individual p-values are shown for each parameter and the combined p-value is also shown.

Figure 5. Total sampling time versus number of likelihood evaluations for i-nessai (blue dots), nessai (orange crosses) and dynesty (green plus signs) for the 64 binary black hole injections described in section 6.4.

Figure 7. Comparison of the wall time spent training the normalising flows and evaluating the likelihood in nessai and i-nessai as a function of the number of cores. Results are shown for one of the binary black hole injections described in section 6.4 and are averaged over four runs.

Figure 8. Scaling of i-nessai as a function of the number of live points N_live for a 16-dimensional Gaussian likelihood, as described in section 6.1. Results are averaged over 10 runs and the error bars show the observed standard deviation. From top to bottom, the results show the mean estimated log-evidence rescaled by the true value, the mean estimated standard deviation of the log-evidence, the total number of likelihood evaluations, the total wall time and the ESS of the posterior distribution as defined in eq. (20).
Figure E1. Probability-probability (P-P) plot showing the confidence interval versus the fraction of events within that confidence interval for the posterior distributions obtained using nessai and dynesty for 64 simulated compact binary coalescence signals produced with bilby and bilby_pipe. The 1-, 2- and 3-σ confidence intervals are indicated by the shaded regions; p-values are shown for each of the parameters and the combined p-value is also shown.

Figure G1. Posterior distributions for the GW190425-like injection described in section 6.5. Results are shown for dynesty in green, nessai in orange and i-nessai in blue. The 1-σ confidence intervals for each parameter are shown in the marginal histograms.

Table 1. Total likelihood evaluations, wall time in minutes and ESS of the posterior distribution for the binary neutron star analysis with ROQs, as described in section 6.5, for dynesty, nessai and i-nessai. Results are averaged over four runs and the mean and standard deviations are quoted. All analyses were run with 16 cores.