Host group degeneracy in gravitational lensing time delay determination of $H_0$

Massive elliptical galaxies, that serve as lenses in gravitational lensing time delay measurements of the Hubble parameter $H_0$, often reside in a host group. We consider degeneracies in the modeling of the group halo. When the group effect on imaging can be summarized by its flexion (the next order term beyond shear in the tidal expansion), the posterior likelihood map can develop disjoint local minima, associated with an approximate discrete symmetry of a dominant flexion term. Monte-Carlo Markov Chain (MCMC) algorithms that are not designed to explore a rich posterior landscape can miss some of the minima, introducing systematic bias. We study mock data and demonstrate that the bias in $H_0$ can exceed $10\%$, and pulls the inference value of $H_0$ above its truth value, for a reason that can be traced to the structure of a mismodeled flexion term. MCMC algorithms that are designed to cope with a rich posterior landscape can uncover the structure. If the group is X-ray bright enough, X-ray data may also help to resolve the degeneracy, by pinpointing the group's center of mass. Finally, we show that some implementations in the literature used an inaccurate kinematical prior, mis-modeling the group velocity dispersion by as much as $20\%$

A core feature is an approximate mass-sheet degeneracy (MSD) [15]. It could be an intrinsic characteristic of the lens galaxy itself, on distances of dozens of kpc [26]. However, the effect could also come from larger scales. On intermediate scales, in between cosmology and lens internal structure, it is noteworthy that massive galaxies like the lenses of [6][7][8][9][10][11] are often members of a group 1 , and the dark matter halo of a host group may act to some extent as a core-MSD. Ref. [29,30] studied the impact of host and line of sight (LOS) groups on lensing systems. Among the systems considered, PG1115+080, RXJ1131-1231, HE0435-1223, and WFI2033-4723 featured in the H 0 campaign of [11] 2 . We show these systems in Fig. 1. Interestingly, PG1115+080 yields a high central value of H 0 (81 +8 −7 km/s/Mpc) [11]; at the same time, this lens resides in a massive group, inducing Lensing-inferred H0 (from [11], with "H0(CMB)" taken from [23]) vs. estimated LOS convergence (from [30]).
convergence κ g ≈ 0.2 [30]. The group convergence goes directly into the inference of H 0 , via δH 0 /H truth 0 ≈ −δκ g , where δκ g = κ truth g − κ model g is any error in the model determination of κ g , and δH 0 = H truth 0 − H model 0 . We do not know if the lensing results for PG1115+080 are indeed biased by its group modeling (and will not make such a claim in this paper); but clearly, it is important to understand to what accuracy can lensing analyses determine κ g .
In this paper we explore the group effect. The outline and main results are as follows. In Sec. II we consider simple analytic estimates. We show that when the distance separating the group's centroid from the primary lens is large in comparison to the primary lens's Einstein angle, attempts to model the group halo effects based on imaging data suffer from an approximate MSD, where the mass sheet comes from the group halo itself. This version of the MSD persists even when second-order tidal effects (flexion terms [32,33]) are clearly detectable. Kinematics and astrometry of tracer galaxies are needed to break the degeneracy.
In Sec. III we perform numerical mock data analysis, motivated by a realistic example. We show that the posterior likelihood of the lensing reconstruction problem exhibits three disjoint minima. These minima reflect an approximate discrete degeneracy, related to the transformation properties of a dominant flexion term under coordinate rotations around the primary lens. We show that a naive implementation of a commonly used Monte-Carlo Markov Chain (MCMC) algorithm tends to fall into one of the minima, missing the others. Interestingly, and dangerously, the best fit H 0 obtained in any one of the wrong minima is systematically biased high, for a reason that we explain in Sec. IV. The size of the H 0 bias can reach ∼ 10%. An MCMC implementation that is especially designed to probe a rich likelihood landscape, uncovers the full degeneracy structure.
We summarize in Sec. V. Many details are kept to appendices, including relevant calculations that exist in the literature (for completeness of some of our arguments), and also a few results that we did not see elsewhere. In App. A we spell out some details of the lensing potential of a Navarro-Frenk-White (NFW) profile [34]. In App. B we review the relation between group member velocity dispersion and halo model. We note that some lensing studies implemented an incorrect kinematics prior for the NFW group model. Depending on selection cuts and some other factors, the systematic error in the interpretation of group velocity dispersion can reach ∼ 20%. In App. C we give a rough estimate of the probability that an MCMC will actually fall into a wrong minimum; for the example of PG1115+080 [11] this probability is not large, on the order of ∼ 10%. In App. D we attempt a (very) crude estimate of the fraction of strong galaxy lensing systems where flexion degeneracy may be expected to become a concern, finding this fraction to be in the ballpark of ∼ 10% − 30%. In App. E we collect expanded versions of MCMC triangle plots along with some sanity checks of our analysis.
We should comment that throughout the analysis, we do not consider stellar kinematic measurements of the primary lens itself (see [35,36] for state-of-the-art). Primary lens kinematics can constrain the group-induced MSD, if it can reach a sensitivity for over-all scaling of the primary lens mass model at the level of 1 − δκ g .

II. HOST GROUP AS A CORE-MSD
In this section we provide simple analytic estimates that clarify the group effect on the lensing problem. The discussion is useful in understanding features of the numerical analysis of the next section.
In systems like those considered in [11], the Einstein angle of the primary lens is of the order of θ E ∼ 1 ′′ , and most of the imaging information lies at angular separation θ ∼ θ E around the lens. This angular scale projects onto a physical separation of the order of ≲ 10 kpc at the redshifts z l ∼ 0.1 − 1 of typical lenses. In comparison, a typical separation of any galaxy (with the possible exception of the brightest group galaxy (BGG)) from the host group's center of mass is ≳ 100 kpc, that is, angular distance h ≳ 10 ′′ . Therefore, for a rough estimate of the impact of a group on imaging analyses, it is sensible to expand the group's lensing potential in powers of θ/h.
We adopt complex notation for 2D angle vectors on the sky [37,38], defining, e.g., etc. We set the origin of coordinates at the center of the primary lens. With this formalism, the lensing equation can be written as Here, α l (θ) is the deflection angle due to the primary lens, ∆α(θ) is the deflection due to the host group or cluster, and β is the source position. The external convergence and shear, κ ext and γ ext , contain a combination of different line of sight (LOS) contributions 3 .
In what follows, when we refer to a host group, we consider the group's central dark matter halo, rather than individual member galaxies. For a group located at center of mass position e iϕ h, with h ≫ |θ|, ∆α can be expanded as a power series in θ [38] (see App. A for more details): In the case of an axisymmetric group halo profile, it is possible to decompose the expansion coefficients as where ∆β 0 , ∆γ 0 , F 0 , G 0 are real numbers that depend on h but not on ϕ. Note that ∆κ is independent on ϕ.
Axisymmetry is mildly broken in realistic elliptic profiles; we discuss this point in App. A 4. However, as long as the ellipticity is small, the ϕ dependence of ∆β 0 , ∆γ 0 , F 0 , G 0 is weak, and does not affect our main point.
Of the expansion coefficients in Eq. (3), ∆β is degenerate with β, which is a free parameter in the modeling, and has no effect on time delays. ∆κ is degenerate with κ ext , a free parameter. 4 Varied along the MSD, the combination ∆κ + κ ext is invisible to imaging, but affects time delays. ∆γ is degenerate with γ ext , a free parameter. Thus, as far as imaging data is concerned, the leading order effect that constrains the group comes from the MSD-invariant combinations F/(1−∆κ−κ ext ) and G/(1−∆κ−κ ext ), related to what is known in the literature as reduced flexion [32,33].
An attempt to constrain the group model via imaging data, suffers from the following three main difficulties. The first obvious point is that the flexion terms are small; in the 1/h expansion, the flexion terms are parametrically suppressed as F, G ∝ ∆κ/h. The second point is that direct modeling of the group halo can still leave room for a residual MSD: a model to capture correctly the imaging distortion produced by the flexion, while mismodeling the convergence. The third point, which may be the most important in practice, is that the posterior likelihood of the group halo model exhibits a discrete approximate degeneracy, related to the 2π/3 phase degeneracy of the G term (see Eq. (4)).
It is useful to study a simple example. Consider a group halo described by an isotropic power-law (PL) density profile with 3D slope γ g and Einstein angle θ Eg . In this case, the coefficients of Eqs. (3) and (4) are given by For γ g ≈ 2 (singular isothermal sphere (SIS)), the line of sight velocity dispersion (LOSVD) in the center of the group is related to the group's Einstein radius via 5 We can therefore estimate the convergence, and to the flexion terms, (10) For example, ∆κ ≈ 0.1 produced by a SIS group with d ls /d s = 0.5, σ los = 400 km/s, and h = 50 ′′ , would cause (if not modeled) a ∼ 10% bias in the inference of H 0 , while the imaging distortion produced by the group's flexion field at θ ∼ 1 ′′ would only amount to δθ ≈ 0.0075 ′′ , typically dominated by the G term. Next order terms in the expansion (beyond the flexion) are further suppressed by ∼ 1 ′′ /h ∼ O(1/10), and can be neglected.
This discussion highlights the obvious hierarchy between flexion and convergence, but as was mentioned earlier, there is also MSD. In Eqs. (5)(6), the leading effect of the MSD (at small ∆κ ≪ 1) is seen by noticing that even if one can fix F and G exactly from imaging data, this still allows ∆κ to vary freely, as long as h is varied simultaneously with ∆κ ∝ h. Thus, unless we have good external prior data on the group's center of mass (parameterised by h), determining the flexion from imaging data alone does not fix the convergence. (More precisely, the degeneracy is somewhat regulated by the fact that the true MSD invariant quantities are F/(1 − ∆κ − κ ext ) and G/(1 − ∆κ − κ ext ), rather than F and G themselves. The exact MSD is then captured via the freedom to adjust ∆κ and h while keeping Finally, another important point is the phase degeneracy of the G term. This degeneracy means that models of the halo in which the direction to the group's center is changed by ϕ → ϕ ± 2π/3 yield identical G terms. Although the rotated models do not reproduce the truth value of the F term, the G degeneracy can produce isolated local minima in the posterior likelihood, that can trap unweary MCMC chains. Interestingly, the offset F term in the "wrong" minima causes the model to pull towards a biased estimate of the group's convergence, thereby biasing H 0 . This issue is seen to be an important point in the next section.

III. ILLUSTRATION WITH MOCK DATA
The host group is often modeled explicitly if the system is known to reside in a group (see e.g. [11,29,30,39]). We now study such modeling using mock data. Our implementation is based on the package lenstronomy [40][41][42].

A. Mock setup description
We chose PG1115+080 as a reference object to guide our study. Ref. [11] inferred H 0 = 81 +8 −7 km/s/Mpc from this system, quite high compared to the CMB value [23]. At the same time, the system is known to reside in a group [30] with σ los = 390 +50 −60 km/s (an earlier study found σ los = 440 +90 −80 km/s [39]), estimated to induce κ g ∼ 0.2. The group's projected center of mass is not far from the primary lens (h of the order of 10 ′′ ). This suggests that flexion terms are probably not negligible for this system, distorting the image on scales ∆θ ∼ κ g θ 2 E /h ∼ 0.01 ′′ . An image of the field is shown in Fig. 2.
We consider the following mock setup. For the primary lens, we consider an elliptic power-law density profile with 3D slope γ = 2.17, Einstein angle θ E = 1.08 ′′ , and ellipticity parameters e 1 = −0.2, e 2 = 0.05, corresponding to q = (1 − e)/(1 + e) ≈ 0.66, where e = e 2 1 + e 2 2 . For the group, we consider an elliptic NFW profile [43,44] with e 1 = −0.07, e 2 = 0.03, compatible with the findings of [45]. Fig. 3 illustrates the setup, including the truth position of the group halo center of mass. This setup could mimic PG1115+080 (Fig. 2) if the BGG -or the X-ray blob found by [46] -happens to indicate the group's center of mass position.
Following [9], we choose our inference pipeline to include only a simple spherical group model, ignoring group halo ellipticity in the modeling.
B. External priors: tracer galaxy kinematics and theoretical input from N-body simulations As in [9], we include external priors on the group halo, dictated by cosmological N-body simulations and by kinematics data. We defer most of the details to App. B, notably Secs. B 2 and B 3. However, we would like to point out a potential inaccuracy in the kinematics analyses of some previous works.  [47]). The primary lens, GL, is marked by a cross, surrounded by the quasar images. White ellipse shows the 68% CL prior derived in [29] and used in [9] to constrain the group's center of mass. Green and blue ellipses give a rough illustration of the X-ray emission associated to the group, found in [46] and [48], respectively, using different subtraction schemes for the quasar emission. Galaxy groups are often assumed to follow the NFW density profile [34], In terms of the parameters ρ 0 and R s , the LOSVD of tracer galaxies (with number density assumed to follow the same profile as the dark matter density) can be expressed as where f (a) is a dimensionless function derived in Sec. B 2. While Eq. (12) (averaged as needed within some aperture cut of the observations) gives the correct translation between LOSVD data and the NFW model parameters, Ref. [9] (after [49,50]) considered a different expression as a proxy for the LOSVD data: where R vir and M vir = M (R vir ) are the virial radius and virial mass, and c vir = R vir /R s is the NFW concentration parameter.In App. B we show that identifying the quantitȳ σ 2 with the observable σ 2 los introduces an error of up to ∼ 20% (the precise error depends on the analysis aperture).
One reason for introducing the auxiliary quantities R vir and M vir , is that N-body simulations provide theoreticallymotivated priors that are often presented in terms of these quantities [51,52]. These cosmological priors are, however, quantitatively and conceptually decoupled from the kinematics data interpretation. Instead, as we review in App. B, the cosmological priors dictate a certain redshift-dependent relation between the NFW parameters ρ 0 and R s . There is no obstacle to implement this theoretical prior while still maintaining the correct kinematical expression, Eq. (12).
In our main implementation of the MCMC, we set the standard deviation for σ los to 120 km/s. This doubles the nominal uncertainty quoted by [30] for the group of PG1115+080, but we believe that such cautionary procedure is reasonable. To be clear, we are not suggesting to doubt the observational LOSVD from [30]. Rather, the uncertainties we worry about concern the theoretical interpretation within simplified halo models. The systematic error due to usingσ instead of σ los , as done in [9], is of the order of δσ 2 los /σ 2 los ∼ 20%, so δσ los /σ los ∼ 10%, or δσ los ∼ 40 km/s, quite comparable to the "bare" observational uncertainty. There are additional plausible errors: the group is likely to be aspherical [45], the velocity distribution need not be isotropic, and the group may not be fully virialised. Each of these could cause systematic shifts of tens of percent in the kinematics interpretation.
For completeness, in App. E we check how a LOSVD uncertainty of 60 km/s changes the results. We find the difference to be quantitatively insignificant for our main results.

C. Results
First, to obtain a global view of the posterior "landscape", we run the zeus MCMC algorithm [53,54], which is designed to cope with multiple likelihood minima. The local minima could trap an MCMC, if the scanning algorithm is not suited to probe multimodal posteriors. To see this, we repeat the analysis, this time using the emcee algorithm [55]. The results are shown in the top-right and bottom panels of Fig. 4. Indeed, emcee tends to discover only one of the local minima, missing the others. In the top-right and bottom-right panels, emcee converges on a biased group halo position. The biased local minima yield an H 0 posterior that is on the high side, pulling H 0 above its truth value by ∼ 3σ and ∼ 1.8σ, respectively. The convergence pattern associated to the three minima is shown in Fig. 5.
Performing more MCMC runs, we find that emcee consistently converges into just one of the minima, missing the others. The choice of the minimum found by emcee mostly depends on the initial position of the MCMC walkers in the c nfw x − c nfw y parameter space. In our runs, the initial walker allocation is guided by a Gaussian prior with a similar 68%CL radius as that used in [9] (see Fig. 2), and with centers shown by star symbols in Fig. 5.
After performing a series of numerical trials we emphasize that, at least for the given, rather broad (but realistic) width of the prior, the starting point of the MCMC walkers appears to be a more significant factor in determining which of the three minima traps the chain, than the prior center itself. These considerations coincide if the MCMC walkers are initiated at the prior center.
Given the above discussion, we can make a rough estimate of the probability of the MCMC to fall into a displaced minimum, by matching it with the probability of the group center prior to be nearer a false minimum than the truth one.
In Sec. C we estimate this probability from mock realizations of samples of tracer galaxies. The result depends on the analyses aperture, the number of galaxies in a sample, and the underlying group profile. For 13 members in the NFW profile, with an aperture of ∼ 10R s (i.e., around 3 virial radii), the false minimum probability we find is ∼ 10%.
We comment that X-ray data may help to pinpoint the centroid of a massive group, resolving the threefold degeneracy. For example, Ref. [46] found an X-ray blob that appears to be centered around the BGG of the host group of PG1115+080 (green contour in Fig. 2). If the X-ray emission can be associated with the group's centroid, it may provide a narrower prior than that derived from tracer galaxies. The X-ray analysis may be complicated by blending with the lensed quasar: using a different method to mask the quasar, Ref. [48] found a shifted, more extended, and brighter group emission. Even so, it seems plausible that X-ray data could help narrowing the group's centroid prior 6 .
It is natural to ask whether the discrete modeling bias tends to come along with an anomalously large external shear estimate. While this could indeed happen, and might serve as a useful "alarm bell" if the H 0 bias is larger than solutions with acceptable values of external shear. For example, Fig. 11 shows a false minimum solution with ∼15% bias in H 0 but with external shear values that are compatible with those found in [9] for PG1115+080 (see Figs. 7,8 there).
Finally, we emphasize again that PG1115+080 was selected to guide our mock specifically because of its massive, near-by group association. This set-up, while we believe it deserves study on its own right, may be an un-representative outlier among lensed quasar systems, and it is therefore natural to ask just how uncommon it is. A detailed analysis of the fraction of systems that may exhibit flexion degeneracy is beyond the scope of this work, but in App. D we attempt a crude estimate using the data from [30]. Our results suggest that detectable flexion distortion from LOS groups (including, not limited to, the host of the primary lens) may affect ∼ 10% − 30% of lensed quasar systems.

IV. ORIGIN OF H0 BIAS IN A DISPLACED GROUP
As we explained, the three-fold approximate degeneracy manifest in Fig. 4 is due to the behavior under rotations of the G flexion. The F flexion, however, behaves like a vector. Hence, when the inference falls into a wrong minimum, it attempts to minimize the difference between the truth F deflection and the wrong-minimum inference value of F . This can be achieved by keeping G eff the same, but reducing are the reduced flexion terms (for clarity, here we omit the cosmological term κ ext ). To see this point, denote the reduced flexion of the inference model by F eff , and denote the truth flexion by F (truth) eff . 7 We expect the MCMC to minimize with n = ±1 selecting the position of the false minimum. Since cos(±2π/3) = −1/2, the above expression is minimized for F eff = 0.
Consider the PL model of Sec. II. In this model, we have F eff /G eff = (γ g − 3)/(γ g + 1). Therefore, for 1 < γ g < 3 7 The truth and model values of the G term are assumed to approxi- (the range of interest), decreasing |F eff | at fixed G eff entails increasing γ g , while adjusting the other parameters of the model so as to keep G eff constant. Those other parameters were introduced in Eqs. (5)(6) as h and θ Eg , but we can equally well replace θ Eg by ∆κ. Now, we have G eff = − γ 2 g −1 3−γg ∆κ h(1−∆κ) . The γ g -dependent factor, γ 2 g −1 3−γg , increases with increasing γ g ; to compensate for this and keep G eff constant, the factor ∆κ h(1−∆κ) needs to decrease. For small ∆κ ≪ 1, this means that near any one of the displaced likelihood minima, the MCMC will attempt to decrease ∆κ/h in comparison to its truth value. Part of this adjustment entails decreasing the model value of ∆κ, which therefore biases H 0 high, as ∆H 0 /H 0 ≈ − ∆κ (truth) − ∆κ (model) .
In App. A we show that a similar analysis holds also for the NFW model, used in the MCMC implementation: also in that case, falling into a displaced group minimum causes the fit to underestimate of ∆κ, leading to an overestimate of H 0 . This analysis clarifies the trend seen in Fig. 4.

V. SUMMARY
Lens galaxies in quasar lensing time delay measurements are often members of galaxy groups, that must be modeled for an accurate determination of H 0 . The group modeling exhibits approximate versions of the MSD (Sec. II). Essentially, it is a displaced-center version of the core-MSD considered in [19,26].
At leading order in the tidal approximation, the group halo enters imaging through the flexion. We showed an approximate threefold discrete modeling degeneracy, associated with rotating the assumed position of the group centroid by an angle of 2π/3 around the primary lens (Secs. II and III). This produces a posterior likelihood with three disjoint minima. MCMC algorithms that fail to expose this structure may fall into a displaced minimum. The inferred value of H 0 found in a displaced minimum is systematically biased high (Sec. IV). Using numerical mock data experiments motivated by a realistic system, we demonstrated that the H 0 bias can reach ∼ 10%.
The choice of the minimum detected by the MCMC can strongly depend on the starting position of the walkers in the space of group centroid coordinates. If the starting point is chosen as the centroid prior center, then the probability for the MCMC to land in a displaced minimum may be rather small. For a sample of 13 tracer galaxies (relevant for PG1115+080) with an aperture of about 3 virial radii in an NFW halo, this probability is ∼ 10%.
Our analysis suggests the following recommendations.
1. Bayesian cosmography analyses should explore the full posterior likelihood landscape. Awareness of this possible three disjoint minima structure, if not already there, is needed.
2. X-ray data may help to pinpoint the centroid of a massive group, resolving the degeneracy.
3. As an aside, we note that some cosmography analyses (e.g. [9]) used an incorrect kinematics prior to constrain the group model. The error in the interpretation of tracer galaxy velocity dispersion depends on the analysis aperture, and can reach ∼ 20%. A correct version of the kinematics prior is reviewed in App. B.
The number of strong lensing time delay systems is expected to increase by more than an order of magnitude in the near future [57][58][59], an important step towards possibly reaching a few percent lensing determination of H 0 [60]. This program could be further assisted by many resources [61][62][63][64][65]. Our study highlights some pitfalls (and suggests solutions) that need to be taken into account if the precision goal for H 0 should also be accurate.

ACKNOWLEDGMENTS
We thank Simon Birrer, Marko Simonović, and Raphael Flauger for useful discussions. This work made use of the following public software packages: lenstronomy [40,42,66], emcee [55], zeus [53,54], corner [67], astropy [68,69]. This research has made use of the SIMBAD database, operated at CDS, Strasbourg, France [47]. This work was supported by the Israel Science Foundation grant 1784/20, and by MINERVA grant 714123. LT wishes to acknowledge association with the International Helmholtz-Weizmann Research School for Multimessenger Astronomy.

Appendix A: Deflection angle expansion, NFW profile
We start with some general preliminaries; for the discussion of the NFW profile, the reader can skip to Eq. (11).
For our purpose, which is to analyze the lensing equation in the vicinity of a particular galaxy member of the group, it is convenient to use a coordinate system that is centered on the primary lens galaxy, and displaced from the group center of mass by a separation angle ⃗ h. In these coordinates the group lensing potential reads We can expand with respect to the small parameters θ 1 /h, θ 2 /h, Recall that the lensing potential and the deflection angle are related as ∇Ψ = ⃗ α; in particular, we can write the lens equation as As discussed on the main text, in complex notation ⃗ h → he iϕ . The lensing potential Ψ in this formalism is real. One can obtain the complex deflection angle by means of the derivative operator In the expansion of the lensing potential, it is easy to see that where the derivatives are computed at the origin. Analogously, we can express third order derivatives of the lensing potential as derivatives of the shear, where G, F are the flexion terms.
In the axisymmetric case, a group halo center at ϕ = 0 and fixed h would yield the same expansion coefficients of a halo located at a generic ϕ, granted that we rotate our coordinate system accordingly. We can pass to cylindrical coordinates 8 and write In complex notation, we thus have Notice the minus sign, which is there due to our choice of coordinates; it is easy to see that which has opposite sign with respect to the usual choice of cylindrical coordinate. We can thus write so that We now specify to the NFW model. Here we discuss the spherical model without ellipticity, commenting on ellipticity App. A 4. Using a coordinate system centered on the group, the NFW lensing potential is [70] Ψ NFW (θ) = 2κ θ 2 s log 2 θ 2θ s − arccosh 2 θ s θ , Defining x = θ s /h, we have: Note that applying these expressions for x < 1 requires using the logarithmic definition of arccosh(z) = ln z + √ z 2 − 1 and allowing complex z.
. These functions are shown in Fig. 6.
We can use these expressions to analyze the modeling constraints obtained from imaging data. As we already remarked in Sec. II, of the expansion terms, ∆β 0 is exactly degenerate with the unknown source position; ∆κ is degenerate with external convergence, and can be absorbed by the MSD (changing the inference of H 0 ); ∆γ 0 is degenerate with external shear; and so, only F 0 and G 0 can produce useful modeling constraints. However, since the axisymmetric NFW model contains three free parameters κ, h, x = θ s /h, even a perfect measurement of the F 0 and G 0 terms still leaves a degeneracy.

MSD
Consider the usual MSD transformation, induced by a parameter λ: (with matching transformation on the other terms, that are not relevant here). The combinations F eff and G eff of Eq. (14) are MSD-invariant, and are the quantities that constrain the model parameters. Suppose then that we are given precise determination of F eff and G eff from the imaging. In this case, the ratio F eff /G eff determines x . Having fixed x, there remains a degeneracy in ∆κ, that we can express as In Eq. (A25), we think of G eff and x as fixed by the imaging data, while the model parameter h is free to vary (up to possible external priors, discussed in the main text). For |h G eff | ≪ 1, the parameter regime of most interest for us, we have ∆κ ≈ h G eff f κ (x)/f G (x). Namely, an imaging determination of the flexion terms G eff and F eff cannot determine ∆κ, which is almost directly degenerate with a change in the model parameter h while holding x = θ s /h andκ/h fixed.
In terms of the original NFW model parameters, the ∆κ degeneracy maps to adjusting R s ∝ h while holding ρ 0 fixed. External priors are needed to break this degeneracy. A prior on h, the cluster center of mass position, obviously ameliorates it. A prior on the group velocity dispersion also ameliorates it, since σ 2 los ∝ ρ 0 R 2 s . The important point is that the external priors are crucial: without them, the MSD associated with an NFW group is not broken by the imaging data, even with explicit modeling of the group.

G-term degeneracy
The threefold G-term degeneracy can also be clarified using the analytic expansion. To this end, it is useful to replace the model parameterκ by ∆κ. As we have seen in Sec. IV, an MCMC trapped near a displaced likelihood minimum will attempt to reduce |F eff | while keeping G eff fixed. This amounts to flowing towards x that minimizes |F eff /G eff | = |f F (x)/f G (x)|, while adjusting ∆κ and h so as to keep G eff = (f G /f κ )(∆κ/h)/(1 − ∆κ)=Const. As can be seen from Fig. 6, minimizing |f F (x)/f G (x)| pulls the fit towards smaller x. In turn, this pulls |f G (x)/f κ (x)| to a larger value, meaning that in order to compensate and keep G eff constant, the combination (∆κ/h) / (1 − ∆κ) is pulled to a smaller value. For ∆κ ≪ 1, this tends to make the fit pull towards a model in which ∆κ is smaller than its truth value, causing a positive upwards bias in H 0 .

Time delays
Time delays between image A and B can be written as (see e.g. [71,72]) (A26) One can find the time delay from the complex lens equation, by noticing where the integral is a definite integral. Notice that the integral misses a possible function of θ alone; this function can be recovered by imposing τ = τ * . Focusing on τ A , in complex notation we can write (A28) In real notation,

Effect of ellipticity
We can implement ellipticity in the NFW profile by using pseudo-elliptical NFW lens models [73,74], which is a reliable approximation as long as the ellipticity is not too large. This approximation, implemented in lenstronomy, uses the spherical NFW lensing potential, computed in elliptical coordinates, Notice that under rotations of the coordinate system, ϵ is invariant but φ changes. The expansion of the potential Ψ halo,ϵ follows Eq. (A2); however, since the potential is not axisymmetric, it is not possible to factor out the ϕ dependence on ∆γ ϵ , G ϵ , F ϵ as we did in Eq. (4). Nevertheless, it is possible to express these quantities as functions of ∆γ 0 , F 0 and G 0 . Notice that we can write (here, ⃗ θ = (x, y) ⊤ ) As an example, consider where the quantities without ϵ subscript refer to the spherical functions computed at the elliptic coordinate. By defining the displacement vector in elliptical coordinates in the complex notation, ⃗ h ↔ h ϵ e iϕe , we can express For the NFW example, γ 0 is the expression in Eq. (A20). For κ, one has A similar reasoning follows for F ϵ , G ϵ . As one would expect, ellipticity modifies the effective flexion and convergence by correction terms of order ϵ. Our numerical MCMC analysis includes the full effect, and moderate or small ellipticity has only a minor effect on the results.

Appendix B: Kinematics constraints for host group
Here we review constraints on a host group, obtainable by measurements of a sample of galaxy members. We only consider spherical systems with an isotropic velocity distribution. We start with a PL mass and tracer galaxy distribution, and go on to consider the NFW profile. For clarity, our discussion repeats a number of statements from the main text, completing these with details and derivations.

Power law
We start with the PL profile, ρ(r) = ρ 0 (r/R 0 ) −γ . The surface density of this profile is resulting with the deflection angle and convergence, where The circular velocity is The circular velocity is not directly measurable. What is measurable is the LOSVD [75], where n * and s * are the galaxy number density and surface density, respectively. Eq. (B5) holds for an arbitrary isotropic profile, while Eq. (B6) applies to the PL. If the galaxy number density is distributed similarly to the mass density in the group, n * (r) ∝ (r/R 0 ) −γ , we find (valid for 3 2 < γ < 5 2 ): For the SIS (γ = 2), θ E = 8π 2 d ls ds Gρ 0 R 2 0 and σ 2 los = ds d ls 4π is sometimes adopted by lensing analyses. However, the expressions for σ 2 los depend on a series of simplifying assumptions, including: PL mass distribution; same PL galaxy number distribution; virial state (following Jeans equation); spherical symmetry; isotropic velocity distribution; and to satisfy σ 2 los = ds d ls θE 4π , the specific case of the SIS. We do not expect those assumptions to hold precisely for individual galaxy groups. Lensing analyses would be prudent to assign a larger systematic uncertainty than the "bare" observational uncertainty on σ 2 los . To explore one of those effects, consider varying γ in the relation between σ 2 los and θ E . For non-SIS systems, σ 2 los (θ) depends on θ, and a relevant observable is the brightnessweighted average of σ 2 los in some aperture θ A . We can compare such averaged LOSVD to the SIS relation: We illustrate Eq. (B9) in Fig. 7.

NFW
For the NFW profile, assuming again that the tracer galaxy density follows the mass density, Eq. (B5) can be written as: Some of these integrals can be done in closed form, but that is not particularly illuminating. The function f (θ/θ s ) is shown in Fig. 8. Assuming the NFW model, a kinematics prior should use Eq. (B10) (aperture-averaged as needed) to translate a LOSVD measurement into a constraint on the combination of ρ 0 and R s appearing in the equation. However, analyses in the literature took a somewhat different route. Ref. [9] (after [49,50]) considered the following expression as a proxy for the LOSVD of the NFW profile: where R vir = c vir R s , M vir = M (R vir ), and c vir is the NFW concentration parameter, for which it is possible to extract a theoretical prediction from N-body simulations [52]. Namely, instead of using the physical relation, Eq. (B10), to convert the measured LOSVD into a constraint on the group halo model, Ref. [9] made the identification σ 2 los →σ 2 , and then used Eq. (B12) to constrain M vir and R vir . The dependence of the RHS of Eq. (B12) on c vir can be clarified by noting that The dependence is not strong: the factor respectively. However, it is noteworthy that this dependence is not physical, but rather introduced artificially. The red solid line in Fig. 9 compares the physical aperture-weighted LOSVD to the quantityσ 2 , using c vir = 3.5, relevant for low-redshift massive galaxies. The meaning of Fig. 9 is that using Eq. (B12) and identifying σ 2 los →σ 2 can introduce a systematic error. The magnitude of the error depends on the aperture and on the value adopted for c vir , but can amount to ∼ 20%.

Kinematical and cosmological priors
Here we describe our attempt to define a kinemat-ics+cosmology prior, following a similar procedure as used in [9] for PG1115+080.
The first step taken by [9] is to identify σ los →σ. The next step is to invoke theoretical expectations based on N-body simulations. Following App. A of [50], the virial mass and the virial radius of the halo are related to a characteristic redshift-dependent overdensity: where ∆ c (z) = 178(ρ m (z)/ρ c (z)) 0.45 is the expected halo overdensity at the virial radius and ρ c (z) and ρ m (z) are the cosmological critical density and matter density at redshift z [51]. Combining Eqs. (B14) and (B12) turnsσ into separate priors on R vir and M vir : Next, Ref. [9] adds a prior on c vir , using the M vir − c vir relation from Ref. [52]: Treating c vir as a function ofσ via Eqs. (B17) and (B16), one has now produced separate priors on the parameters ρ 0 and R s : The uncertainty can be estimated by combining the observational uncertainty on σ los and the theoretical scatter on the M vir − c vir relation; the latter is taken in Ref. [9] as ±0.14 on the log 10 c vir expression in Eq. (B17). In practice, our implementation of these priors in the MCMC mock analysis is as follows: 1. The parameters that are directly sampled by the chain, defining the posterior likelihood space, are angular variables that are linearly proportional to ρ 0 and R s . In each step of the calculation, we translate the sampled value of ρ 0 into an expected value of c vir , using Eq. (B18).
2. Having obtained c vir , we convert Eq. (B17) into an equation for R vir , replacing M vir by R vir using Eq. (B14). Now, we implement the estimated theoretical scatter of ±0.14 dex on c vir in the M vir − c vir relation, to define a range of acceptable values for R s via R s ⊂ R vir /(10 +0.14 , 10 −0.14 ). In the chain, we discard sampled values of R s that fall outside of this range. This step therefore enforces a redshiftdependent correlation between ρ 0 and R s . Up to this point, we made no connection to kinematics.
3. Finally we come to the kinematics. Using Eq. (B13), we convert the model point represented by ρ 0 and R s , along with the central value of c vir , into a model prediction forσ. This prediction is then compared to the measured value of σ los using the nominal observational uncertainty to obtain a likelihood factor.

Sample variance
Sample variance is the dominant nominal source of uncertainty in σ 2 los , given that typically only a handful of galaxies are measured as tracers of the group. Ref. [30] used bootstrap to estimate sample variance directly from the measured sample of galaxies; here, we complement this route by generating random sets of N g galaxies tracing an NFW halo.
The realizations are drawn from an equilibrium phase space distribution function f (r, v). For a spherical halo with a statistically static distribution function, we have f (r, v) = f (ε(r, v)), where ε = ψ(r) − v 2 2 , ψ(r) = −ϕ(r), and ϕ(r) is the halo Newtonian potential. f (ε) is given by 9 We calculate f (ε) numerically, and use it to draw samples of N g tracer galaxies that fall within projected aperture θ A . The results of this exercise for N g = 13 (as in PG1115+080) and for different apertures are shown as blue dots in Fig. 9. For each value of θ A /θ s we generate 100 mock samples. For each sample we calculate σ 2 los directly as the variance of LOS velocity across the N g galaxies. The mean and standard deviation of σ 2 los are shown by the thick blue line and shaded region.
The variance we find for σ 2 los in Fig. 9 is roughly consistent with sample variance of a normal distribution: For comparison with Ref. [29], we also calculate uncertainty estimates for σ 2 los using the bootstrap method. As a rule, the bootstrap method provides a slightly lower uncertainty estimate than the variance found with mock realizations, but the difference is small: As noted in Sec. III C, at least for a wide group centroid prior (as derived in Ref. [30] for PG1115+080), trial and error with emcee [55] suggests that the initial placement of the walkers is a key factor in deciding which likelihood minimum will attract the fit. We can (roughly) estimate the probability of falling into a wrong minimum by the probability for the walker placement to start off closer to a false minimum than to the truth one. Let us assume that the initial placement of the walkers is chosen to coincide with the prior's center (this seems like a natural choice). In this case, the probability to fall into a wrong minimum is roughly given by the probability of the group centroid prior to be nearer a false minimum than the truth one.
We can estimate this probability using mock samples of tracer galaxies as in Sec. B 4. Consider a sample of N g galaxies, and choose a random member to be the "primary lens". From the same sample, derive a group center prior as the center of mass of the members. In general, of course, the prior center does not coincide with the center of the halo used to generate the mock. 10 Now, draw a line connecting the primary lens with the true halo center, and another line connecting the primary lens with the prior center: the prior center is closer to a false minimum if the smaller angle between these lines is larger than 60 o .
The result of this calculation depends on the number of member galaxies N g , the group halo profile, and the analysis aperture. For N g = 13 in the NFW model with an aperture of 10 R s (around 3 virial radii), we find the wrong minimum probability to be ∼ 10%.
It should be clear that the simple estimate we described here ignores a range of possible selection effects, both natural (such as mass segregation, and the bias of more massive group members to become primary lenses) as well as analysis-specific (such as luminosity-dependent contamination and incompleteness). Such effects can probably modify our estimates at the O(1) level, but should not change the order of magnitude of the result.
PG1115+080 was selected specifically because of its massive, near-by group association, this set-up may be quite uncommon. Most of the lensed quasars analyzed for timedelay cosmography may either not be associated with LOS groups; or, if they are, may lie far from the group's projected center of mass, making flexion terms less important than for PG1115+080. A detailed analysis of the fraction of systems that may exhibit flexion degeneracy is beyond the scope of this work. Nevertheless we can make a crude estimate using the data from [30], as follows.
Numerical experiments with mock analyses mimicking the main features of the pipeline of [11] suggest that flexion terms are quantitatively important in the fit if the flexioninduced deflection angle is larger than ∼ 1 mas. Very roughly, we can expect that flexion degeneracy should be a concern in the same parametric regime. It is interesting to note that all of the 4 systems that take part in both of the cosmography [11] and kinematics [30] campaigns, turn out to exhibit ∆θ ≳ 1 mas.
Appendix E: Full corner plots.
In this Appendix we collect some detailed results from the MCMC analysis.
In Fig. 10 we show triangle plots in which the cosmological prior on c vir is not included. We do this exercise in order to investigate the impact of this prior on the results. The main point to notice is that omitting the c vir prior, the bias on H 0 becomes somewhat more pronounced (compare Fig. 5, that includes this prior). At the same time, without this prior, the best fit result for R s in displaced (false) posterior likelihood minima is driven to small values. This point is shown by a comprehensive triangle plot in Fig. 11.
In Fig. 12 we show triangle plots in which the kinematics prior is enforced with a standard deviation of 60 km/s on σ los . These results can be compared to Fig. 5 from the main text, where, as noted in Sec. III B, the standard deviation on σ los was taken as 120 km/s. We do not find a significant difference. Fig. 13 gives a more complete perspective on the degeneracies and the global structure of the likelihood as exposed by a zeus run.