A Bayesian approach to strong lensing modelling of galaxy clusters

In this paper, we describe a procedure for modelling strong lensing galaxy clusters with parametric methods, and to rank models quantitatively using the Bayesian evidence. We use a publicly available Markov chain Monte-Carlo (MCMC) sampler (‘bayesys’), allowing us to avoid local minima in the likelihood functions. To illustrate the power of the MCMC technique, we simulate three clusters of galaxies, each composed of a cluster-scale halo and a set of perturbing galaxy-scale subhalos. We ray-trace three light beams through each model to produce a catalogue of multiple images, and then use the MCMC sampler to recover the model parameters in the three different lensing configurations. We find that, for typical Hubble Space Telescope (HST)-quality imaging data, the total mass in the Einstein radius is recovered with ∼1–5% error according to the considered lensing configuration. However, we find that the mass of the galaxies is strongly degenerated with the cluster mass when no multiple images appear in the cluster centre. The mass of the galaxies is generally recovered with a 20% error, largely due to the poorly constrained cut-off radius. Finally, we describe how to rank models quantitatively using the Bayesian evidence. We confirm the ability of strong lensing to constrain the mass profile in the central region of galaxy clusters in this way. Ultimately, such a method applied to strong lensing clusters with a very large number of multiple images may provide unique geometrical constraints on cosmology. The implementation of the MCMC sampler used in this paper has been done within the framework of the lenstool software package, which is publicly availablewww.oamp.fr/cosmology/lenstool..


Introduction
Strong gravitational lensing is produced when a distant object (such as a galaxy or a quasar) is serendipitously aligned with a critical foreground mass concentration.Such a phenomenon was first observed by Walsh et al. (1979) who discovered a double quasar strongly lensed by a distant galaxy.In the 1980's, with the advent of CCD imaging and its application to astronomy, giant gravitational arcs in galaxy cluster cores were discovered by two independent teams (Lynds and Petrosian 1986;Soucail et al. 1987).The lensing explanation proposed by Paczynski (1987) was soon confirmed by Soucail et al. (1988), who measured the redshift for the giant arc in Abell 370 as being roughly twice that of the cluster redshift.Together with the multiply-imaged quasars, giant arcs in galaxy clusters turned strong gravitational lensing from a theoretical curiosity into a powerful tool to probe the mass distributions of galaxies and galaxy cluster cores.Although rare in current surveys, strong lensing events are expected to number as many as a few hundred thousand over the whole sky (Cabanac et al. 2007).
In order to fully exploit strong gravitational lensing events, one generally needs high resolution imaging coupled to deep spectroscopy to measure the redshift of both the lensing object and the lensed sources.By combining, Hubble Space Telescope (HST) images with ground-based spectroscopy on 8-10m telescopes, strong lensing analysis has proved to be very successful at constraining the mass distribution of galaxies (e.g.Muñoz et al. 1998;Koopmans et al. 2006) and galaxy cluster cores (e.g.Kneib et al. 1996;Abdelsalam et al. 1998;Smith et al. 2005;Halkola et al. 2006).
Nowadays, one particularly interesting application of strong lensing is to constrain the dark matter (DM) distribution in cluster cores, and contrast it with predictions of numerical simulations.For example, we would like to measure accurately the inner slope and the concentration parameter of the DM density profile, to probe DM properties and its link with the baryonic component (Sand et al. 2007, and references therein).Indeed, numerical simulations seem to advocate a cuspy DM slope that could be described by an NFW (Navarro et al. 1997) or a Sérsic (Sérsic 1968;Merritt et al. 2005) profile.Observations are not yet giving definitive answers relative to the value of the inner slope (Gavazzi et al. 2003;Sand et al. 2004Sand et al. , 2007) ) or the concentration (Kneib et al. 2003;Gavazzi et al. 2003), but progress is being made steadily.
For example, in Abell 1689, after much disagreement over its concentration (Clowe and Schneider 2001;King et al. 2002;Bardeau et al. 2005;Broadhurst et al. 2005;Halkola et al. 2006), Limousin et al. (2007) came to a consensus value of c vir ∼ 6 − 8 after careful and detailed modelling of the previously-analysed data combined with new multiple image identifications, redshifts and weak lensing source galaxy colours.Comerford and Natarajan (2007) discuss the issues related to the determination of the concentration parameter with different techniques, and compare its measurement in a large compilation of galaxy clusters with the distribution of c vir in numerical simulations.
Numerical studies have shown that the concentration parameter of the NFW potential is quite sensitive to complex structures along the line of sight (King and Corless 2007) or triaxiality of the dark matter halos (Corless and King 2006).Improved datasets, but also more advanced techniques are needed to accurately model the mass distribution of gravitational lenses such as these.This movement towards more complex models has generated two competitive methodologies for lens modelling.
So-called "non-parametric" methods, where the mass distribution or lens potential is reconstructed as a map defined on a grid of pixels, have been developed to constrain the mass distribution of (admittedly well-constrained) galaxy-scale lenses (Saha and Williams 1997;Abdelsalam et al. 1998), initially for the purpose of probing the large diversity of possible mass models with a view to investigating in particular the modelling degeneracy present in the measurement of the Hubble constant.Since 1997, non-parametric modelling has been intensively tested and greatly improved to overcome the lack of constraints very common in strong lensing (e.g.Koopmans 2005;Diego et al. 2005;Kochanek 2006).However, the flexibility of these methods arising from their very large number of parameters has to be controlled to avoid overfitting the data.Recent work on regularisation techniques Bradač et al. (2005); Suyu et al. (2006) has improved the situation in this regard somewhat.However, physical understanding often comes from the measurement of quantities such as total mass, profile slope, and so on, which still have to be extracted from the flexible reconstructed maps.
"Parametric", or rather, simply-parameterised models therefore have two advantages: the assumption of a physical model leads to inferences that are directly related to physical quantities, while the model fits the data with relatively few free parameters compared to a "non-parametric" model.
Effectively the regularisation of the mass distribution is achieved through the physical model itself.The predicted surface density maps are smooth (by design), a situation perhaps valid only for quiet systems where the galaxy dynamics are well understood.The modelling of merging and perturbed systems is clearly the next challenging step for parametric methods.
Another important issue in both parametric and non-parametric methods is the way the parameter space is explored.In this paper, we have used the parametric gravitational lensing package lenstool to perform the lens modelling.Given a parametrization describing the lens, this software explores the parameter space around the best-fit region, reproducing the location of the observed multiple images within the supplied uncertainties.The first versions of the software (Kneib et al. 1993;Smith et al. 2005) were based on a downhill χ 2 minimization.However, this technique is very sensitive to local minima in the likelihood distribution; as a result, the modelling of complex systems would rapidly become too involving and inefficient.
In order to face the current and future observational data, we have thus implemented a new optimization method based on a Bayesian Markov Chain Monte Carlo (MCMC) approach.We will investigate here the merits of this new method on simulated strong lensing clusters.
In the first part of the paper, we explain how to model a cluster of galaxies, and how to identify systems of multiple images.Then, we describe the implementation of the MCMC package bayesys (Skilling 2004) in the lenstool software.In the second part, we analyse the performance of the Bayesian MCMC sampler by studying the degeneracies between the parameters of the Peudo-Isothermal Elliptical Mass Distribution (PIEMD, Kassiola and Kovner 1993), the pseudo-elliptical Navarro, Frenk & White (NFW, e.g.Navarro et al. 1997;Golse et al. 2002) and the pseudoelliptical Sérsic potentials.In the last section, we use the Bayesian evidence to rank the models that best reproduce systems of multiple images simulated from galaxy clusters with flat inner mass profiles.Finally, we discuss the limitations of the strong lensing modelling.

Definition
The gravitational lensing transformation is a mapping from the source plane to the image plane (Schneider et al. 1992): where θ and β are the image and source positions respectively and ϕ(θ) is the lens potential computed at the image position.Depending on the strength of the gradient of the lens potential, one can easily see that for a given source position β, multiple images (at different θ) can solve the lensing equation.When this is happening it corresponds to the strong lensing regime.The lens potential is the product of angular diameter distances ratio: D LS /D OS (Lens-Source distance over Observer-Source distance) and the projected Newtonian potential φ(θ) at the image position: Hence, once the distance of the lens and the source are known, solving the lensing equation for different multiple images, allows to directly constrain the Newtonian potential, or equivalently the mass distribution of the lens.

Modelling the different cluster mass components
Observations of clusters of galaxies reveal two components: cluster-scale halos (which includes both DM and the baryonic intra cluster gas) and galaxy-scale halos (made of stars and DM).Similarly, N-body simulations of clusters show that the mass distribution of subhalos inside a cluster halo follows a Schechter function (e.g.Shaw et al. 2006).
Thus, cluster gravitational potential can be decomposed in the following manner: where we distinguish the cluster-scale smooth and large potentials φ ci , and the subhalo potentials φ pj providing small perturbations (Natarajan and Kneib 1997).In the following, we consider a subhalo as a clump of matter containing a galaxy: we assume that there are no dark galaxies in clusters.This decomposition has been successful in reproducing the observed systems of multiple images and in constraining the size of the subhalos in clusters (e.g.Smith et al. 2005;Natarajan et al. 2006).We now describe in more detail how we model the cluster-scale halos and galaxy-scale subhalos.

Smooth cluster-scale halos
The smooth cluster-scale halos represent both the DM and the intra-cluster gas.With enough constraints, each of these two component could in principle be modelled separately, but in this work they are modelled together as a single mass component.The number of such halos is not easy to evaluate; generally one starts with a single halo -except when X-Ray observations or the distribution of the galaxies clearly show a multi-modal distribution -and increases the complexity of the model from there.
In the case of a multi-modal distribution or a clearly bad fit to the data with a single halo, additional halos can be included to the model until a good fit is reached.In the lenstool literature to date no more than two cluster-scale halos have been needed to achieve a good model (e.g.Abell 2218, Abell 1689), but this may change in the near future with the expected improvement of the strong lensing data (in particular with more spectroscopic redshifts) or when properly taking into account external constraints.
Each halo in a model (both the cluster-scale and the galaxy-scale described below) is parametrized by a position on the sky (x c , y c ), a projected ellipticity of the mass distribution (ǫ Σ ) (see also Appendix B for the pseudo-elliptical developments of the Sérsic potential), a position angle (PA), and a set of parameters specific to the choice of potential profile used to describe the halo.In this paper, we consider either the SIE, NFW, PIEMD, or Sérsic profiles, described by either 1, 2, 3, or 3 parameters respectively (see Table 1 for the analytic description of each potential.See also Limousin et al. (2005) for the surface density definitions of the PIEMD and NFW potentials).
In Figure 1, we compare the surface density of the SIS, the Sérsic, the NFW and the PIEMD profiles both in the very central and in the very outer regions.These regions are accessible either to strong or weak lensing.These profiles are the best fit to the set of plotted multiple images.We clearly note the flat core of the PIEMD profile up to 10 kpc and in contrast the monotonically increasing slope of the NFW and the Sérsic profiles.The SIS profile slope is constant and hardly follows the other profiles.
Given the data (e.g.strong lensing or dynamics data), the cluster brightest galaxy -also called the cD galaxy in the following -can either be included in the cluster-scale halo or modelled separately.However, Smith et al. (2005) showed that the centre of mass of the cluster-scale halo can be different from the cD galaxy centre.Therefore, it is generally justified to model the cD galaxy as an additional subhalo.Kneib et al. (1996) first demonstrated that the inclusion of galaxy-scale subhalos was necessary to reproduce the observed systems of multiple images, particularly those appearing near cluster galaxies.These galaxyscale subhalos or perturbers can be probed in a direct way using weak galaxy-galaxy lensing techniques (Natarajan and Kneib 1997;Natarajan et al. 2002), however in this paper we will concentrate only on the strong lensing aspects.

Galaxy-scale components
The number of subhalos to include in a model needs to be quantified.To date, a conservative attitude has been adopted: all the massive cluster member galaxies with cluster-centric radii out to approximately two times the limits of the strong lensing region are included.This is generally achieved by selecting galaxies within the cluster red sequence and selecting them brighter than a given luminosity limit.Moreover, the subhalos shape (ellipticity and orientation) is usually taken to be the same as its galaxy .Kneib et al. (1993) b Golse (2002) c Golse and Kneib (2002) d Ciotti and Bertin (1999) Recently, Wambsganss et al. (2005) and King and Corless (2007) have raised the issue of multiple halos/subhalos along the line of sight that increase the projected surface density and thus affect the lensing strength.While not large, this effect is a systematic, and so lensing models must consider the possibility of such gravitational perturbations.In practice, the mass distribution along the line of sight can be understood from spectroscopic and photometric measurements in the field of view.
Here, we propose a set of criteria for including perturbing subhalos in a model.The basic idea is to measure their strong lensing deviation angle and compare it to the spatial resolution δ of the lensing observations (δ ∼ 0.1 ′′ for HST).A subhalo is included in the model if it can increase significantly the deflection angle at its associated galaxy position.For a cluster member galaxy if its Einstein radius The surface densities correspond to the fit performed in section 5 and extended to very small and large radii.The arrows mark the multiple images positions used as constraints.
R Einstein > δ/µ (where µ is the magnification of the cluster-scale halo at the position of the galaxy) then it is included, otherwise its lensing contribution is not important and it is disregarded.For galaxies not part of the cluster, if R Einstein > δ/µ and the associated galaxy is in projection out of the strong lensing region, we include it in the model at the cluster redshift by rescaling its mass so that the global lensing effect is preserved.Finally, if the galaxy is in the strong lensing region and its lensing effect is detectable then the associated subhalo must be included with a proper multi-plane lensing technique (we will not discuss such a case here as it is beyond the scope of this paper).
Accounting for all the subhalos in a galaxy cluster as individually optimisable potentials would lead to an under-constrained problem.Assumptions must be made in order to make the number of parameters commensurate with the number of constraints.Koopmans et al. (2006) have shown that a strong correlation exists between the light and the mass profiles of elliptical galaxies in the field.Consequently, in a first approximation, the subhalos position, ellipticity and orientation are matched to their luminous counterpart.
As we will show in the second part of this paper, apart from a few subhalos perturbing multiple images close to them, the vast majority of subhalos act merely to increase the total mass enclosed in the Einstein radius.Strong lensing provides few constraints on the mass profile parameters of most individual subhalos.
We therefore reduce the number of subhalo parameters by asserting exact scaling relations between the subhalo masses and their associated galaxy luminosities.Following the work of Brainerd et al. (1996), we model cluster subhalos with PIEMD potentials.The mass profile parameters in this model are the core radius (r core ), cutoff radius (r cut ), and velocity dispersion (σ 0 ), which we take to scale with the galaxy luminosity L in the following way: (4) The total mass of a subhalo scales then as: where L ⋆ is the typical luminosity of a galaxy at the cluster redshift, and r ⋆ cut , r ⋆ core and σ ⋆ 0 are its PIEMD parameters.When r ⋆ core vanishes, the potential becomes a singular isothermal potential truncated at the cut-off radius.This is generally the type of potential used in weak galaxy-galaxy lensing studies to measure the tidal radius of galaxy-scale subhalos in clusters or in the field (see Limousin et al. (2005Limousin et al. ( , 2006))).
In these scaling relations, the velocity dispersion scales with the total luminosity in agreement with the Tully-Fisher and the Faber-Jackson relations for spiral and elliptical galaxies respectively.The r cut relation is more hypothetical.If α = 0.5, it assumes a constant mass-to-light ratio independent of the galaxy luminosity.If α = 0.8, the mass-to-light ratio scales with L 0.3 similar to the scaling of the fundamental plane (Natarajan and Kneib 1997;Jørgensen et al. 1996;Halkola et al. 2006).

Multiple images
In the strong lensing regime, the light coming from a background galaxy (the source) passes through a high density region and is lensed into multiple images.The position, shape and flux of each multiple image depends on the properties of the lens and the redshift of the source.The precise measurement of the source redshift, and of the image properties (such as position, ellipticity and orientation) provides strong constraints on the lens model.
In general, image properties can be inferred from their light distributions.Indeed, the first order moment provides the image position, and the PSF-corrected second order moment gives the ellipticity and the position angle of the image.Note however, that the ellipticity of a curved arc is somewhat ill-defined, so this information can only be used if the images are relatively compact.In this paper, we only consider the multiple image's position as a constraint, and we discuss the associated likelihood in the next section.
Sometimes, the background galaxy presents several bright regions that can be individually identified in each multiple image.Matching these bright regions in each image brings even tighter constraints to the lensing model.
The images flux can also be considered as a constraint.However, the amplification can vary strongly across highly extended images, and properly computing the amplification to measure the total flux in each image is usually not straightforward.
Finally, the redshift of the source is a strong constraint on the lens model.A spectroscopic determination is best, but a photometric redshift (e.g.Ilbert et al. (2006)) can be sufficient if accurate enough (e.g.σ z < 0.05 introduces a 2% error on the D LS /D OS ratio for a lens and a source at redshifts z L = 0.2 and z S = 1 respectively) and with no multiple peak in its probability distribution (no catastrophic redshift).
For well-defined photometric redshifts, lenstool provides a way of introducing accurately the redshift likelihood as a prior for the model.
Including an uncertain source redshift as a free parameter to be inferred from the data gives the model more freedom, albeit at some extra computational cost.However, due to the other available constraints, it may lead to a more accurate redshift for that image system.This procedure may also raise questions about a photometric or spectroscopic measured redshift if the model favours a different range of values.
The correct identification of multiple images is probably the most complex task in strong lensing modelling.
Initially we consider (as a guide) only generic geometrical lensing configurations -cusp, fold and saddle (Blandford and Narayan 1986) -for single cluster-scale halo.
Having found a basic model that satisfies the most obvious or most straightforward multiple image system, the perturbations due to galaxy-scale subhalos can be taken into account.Generally, subhalos do not create strong lensing events by themselves, but affect the multiple images produced by the cluster-scale halo.They can deflect their position or occasionally further divide a multiple image.
Comparing the colours of multiple images is another straightforward technique.As lensing is achromatic, multiple images must have similar colours unless the images' fluxes are strongly contaminated with or reddened by nearby galaxies.
It is important to realize that the identification process of multiple images is both iterative and strongly linked to the determination of the mass profile, starting from the most obvious systems close to the cluster centre and progressively adding perturbations and new systems.New multiple images can be predicted before they are observationally confirmed.

Other lensing constraints
Single images Single images with known redshift lying close to the strong lensing region (typically when R Einstein < r < 2R Einstein ) can also be included in the lens model.Indeed, they can help in constraining the parts of the model where no multiple image system is detected.Such constraints have been neglected up to now.We propose here an efficient way to include them in the χ 2 determination.
In essence, we add a penalizing term to the likelihood if an observed single image is predicted to be multiple, and if at least one of the counter-images could effectively be detected in the observed data image.The penalizing term is a function of n k , the number of predicted images above the detection limit (defined to be 3 times the sky noise flux in the object detection aperture).
The penalizing term is implemented in the following way: Here, x single is the position of the observed single image and x j (θ) is the position of a detectable image predicted by the current model, whose parameters are θ and σ single is the position error of the observed single image.This implementation provides a smooth way of converging to the best χ 2 single .Once χ 2 single = 0 (as it must be if truly single), the single image is no more a constraint.Consequently, this definition only imposes an upper limit on the enclosed mass at the single image position.The truly singly-imaged systems do not add to the overall number of degrees of freedom, nor to the final global chi-squared value.However, they do accelerate the convergence on the best fitting parameter region.
This penalizing term must be used with some care; in particular, instances where χ 2 single > 0 have to be flagged and investigated, as they indicate either a failure of the model or that the single image identification was incorrect.Indeed, this is one way in which new multiple images may be found.

Location of critical lines
In the case of fold images, the position of the critical line passing in between the 2 images can sometimes be observed as a saddle point in the surface brightness of the images.We can use this information to put a constraint on the lens model by minimizing the distance between the position where the image isophotes cross and the critical line predicted by a model, as shown in Fig. 2.
The prior segment for the critical line position can be defined by a centroid O, a position angle and a Gaussian error size on the position σ cl , hence, the corresponding χ 2 can be given as: where D is the intersection of the predicted critical line and the defined prior segment.This constraint merely reinforces the weight of the considered system of multiple images in the model.
By focusing on the crossing isophote, it makes of use of more of the imaging information than just the centroids of the multiple images.As such, it is a low-cost constraint in terms of computation time and definitely accelerates the convergence on the best fit region.Of course, since constraints must be independent observations, this constraint must be observable and not computed from the image positions.
At the end of the optimisation, we check that χ 2 cl < 1.If this is not satisfied, then either the critical constraint was wrongly identified or the model has not yet fully converged.
Weak shear signal Outside the strong lensing region, the weak shear signal can be used to constrain the model on larger angular scales.Considering a catalogue of background galaxies with PSF-corrected shape measurements, one can minimize the difference between the ellipticity of each galaxy and the reduced shear predicted by a mock model at the galaxy location (see e.g.Marshall et al. 2002, and references therein).We will discuss the weak lensing implementation in a forthcoming paper

The multiple images' likelihood
We assume that the noises associated with the measurement of the images position are Gaussian and uncorrelated from one image to another.The noise covariance matrix for all the considered systems of multiple images is therefore diagonal.Hence, the usual definition of the likelihood function applies and becomes, in this case, where N is the number of sources, and n i is the number of multiple images for source i.The contribution to the overall χ 2 from multiple image system i is where x j (θ) is the position of image j predicted by the current model, whose parameters are θ and σ ij is the error on the position of image j.
The accurate determination of σ ij depends on the image S/N ratio.For extended images, a pixellated approach is the only accurate method which takes the S/N ratio of each pixel into account (Dye and Warren 2005;Suyu et al. 2006).However, this method is very time consuming.Therefore, in a first approximation, the image position error can be determined by fitting a 2D Gaussian profile to the image surface brightness.In this case, the fit error contains implicitly the S/N ratio of each pixel.However, this assumes that the background galaxy is compact and its surface brightness profile is smooth so that the brightest point in the source plane match the brightest point in the image plane.In this paper, for simplicity, the image positions are determined by inverting the lens equation for a given source position.Therefore, the images are point-like.We assign them identical σ ij so that they have the same weight in the likelihood computation.Of course, this procedure is valid only in simulations where the source positions are known a priori and could not be applied to real cases.
A major issue of the χ 2 computation is of how to match the predicted and observed images one by one.Many techniques have been proposed so far to find the roots of the lens equation (see e.g.Dominik 1995).Unfortunately, the matching of the predicted to the observed images one by one becomes problematic when their respective positions do not match closely.This always happens during the first steps of the optimisation.We have found no algorithm that performs this matching automatically.
In contrast, the algorithm implemented in lenstool is a simplex method (Press et al. 1986) of image transport (Schneider et al. 1992).By definition, the observed image is coupled to the predicted image all along the iterative refinement of the predicted position.The χ 2 is therefore easy to compute.However, in models producing different configurations of multiple images (e.g. a radial system instead of a tangential

Image Plane
Source+Image Plane Source Plane Figure 3. 2D marginalized posterior PDF of a simulated cluster of galaxies.The left, middle and right columns are respectively obtained by computing the likelihood with the source plane method, with the image plane method and successively with the source plane and the image plane methods.In terms of computation time, the combined method source plane -image plane is about 8 times faster than the image plane method alone.
system), the method fails and that particular model is then rejected.This usually happens when the model is not yet well determined, and it can slow the convergence of the model significantly.
To get around this complexity, we can compute the χ 2 in the source plane (by computing difference of the source position for a given parameter sample θ) instead of the image plane.The source plane χ 2 is written as where x j S (θ) is the source position of the observed image j, < x j S (θ) > is the barycenter position of all the n i source positions, and µ j is the magnification for image j.Written in this way, there is no need to solve the lensing equation and so calculation of the χ 2 is very fast.
The MCMC method we have implemented in lenstool supports both the source and the image plane χ 2 methods.However, with the image plane method many models have to be tested and eventually rejected before the Bayesian sampler (see below) focuses on the best fit region.This unnecessarily increases the computation time.In this paper, we first "size up" the best fit region with the source plane method, and then refine the models with the image plane method.
Figure 3 shows that the posterior PDF are similar when computed with the image plane method alone or with the successive source plane+image plane method.However, this latter method is about 8 times faster than the image method alone.

A Bayesian Markov Chain Monte Carlo method
We have implemented the Bayesian MCMC package BayeSys (Skilling 2004) to perform the lens model fitting.By model, we mean a multiple-component (and hence multi-scale) mass distribution as described above, with a set of priors for its parameters.
Theoretically, the Bayesian approach is better suited than regression techniques in situations where the data by themselves do not sufficiently constrain the model.In this case, prior knowledge about the parameter Probability Density Function (PDF) helps to reduce the model's degeneracies.The Bayesian approach is well-suited to strong lens modelling, given the few constraints generally available to optimize a model.
The Bayesian approach provides two levels of inference: parameter space exploration, and model comparison.The first level can be achieved using the unnormalised posterior PDF (equal to the product of the likelihood and the prior); the second requires the calculation of the normalisation of the posterior, known as the evidence.All these quantities are related by Bayes Theorem, where Pr(θ|D, M ) is the posterior PDF, Pr(D|θ, M ) is the likelihood of getting the observed data D given the parameters θ of the model M , Pr(θ|M ) is the prior PDF for the parameters, and Pr(D|M ) is the evidence.
The posterior PDF will be the highest for the set of parameters θ which gives the best fit and is consistent with the prior PDF, regardless of the complexity of the model M .Meanwhile, the evidence Pr(D|M ) is the probability of getting the data D given the assumed model M .It measures the complexity of model M , and, when used as in model selection, it acts as Occam's razor: "All things being equal, the simplest solution tends to be the best one."Here, the simplest solution tends to be the model with the smallest number of parameters and with the prior PDF the closest to the posterior PDF.In contrast, the commonly-used reduced χ 2 analysis is only a rough approximation to the evidence analysis, although it does provide an absolute estimator of goodness-of-fit (provided the error estimates on the data are accurate).
In information theory, the evidence combines the likelihood and the information I, or negative entropy: where the sum is performed over the whole parameter space and Pr(θ|D, M ) is the posterior PDF and Pr(θ|M ) is the prior PDF.
The negative entropy measures the information we have obtained in computing the posterior PDF from the input prior PDF.It represents a "distance" between the prior PDF and the posterior PDF.It can also be understood as the volume of the prior PDF over the posterior PDF, which can be very large for high signal to noise data.[In this case the task of parameter space exploration is like searching for a "a needle in a haystack," and the entropy measures the ratio of the needle's volume (the posterior PDF) to the haystack's volume (the prior PDF)].
In general, the information is much bigger than unity because the "distance" between the prior PDF and the posterior PDF is large.For this reason, we use annealed Markov Chains to converge progressively from the prior PDF to the posterior PDF.
Technically, we run 10 interlinked Markov chains at the same time to prevent any Markov chain from falling in a local minimum.The MCMC convergence to the posterior PDF is performed with a variant of the "thermodynamic integration" technique (O Ruanaidh and Fitzgerald 1996) called selective annealing.
"Selective" stands for the following process.At each step, 10 new samples (one per Markov chain) are drawn randomly from the current posterior PDF (which corresponds to the prior PDF at the beginning).These samples are weighted according to their likelihood raised to the power of δ λ (see below) and selected with a variant of the Metropolis-Hasting algorithm (Metropolis et al. 1953;Hastings 1970).Roughly, the samples with the worst likelihood are deleted and the ones with the best likelihood are duplicated so that we always keep 10 Markov chains running at the same time.
Then, bayesys provides 8 exploration algorithms to randomly move the new samples in the parameter space and keep the 10 Markov chains uncorrelated (see Skilling 2004, for more details).This new set of randomly mixed samples is appended to the Markov chains and used as a new seed for the next step.
The bayesys production of new samples is fast but the likelihood computation by lenstool is slow.For each observed image, we must compute the gradient of every potential and sum them to compute the deviation angle and determine the source position.Therefore, the optimization process takes longer with more images and/or more potentials.However, if the r ⋆ cut or σ ⋆ 0 parameters are fixed, the luminosity-scaled subhalo gradients can be computed just once (at the first iteration), thus reducing drastically the computation time.
The "annealing" term of the "selective annealing" technique controls the convergence speed.The slower and smoother the convergence, the more accurate is the evidence and the better-characterised is the posterior.The annealing process is best seen by re-writing Bayes theorem: Here, λ is the cooling factor for the annealing.During a so-called "burn-in" phase, the likelihood influence is raised progressively from λ = 0 to λ = 1 by step of δ λ ∼ Rate/(log L max − log L) where L is the mean likelihood value of the 10 samples and Rate is an arbitrary constant set by the user.At the beginning of the optimization, δ λ is small because the likelihood dispersion of the 10 samples is large.
As seen above, the samples are weighted and selected according to their likelihood raised to the power of δ λ .Thus, whatever the likelihoods are widely separated, δ λ decreases and the convergence automatically slows in proportion to compensate.
In the small-convergence speed limit, the relative information between the beginning and end of a MCMC step is approximately constant and equals to Rate 2 (Skilling 2004).
By decreasing Rate, the user decreases the information rate per MCMC step and thus the evidence error (see left panel of Figure 4) but at the price of slower convergence.
The right panel of Figure 4 shows that, within the error bars, the median χ 2 is stable when Rate decreases.A lower Rate implies a slower convergence speed.The chains will contain more samples and hence better explore the parameter space towards the best fit region.This explains the slight decrease of the median χ 2 when Rate decreases.Alternatively, the spread of χ 2 is similar for all Rate values, indicating that the convergence speed does not affect the parameter space exploration around the median χ 2 .
From our experience, we have found that a value between 0.1 and 0.5 gives evidence values that are accurate enough for our purposes, while returning the posterior PDF in a reasonable amount of computation time.From Figure 4, we can see that the uncertainty on the logarithm of the evidence is approximately 4 units: this corresponds to an odds ratio of 50 to 1, a sufficiently convincing value.In the rest of this paper, we will use a Rate of 0.1 unless otherwise specified.

MCMC output
Contrary to maximum likelihood methods (like the downhill method used by Kneib et al. 1993), the Bayesian MCMC sampler does not look for the best sample of parameters.Instead, it samples the posterior PDF, drawing more samples where the posterior PDF is higher.
The more samples we collect after the burn-in phase, the better the resolution of the posterior PDF.This is of particular interest given that we use one-and twodimensional histograms to represent the marginalized posterior PDFs Pr(θ i |M ) and Pr(θ i , θ j |M ).The number of histogram bins is limited by the number of samples.To determine the bin sizes, we use the Freedman & Diaconis rule (Freedman and Diaconis 1981).They have shown that in order to get the best fit between a PDF and the corresponding histogram, the bin size should be: where IQR is the interquartile range of the θ i samples and N is the number of samples.The produced 2D posterior histograms in the rest of this paper show that the parameters are not independent, and that their PDFs are certainly not Gaussian.Techniques based on the assumption of Gaussian errors, with correlation matrix measured around the best fit, are not accurate and likely underestimate some errors.
Therefore, uncertainties must be estimated with care, and eventually asymmetric errors must be adopted in case of large asymmetries observed in the posterior PDF.
To compress the posterior PDFs and provide a convenient way of comparing them, we use the median and the standard deviation estimators.It has been shown (Simard 1996) that the median is the most robust estimator for unimodal asymmetric distributions -which is usually the kind of distribution we have for our parameters -whereas the mean estimator is valid only if the distribution is close to Gaussian.The more samples we have, the less we are affected by outliers.

Lens potential parameter degeneracies
In this section, we present and interpret the degeneracies observed in galaxy cluster strong lensing models.Degeneracies will always appear in strong lensing modelling because the lensing only constrains the mass inside an Einstein radius.Unfortunately in parametric models, the parameters involved in the computation of the mass inside the Einstein radius are rarely orthogonal and strongly degenerate.
In the literature, we have found several papers presenting parameters degeneracies (see e.g.Zekser et al. 2006;Rzepecki et al. 2007;Meneghetti et al. 2007, for illustrations of the NFW r s -ρ s degeneracy).We are finding similar results, although we are going beyond most of the previous study by exploring many more parameters.
In this section, we use the same potential to simulate and recover the cluster-scale halo, respectively a PIEMD, a NFW and a Sérsic potential.Fitting the data by the true model never happens in practice.However, the presented degeneracies always appear and simple models are required for a proper understanding.
In section 5, we will use different models for the simulation and the recovery in order to compare the limits of each model given the data.

The mass models
We simulate a cluster of galaxies comprising a cluster-scale halo, and 78 galaxy-scale subhalos that perturb the lensing signal.The cluster-scale halo is modelled successively by a PIEMD, a NFW and a Sérsic potential whose input parameters are reported in Table 2.The galaxy-scale subhalos are modelled by PIEMD potentials with vanishing core radius.The cluster is placed at redshift z = 0.2.Hereafter, we will refer to each model as the PIEMD, the NFW and the Sérsic models.
The galaxy-scale subhalo distribution follows the galaxy distribution in the cluster Abell 2390 in a region of 200 kpc around the cluster centre.This is two times larger than the radius of the outermost images in our simulation.Thus, we account for the shearing effect produced by outer galaxies.The selected galaxies are part of the cluster red-sequence and therefore are assumed to be cluster members.
The galaxy-scale subhalos r cut and σ 0 are scaled with the scaling relations (4).A constant M/L ratio is assumed.We consider the scaling parameters r ⋆ cut = 18 kpc and σ ⋆ 0 = 200 km/s as the input values for our simulations.These values correspond to measured values obtained through galaxy-galaxy lensing in Abell 2390 (Natarajan et al. 2006).The apparent K-band magnitude of an L * galaxy at the cluster redshift is M ⋆ = 17.05 (in AB magnitude) (de Propris et al. 1999).The galaxy magnitudes come from observations of Abell 2390 in the K-band (Jullo et al. 2007), and are used to calculate the true mass parameters in the simulations.We also include a cD galaxy in the model to produce more systems of multiple images in the cluster centre.The cD galaxy is described by an individual subhalo modelled by a PIEMD potential with vanishing core, and shape parameters matching the light distribution.Its mass profile is characterized by σ 0 = 290.km/s, r core = 0 and r cut = 38.kpc.The cluster Einstein radius for a z = 10 background source is 30."The enclosed mass at this radius is M eins = 6.7 10 13 M ⊙ , of which the galaxies' contribution is about 9%.

Strong lensing constraints
We lens three background sources, A, B and C, at redshifts z A = 0.6, z B = 1.0 and z C = 4.0, through each simulated cluster.We adjust the B and C source positions in order to produce the three following configurations of multiple images.
Configuration 1: source A is placed on the North-East side of the cluster, but outside of the multiple image region.It therefore produces a single image.Also on the East side, but inside the radial caustic, source B produces a radial arc system with 3 images.On the West side, source C lies along the West naked cusp of the caustics and so produces a system with 3 tangential images.
Configuration 2: sources A and C are in the same places as in Config. 1, but source B is placed along the East naked cusp and so produces 3 tangential images.The second configuration therefore constrain mainly the enclosed mass in the outer part of the cluster (100 < r < 200 kpc).
Configuration 3: sources A and B are at the same place as in Config. 1, but source C is placed close to the radial caustics and therefore produces a second radial system of 3 images on the West side of the cluster.The third configuration then preferentially constrains the inner part of the mass profile (r < 100 kpc).
The source and image positions in the three configurations are presented in Figure 5, along with the critical and caustic curves for sources at redshift z B = 1.0 and z C = 4.0.Gaussian noise of FWHM 0.1" was added to the image positions to mimic the observational uncertainties.All the predicted images are used for the parameter recovery unless their lensing magnification is lower than 1.In practice, such images are never observed (too faint or blended in the cD flux).
Config. 1 constrains the cluster central and outer regions, Config. 2 only constrains the outer region and in Config.3, the 4 radial images strongly constrain the cluster central region on both the East and the West sides.

PIEMD posterior PDF analysis
First, we fit the PIEMD model with a PIEMD potential for the cluster-scale halo.For each of the three configurations of multiple images, we recover the cluster-scale halo parameters (ǫ, PA, r core , r cut and σ 0 ), as well as the galaxy-scale subhalos scaling parameters σ ⋆ 0 and r ⋆ cut .For each parameter, we assume a uniform prior with 50% errors around its input value.In this case the computed posterior PDF is merely proportional to the likelihood PDF.The cD galaxy subhalo parameters are fixed to their input value in order to avoid annoying additional degeneracies with the clusterscale halo parameters.We therefore constrain 7 free parameters with 8 constraints.
The χ 2 is computed in the image plane, although we observed no difference between the source plane χ 2 and the image plane χ 2 .
The obtained posterior PDF is marginalized (by making a histogram in two dimensions and ignoring the samples' other parameters), and plotted in Figure 6.The estimated (median) parameters are given in Table 3.In every configuration, the input values are recovered well, but strong degeneracies appear.
First, we note that the posterior PDF is more compact in Config.3 than in Config. 1 and 2, in concordance with the number of radial arcs in each configuration.This is in agreement with the results of Miralda-Escude (1995), who showed that the combination of radial arcs and their counter image provides a stringent constraint on the profile shape as well as the enclosed mass.
Second, the velocity dispersion tightly correlates with the core radius, and, to a lesser extent, with the cut-off radius.This is a mathematical degeneracy that appears when the mass enclosed by the Einstein radius is maintained constant (or in this case, constrained tightly by the data).Indeed, for a PIEMD potential, the enclosed mass is given by (Limousin et al. 2005): Thus, for a mass enclosed into a large circle of radius R ∼ r cut , we derive σ 2 0 ∝ 1/r cut .At a smaller radius, assuming r core ≪ r cut , the 3D density approximates ρ = ρ 0 /(1 + R 2 /r 2 core ) and the corresponding enclosed mass becomes For a constant aperture mass, we then obtain σ 2 0 ∝ ( r 2 core + R 2 − r core ) −1 , which is also equivalent to σ 2 0 ∝ 1 R 2 (r core + r 2 core + R 2 ), an increasing function of r core resembling the observed degeneracy.
Third, in Config.3, the cluster-scale cut-off radius is slightly better constrained than in Config. 1 or 2. Since strong lensing cannot probe directly the surface density at the cut-off radius, this result is just a product of the aperture mass definition 15 and the stringent constraints obtained for r core and σ 0 .
Fourth, we observe changes in the slopes of the ellipticity-PA, the ellipticity-L ⋆ mass and the M eins -L ⋆ mass degeneracies between Config.1., 2 and 3.
This more effect is due to a subtle interaction between the cluster-scale halo and the galaxy-scale subhalos' mass distributions during the inference.In particular, in Config.2, we suggest that when the ellipticity increases, alignment of the cluster with the giant arcs B and C is favoured.However, in Config. 1 and 3, this behaviour is not so clear, probably because of the presence of radial arcs in the central region.
Finally, in every configuration, the scaling relation parameters r ⋆ cut and σ ⋆ 0 are strongly degenerate, with the degeneracy closely following the constant mass contours over-plotted with solid lines.In Table 3, we note that strong lensing cannot predict the L ⋆ cut-off radius to better than 24% accuracy, nor σ ⋆ 0 with better than 6% accuracy.Although strong degeneracies have been highlighted so far for a cluster-scale halo modelled by a PIEMD potential, the aperture mass error at the Einstein radius is always smaller than 5% and even reaches 0.8% in Config.3. (see Table 3).
In section 5, we show that the same precision can also be achieved when the input and the fitted models are different.

NFW posteriors distribution analysis
Now, we fit the NFW model with a NFW potential for the cluster-scale halo.Given the three configurations of multiple images, we perform the recovery of the clusterscale halo parameters (ǫ, PA, c and r s ) as well as the galaxy-scale subhalo scaling parameters σ ⋆ 0 and r ⋆ cut .Again, we assume uniform priors for the parameters, with a width of 50% centred on the input values; the cD galaxy subhalo parameters are again fixed.We constrain 6 free parameters with 8 constraints.
The obtained posterior PDF is marginalized and plotted in Figure 7.The (median) estimated parameters are given in Table 4 as well.
First, similarly to the PIEMD case, we note that the degeneracies are more compact in Config.3 than in Config. 1 and 2 for which the central region of the cluster is less constrained.
Second, we note a strong degeneracy between c and the r s .It can be fitted by a power law r s ∝ c α where α = −1.7,−1.5 and −1.4 for Config.1, 2 and 3 respectively.To confirm the mathematical origin of this degeneracy, we consider the NFW definition of the aperture mass.By solving numerically for r s given c at constant aperture mass, we manage to reproduce the observed degeneracy and measure α = −1.1, in relatively good agreement with the measured slopes given the uncertainty on the aperture mass.
Third, the ellipticity, the PA, the M eins and the L ⋆ mass parameters are degenerate in the same manner as in the previous section, when the cluster-scale halo was modelled by a PIEMD potential.This confirms that these degeneracies are independent of the cluster model, and just depend on the lensed image configuration.
Finally, in Table 4, we note that the L ⋆ cut-off radius error is recovered with nearly the same accuracy when the cluster-scale halo is modelled by a NFW potential than when modelled by a PIEMD potential.This suggest that the scaling relation parameters accuracy is model-independent.Similarly, the uncertainty on the enclosed mass measured at the Einstein radius is similar to that found when the cluster-scale halo is modelled by a PIEMD potential.

Sérsic posterior distribution analysis
Finally, we fit the Sérsic model with a Sérsic potential for the cluster-scale halo.We perform the recovery of the cluster-scale halo parameters (ǫ, PA, R e , Σ e and n), as  well as the galaxy-scale subhalo scaling parameters σ ⋆ 0 and r ⋆ cut , given the same three configurations of multiple images as before.
Again, we assume uniform priors for the parameters, with widths of 50% centred on the input values.The cD galaxy subhalo parameters are fixed.We constrain 7 free parameters with 8 constraints.
The obtained posterior PDF is marginalized and plotted in Figure 8.The estimated parameters are given in Table 5.
First, we note that for the same lensing configuration, the parameters of a clusterscale halo modelled by a Sérsic potential are more difficult to constrain than those of a PIEMD or a NFW potential.We understand this to be a result of the effective radius R e and index parameter n mainly impacting the outer region of the mass distribution, which is not probed by strong lensing.
Second, the ellipticity, the PA, the M eins and the L ⋆ mass parameters are degenerate in the same manner as in the previous sections, confirming that these degeneracies are dependent on the lensing configuration alone.
Finally, in Table 5, we note that the L ⋆ cut-off radius is recovered with nearly the same accuracy as in the case where the cluster-scale halo is modelled with the NFW potential.We suggest therefore that the scaling parameters r ⋆ cut and σ ⋆ 0 accuracies cannot be lower than about 20% and 7% respectively.This result is independent of both the model and the lensing configuration.Figure 9 sums up the results found in this section concerning the accuracy obtained on the mass profile in each configuration for each potential.Although the accuracy depends on the lensing configuration it is usually better than 5% in the region of multiple images with no obvious bias.The accuracy is model independent, and is just the noise on the image positions (0.1 arcsec) translated into the uncertainty on the parameters.

Model inference
In this section, we use the Bayesian evidence to rank models.As an example, we consider the controversial inner slope of the density profile in clusters of galaxies.In 2004, Sand et al. have used a sample of 6 galaxy clusters to show that the slope of the central density profile was shallower than r −1 as predicted by CDM simulations.In their modelling they were using axisymmetric potentials.The same year, Bartelmann and Meneghetti reconsider these results and conclude that an NFW profile with a r −1 inner slope could not be ruled out by strong lensing once effects of asymmetry and shear were taken into account.
In order to illustrate the model inference with the Bayesian evidence, we assume here that galaxy clusters actually present an inner slope shallower than r −1 .Then, we show that even when accounting for asymmetry and shear, the Bayesian evidence is still able to rank models and eventually rule them out.
To do so, as an input model, we use the PIEMD model from section 4.2 i.e. the inner slope is shallower than r −1 .In order to observe the limits of Bayesian inference with the evidence, we simulate 6 models in which we change the size of the clusterscale halo core radius.We scale the velocity dispersion accordingly so that the enclosed mass at the Einstein radius is maintained.
The 3 background galaxies of the previous section are lensed through each model.We have to slightly move the sources in the source plane so that in every simulation, we always end up with 1 tangential system, 1 radial system and 1 singly imaged system.For models with r core < 30 kpc, we remove the images predicted at the very centre of the galaxy cluster because their lensing amplification is lower than 1 and in practice they are never observed (either too faint or blended in the cD galaxy flux).In contrast, for models with r core ≥ 30 kpc, we keep all the predicted images because their lensing amplification is always greater than 1.We add a Gaussian noise of FWHM 0.1" to each image position.
Then, we successively fit a SIE, a NFW and a Sérsic potential to the simulated systems of multiple images and report the computed evidences in Table 6.As a reference, the last column reports the evidence computed when we fit the simulated PIEMD models by themselves.We assume no prior knowledge (in practice, we use uniform distributions and adjust the limits so that the posterior PDF is not bounded).We also consider the scaling relation parameters r ⋆ cut and σ ⋆ 0 as free parameters.Figure 10 shows the aperture mass errors relative to the input PIEMD mass profile for the SIE, the NFW and the Sérsic potentials.
First, we note that excluding the inner region and when r core ≤ 20 kpc, the input mass profile is well recovered by all the models.Note that in the case r core = 0 kpc, the SIE aperture mass error is smaller than 10% on the full range of radius.This ascertain the consistency of our SIE and PIEMD models.Conversely, the SIE aperture mass error increases rapidly in the inner region as soon as we increase the core radius.In the inner region, the large errors are due to the intrinsic slope of each model (see Figure 1).
The evidences reported in Table 6 correctly summarize these observations.In particular, the SIE evidence at r core = 0 kpc is close to the evidences of the other models.According to Jeffreys (1961), the difference between two models is substantial if 1 < ∆ ln E < 2.5, strong if 2.5 < ∆ ln E < 5 and decisive if ∆ ln E > 5. Following this criteria, for r core ≤ 20 kpc, the NFW, the Sérsic and the SIE models are equivalent at fitting the data within the evidence error established in section 3.However, when  the core radius increases, the SIE model can be confidently rejected.Now, excluding the SIE models, we can use the evidences to classify the models in 2 categories : (i) when r core ≤ 20 kpc, the NFW and the Sérsic models evidences are equivalent to the reference PIEMD evidence within the evidence error.The evidences cannot confidently rank models.(ii) when r core > 20 kpc, the evidences drop significantly and the NFW and Sérsic models are confidently ruled out.This corresponds to the appearance of bright images inside the core radius (see Figure 10) as expected from flat core models.Here, the Sérsic model evidences are generally better than the NFW model evidences although the Sérsic model contains an additional free parameter.In the r core = 30 kpc case, the NFW and the Sérsic models evidences are very low because of the stringent constraints imposed by the distribution of multiple images (a triplet of tangential images at R = 81 kpc and a set of uniformly distributed images below 40 kpc).
Finally, we conclude that the Bayesian evidence can effectively rank strong lensing models even when accounting for asymmetry and shear.However, this result strongly depends of the presence of images in the cluster centre.
As we are submitting this paper, some of us are already using lenstool and the evidence inference to study the inner slope of the dark matter profile with real data.Their results will be published in a forthcoming paper.

Conclusion
In this study, we have described how to build a gravitational lensing model of galaxy clusters and a set of constraints with multiply and singly imaged systems.Then, we have presented a new Bayesian method for efficiently exploring its parameter space without falling into local maxima of the likelihood PDF.The Bayesian method also gives an estimate of the errors and includes prior knowledge.We have illustrated the Bayesian posterior PDF analysis by studying the degeneracies in the PIEMD, the NFW and the Sérsic potentials in 3 different configurations of multiple images.We draw the following conclusions.
(i) Strong degeneracies appear in both the PIEMD, the NFW and the Sérsic potentials.The parameters are clearly dependent and compensate in order to produce a constant enclosed mass at the images location.The degeneracies are either due to the mathematical definitions of the potentials (σ 0 -r core , σ 0 -r cut for PIEMD, cr s for NFW, R e -Σ e , R e -n and Σ e -n for Sérsic) or to the configuration of multiple images (ǫ-P A, ǫ-L ⋆ galaxy mass, M eins -L ⋆ galaxy mass).The latter degeneracies are easily identified by looking at the degeneracies between the shape and the dynamical parameters.They are model independent.In every case, the enclosed mass in the Einstein radius decreases with the model ellipticity.
(ii) Radial systems of multiple images combined to tangential arcs provide unique constraints on the slope of the mass profile.It is therefore important to identify radial (or central) images in the cluster cores.
(iii) The PIEMD cut-off radius, the Sérsic effective radius and the NFW scale radius are poorly constrained by strong lensing only.Hopefully, future parametric methods combining weak and strong lensing will provide tighter constraints.
(iv) Galaxy-scale subhalos degenerate with the cluster-scale halo.The best constraints were obtained in lensing configurations combining radial and tangential multiple images systems.In this case, we barely manage a 20% accuracy on the cutoff radius of subhalos scaled with scaling relations.As shown by (Natarajan et al. 1998(Natarajan et al. , 2006) ) weak and strong lensing combination can improve this result.
We have also illustrated how to rank models with the Bayesian evidence.We fit a NFW, a Sérsic and a SIE potential to 6 PIEMD simulated clusters with different core radius.We have shown that the NFW and the Sérsic potentials can actually fit systems of multiple images produced by clusters with core radius provided no image lie inside the core radius.For large core radius, central images appear at the very centre of the cluster and provide enough constraints to disentangle PIEMD, NFW or Sérsic based models.
Although strong lensing is a wonderful tool to infer surface densities, it becomes rapidly limited by the models and the observed lensing configuration.For instance, it is not possible to constrain the central density slope without radial images.Actually, the presence of radial images strongly suggests the presence of a flat core.
In a forthcoming paper, we will expand this method to constrain cosmological parameters with strong lensing.With a large number of multiple images with known redshift, one should be able to compare the strong lensing cosmography constraints (similarly to the early work of Golse et al. (2002) and Soucail et al. (2004)) with other methods such as the CMB/WMAP results, or Supernovae or cosmic shear results.critical line detected are left aside.Once the size of the rectangle has reached a lower limit, a line is kept in memory for this rectangle according to the marching squares configurations.The individual lines are then fused into the critical lines contour.
The previous technique was a line following algorithm called snake.It starts from the center of a clump and picks amplification samples along its way outwards.When an amplification sign change is encountered, it precises the infinite amplification position and circles the clump until it comes back to its starting point along the critical line.
In complex environment, the snake algorithm sometimes gets lost and produces incomplete critical lines.Conversely, the multiscale marching square algorithm never gets lost and identifies all the critical lines in the field.However, it can miss a part of critical line if the higher limit is too large.

Appendix B. Pseudo-elliptical Sérsic potential
As another addition to lenstool, we have incorporated the Sérsic density profile (Sérsic 1968) as an alternative description of the matter density.The motivation for including it is that as the Sérsic profile describes the 2D luminosity profile of elliptical galaxies (Sérsic 1968;Ciotti 1991;Caon et al. 1993), it can be used to separately model the baryonic matter component (which should be traced by the light) and the dark matter (DM) component, given enough lensing constraints.In addition, Merritt et al. (2005Merritt et al. ( , 2006)), find that a deprojected Sérsic profile gives a better fit than an NFW profile to the 3D density profile of DM halos from simulations.Elíasdóttir and Möller (2007) found that given that the surface density distribution is indeed given by a Sérsic profile, but fitted by an NFW using lensing constraints, it can lead to unrealistic estimates of the parameters (e.g. the predicted weak lensing signal and the concentration parameter), making the Sérsic profile an interesting alternative for modelling the DM halos themselves.Finally, the special case of the Sérsic index n = 1, corresponds to an exponential disk, making it useful for modelling spiral galaxies.Spiral lenses are comparatively rare to date, but dedicated efforts are being made to find such lenses, and with the inclusion of the Sérsic profile to Lenstool, it can now be used to study and model such lenses.
The Sérsic 2D density profile has three free parameters (n, R e , Σ e ) and is given by: where R is the projected radius, n is the Sérsic index, b n is a constant chosen such that R e is the radius containing one-half of the projected mass and Σ e is the density at R e .The Sérsic profile reduces to the de Vaucouleurs profile for n = 4, and to the exponential disk for n = 1.The other parameters of the Sérsic profile in lenstool are its position on the sky, its position angle and its ellipticity.The elliptic version of the Sérsic profile is calculated using the pseudo-elliptical approximation developed by Golse et al. (2002).It is introduced in the expression of the circular Sérsic potential by substituting R by R ǫ , using the following elliptical In this definition, ǫ = (a 2 − b 2 )/(a 2 + b 2 ) where a and b are respectively the semi-major and the semi-minor axis of the elliptical potential.From the elliptical lens potential ϕ ǫ (r) ≡ ϕ(r ǫ ), Golse et al. propose generic expressions to compute the elliptical deviation angle α ǫ (r), the convergence κ ǫ (r), the shear γ ǫ (r) and the projected mass density Σ ǫ (r): Σ ǫ (r) = Σ(r ǫ ) + ǫ cos 2φ ǫ ( Σ(r ǫ ) − Σ(r ǫ )) . (B. 3) The pseudo-elliptical developments are limited to small ellipticities.For instance for the NFW, when ǫ > 0.25, the surface iso-densities become increasingly boxy/peanut.Similarly for the Sérsic potential, we have found that when ǫ > 0.25, the goodness of fit (defined in Golse et al.) measured at R ǫ = R e becomes larger than 10%.We also fit the relation between ǫ Σ and ǫ and found ǫ Σ = 3.55ǫ − 3.42ǫ 2 with a χ 2 = 10 −5 .
The ellipticities of the potentials used in this paper and of the projected mass densities ǫ Σ are linearly proportional through multiplicative factors (reported in Table 1).
The range of valid surface density axis ratio q = b/a provided by the pseudoelliptical approximation for the SIE, the NFW and the Sérsic potentials are q SIE > 0.65, q N F W > 0.53 and q Sersic > 0.44 respectively.From N-body simulations Oguri et al. (2003) found that the most probable projected axis ratio is q = 0.6.The pseudoelliptical technique is therefore able to model most of the triaxial halos.
In case of highly elliptical mass distributions, the PIEMD model (Kassiola and Kovner 1993) produces elliptical iso-densities because the ellipticity has been introduced directly in the projected mass distribution and not at the level of the potential.

Figure 1 .
Figure1.Surface density comparison between the Sérsic (solid line), the NFW (dashed line), the PIEMD (dotted line) and the SIS profiles (dot-dashed line) The surface densities correspond to the fit performed in section 5 and extended to very small and large radii.The arrows mark the multiple images positions used as constraints.

Figure 2 .
Figure 2. Merging of two multiple images and determination of the distance between the true critical line (solid line, showing the surface brightness saddle point) and a predicted critical line (dashed line).The dashed segment represents the prior that would be set on the critical line location.

Figure 4 .
Figure 4. Evidence and χ 2 evolution in function of the convergence speed parameter "Rate."

Figure 5 .
Figure 5. Left panel: Image plane for the PIEMD simulated cluster, showing the image positions of the systems A, B and C at redshifts z A = 0.6, z B = 1.0 and z C = 4.0 in the configurations 1, 2 and 3.The black circles mark the image positions.The critical curves of systems B and C are shown in red.Right panel: The corresponding source plane.The blue crosses mark the source positions; the caustic curves are plotted in black.The plotted caustics for systems B and C are radial and tangential, tangential and tangential and radial and radial respectively for configurations 1, 2 and 3. North is up and East is left in both panels.

Figure 6 .
Figure6.2D marginalized posterior PDFs for the parameters of the cluster-scale halo modelled with a PIEMD potential obtained, from left to right, with multiple image configurations 1, 2 and 3 respectively.The 3 contours stand for the 68%, 95% and 99% CL.The input values to the simulation are marked by the stars.The mass of an L ⋆ galaxy is the total mass for a circular profile.The plotted contours in the r ⋆ cut -σ ⋆ 0 plot are isodensity contours.The cluster mass M eins is the inferred total enclosed mass (i.e.galaxy subhalos and cluster-scale halo) within the Einstein radius (30").

Figure 7 .
Figure 7. 2D marginalized posterior PDF of the parameters of the cluster-scale halo modelled with an NFW potential, obtained, from left to right, with multipleimage configuration 1, 2 and 3 respectively.The 3 contours stand for the 68%, 95% and 99% CL .The fiducial values are marked by the stars.The mass of a L ⋆ galaxy is the total mass for a circular profile.The plotted contours in the r ⋆ cut -σ ⋆ 0 plot are the isodensity contours.The cluster mass M eins is the total enclosed mass (i.e.galaxy subhalos and cluster-scale halo) in the Einstein radius (30").

Figure 8 .Figure 9 .
Figure 8. 2D marginalized posterior PDF of the parameters of the cluster-scale halo modelled with an Sérsic potential obtained from left to right with Config. 1, 2 and 3 of multiple images respectively.The 3 contours stand for the 68%, 95% and 99% CL.The fiducial values are marked by the stars.The mass of a L ⋆ galaxy is the total mass for a circular profile.The plotted contours in the r ⋆ cut -σ ⋆ 0 plot are the iso-mass contours.The cluster mass M eins is the total enclosed mass (i.e.galaxy subhalos and cluster-scale halo) in the Einstein radius (30").

Figure 10 .
Figure 10.Aperture mass profile errors relative to the input PIEMD mass profile for the fitted potentials SIE (vertically hatched region), NFW (−45 • hatched region) and Sérsic (45 • hatched region) as a function of the aperture radius.The hatched width represents the 3σ error estimated from the posterior PDF.The arrows mark the positions of the multiple images used as constraints.

Figure
Figure A1.16 square configurations.The empty and filled circles are points with positive and negative amplification respectively.The dashed lines are the infered critical lines.

Figure A2 .
Figure A2.Multiscale marching square field splitting.The boxes represent the splitting squares and the red lines, the critical curve contour.The imposed upper and lower limits for the boxes sizes are 10" and 1" respectively.The 1" boxes are not plotted for clarity.

Table 2 .
Input parameters for the 3 simulated cluster-scale components.

Table 3 .
Parameter recovery for a cluster-scale halo modelled by a PIEMD potential, given 3 different strong lensing configurations.The errors are given at 68% CL.The L ⋆ galaxy masses are given for a circular mass component with identical dynamical parameters.

Table 4 .
Parameter recovery results for a cluster-scale halo modelled by a NFW potential, given 3 different strong lensing configurations.The errors are given at 68% CL.The L ⋆ masses are given for a circular mass component with identical dynamical parameters.

Table 5 .
Parameter recovery results for a cluster-scale halo modelled by a Sérsic potential and recovered in 3 different strong lensing configurations.The errors are given at 68% CL.The L ⋆ masses are given for a circular mass component with identical dynamical parameters.

Table 6 .
Comparison of the log(Evidence) produced by the fit of the NFW, SIE and Sérsic potentials to a core radius varying PIEMD potential.The values come from fits performed with sets of multiple images described in the text and a Rate equal to 0.1