Self-calibrating BAO measurements in the presence of small displacement interlopers

Baryon Acoustic Oscillation (BAO) observations offer a robust method for measuring cosmological expansion. However, the BAO signal in a sample of galaxies can be diluted and shifted by interlopers — galaxies that have been assigned the wrong redshifts. Because of the slitless spectroscopic method adopted by the Roman and Euclid space telescopes, the galaxy samples resulting from single line detections will have relatively high fractions of interloper galaxies. Interlopers with a small displacement between true and false redshift have the strongest effect on the measured clustering. In order to model the BAO signal, the fraction of such interlopers and their clustering need to be accurately known. We introduce a new method to self-calibrate these quantities by shifting the contaminated sample towards or away from us along the line of sight by the interloper offset, and measuring the cross-correlations between these shifted samples. The contributions from the different components are shifted in scale in this cross-correlation compared to the auto-correlation of the contaminated sample, enabling the decomposition and extraction of the component terms. We demonstrate the application of the method using numerical simulations and show that an unbiased BAO measurement can be extracted. Unlike previous attempts to model the effects of contaminants, self-calibration allows us to make fewer assumptions about the form of the contaminants such as their bias.


Introduction
The upcoming Roman Space Telescope [1] and Euclid [2] missions will soon provide astronomers with the next generation of space-based galaxy surveys. Both surveys will use slitless spectroscopy to obtain galaxy redshifts, in which a grism in the optical path disperses all incident light. While this method can measure redshifts quickly, it has its own sources of systematic error. The primary source of error we study here is the contamination of samples by interlopers arising from line misidentification.
When measuring redshifts using emission lines, we compare observed and rest-frame wavelengths, so the measured redshift depends on the assumed rest-frame wavelength. In most cases this is not an issue, since the use of multiple lines at known separations gives a secure result. However, to provide a larger sample for Roman, and as the baseline for Euclid, observations in which only a single line can be seen will be included. This is where the sample can become contaminated. For example, [OIII] will be the primary line for high-redshift Roman observations and is generally seen as a doublet [3]. However, when this doublet is not resolved into its two components, galaxies with strong Hβ emission can be mistaken for [OIII]-emitting galaxies and assigned incorrect redshifts. Unfortunately, the shift in position caused by this line misidentification is close to the BAO scale, and thus contaminates the BAO measurement [4]. It is crucial to correct for this systematic; otherwise, the improvement in statistical error gained from the larger sample is outweighed by the interloper contamination.
We focus on this type of contamination in Roman-like galaxy catalogues. [5] presented a method for modelling the auto-correlation of the contaminated sample that requires an understanding of the [OIII]-Hβ cross-correlation function. [5] assumed that this could be modelled directly using the small-scale [OIII] auto-correlation function, making the simplifying assumption that the [OIII] and Hβ biases are the same. Another method, using graph neural networks to estimate the fraction of interlopers, was presented by [6]. We instead present a new method for estimating the interloper-target cross-correlation term, based on measuring the cross-correlation between two replications of the contaminated catalogue, obtained by shifting the original catalogue by ±1× the interloper displacement. Shifting the sample by these amounts changes the scales of the target, interloper, and target-interloper clustering terms compared to the corresponding terms in the auto-correlation of the contaminated sample. In this way, we can estimate the cross term without any assumptions regarding the relative bias of targets and interlopers.
We can then use the measured cross-correlation between the two replicated catalogues in the model of the auto-correlation of the contaminated sample, which is based on a linear model from CAMB [7] coupled with nuisance parameters that allow for deviations from this model [8]. We use this model to obtain unbiased estimates of the Alcock-Paczynski isotropic and anisotropic dilation parameters α and ϵ [9] from mock catalogues, together with the fraction of interlopers $f_i$.
Our paper is structured as follows. In section 2, we describe the effects of small-displacement interlopers on the correlation function. In section 3, we present our model for estimating the target-interloper cross term and the contaminated correlation function. In section 4, we present the results of our pipeline. Finally, we conclude in section 6.

The Interloper Displacement
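Interloper displacements are fixed in observed redshift and depend on the offset in wavelength between the assumed and true lines. For a galaxy survey, we typically convert redshifts to distances using a fiducial cosmological model, so the interloper displacement in our map depends on the fiducial rather than the true cosmology. The comoving distance between an object with true redshift $z_{\rm true}$ that is displaced to $z_{\rm false}$ is

$$\Delta = \int_{z_{\rm false}}^{z_{\rm true}} \frac{c\,{\rm d}z}{H_{\rm fid}(z)},$$

where $H_{\rm fid}$ is the Hubble parameter in the fiducial cosmology and $c$ is the speed of light. In the case of [OIII]-Hβ interlopers, an Hβ emitter misidentified as [OIII] is assigned $1+z_{\rm false} = (1+z_{\rm true})\,\lambda_{\rm H\beta}/\lambda_{\rm [OIII]}$, and for a flat ΛCDM fiducial cosmology this simplifies to

$$\Delta = \frac{c}{H_0}\int_{(1+z_{\rm true})\lambda_{\rm H\beta}/\lambda_{\rm [OIII]}-1}^{z_{\rm true}} \frac{{\rm d}z}{\sqrt{\Omega_{m,{\rm fid}}(1+z)^3 + \Omega_{\Lambda,{\rm fid}}}},$$

with $\Omega_{\Lambda,{\rm fid}}$ and $\Omega_{m,{\rm fid}}$ being the energy densities of the cosmological constant and matter in the fiducial cosmology, respectively. For the range of redshifts of interest to Roman, $z = 1.0$–$2.8$, and a Planck-like cosmology, this interloper shift ranges from $80\,h^{-1}$ Mpc to $97\,h^{-1}$ Mpc, quite close to the BAO scale.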
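The displacement integral can be evaluated numerically. The sketch below is a minimal, dependency-free illustration under stated assumptions: vacuum rest wavelengths λ[OIII] = 5008.24 Å and λHβ = 4862.68 Å, and a flat fiducial cosmology with the Quijote-like value Ω_m,fid = 0.3175. None of these numbers are quoted in the text, and the function names are hypothetical. The integral is done with composite Simpson's rule.

```python
import math

# Assumed rest-frame vacuum wavelengths in Angstrom (not quoted in the text).
LAMBDA_OIII = 5008.24
LAMBDA_HBETA = 4862.68
C_OVER_H0 = 2997.92458  # c / H0 in h^-1 Mpc


def e_of_z(z, omega_m):
    """H(z)/H0 for a flat LCDM cosmology with Omega_Lambda = 1 - Omega_m."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def false_redshift(z_true, lam_true=LAMBDA_HBETA, lam_assumed=LAMBDA_OIII):
    """Redshift assigned when a line emitted at lam_true is taken to be lam_assumed."""
    return (1.0 + z_true) * lam_true / lam_assumed - 1.0


def interloper_shift(z_true, omega_m=0.3175, n_steps=200):
    """Comoving distance (h^-1 Mpc) from z_false to z_true via composite Simpson's rule."""
    if n_steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    z_false = false_redshift(z_true)
    step = (z_true - z_false) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        weight = 1 if k in (0, n_steps) else (4 if k % 2 else 2)
        total += weight / e_of_z(z_false + k * step, omega_m)
    return C_OVER_H0 * total * step / 3.0


for z in (1.0, 1.8, 2.8):
    print(f"z_true = {z:.1f}: Delta = {interloper_shift(z):.1f} h^-1 Mpc")
```

With these assumed inputs the shift decreases from a little under 100 $h^{-1}$ Mpc at $z = 1$ to about 80 $h^{-1}$ Mpc at $z = 2.8$, close to the range quoted above; the exact values depend on the adopted wavelengths and $\Omega_{m,{\rm fid}}$.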

The Contaminated Correlation Function
With the introduction of interlopers into the target sample, the correlation function becomes "contaminated", as some fraction $f_i$ of the galaxies in the observed sample is displaced along the line of sight (LOS). We refer to the contaminated correlation function as $\xi_{cc}$, and to the populations of interlopers and galaxy targets with the subscripts "i" and "g", respectively. The overdensity can then be written as the sum of two terms,

$$\delta_c(\vec{x}) = (1-f_i)\,\delta_g(\vec{x}) + f_i\,\delta_i(\vec{x}).$$

We can then model the contaminated correlation function as

$$\begin{aligned} \xi_{cc}(\vec{r}) &= \langle \delta_c(\vec{x}_1)\,\delta_c(\vec{x}_2) \rangle \\ &= (1-f_i)^2\,\xi_{gg}(\vec{r}) + f_i^2\,\xi_{ii}(\vec{r}) + 2f_i(1-f_i)\,\xi_{gi}(\vec{r}), \end{aligned}\tag{2.6}$$

with $\vec{r} = \vec{x}_2 - \vec{x}_1$. The convention adopted follows that of [5], with the interloper catalogue already shifted along the line of sight; i.e., the interlopers are considered at their observed rather than true positions.
Hereafter we will write the correlation functions as ξ rather than $\xi(\vec{r})$ for simplicity. The first two terms are the auto-correlations of galaxy targets and interlopers, with amplitudes that depend on the interloper fraction. These terms contain most of the cosmological information and the familiar BAO peak. The third term is the cross-correlation between the galaxy targets and interlopers. In the cross term, the signal is distorted and shifted in the radial direction, such that the strong small-scale clustering appears at the displacement scale. This component typically skews the BAO peak in the contaminated clustering towards the interloper displacement, and hence biases the standard BAO measurement.
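The weighting of the contaminated correlation function, and the way the cross term carries small-scale clustering out to the displacement scale, can be checked on a toy field. The sketch below is a hypothetical 1D illustration, not the paper's pipeline: it assumes a periodic line of sight, models interlopers as the same underlying density field observed Δ cells nearer, and measures correlations as simple lag averages.

```python
import numpy as np


def smooth_random_field(n_cells, width, rng):
    """Periodic 1D Gaussian random field, smoothed with a Gaussian kernel of `width` cells."""
    k = np.fft.rfftfreq(n_cells)  # cycles per cell
    kernel = np.exp(-0.5 * (2.0 * np.pi * k * width) ** 2)
    field = np.fft.irfft(np.fft.rfft(rng.standard_normal(n_cells)) * kernel, n=n_cells)
    return field / field.std()


def corr(a, b, max_lag):
    """xi_ab(r) = <a(x) b(x + r)> for integer lags -max_lag..max_lag (periodic)."""
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, np.array([np.mean(a * np.roll(b, -lag)) for lag in lags])


rng = np.random.default_rng(42)
n_cells, delta, f_i, max_lag = 8192, 40, 0.1, 60

density = smooth_random_field(n_cells, width=4.0, rng=rng)
delta_g = density                   # targets at their true positions
delta_i = np.roll(density, -delta)  # interlopers observed nearer: delta_i[x] = density[x + delta]
delta_c = (1 - f_i) * delta_g + f_i * delta_i

lags, xi_cc = corr(delta_c, delta_c, max_lag)
_, xi_gg = corr(delta_g, delta_g, max_lag)
_, xi_ii = corr(delta_i, delta_i, max_lag)
_, xi_gi = corr(delta_g, delta_i, max_lag)
_, xi_ig = corr(delta_i, delta_g, max_lag)

# Bilinearity of the average: xi_cc = (1-f)^2 xi_gg + f^2 xi_ii + f(1-f)(xi_gi + xi_ig)
model = (1 - f_i) ** 2 * xi_gg + f_i ** 2 * xi_ii + f_i * (1 - f_i) * (xi_gi + xi_ig)
print("max |xi_cc - model| =", np.abs(xi_cc - model).max())
print("xi_gi peaks at lag", lags[np.argmax(xi_gi)], "; xi_ig peaks at lag", lags[np.argmax(xi_ig)])
```

The two cross terms peak at lags of $\mp\Delta$ and are mirror images of each other, so their sum is symmetric and carries the zero-lag clustering out to $|r| = \Delta$, which is where it distorts the BAO fit.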

Adding Interlopers into Simulations
For the purposes of testing our model, we employ 1000 halo catalogues from the Quijote suite of N-body simulations [10]. Our aim is to demonstrate the method, so we do not need a realistic distribution of emission-line galaxies in our simulations; we therefore do not populate the haloes with galaxies and use halo positions only. The simulations have a box length of $1\,h^{-1}$ Gpc and use the fiducial Quijote cosmological parameters [10]. In this configuration, the minimum halo mass is $1.31 \times 10^{13}\,h^{-1} M_\odot$. Although we expect to find [OIII]-Hβ interlopers in Roman-like catalogues mostly at redshifts greater than $z = 1.8$ [3], we use the catalogue snapshots at $z = 1$. This snapshot has a greater number density of objects, more consistent with the expected [OIII] mean density at $z = 1.8$–$2.8$ [11]. We do, however, use interloper displacements corresponding to redshifts across the full Roman range, as outlined in section 2.1.
For each of the 1000 halo catalogues, we randomly select a fraction $f_i$ of the haloes, label them as interlopers, and shift them towards the observer along the line of sight by $\Delta$. We generate catalogues with interloper fractions $f_i = (0.02, 0.05, 0.10, 0.15)$ and interloper shifts $\Delta = (85, 90, 97)\,h^{-1}$ Mpc. We then measure the correlation function using Nbodykit [12] and the standard estimator $\xi = DD/RR - 1$, where DD are the normalised catalogue-catalogue pair counts and RR are the expected random-random pair counts calculated from the average halo density. We then extract the monopole, quadrupole, and hexadecapole moments. The effects of interlopers on the real-space monopole and quadrupole are shown in figure 1, which shows the average of 1000 contaminated correlation function measurements for various interloper fractions and shifts. We also consider injecting interlopers into samples that include redshift-space distortions. Here we move the full sample of haloes into redshift space by shifting their line-of-sight positions according to the halo velocities, before selecting a fraction of the haloes as interlopers. For simplicity, we assume that all lines of sight are parallel, along one axis of the simulation.
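The injection and measurement steps can be sketched for a periodic box with a plane-parallel line of sight along the z axis (observer at z → −∞, so "towards the observer" means decreasing z). This is a hypothetical brute-force illustration for small catalogues, not the Nbodykit-based pipeline used in the paper; the redshift-space helper assumes velocities in km/s, positions in $h^{-1}$ Mpc, and a user-supplied $E(z) = H(z)/H_0$.

```python
import numpy as np


def to_redshift_space(pos, vel_los, z, e_z, box, los_axis=2):
    """Move objects along the LOS by v (1 + z) / (100 E(z)) h^-1 Mpc (v in km/s), periodic wrap."""
    out = pos.copy()
    out[:, los_axis] = (out[:, los_axis] + vel_los * (1.0 + z) / (100.0 * e_z)) % box
    return out


def inject_interlopers(pos, f_i, delta, box, rng, los_axis=2):
    """Flag a random fraction f_i of objects and move them `delta` towards the observer."""
    n = len(pos)
    is_int = np.zeros(n, dtype=bool)
    is_int[rng.choice(n, size=int(round(f_i * n)), replace=False)] = True
    out = pos.copy()
    out[is_int, los_axis] = (out[is_int, los_axis] - delta) % box
    return out, is_int


def xi_dd_rr(pos, box, r_edges):
    """xi = DD/RR - 1 with normalised DD and analytic RR from the mean density (periodic box).

    O(N^2) minimum-image pair counting: only valid for r_edges < box / 2 and small N.
    """
    n = len(pos)
    i, j = np.triu_indices(n, k=1)
    sep = pos[j] - pos[i]
    sep -= box * np.round(sep / box)  # minimum-image convention
    dd, _ = np.histogram(np.sqrt((sep ** 2).sum(axis=1)), bins=r_edges)
    dd_norm = dd / (0.5 * n * (n - 1))
    rr = 4.0 / 3.0 * np.pi * (r_edges[1:] ** 3 - r_edges[:-1] ** 3) / box ** 3
    return dd_norm / rr - 1.0


rng = np.random.default_rng(1)
box = 100.0  # toy box, far smaller than the 1 h^-1 Gpc Quijote volume
halos = rng.uniform(0.0, box, size=(600, 3))
contaminated, is_interloper = inject_interlopers(halos, f_i=0.1, delta=20.0, box=box, rng=rng)
print(is_interloper.sum(), xi_dd_rr(contaminated, box, np.array([10.0, 20.0, 30.0])))
```

For redshift-space catalogues, `to_redshift_space` would be applied to the full halo sample before `inject_interlopers`, matching the ordering described above.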
To illustrate the effect of differing biases between the galaxy target and interloper samples, we also generate a set of catalogues with a different selection of haloes for targets and interlopers. For these, all objects are ordered by mass, and the most massive objects are selected as interlopers until the interloper fraction for the given catalogue is reached. For example, for a catalogue with $f_i = 0.1$, the 10 percent most massive objects are selected as interlopers.
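The mass-ranked selection replaces the random choice of interlopers with a sort. The helper below is a hypothetical sketch; the flagged objects would then be shifted along the line of sight exactly as in the random-selection case, and ties in mass are broken arbitrarily by the sort.

```python
import numpy as np


def select_most_massive(mass, f_i):
    """Flag the round(f_i * N) most massive objects as interlopers."""
    n_int = int(round(f_i * len(mass)))
    order = np.argsort(mass)[::-1]  # indices sorted by decreasing mass
    is_int = np.zeros(len(mass), dtype=bool)
    is_int[order[:n_int]] = True
    return is_int


masses = np.array([3.0, 9.0, 1.0, 7.0, 5.0, 2.0, 8.0, 4.0, 6.0, 10.0])
print(np.flatnonzero(select_most_massive(masses, 0.2)))  # -> [1 9]
```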
Modeling the BAO Feature with Interlopers

Overview of Method
In order to model Eq. 2.6, we need to estimate the galaxy target-interloper cross term accurately. Because of the interloper shift, the value of this term around the BAO scale comes from the small-scale target-interloper correlation function in true coordinates (where interloper galaxies are at their true positions), which is difficult to model because of nonlinearities.
We present an empirical method for measuring this term that uses the fact that we know ∆ exactly, and can use this to artificially apply further shifts to the contaminated catalogue, to create a new clustering measurement where the components that appear in Eq. 2.6 are further separated in scale.In this revised cross-correlation, the corrections for the terms that we are not interested in can be modelled using the large-scale clustering, such that we do not require a theoretical model of the small-scale clustering signal.This concept works in redshift-space and for situations where target and interloper galaxies have different biases.
The following subsections describe this process in detail.

The Near-Far Shifted Correlation Function
We introduce two additional catalogues. The far-shifted catalogue is the contaminated catalogue with all object positions shifted away from the observer by the known interloper shift. The near-shifted catalogue is the same contaminated catalogue with the object positions moved towards the observer by one interloper shift. Each catalogue is still made up of galaxy targets and interlopers. A schematic diagram of this process is shown in figure 2. We use the cross-correlation between the near-shifted and far-shifted catalogues,

$$\xi_{c_n c_f} = (1-f_i)^2\,\xi_{g_n g_f} + f_i^2\,\xi_{i_n i_f} + f_i(1-f_i)\left(\xi_{g_n i_f} + \xi_{i_n g_f}\right), \tag{3.1}$$

with the subscript f referring to far-shifted components and the subscript n referring to near-shifted components. Note that, unlike the case in equation 2.6, the target-interloper and interloper-target cross-correlations are now not equal in amplitude.
The key insight behind this approach is that the near-shifted targets are offset from the far-shifted interlopers by the same line-of-sight distance, Δ, as the interlopers are from the targets in the original contaminated catalogue. Hence the cross term $\xi_{g_n i_f}$ in equation 3.1 is equal to the cross term $\xi_{gi}$ in equation 2.6. Furthermore, this term dominates the near-far cross-correlation. Unfortunately, the other terms in Eq. 3.1 are not negligible and need to be included in the model. The contributions of each component to Eq. 3.1 are shown as dashed coloured lines in Fig. 3, while the full near-far cross-correlation is displayed in black. As anticipated, $\xi_{g_n i_f}$ dominates the expression. As the scales of this term match those in the contaminated auto-correlation, and we are interested in fitting BAO on scales $\sim 50$–$150\,h^{-1}$ Mpc, these are also the scales on which we need to model the other components. The next most important component is $\xi_{g_n g_f}$, followed by $\xi_{i_n i_f}$. For these terms, pairs of galaxies or interlopers at a small line-of-sight separation $r_\parallel$ in the original contaminated catalogue are mapped to a separation $r_\parallel + 2\Delta \sim r_\parallel + 200\,h^{-1}$ Mpc in the near-far shifted cross-correlation, so that small-scale clustering is mapped outside the fitting range. On the other hand, galaxies or interlopers at a large line-of-sight separation ($\sim 2\Delta$) in the original contaminated catalogue become pairs at small to intermediate separation in the near-far shifted cross-correlation. Near-far pairs separated perpendicular to the line of sight by $\sim 50$–$150\,h^{-1}$ Mpc correspond to objects at a distance larger than $2\Delta \sim 200\,h^{-1}$ Mpc in the original contaminated catalogue. Because of this, $\xi_{g_n g_f}$ and $\xi_{i_n i_f}$ are sourced by large-scale clustering, they have a small amplitude ($\xi_{i_n i_f}$ is additionally suppressed by the factor $f_i^2$), and they can be modelled using linear theory. Similarly, the final term, $\xi_{i_n g_f}$, is also the result of large-scale pairs in the contaminated catalogue and has a very small amplitude, making it important only when we work in redshift space, due to the Kaiser effect. Returning to Eq. 3.1, we can rearrange to isolate the target-interloper cross term:

$$\xi_{g_n i_f} = \frac{\xi_{c_n c_f} - (1-f_i)^2\,\xi_{g_n g_f} - f_i^2\,\xi_{i_n i_f}}{f_i(1-f_i)} - \xi_{i_n g_f}. \tag{3.2}$$
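The scale separation exploited by the near/far cross-correlation is bookkeeping of line-of-sight offsets. The hypothetical helper below adds up, for each pair type, the interloper misplacement (−Δ for an interloper, which is observed nearer than its true position) and the catalogue shift (−Δ near, +Δ far, 0 for the unshifted catalogue). Offsets are in units of Δ, with positive meaning the second object of the pair lies further from the observer; this sign convention is an assumption of the sketch.

```python
def los_offset(shift_1, shift_2, interloper_1, interloper_2, delta=1.0):
    """Net LOS offset (object 2 minus object 1) added to a pair's true separation.

    Each object moves by its catalogue shift, and by a further -delta if it is an interloper.
    """
    pos_1 = shift_1 - delta * interloper_1
    pos_2 = shift_2 - delta * interloper_2
    return pos_2 - pos_1


NEAR, FAR, NONE = -1.0, 1.0, 0.0
terms = {
    "gi (contaminated auto)": los_offset(NONE, NONE, False, True),
    "ig (contaminated auto)": los_offset(NONE, NONE, True, False),
    "gn-gf (near/far)": los_offset(NEAR, FAR, False, False),
    "in-if (near/far)": los_offset(NEAR, FAR, True, True),
    "gn-if (near/far)": los_offset(NEAR, FAR, False, True),
    "in-gf (near/far)": los_offset(NEAR, FAR, True, False),
}
for name, offset in terms.items():
    print(f"{name}: {offset:+.0f} Delta")
```

The magnitudes reproduce the structure used in the model: $|\xi_{g_n i_f}|$ sits at the same offset Δ as the original cross term, $\xi_{g_n g_f}$ and $\xi_{i_n i_f}$ move to 2Δ, and $\xi_{i_n g_f}$ moves to 3Δ, matching the $M_{2\Delta}$ and $M_{3\Delta}$ mappings. The signs depend on the pair-ordering convention and only affect odd multipoles.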
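As $\xi_{gi} = \xi_{g_n i_f}$, we can substitute into equation 2.6 and form a model for the contaminated correlation function:

$$\xi_{cc} = (1-f_i)^2\,\xi_{gg} + f_i^2\,\xi_{ii} + 2\,\xi_{c_n c_f} - 2(1-f_i)^2\,\xi_{g_n g_f} - 2f_i^2\,\xi_{i_n i_f} - 2f_i(1-f_i)\,\xi_{i_n g_f}. \tag{3.3}$$

We now have a model for the contaminated correlation function using only the measured near/far-shifted cross-correlation $\xi_{c_n c_f}$ and the usual large-scale auto-correlation template to describe $\xi_{gg}$, $\xi_{ii}$, and the terms that may be modelled with the same auto-correlation.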

Including interlopers in the BAO model
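Before a correlation function or power spectrum can be measured, a fiducial cosmology must be assumed to translate redshifts into physical separations. The BAO technique relies on the apparent shift of the peak when the true cosmology differs from the assumed fiducial cosmology. This distortion is characterised using the Alcock-Paczynski (AP) parameters α and ϵ [9],

$$\alpha = \left[\frac{D_A^2(z)\,H^{\rm fid}(z)}{\left(D_A^{\rm fid}(z)\right)^2 H(z)}\right]^{1/3}\frac{r_s^{\rm fid}}{r_s}, \qquad 1+\epsilon = \left[\frac{D_A^{\rm fid}(z)\,H^{\rm fid}(z)}{D_A(z)\,H(z)}\right]^{1/3},$$

where $D_A$ is the angular diameter distance, $H$ the Hubble parameter, and $r_s$ the comoving sound horizon at the baryon drag epoch. Using these parameters, we may relate the true and fiducial parallel and perpendicular separations,

$$r_\parallel^{\rm true} = \alpha(1+\epsilon)^2\,r_\parallel^{\rm fid}, \qquad r_\perp^{\rm true} = \alpha(1+\epsilon)^{-1}\,r_\perp^{\rm fid}.$$

The fiducial and true polar coordinates are therefore related by

$$r^{\rm true} = \alpha\,r^{\rm fid}\sqrt{(1+\epsilon)^4\mu_{\rm fid}^2 + (1+\epsilon)^{-2}\left(1-\mu_{\rm fid}^2\right)}, \qquad \mu^{\rm true} = \frac{(1+\epsilon)^2\,\mu_{\rm fid}}{\sqrt{(1+\epsilon)^4\mu_{\rm fid}^2 + (1+\epsilon)^{-2}\left(1-\mu_{\rm fid}^2\right)}}.$$

In order to extract the BAO signal and marginalise over the other information in the correlation function, it is standard to fit a model such as

$$\xi^{\rm model}_\ell(r) = B\,\xi^{\rm galaxy}_\ell(r;\alpha,\epsilon) + A_\ell(r),$$

where $\xi^{\rm galaxy}$ is the large-scale auto-correlation template that we describe in section 3.4, the multiplicative factor $B$ adjusts the amplitude, and the polynomial term $A_\ell(r)$, with free coefficients $a_i$, accounts for scale-dependent bias and matches the overall broadband shape [8]. Here α and ϵ are the AP parameters, and the $a_i$ and $B$ are nuisance parameters. These nuisance parameters are chosen such that they can modulate the template to match the data, but without the flexibility to fit the BAO peak itself. With this form for the auto-correlation multipoles, we can combine it with Eq. 3.3 to construct models for the multipole moments of the contaminated correlation function:

$$\begin{aligned}
\xi_{cc,\ell}(r) - 2\,\xi_{c_n c_f,\ell}(r) = {}& (1-f_i)^2 B_1\,\xi^{\rm galaxy}_\ell(r) + A_\ell(r) \\
&+ f_i^2 B_2\,\xi^{\rm galaxy}_\ell(r) \\
&- 2(1-f_i)^2 B_1\,\big\{M_{2\Delta}[\xi^{\rm galaxy}]\big\}_\ell(r) \\
&- 2f_i^2 B_2\,\big\{M_{2\Delta}[\xi^{\rm galaxy}]\big\}_\ell(r) \\
&- 2f_i(1-f_i)\sqrt{B_1 B_2}\,\big\{M_{3\Delta}[\xi^{\rm galaxy}]\big\}_\ell(r).
\end{aligned}\tag{3.8}$$

Here, we have used a mapping shift operator $M_X$, introduced in [5], acting on the full two-dimensional correlation function $\xi^{\rm galaxy}$. This operator acts on the argument of a correlation function by mapping a separation $(r, \mu)$ into $(r_{\rm shift}, \mu_{\rm shift})$, describing the distortion due to a relative shift $X$ along the line of sight between the two objects in each pair. For any pair with initial separation $(r, \mu)$, the shifted coordinates are given by a simple trigonometric mapping for a relative line-of-sight shift $X$,

$$r_{\rm shift} = \sqrt{r^2 + 2r\mu X + X^2}, \qquad \mu_{\rm shift} = \frac{r\mu + X}{r_{\rm shift}}. \tag{3.9}$$

For example, we have written our unweighted estimate of $\xi_{g_n g_f}$ as $\xi_{g_n g_f} \approx M_{2\Delta}[\xi_{gg}]$. This takes the auto-correlation of galaxies at their true positions and generates the cross-correlation of two catalogues of galaxies separated by $2\Delta$. In $\xi_{i_n g_f}$, the galaxy and interloper populations are shifted by $2\Delta$ relative to each other from their observed positions (as with all terms in the near/far shifted correlation), but the interlopers had already been shifted by $1\Delta$ from their true positions. Thus, we write the unweighted estimate of $\xi_{i_n g_f}$ as

$$\xi_{i_n g_f} \approx M_{3\Delta}[\xi_{gg}].$$

For the third and fourth lines of Eq. 3.8, note that the polynomial term described above is not included in the estimate of the auto-correlation. As described in section 3.2, these terms are derived from large-scale correlations and can be modelled using linear theory, so the polynomial term is not required here.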
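The coordinate mappings described above translate directly into array operations. The sketch below (hypothetical helper names) implements the AP mapping from fiducial to true $(r, \mu)$, the shift operator $M_X$ evaluated by inverting the pair mapping, and Legendre multipoles by Gauss-Legendre quadrature. The template is any callable $\xi(r, \mu)$; applying the shift in fiducial (map) coordinates after the AP dilation is an assumption of this sketch, and the Gaussian bump is a toy stand-in for a real template.

```python
import numpy as np


def ap_fid_to_true(r_fid, mu_fid, alpha, eps):
    """True (r, mu) for a pair at fiducial (r, mu): r_par * alpha(1+eps)^2, r_perp * alpha/(1+eps)."""
    r_par = alpha * (1.0 + eps) ** 2 * r_fid * mu_fid
    r_perp = alpha / (1.0 + eps) * r_fid * np.sqrt(1.0 - mu_fid ** 2)
    r_true = np.hypot(r_par, r_perp)
    return r_true, r_par / r_true


def shift_operator(xi_2d, shift):
    """M_X[xi]: a pair at (r0, mu0) moves to r_par = r0 mu0 + X, so evaluate xi at the
    pre-shift separation obtained by subtracting X from the LOS component."""
    def shifted(r, mu):
        r_par = r * mu - shift
        r0 = np.sqrt(r_par ** 2 + r ** 2 * (1.0 - mu ** 2))
        return xi_2d(r0, r_par / r0)
    return shifted


def multipoles(xi_2d, r, ells=(0, 2), n_mu=128):
    """xi_ell(r) = (2 ell + 1)/2 * int_{-1}^{1} xi(r, mu) L_ell(mu) dmu, by Gauss-Legendre."""
    mu, weights = np.polynomial.legendre.leggauss(n_mu)
    rr, mm = np.meshgrid(r, mu, indexing="ij")
    values = xi_2d(rr, mm)
    return {
        ell: (2 * ell + 1) / 2.0
        * np.sum(values * np.polynomial.legendre.Legendre.basis(ell)(mm) * weights, axis=1)
        for ell in ells
    }


sigma, shift = 20.0, 100.0


def template(r, mu):
    """Toy isotropic template: a Gaussian bump at r = 0 standing in for small-scale clustering."""
    return np.exp(-r ** 2 / (2.0 * sigma ** 2))


def dilated_template(r, mu, alpha=1.02, eps=0.01):
    """Template as seen at fiducial (r, mu) for illustrative AP parameters."""
    return template(*ap_fid_to_true(r, mu, alpha, eps))


r = np.linspace(60.0, 140.0, 17)
poles = multipoles(shift_operator(template, shift), r)
shifted_dilated = multipoles(shift_operator(dilated_template, 2 * shift), r)
print(np.round(poles[0], 5))
```

For the shifted bump the monopole peaks just below $r = X$, illustrating how the shift operator moves small-scale clustering out to the shift scale.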
We have written these equations in a way that allows us to include differing bias schemes for interlopers and targets. A reasonable approach is to split the amplitude parameter $B$ into $B_1$ and $B_2$ to account for two different biases, weighted by the interloper fraction for targets and interlopers, respectively. This also allows us to estimate $\xi_{i_n g_f}$, since we can use the mapping for $3\Delta$ while including factors of $\sqrt{B_1}$ and $\sqrt{B_2}$ (as $B_1$ and $B_2$ are the squares of the usual biases) to scale this term consistently with the others. However, unless otherwise stated, we set $B_1 = B_2 = B$ when fitting mocks with only a single bias.
When fitting both multipoles simultaneously, this results in a model with 11 parameters. When not working in redshift space, we have found that we can omit the last term of Eq. 3.8, which contains $M_{3\Delta}$.
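Once the template multipoles and their shifted versions are tabulated, assembling the model of Eq. 3.8 is a weighted sum. The sketch below assumes the arrangement described in the text: the broadband polynomial only on the target template term, $B_1$ for targets, $B_2$ for interlopers, $\sqrt{B_1 B_2}$ on the $M_{3\Delta}$ cross term, and an option to drop that term in real space. The function name and argument layout are hypothetical.

```python
import numpy as np


def near_far_corrected_model(f_i, b1, b2, xi_gal, xi_m2, xi_m3, poly, include_m3=True):
    """Model for one multipole of xi_cc - 2 xi_cncf on a grid of r.

    xi_gal : template multipole (already AP-dilated)
    xi_m2  : multipole of M_{2 Delta} applied to the template
    xi_m3  : multipole of M_{3 Delta} applied to the template
    poly   : broadband polynomial A_ell(r), attached to the target term only
    """
    g = 1.0 - f_i
    model = g ** 2 * b1 * xi_gal + poly          # targets
    model = model + f_i ** 2 * b2 * xi_gal        # interlopers
    model = model - 2.0 * g ** 2 * b1 * xi_m2     # xi_gngf correction
    model = model - 2.0 * f_i ** 2 * b2 * xi_m2   # xi_inif correction
    if include_m3:                                # xi_ingf, needed mainly in redshift space
        model = model - 2.0 * f_i * g * np.sqrt(b1 * b2) * xi_m3
    return model
```

Setting `b1 == b2` recovers the single-amplitude case used for most mocks; the polynomial coefficients inside `poly` would be free parameters in the fit.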

Obtaining the auto-correlation template
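We now describe how we obtain the template $\xi^{\rm galaxy}$. We first obtain the linear matter power spectrum $P_{\rm lin}(k)$ from CAMB and produce a real-space model

$$P(k,\mu) = \left[P_{\rm lin}(k) - P_{\rm smooth}(k)\right]\exp\!\left[-\frac{k^2\left(\mu^2\Sigma_\parallel^2 + (1-\mu^2)\Sigma_\perp^2\right)}{2}\right] + P_{\rm smooth}(k), \tag{3.10}$$

where $P_{\rm smooth}$ is the linear power spectrum smoothed as in [13] to remove the BAO wiggles.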
The Gaussian in the first term damps the BAO wiggles ($P_{\rm lin} - P_{\rm smooth}$) to account for late-time non-linear evolution. We use $\Sigma_\perp = 4.8\,h^{-1}$ Mpc and $\Sigma_\parallel = 7.3\,h^{-1}$ Mpc at $z = 1$ in the fiducial cosmology.
In redshift space, there is the additional effect of redshift-space distortions (RSD). The two primary effects are the Fingers-of-God (FoG) [14] and Kaiser ("Pancakes-of-God") [15] effects, resulting in the following modification to the redshift-space power spectrum [16]:

$$P_s(k,\mu) = (1+\beta\mu^2)^2\,F(k,\mu)\,P(k,\mu), \tag{3.11}$$

where β is the Kaiser RSD parameter and $F(k,\mu)$ describes the FoG damping. As we are using haloes in place of galaxies, we can consider the FoG effect to be negligible, and we use $F(k,\mu) = 1$. We then Fourier transform the redshift-space power spectrum to obtain the two-point correlation function and its multipole moments.
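A sketch of the template construction, assuming $P_{\rm lin}$ and $P_{\rm smooth}$ have already been tabulated on a $k$ grid (for example from CAMB plus a no-wiggle fit, not shown): anisotropic Gaussian damping of the wiggles, the Kaiser factor with $F = 1$, Legendre multipoles of $P(k, \mu)$, and the transform $\xi_\ell(r) = i^\ell/(2\pi^2)\int k^2 P_\ell(k)\, j_\ell(kr)\,{\rm d}k$ evaluated with the trapezoidal rule. Function names are hypothetical and only $\ell = 0, 2$ are handled; a production pipeline would more commonly use a dedicated FFTLog-type Hankel transform.

```python
import numpy as np


def damped_pk(k, mu, p_lin, p_smooth, sigma_perp=4.8, sigma_par=7.3, beta=0.0):
    """(1 + beta mu^2)^2 [(P_lin - P_smooth) exp(-k^2 (mu^2 S_par^2 + (1 - mu^2) S_perp^2) / 2) + P_smooth]."""
    damping = np.exp(-0.5 * k ** 2 * (mu ** 2 * sigma_par ** 2 + (1.0 - mu ** 2) * sigma_perp ** 2))
    return (1.0 + beta * mu ** 2) ** 2 * ((p_lin - p_smooth) * damping + p_smooth)


def pk_multipoles(k, p_lin, p_smooth, ells=(0, 2), n_mu=32, **kwargs):
    """P_ell(k) = (2 ell + 1)/2 * int P(k, mu) L_ell(mu) dmu, by Gauss-Legendre."""
    mu, w = np.polynomial.legendre.leggauss(n_mu)
    kk, mm = np.meshgrid(k, mu, indexing="ij")
    pkmu = damped_pk(kk, mm, p_lin[:, None], p_smooth[:, None], **kwargs)
    return {ell: (2 * ell + 1) / 2.0
            * np.sum(pkmu * np.polynomial.legendre.Legendre.basis(ell)(mm) * w, axis=1)
            for ell in ells}


def spherical_bessel(ell, x):
    """j_0 and j_2 for x > 0."""
    if ell == 0:
        return np.sin(x) / x
    if ell == 2:
        return (3.0 / x ** 2 - 1.0) * np.sin(x) / x - 3.0 * np.cos(x) / x ** 2
    raise ValueError("only ell = 0, 2 implemented")


def xi_multipole(k, pk_ell, r, ell):
    """xi_ell(r) = i^ell / (2 pi^2) * int k^2 P_ell(k) j_ell(k r) dk (trapezoidal rule)."""
    kr = np.outer(r, k)
    integrand = k ** 2 * pk_ell * spherical_bessel(ell, kr)
    integral = np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * np.diff(k), axis=1)
    return (-1) ** (ell // 2) * integral / (2.0 * np.pi ** 2)
```

A quick sanity check is a Gaussian $P(k) = e^{-k^2\sigma^2/2}$, whose monopole transforms to $(2\pi\sigma^2)^{-3/2} e^{-r^2/(2\sigma^2)}$, and a featureless spectrum, whose Kaiser multipoles are $(1 + 2\beta/3 + \beta^2/5)P$ and $(4\beta/3 + 4\beta^2/7)P$.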
We have also considered using analytic templates for the real-space power spectrum from HALOFIT [17,18] for the additional terms in the near-far shifted correlation function, to see whether these affect our fits. We find little difference, again a consequence of only needing to model these terms on large scales.

Testing the method using simulations
In this section, we present the results of fitting our model to the contaminated catalogues from the simulations. We fit the mean of the monopole and quadrupole moments of 1000 correlation functions in redshift space with a Markov chain Monte Carlo (MCMC) analysis, using the emcee Python package [19] to sample the posterior distribution [20]

$$f(\boldsymbol{\theta}\,|\,\boldsymbol{\xi}) \propto \left[1 + \frac{\chi^2(\boldsymbol{\theta})}{n_s - 1}\right]^{-m/2}, \qquad \chi^2(\boldsymbol{\theta}) = \left(\boldsymbol{\xi} - \boldsymbol{\xi}^{\rm model}(\boldsymbol{\theta})\right)^{T} C^{-1}\left(\boldsymbol{\xi} - \boldsymbol{\xi}^{\rm model}(\boldsymbol{\theta})\right),$$

where $n_s$ is the number of simulations, $m$ is given by Eq. 54 in [20], $C$ is the covariance matrix, and the data vector is $\boldsymbol{\xi} = \boldsymbol{\xi}_{\rm data} = \xi_{cc} - 2\xi_{c_n c_f}$, evaluated at all fitted points. The BAO fit uses the monopole and quadrupole within the range $r \in [50, 120]\,h^{-1}$ Mpc. To quantify the performance of our BAO fitting model in Eq. 3.8, and how it depends on the amount of interlopers in the contaminated catalogue and on the interloper displacement (and thus on the redshift of the survey), we consider 12 scenarios given by combining 3 values of the interloper displacement, $\Delta = (85, 90, 97)\,h^{-1}$ Mpc, with 4 values of the interloper fraction, $f_i = (0.02, 0.05, 0.10, 0.15)$.
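A minimal sketch of the likelihood ingredients, assuming per-mock measurements of $\xi_{cc}$ and $\xi_{c_n c_f}$ stacked as $(n_s, n_{\rm bins})$ arrays: the data vectors, their sample covariance, the $\chi^2$ of a model, and a log-posterior of the form $(1 + \chi^2/(n_s-1))^{-m/2}$ up to a constant. The exponent $m$ must be computed from Eq. 54 of [20] and is simply passed in; the function names are hypothetical. The resulting log-posterior, plus a log-prior, is the kind of callable one would hand to an MCMC sampler such as emcee.

```python
import numpy as np


def data_vectors(xi_cc, xi_cncf):
    """Per-mock data vectors xi_cc - 2 xi_cncf; inputs have shape (n_s, n_bins)."""
    return xi_cc - 2.0 * xi_cncf


def sample_covariance(vectors):
    """Unbiased sample covariance over mocks (rows)."""
    return np.cov(vectors, rowvar=False)


def chi_squared(data, model, cov_inv):
    resid = data - model
    return float(resid @ cov_inv @ resid)


def log_posterior(chi2, n_s, m):
    """log of (1 + chi2 / (n_s - 1))^(-m/2), dropping theta-independent constants."""
    return -0.5 * m * np.log1p(chi2 / (n_s - 1.0))


# Toy stand-ins for measured multipoles (random numbers, not simulation output).
rng = np.random.default_rng(0)
n_s, n_bins = 1000, 30
vectors = data_vectors(rng.normal(size=(n_s, n_bins)), rng.normal(size=(n_s, n_bins)))
cov_inv = np.linalg.inv(sample_covariance(vectors))
mean_vector = vectors.mean(axis=0)
```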

Results
First, we performed the BAO fit using a model that does not account for the presence of interlopers (using only the first line of Eq. 3.8) to quantify how interlopers bias the estimates of the dilation parameters α and ϵ. Our findings are consistent with those in [4,5]: the systematic errors on both dilation parameters increase with the interloper fraction of the contaminated catalogue, reaching maximum values of 0.048 and 0.15 for α and ϵ, respectively, when $f_i = 0.15$. Results for the 12 combinations of interloper fractions and displacements are shown in red in Fig. 4. In general, the bias in the AP parameters increases with interloper fraction, and it increases faster when the interloper displacement is further from the BAO peak. For example, consider $\Delta = 85\,h^{-1}$ Mpc, the displacement furthest from the BAO peak among those we consider: the systematic error on ϵ ranges from about 0.05 to 0.15. With $\Delta = 97\,h^{-1}$ Mpc, much closer to the BAO peak at $\sim 100\,h^{-1}$ Mpc, the bias on ϵ is smaller at $f_i = 0.02$ and reaches only $\sim 0.07$, spanning about half the range of the $\Delta = 85\,h^{-1}$ Mpc case. The average systematic error on the AP parameters decreases with increasing interloper displacement Δ, as the interloper peak overlaps more with the BAO peak. This trend does not continue indefinitely: as the interloper peak is moved still further from the BAO peak, it leaves the fitting range and its effect on the AP parameters is significantly reduced. The peak systematic error in both parameters is obtained when the absolute difference between the positions of the two peaks is $\sim 30\,h^{-1}$ Mpc [5].

The fitting model requires a description of the uncontaminated galaxy correlation function $\xi^{\rm galaxy}$, which involves a linear prediction for the matter power spectrum, a model for the BAO damping that accounts for nonlinear evolution around the BAO scale only, and a linear bias scheme to describe the difference between matter and galaxy clustering. These are approximations: additional parameters, such as amplitude parameters and polynomial terms, are introduced to account for residual nonlinearities. In order to distinguish any bias caused by these approximations from the novel way in which we model the galaxy-interloper cross-correlation via Eq. 3.2, we consider a model where $\xi^{\rm galaxy}$ is measured directly from the simulations and inserted in all the terms of Eq. 3.8. Results are shown as blue points in Fig. 4 and in Table 1. This model is accurate and gives unbiased estimates of all three parameters of interest. The majority of systematic errors are within the statistical errors, and the systematic error on all three parameters remains consistent regardless of interloper fraction or displacement.
We proceed by implementing the full model, where $\xi^{\rm galaxy}$ is computed from the CAMB power spectrum following the procedure explained in Sec. 3.4. These fits are the main results of this paper; they are shown as orange squares in Fig. 4 and reported in Table 2. In this case, both the statistical and systematic errors increase compared to using the measured galaxy auto-correlation. In particular, the systematic error is about 0.4–0.6% for α, which is consistent with the systematic error attributed to the standard pre-reconstruction BAO fitting pipeline in the absence of interlopers [21]. Estimates of ϵ remain relatively unchanged and are generally still within the statistical errors. As expected, we see significantly poorer results for $f_i$ in particular. The case with $\Delta = 90\,h^{-1}$ Mpc is notably different from the other two, with more biased estimates of $f_i$ and ϵ, and less biased estimates of α. For this displacement, $f_i$ is significantly overestimated regardless of interloper fraction. The opposite is true for $\Delta = 97\,h^{-1}$ Mpc, where the interloper fraction is underestimated for most input fractions (see Table 2).
Finally, we consider the catalogues where interlopers have a greater bias than the galaxies. The results are reported in Table 3 and shown in Fig. 4 as green triangles. In this case, it is necessary to use the two amplitude parameters $B_1$ and $B_2$ to account for the relative bias between galaxies and interlopers. We see an increased statistical error on the estimates of $f_i$, and this parameter is now consistently overestimated by 1–2σ. This is in better agreement with the true values, although only because of the inflated uncertainty. The estimates of ϵ remain robust in all test cases.

Galaxy bias
The primary difference between fits using the measured galaxy auto-correlation and those using a CAMB template lies in the estimates of $f_i$. While the measured correlation function gives unbiased estimates of $f_i$, when CAMB is used the $B$ parameter is required to adjust the amplitude, and in the case with differing biases we require two parameters, $B_1$ and $B_2$. These parameters are highly degenerate with $f_i$ (see Fig. 5), as they are all amplitude-like parameters. It is possible to improve the estimation of $f_i$ when using CAMB by changing the prior on $B$ from a flat prior to a Gaussian prior centred on $B = 1$ with standard deviation 0.02. We show the results using this Gaussian prior in Fig. 6 in orange. This prior causes no change in the estimates of the AP parameters compared to the flat-prior case (displayed in blue in the same figure). It is important to note that applying such a prior is equivalent to assuming knowledge of the error on the underlying halo bias, which is not measured by this method.
In the case where target and interloper galaxies exhibit differing biases, it is not as simple to set a similar prior on both $B_1$ and $B_2$. A Gaussian prior on $B_1$ and $B_2$ needs to be wide enough to capture a sufficient range of possible biases, which makes it no different from a flat prior. Conversely, offsetting the priors on $B_1$ and $B_2$ assumes that we know which population, targets or interlopers, is more biased, an assumption we do not make in this paper. There is also the possibility of placing a prior on the fraction of interlopers. Just as a prior on $B$ is equivalent to knowing the halo or galaxy bias to some confidence, a prior on $f_i$ is equivalent to some knowledge of the expected interloper fraction. Observational estimates are indeed possible using other surveys, for example via the [OIII]/Hβ equivalent width ratios measured by the MOSDEF Survey [22], which give an expected interloper fraction of around 5% for Roman [4]. Upcoming deep fields may also be used to constrain the expected $f_i$. It is then straightforward to place a Gaussian prior on the interloper fraction within our method.
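The prior choices discussed above amount to a few extra terms in the log-prior. In this minimal sketch, the Gaussian prior on $B$ (centred on 1, width 0.02) follows the text, while the flat box bounds on α, ϵ and $f_i$, and any width used for an $f_i$ prior, are hypothetical placeholders.

```python
import math


def log_gaussian(x, mean, sd):
    """Unnormalised log of a Gaussian density."""
    return -0.5 * ((x - mean) / sd) ** 2


def log_prior(alpha, eps, f_i, b, b_prior=(1.0, 0.02), f_i_prior=None):
    """Flat box priors (hypothetical bounds) plus optional Gaussian priors on B and f_i."""
    if not (0.8 < alpha < 1.2 and -0.2 < eps < 0.2 and 0.0 <= f_i < 1.0 and b > 0.0):
        return -math.inf
    lp = 0.0
    if b_prior is not None:
        lp += log_gaussian(b, *b_prior)
    if f_i_prior is not None:
        lp += log_gaussian(f_i, *f_i_prior)
    return lp


print(log_prior(1.0, 0.0, 0.05, 1.02))                      # B one sigma from 1
print(log_prior(1.0, 0.0, 0.07, 1.0, f_i_prior=(0.05, 0.01)))  # hypothetical f_i prior
```

Passing `b_prior=None` recovers the flat prior on $B$.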

Periodic Boundary Conditions
Because we work in a simulation box with periodic boundary conditions, the total pair count in $\xi_{c_n c_f}$ is unchanged compared to $\xi_{cc}$: all pairs with the same initial separation have the same new separation after the shift, and the information contained in both correlation functions is the same. However, in the near-far cross-correlation we have changed the relative scales at which the different components overlap, which allows us to separate and model them more cleanly to extract the cross-correlation term.
In a survey volume, we cannot apply a periodic boundary condition as in the simulation box. Instead, we would also need to shift the randoms used to determine the expected density and calculate the correlation function. In this way, the pair counts on all scales are compared to their expected counts in a random field, with the correct normalisation for the large-scale correlations introduced by the near/far shift.
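For a survey, the shifts act along each object's own radial line of sight, and the randoms must be shifted together with the data. The sketch below assumes an observer at the origin and positions in comoving $h^{-1}$ Mpc. The text does not specify the estimator, so the Landy-Szalay-type cross-correlation estimator used here is an assumption, and the pair counting itself (any tree- or grid-based code) is not shown. Function names are hypothetical.

```python
import numpy as np


def shift_along_los(pos, shift):
    """Move each object by `shift` along its own radial line of sight (observer at the origin).

    Positive shift = away from the observer; assumes every object has radius > |shift|.
    """
    radius = np.linalg.norm(pos, axis=1, keepdims=True)
    return pos * (1.0 + shift / radius)


def near_far_catalogues(data, randoms, delta):
    """Near (-delta) and far (+delta) copies of both data and randoms."""
    return {
        "data_near": shift_along_los(data, -delta),
        "data_far": shift_along_los(data, +delta),
        "rand_near": shift_along_los(randoms, -delta),
        "rand_far": shift_along_los(randoms, +delta),
    }


def landy_szalay_cross(dn_df, dn_rf, rn_df, rn_rf):
    """Cross-correlation estimator from pair counts normalised by the product of catalogue sizes:
    (DnDf - DnRf - RnDf + RnRf) / RnRf, per separation bin."""
    return (dn_df - dn_rf - rn_df + rn_rf) / rn_rf
```

Shifting the randoms preserves the survey window in each copy, so the near-far pair counts are normalised by random pairs that experienced exactly the same shift.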

Testing for additional systematics
We now compare how the statistical and systematic errors change between our baseline model including the near-far derived correction for interlopers, with results of fits using the following data inputs and covariances.
• The Noiseless Near/Far Correlation ($\xi_{cc} - 2\langle\xi_{c_n c_f}\rangle$, $C_{cc}$): Here, instead of considering $\xi_{cc} - 2\xi_{c_n c_f}$ for each simulation, we subtract twice the mean $\xi_{c_n c_f}$ from $\xi_{cc}$ for each simulation. In this case, the covariance should be calculated from $\xi_{cc}$ alone. In practice, we perform a single fit to the mean over all simulations rather than fitting each one, so this method differs only in the covariance matrix used. Fitting with $C_{cc}$ results in systematic errors on the AP parameters that are nearly identical to our baseline, but with statistical errors that are on average 30% smaller for interloper fractions of 2%–15%, with a weak dependence on the fraction. Thus, including the near/far-shifted measurements in the covariance does not bias the results, but it does increase the uncertainty.
• The No Interlopers Case ($\xi_{gg}$, $C_{gg}$): We also compare our results for the AP parameters to a fit on a dataset without interlopers. In this case, the data vector is $\xi_{gg}$, the auto-correlation of galaxies, with associated covariance $C_{gg}$, and the fitting model is given by the first line of Eq. 3.8 with $f_i$ set to zero. The systematic errors on α and ϵ in this case are consistent with those obtained by applying our method to contaminated catalogues, while the statistical errors on the AP parameters are approximately 25%–40% smaller than for interloper fractions of 2%–15%. Therefore, the systematic error in our method is attributable to the standard BAO modelling procedure.

Dipole of the near/far cross-correlation
In a 2D correlation function, a non-zero dipole arises when the function is not invariant under reflection along the line of sight or, equivalently, under a change in the sign of µ, the cosine of the angle between the line of sight and the separation vector of each pair. Changing the sign of µ corresponds to considering the pair BA instead of the pair AB. When measuring a cross-correlation function, one selects two different populations and chooses which one appears first in the pair counting; separations and angles are then measured from the first to the second object in each pair. In an auto-correlation, this choice is not made. The near/far shifted correlation function is a cross-correlation between the same catalogue shifted in two opposite directions, so that the first object in each pair always belongs to the near-shifted catalogue and the second object always belongs to the far-shifted catalogue. Thus we expect a dipole in the near/far contaminated correlation function.
This dipole is due to two effects. First, an artificial dipole appears in the near/far shifted correlation because it is a cross-correlation between the same catalogue shifted along the line of sight in two different directions. The shift creates an artificial dipole in the contaminated clustering through the terms $\xi_{g_n g_f}$ and $\xi_{i_n i_f}$. Second, the near/far dipole also has features that are due to the presence of interlopers in the contaminated catalogue.
In the contaminated correlation function, the 2D galaxy-interloper cross-correlation can be obtained via ξ_gi ∼ M_Δ[ξ_gg], where M is the transformation in Eq. 3.9. In that transformation, the change µ → −µ is equivalent to the change of variable Δ → −Δ; therefore ξ_ig ∼ M_{−Δ}[ξ_gg]. While the two cross-correlations ξ_gi and ξ_ig have equal even-multipole components, their odd multipoles differ. The left panel of Fig. 7 shows the dipoles of the two cross-correlations, as predicted via the M_{±Δ}[ξ_gg] mappings, together with the mean of measurements over 10 mocks. The dipoles have the same amplitude but opposite sign as a function of separation r, so they sum to zero. Since we cannot distinguish between interlopers and target galaxies in observed surveys, these two terms cannot be measured separately, and the dipole cannot be detected in the contaminated correlation function.
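The symmetry argument can be written out explicitly. The sketch below assumes that M_Δ acts as a rigid shift of the line-of-sight separation by Δ in a plane-parallel geometry, and that ξ_gg depends on its angle only through µ'²; this is our reading of the mapping, and the exact form of Eq. 3.9 may contain additional factors.

```latex
% Assumed form of the mapping (plane-parallel, interloper displaced by +\Delta along the LoS)
\mathcal{M}_{\Delta}[\xi_{gg}](r,\mu) = \xi_{gg}(s,\mu'), \qquad
s = \sqrt{r^{2} - 2r\mu\Delta + \Delta^{2}}, \qquad
\mu' = \frac{r\mu - \Delta}{s}.
% Under (\mu,\Delta) \to (-\mu,-\Delta): s is unchanged and \mu' \to -\mu'.
% Since \xi_{gg} depends on \mu' only through \mu'^{2}:
\mathcal{M}_{-\Delta}[\xi_{gg}](r,\mu) = \mathcal{M}_{\Delta}[\xi_{gg}](r,-\mu)
\;\Rightarrow\; \xi_{ig}(r,\mu) = \xi_{gi}(r,-\mu).
% Multipoles, using L_\ell(-\mu) = (-1)^{\ell} L_\ell(\mu):
\xi_{\ell}^{ig}(r) = \frac{2\ell+1}{2}\int_{-1}^{1}\xi_{ig}(r,\mu)\,L_{\ell}(\mu)\,d\mu
= (-1)^{\ell}\,\xi_{\ell}^{gi}(r).
```

Under these assumptions the even multipoles of ξ_gi and ξ_ig coincide and the odd ones are equal and opposite, so the dipoles cancel in their sum.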
When considering the near/far shifted correlation function, the dipoles in ξ_{i_n g_f} and ξ_{g_n i_f} appear at different scales because the catalogues have been shifted. Indeed, ξ_{g_n i_f} ∼ M_Δ[ξ_gg] and ξ_{i_n g_f} ∼ M_{3Δ}[ξ_gg], so all their multipoles (even and odd) differ as functions of separation r. The right panel of Fig. 7 shows the dipoles in ξ_{i_n g_f} and ξ_{g_n i_f}, which appear at different scales and, unlike in the contaminated correlation function, do not sum to zero. Additionally, they now share the same sign, another consequence of shifting the contaminated catalogue to generate the near- and far-shifted catalogues.
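A minimal numerical check of the two situations described above is sketched below. It assumes that M_Δ evaluates ξ_gg at the separation shifted by Δ along the line of sight (our reading, not necessarily the exact Eq. 3.9), and uses a toy isotropic correlation function (a smooth core plus a BAO-like bump) that is purely illustrative; the function names are hypothetical. The dipole is the ℓ = 1 Legendre projection, computed with NumPy's Gauss-Legendre quadrature.

```python
import numpy as np

def xi_toy(s):
    """Toy isotropic correlation function: smooth core plus a BAO-like bump.
    Purely illustrative; not fitted to any simulation."""
    return np.exp(-s**2 / (2 * 20.0**2)) + 0.02 * np.exp(-(s - 100.0)**2 / (2 * 10.0**2))

def shifted_xi(r, mu, delta):
    """Assumed form of M_delta[xi]: evaluate xi at |r - delta * z_hat|,
    i.e. at sqrt(r^2 - 2 r mu delta + delta^2)."""
    s = np.sqrt(r**2 - 2.0 * r * mu * delta + delta**2)
    return xi_toy(s)

def dipole(r, delta, n_mu=200):
    """Dipole xi_1(r) = (3/2) * integral_{-1}^{1} xi(r, mu) * mu dmu,
    evaluated with Gauss-Legendre quadrature (P_1(mu) = mu)."""
    mu, w = np.polynomial.legendre.leggauss(n_mu)
    vals = shifted_xi(r[:, None], mu[None, :], delta)
    return 1.5 * (vals * mu[None, :] * w[None, :]).sum(axis=1)

r = np.linspace(5.0, 300.0, 60)   # separations [h^-1 Mpc]
delta = 85.0                      # toy interloper shift [h^-1 Mpc]

# Contaminated catalogue: xi_gi ~ M_{+Delta}, xi_ig ~ M_{-Delta}
d_gi = dipole(r, +delta)
d_ig = dipole(r, -delta)
cancel = d_gi + d_ig              # opposite-sign dipoles cancel

# Near/far shifted catalogues: xi_{g_n i_f} ~ M_{Delta}, xi_{i_n g_f} ~ M_{3 Delta}
d_gn_if = dipole(r, delta)
d_in_gf = dipole(r, 3 * delta)
no_cancel = d_gn_if + d_in_gf     # dipoles sit at different r and survive

print(np.max(np.abs(cancel)), np.max(np.abs(no_cancel)))
```

Because the Gauss-Legendre nodes are symmetric about µ = 0, the cancellation in the contaminated case holds to rounding precision in this toy model, whereas the near/far combination leaves a clear residual dipole.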
When measuring the near/far shifted correlation function of the full contaminated sample, one will always observe a dominant artificial dipole at exactly the shift scale, Δ_shift, as in the monopole. Measurements over 10 mocks of the total near/far shifted correlation dipole are shown in Fig. 8. In the presence of interlopers, additional offset dipole features appear, so the dipole can be used as an initial indicator of the presence of this type of interloper. In fact, the shift need not be an integer multiple of the interloper shift to see this effect: in a situation where the interloper shift is not known, one could perform the near/far shift by an arbitrary value, and the additional interloper features would appear at Δ_shift − Δ_interloper and Δ_shift + Δ_interloper.

Conclusions
In this paper, we have discussed how to extract the BAO scale from a target-galaxy catalogue that is contaminated by a sample of small-displacement interlopers. This is important for future spectroscopic space surveys such as Roman, where the population of [OIII] galaxies may be contaminated by Hβ interlopers because redshifts are measured from single lines. We now summarise the main points of our analysis.
The contaminated correlation function has two important terms: the auto-correlation term (which may be split into a galaxy auto-correlation and an interloper auto-correlation) and a galaxy-interloper cross-correlation term, which is a linear combination of two correlation functions. It is the auto-correlation term that we wish to isolate and measure, and thus we sought to model the cross-correlation.
We have introduced a self-calibration method to estimate this cross term. It involves taking the original contaminated catalogue and creating two new catalogues, "near-shifted" and "far-shifted", whose line-of-sight positions are shifted towards or away from the observer by one interloper shift. The cross-correlation of these catalogues partially isolates the desired galaxy-interloper cross term.
We tested our model in various scenarios with different values of the interloper fraction and the interloper displacement Δ. In all cases, our model returns unbiased estimates of the AP dilation parameters, with systematic errors below 6.6×10^−3 in the worst case, which is consistent with previous pre-reconstruction fitting methods applied to catalogues without interlopers [21]. If interlopers are present but not accounted for in the model, the systematic errors on the dilation parameters are one to two orders of magnitude larger, up to 0.048 and 0.15 for α and ϵ, respectively.
As our analysis will be useful for Roman [OIII] targets, we can predict the statistical error our method would give for a Roman survey. Using a simple approach, we scale the errors we obtain from the effective volume of 1000 mocks (1 (h^−1 Gpc)^3 each) to the approximate volume of a survey between z = 1.8 and 2.8, ∼10 (h^−1 Gpc)^3. Since statistical errors scale roughly as the inverse square root of the volume and the mock volume is about 100 times larger, we are testing the method at a precision about 10 times better than expected from Roman. For the range of interloper fractions and displacements we consider, the systematic errors are therefore expected to be subdominant. The cost of using the self-calibration method to mitigate the effects of interlopers is that the statistical errors on the AP parameters increase by approximately 30%, with only a weak dependence on the interloper fraction over the range 2-15%. This increase in the errors can be reduced further by providing prior information on the interlopers, for example from deep-field observations. Small-displacement interlopers create a strong signal in the two-point correlation function that appears near the BAO peak and thus contaminates measurements of cosmological parameters. With our self-calibration model, it is possible to robustly estimate the AP dilation parameters regardless of differing bias schemes for interlopers and targets. Furthermore, our model can estimate the fraction of interlopers to within a few percent, provided that a Gaussian prior is placed on the parameter B.

A Tables of values
The following tables of values were used to generate Fig. 4. Table 1 contains values for the fit using a measured galaxy auto-correlation function and unbiased mocks. Table 2 contains values for the fit using a galaxy correlation function calculated from CAMB, as outlined in Section 3.4, and unbiased mocks. Table 3 contains values for the fit using biased mocks.

Table 1. Systematic and statistical errors on α, ϵ and f_i from multipole fits on the mean of 1000 redshift-space mocks, using the measured auto-correlation. Δ is given in h^−1 Mpc.

Table 2. Systematic and statistical errors on α, ϵ and f_i from multipole fits on the mean of 1000 redshift-space mocks, using the CAMB auto-correlation. Δ is given in h^−1 Mpc.

Table 3. Systematic and statistical errors on α, ϵ and f_i from multipole fits on the mean of 1000 biased redshift-space mocks, using the CAMB auto-correlation. Δ is given in h^−1 Mpc.

Figure 1. The mean monopole (left) and quadrupole (right) of the correlation function measured from 1000 Quijote mocks for different interloper fractions and displacements.

Figure 2. Schematic diagram of the contaminated galaxy catalogue (left) and the effect of shifting it to generate the near-shifted and far-shifted catalogues. Galaxies at their true positions are labelled g, while interlopers are labelled i. The interloper shift is Δ.

Figure 3. Contributions of each term of Eq. 3.1 to the total near/far-shifted cross-correlation ξ_{c_n c_f}. The monopole is shown on the left and the quadrupole on the right.

Figure 4. Systematic errors on α (top row), ϵ (middle row) and f_i (bottom row), plotted against the true fraction of interlopers. The uncorrected case is shown in red.

Figure 5. The posterior distributions of B and f_i for the case with Δ = 85 h^−1 Mpc and f_i,true = 0.1.

Figure 6. Improved parameter estimates obtained by applying a Gaussian prior on B in the single-bias case. Systematic errors on α (top row), ϵ (middle row) and f_i (bottom row), plotted against the true fraction of interlopers. Vertical scales are the same as in Fig. 4. The improvement in the estimates of f_i is marked.

Figure 7. The left panel shows the measured dipole in the terms ξ_ig and ξ_gi of the contaminated correlation function (Eq. 2.6). The right panel displays the measured dipole in the terms ξ_{i_n g_f} and ξ_{g_n i_f} of the near/far shifted correlation function (Eq. 3.1). Measurements are taken as the mean of 10 mocks. Predictions are shown as solid black lines in both panels.

Figure 8. The measured dipole of the total ξ_{c_n c_f} is shown in green. Measurements are taken as the mean of 10 mocks. The vertical scale has been limited to the range [−100, 30] so that the smaller features are more easily seen. The central peak is at ∼ −1500 and is not shown on this scale.