Honing cross-correlation tools for inference on ultra-high-energy cosmic-ray composition

The chemical composition of the highest-energy cosmic rays, namely the atomic number Z of rays with energies E ≳ 40 EeV, remains to date largely unknown. Some information on the composition can be inferred from the deflections that charged ultra-high-energy cosmic rays experience while they traverse intervening magnetic fields. Indeed, such deflections distort and suppress the original anisotropy in the cosmic ray arrival directions; thus, given a source model, a measure of the anisotropy is also a measurement of the deflections, which in turn informs us on the chemical composition. In this work, we show that, by quantifying ultra-high-energy cosmic ray anisotropies through the angular cross-correlation between cosmic rays and galaxies, we would be able to exclude iron fractions f Fe ≥ 𝒪(10%) assuming a fiducial hydrogen map at 2σ level, and even smaller fractions in the reverse case of hydrogen on an iron map, going well below f H ≈ 10% when we mask the Galactic Centre up to latitudes of 40°. This is an improvement of a factor of a few compared to our previous method, and is mostly ascribable to a new test statistics which is sensitive to each harmonic multipole individually. Our method can be applied to real data as an independent test of the recent claim that current cosmic-ray data can not be reproduced by any existing model of the Galactic magnetic field, as well as an additional handle to compare any realistic, competing, data-driven composition models.


Introduction
The chemical composition of the highest-energy cosmic rays (UHECRs), namely cosmic rays with energies E ≳ 40 EeV, remains to-date poorly known even after decades of investigations [1][2][3].The chemical composition of UHECRs, indeed their atomic number Z, is very hard to measure experimentally on an event-by-event basis, and only aggregate, statistical information is available.Even then, systematic errors on the composition from current data are large and prevent unequivocal determination of UHECR composition as a function of energy, especially at the highest end of the spectrum.Indirect measurements of UHECR composition can be performed by measuring the deflections that charged UHECRs experience while they traverse intervening Galactic and possible extra-Galactic magnetic fields (GMF and xGMF, respectively).In order for these methods to work, one needs to know the distribution of UHECR sources as well as the strength and structure of intervening magnetic fields; neither of which are known well enough at present, unfortunately [4].Some useful conclusions can nevertheless be drawn by noticing that the uncertainty in atomic number, which can range from Z = 1 for protons ( 1 H) to Z = 26 for iron nuclei ( 56 Fe), is by far the largest unknown factor that determines the uncertainty in the deflection of UHECRs.This fact was behind recent methods and analyses [5][6][7][8][9].In particular, in our previous work [8] we have shown that using the harmonic-space cross-correlation power spectrum (often, 'angular cross-correlation', for short) between UHECRs and galaxies, if Z = 56 we can exclude a 42% fraction of 1 H nuclei (or more) at 2 σ for UHECRs above 100 EeV.In this paper we demonstrate how we can hone these tools and improve this result by a factor of 4, excluding as little as 12% 1 H using the same synthetic datasets, which we can further restrict to 7% if only the polar caps at latitudes higher than |b| = 40 • are considered.
This paper is structured as follows.In Section 2 we describe the flux model for UHECRs.In Section 3 we expound our method and detail the test statistics (TS) we employ.In Section 4 we present our main results, which are then contextualised in Section 5 together with an outlook for future perspectives.We gather all additional results in Appendix A.

Ingredients
Any function ∆ a ( n) of direction n on the celestial sphere can be decomposed into harmonic coefficients as where Y ℓm are Laplace's spherical harmonics.In this work the functions being correlated are the anisotropy fields of galaxies and UHECRs, for which a ∈ {g, CR}, respectively.

Galaxies
Following closely the notation of [8], we write the anisotropy of the galaxy sample as where δ g (z, χ n) is the three-dimensional galaxy overdensity, for which the radial kernel of the galaxy distribution ϕ g (χ) represents the weighted distribution of galaxies as a function of radial comoving distance, χ, which is related to the cosmological redshift z through dχ/dz = 1/H(z), with H the Hubble factor.The galaxy kernel ϕ g (χ) is given by where ng,c (χ) is the comoving, volumetric number density of galaxies in the sample.The quantity w(χ) is an optional distance-dependent weight that can be applied to all the objects in the galaxy catalogue, provided their redshifts are known.Assuming Poisson statistics, it can be shown [10] that the optimal weights that maximise the signal for the galaxy-UHECR cross-correlation (XC) are given by where α is the attenuation factor, defined as the fraction of cosmic rays injected with E ≥ E cut which are detected on Earth with energies greater than E cut ; the attenuation factor depends upon the atomic number Z and the injection energy-spectrum, power-law index γ (see below).We mould our mock galaxy sample on the 2MASS Redshift Survey (2MRS) [11], which is among the most complete full-sky spectroscopic low-redshift surveys, with 43 182 objects [see 12].Moreover, we limit our mock sample to the minimum distance of 5 Mpc, as is customary in the UHECR literature, in order for the many-source assumption to be valid [13,14]-were we not to do so, we would also introduce an unphysical cancellation of cosmic variance at low-ℓ in the combined UHECR auto-correlation (AC) and galaxy-UHECR XC analysis [see also 15].

UHECRs
Under the assumption that UHECR sources are numerous and steady, we can write the UHECR anisotropy field as where the UHECR radial kernel for UHECRs above E cut , coming from sources at distance χ, emitting UHECRs with atomic number Z and injection spectrum ∝ E −γ reads In this paper, we specifically want to know how firmly current data can rule out 56 Fe if UHECRs are 1 H-hence f mix = f Fe (or 1 H if UHECRs are 56 Fe).Notice that this test is not symmetric under the exchange 1 H↔ 56 Fe, and we will show both cases below.
Owing to the fact that the UHECRs flux is very low-a typical dataset for E ≳ 40 EeV contains O( 103 ) events-we can presume that on average each UHECR comes from a different source, and that all UHECR sources are in the galaxy catalogue, such that δ g (z, χ n) = δ s (z, χ n).This is accurate provided that there are many more sources than events [14], which is the case in this work.

Correlators
The harmonic-space (or, angular) power spectrum S ab ℓ is the two-point function of the harmonic expansion coefficients of the anisotropy fields, i.e.
where the angle brackets stand for the ensemble average.The power spectrum S ab ℓ is most conveniently expressed in terms of the three-dimensional Fourier-space power spectrum P ab (z, k) as where the theoretical P ab (z, k) is modelled according to the halo-model prescription, adapted to the specifics of the 2MRS catalogue [12,18].The observed power spectrum is the combination of signal and noise Since the angular positions of the UHECRs and the galaxies are discrete point processes, where NΩ,a is the angular number density of points in sample a, and NΩ,a∧b is the angular number density of points shared in common [16].Notice that, since we assume that each UHECR comes from a galaxy in our sample, we can neglect the noise term in the crosscorrelation power spectrum.

Magnetic fields
To account for the deflections caused by the intervening GMF, we adopt a data-driven prescription [19].In harmonic space, it amounts to applying a scale-dependent beam to the UHECR harmonic coefficients ∆ CR ℓm .The width of the beam, σ, is given by where b is galactic latitude [see 19, 20, but notice the factor √ 2 difference with their normalisation]. 1In practice, in order to keep with a mostly analytic treatment of the deflections, we conservatively smear uniformly across the sky with the largest σ in a given region of the sky.For instance, if we keep a full sky we set b = 0 in Eq. (2.13), whereas if we mask the Galactic plane out to some b cut we set b = b cut .For simplicity we neglect any contribution from hypothetical xGMFs in what follows.
Lastly, in order to account for the energy-dependence of the magnetic deflections, we bin the UHECR flux in five logarithmic bins of width 0.1, according to a model differential energy spectrum given by where  eV = 100 EeV, respectively.These are representative of the number of events we can expect for a full-sky observatory with statistics comparable to existing observatories [21,22].

Method
In our first work [8], we developed a test to determine how confidently a full-sky UHECR experiment, which observes a number of events comparable to existing data, can exclude a fraction f Fe of iron in a pure proton flux (or, vice versa, a fraction f H of protons in a pure iron sky) using the UHECR AC and UHECR-galaxies XC power spectra.The method follows closely our previous work, however with some differences that are responsible for our improved results.The method consists of the following steps: 1. We choose our fiducial model γ = 2.3 and E cut = 100 EeV-we later will repeat the analysis for E cut = 63 EeV and E cut = 40 EeV.We compute the synthetic data vector at a given f mix = f fid (with f fid ∈ {0, 1} for 1 H and 56 Fe fiducials, respectively), namely d ab ℓ = S ab ℓ,f fid , and its corresponding standard deviation σ ab ℓ which accounts for the observational error due to shot-noise and cosmic variance that is 2 (3.1)In addition, we compute the theory vector t ab ℓ (F mix ) = S ab ℓ,F mix for a single parameter F mix running in a flat prior in [0, 1] with a step of 0.001.We can define the χ 2 TS that is expected to be minimised at F mix = f fid , which is the value that reproduces the synthetic data (we suppress the a, b superscripts henceforth in this section to reduce clutter).The χ 2 values correspond to confidence levels (hereafter CLs) for 1 degree of freedom (that is, for the single parameter F mix ).Then, given the F mix for which we are q CLs away from the synthetic data at f fid , we compute the harmonic-space spectra corresponding to those values as follows: upper ℓ,f fid ,q := S ℓ,f fid + q S ℓ,f fid − S ,f fid ℓ,F mix at q=1 .(3.4) The q = 1, .., 5 define CL bands above and below the spectrum at f mix = f fid .In Fig. 1 we present the χ 2 values of the F mix model parameter for three scenarios of synthetic data at f fid = 0 (in blue colour), f fid = 0.5 (in orange colour), and f fid = 1.0 (in green colour).Notice that, as anticipated, the f fid = 1 and f fid = 0 are not symmetric.
2. We compute N rea = 1, .., 1000 realisations of the spectra G ℓ,Nrea,f fid derived from a Gaussian distribution with mean at S ℓ,f fid and standard deviation of |S ℓ,f fid − S f fid ℓ,F mix at q=1 |.For each realisation we compute the fraction of points that are between for all q and ℓ values.
3. Next, for a given value of the test model parameter, in our case f mix (more specifically f Fe , for which we use a step 0.025 in [0, 1]), we compute the Eq.(3.2), obtain the spectra 2 Assuming that ∆ a ℓm is Gaussian, the covariance matrix of the harmonic-space power spectrum (which is a correlator of four-point statistics) can be written using Wick's theorem.This is expressed as the sum of all the available two-point correlator combinations.at lower ℓ,f fid ,q and upper ℓ,f fid ,q , and repeat the previous steps to obtain the fraction of realisations of the fiducial spectra G ℓ,Nrea,f fid lie outside the band defined by Eq. (3.5) at the desired q CL.This represents the confidence with which each realisation of a hypothetical experiment (with its number of detected events and angular resolution) is able to reject the test model.

4.
Overall, this provides the percentage n(f mix ; q) of experiments that are able to exclude f mix at at least q CL, or-the information can also be read across f mix as well as across q -exclude f mix or more (when the fiducial is f fid = 0, otherwise f mix or less if the fiducial is f fid = 1) at q CL.
The main difference with respect to the method outlined in [8] is that here we choose to make use of the full harmonic-space power spectrum instead of compressing it to just on value, namely the total power across all multipoles.Having designed a TS that is separately sensitive to each multipole is the main reason behind its improved sensitivity, as we shall see in Section 4.
Another difference is that in [8] we generated 250,000 realisations of the power spectra, whereas here we find that 1000 realisations are enough to make our predictions stable against fluctuations.This is due to the fact that since each multipole is separately tested against the fiducial, in effect we are generating several hundreds realisations of observables that we are using in the TS (the highest multipoles have no constraining power because of the magnetic beam).

Results
In order to gain some intuition as to the UHECR anisotropy as seen by different primaries, in Fig. 2, left panel, we show the predicted harmonic-space power spectra for 1 H (red) and 56 Fe (green) at E cut = 100 EeV for a full-sky analysis or in the case in which we mask the Galactic Centre at |b cut | = 40 • .The 56 Fe curves exhibit a higher anisotropy at large scales (small ℓ), which drops off rapidly at intermediate scales.This is expected because 56 Fe has a slightly shorter radial kernel than 1 H and, before the magnetic deflections set in at intermediate ℓ, it carries a stronger anisotropic imprint than 1 H.Then, because of the stronger impact of the magnetic deflections, the 56 Fe spectra drop off rapidly.The hardening of the 56 Fe spectra at large ℓ is a result of the child protons developed from an initial pure 56 Fe injection. 3Also, notice that the difference between the AC and the XC opt is entirely caused by the fact that the magnetic and resolution beams are different for the UHECR map and the galaxy overdensity map; indeed, the unbeamed spectra are identical by construction of the optimal weights.Lastly, note that the spectra for |b cut | = 40 • are starting at higher ℓ compared to those for the full-sky.This is due to the fact that the minimum multipole ℓ we can probe is defined by ℓ min (b cut ) ∼ π/[2f sky (b cut )] and the sky fraction is given by f sky (b cut ) = 1 − sin(b cut ).In order to assess the strength of our method in discriminating different chemical compositions at high energies, in Fig. 3, top panels, we show the percentage of hypothetical experiments n(f Fe ; q) that will be able to exclude f Fe or more at q CL for our benchmark model with E cut = 100 EeV and γ = 2.3.In the bottom panels we show results when we exchange fiducial and test model, namely f H := 1 − f Fe .Our results show that with the AC we are able to exclude f Fe ≈ 0.40 (f Fe ≈ 0.27) in n = 80% (n = 50%) of the tests at q = 2 CL.These values significantly decrease in the case of the XC opt for which we find f Fe ≈ 0.15 (f Fe ≈ 0.11) in n = 80% (n = 50%) of the tests.If we adopt 56 Fe as our fiducial model we find f H ≈ 0.33 (f H ≈ 0.23) for the AC and f H ≈ 0.12 (f H ≈ 0.08) for the XC opt .We do not find a strong dependence on the energy cut E cut , see Appendix A. While the qualitative behaviour of these tests reflects that of our previous study [8], quantitatively the new method described here brings about a significant improvement.For example, using the total angular power summed over ℓ ∈ [1,1000] in our previous work we found that the XC opt could only exclude f Fe ≈ 0.55 (f Fe ≈ 0.39) in n = 80% (n = 50%) of the tests (and similarly for the other cases)-in comparing the two works by eye, notice that here the x-axes run always from f mix = 0 to f mix = 0.5, whereas in [8] they run up to f mix = 1.
Figure 3: (Top) Percentage of experiments n(f Fe ; q) that will be able to exclude f Fe or more at q CL for different choices of q: from q = 1 (solid) to q = 5 (dotted), for our benchmark model with E cut = 100 EeV and γ = 2.3.The AC angular power spectra are shown in yellow on the left panel, and the XC opt optimal angular power spectra are shown in blue on the right panel.Darker tones show results for the full-sky, lighter tones show results with the Galactic Centre masked off.(Bottom) Same but exchanging fiducial and test model: in this case we show n(f H ; q) and test a fraction f H = 1 − f Fe of Z = 1 injection against a Z = 26 fiducial.
Next, in order to assess how beneficial it can be to adopt a more refined version of the magnetic smearing that takes into account its latitude-dependence (namely, that at higher latitudes the magnetic field has a smaller impact), in lighter tones in Fig. 3 we re-do the analysis after masking the Galactic Centre within |b cut | = 40 • .In this case the smearing angle we adopt is much smaller (see Eq. 2.13), and for simplicity (and conservatively) we discard any anisotropic power arising from the masked region.Despite the fact that there is a loss of UHECR events due to the mask, the reduction in magnetic smearing more than makes up for it, improving the XC opt to f Fe ≈ 0.09 (f Fe ≈ 0.07) in n = 80% (n = 50%) of the tests for an 1 H fiducial and f H ≈ 0.07 (f H ≈ 0.05) for an 56 Fe fiducial; similar improvements are seen in the AC case.In Appendix A we show results with |b cut | = 20 • and |b cut | = 60 • , where it is visible that the best case scenario, but only marginally so, appears to be between

Discussion and outlook
In this paper we have presented an improved method that aims at statistically distinguishing UHECR atomic numbers Z, by quantifying their expected deflections in the GMF.In particular, in this work we build upon our earlier results in [8] where we employed the harmonic, angular AC and XC (specifically, the optimised XC opt ) power spectra to quantify UHECR anisotropies, and the impact of the GMF on them.With our improved method, by looking at the XC opt we are able to exclude 56 Fe fractions f Fe ≤ O(10%) on a fiducial 1 H map at 2 σ level in most of the tests, and even less in the reverse case of 1 H on a 56 Fe map, in some cases going below f H ≈ 10% when we mask the Galactic Centre up to |b cut | = 40 • .The AC is much less performing, as it is less sensitive to the effects of the GMF (it picks up anisotropies at larger scales than the XC, where the GMF is not as relevant).
These are improvements of a factor of a few compared to our previous method.The main driver behind this improvement is the use of each individual multipole ℓ separately in building the TS, as opposed to aggregating them in the total harmonic, angular power as done in [8].In this way, and the more so for the XC than the AC, we are able to separately capture the effects of the magnetic deflections at different scales; since the XC is sensitive to a wider range of scales than the AC, its constraining power is also stronger.
In this work we have chosen to work with 1 H and 56 Fe primarily because, at the highest energies where angular anisotropies are expected to be the most prominent, we do not expect a large contribution of other nuclei, simply because the propagation horizon of intermediate nuclei is much shorter (thereby reducing the available number of sources).Moreover, this work serves to illustrate our method, which then can be rapidly applied to compare more realistic, data-driven chemical compositions.We plan to address this point in future work.
There is some tentative evidence that, using an approach similar to our own, UHECR data is not reproducible with models of the GMF alone [9] (see also [23,24]) and requires additional smearing from, i.e., extra-Galactic magnetic fields.Our method enables a valuable complementary test for this result, as it is designed to detect anisotropies in harmonic-space, thereby further sharpening our understanding of extreme-energy cosmic-ray sources as well as Galactic and astrophysical magnetism.[%]  we show how our results can be improved ever so slightly by employing the combination of all namely the AC together with the (optimised) XC opt in one overall data vector.
Firstly, in order to assess the dependence on the energy cut E cut , in Fig.
A.1 we show the changes in constraining power when we adopt E cut = 40 EeV (left) and E cut = 63 EeV (right).We do not observe any major quantitative differences, but it is seen that lower energies are less constraining than higher energies.This is in line with our previous results and justifies our choice of E cut = 100 EeV for our benchmark scenario.The bottom line is that, the better we are able to detect UHECR anisotropies over the noise, the stronger is the constraining power of our method.In this case the loss of UHECR events due to the higher energy cut is more than compensated by the much smaller deflections due to the GMF.
Secondly, in order to assess the dependence on choice of sky mask, in  , respectively.We observe a weak tendency to improve as we cut out more of the sky where the GMF deflections are larger.Hence, once again the loss of UHECR events due to the small sky fraction we observe is more than compensated by the much smaller deflections due to the GMF in that region of the sky.A more realistic test with fully-fledged realisations of the GMF would enable us to maximise this feature to the benefit of the constraining power of our observable.
Thirdly, we can try to improve our method and results by building the data and theory vectors with all the available measurements, namely the AC and the XC together.Schematically we define v ℓ = {S CR CR ℓ , S CR g ℓ , S g g ℓ }, see [16] for details.In this case the χ 2 of Eq. (3.2) with Σ ℓℓ ′ being the covariance matrix.In Fig.
A.4 we show, in red, the AC+XC opt alongside what we already showed in Fig. 3.The improvement brought about by the combination of auto-and cross-correlation is present but not significant.Lastly, in order to test the impact of a high-energy injection cutoff on our results, in Fig. A.5 we compare the AC (left panels) and the XC opt (right panels) for our benchmark model with E cut = 100 EeV where for 56 Fe we have E max = ∞ (AC: yellow, XC opt : blue) or E max = AE cut (AC: red, XC opt : magenta) with A = 56.The physical consequence of the high-energy cutoff E max is that in the E max = A E cut case no child protons with energy E ≥ E cut would reach the Earth (instead of making up for approximately 23% of the flux when there is no high-energy cutoff).Without child protons the 56 Fe spectra become more distinguishable from the 1 H ones, as evidenced by the smaller fractions of f Fe (top panels) and f H (bottom panels) that can be excluded at a given CL compared to Fig. 3. Notice that the total angular power in the case of a childless 56 Fe spectrum on Earth is larger than that with child protons even though the latter, because of the child protons, extends to larger ℓ (smaller angular scales).The reason is that overall the 23% of child protons contribute to the spectrum less than an equivalent fraction of 56 Fe, so the pure 56 Fe case has more power, and it is easier to tell apart from 1 H. [%] Full-sky E cut = 100 EeV q = 1 q = 2 q = 3 q = 4 q = 5 0.0 0.2 0.4 0.6 0.   Full-sky E cut = 100 EeV q = 1 q = 2 q = 3 q = 4 q = 5 Figure A.5: Same as Fig. 3 but adding the case where E max = AE cut , which in turn means that f p = 0 (AC: red, XC opt : magenta).

Figure 1 :
Figure 1: Distribution of χ 2 values as a function of the model parameter F mix for synthetic data at f fid = 0 (blue), f fid = 0.5 (orange) and f fid = 1.0 (green).The χ 2 best-fit corresponds to the data value of f mix as expected.

Figure 2 :
Figure 2: harmonic-space power spectra for 1 H (red) and 56 Fe (green) at E cut = 100 EeV; the AC (dotted) is on the left panel, the XC (dashed) is on the middle panel and the XC opt (solid) is on the right panel.Full-sky results are in dark tones, whereas the results from masking the Galactic Centre at |b cut | = 40 • are in lighter tones.

Figure A. 1 :
Figure A.1: Same as Figure 3 but with energy cuts at E cut = 40 EeV (left) and E cut = 63 EeV (right), and overlaying AC and XC opt in the same panel.

Figure A. 4 :
Figure A.4: Same as Fig. 3 but adding, in red, the AC+XC opt according to the TS defined by Eq. (A.1).
FeFull-skyE cut = 100 EeV AC w/ f p = 0 AC w/ f p = 0.23 XC opt w/ p w/ f p = 0 UHECRs are of one given species Z 1 , for instance 1 H with Z 1 = 1, we seek to determine what is the fraction f mix of a second species Z 2 , e.g.56Fe with Z 2 = 26, that can be constrained through the XC for UHECR datasets of size comparable to what is currently available.Hence, the radial kernel for the Z 1 -Z 2 admixture becomes 20