Using two point correlation functions to understand the assembly histories of Milky Way-like galaxies

The two point correlation function (2PCF) is a powerful statistical tool to measure galaxy clustering. Although 2PCF has also been used to study the clustering of stars on parsec and sub-parsec scales, its physical implication is not clear on such non-linear scales. In this study, we use the Illustris-TNG50 simulation to study the connection between the 2PCF signals of accreted halo stars and the assembly histories of Milky Way-mass galaxies. We find, in general, the 2PCF signal increases with the increase in galactocentric radii, $r$, and with the decrease in the pair separations. Galaxies which assemble late on average have stronger 2PCF signals. With $z_{1/4}$, $z_{1/2}$ and $z_{3/4}$ defined as the redshifts when galaxies accreted one-fourth, half and three-fourths of their ex-situ stellar mass at today, we find all of them show the strongest correlations with the 2PCF signals at $r\sim0.2R_{200}$. $z_{3/4}$ shows the strongest correlations at all radii than those of $z_{1/4}$ or $z_{1/2}$, as later accreted stars preserve better clusterings. However, the correlations between the 2PCF signals at different radii and the galaxy formation times all have large scatters. The 2PCFs in velocity space show weaker correlations with the galaxy formation times within $0.38R_{200}$ than real space 2PCFs, and the scatter is considerably large. Both the real and velocity space 2PCFs correlate with the assembly histories of the host dark matter halos as well. Within $0.38R_{200}$, the real space 2PCF shows stronger correlations with the galaxy formation histories than with the halo formation histories, while the velocity space 2PCFs do not show large differences. We conclude that it is difficult to use 2PCF alone to precisely predict the formation times or assembly histories of galaxies.


INTRODUCTION
Two point correlation function (2PCF) and its Fourier transformed power spectrum are old but powerful statistical tools to study galaxy and matter clustering on different cosmological scales in our Universe (Groth & Peebles 1977).It has long been applied to various types of observed galaxies at different redshifts (e.g.Zehavi et al. 2005;Yang et al. 2005b,a;Padmanabhan et al. 2009;Wang et al. 2011;Li et al. 2012;Guo et al. 2014), and have led to significant discoveries, such as the measurement of the redshift space distortions (e.g.Zheng et al. 2013), of detection of the baryon acoustic oscillations (e.g.Percival et al. 2007).It has enabled halo occupations studies (e.g.Jing et al. 1998;Peacock & Smith 2000;Berlind & Weinberg 2002;Zheng et al. 2005;Zhai et al. 2021) and constraints on cosmological parameters (e.g.Jing et al. 1998;Cole et al. 2005;Zhai et al. 2023).
Corresponding author: Wenting Wang wenting.wang@sjtu.edu.cnIt is a widely used technique to achieve statistical comparisons between observational data and theoretical predictions (e.g.Wang et al. 2016).The cross correlations between galaxies and cosmic microwave background signals are also detected (e.g.Dong et al. 2021;Sun et al. 2022;Yao et al. 2023).
On much smaller non-linear scales, 2PCF has also been applied to quantitatively describe and study the clustering of stars.For example, on subparsec and parsec scales, it has been used to understand stellar multiplicity in star-forming regions (e.g.Joncour et al. 2017Joncour et al. , 2018;;Kamdar et al. 2021).
From subkiloparsec to kiloparsec (kpc) scales, it has long been recognized that our Milky Way (hereafter MW) halo stars are clumpy in both real and velocity space, which are contributed by stripped stars from infalling smaller satellite galaxies or globular clusters (e.g.Helmi et al. 1999;Yanny et al. 2000;Newberg et al. 2002;Yanny et al. 2003;Ibata et al. 2003;Belokurov et al. 2006a,b;Newberg et al. 2007;Starkenburg et al. 2009;Xue et al. 2011).Stars stripped from the same pro-genitor satellite galaxy maintain similar orbits initially (stellar streams), which then become more phase-mixed at later stages.2PCF can be used as a powerful statistical tool to quantify the phase-space clustering of MW halo stars and serve as the target for comparisons between observed MW halo stars and MW-like galaxies in numerical simulations.
In an early study, Cooper et al. (2011) studied mock halo star samples constructed from numerical simulations (Cooper et al. 2010), using 2PCFs they compared the mock with the 2PCF of observed blue horizontal branch (BHB) stars from SDSS.The simulation has no realistic disc population, but it was shown that reasonable agreement can be achieved with the signals of observed BHB stars, after placing the mock observer in a realistic position of a mock heliocentric frame and after including observational errors.In particular, their 2PCF was calculated over a 4-dimensional space formed by spatial coordinates plus the radial velocity.A metric was introduced to guarantee the same unit and similar scale between positions and the radial velocity.
With the launch of Gaia and other ground based surveys, more observational data is available.As a consequence, more and more phase-space substructures are being detected (e.g.Shipp et al. 2018;Myeong et al. 2018;Yuan et al. 2020;Koppelman et al. 2019), including disequilibrium patterns in the Galactic disk (e.g.Antoja et al. 2018;Gardner et al. 2021).In a more recent study, Hinkel et al. (2023) computed 2PCF signals using Gaia data release 2 data to investigate the symmetrybreaking pattern in the orthogonal and radial directions of the Galactic disk.Extensive wavelike structures are observed, indicating the system is not in steady state.Such symmetry-breaking patterns were first observed in stellar number counts of Widrow et al. (2012).
It has been recognized that MW-like galaxies can have varied accretion histories, and the accretion events are imprinted in the phase-space distribution of halo stars at today (e.g.Majewski 1993;Deason et al. 2016;D'Souza & Bell 2018a,b;Helmi et al. 2018;Belokurov et al. 2018).Since the 2PCF can quantify the phase-space clustering of halo stars statistically, it is promising to use the 2PCF to probe and see whether we can distinguish between the mass accretion or assembly histories, when comparing galaxies with different 2PCF signals at today in numerical simulations and comparing the MW to simulations.
Based on the assumption that the mass assembly histories of galaxies can be distinguished from the 2PCF of accreted halo stars at z = 0, Lancaster et al. (2019) computed the 2PCF using the 3-dimensional coordinates of RR Lyrae from the Catalina Real-time Transient Sur-vey (Drake et al. 2009, CRTS) and the Panoramic Survey Telescope and Rapid Response System (Chambers et al. 2016;Flewelling et al. 2020, Pan-STARRS1).By comparing the measured 2PCF of observed RR Lyrae with MW-like simulations (Bullock & Johnston 2005), Lancaster et al. (2019) picked up those simulated systems that have similar 2PCF signals as the sample of RR Lyraes.These simulated systems are thought to most closely represent the assembly histories of our MW Galaxy.
However, it is still not clear whether the mass assembly histories can be quantitatively distinguished from the 2PCFs of halo stars.The growth of our MW stellar halo is a non-linear process, so the measured 2PCF signals often cannot be well fit by a unique functional form (e.g.Lancaster et al. 2019).Most of the previous studies relying on numerical simulations to interpret the observed signals are empirical.
In this study, we use the TNG50-1 hydrodynamical simulation to investigate whether we can establish quantitative connections between the 2PCF signals and the assembly histories of MW-mass galaxies.We investigate both the real and velocity space 2PCF signals, as a function of the galactocentric radii.We will show definite clues for the correlations between the 2PCF signals and galaxy assembly histories or formation times, but the scatters are very large, which prevent precise inference of the galaxy or halo formation times from the 2PCF signals of halo stars.
The layout of this paper is as follows.We introduce the TNG50-1 simulation, our selections of MW-like galaxies and halo stars in Section 2. We introduce our method of calculating the 2PCFs in Section 3. Results will be presented in Section 4, and we summarize and conclude Section 5.

DATA
Our sample of MW-mass galaxies is selected from the TNG50-1 (Nelson et al. 2019a,b;Pillepich et al. 2019) simulation of the IllustrisTNG Project.The Il-lustrisTNG simulations are a suite of hydrodynamical simulations incorporating sophisticated baryonic processes, carried out with the moving-mesh code (arepo; Springel 2010) to solve the equations of gravity and magneto-hydrodynamics.They include comprehensive treatments of various galaxy formation and evolution processes, such as metal line cooling, star formation and evolution, chemical enrichment and gas recycling.For more details about TNG, we refer the readers to Marinacci et al. (2018); Naiman et al. (2018); Nelson et al. (2018); Pillepich et al. (2018); Springel et al. (2018); Nelson et al. (2019a).The TNG suites of simulations adopt the Planck 2015 ΛCDM cosmological model with Ω m = 0.3089, Ω Λ = 0.6911, Ω b = 0.0486, σ 8 = 0.8159, n s = 0.9667, and h = 0.6774 (Planck Collaboration et al. 2016).A total of 100 snapshots are saved between redshifts of z = 20 and z = 0, with the initial condition set up at z = 127.TNG50-1 is the simulation with the highest resolution in its suites, and hereafter we refer to it as TNG50.It has a periodic box with 35 Mpc/h on a side that follows the joint evolution of 2160 3 dark matter particles and approximately 2160 3 baryonic resolution elements (gas cells and stellar particles).Each dark matter particle has a mass of 3.1 × 10 5 M ⊙ /h, while the baryonic mass resolution is 5.7 × 10 4 M ⊙ /h.
Based on the measurement of Licquia & Newman (2015), our MW-mass systems are selected to be those central galaxies in the stellar mass range of 6.08 × 10 10±0.15M ⊙ from TNG50.Moreover, in order to eliminate systems undergoing major mergers that are strongly deviating from steady state, we require the magnitudes of these central galaxies should be at least 0.5 brighter than their satellites.In the end, we select 84 MW-like galaxies 1 .For each MW-mass galaxy in our sample, we select only ex-situ formed stars 2 for our analysis, and we call them halo stars or halo star particles hereafter.Moreover, we exclude star particles which are still bound to surviving subhalos or satellites.We use the stellar assembly catalogue provided by the TNG website (Rodriguez-Gomez et al. 2016, 2017) to check whether each star particle is ex-situ or in-situ formed.This catalogue is constructed by tracking the baryonic merger trees and it is used to determine whether a stellar particle was formed outside of the "main progenitor branch" of a given galaxy.If true, it is considered as an ex-situ star particle.Otherwise, the star particle is tagged as an in-situ star particle.

METHODOLOGY
With a continuous density field, the 2PCF, ξ, is defined as 1 A set of MW/M31-like galaxies have been selected and described by Pillepich et al. (2023), which are publically available at https://www.tng-project.org/data/milkyway+andromeda/.The allowed stellar mass range adopted for the selection is larger than the one used in our selection to cover the mass for M31, and as we have checked, most of our MW-like galaxies are included in the TNG MW/M31-like sample.In this study, we choose to focus on our smaller sample of MW-like galaxies. 2Ex-situ stars are formed by accreting smaller galaxies, while insitu stars are formed by gas cooling.
where the average goes over the spatial coordinates, x, and δ = ρ ⟨ρ⟩ − 1 is the density contrast, which describes the excess in the local density, ρ, with respect to the mean density, ⟨ρ⟩.
For a discrete density field, such as the spatial distribution of discrete galaxies or stars, the 2PCF can be equivalently written as the following form (2) ξ(∆x) describes the excess probability, with respect to a uniform distribution, of finding population 2 objects (denoted by lower index 2) with a separation of ∆x to and in a volume of dV 2 around a population 1 object, denoted by lower index 1 and occupying volume dV 1 .n1 and n2 are the average number densities of the two populations.dn 12 on the left hand side of Equation 2is the pair count of population 1 and 2 objects with spatial separation of ∆x.When population 1 (D 1 ) and population 2 (D 2 ) are identical, the 2PCFs are simply auto correlations (D 1 = D 2 = D), and in our analysis of this paper, we calculate the auto correlations of ex-situ halo stars.
In practice, the 2PCF is estimated from the pair counts of data points (in our case the star particles) and random points, with the random points generated following the spatial distributions of data points.In our analysis, we adopt the Landy-Szalay estimator (Landy & Szalay 1993) of the following form to calculate the auto correlations Here D stands for the sample of star particles from TNG50, and R refers to a random sample.DD, DR and RR are the total number of all possible pair counts between star particle pairs, between star particle and random point and between random pairs with a 3dimensional separation of ∆x.Throughout this paper, we use r to denote the galactocentric radii, and ∆x to denote the spatial separations between the pair counts, ∆x The random sample should trace the large scale spatial distribution of star particles, but should be free of substructures.In our case, the random sample has the same radial distribution or radial number density profile as the real halo star particles.To create the random sample, we adopt two different approaches.First, we follow the approach of Lancaster et al. (2019) which fit a double power law function to the radial number density profiles of halo stars in each galaxy.Based on the best-fitting double power law function, we generate random points which follow this radial distribution with random angular distributions.In the second approach, we randomly shuffle the position angles of different halo star particles.This is achieved by randomly assigning the position angle of another particle to the current particle.The two approaches lead to very similar results, and throughout this paper, we show results based on the second approach.
When calculating the 2PCF signals in bins of different galactocentric distances, stars are binned radially first and then the 2PCF signals are calculated only among the radially binned data.The edge effect of each radial bin can be naturally accounted for, as long as the star particles and random points have exactly the same spatial coverage or bin boundary.When calculating velocity space 2PCFs in bins of different galactocentric distances, the steps are the same, but for star particles and random points in each bin, the pair counts in Equation 3are calculated with pair separations defined in velocity space.
With the random sample generated following the same radial distribution as the halo stars, our measured 2PCF signals mainly represent the strength of the clumpyness in halo stars at different galactocentric radii and on different scales or pair separations.Since star particles bound to surviving subhalos or satellites are excluded, the clumpyness in our sample of halo stars reflects the existence of stellar streams.The streams approximately maintain their phase-space clusterings after getting tidally stripped from their progenitor satellite galaxies (e.g.Küpper et al. 2012;Bovy 2014;Palau & Miralda-Escudé 2023).For recently stripped stellar streams, which are dynamically cold and more concentrated, their 2PCF signals will be significantly greater than unity.Our measurements of the 2PCF signals can potentially tell, on which galactocentric radii and scales are the clumpyness more dominant.Moreover, since the phase-space clustering of stellar streams becomes weaker at later times, the strength in the 2PCF signals may correlate with the time or redshift when the merger event happens.In other words, it may contain information about the mass assembly histories of the host galaxy system.
Throughout our analysis in this paper, we adopt the open source code, corrfunc (Sinha & Garrison 2019;Sinha & Garrison 2020) for the ex-situ stellar mass growth of our MW-mass galaxies selected from TNG50.Here z 1/2 is defined as the redshift at which the galaxy has assembled half of its ex-situ stellar mass at z = 0. Based on the distribution, we divide galaxies into three bins according to z 1/2 < 0.6, z 1/2 = 0.6 − 1.2 and z 1/2 > 1.2 for analysis in later figures.The bin boundaries are denoted by the two black vertical dashed lines.We have similar numbers of galaxies in the three bins of z 1/2 .
to measure clustering statistics.It supports parallel calculations and thus enables very efficient counting of the above pairs (DD, DR and RR) over different pair separations or scales (∆x) in Equation 3.   (orange) and z 1/2 > 1.2 (green).Two columns refer to pair separations of ∆x = 1−2.5 kpc and ∆x = 2.5−5 kpc used to calculate the correlation functions, as indicated by the text on top.The two top panels show results within galactocentric radii of ∼ 0.3R200, with six equally spaced bins in log space.This is to highlight the results within ∼ 0.3R200.The two bottom panels show the results out to ∼ 0.6R200 with broader radial bins, to show that the amplitudes of the 2PCF do not change monotonically with z 1/2 beyond r ∼ 0.25R200.
The trend is clear in the top panels and on small scales of the bottom panels, in terms that for galaxies with later z 1/2 , the clustering amplitude in their mean two point correlation functions is stronger.Error bars are calculated from the 1-σ scatters of 100 sub samples bootstrapped over all halo star particles in each galaxy.
and green curves, respectively.For a given symbol at a given r, the signal tells the excess probability of finding pairs of halo stars over the given scales, compared with random distributions.The error bars are based on the 1-σ scatters of 100 bootstrapped sub samples.Overall, the clustering of halo stars, as revealed by the 2PCF signals, increases with the decrease in ∆x for most of the systems shown in Figure 1.Note, however, the softening scale of TNG50-1 is ϵ = 290 pc at z = 0 (Nelson et al. 2019a;Pillepich et al. 2018), which is greater than the lower bin boundary of the first ∆x bin (0.1 − 1 kpc).As a result, the 2PCF signals based on ∆x = 0.1 − 1 kpc (blue curves in Figure 1) might be affected by the resolution.Thus hereafter, we will focus our discussions on the 2PCFs averaged over the signals measured with ∆x = 1 − 2.5 kpc and ∆x = 2.5 − 5 kpc, but given the small difference between the different color curves in Figure 1, our conclusions are not affected even if we include the signals based on ∆x = 0.1 − 1 kpc or if we present results based on the three different ∆x separately.Moreover, we note the choice of ∆x does not have conformity when calculating the 2PCFs for stars.In a previous study, Hinkel et al. (2023) have performed detailed investigations on different bin width and the limiting scale length that can be probed, showing how the 2PCF signals change with the pair separations.In our analysis, the 2PCFs are computed with relatively wider pair separations than those explored in Hinkel et al. (2023).However, our 2PCFs do not show strong changes with different choices of ∆x, and in most cases the 2PCF signals increase monotonically with the decrease in ∆x with reasonable errorbars, and thus our conclusions barely change if we vary the bin width of ∆x within a reasonable range.
The 2PCF signals prominently increase with the increase of galactocentric radii.This is because at larger radii, halo stars were stripped from their progenitor satellites more recently, which preserve better the initial clustering and hence have stronger 2PCF signals.At later times, satellite galaxies and stellar streams fall to more central regions of the host dark matter halos due to dynamical frictions, and they also gradually become less clustered in phase space.Thus the 2PCF signals are weaker at smaller galactocentric radii.
A few systems in Figure 1 show non-monotonic changes in their 2PCF signals as a function of r, with prominent dips and bumps, such as the upper middle, middle middle and the lower right panels.This is due to the existence of some local massive and dynamically cold streams, which contribute to the local bumps or increases in the 2PCF signals.For example, we show in Figure 2 the scatter plot of radial velocity, v r , versus the galactocentric distance, r, for halo stars in the galaxy referring to the bottom right panel of Figure 1.Particles from a few most massive progenitor satellites are marked in color.Particles with the same color come from the same progenitor.By comparing the bottom right panel of Figure 1 and Figure 2, we can see when the 2PCF signals increase towards r ∼ 0.1R 200 and r ∼ 0.6R 200 , there are also very prominent and coherent streams.
Note when selecting our sample of MW-mass galaxies, we have required that the central galaxies should be at least 0.5 mag brighter than their satellites.This has eliminated some systems which are undergoing major mergers, hence removed systems with dips and bumps in their 2PCF signals.However, a few systems satisfy the selection criteria at z = 0, but have just undergone prominent merger events.The merged massive satellite has been mostly or completely disrupted, so the system passed our selection, but the resulting massive and dynamically cold streams cause such prominent dips and bumps in their 2PCF signals.The galaxy formation time of such systems, as we have checked, are on average more recent.Since we need enough number of MW-mass systems to cover a wide range of different mass assembly histories, we do not explicitly remove such systems from our sample.

Connection between the real space 2PCF and the stellar mass assembly history
We now investigate how are the 2PCF signals related to the stellar mass assembly histories of galaxies.We first perform an estimate of the general trend.In Figure 3, we show the distribution of the galaxy formation time, z 1/2 , for our sample of MW-mass galaxies.Here z 1/2 is defined as the redshift when the galaxy has ac-creted half of its total ex-situ stellar mass at z = 0. Note our mass assembly histories and formation times in this paper are all based on only ex-situ stars, not including in-situ stars.We make this choice because the phase-space clustering of accreted halo stars is expected to correlate more with the formation times of also the accreted part of stellar mass.In addition, we have checked that including in-situ stars would weaken the correlation.According to the distribution of z 1/2 shown in Figure 3, we divide our sample of galaxies into three bins, z 1/2 < 0.6, z 1/2 = 0.6 − 1.2 and z 1/2 > 1.2.
For galaxies even in the same bin of z 1/2 , their 2PCFs show very large scatters, and thus instead of showing the results for individual galaxies, we show in Figure 4 the mean 2PCF signals averaged for galaxies in the three bins of z 1/2 (different color curves, see the legend).For the bottom panels, we show the 2PCFs out to ∼ 0.6R 200 , while the upper panels highlight the signals within ∼ 0.3R 200 .In other words, the only differences between the top and bottom panels are their x-axes ranges and the binning in galactocentric radii.The two panels in the same row refer to two different pair separations, ∆x, as indicated by the text on top.Within ∼ 0.25R 200 , we can see the averaged signals show significant monotonic trend, in terms that galaxies with later z 1/2 tend to have stronger 2PCF signals on average.This is because for galaxies which assemble late, the accreted stars have less time to be phase mixed, hence resulting in stronger phase-space clustering and 2PCF signals.However, beyond ∼ 0.25R 200 , the trend is no longer monotonic.The orange points have the lowest amplitude over ∼ 0.25 − 0.5R 200 , whereas it has the highest amplitude at ∼ 0.6R 200 , despite the fact that the corresponding z 1/2 bin for the orange curve is in between of the blue and green curves.We show in Figure 5 the 2PCF signals measured at different galactocentric radius bins, versus the galaxy formation times.Here we choose to show the galaxy formation times defined in a few different ways.Along the y-axis, z 1/2 , z 1/4 and z 3/4 refer to the redshifts when the galaxy has accreted half, one-fourth and three-fourths of its total ex-situ stellar mass at z = 0. ξ 1−6 refer to the 2PCF signals measured at the smallest to the largest galactocentric radii in the bottom panels of Figure 4, i.e., from ∼0.1R 200 to ∼0.6R 200 .Each data point refers to the 2PCF signal for one galaxy, and for the signal in a given galactocentric radius bin of each galaxy, we have averaged the 2PCF signals for the measurements based on ∆x = 1 − 2.5 kpc and ∆x = 2.5 − 5 kpc.The numbers in each panel indicate the Spearman correlation coefficients.We can see z 1/4 , z 1/2 and z 3/4 all show the strongest correlations with the 2PCF in the first and second columns, i.e., r <∼ 0.2R 200 .This is consistent with Figure 4.In the bottom panels of Figure 4 we can see for galaxies in bins of later formation times, they on average have stronger 2PCF signals only within ∼ 0.25R 200 .Now we show that the Spearman correlation coefficients are also the strongest (most negative) within ∼ 0.2R 200 .At larger radii, the correlation becomes weaker, and we fail to see monotonic trends between the 2PCF signal strength and the formation times in Figure 4 as well, i.e., the green curves with the earliest formation times have higher amplitudes than the orange curve with the intermediate formation times in a few outer data points.
For a direct comparison, we show in Figure 6 the Spearman correlation coefficients as a function of galactocentric radii, and out to r ∼ 0.6R 200 .Cyan, magenta and yellow curves refer to the correlation coefficients between the 2PCF signals and z 1/4 , z 1/2 and z 3/4 , respectively.z 3/4 quantifies the fraction of mass accreted at later redshifts than those quantified by z 1/2 or z 1/4 .As a consequence, z 3/4 shows the strongest correlations (the most negative) with the 2PCF signals than those of z 1/2 or z 1/4 at every radii.This is expected, because for halo stars that are accreted later, they on average preserve better phase-space clusterings, causing stronger correlations between the 2PCF signals and the galaxy formation histories.The correlations are also stronger for z 1/2 than that of z 1/4 within 0.36R 200 , but beyond 0.36R 200 the trend switches.This is perhaps because the ma-jority of mass as quantified by z 1/4 is currently in the central regions, resulting in weaker correlations between z 1/4 and the 2PCF signals at large galactocentric radii.
Figures 4, 5 and 6 all show that the correlation between the galaxy formation times and the 2PCF signals is the strongest at r <∼ 0.2R 200 .This is because the number of halo stars is larger in more central regions.The density profiles of accreted halo stars drop very quickly with the increase in galactocentric radii, so particles on smaller scales dominate the determination of the global galaxy formation times.
We can see in all panels of Figure 5, however, the scatters are large.We then adopt a single power law functional form to fit the 2PCF signals within 0.3R 200 as a function of the galactocentric radii and for each individual galaxy.In Figure 7, we show the best-fitting amplitude (α) and power-law index (β) versus z 1/2 .Each dot refers to one galaxy, and we have averaged the results over the measurements with ∆x = 1 − 2.5 kpc and ∆x = 2.5 − 5 kpc.It is clear that with later z 1/2 , α and β get larger, but there are large scatters.The scatter is mainly due to intrinsic reasons which we discuss below, but might be partly related to the reason that some of the 2PCF signals as a function of the galactocentric radii cannot be perfectly fit by single power laws.
Figures 4, 5, 6 and 7 show us that there are correlations between the mass assembly history of galaxies and the 2PCF signals of accreted halo stars.However, the amount of scatters is not negligible.The scatter reflects the limiting precision of using 2PCFs to constrain the mass assembly histories of galaxies.In other words, the phase-space clustering of halo stars is also affected by many other aspects, such as the infalling orbits of the progenitors.Besides, halo stars which are fully phase mixed might have lost the information about the accretion event.In order to fully understand this, we move on to carry more detailed investigations between the 2PCF signals and the mass assembly histories.We calculate the χ 2 between the 2PCF signals of every possible combinations of two galaxies, and check for galaxies with very similar or very different 2PCF signals of their halo stars, how similar or how different are their mass assembly histories.
We calculate the χ 2 values of the 2PCF signals between every two different galaxies in our sample based on the following equation: where the summation goes over the 2PCF signals at different galactocentric radius bins, denoted by i.In the denominator, σ galaxy1 and σ galaxy2 are the errors in the 2PCFs of the two galaxies, which are computed from the 1-σ scatters of 100 bootstrapped sub samples.
Figure 8 shows five examples, in which the 2PCF signals of the two galaxies under consideration in each panel are similar to each other (bottom row).Given the similarity in their 2PCF signals, the actual mass assembly histories are similar.Note for the mass assembly histories in the top row, only ex-situ stellar masses are used.The sharp changes in the mass assembly histories would not exist, if we plot them using the total stellar mass, but the correlations between the trends in mass assembly histories and the 2PCFs become weaker after including the in-situ formed stellar mass.
We can see some reasonable correlations between the top and bottom panels of Figure 8.For example, in the first and second columns, for the galaxies which have later z 1/2 (blue curve), the corresponding 2PCF signals are higher in amplitude than the other one (orange) beyond ∼ 0.4R 200 .However, the trend switches or becomes less clear at smaller radii, in terms that the galaxies with later z 1/2 (blue) have lower or similar 2PCF signals to the other one on such smaller radii.This is perhaps because the other galaxy (orange) has more matter assembled at earlier redshifts4 , which sink to smaller galactocentric radii due to dynamical friction, piling up more over radial scales of 0.1 to 0.35 R 200 and hence increasing the 2PCF signals there.Note in previous figures we have shown that z 1/2 on average has stronger correlations with the 2PCF signals at smaller radii, but now the two individual cases in the first and second columns of Figure 8 show that the 2PCF signals at larger radii beyond ∼ 0.4R 200 is more consistent with the difference in z 1/2 .This again reflects large system to system scatters in our analysis.
In the fourth and fifth columns, for the galaxy which assemble later (the blue curve) and hence has a much later formation time than the other one, we can see the corresponding 2PCF signals are higher in amplitudes, with the blue curves having higher amplitudes than the orange ones.However, in the fourth example, we may expect a larger difference in the 2PCF signals given the more prominent difference in their assembly histories or formation times in the upper panel.
The third column of Figure 8 is a relatively bad example.The mass assembly histories of the two galaxies Different color curves are averaged 2PCFs of MWmass galaxies in three bins of formation time (z 1/2 , see the legend).To distinguish from the real space 2PCFs, we denote the y-axis with ξ vel .The two panels refer to two different pair separations of halo stars in velocity space (∆v), as indicated by the text on top.Error bars are calculated from the 1-σ scatters of 100 sub samples bootstrapped over all halo star particles in each galaxy.We choose not to show the results out to larger galactocentric radii, as the correlations between the 2PCF signals and the formation times are very weak there.
nificantly different.In particular, the fifth example is an extreme case.The orange galaxy has a much stronger 2PCF signal than the other galaxy, but it forms earlier.
The examples presented in Figures 8 and 9 demonstrate the large scatter when using the 2PCF signals to infer the mass assembly history or formation time, which is likely determined by several factors such as the infalling orbits of the progenitor satellites.Nevertheless, the general trends are reasonable in most cases.In most cases, the galaxy which assembles earlier also has lower 2PCF signals, except for the third column of Figure 8 and the fifth column of Figure 9.For the fifth case of Figure 9, in fact halo stars in the galaxy with later formation time has significantly more radial orbits than those in the other galaxy, causing much faster phase mixing and thus lower amplitude in its 2PCF signal.

Connection between the velocity space 2PCF and the stellar mass assembly history
Our results in the previous subsections show that the real space 2PCF signals correlate with the stellar mass assembly histories of MW-mass galaxies, but the scatters are considerably large.In this subsection, we investigate correlations in velocity space.We check whether the velocity space 2PCF can provide more information.
Figure 10 is similar to the top panels of Figure 4.The x-axis is the galactocentric radius, r, scaled by the virial radius, R 200 .The blue, orange and green curves are the average 2PCF signals for galaxies in three bins of z 1/2 .We calculate the 2PCF in velocity space.The two columns refer to two different pair separations in velocity space, as indicated by the text on the top of each panel.Note the choice of the pair separations in velocity space (∆v) is less well defined in the literature, and here we have carefully tried many different choices of ∆v, and the chosen values of ∆v are determined by the cases when we can see the most prominent monotonic trend that the 2PCF signal strength increases with the decrease in z 1/2 .
The 2PCF signal amplitudes only monotonically increase with the decrease in z 1/2 for the second and third data points counting from the center, but not at other radii.Compared with Figure 4, it seems that the correlations in the velocity space 2PCF and the galaxy formation times are weaker than those revealed in the real space 2PCF signals.We do not show the 2PCF signals calculated out to larger galactocentric radii as in the bottom panels of Figure 4, because on such large scales the trends are not monotonic.
We also show the correlations between the velocity space 2PCF signals at different galactocentric radii and the galaxy formation times in Figure 11.The Spearman correlation coefficients are indicated in each panel of Figure 11.Compared with the real space 2PCF (Figure 5), the Spearman correlation coefficients get closer to zero for ξ vel,1−3 .In order to have direct comparisons, we plot the Spearman correlation coefficients based on real space and velocity space 2PCF signals together in Figure 12, as a function of the galactocentric radii.Comparing the blue and orange solid curves, we can see the real space 2PCF signals show stronger (more negative) correlations with the galaxy formation times at r <∼ 0.3 − 0.35R 200 than the velocity 2PCF signals.This is consistent with the fact that we see weaker correlations in Figure 10 than Figure 4.At r > 0.35R 200 , the velocity space 2PCF signals show slightly stronger correlations.We thus conclude that the velocity space clustering is slightly better preserved at large galactocentric radius (r >∼ 0.35R 200 ) than the real space correlation functions.However, the scatters in Figure 5 are very large, and thus using velocity space 2PCF does not help to significantly improve the accuracy of using real space 2PCF signals to predict the galaxy assembly histories.

Connection between the 2PCF and the halo mass assembly history
So far, we have investigated the correlation between the real and velocity space 2PCFs and the assembly history of the ex-situ stellar mass for MW-mass galaxies.=-0.17 .Spearman correlation coefficients between the galaxy formation times and the 2PCF signals at different galactocentric radii (y-axis), reported as a function of the galactocentric radii (x-axis).Panels from the left to the right refer to formation times defined as the redshifts when the galaxies or host dark matter halos have accreted one-fourth, half and three-fourths of their total accreted mass at today (z 1/4 , z 1/2 and z 3/4 ).In all panels, blue/green and orange/red curves are based on the 2PCF signals calculated in real and velocity space, respectively.Blue and orange curves are based on the formation times of the galaxies, while green and red dashed curves are based on the formation times of the host dark matter halos.
We can also study the connection between the 2PCFs of halo stars and the assembly history of the host dark matter halo.Since both the host dark matter halos and the halo stars form through accretions of smaller halos and galaxies, and dark matter leaves imprints on the visible matter distribution (e.g.Gardner et al. 2021;Nadler et al. 2021b,a), we expect the phase-space clustering and the 2PCF signals of halo stars carry information about the assembly history of the host dark matter halos.In this subsection, we examine how the 2PCF signals of halo stars are related to the mass assembly history of the host dark matter halo.
In Figure 12, we show the correlation coefficients based on the real space and velocity space 2PCFs, and based on the formation times of ex-situ halo stars in galaxies and the host dark matter halos together.The green and red dashed curves use the formation times of the host dark matter halos, to be compared with the blue and orange solid curves.The green and red dashed curves show negative correlations between the halo star 2PCF signals and the halo formation times.Moreover, the velocity space 2PCFs do not show significant differences in their correlations with the galaxy or halo formation times, given the fact that the orange solid and red dashed curves show similar trends and amplitudes.This tells us that the velocity space 2PCF signals have similar performance in predicting the galaxy or halo formation histories.However, the blue solid and green dashed curves (real space 2PCFs) are more different.At r < 0.3R 200 , the blue solid curves are significantly below the green dashed curves, implying stronger negative correlations between the real space 2PCF signals and the galaxy formation times than that of the halo formation times.At r > 0.3R 200 , the blue solid and green dashed curves become more similar, with the green dashed curves slightly below the blue solid ones in the middle and right panels.Our results show that the clustering or 2PCF signals of halo stars are correlated with both the galaxy and halo assembly histories.In inner regions (r < 0.3R 200 ) the real space 2PCF has a stronger correlation with the galaxy formation time than the halo formation time.

DISCUSSIONS AND CONCLUSIONS
The two point correlation function (2PCF) has been used to measure the spatial clustering of galaxies and dark matter halos, in real observation and in numerical simulations.In addition, the cross correlations between different types of galaxies, between galaxies and dark matter, and between galaxies and other multiwavelength observations are powerful statistical tools to help understanding galaxy formation physics and cosmology.
While 2PCF has been particularly useful to study galaxies and cosmology, there are also attempts of using 2PCF statistics to study the clustering of stars on subparsec to kiloparsec scales (e.g.Joncour et al. 2017Joncour et al. , 2018;;Kamdar et al. 2021;Cooper et al. 2011;Lancaster et al. 2019).However, unlike the clustering of galaxies, the physical mechanisms that drive the clustering and dissolving of stars are non-linear, and thus, the interpretation of the 2PCF signals of stars is often empirical.
In this study, we investigate the connection between the 2PCF signals of accreted halo stars in Milky Waymass galaxies on kpc scales and the assembly histories of these galaxies.We use the cosmological hydrodynamical TNG50 simulation.Halo stars belonging to any surviving subhalos/satellites are not used, and our 2PCF mainly reflects the strength of the clumpyness in halo stars due to the existence of stellar streams.We investigate the 2PCF signals in both real space and velocity space, and at different galactocentric radii.In general, the 2PCF amplitude increases with the decrease in pair separations, and increases proportionally with the increase in galactocentric radii, because the clustering of phase-space structures preserve better on smaller pair separations and on larger galactocentric radii.In addition, the existence of dynamically cold and massive stellar streams can cause local increases in the 2PCF amplitudes.
We find evidences that there are correlations between the 2PCF signals and the galaxy formation times.We define the redshifts at which the galaxy accreted onefourth, half and three-fourths of its total ex-situ stellar mass at z = 0, as z 1/4 , z 1/2 and z 3/4 .All formation times show negative correlations with the 2PCF signals, with the strongest correlations at ∼ 0.2R 200 .For larger galactocentric radii, the correlations become gradually weaker.z 3/4 shows the strongest correlations with the 2PCF.However, the correlations show a large scatter.
By looking at individual cases, we can see that for galaxies with similar 2PCF signals, their mass accretion histories can be either similar or different, but we observe that for the galaxies with later formation times, the 2PCFs are more likely to be higher in amplitude.We also see cases when the 2PCF signals are very different, the mass accretion histories are not significantly different.We show an extreme example of one galaxy with earlier formation time has significantly higher 2PCF signals than the other.These all suggest that although 2PCF signals correlate with the mass accretion history, the scatters are large.
We investigate the 2PCF signals in velocity space, and find similar trends as the real space 2PCF.The velocity space 2PCF signals show slightly stronger correlations with the formation times beyond ∼ 0.3 − 0.35R 200 .On the other hand, the real space 2PCF signals show better correlations within 0.3R 200 .This indicates that the velocity space clustering are slightly better preserved at larger galactocentric radii.However, the velocity space 2PCF signals carry on average less information than the real space 2PCF.Therefore, the velocity space 2PCF will not significantly increase the accuracy determining the galaxy assembly histories.
The real and velocity space 2PCF signals of halo stars also correlate with the assembly histories of their host dark matter halos, though still with large scatters.Within 0.3R 200 , the real space 2PCF shows stronger correlations with the galaxy formation histories than with the halo formation histories, while the velocity space 2PCFs do not show large differences in their correlations with the galaxy or halo formation times.It indicates that the velocity space 2PCF signals have similar performance upon predicting the galaxy or halo formation histories.
We emphasize that throughout our analysis, observational errors are not incorporated.Real observational errors may further act to weaken the correlations between the 2PCF signals and the mass assembly histories.
We conclude that it is difficult to use 2PCF statistics alone to precisely predict the formation times or assembly histories of galaxies or their host dark matter halos.However, higher order statistics such as three point correlation functions might further help to capture the nonlinear process, and improve the precisions in predicting galaxy assembly histories.

Figure 1 .
Figure1.Examples of two point correlation functions (2PCFs) for accreted halo stars of nine randomly chosen MW-mass systems from TNG50.Stars bound to satellites are not used.The x-axis refers to the distance to galactic center in unit of the virial radius of the host dark matter halos, R200.Curves with difference colors refer to three different pair separations of the 2PCFs, ∆x = 0.1−1 kpc (dark blue), 1−2.5 kpc (red) and 2.5−5 kpc (green).In general, the 2PCF signals increase proportionally with the increase in the galactocentric radius, and with the decrease in ∆x.The local peaks are due to substructures such as massive and dynamically cold stellar streams.Error bars are calculated from the 1-σ scatters of 100 sub samples bootstrapped over all halo star particles in each galaxy.

Figure 2 .
Figure 2. Radial velocities, vr, versus the galactocentric radii, r, for the galaxy with ID=490079, i.e., the galaxy in the bottom right panel of Figure 1.Colors (yellow and cyan) refer to star particles from two most massive progenitor satellites.

Figure 3 .
Figure3.Histogram of the galaxy formation time, z 1/2 , for the ex-situ stellar mass growth of our MW-mass galaxies selected from TNG50.Here z 1/2 is defined as the redshift at which the galaxy has assembled half of its ex-situ stellar mass at z = 0. Based on the distribution, we divide galaxies into three bins according to z 1/2 < 0.6, z 1/2 = 0.6 − 1.2 and z 1/2 > 1.2 for analysis in later figures.The bin boundaries are denoted by the two black vertical dashed lines.We have similar numbers of galaxies in the three bins of z 1/2 .
2PCF in real space The 2PCF signals for the halo stars of nine randomly chosen MW-mass systems are shown in Figure 1.Unlike how the traditional 2PCFs are presented for galaxy clustering, here the x-axis is the galactocentric radius, r, normalized by the virial radius, R 200 , of the host dark matter halo 3 .Following Lancaster et al. (2019), we adopt three different pair separations of ∆x = 0.1 − 1 kpc, 1 − 2.5 kpc and 2.5 − 5 kpc to calculate the 2PCF, which are denoted by dark blue, red

Figure 4 .
Figure4.The averaged 2PCF signals (ξ) of halo stars in galaxies divided into three different bins according to their galaxy formation time, z 1/2 < 0.6 (blue), z 1/2 = 0.6 − 1.2 (orange) and z 1/2 > 1.2 (green).Two columns refer to pair separations of ∆x = 1−2.5 kpc and ∆x = 2.5−5 kpc used to calculate the correlation functions, as indicated by the text on top.The two top panels show results within galactocentric radii of ∼ 0.3R200, with six equally spaced bins in log space.This is to highlight the results within ∼ 0.3R200.The two bottom panels show the results out to ∼ 0.6R200 with broader radial bins, to show that the amplitudes of the 2PCF do not change monotonically with z 1/2 beyond r ∼ 0.25R200.The trend is clear in the top panels and on small scales of the bottom panels, in terms that for galaxies with later z 1/2 , the clustering amplitude in their mean two point correlation functions is stronger.Error bars are calculated from the 1-σ scatters of 100 sub samples bootstrapped over all halo star particles in each galaxy.

Figure 5 .
Figure5.Scatter plot showing the galaxy formation times defined in different ways (z 1/2 , z 1/4 and z 3/4 ) versus the 2PCF signals at different radial bins.Here z 1/2 , z 1/4 and z 3/4 refer to the redshifts when the galaxy has accreted half, one-fourth and three-fourths of its total ex-situ stellar mass at z = 0.The x-axis refers to the 2PCF signals in the six radial bins corresponding to the x-axis of the bottom panels in Figure4, i.e., ξ1 is the 2PCF signal at the smallest radial bin of the top panels in Figure4(r ∼ 0.1R200), and ξ6 is the 2PCF at the largest radial bin of the top panels in Figure4(r ∼ 0.6R200).Each dot refers to one galaxy, and we have averaged the 2PCF signals over the measurements based on ∆x = 1 − 2.5 kpc and ∆x = 2.5 − 5 kpc.ρ in each panel indicate the Spearman correlation coefficients.

Figure 6 .Figure 7 .
Figure6.Spearman correlation coefficients between the galaxy formation times and the 2PCF signals at different galactocentric radii.Cyan, magenta and yellow curves refer to the galaxy formation times, z 1/4 , z 1/2 and z 3/4 , respectively.

Figure 10 .
Figure10.Similar to the top panels of Figure4, but shows the two point correlation function signals in velocity space.Different color curves are averaged 2PCFs of MWmass galaxies in three bins of formation time (z 1/2 , see the legend).To distinguish from the real space 2PCFs, we denote the y-axis with ξ vel .The two panels refer to two different pair separations of halo stars in velocity space (∆v), as indicated by the text on top.Error bars are calculated from the 1-σ scatters of 100 sub samples bootstrapped over all halo star particles in each galaxy.We choose not to show the results out to larger galactocentric radii, as the correlations between the 2PCF signals and the formation times are very weak there.

Figure 11 .
Figure 11.Similar to Figure 5, but shows the results for 2PCF signals in velocity space.ξ1 to ξ6 refer to the 2PCF signals in six equal log spaced bins from ∼ 0.1R200 to 0.6R200.