RECONSTRUCTING THE ACCRETION HISTORY OF THE GALACTIC STELLAR HALO FROM CHEMICAL ABUNDANCE RATIO DISTRIBUTIONS

Published 2015 March 20 © 2015. The American Astronomical Society. All rights reserved.
Citation: Duane M. Lee et al. 2015 ApJ 802 48. DOI: 10.1088/0004-637X/802/1/48

ABSTRACT

Observational studies of halo stars during the past two decades have placed some limits on the quantity and nature of accreted dwarf galaxy contributions to the Milky Way (MW) stellar halo, typically by using stellar phase-space information to identify the most recent halo accretion events. In this study we tested the prospects of using 2D chemical abundance ratio distributions (CARDs) found in stars of the stellar halo to determine its formation history. First, we used simulated data from 11 "MW-like" halos to generate satellite template sets (STSs) of 2D CARDs of accreted dwarf satellites, which are composed of accreted dwarfs from various mass regimes and epochs of accretion. Next, we randomly drew samples of ∼10³–10⁴ mock observations of stellar chemical abundance ratios ([α/Fe], [Fe/H]) from those 11 halos to generate samples of the underlying densities for our CARDs, to be compared to our templates in our analysis. Finally, we used the expectation-maximization algorithm to derive accretion histories in relation to the STS used and the sample size. For certain STSs we can typically identify the relative mass contributions of all accreted satellites to within a factor of two. We also find that this method is particularly sensitive to older accretion events involving low-luminosity dwarfs, e.g., ultra-faint dwarfs: precisely those events that are too ancient to be seen by phase-space studies of stars and too faint to be seen by high-z studies of the early universe. Since our results exploit only two chemical dimensions and near-future surveys promise to provide ∼6–9 dimensions, we conclude that these new high-resolution spectroscopic surveys of the stellar halo will allow us to recover its accretion history, and the luminosity function of infalling dwarf galaxies, across cosmic time.

1. INTRODUCTION

The origin of the stellar halo has been a topic of intense study since the publication of the seminal paper by Eggen et al. (1962). The paper suggested that the stellar halo originated from the "monolithic collapse" of a galactic-sized primordial gas cloud. More specifically, they proposed that during this quick (≲100 Myr) collapse a very small portion of that metal-poor/free gas fragmented, owing to Jeans instabilities, and formed stars. While the bulk of the gas would eventually form the young, metal-rich, circularly orbiting, stellar disk of the Galaxy, these "halo" stars would instead be characterized as old, metal-poor stars on mainly radial orbits owing to the imprint of the cloud's initial collapse. When Eggen et al. (1962) proposed this theory, observations of the halo were restricted to small kinematic samples near the Sun—samples that lacked any features that might suggest that the halo was built over time via galactic mergers or accretion. However, a decade and a half later, Searle & Zinn (1978) stated in another seminal work that, in fact, some halo observations could be explained in another way. Their paper advanced the idea that differences in globular cluster abundance distributions versus galactocentric distances in the halo were due to the "hierarchical merging" of many smaller galactic systems over the lifetime of the Galaxy. As a consequence of hierarchical merging, the stellar halo was created metal-poor because most galactic progenitors of the halo were accreted early on, which, in turn, afforded stellar inhabitants of these accreted systems little time to enrich to higher metallicities. The theory also suggested that, while less abundant, a distribution of more metal-enriched stars and clusters should also inhabit the halo owing to mergers over time. Consequently, it was these mergers that led to the radial orbits of stars and clusters that were earlier seen and characterized by Eggen et al. (1962).

Also bolstering the theory of hierarchical merging was the development of the theories of the formation of structures within the cold dark matter paradigm (e.g., Efstathiou et al. 1985). These theories predicted that the continuous merging of galaxies was facilitated by the parallel growth of the dark matter halos that hosted or formed the backbones of those galaxies. As a consequence, hierarchical merger formation of the stellar halo is simply a manifestation of that growth at the galactic scale.

While cosmological theory supported Searle & Zinn's work, strong additional evidence for the theory of hierarchical merging came with the observations of halo substructure. In the early 1990s, Ibata et al. (1994) discovered the core of the Sagittarius dwarf galaxy in the outskirts of the stellar halo. Observations of this obviously "dying" satellite supported the assertion that stellar debris from the dwarf would follow the orbit of the accreted system. This debris would also disperse in phase space over time and contribute to the growth of the halo. Further evidence for hierarchical merging came from the Sloan Digital Sky Survey (SDSS; York et al. 2000). This state-of-the-art project was the first global survey of the halo to extend beyond a couple of kiloparsecs from the Sun. All previous deep surveys of the halo were done in pencil-beam mode—a mode that virtually guaranteed the omission of extended structures. Initial results from SDSS showed a halo teeming with photometric overdensities within ∼18 kpc from the Galactic center. This finding suggested that substructure was ubiquitous (Newberg et al. 2002). Majewski et al. (2003) found the tidal tails of Sagittarius wrapped around the Milky Way (MW) by observing M-giant overdensities in the halo. The "smoking gun" for hierarchical merging came in 2006, when a clear and distinct photometric picture of the halo from SDSS revealed newly discovered dwarf galaxies and, more to the point, tidal streams (i.e., substructure) from past mergers called the "field of streams" (Belokurov et al. 2006).

The SDSS discoveries of abundant substructure in the halo led to numerous dynamical studies. Some studies determined the membership of known objects (e.g., Majewski et al. 2005), while others discovered new objects by their dynamical overdensities in phase space (e.g., Schlaufman et al. 2009). Beyond SDSS lies the next generation of galactic halo surveys. From photometry (LSST), astrometry (Gaia), and high-resolution abundances (APOGEE and GALAH), we can expect to collect enough data for use in statistical analysis to actually answer some of the outstanding questions in Galactic astronomy.4 One outstanding question of great importance is, what is the merger history of the MW halo? With the aforementioned surveys soon at our disposal, we will have three ways of approaching this question.

A traditional photometric census of the halo (LSST) is sensitive only to mergers from the past few billion years, because the projected phase-space dimensions of older accreted structures have phase mixed (Sharma et al. 2010). Dynamical studies with Gaia should prove more successful in recovering accretion histories because they collect data that contain full 6D phase-space information. In principle, this information allows one to calculate orbital properties (i.e., integrals of motion) for a given potential. Since the integrals of motion for a static potential are conserved, it is possible to associate debris in orbital-property space even if the halo is fully phase mixed (Helmi & de Zeeuw 2000). However, for the outer halo (beyond 10 kpc), even Gaia cannot measure distances with sufficient accuracy, so accretion histories reconstructed from astrometric data will remain incomplete at these distances. Furthermore, it is highly likely that rapid, violent mergers took place in the early assembly of the halo. Significant mergers of this nature should scatter normally conserved quantities in phase space, making the extraction of merger histories from earlier epochs harder and perhaps futile.

In the past decade, an understanding of the limitations of stellar phase-space analysis has led to the promising pursuit of conserved quantities in stellar chemical abundance space, that is, stellar quantities that are more innate and, as such, cannot be changed by scattering in phase space. Unavane et al. (1996) were the first to demonstrate that such innate quantities could be fruitful: they used a metallicity–color ([Fe/H]–(B−V)) plane to select halo stars similar in composition to existing metal-poor dSph satellite stars and thereby constrain the hierarchical buildup of the halo. Using this comparison, Unavane et al. determined that the history of the halo cannot contain more than ∼60 Carina-like dwarf accretions or Fornax-like dwarf accretions. In an analogous proposal for the Galactic disk, Freeman & Bland-Hawthorn (Freeman & Bland-Hawthorn 2002; Bland-Hawthorn & Freeman 2004) suggested that measuring the detailed chemical composition of vast numbers of stars in the Galactic disk might be used to recover their origins: stars with identical compositions in high-dimensional abundance space are likely to have been born in the same star cluster. De Silva et al. (2007) observed that star clusters are chemically homogeneous within errors, while Bland-Hawthorn et al. (2010) confirmed that this homogeneity allows astronomers to track stars back to their natal clusters by "chemically tagging" them. Thus, "chemical tagging" could be used to reconstruct long-dead star clusters and recover the star formation history (SFH) of the Galaxy.

In this paper we explore whether a modified version of "chemical tagging" might be applied to the Galactic halo, expanding on the idea that Unavane et al. (1996) introduced nearly two decades earlier. Unlike stars born in the same cluster, stars born in the same dwarf galaxy do not share the same chemical composition. However, pioneering studies since the late 1990s have shown that stars in different dwarfs do have distinct (if overlapping) chemical abundance ratio distributions (CARDs; see, e.g., Nissen & Schuster 1997; Ivans et al. 1999; Shetrone et al. 2001, 2003; Venn et al. 2001, 2003; Fulbright 2002; Smecker-Hane & McWilliam 2002; Stephens & Boesgaard 2002; Gratton et al. 2003; Bonifacio et al. 2004; Cayrel et al. 2004; Kaufer et al. 2004; Geisler et al. 2005; Jonsell et al. 2005; Monaco et al. 2005; Johnson et al. 2006; Pompeia et al. 2008; Tautvaišienė et al. 2007). Figure 1, reproduced from Geisler et al. (2007), illustrates how these CARDs (compiled from the aforementioned observations) tantalizingly suggest that such an attempt is possible.

Figure 1. Reproduction of Figure 12 from Geisler et al. (2007). The figure is a compilation of [α/Fe] vs. [Fe/H] data taken by Nissen & Schuster (1997), Ivans et al. (1999), Shetrone et al. (2001), Venn et al. (2001), Fulbright (2002), Smecker-Hane & McWilliam (2002), Stephens & Boesgaard (2002), Gratton et al. (2003), Shetrone et al. (2003), Venn et al. (2003), Bonifacio et al. (2004), Cayrel et al. (2004), Kaufer et al. (2004), Geisler et al. (2005), Jonsell et al. (2005), Monaco et al. (2005), Johnson et al. (2006), Pompeia et al. (2008), and Tautvaišienė et al. (2007). Symbols shown here represent a mixture of model data, stars and star clusters found in the MW halo (green), as well as stars and stellar clusters found in low-mass dwarf spheroidals (blue), dwarf irregulars (yellow), the Sagittarius dwarf galaxy (red), and the Large Magellanic Cloud (cyan). The distribution of accreted and "soon-to-be-accreted" systems in this 2D chemical abundance space demonstrates the potential for determining accretion histories by attributing various subsets of the CARDs observed in the stellar halo of a nearby galaxy (e.g., the MW halo) to different accreted systems (see the text for a brief explanation).

Figure 1 is a reproduction of Figure 12 of Geisler et al. (2007) showing a 2D CARD plot of [α/Fe] (the ratio of the sum of α-elements [typically Ca, Mg, Ti, and O] to Fe) versus [Fe/H]. The plot shows a variety of star and star cluster measurements of [α/Fe] and [Fe/H], which place different parent or host systems in different parts of the 2D CARD space. Additionally, differences between galactic systems at lower metallicities are also emerging for neutron-capture elements (e.g., strontium and barium). These observations suggest that

1. at a given accretion epoch, differences in CARDs exist between systems of different stellar masses; and
2. at a given stellar mass, differences exist between systems that were accreted at different times.

In this paper, we develop a statistical approach, based on the expectation-maximization (EM) algorithm, to examine whether the CARDs of objects of different mass accreted at different times are sufficiently distinct to allow us to recover halo accretion histories from chemical abundance data alone. We test our method with semianalytic models available from previous simulation work. In Section 2, we explain the nature of the models and the methods used to produce accounts of accretion history from mock halo observations. In Section 3 we discuss the success of the EM algorithm when applied to specific cases. In Section 4 we describe the success of our results across our entire data set. In Section 5 we discuss both the utility and reliability of applying this technique to real observations. In Section 6 we present our conclusions.

2. METHODS

We can approach the problem of recovering the accretion history of a galactic halo (using CARDs) by posing the following question: "How accurately can we determine the fraction of total stellar mass, Aj, contributed by satellites of various mass (Msat) and accretion time (tacc) to a stellar halo, given a set of templates for the distribution of chemical abundances xd found in those satellites and observations of CARDs (f(xd)) in the stellar halo?" In this study, we attempt to answer this question by investigating realizations of the stellar halo by Bullock & Johnston (2005; see Section 2.1), which include distributions of α-elements and iron (Fe) generated by the methods of Robertson et al. (2005) and implemented in the models by Font et al. (2006). To begin our investigation, we recast our question in the form of the following equation:

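$$f(\mathbf{x}_d) = \sum_{j=1}^{m} A_j\, f_j(\mathbf{x}_d), \tag{1}$$

where

$$\sum_{j=1}^{m} A_j = 1,$$

for m satellite templates with $A_j \ge 0$.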

In Equation (1), f(xd) represents the probability density function (distribution) of observed "stars" in the d-dimensional CARD space ($\mathbf{x}_d \in \mathbb{R}^d$) and Aj represents the relative contribution from each template fj. In a generic sense, each template fj represents the CARD for dwarfs of some characteristic mass Msat that were accreted at a characteristic time tacc. Hence, finding all Aj values corresponds to recovering the "accretion history profile" (AHP) of the galactic halo. Using Equation (1) to address our question requires the following four steps.

1. Generate mock "observations" of CARDs (i.e., f(xd), in our case with $\mathbf{x}_d$ = [[α/Fe], [Fe/H]]) for 11 realizations from simulations of purely accretion-grown halos (Section 2.2).
2. Create CARD templates ($f_j(\mathbf{x}_d)$) representing the density of stars in [α/Fe]–[Fe/H] space for satellites found in selected 2D bins of satellite mass and accretion time (Section 2.3).
3. Apply the expectation-maximization (EM) algorithm (a method for statistical estimation in finite mixture models; see Section 2.4) to the observations, using the satellite templates, to recover their relative contributions (i.e., Aj) to the host halo's stellar mass (Sections 3 and 4).
4. Evaluate the efficacy of this approach by using a summary statistic (Section 2.5) that encapsulates how accurately the method recovers the known accretion history of each halo (e.g., see Section 4.2).
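A minimal sketch of Equation (1) on binned templates follows. It assumes, for illustration only, that each template f_j is stored as a normalized 2D histogram (a dict mapping a grid cell to the probability in that cell) and that the weights A_j are non-negative and sum to one; the function name `mixture_density` and the toy numbers are hypothetical, not taken from the simulations.

```python
import math


def mixture_density(weights, templates):
    """Evaluate Equation (1), f = sum_j A_j * f_j, on a binned CARD grid.

    weights   -- list of A_j values (must be non-negative and sum to 1)
    templates -- list of dicts mapping a grid cell (i_feh, i_alpha) to the
                 probability of template f_j in that cell
    """
    if any(a < 0 for a in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights A_j must be non-negative and sum to 1")
    combined = {}
    for a_j, f_j in zip(weights, templates):
        for cell, prob in f_j.items():
            combined[cell] = combined.get(cell, 0.0) + a_j * prob
    return combined


# Two toy templates on a tiny grid (hypothetical numbers, not simulation output).
f1 = {(0, 0): 0.75, (0, 1): 0.25}
f2 = {(1, 1): 1.0}
halo = mixture_density([0.5, 0.5], [f1, f2])
print(halo)  # {(0, 0): 0.375, (0, 1): 0.125, (1, 1): 0.5}
```

Because every template is normalized and the weights sum to one, the combined halo density is automatically normalized as well.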

2.1. The Simulations

The simulations consist of 11 "MW-sized" halo realizations from Bullock & Johnston (2005) that involve a total of 1515 accreted satellites (with 100–150 satellites contributing to each halo). Each of the 11 dark matter hosts has a total mass of Mvirial(z = 0) = 1.4 × 10¹² M⊙, and its accretion history is generated by merger trees using a statistical Monte Carlo method with an extended Press–Schechter formalism (Somerville & Kolatt 1999; Lacey & Cole 1993; Bullock & Johnston 2005, and references therein). Differences in AHP among the halos arise entirely from the randomness in the merger trees.

CARDs for these 11 merger histories were generated from a semianalytic chemical enrichment code (Robertson et al. 2005) that was applied separately to each infalling satellite and combined with the simulations by Font et al. (2006). Since the enrichment code was implemented for each satellite generated, we can look at individual satellites to assess their relative contribution to their host halo's CARDs. Also, since the aim of this study is to determine the amount of information we can retrieve via chemical abundance observations, we abstain from utilizing any of the satellites' spatial information in our analysis. The main factors contributing to the SFH in the satellites are (1) the epoch of reionization, zre, (2) the fraction of gas remaining/accreted in the satellite halo after reionization (set mainly by the satellite's virial mass at its time of accretion), (3) the global star formation rate (SFR), and (4) the termination of star formation at the time of accretion (Bullock & Johnston 2005).

Note that the assumption of quenched star formation upon satellite accretion into the host galaxy's halo is tentatively supported by observations in Geha et al. (2012) and Grcevich & Putman (2009) (strengthened by Grcevich & Putman 2010). Together, these observations suggest that star formation in dwarf satellite galaxies is quenched only by close interaction with the host galaxy's halo.⁵

The chemical enrichment of the individual dwarf satellites is affected by these four factors, which are utilized in the simulations to determine the amount of gas available to produce stars and the duration of star formation, which, in turn, determines the chemical evolution of each satellite as prescribed in Robertson et al. (2005). The prescription includes α- and Fe-element enrichment from Type II and Type Ia supernovae (SNe) and stellar wind outflows of metals. The chemical evolution model was tuned with an SN feedback treatment to agree with the local dwarf galaxy stellar mass–metallicity relation (Robertson et al. 2005; see Section 2.3 for further discussion). The α-element patterns in dwarfs versus the smooth halo are consistent with the CARDs of dwarfs found in the compilation of data in Figure 12 of Geisler et al. (2007) (see Figure 1)—an agreement that further bolsters our approach in this investigation (Font et al. 2006).

2.2. "Observations" from the Simulations

The function f(xd) represents the density distribution produced by n random "observations" in chemical abundance space xd of "stars" (star particles; see Section 2.1). Sample distributions for each halo are constructed by randomly drawing "stars" from the halo field.⁶ To mimic observational errors in the mock observations, we add a random number drawn from a Gaussian with a dispersion of 0.05 dex to both the [α/Fe] and [Fe/H] abundance ratios. This error size is chosen to probe the foreseeable potential of this technique under the best possible conditions for analysis. Evaluating the technique under ideal conditions provides a baseline against which future analyses of real observations can be assessed. In our study, we select samples of n = 10³, 10⁴, and 3 × 10⁴ stars, representing current, near-future, and optimistically anticipated sample sizes, respectively (K. Freeman 2010, private communication).
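The mock-observation step can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `mock_observations` is a hypothetical helper, the halo is represented as a plain list of ([α/Fe], [Fe/H]) pairs, and drawing with replacement is our assumption. Only the 0.05 dex Gaussian scatter and the sample sizes come from the text.

```python
import random


def mock_observations(halo_particles, n, sigma=0.05, seed=None):
    """Draw n "stars" from a halo and add Gaussian errors to both ratios.

    halo_particles -- list of ([alpha/Fe], [Fe/H]) tuples for the halo's star particles
    n              -- sample size, e.g. 10**3, 10**4, or 3 * 10**4
    sigma          -- Gaussian dispersion in dex (0.05 dex in the text)
    """
    rng = random.Random(seed)
    # Sampling with replacement is an assumption; the text only says "randomly drawing".
    drawn = rng.choices(halo_particles, k=n)
    return [(alpha + rng.gauss(0.0, sigma), feh + rng.gauss(0.0, sigma))
            for alpha, feh in drawn]


# Hypothetical three-particle "halo" for illustration.
toy_halo = [(0.4, -2.0), (0.3, -1.5), (0.1, -0.8)]
sample = mock_observations(toy_halo, n=10**3, seed=42)
```

For a halo with many more star particles than n, sampling with or without replacement makes a negligible difference.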

Figure 2 shows a 2D CARD ([α/Fe] versus [Fe/H]) of star particles representing mock stellar abundance ratio observations from the halo 1 simulation. The color of each particle represents the stellar mass of its parent satellite relative to all other accreted satellites. Black and purple particles are contributed by the least massive satellites, while orange and red particles are contributed by the most massive satellites. The particle distribution is consistent with the expectation that the most massive satellites should account for the vast majority of stars in the accreted halo stellar population. Comparing this 2D CARD with the observed CARDs in Figure 1, we see that the distributional spread between observed accreted dwarfs of different masses mirrors the spread (in mass) of the simulated dwarfs.

Figure 2. Plot of [α/Fe] vs. [Fe/H] for "star particles." Each particle is color-coded to represent the relative stellar mass/luminosity of its parent satellite. The relative number of particles in the accreted satellite mass/luminosity range reflects the expected relative contribution from each parent to the total stellar mass of the host halo. The chemical evolution tracks of five satellites, randomly chosen to span the stellar mass range of accreted satellites for halo 1, are displayed over the colored particle distribution as black lines and labeled by a stellar mass proportional to the typical satellite stellar mass found in the mass bins outlined in Section 2.3 and displayed in Figure 3.

The black dashed lines overlaid on the colored particle distribution of Figure 2 represent chemical evolution tracks (from the simulations) of typical dwarf masses accreted over the lifetime of the halo. The length of these tracks is set primarily by the satellite's accretion time: the more time a satellite has to form stars, the further its chemical evolution can advance to higher metallicities, and vice versa. The curvature of these tracks is determined primarily by the satellite's mass: the more mass a satellite has to form stars, the higher its SFR, and hence the greater its chemical enrichment by core-collapse SNe. This enhanced early enrichment lets more massive satellites reach higher metallicities before Type Ia SNe, with their typical ∼1 Gyr delay, begin contributing significantly to Fe abundances; that Fe contribution establishes the so-called [α/Fe] knee. The incorporation of these tracks into our dwarf model templates is discussed in the next section.

2.3. Satellite Template Sets

To see whether we can recover the AHP of our simulated halos from our mock observations, we need to generate templates that represent typical accretion events of given satellite stellar mass and age. The most "naive" approach to creating our templates would be to evenly divide the possible ranges in accretion time tacc (0–13 Gyr) and stellar mass Msat (). This division would form Nr mass-binned templates (rows) by Nc time-binned templates (columns), with some "empty" templates (Nempty), so that the total number of templates equals m = Nr × Nc − Nempty. However, since decades in galactic (stellar) mass have intuitive implications for galaxy evolution, we restrict our current templates to even divisions in tacc while we divide Msat by decades of mass from to and combine all satellites below into a fifth mass bin (see Figure 3).
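The template binning can be sketched as follows. Because the mass-bin edges are not reproduced in this text, the upper edge `log_m_top` and the number of one-dex bins are hypothetical placeholders. The five equal-width bins over 0–13 Gyr, the decade spacing in mass, and the low-mass catch-all bin follow the description above, and occupied cells are counted so that m = Nr × Nc − Nempty.

```python
import math


def template_index(t_acc, m_sat, n_time_bins=5, t_max=13.0,
                   log_m_top=10.0, n_decades=4):
    """Return the (row, col) template bin for a satellite.

    t_acc -- accretion time in Gyr, binned into n_time_bins equal bins over [0, t_max]
    m_sat -- satellite stellar mass in solar masses
    Rows 0..n_decades-1 are one-dex mass bins counting down from log_m_top (a
    HYPOTHETICAL edge, not the paper's value); row n_decades collects every
    lower-mass satellite. Masses above log_m_top are folded into row 0.
    """
    col = min(int(t_acc / t_max * n_time_bins), n_time_bins - 1)
    decade = math.floor(log_m_top - math.log10(m_sat))
    row = min(max(decade, 0), n_decades)
    return row, col


def count_templates(satellites, n_time_bins=5, t_max=13.0,
                    log_m_top=10.0, n_decades=4):
    """Return (m, n_empty), where m = Nr * Nc - Nempty counts occupied bins."""
    occupied = {template_index(t, mass, n_time_bins, t_max, log_m_top, n_decades)
                for t, mass in satellites}
    n_rows, n_cols = n_decades + 1, n_time_bins
    n_empty = n_rows * n_cols - len(occupied)
    return n_rows * n_cols - n_empty, n_empty
```

With the defaults this gives a 5 × 5 grid; for instance, three hypothetical satellites at (12.9 Gyr, 3 × 10⁹ M⊙), (0.5 Gyr, 10⁴ M⊙), and (0.6 Gyr, 2 × 10⁴ M⊙) occupy two cells, leaving 23 empty.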

Figure 3. Plot of the 5 × 5 STS along with its projections in the tacc–Msat plane. Top right: our 5 × 5 STS. The relative contribution of stellar mass from the subset of all 1515 satellites in each template is shown as a percentage of the total halo stellar mass (red). Each column and row reflects the mass (stellar mass)–metallicity relation and the age–metallicity relation, respectively (see Section 2.1 for details). Top left and bottom right: projections of the 5 × 5 STS into the tacc plane and Msat plane, respectively, are equivalent to the 1 × 5 (mass-divided) STS and 5 × 1 (time-divided) STS explored in Section 3. Bottom left: a projection into both parameter dimensions exemplifies a density distribution (i.e., f(xd)) similar to the parent distributions of individual halos from which "observed" stars are drawn in our analysis.

After divisions in the tacc–Msat plane are selected, all 1515 dwarf satellite models are divided among the bins created by the selected partitions based on each dwarf's individual tacc and Msat. During this process, each dwarf's chemical track⁷ (see Section 2.2) is smeared out by convolving each star particle with an observational error of dex in both chemical dimensions. To generate the CARDs required for our recovery algorithm (i.e., the EM algorithm), we sort an average of ∼19,500 star particles per satellite (with errors) into square bins of 0.1 dex that span 3 dex in [Fe/H] (−3 to 0 dex) and 1.7 dex in [α/Fe] (−0.7 to 1 dex). The collection of all binned distributions in our 2D chemical space is normalized to produce an ensemble of probability densities that represent our satellite template set (STS).
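A sketch of building one template from star particles follows. The grid (0.1 dex bins, [Fe/H] from −3 to 0, [α/Fe] from −0.7 to 1) is taken from the text. The error dispersion `sigma` is left as a free parameter, a single Monte Carlo draw per particle stands in for the convolution, and dropping particles scattered off the grid is our assumption; `build_template` is a hypothetical name. Cells store probabilities; dividing by the 0.01 dex² cell area would turn them into densities without changing the EM estimates.

```python
import math
import random

FEH_MIN, FEH_MAX = -3.0, 0.0       # [Fe/H] range from the text
ALPHA_MIN, ALPHA_MAX = -0.7, 1.0   # [alpha/Fe] range from the text
BIN = 0.1                          # square bins of 0.1 dex
N_FEH = round((FEH_MAX - FEH_MIN) / BIN)        # 30 bins
N_ALPHA = round((ALPHA_MAX - ALPHA_MIN) / BIN)  # 17 bins


def build_template(particles, sigma, seed=None):
    """Smear, bin, and normalize the star particles of one template bin.

    particles -- ([alpha/Fe], [Fe/H]) tuples pooled from every satellite in the bin
    sigma     -- assumed observational error in dex (free parameter here)
    Returns a dict mapping (i_feh, i_alpha) to the probability in that cell.
    """
    rng = random.Random(seed)
    counts = {}
    for alpha, feh in particles:
        # One Monte Carlo draw per particle approximates the Gaussian convolution.
        alpha += rng.gauss(0.0, sigma)
        feh += rng.gauss(0.0, sigma)
        i = math.floor((feh - FEH_MIN) / BIN)
        j = math.floor((alpha - ALPHA_MIN) / BIN)
        if 0 <= i < N_FEH and 0 <= j < N_ALPHA:   # off-grid particles are dropped
            counts[(i, j)] = counts.get((i, j), 0) + 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no particles fell on the grid")
    return {cell: c / total for cell, c in counts.items()}
```

With ∼19,500 particles per satellite, the Monte Carlo scatter of this approximation is small; an exact alternative would spread each particle's Gaussian weight over neighboring cells.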

Figure 3 shows our 5 × 5 STS as an example of our model template scheme. The full 5 × 5 panel (top right) shows the evenly spaced bins in accretion time versus bins spaced by decades of accreted satellite stellar mass down to , below which all other satellites are binned together. As stated in Section 2.1, the feedback prescriptions in the chemical evolution models were tuned to reproduce the chemical abundance relationships observed in galactic surveys. First, the mass (luminosity)–metallicity ([Fe/H]) relationship can be seen by inspecting the trends along any accretion time column: the peak of the [Fe/H] distribution increases (and the peak of the [α/Fe] distribution decreases) with increasing galaxy mass (luminosity). Second, an age–metallicity relationship can be seen by inspecting the trends along any accreted satellite mass row (i.e., holding the mass range constant): the peak of the [Fe/H] distribution decreases (and the peak of the [α/Fe] distribution increases) with increasing accretion time, which dictates the time available for star formation in the galaxy. However, this age–metallicity relationship is not strictly expected to hold for every set of dwarf galaxies, since other processes may be just as likely to quench star formation in dwarfs before accretion takes place.

Projections of the 5 × 5 STS, in accreted satellite mass and accretion time, are shown in the top left and bottom right corners of Figure 3, respectively. A comparison of both projections reveals smaller differences in CARDs between adjacent bins of accretion time than in adjacent bins of accreted satellite mass. The similarities between dwarf models in the 5 × 1 STS projection suggest that the EM algorithm will perform better when utilizing the 1 × 5 STS projections of accreted satellite mass for estimates (see Sections 3.2, 3.3, and 4.2 for further discussion). Finally, a 1 × 1 STS projection displaying the probability density function of our master template (i.e., containing the CARDs of all 1515 simulated dwarfs) is shown in the bottom left corner of the figure.

In Section 3 we use the two 1D projections discussed here to form a basis of analysis for the EM algorithm's performance and our ability to recover the AHP of halos in one dimension of mass or time.

2.4. Recovering AHPs Using the EM Algorithm

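The composition of our halos is best described as a finite mixture of discrete accreted objects that exhibit varying characteristics in a shared CARD space (x = [α/Fe], y = [Fe/H]). Since we can construct models for these accreted objects, we can create a mixture model

$$f(x, y) = \sum_{j=1}^{m} A_j\, f_j(x, y),$$

where the relations

$$\sum_{j=1}^{m} A_j = 1, \qquad A_j \ge 0,$$

confine the relative contributions of the model satellites $f_j(x, y)$. Given n observations $(x_i, y_i)$, $i = 1, \ldots, n$, we can construct a log-likelihood function as follows:

$$\ell(\mathbf{A}) = \sum_{i=1}^{n} \ln\left[\sum_{j=1}^{m} A_j\, f_j(x_i, y_i)\right].$$

Maximizing $\ell(\mathbf{A})$ yields the maximum likelihood estimate $\hat{\mathbf{A}}^{\rm EM}$, our best EM estimate of the true values $A_j^{\rm true}$. This task, which can be computationally arduous, can be made tractable by adding a latent indicator, z, to each observed data point (x, y) to represent its model template of origin. By designating the data set $\{(x_i, y_i, z_i)\}_{i=1}^{n}$ as our complete data, we can then define a complete-data log-likelihood as

$$\ell_c(\mathbf{A}) = \sum_{i=1}^{n} \sum_{j=1}^{m} z_{ij} \ln\left[A_j\, f_j(x_i, y_i)\right],$$

where $z_{ij}$ equals the hard expectation that $(x_i, y_i)$ comes from the jth satellite template (i.e., $z_{ij} = 1$ if it does and 0 otherwise) and $\ell_c$ is the complete-data log-likelihood.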
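When the templates are held fixed, the E and M steps implied by the complete-data log-likelihood take a closed form. The derivation below is the standard EM result for mixture weights, offered as a sketch; the responsibilities τ_ij and the Lagrange multiplier λ are our notation and need not match the Appendix. The E-step replaces each unknown indicator z_ij by its conditional expectation given the current weights:

$$\tau_{ij}^{(t)} = \mathbb{E}\left[z_{ij} \mid x_i, y_i, \mathbf{A}^{(t)}\right] = \frac{A_j^{(t)}\, f_j(x_i, y_i)}{\sum_{k=1}^{m} A_k^{(t)}\, f_k(x_i, y_i)}$$

The M-step maximizes the expected complete-data log-likelihood subject to the constraint that the weights sum to one:

$$\mathcal{L}(\mathbf{A}, \lambda) = \sum_{i=1}^{n}\sum_{j=1}^{m} \tau_{ij}^{(t)} \ln\left[A_j\, f_j(x_i, y_i)\right] + \lambda\left(1 - \sum_{j=1}^{m} A_j\right)$$

$$\frac{\partial \mathcal{L}}{\partial A_j} = \frac{1}{A_j}\sum_{i=1}^{n}\tau_{ij}^{(t)} - \lambda = 0 \;\Longrightarrow\; A_j = \frac{1}{\lambda}\sum_{i=1}^{n}\tau_{ij}^{(t)}$$

Summing over j and using $\sum_{j}\tau_{ij}^{(t)} = 1$ for every star gives $\lambda = n$, so each updated weight is the average responsibility of its template:

$$A_j^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n}\tau_{ij}^{(t)}$$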

The complete-data log-likelihood can then be used to obtain $\hat{\mathbf{A}}^{\rm EM}$ via the EM algorithm. Starting from an initial set of guesses, $\mathbf{A}^{(0)}$, the algorithm iteratively steps through guesses (each informed by the previous set) until the log-likelihood $\ell(\mathbf{A})$, conditioned on the data, is maximized within some tolerance. More specifically, the maximizing value of the tth iteration, $\mathbf{A}^{(t)}$, is used as the starting value for the next iteration, and the process continues until the likelihood changes by less than 10⁻³ over 25 iterations.⁸ Details of the implementation of this technique are given in the Appendix.
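The iteration described above can be sketched in a few lines of Python. This is a generic EM for mixture weights with fixed templates, not the authors' implementation (which is given in the Appendix). The uniform starting guess and the helper name `em_weights` are assumptions, while the stopping rule mirrors the quoted 10⁻³ change over 25 iterations. Each row of `f_obs` holds the template values f_j(x_i, y_i) looked up in the binned templates for one observed star.

```python
import math


def em_weights(f_obs, m, tol=1e-3, window=25, max_iter=10_000):
    """Estimate mixture weights A_j by EM, holding the templates fixed.

    f_obs -- n rows; f_obs[i][j] is template j evaluated at observed star i
    m     -- number of templates
    Stops once the log-likelihood changes by less than `tol` over `window`
    iterations, mirroring the criterion quoted in the text.
    Returns the final weights and the log-likelihood from the last E-step.
    """
    n = len(f_obs)
    weights = [1.0 / m] * m          # uniform starting guess A^(0) (an assumption)
    history = []
    for _ in range(max_iter):
        totals = [0.0] * m
        log_like = 0.0
        for row in f_obs:
            mix = sum(a * f for a, f in zip(weights, row))   # f(x_i, y_i)
            log_like += math.log(mix)
            for j in range(m):
                totals[j] += weights[j] * row[j] / mix       # E-step: tau_ij
        history.append(log_like)
        weights = [t / n for t in totals]                    # M-step: mean of tau_ij
        if len(history) > window and abs(history[-1] - history[-1 - window]) < tol:
            break
    return weights, history[-1]


# Two templates over two abundance cells (hypothetical values): 41 stars land in
# cell 0 and 59 in cell 1, which matches true weights A = (0.3, 0.7) exactly.
f_obs = [[0.9, 0.2]] * 41 + [[0.1, 0.8]] * 59
weights, log_like = em_weights(f_obs, m=2)
```

Every observed star must fall in a cell where at least one template is nonzero; otherwise the mixture density is zero and its logarithm is undefined.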

We discuss how we evaluate the success of our estimates in the next section. Results from the EM estimates are discussed from Section 3 onward.

2.5. Evaluating the Success of the Method

In order to evaluate the relative success of our calculated AHPs across all halos and the success of the technique across various STSs, we compare the EM estimates, $\hat{A}_j^{\rm EM}$, with the known true values, $A_j^{\rm true}$. Using these values, we calculate the "factor-of-error" (FoE) ratio for each template EM estimate, defined as $\mathrm{FoE}_j = \max\left(\hat{A}_j^{\rm EM}/A_j^{\rm true},\; A_j^{\rm true}/\hat{A}_j^{\rm EM}\right)$.⁹

One way to evaluate the fidelity of our results is to compute an average FoE ratio ($\langle \mathrm{FoE} \rangle$) from all measured FoE values (i.e., for a given STS and halo). This is an average of all $\mathrm{FoE}_j$, weighted by $w_j$, and is given as

$$\langle \mathrm{FoE} \rangle = \frac{\sum_{j=1}^{m} w_j\, \mathrm{FoE}_j}{\sum_{j=1}^{m} w_j},$$

where $w_j$ represents a choice of weights for the relative importance of each template estimate and m is the number of templates used. The lowest $\langle \mathrm{FoE} \rangle$ value indicates the best results, balanced by $w_j$, among the STS templates for each halo examined. For our primary analysis we take a simple mean of the FoE values ($w_j = 1$ for all j), while other weights are examined in Section 5. This method of evaluation is applied to the results in Sections 3–4.1.2.
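The FoE statistics translate directly into code. In this sketch the weighted average is normalized by Σ w_j, which is our reading of "an average ... weighted by w_j"; with all w_j = 1 it reduces to the plain mean used in the primary analysis. The function names are hypothetical.

```python
def factor_of_error(a_em, a_true):
    """FoE_j = max(A_EM / A_true, A_true / A_EM); equals 1 for a perfect estimate."""
    if a_em <= 0 or a_true <= 0:
        raise ValueError("FoE is undefined for non-positive fractions")
    return max(a_em / a_true, a_true / a_em)


def mean_foe(estimates, truths, weights=None):
    """<FoE> = sum_j w_j * FoE_j / sum_j w_j; w_j = 1 for all j gives the plain mean."""
    if weights is None:
        weights = [1.0] * len(estimates)
    foes = [factor_of_error(e, t) for e, t in zip(estimates, truths)]
    return sum(w * f for w, f in zip(weights, foes)) / sum(weights)


# An estimate that is half the true value and one that is exact:
print(mean_foe([0.05, 0.3], [0.1, 0.3]))  # 1.5
```

Because FoE is symmetric in over- and underestimation, an estimate twice the true value and one half of it are penalized equally.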

3. RESULTS I: AHP IN 1D

In this section we determine how accurate our satellite contribution estimates can be for our simplest STSs. More explicitly, we investigate how well we can estimate the fractional contributions to a halo's construction via STSs that span either the stellar mass of the accreted systems (i.e., their luminosity function) or their time of accretion (i.e., the stellar mass accretion history).

3.1. Stellar Mass Fractions

As discussed in Section 2.3, we can construct a true AHP from our model stellar halos to determine how accurately we can estimate them using the EM algorithm discussed in Section 2.4. Here, we examine the accuracy of our 1 × 5 STS estimates, which are a 1D set of five mass bins (as described in Section 2.3 and shown in the top left corner of Figure 3)—that is to say, we evaluate how well we can recover stellar mass fraction contributions from satellites with no sensitivity to their time of accretion.

Figure 4 presents some characteristic results from our 1 × 5 STS analysis. In the top panel, open squares represent the $\hat{A}_j^{\rm EM}$ values estimated by applying our EM analysis to the abundances of the observed stars.¹⁰ Error bars (calculated from the Fisher information matrix) indicate the smallest possible (1σ) errors (see the Appendix for details). The colored circles represent the true ($A_j^{\rm true}$) values, and the color of each circle categorizes the FoE between the $\hat{A}_j^{\rm EM}$ and $A_j^{\rm true}$ values according to the color legend to the right of the plot. FoE values ranging from less than 1.1 (bluish-green filled circle) to 10 or more (red filled circle) are shown.
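Error bars of this kind can be approximated with the textbook observed-information calculation for mixture weights, evaluated at a given estimate. It is an assumption that this matches the Appendix's calculation: the sketch treats A_1 to A_{m−1} as free parameters with A_m = 1 − Σ A_k, builds I_kl = Σ_i (f_ik − f_im)(f_il − f_im)/f(x_i, y_i)², and inverts it. The helpers `invert` and `weight_errors` are hypothetical names.

```python
import math


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for a few templates)."""
    size = len(matrix)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(size)]
           for r, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("information matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def weight_errors(f_obs, weights):
    """Approximate 1-sigma errors on A_j from the observed information matrix.

    f_obs   -- f_obs[i][j] is template j evaluated at observed star i
    weights -- the estimate at which to evaluate the information (e.g., EM output)
    """
    m = len(weights)
    k = m - 1                      # free parameters A_1..A_{m-1}; A_m = 1 - sum
    info = [[0.0] * k for _ in range(k)]
    for row in f_obs:
        mix = sum(a * f for a, f in zip(weights, row))
        d = [(row[j] - row[m - 1]) / mix for j in range(k)]
        for a in range(k):
            for b in range(k):
                info[a][b] += d[a] * d[b]
    cov = invert(info)
    variances = [cov[j][j] for j in range(k)]
    variances.append(sum(sum(r) for r in cov))   # Var(A_m) = 1^T Cov 1
    return [math.sqrt(v) for v in variances]
```

Because the weights are constrained to sum to one, the error on the last weight follows from the full covariance of the free weights rather than from a separate diagonal entry.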


Figure 4. Fractional stellar mass contributions to the host halo vs. the satellite's binned stellar mass for the best and worst EM estimates among our 11 halos (labeled h1–h11 hereafter) for the 1 × 5 STS. These halos are selected according to their mean FoE values, computed for a given number of observed stars. Estimates from observations (open squares) are shown for each of the five templates. Their corresponding true values (circles) are also shown, with fill styles and colors that indicate the FoE between the estimated and true values (see legend for key). See the text for a discussion.


Figure 4 shows results for two representative halos (labeled "h" followed by the halo number, for short): h8 and h5, which give the best and worst AHP estimates, respectively, as determined by their mean FoE values.

Looking at our best EM estimates from h8, we see that all individual EM estimates are within a factor of 2.5 (or better) of their true values. This is remarkable considering that, for the lowest-mass bins, we are characterizing contributions as small as ∼10⁻³ of the total halo luminosity.

Our worst EM estimates from h5 seem to reinforce the notion that this analysis provides reliable results. In this worst-case scenario, most estimates are within a factor of two, while the worst estimate (given for our most massive satellite template) is within a factor of eight.

3.2. Accretion Time Histories

The other principal dimension of our analysis is time. Using the same analysis as above, we can examine how successfully the AHP is estimated from a 1D set of five equally spaced time bins (also described in Section 2.3); that is to say, we evaluate how well we can recover stellar mass fraction contributions from satellites with no sensitivity to their stellar masses. Figure 5 presents some characteristic results from our 5 × 1 STS analysis. The plots in the figure are chosen with the same criteria used for Figure 4. The best EM estimates from h6 reveal very different results concerning the reliability of our analysis compared with the 1D mass-resolved template results. While both the two most recent and the two earliest accretion events have FoE values , the "medieval" accretion event has an FoE value . Here, only the least massive accretion event has a poor FoE value.


Figure 5. Same as Figure 4, but for the 5 × 1 STS. See the text for a discussion.


Our worst EM estimates from h7 follow a trend in which all but the most massive accretion event (the medieval event in this case) have markedly poor FoE values that range from to ≳10³. Here, the best estimate has an FoE (i.e., within 50% of the true value). While these estimates call into question the reliability of using multiple dimensions in t_acc and M_sat, the overall results were already anticipated from visual inspection of these templates in the bottom right corner of Figure 3. As suggested earlier, degeneracies in CARDs within this template set likely led to the poor estimates seen. In particular, the difference between the FoE values for the medieval accretion events in h6 and h7 and those for the other events comes down to the dominant accretion-event templates subsuming events that are both highly degenerate in CARD space and significantly less massive than the main event(s). As a consequence, it may appear hopeless to try to glean any information about accretion times from 1D estimates. This may also hold true for estimates in multiple dimensions when accretion time is treated as the dominant dimension of analysis (see Section 4 for further discussion).

3.3. Accuracy of Stellar Mass Fractions across Halo Realizations

Our complete results, summarized by the mean FoE values, provide insight into the overall effectiveness of the analysis for all 11 halos. Figure 6 displays these values for the 1 × 5 STS (i.e., 1D mass-resolved; top panel) and the 5 × 1 STS (i.e., 1D time-resolved; bottom panel). Each panel shows histograms of the values, calculated using the numbers of observed stars indicated in the plot and normalized by the number of halos examined. Dotted light-gray lines mark FoE = 2 and highlight, by eye, the vast difference between recovering AHPs with 1D templates in accreted satellite mass and with 1D templates in time of accretion.


Figure 6. Frequency of mean FoE values among all 11 halos for the 1 × 5 STS (i.e., vs. mass of accreted satellite; top) and the 5 × 1 STS (i.e., vs. time of accretion; bottom). Red, green, and blue histograms refer to the number of stars used to calculate the EM estimates summarized in this figure. Light-gray dotted lines mark a mean FoE of 2 to guide the eye when comparing results. The difference in the spread and range of values between the 1 × 5 and 5 × 1 STSs is striking and supports the notion (from Figure 3) that the 1 × 5 STS retains greater distinction between its templates than the 5 × 1 STS does (resulting in better estimates from the 1 × 5 STS).


In our mass-resolved (1 × 5 STS) estimates (top panel), we can examine the overall success of these templates and the degree to which the estimates improve as more data points are used. Across the full panel, we clearly see a gradual, distinct improvement when a larger data set is used: the medians (i.e., our accuracy) for successively larger sets of observed stars are ∼2.55, ∼2.16, and ∼2.06. However, the modest improvement between the last two data sets may indicate that the method is approaching a limit set by the number of templates relative to the number of stars used. Our time-resolved (5 × 1 STS) estimates (bottom panel) are far poorer than the mass-resolved estimates: their medians for successively larger sets of observed stars are ∼100, ∼175, and ∼192.

Even more critical is the fact that these estimates get marginally worse as the number of observations increases. This suggests that there are degeneracies between templates in the set that cannot be removed with more CARD information in just two chemical abundance ratio dimensions. Alternatively, these degeneracies may point to an inherent need for mass divisions in the STS to distinguish the templates, a possibility that motivates moving our STS to higher dimensions in the plane. In the next sections, we discuss the impact of expanding our analysis to multiple dimensions in order to achieve better estimates.

4. RESULTS II: AHP IN 2D

Now that we have established a baseline for estimates in our special 1D cases, we extend our analysis to higher resolution in accretion time (i.e., fixing five mass bins and varying the number of equally spaced time bins). In the following subsections, we discuss in detail our results for the 2 × 5 and 3 × 5 STSs (i.e., with two or three time bins), presenting insights into their successes and failures.

4.1. X × 5 STS Results

The goal of expanding our STS into higher dimensions is twofold. First, we want to directly recover AHPs with high fidelity by dividing our plane into templates that would reveal interesting information (e.g., about the MW halo's history) when applied to real abundance observations. Second, we want to indirectly recover 1D stellar mass functions (mass-resolved profiles) and accretion time histories (time-resolved profiles) of our halos by summing "like" estimates in time or mass (marginalization), in order to obtain better 1D accounts than could be obtained directly. Our hypothesis is that a finer grid in time will produce templates with less degeneracy and allow a better recovery of the AHP. Of course, this gain must be balanced against the ability of our sample to constrain the additional parameters introduced by the larger template set.
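The marginalization described above amounts to summing a grid of estimates along one axis. In the minimal Python sketch below, grid[t][s] is assumed to hold the EM mass fraction for time bin t and stellar-mass bin s, so an N × 5 STS is a list of N rows of five entries. The function name and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def marginalize(grid: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Collapse an N_time x N_mass grid of mass-fraction estimates into 1D profiles."""
    # Sum over accretion-time bins: an effective 1 x N_mass (mass-resolved) profile.
    mass_resolved = [sum(column) for column in zip(*grid)]
    # Sum over stellar-mass bins: an effective N_time x 1 (time-resolved) profile.
    time_resolved = [sum(row) for row in grid]
    return mass_resolved, time_resolved


if __name__ == "__main__":
    true_grid = [[0.10, 0.20],   # recent epoch, two mass bins
                 [0.30, 0.40]]   # early epoch
    # Hypothetical EM result that swaps mass between epochs within the first mass bin.
    est_grid = [[0.30, 0.20],
                [0.10, 0.40]]
    print(marginalize(true_grid)[0])
    print(marginalize(est_grid)[0])
```

The two grids give the same mass-resolved profile (0.4 and 0.6 for the two mass bins, up to floating-point rounding) even though individual 2D cells are misassigned. This is the sense in which summing over epochs can absorb errors caused by degeneracies between adjacent epochs, although it cannot repair errors that move mass between stellar-mass bins.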

4.1.1. "Early" versus "Recent" Accretion: 2 × 5 STS Results

To address these goals, we start by generating templates for our 2 × 5 STS, which has two evenly divided time bins for recent (0–6.5 Gyr ago) and early (6.5–13 Gyr ago) epochs. Figure 7 displays a selection of EM estimates obtained with our 2 × 5 STS. In the figure, we can once again examine the best (h11), the median (h2), and the worst (h7) of the halo estimates using these templates. In the first column of Figure 7 we display the mean FoE values that indicate the success of the estimates for the number of stars used, which can be compared with our marginalized results in the rightmost column. At first glance, all panels indicate, by their (mostly green) colors, that most estimates are within an FoE of 2. For the best EM estimates (from h11), it is encouraging that all FoE values are .


Figure 7. Same as Figure 4, but for the 2 × 5 STS; the first two columns show separate sets of templates for the recent (0–6.5 Gyr) and early (6.5–13.0 Gyr) accretion epochs. The final column shows totals over all time (i.e., an "effective" 1 × 5 STS obtained by adding the corresponding estimates from both columns). Numbers labeled "L" and "T" refer to values calculated across satellite stellar mass and time bins, respectively.


However, for the worst EM estimates (from h7) we see a marked decrease in the fidelity of a couple of estimates, especially one at the high-mass end. Here, the true value for the early accreted template is actually similar to that of its recently accreted counterpart, whereas the EM estimates are very different. While the early accretion event is estimated to be essentially nonexistent, both the adjacent higher-mass template (also early accreted) and the recently accreted 10⁷–10⁸ template have slightly higher EM estimates than their true values. The difference in FoE values is probably due to both templates subsuming the contribution from the poorly estimated template. Given that this template is high mass and accreted early, this degeneracy is likely due to the fact that the most massive systems are accreted early in most of the 11 halos' histories. Since the 1515 satellites used to make the templates come from the 11 ensembles of accreted dwarf systems that make up our simulated halos, it is not surprising that coarse divisions in accretion epochs lead to disparities in the fidelity of our estimates across the 6.5 Gyr divide.

On the other hand, as indicated by our best selection, it is reassuring that, despite the simplicity of our dwarf models, there is enough information in their CARDs to make templates that differentiate between higher-mass progenitors of the halo at different epochs. This is true even though the highest-mass dwarf models show the greatest amount of degeneracy among accreted systems throughout all halos' assembly histories. Also, given the strength of current techniques for identifying recent galaxy formation (e.g., color–magnitude diagrams from photometric surveys that yield estimates of ages and SFHs, and phase-space diagrams from low-resolution spectroscopic surveys that yield estimates of accretion histories), it is encouraging that our technique works so well for early accretion epochs and low-luminosity objects.

In the last column of Figure 7, we present sums of the estimates across accretion epochs (values labeled "L") and across binned satellite luminosities (values labeled "T"). Here, we confirm that marginalizing the estimates across our two epochs yields 1D estimates with greater fidelity than their 2D decomposition for the worst EM estimates, as indicated by the L-labeled values. More importantly, we can compare the L-labeled value for our worst (h7) estimates (FoE = 2.161) with that of our best 1D estimates (h8; FoE = 2.059) in Figure 4. This comparison provides tentative evidence that our hypothesis about gains in STS information is correct: the 1D marginalizations across epochs from a 2D STS provide estimates of the 1D AHP that are on par with or better than those from our bona fide 1D STS. We can also compare the T-labeled values for our 1D marginalizations across satellite luminosity bins in Figure 7 with the corresponding values for Figure 5 (FoE = 7.168 and 485.6 for our best and worst halos, respectively). Here, we find that our estimates of the accretion time histories improve substantially overall, and dramatically when our best and worst AHP estimates are compared. The following sections address whether these improvements persist as we increase the resolution of our STS in the accretion time dimension.

4.1.2. "Medieval" Accretion: 3 × 5 STS Results

In order to further test our ability to estimate AHPs, we increase our accretion time resolution (by adding an intermediate "medieval" accretion epoch), in the hope that the greater information from an expanded STS will lead to better AHP estimates.

Figure 8 shows our best and worst 3 × 5 STS results. The mean FoE values of the best and worst EM estimates show a substantial decrease in quality. It is immediately apparent (from the colors) that individual estimates fared significantly worse than they did in the 2 × 5 STS selections of Figure 7. Also, by inspection, the medieval epoch yields the worst estimates overall. As in Figure 7, the early-epoch estimates of Figure 8 are the most accurate. The overall decrease in performance from our 2 × 5 to our 3 × 5 STS is likely due to degeneracy in CARD space between some adjacent templates in the 3 × 5 STS (e.g., see Figure 3 for an illustration of this effect) and across accretion time for the higher-luminosity templates. For example, if we look across the recent and medieval epochs for our worst EM estimate selection, we can see degeneracies in the estimates for the highest stellar mass bins. These degeneracies are due to the increasing similarity between the chemical model tracks of more massive (and luminous) dwarf satellite models. Such degeneracies can lead to a single template absorbing the estimates across all epochs (e.g., h7 in Figure 5), to the luminosity fraction being distributed among co-degenerate templates (e.g., h7 in Figure 7), or to estimates being swapped across adjacent epochs (e.g., h10 in Figure 8). However, a clear separation in accretion epochs for the same stellar mass bins appears to reduce degeneracies between them (as seen for the best (h3) estimates).


Figure 8. Same as Figure 7, but for the 3 × 5 STS, which includes an additional column for an intermediate medieval accretion epoch.


If we look at the final column, which shows our 1D marginalizations from the 2D 3 × 5 STS, we once again see improvements in FoE values compared with Figures 4 and 5 (e.g., compare the "L" and "T" values for all selections with the uniformly weighted values in Figure 12 in Section 5). While improvements were anticipated, it is still surprising, given the relative lack of success for individual 3 × 5 STS templates, that marginalizing the worst 3 × 5 STS estimates yields 1D estimates that improve on the 2 × 5 STS marginalized 1D estimates. In this case, some inaccuracies due to degeneracies across epochs are mitigated by summation over accretion epochs. Thus, the improvements to our marginalized mass-resolved 1D estimates arise from an increase in the STS epoch resolution. One might presume that the better estimates originate directly from improved individual epoch estimates. However, the poor individual estimates caused by degeneracies within the same stellar mass bins argue against this idea. Indeed, it is more likely that improving our epoch resolution led to better estimates indirectly, not by decreasing the degeneracies between adjacent epochs but rather by decreasing the degeneracies between adjacent stellar mass bins. While the effects described above are certainly taking place, it is still unclear from Figures 4, 5, 7, and 8 whether these improvements hold across all 11 halos. In the next section we examine the results as ensembles across the 11 halos to determine the overall success of recovering AHPs with our STSs.

4.2. Comparison of Results across All STSs

In this section we compare results from all our simulated halos and the templates we constructed. Using FoE values (see Section 2.5), we can determine a cumulative distribution function (CDF) of FoE values for each STS used. These CDF values indicate the fraction of the total stellar halo mass that we can identify within a given FoE value.
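One way to build such a mass-weighted CDF is sketched below in Python. It assumes (our reading, not a formula given in this section) that each template contributes its true mass fraction to the CDF at its own FoE value, so that the curve at FoE = x is the fraction of the halo's stellar mass held by templates whose estimates are within a factor x of the truth. The function names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def foe_mass_cdf(foes: Sequence[float],
                 true_fractions: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FoE, cumulative mass fraction) steps in order of increasing FoE."""
    total = sum(true_fractions)
    steps, running = [], 0.0
    for foe, frac in sorted(zip(foes, true_fractions)):
        running += frac
        steps.append((foe, running / total))
    return steps


def mass_fraction_within(foes: Sequence[float],
                         true_fractions: Sequence[float],
                         max_foe: float) -> float:
    """Fraction of the total stellar mass recovered with FoE <= max_foe."""
    total = sum(true_fractions)
    return sum(f for foe, f in zip(foes, true_fractions) if foe <= max_foe) / total


if __name__ == "__main__":
    foes = [1.2, 3.0, 1.8, 12.0]       # hypothetical per-template FoE values
    fractions = [0.5, 0.1, 0.3, 0.1]   # hypothetical true mass fractions
    print(mass_fraction_within(foes, fractions, 2.0))  # about 0.8
    print(foe_mass_cdf(foes, fractions))
```

In this toy case the two templates with the largest true fractions are recovered within a factor of two, so about 80% of the halo's mass is characterized at FoE ≤ 2 even though one small template is off by a factor of 12. A halo can therefore be well characterized by mass while individual minor templates are poorly estimated.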

First, we show these CDF values in Figure 9 for 6 of our 10 STSs. Each plot frames the recovery of AHPs in terms of the level of accuracy (i.e., FoE) at which we can characterize a certain portion of the total luminous stellar content of the halos examined. Once again, differences in the fidelity of our estimates between the 5 × 1 and 1 × 5 STSs are clearly shown with a median (fraction recovered) with an FoE ≲ 2 being and 95%–99%, respectively. Characterizing the success of the method overall, we find that the median (with FoE ≲ 2) across most STSs is or better. It is evident from the STSs shown in Figure 9 that EM estimates fare poorly when applied to certain halo realizations. We discuss possible causes for the poorer estimates of these few halos in Section 5.


Figure 9. Six STS-derived plots of (≤FoE) for all 11 halos demonstrating another benchmark for our CARD analysis for deriving the AHPs of our halos. Columns represent results for listed STS estimates. Rows represent estimates derived from a certain number of observed stars, which are labeled at the right edge of each row. Shaded areas in each plot guide the eye to FoE estimates of ∼2–3 or better, which primarily indicate estimates that cover . Individual solid colored lines represent each of the 11 halos used in the study. Colored labels for the halos are shown in the bottom left corner of the figure. Black dot-dashed CDFs represent the median of all 11 halos vs. FoE values.


Figure 10 summarizes our results in another way, using the CDF values and FoE. In the three panels, box-and-whisker plots illustrate the median and shape of the distribution, among all 11 halos, of the stellar mass fraction recovered with FoE ≲ 2.¹¹ The top panel displays information similar to that shown in Figure 9. The middle and bottom panels show both genuine and marginalized estimates for the 1 × 5 STS accreted mass functions and the 5 × 1 STS accretion time histories, respectively.


Figure 10. Box-and-whisker plots of the recovered mass fraction (FoE ≲ 2) for the full STS (top), the marginalized 1D mass-resolved STS (middle), and the marginalized time-resolved STS (bottom), using all STSs examined for our 11 halos. The median values for all 11 halos are shown as a black line across each box. The 25th and 75th percentiles of the distribution are shown as the lower and upper bounds of each box, respectively. Whiskers mark the minimum and maximum values in the distributions shown. Box colors indicate the number of stars, using the same colors as in Figure 6. Top: boxes (solid colors) refer to the genuine values for each respective STS. Middle and bottom: "marginalized" boxes (striped colors) refer to values calculated from the sum across the time (mass) dimension of templates into an effective 1 × 5 (5 × 1) template (e.g., see Figures 7 and 8). The 1 × 5 STS (mass-resolved) values derived from marginalizing over time-binned estimates are shown in the middle panel, while the 5 × 1 STS (time-resolved) values derived from marginalizing over mass-binned estimates are shown in the bottom panel. Increasingly darker gray bands spanning all STSs (for ) are shown to highlight the success of our estimates.


In the top panel, the mass fraction recovered with FoE ≲ 2 is plotted as a colored box for each STS examined. Here, as in Figure 6, the color refers to the number of observations used (as indicated in the plot legend). We see that our best median values are given by the 1 × 5 and 2 × 5 STSs, while the worst values are given by the 5 × 1 and 7 × 5 STSs. The averages of the best and worst values across all STSs examined, for an increasing number of stellar observations, are ∼0.96–0.98 and ∼0.29–0.41, respectively. The average median values across all STSs examined, for an increasing number of stellar observations, are 0.742, 0.783, and 0.785. This means that, on average, our estimates have FoE ≲ 2 for roughly three-quarters (∼74%–79%) of the total observed halo stellar mass.

Marginalized values, defined in Section 4.1.1, are useful for evaluating any gains that may arise from better time (or mass) resolution. More precisely, any information about templates that is gained (lost) should generally result in a corresponding drop (rise) in FoE values and thus appear as an increase (decrease) in the mass fraction recovered with FoE ≲ 2. As a reference, a gray bar in each panel marks the region where these recovered fractions range from 70% to 100% (from bottom to top).

The middle panel shows our mass-resolved marginalized values (summed over accretion time bins) for eight of the nine STSs (5 × 1 is omitted because its value is not applicable in this context). The plot shows an across-the-board increase in the mass fraction recovered with FoE ≲ 2 (i.e., a general drop in FoE values for all STSs) for the recovery of the total stellar mass function. This improvement, despite the tendency of individual FoE values to increase with the number of templates used, indicates that significant gains were made by using a larger template set for the specific purpose of estimating a halo's total stellar mass function (via marginalization).

The bottom panel shows our time-resolved marginalized values (summed over mass bins) for eight of the nine STSs (1 × 5 is likewise omitted because its value is not applicable in this context). In this case, the plot shows a declining trend in the mass fraction recovered with FoE ≲ 2 as the STS grows (i.e., a general rise in FoE values with increasing STS size) for the recovery of the total accretion time history. Despite this decline, the recovered fractions remain relatively high (with medians above 70%) up to our 6 × 5 STS. Indeed, all time-resolved marginalized values show a significant improvement in accretion time histories over the history given by the 5 × 1 STS. Overall, the results show that we could expect to recover accretion time histories using the EM algorithm, provided that we use reasonable templates.

The results shown in Figures 9 and 10 indicate that even with the simplest template divisions, we could, with the appropriate data set, recover the accretion history of the MW halo. To that point, we find that these STS EM estimates can recover the total contributions from accreted systems (templates) of similar mass (i.e., the halo luminosity function) to within a factor of 1.02 (≤2% of the true value) for most of the 11 halos. Separately, the EM algorithm can determine the mass fractions within accretion times to within a factor of for at least 90% of the halo's total stellar mass. Both results present encouraging prospects for recovering the accretion history of the MW halo from current and near-future data collections.

5. DISCUSSION

In the following discussion, we examine the statistical reliability (or robustness) of the EM algorithm when applied to our models and simulated data. We also explore what masses the current approach is most sensitive to and discuss implications for future work.

5.1. Reliability

We can test the statistical robustness of the EM algorithm's application to our simulated halos by performing a likelihood ratio test on the results of our analysis. By determining the true and EM-estimated AHP values, and their respective likelihoods, from each application of an STS to our halos via the EM algorithm, we can calculate a χ²-statistic defined by

$$\chi^2 = -2\ln\left(\frac{\mathcal{L}_{\rm T}}{\mathcal{L}_{\rm EM}}\right), \qquad (7)$$

where $\mathcal{L}_{\rm T}$ and $\mathcal{L}_{\rm EM}$ are the likelihoods for the true and EM-estimated values, respectively. One can then reject the assumption that the true AHP templates are well approximated by the STS used if the χ²-value from Equation (7) is larger than the χ²-percentile value for k degrees of freedom ($k = m_{\rm EM} - m_{\rm T}$)¹² and a confidence level denoted by α. Figure 11 shows the maximum α-value that can be adopted for a χ² distribution before the assumption that suitable AHP templates were chosen must be rejected. For example, α = 0.95 corresponds to a confidence that 95% of all samples of a given size are well characterized by the STS in use. Here, we find that, across all sample sizes and STSs used, halos 5, 9, and 10 are by far the worst characterized by our STS divisions. For most STSs used, these halos are ill-matched to the generic STS created in our division scheme and therefore challenge the robustness of this method. Such challenges need to be addressed before this method can be used to model the AHP of the MW halo. A likely solution lies in the development and incorporation of sufficiently realistic models of dwarf CARDs into this method, a goal that will be addressed in future work.
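The test can be sketched in a few lines of Python. The sketch assumes that the statistic is χ² = −2 ln(L_T/L_EM), evaluated from the log-likelihoods of the star sample under the true and EM-estimated mixture weights, and that, by Wilks' theorem, it follows a χ² distribution with k ≥ 1 degrees of freedom. To stay dependency-free it evaluates the χ² CDF through the regularized lower incomplete gamma function (a power series plus a continued fraction, in the style of Numerical Recipes); in practice `scipy.stats.chi2.cdf` does the same job. The function names are hypothetical. Depending on whether α is read as a confidence level or a significance level, either the returned CDF value or its complement (the p-value) marks where the accept/reject decision flips.

```python
import math
from typing import Sequence, Tuple


def log_likelihood(densities: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """ln L = sum_i ln(sum_j A_j p_j(x_i)) for a mixture of fixed templates."""
    return sum(math.log(sum(w * p for w, p in zip(weights, row))) for row in densities)


def _reg_lower_gamma(s: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(s, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + s * math.log(x) - math.lgamma(s)
    if x < s + 1.0:
        # Power series; converges quickly for x < s + 1.
        term = total = 1.0 / s
        n = 0
        while abs(term) > 1e-15 * abs(total):
            n += 1
            term *= x / (s + n)
            total += term
        return math.exp(log_prefactor) * total
    # Continued fraction (modified Lentz) for the upper tail Q(s, x); P = 1 - Q.
    tiny = 1e-300
    b = x + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def chi2_cdf(x: float, k: int) -> float:
    """CDF of the chi-square distribution with k degrees of freedom."""
    return _reg_lower_gamma(k / 2.0, x / 2.0)


def likelihood_ratio_test(loglike_true: float, loglike_em: float,
                          k: int) -> Tuple[float, float, float]:
    """Return (chi-square statistic, chi-square CDF at the statistic, p-value)."""
    stat = -2.0 * (loglike_true - loglike_em)  # >= 0 when the EM weights maximize L
    cdf = chi2_cdf(stat, k)
    return stat, cdf, 1.0 - cdf
```

For example, log-likelihoods of −100 (true weights) and −99 (EM weights) with k = 2 give χ² = 2 and a p-value of e⁻¹ ≈ 0.37, so the templates would not be rejected at any conventional level.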


Figure 11. The α-level threshold for accepting or rejecting the null hypothesis that suitable AHP templates were used in the EM estimates. Colors represent results for the 11 halos examined, and panels compare results for the approximate number of stars observed. See the text for a discussion.


Figure 12. Weighted mean FoE values (median over all halos) for different template weights. The various colors refer to the approximate number of stars used, as in Figure 10. Weights are listed in the figure legend. See the text for a discussion.


5.2. Sensitivity to Different Mass Bins

Another consideration in assessing the reliability of our method is how well it uncovers AHPs in the satellite mass regime of interest. Taking Equation (6) from Section 2.5, we can calculate mean FoE values with different weights (i.e., uniform (mean), low-mass preferred, or high-mass preferred), depending on which satellite population(s) one prefers to recover. Figure 12 shows the median of these values among all halos for each STS used. The same colors as in Figure 10 indicate the number of stars used in the analysis, and symbols and corresponding lines refer to the type of weighting used (see figure legend). Uniformly weighted values use w_j = 1/m (i.e., they are normalized by the number of templates used) and are identical to the weighting used for the main results of this paper. Weights that emphasize more accuracy in low- or high-mass satellite AHPs are weighted by the corresponding upper bin mass limits and their reciprocals, respectively.

In the figure, we can see that the values for low-mass satellite recovery fare the best, whereas those with uniform and high-mass-emphasized weights are a factor of in all but the three smallest template sets. In other words, when one emphasizes the accurate recovery of low-mass satellites, the weighting favors templates with lower FoE values, which yields lower overall values. This result further clarifies the immediate strength of the method: it is adept at differentiating between low-mass accreted dwarfs in CARD space owing to the lack of degeneracies in the region of that space they occupy. Meanwhile, although degeneracies clearly exist in the CARD space occupied by high-mass satellites and larger STSs, we are encouraged that introducing more templates can significantly decrease degeneracies in only two dimensions of CARD space.

5.3. Future Prospects

It is clear from both our results and our reliability tests that the current method often fails for 3 of the 11 halo simulations. Examining these three problematic halos, we find that all of them show predominantly early accretion of massive dwarf galaxies, with integrated CARDs that appear highly degenerate compared with the AHP CARDs of the other eight halos.

To address the degeneracies that exist (particularly among high-mass systems), we posit that differences between mass-dependent (nucleosynthetic) yields for different nucleosynthetic sites and element groups (e.g., see Lee et al. 2013) can be exploited to greatly reduce or remove such degeneracies by expanding the CARD-space basis set.

For example, we only looked at two dimensions in CARD space, whereas more recent work on "chemical tagging" expands the number of available dimensions by establishing the best chemical abundance signatures to pursue in order to optimize survey efforts (e.g., the GALAH survey). One way to optimize surveys for searches in chemical abundance space is to prioritize spectroscopic observations of the elements that confer the greatest distinction between systems of different origins. To this end, Ting et al. (2012) used principal component analysis to identify and rank the six to nine most distinguishing elements in chemical abundance space. In their work, the chemical abundance space of various parts of both the galaxy and the galactic neighborhood was examined to determine the best elements to observe in order to decipher their galactic chemical evolution. A CARD-space basis set derived from various combinations of these elements is likely to break the degeneracies that limit our current approach. Lastly, the current and upcoming surveys best poised to provide the data required for our approach are the Subaru Prime Focus Spectrograph (PFS; Takada et al. 2014) and the Gaia-ESO (Gilmore et al. 2012) surveys.

6. SUMMARY

In our investigation of the efficacy of recovering the accretion history of the MW halo, we used simulated halo data from the Bullock & Johnston (2005) MW halo simulations. Our approach required the CARDs of [α/Fe] and [Fe/H] for the 11 simulated realizations of accretion-grown halos, observed samples of stars from those simulations, and CARD templates of the accreted dwarf models in the simulations. From these data we applied a statistical algorithm (the EM algorithm) that uses the model templates and the observed stars to disentangle the accretion histories of our simulated halos.

To evaluate the success of our estimates, we examined the relationship between a measure of accuracy, the FoE, and a measure of the maximum fraction of the halo's stellar mass that is characterized at this level of accuracy.

In our analysis, we employed (equally partitioned) STSs as model sets for our generative mixtures (i.e., the simulated halos). The first test of our templates involved 1D STSs, which were composed entirely of either stellar mass or accretion time partitions. In the case of our 1D mass-resolved STS, the EM algorithm estimates for individual templates were within a factor of eight in the worst case (halo 5) and within a factor of 1.5–2.5 or better for most mass bins. However, in the case of our 1D time-resolved STS, the results were considerably less accurate, with approximately half of the individual templates off by a factor of 10 or more. Notably, the bulk of these poor estimates occurred for bins containing the least accreted mass. This outcome was not unexpected, but it stands in sharp contrast to the estimates from our mass-resolved case. In both cases, we also examined the effect of increasing our data sets from 1000 to 30,000 stellar chemical abundance observations. While an increase in our data generally led to better estimates from our mass-resolved templates, no improvement was seen for estimates from our time-resolved templates. These results led us to examine what improvements, if any, could be made in our EM estimates by expanding our STS into two dimensions (accretion time and mass) and increasing the number of templates used.

In examining the use of the 2D STS in EM estimation, we find that these template sets generally provided more accurate estimates. More precisely, our 2 × 5 STS furnished remarkably good AHP estimates, meaning that we could easily recover a tally of satellites that fell in recently versus those that fell in more than 6.5 Gyr ago. In this dichotomous mode, the EM algorithm clearly distinguishes satellites accreted within the last 6.5 Gyr from those accreted earlier, using only two dimensions in chemical abundance space. We also find that when we try to estimate an early, medieval, and recent accretion history (our 3 × 5 STS tests), the EM estimates perform fairly well. In some cases, our 2D STS figures (for the 3 × 5 STS in particular) suggested that degeneracies between templates in a set were degrading our EM estimates and perhaps limiting the potential of this technique. Despite such degeneracies, however, we find that we can improve on our 1D recovery of both the mass accretion history (functionally similar to mass/luminosity functions) and the accretion time history (a coarse account of the mass growth of the halo over time) by marginalizing estimates across templates in the appropriate dimension. Thus, we are confident that, at the very least, this technique can be used, albeit carefully, to produce fairly accurate estimates of 1D accretion mass or mass growth functions for the MW halo.
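The marginalization step amounts to summing a grid of template weights along one axis. The minimal sketch below assumes that the weights of a time × mass STS (e.g., 2 × 5 or 3 × 5) are stored with rows as accretion-time bins and columns as stellar-mass bins. This layout and the name `marginalize` are our choices, and the example weights are illustrative numbers, not results from this work.

```python
import numpy as np

def marginalize(weights_2d):
    """Collapse a (time bins x mass bins) grid of EM weights to 1D histories.

    Assumed layout: weights_2d[t, k] is the weight of the template for
    accretion-time bin t and stellar-mass bin k.
    """
    w = np.asarray(weights_2d, dtype=float)
    mass_history = w.sum(axis=0)  # sum over accretion time -> one value per mass bin
    time_history = w.sum(axis=1)  # sum over mass -> one value per time bin
    return mass_history, time_history

# Illustrative 2 x 5 STS: two accretion-time bins, five mass bins
w = np.array([[0.10, 0.20, 0.00, 0.05, 0.05],
              [0.20, 0.10, 0.10, 0.10, 0.10]])
mass_history, time_history = marginalize(w)
```

For a 2 × 5 STS, `time_history` holds the two accreted-mass fractions on either side of the time boundary, which is the dichotomous AHP tally, while `mass_history` is the 1D mass-resolved estimate.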

We now compare our tests for all 2D STSs. Three features stand out as indicators of the technique's potential: (1) fairly accurate AHP estimates across most STSs used; (2) 1D mass-resolved values, obtained by marginalization, that remain consistent or improve as the number of templates increases; and (3) a substantial overall improvement, across all STSs used, in the marginalized time-resolved values relative to those from the 1D 5 × 1 STS. From these features we conclude that, on average, we can recover the bulk of the accreted dwarfs' relative contributions to the halo's accretion history by mass to within a factor of ∼2. Nevertheless, many individual templates (especially our lower mass bin templates) can produce estimates that are far less accurate than those for the main stellar mass contributors to the halo. This is likely due to degeneracies among templates belonging to the same STS and to the relative contributions of these objects to the overall star count of the halo. These issues can be addressed by carefully selecting which observed stars to include in the data sample and by expanding the chemical abundance basis set to better disentangle the individual SFHs of the previously accreted dwarf satellites in our simulated halos (or in the MW halo).

Lastly, despite the drawbacks caused by degeneracies between individual templates, we find, remarkably, that 1D mass function predictions (as a function of accreted satellite mass or accretion time) can be improved simply by increasing the number of partitioned time bins (templates) used for the EM estimates and then marginalizing over those estimates in either dimension. This means that, at the very least, it is possible to extract accurate luminosity functions, for example, with estimates that clearly improve with finer resolution in our plane. We will investigate this result further in the near future.

7. CONCLUSIONS

In conclusion we note the following implications of our study.

  • 1.  
    Our proof of concept is verified: recovering halo accretion histories from their CARD information works, and works well at a certain level of detail.
  • 2.  
    In particular, even when applying our method to only a 2D CARD space, we appear to be sensitive to the following:
      • (a)  
        Early accretion events (regions where information in phase space has phase-mixed away).
      • (b)  
        Low-luminosity dwarfs (objects we cannot see in situ because they are too faint).
  • 3.  
    There are degeneracies in 2D CARD space, particularly among high-mass accreted dwarfs.
  • 4.  
    However, since we only looked in 2D, and upcoming surveys promise many independent chemical dimensions for tens of thousands of stars, it is very important to pursue this approach further.

Finally, given these implications, we are motivated to generate more realistic templates from chemical evolution models in higher dimensions and to test them against existing dwarf galaxy data. We hope that, once the fidelity of such templates is validated, we can employ them in our method to produce a detailed account of the accretion history of the MW halo.

D.M.L. thanks his dissertation thesis committee for their helpful comments and support in the writing of this paper. D.M.L. and K.V.J. also thank James Bullock, Brant Robertson, and Andreea Font for the collaboration that developed the numerical data sets used in this work. Finally, we thank the anonymous referee for a prompt and helpful report. D.M.L. acknowledges financial support from the following sources: the Strategic Priority Research Program entitled "The Emergence of Cosmological Structures" of the Chinese Academy of Sciences (XDB09000000), the Chinese Academy of Sciences Fellowship for Young International Scientists (2013Y2JB0005), and the National Natural Science Foundation of China (11333003, 11173002, and 11173044). D.M.L. and K.V.J. were also supported by the NSF research grants entitled "Dwarf Galaxies, Abundance Patterns and the Physics of Galaxy Formation" (AST-0806558) and "Mapping the Past in the Future: Science Enabled by High-Resolution Spectroscopic Stellar Surveys" (AST-1107373).

APPENDIX: THE EM ALGORITHM

A.1. Expectation Step

To implement the algorithm, we first need to derive the expectation of the complete-data log-likelihood, given by Equation (5), conditioned on the data. To do this, we must decide how to use zij. This choice casts the EM algorithm as either hard, when the value of z discretely indicates the template of origin, or soft, when it probabilistically indicates the origin of a data point across all fj. For this application, we chose to implement a hard EM algorithm for estimation of AMLE, in which zij equals 1 if data point i comes from model fj and 0 otherwise. Thus, our overall expectation is

where

as defined by Equation (2). Since we are ultimately maximizing Equation (A.1), the nonconstant term, Equation (A.2), becomes the component of interest. To iteratively evaluate this expectation, we let be Equation (A.2) at the tth step:

Since is not defined for the first evaluation, we use a random initialization to generate . Here, it should be noted that convergence is not sensitive to the choice of values in our case, though it can be in cases where the likelihood is riddled with local maxima. If we examine the expression above, we can conceptually define the mechanism for maximization as a "ratcheting up" of values by maximizing with respect to . Derivation of the maximization expression is discussed below.
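For orientation, the standard expectation step for a finite mixture with fixed template densities can be written as below. The notation is ours for this sketch and may differ from that of Equations (2), (5), (A.1), and (A.2): a_j are the template weights (j = 1, …, m), x_i is the i-th star's ([Fe/H], [α/Fe]) pair (i = 1, …, n), z_ij is the 0/1 membership indicator described above, and w_ij^(t) is its expectation at step t. Only the first double sum in Q depends on the weights, which is why only the nonconstant term matters for the maximization.

```latex
\begin{aligned}
\ln L_c(\mathbf{a}) &= \sum_{i=1}^{n} \sum_{j=1}^{m} z_{ij}
  \left[ \ln a_j + \ln f_j(\mathbf{x}_i) \right], \\
w_{ij}^{(t)} &\equiv \mathrm{E}\!\left[ z_{ij} \mid \mathbf{x}_i, \mathbf{a}^{(t)} \right]
  = \frac{a_j^{(t)} f_j(\mathbf{x}_i)}{\sum_{k=1}^{m} a_k^{(t)} f_k(\mathbf{x}_i)}, \\
Q\!\left(\mathbf{a} \mid \mathbf{a}^{(t)}\right) &= \sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}^{(t)} \ln a_j
  + \sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij}^{(t)} \ln f_j(\mathbf{x}_i).
\end{aligned}
```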

A.2. Maximization Step

Above we defined an explicit formulation for the expected log-likelihood (Equation (A.2)) given a single parameter and the data (). The argument of the maximum of Equation (A.2) at each iteration t provides an estimate that approaches the MLE of  and is given by

Accounting for the m – 1 free parameters of , differentiation of Equation (A.1) with Equation (A.2) proceeds, for , as

where the first term in the summation accounts for all values of and the second term eliminates overcounting of the first term at k = m. Because the argmax is the point at which the expected log-likelihood attains its maximum, the derivative of that function vanishes there. Thus, we can expand the summation over data points and equate the terms described above to one another:

Consequently, because these terms are equal for every k, they must all share a common value, as shown below,

and

where c is some constant.

The unknown constant c appears problematic, but because , algebraic manipulation reveals that c = n, yielding a final solution that can be numerically evaluated:
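A sketch of the constrained maximization in generic notation of our own shows how c = n arises. Here the weights are a_j with a_m = 1 − Σ_{j<m} a_j, and the expected memberships from the previous step are w_ik^(t) = a_k^(t) f_k(x_i) / Σ_j a_j^(t) f_j(x_i). This is the standard mixture-weight update; the symbols may not match those of the paper's own equations.

```latex
\begin{aligned}
\frac{\partial Q}{\partial a_k}
  &= \sum_{i=1}^{n} \frac{w_{ik}^{(t)}}{a_k} - \sum_{i=1}^{n} \frac{w_{im}^{(t)}}{a_m} = 0
  \quad (k = 1, \dots, m-1)
  \;\Longrightarrow\;
  \frac{1}{a_k} \sum_{i=1}^{n} w_{ik}^{(t)} = c \quad (k = 1, \dots, m), \\
\sum_{k=1}^{m} \sum_{i=1}^{n} w_{ik}^{(t)} &= \sum_{i=1}^{n} 1 = n
  \quad\text{and}\quad
  \sum_{k=1}^{m} c\, a_k = c
  \;\Longrightarrow\; c = n, \\
a_k^{(t+1)} &= \frac{1}{n} \sum_{i=1}^{n} w_{ik}^{(t)}.
\end{aligned}
```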

Finally, to implement this algorithm, we simply compute an initial value for , inserting each component, , into a wikt equal to Equation (A.2) (i.e., with k initially identical to j) and then compute that expression with Equation (A.4) to calculate each new corresponding . This process is repeated until our iteration criterion is met.
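A minimal sketch of this iteration follows. It assumes the template densities have already been evaluated at every star and stored as a matrix `F` with `F[i, j]` equal to f_j at star i, a layout of our choosing. The weights start from a random draw, the E-step forms the expected memberships, and the M-step averages them over the n stars, which is the standard mixture-weight update. The stopping rule (largest weight change below `tol`) and the name `em_template_weights` are assumptions, not the stopping criteria used in this work.

```python
import numpy as np

def em_template_weights(F, n_iter_max=10_000, tol=1e-8, rng=None):
    """EM estimate of mixture weights for fixed template densities.

    F[i, j] = f_j(x_i): density of template j evaluated at star i (precomputed).
    Returns the weights a (length m, summing to 1) and the number of iterations.
    """
    F = np.asarray(F, dtype=float)
    n, m = F.shape
    if np.any(F.sum(axis=1) <= 0):
        raise ValueError("every star needs nonzero density under at least one template")
    rng = np.random.default_rng() if rng is None else rng
    a = rng.dirichlet(np.ones(m))             # random initialization of the weights
    for it in range(1, n_iter_max + 1):
        mix = F @ a                            # p(x_i) = sum_j a_j f_j(x_i)
        w = F * a / mix[:, None]               # E-step: expected memberships w_ij
        a_new = w.sum(axis=0) / n              # M-step: a_j = (1/n) sum_i w_ij
        if np.max(np.abs(a_new - a)) < tol:    # assumed stopping rule
            return a_new, it
        a = a_new
    return a, n_iter_max
```

Because the templates are fixed and only the weights are fitted, the log-likelihood is concave in the weights. This is consistent with the insensitivity to starting values reported in this section, although degenerate templates can make the maximum non-unique.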

In our case, computation of converges relatively quickly for all starting values: on the order of 600 iterations, or half a minute, for n = 1000 (given our stopping criteria). Large values typically emerge after two or three iterations, and most of the change, in absolute terms, occurs in the first 50 to 100 iterations. For error estimation, we can provide values for the minimum possible error through an inversion of the Fisher information matrix (see Section A.3 for a brief derivation). Although these values indicate the best possible errors, they do not allow the use of more standard approaches to assessing parameter estimates, such as the reduced χ² statistic.

A.3. Derivation of the Minimum Error on EM Estimates

The asymptotic covariance matrix of can be approximated by the inverse of the observed Fisher information matrix, I.

As , there are only m – 1 free parameters. Thus, let . Using for brevity, the likelihood can then be expressed as

The observed information matrix, I, is the negative Hessian of Equation (A.6), evaluated at the observed data points:

where

with such that (k, r) represents the index of the observed information matrix I.
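For reference, the standard result for the observed information of the m − 1 free weights is sketched below in a generic parameterization of our own. The weights are a_1, …, a_m with a_m = 1 − Σ_{j<m} a_j, f_j is evaluated at the i-th star x_i, and p(x_i) = Σ_j a_j f_j(x_i). The symbols need not match those of Equation (A.6) and the expressions that follow it.

```latex
\begin{aligned}
\ell(a_1, \dots, a_{m-1}) &= \sum_{i=1}^{n} \ln\!\Big[ f_m(\mathbf{x}_i)
  + \sum_{j=1}^{m-1} a_j \big( f_j(\mathbf{x}_i) - f_m(\mathbf{x}_i) \big) \Big], \\
I_{kr} = -\frac{\partial^2 \ell}{\partial a_k\, \partial a_r}
  &= \sum_{i=1}^{n} \frac{\big( f_k(\mathbf{x}_i) - f_m(\mathbf{x}_i) \big)
  \big( f_r(\mathbf{x}_i) - f_m(\mathbf{x}_i) \big)}{p(\mathbf{x}_i)^2},
  \qquad k, r = 1, \dots, m-1, \\
\operatorname{Cov}(\hat{a}_k, \hat{a}_r) &\approx \big( I^{-1} \big)_{kr}, \qquad
\operatorname{Cov}(\hat{a}_k, \hat{a}_m) \approx -\sum_{r=1}^{m-1} \big( I^{-1} \big)_{kr}, \qquad
\operatorname{Var}(\hat{a}_m) \approx \sum_{k=1}^{m-1} \sum_{r=1}^{m-1} \big( I^{-1} \big)_{kr}.
\end{aligned}
```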

The observed information matrix of yields the following estimates for covariance and correlation for all m estimated weights in :
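The sketch below shows one way to compute such covariance and correlation estimates numerically. It assumes a precomputed density matrix `F` (rows are stars, columns are templates) and fitted weights `a`. The observed information of the first m − 1 weights is built from the terms (f_k − f_m)/p, inverted, and mapped to all m weights through a_m = 1 − Σ_{k<m} a_k. The function name is hypothetical, and the code is an illustration of the standard approach, not the authors' implementation.

```python
import numpy as np

def em_weight_covariance(F, a):
    """Asymptotic covariance and correlation of all m mixture weights.

    F[i, j] = f_j(x_i); a = fitted weights (summing to 1). The last weight is
    treated as dependent: a_m = 1 - sum of the others.
    """
    F = np.asarray(F, dtype=float)
    a = np.asarray(a, dtype=float)
    m = len(a)
    p = F @ a                                    # mixture density at each star
    D = (F[:, :-1] - F[:, -1:]) / p[:, None]     # (f_k - f_m) / p for k < m
    info = D.T @ D                               # observed information, (m-1) x (m-1)
    cov_free = np.linalg.inv(info)
    # Jacobian of (a_1, ..., a_m) with respect to the m - 1 free weights
    J = np.vstack([np.eye(m - 1), -np.ones((1, m - 1))])
    cov = J @ cov_free @ J.T
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    return cov, corr
```

Because the weights sum to one, each row of the returned covariance sums to zero, and with only two templates the two weights are perfectly anticorrelated.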

Footnotes

  • LSST refers to the Large Synoptic Survey Telescope, APOGEE refers to the Apache Point Observatory Galactic Evolution Experiment, and GALAH refers to the GAlactic Archeology with HERMES survey.

  • This implies that even if the SFR is low in these systems, star formation will continue unabated unless there is a close interaction with the host galaxy. However, while these studies suggest that our assumption is reasonable, its complete validation depends on the nature of the interaction pathways between satellite galaxies and their host halos, i.e., on whether accretion-driven quenching is either a very short or a very long process. In that case, stars that currently belong to the halo would predominantly come from "short-duration" accretion events, while the current dSphs would be long-lived fossils from reionization or from "long-duration" accretion/interaction events with the halo, so that they do not adversely affect the statistical representation of the stellar halo or, in turn, the strength of the analysis.

  • Our data are constructed from accreted dwarfs that become completely disrupted and subsumed by the halo as halo field stars (Bullock & Johnston 2005). Cooper et al. (2010) point out that while it is unlikely for the most massive accreted dwarfs to lose all of their mass to the stellar halo, they do typically lose of the stellar mass to the halo field shortly after accretion. This mass loss provides many times the number of stars needed to chemically represent the accretion of the most massive systems.

  • Each track used is generated by a unique SFH that conforms to the merger tree history of its respective dwarf. This ensures that each satellite template spans the dispersion in SFHs found among the simulated dwarfs used to create it. This, in turn, means that the success of our method (see Section 3 onward) is achieved while accommodating differences in SFHs among satellites of similar mass.

  • During the course of our investigation, we repeated our maximization step with random starts (for every halo and sample size) many times. In every instance, each run converged to (nearly) the same optimal value, indicating that, given our setup, the global maximum was attained essentially every time.

  • This definition is chosen to capture the most general sense of FoE statements (which are common in astronomy), such as "the observed (generic) measurements are within a factor of two of theoretical predictions." This statement implies that the observed measurements lie between half and twice the theoretical values in question.

  • 10 

    In a similar effort to this work, Schlaufman et al. (2012) analyzed the [Fe/H] and [α/Fe] chemical signatures of 9005 SEGUE stars in the MW (smooth) halo to ascertain the relative contributions to the accreted structure of the smooth halo, finding a strong correlation between the SEGUE data and the accretion formation of MW halo analogs in N-body simulations at distances beyond 15 kpc from the Galactic center. Our choice of sample size demonstrates another way in which this data set might be used.

  • 11 

    The actual chosen cutoff here for FoE values is . Given that this research is presented as a proof of concept, we wanted to capture FoE values that were consistent with an FoE = 2. Since such a cutoff is arbitrary, the reader is free to reexamine the selected columns of Figure 9 and reconstruct estimates for different FoE cutoff values.

  • 12 

    Hence, k equals the number of templates in an STS estimate (mEM) minus the number of those templates that are actually occupied in the true AHP (mT).
