This site uses cookies. By continuing to use this site you agree to our use of cookies. To find out more, see our Privacy and Cookies policy.
Brought to you by:

IMPROVED ESTIMATES OF THE MILKY WAY'S STELLAR MASS AND STAR FORMATION RATE FROM HIERARCHICAL BAYESIAN META-ANALYSIS

and

Published 2015 June 10 © 2015. The American Astronomical Society. All rights reserved.
, , Citation Timothy C. Licquia and Jeffrey A. Newman 2015 ApJ 806 96 DOI 10.1088/0004-637X/806/1/96

0004-637X/806/1/96

ABSTRACT

We present improved estimates of several global properties of the Milky Way, including its current star formation rate (SFR), the stellar mass contained in its disk and bulge+bar components, as well as its total stellar mass. We do so by combining previous measurements from the literature using a hierarchical Bayesian (HB) statistical method that allows us to account for the possibility that any value may be incorrect or have underestimated errors. We show that this method is robust to a wide variety of assumptions about the nature of problems in individual measurements or error estimates. Ultimately, our analysis yields an SFR for the Galaxy of ${{{\rm \dot{M}}}_{\star }}=1.65\pm 0.19\;{{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}$, assuming a Kroupa initial mass function (IMF). By combining HB methods with Monte Carlo simulations that incorporate the latest estimates of the Galactocentric radius of the Sun, R0, the exponential scale length of the disk, Ld, and the local surface density of stellar mass, ${{{\Sigma}}_{\star }}({{R}_{0}})$, we show that the mass of the Galactic bulge+bar is ${\rm M}_{\star }^{B}=0.91\pm 0.07\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$, the disk mass is ${\rm M}_{\star }^{D}=5.17\pm 1.11\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$, and their combination yields a total stellar mass of ${{{\rm M}}_{\star }}=6.08\pm 1.14\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ (assuming a Kroupa IMF and an exponential disk profile). This analysis is based upon a new compilation of literature bulge mass estimates, normalized to common assumptions about the stellar IMF and Galactic disk properties, presented herein. We additionally find a bulge-to-total mass ratio for the Milky Way of $B/T=0.150_{-0.019}^{+0.028}$ and a specific SFR of ${{{\rm \dot{M}}}_{\star }}/{{{\rm M}}_{\star }}=2.71\pm 0.59\times {{10}^{-11}}$ yr−1.

Export citation and abstract BibTeX RIS

1. INTRODUCTION

Determining the global properties of the Milky Way inherently poses unique challenges. Unlike any other galaxy in the universe, we lack the ability to study the Milky Way from an outside perspective. This disadvantage is greatly amplified by our location within the disk, forcing us to peer through the dusty interstellar medium (ISM) when looking toward other stars. The Galactic ISM inhibits our view of more distant regions of the disk, particularly in the optical and near-UV wavelengths (cf. Herschel 1785; Schlegel et al. 1998). For these reasons, there are a limited number of studies in the literature that aim to produce a global picture of our Galaxy.

Here, we aim to improve our understanding both of the total star formation rate (SFR) and the total stellar mass of the Milky Way. We do this, not by analyzing any new observational data, but by statistically combining the prior measurements of these properties in the literature using the power of a hierarchical Bayesian (HB) method (Press 1997, pp. 49–60). Such methods are not new to astronomy, though still rare in the literature; they can be a robust and versatile tool for both data and model analysis, and subsequently their prevalence in the literature has grown greatly in the past few decades (Loredo 2012). For instance, Newman et al. (1999) applied this technique (in a maximum likelihood framework) to combine the observed properties of 43 ± 7 real and similar number of imposter Cepheid variables found in the Centaurus cluster, handling the possibility of false positive detections, in order to determine a period–luminosity-relation-based distance modulus to NGC 4603. Lang & Hogg (2012) were able to produce tight constraints on the orbit of Comet 17 P/Holmes by applying this method to the diverse set of image query results for "Comet Holmes" obtained from the Yahoo! internet search engine. March et al. (2014) present an HB model to improve constraints on cosmological parameters by combining information from supernovae Ia lightcurves. Shetty et al. (2013, 2014) use the HB method to reveal the Kennicutt–Schmidt relation, i.e., the relationship between SFR, ${{{\rm \dot{M}}}_{\star }},$ and molecular gas surface density, ${{{\Sigma}}_{{\rm mol}}}$, to be non-universal and in many cases sub-linear, indicating that ${{{\Sigma}}_{{\rm mol}}}$ alone is insufficient to predict a galaxy's SFR. Most recently, Mandel et al. (2014) applied this method in order to disentangle the effect of systematic reddening due to host galaxy dust from the expansion-velocity-dependent variation in intrinsic SN Ia colors.

Adopting a Bayesian perspective, our major goal in this paper is to answer the question: Given the previous measurements of a given parameter of the Milky Way, what conclusions can we draw about its true value? We first apply this analysis method to estimate the Milky Way's SFR. We next consider the bulge and overall mass of the Milky Way. This introduces additional complications due to variations in the Galactocentric radius of the Sun, R0, assumed in different measurements. To deal with this, we combine our HB analysis with Monte Carlo (MC) simulations which incorporate the current uncertainties in R0. The MC method allows us to simultaneously produce new estimates for the stellar mass contained in the bulge+bar, ${\rm M}_{\star }^{B}$, the stellar disk mass, ${\rm M}_{\star }^{D},$ and the total stellar mass, ${{{\rm M}}_{\star }}$, of the Milky Way, assuming the single-exponential disk model from Bovy & Rix (2013), and incorporating uncertainties in R0, the exponential scale length of the disk, Ld, and the local surface density of stellar material, ${{{\Sigma}}_{\star }}({{R}_{0}})$.

We structure this paper as follows. In Section 2 we describe the HB formalism we use in order to construct aggregate results incorporating the information from a variety of independent measurements. In Section 3 we apply this technique to the prior estimates of the Milky Way's SFR. The results of the SFR analysis are discussed in Section 3.3. Next, we take on a more complex example, ultimately constraining the total stellar mass in the Milky Way. Section 4.1 details the stellar disk model we assume for this study. In Section 4.2, we explain how we use MC simulations to produce a new estimate of the disk mass, to supplement the HB analysis used to determine the bulge+bar mass, and we combine these two results to yield the total stellar mass in the Galaxy. The results for these three measurements are discussed in Section 4.5. We summarize and discuss the main results of this study in Section 5. Lastly, in the Appendix we explore how plausible it is that Galactic disk deviates from a pure exponential profile, and investigate how this would affect our results.

2. HIERARCHICAL BAYESIAN ANALYSIS

In this section, we describe the analysis methods we use to combine a set of measurements for some observable (e.g., ${{{\rm \dot{M}}}_{\star }}$ or ${\rm M}_{\star }^{B}),$ along with their attendant uncertainties, into one aggregate result using a HB formalism. Ultimately, this process provides a new probability distribution function (PDF), referred to in Bayesian terms as the posterior, for the observable of interest by combining the PDFs yielded from multiple individual estimates, enabling us to incorporate the information gained from a variety of independent observations and analyses. This method allows us to account for the possibility that any of the measurements may be incorrect or affected by systematic errors that have been overlooked. This is a key advantage over calculating simpler statistics, such as the standard inverse-variance weighted mean, which are more easily skewed by outliers and are also contingent on the assumptions that the individual PDFs are Gaussian in form and statistically compatible.

2.1. Defining the Problem

We begin by collecting a set of N independent measurements of observable ${\mathcal{O}}$ from the literature, and denote this data set as ${\mathcal{D}}.$ In a Bayesian framework, each study can be considered to provide a PDF for the true value of ${\mathcal{O}},$ ${{\mu }_{0}}$, given the data obtained; if the probability distribution is Gaussian in form, it can be described by its mean value ${{\mu }_{i}}$ and standard deviation ${{\sigma }_{i}}$. Under the assumption of normality, we know there should be a ∼68% chance of ${{\mu }_{0}}$ being in the range ${{\mu }_{i}}\pm {{\sigma }_{i}}$ and ∼95% chance of it being within ${{\mu }_{i}}\pm 2{{\sigma }_{i}}$, assuming all errors (statistical and systematic) are accounted for in ${{\sigma }_{i}}$. Of course, this is not always a safe assumption; often we may find that two separate measurements of ${\mathcal{O}}$ are in tension with each other, producing results with $1\sigma $ or even $2\sigma $ confidence regions that do not overlap. If this tension is sufficiently great, we can conclude that at least one of the estimates of ${{\mu }_{0}}$ must be affected by a bias or systematic error that has not been incorporated in ${{\sigma }_{i}}$; this is not sufficient to determine which of the two estimates is problematic.

2.2. Relieving the Tension

Practically speaking, the measurements included in our data set, given their nominal errors, must always be in tension with each other at some level of significance. In order to resolve this tension we can hypothesize that some of these studies have overestimated their ability to measure ${\mathcal{O}}.$ Suppose that fgood denotes the fraction of "good" measurements; i.e., the probability that any single measurement within ${\mathcal{D}}$ is accurately described by ${{\mu }_{i}}$ and ${{\sigma }_{i}}$. Thus, $1-{{f}_{{\rm good}}}$ is the global fraction of measurements that are "bad"—i.e., inaccurate—generally due to underestimated error bars (e.g., ones that omit the possibility of significant systematic errors). With no a priori knowledge of which estimates are not "good," we must find a way to remedy their effect when obtaining combined constraints on ${{\mu }_{0}}$. In this study we explore a number of ways to do this. For instance, perhaps the "bad" estimates simply require their error bars to be scaled by a constant factor; we will denote the resulting degraded uncertainty estimate by ${{\sigma }_{n,i}}=n{{\sigma }_{i}}$. We refer to this as the "free-n" model below.

More likely, problematic measurements may have overlooked systematic uncertainties in their methods which should be added in quadrature to the nominal error estimates. Let $\mu _{i}^{{\rm MED}}$ denote the median of all the ${{\mu }_{i}}$ values. We investigate what happens when adding a fractional amount Q of $\mu _{i}^{{\rm MED}}$ in quadrature to the nominal error bars, such that the error for a "bad" estimate is given by $\sigma _{Q,i}^{2}=\sigma _{i}^{2}+{{(Q\mu _{i}^{{\rm MED}})}^{2}}$. We use $Q\mu _{i}^{{\rm MED}}$ here, instead of $Q{{\mu }_{i}}$, to avoid a slight bias toward assigning lower errors to lower valued estimates of ${{\mu }_{0}}$. We refer to this as the "free-Q" model below. Instead, we could consider a case where there is a floor value of fractional error which the the nominal error bars should not dip below. In such a case we would use ${{\sigma }_{F,i}}=F\mu _{i}^{{\rm MED}}$ if ${{\sigma }_{i}}\lt F\mu _{i}^{{\rm MED}}$, or otherwise ${{\sigma }_{F,i}}={{\sigma }_{i}}$. We refer to this as the "free-F" model below.

Furthermore, it is possible that some of the included estimates are entirely wrong, and thus should effectively be excluded from ${\mathcal{D}}.$ In this scenario we would handle this by replacing the nominal Gaussian PDF with ${{\mu }_{i}}$ and ${{\sigma }_{i}}$ by a uniform probability distribution over the full potential parameter range. We label this as the "${{P}_{{\rm bad}}}$-flat" model below. We also examine the results of assuming we have included no "bad" measurements in our analysis by forcing ${{f}_{{\rm good}}}=1$ instead of allowing it as a free parameter (i.e., to assume that all of the ${{\mu }_{i}}$ and ${{\sigma }_{i}}$ are correct). All of these ways of dealing with the inclusion of "bad" measurements in our data set have different advantages and address the problem from a slightly different approach. A hierarchical model allows us to quantify the affect of the "bad" measurements on the combined results by simultaneously fitting parameters that describe the data (e.g., fgood, n, Q, F), while also fitting for those that describe the physical model (e.g., ${{\mu }_{0}}$). It is important to note that, in all cases, the constraints on the physical parameter of interest yielded by this technique will only be improved if the systematics affecting the individual measurements are different, as systematic errors that are common across a set of measurements do not improve by adding more data. In the following section, we describe the HB formalism we use to analyze each scenario and the criteria we use to distinguish which of these models best fits the data.

2.3. The Formalism

We closely follow the prescription laid forth by Press (1997, pp. 49–60) and refer the reader to this work for a more in-depth derivation. Bayes' theorem provides the framework in which we can calculate the probability of a particular model given data ${\mathcal{D}}:$

Equation (1)

where Θ is a vector containing the free parameters of the model. Here, the posterior probability $P({\Theta}|{\mathcal{D}})$ is equal to the product of the likelihood $P({\mathcal{D}}|{\Theta})$ and the prior $P({\Theta})$, divided by the evidence $P({\mathcal{D}})$. The likelihood is the probability of obtaining the actual measurements ${\mathcal{D}}$, given that the parameters Θ specify the correct model of the data. The prior reflects our previous knowledge of what the parameters of the true model are, before the data ${\mathcal{D}}$ are considered. In general, this must be subjectively chosen. The evidence represents the overall probability of finding the data in hand, and when considered on its own it provides a useful means of comparing different models, as we will discuss in Section 2.3.4. This is obtained by integrating the likelihood weighted by the prior (i.e., the numerator of Equation (1)) over all possible values of Θ; hence it is also sometimes called the marginal likelihood. As written here, the posterior yields a properly normalized PDF representing the degree of belief of the model parameters Θ being in a given range.

2.3.1. The Likelihood

We begin with the assumption that all measurements may be represented by the combination of two probability distributions, representing the possibility that it is "good" or "bad." The PDF for ${{\mu }_{0}}$ given a measurement will be $P({{\mu }_{0}})$ = $P({{\mu }_{0}}|{\rm good})P{\rm (good)}$ + $P({{\mu }_{0}}|{\rm bad})(1-P{\rm (good)});$ the probability that a given measurement is "good" is simply fgood. We assume that each measurement in ${\mathcal{D}}$ is statistically independent from all the others; in that case we may write the overall likelihood for ${\mathcal{D}}$ as the product of the likelihoods for each measurement we include. For each of the scenarios described above in Section 2.2, the appropriate likelihood is given by:

For the free-n model (i.e., where "bad" measurements are assumed to be underestimating errors by a constant factor), the likelihood is

Equation (2)

For the free-Q model (i.e., where "bad"-measurement errors are assumed to require extra uncertainty added in quadrature), the likelihood is

Equation (3)

For the free-F model (i.e., where "bad" measurements are assumed to be underestimating errors only if they are below a minimum threshold), the likelihood is

Equation (4)

In this model, fgood is the fraction of "good" measurements assuming the given value of F; i.e., it is the fraction of accurate measurements among those with ${{\sigma }_{i}}\lt F\mu _{i}^{{\rm MED}}$. This is subtly different from the former two models, where fgood characterizes the entire data set at any given value of n or Q.

For each of the above three models, we also explore the results of assuming that all measurements included in ${\mathcal{D}}$ have misestimated errors to some extent by not treating fgood as a free parameter, but rather setting it to zero.

For the ${{P}_{{\rm bad}}}$-flat model (i.e., where "bad" measurements are assumed to be entirely wrong and discardable, and so we replace their PDFs with a flat distribution), the likelihood is

Equation (5)

The constant here is chosen such that the integral of $P({{\mu }_{0}}|$ "bad") is equal to 1.

Lastly, there is the possibility that all of the measurements included in our analysis are "good" and have accounted for all uncertainties in their analyses. We can determine ${{\mu }_{0}}$ for this scenario by setting fgood to unity; in this case, the results of the HB analysis method become equivalent to the inverse-variance weighted mean of the individual measurements. Effectively, this model serves as the null hypothesis of our study, and we denote this as the all-"good" model when comparing our results.

In total, this yields eight different ways of modeling "bad" measurements that could influence our aggregate estimate of ${{\mu }_{0}}.$ In Section 2.3.4 we will describe which of these options is best to follow.

2.3.2. The Prior

If we assume that our prior knowledge of the parameters of our model are unrelated to each other, the overall prior for Θ is separable into the product of the priors for each parameter. For example, in the context of this paper, we would not expect that the probability of an astronomer quoting accurate error bars on his/her result should be different if the Milky Way is truly producing three solar masses worth of new stars each year as opposed to two. This means we can write the joint prior $P({{\mu }_{0}},{{f}_{{\rm good}}})$ as $P({{\mu }_{0}})P({{f}_{{\rm good}}}|{{\mu }_{0}})=P({{\mu }_{0}})P({{f}_{{\rm good}}})$. Likewise, in the absence of data, we would not expect a parameter that describes how inaccurate a study's error bars are to be dependent on the probability of any one measurement being "bad" or the mass of new stars being formed in the Galaxy per year.

For all four free parameters we use to characterize "good" and "bad" estimates we choose flat priors, meaning that we believe there is a 100% chance of the true value being within a given range, and any single value within that range has equal probability to any other. We note that these assumptions are somewhat arbitrary; for instance, a flat prior in a given quantity corresponds to a non-flat prior for the log of that quantity. Flat priors in log space are generally preferred for quantities with no relevant scales/order of magnitude; however, that does not apply here. Effectively, our choice causes all posterior distributions we calculate to be determined only by the likelihood of the observed data. For the free-n model we assume 100% probability that n is in the range [1, 4]; i.e., that the errors for "bad" measurements will be underestimated by a factor no less than 1 and no greater than 4. In turn, for the free-Q model we assume 100% probability that Q is in the range [0, 1]; i.e., the extra error needed to be added in quadrature to correct the nominal error bars of a "bad" measurement is no less than 0 and no more than 100% of $\mu _{i}^{{\rm MED}}$. Lastly, for the free-F model we assume 100% probability that F is in the range [$\sigma _{i}^{{\rm MIN}}/\mu _{i}^{{\rm MED}}$, 1], where $\sigma _{i}^{{\rm MIN}}$ is the minimum error estimate of all measurements included in ${\mathcal{D}};$ i.e., the minimum error estimate for any "bad" measurement should be no less than $\sigma _{i}^{{\rm MIN}}$ and no larger than $\mu _{i}^{{\rm MED}}$. The priors can then be expressed as piece-wise functions,

Equation (6)

Equation (7)

Equation (8)

Equation (9)

2.3.3. The Marginalized Posterior

In this subsection, we detail how the posterior PDF for a single parameter of our model is produced from the joint posterior from the HB analysis, using the free-n model as an example. Returning to Equation (1), the posterior PDF for the parameters of our model will be

Equation (10)

It is often more informative to consider the posterior for an individual parameter; we can simply obtain this by marginalizing: i.e., integrating the PDF over all other parameters of the model. For instance, in our example, if we are interested in the marginalized posterior for ${{\mu }_{0}}$, then we calculate

Equation (11)

Note that this result averages over all possible values of fgood and n. Lastly, we normalize the posterior to be a true PDF by dividing by the evidence, which is obtained by integrating Equation (11) over all ${{\mu }_{0}}$.

2.3.4. Choosing Among Models

In this study, we consider a variety of ways to model the inclusion of "bad" measurements in our data set. Generally speaking, a model with more free parameters will be able to produce a better fit to the data; however, the constraints on the model will be weaker, as there are more degeneracies introduced between parameters. In order to compare the utility of the models to each other, we calculate the evidence for each model. If Mi labels a particular model which is specified by parameters Θ, then the evidence for model Mi is

Equation (12)

this is simply the likelihood integrated over all possible parameter values of the model, weighted by their priors. This provides a natural, Bayesian method of incorporating the principle of Ockham's razor into our model comparison. Essentially, the evidence will be maximized by a model's ability to better fit the data; however, this will be compromised if excessive parameter space is required to achieve such a fit. As the all-"good" model represents the null hypothesis for our HB analysis, for each model we report the Bayes Factor defined as

Equation (13)

Effectively, K represents the ratio of the posterior odds to the prior odds of Mi the being the correct model over ${{{\rm M}}_{{\rm all}-{\rm good}}}$ (Kass & Raftery 1995, and references therein). As defined this way, we choose the best model of the data to be the one with largest K. In our tables we list for each model the difference in ${{{\rm log} }_{10}}\;K$ compared to our fiducial model; a difference of 2 is commonly used to indicate statistically significant differences in model utility.

As alternatives to the Bayes Factor, we also report the Akaike information criterion (Akaike 1974, hereafter AIC), defined as

Equation (14)

and the Bayesian information criterion (Schwarz 1978, hereafter BIC), defined as

Equation (15)

where N is the number independent measurements included in our analysis, k is the number of free parameters of the model, and ${\mathcal{L}}$ is the maximum likelihood value. These provide a secondary and less sophisticated way of weighing the goodness of fit against the number of free parameters included in each model, and whose comparison can serve as a rough (and less computationally intensive) approximation to comparing ${{{\rm log} }_{10}}\;K$ values. However, in contrast to the K values, the best model of the data would be the one that minimizes the information criteria. Thus, for each model we report ΔAIC (or ΔBIC) measured from the lowest AIC (or BIC) value among all models; similar to the ${\Delta}{{{\rm log} }_{10}}\;K$ values, a change of ∼2 is indicative of statistically significant differences in model utility (Burnham & Anderson 2002). Note that, as AIC and BIC values are calculated on a natural logarithm scale, this constraint is a bit less conservative than the one used for ${\Delta}{{{\rm log} }_{10}}\;K$ values (i.e., bigger differences in ${\Delta}{\rm ln} K$ are required to indicate a significant difference).

The remainder of this paper focuses on two application of the HB method. First, we use this technique to constrain the SFR of the Milky Way. In our second and more complex example, we produce an hierarchical estimate of the stellar mass contained in the combined bulge & bar components of the Milky Way, where we must incorporate MC simulations into our technique to reflect uncertainties in the Sun's Galactocentric radius. Simultaneously, we produce PDFs for the stellar mass of the disk component of the Galaxy, as well as its total stellar mass. As mentioned earlier, we are effectively assuming that if there are systematic errors in measurements of the Milky Way properties, they will generally not be in common among all the techniques, but rather, given the multitude of methods applied, many methods should have different systematics (of differing signs). If that is not the case, errors will not be reduced when combining information from multiple results.

3. THE MILKY WAY'S STAR FORMATION RATE

3.1. The SFR Data

Recent work by Chomiuk & Povich (2011) provides a thorough review and renormalization of Galactic SFR, ${{{\rm \dot{M}}}_{\star }},$ estimates made in the last three decades. Examining these original works reveals a discouraging scatter in the derived results, spanning the range of 1 to 10 ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}.$ This proves to be predominantly due to a heterogeneous mixture of initial mass functions (IMFs) and stellar population synthesis (SPS) models used. The authors translate these results all to a uniform choice of the Kroupa broken-power-law IMF (Kroupa & Weidner 2003) as well as the Starburst99 v5.1 SPS model (Vázquez & Leitherer 2005). As a result, these studies, which encompass many different methods and observational surveys, all collectively are in general agreement with each other after being placed on an equal footing, converging to an SFR of ${{{\rm \dot{M}}}_{\star }}=1.9\pm 0.4$ ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}.$ We refer the reader to Table 1 of Chomiuk & Povich (2011) for the data we use to estimate the Milky Way SFR, as well as to their Section 3 for a detailed discussion of the measurements. We do not utilize the first two entries in that table, the measurements from Smith et al. (1978) and Güsten & Mezger (1982), as these were both updated by the Mezger (1987) radio free–free result. Additionally, the error estimate on the Misiriotis et al. (2006) dust-heating measurement has been increased to 0.95 ${{{\rm M}}_{\odot }};$ i.e., due to the lack of error estimates we assign 50% uncertainty to it. This particular data set is denoted ${{{\mathcal{D}}}_{{\rm SFR}}}$ hereafter.

3.2. Setting a Prior on ${{{\rm \dot{M}}}_{\star }}$

To place a prior on its SFR, we utilize the fact that the Milky Way is confidently known to be a spiral galaxy. In addition, it appears that the Galaxy has experienced a relatively quiet merger history, undergoing predominantly secular evolution with no significant interactions to spark a large burst of new stars (e.g., Unavane et al. 1996; Mutch et al. 2011). Therefore, we agnostically assume that the Galactic SFR could be anywhere from 0 to 6 solar masses per year (before considering the data in Section 3.1). This is represented using a uniform probability distribution such that

Equation (16)

3.3. SFR Results

Table 1 shows the HB constraints on ${{{\rm \dot{M}}}_{\star }}$ from each model that we consider. The overwhelming similarity between the posterior results from these different models to a simple weighted average (corresponding to the last line in the table) suggests that nominal errors on each measurement in the SFR data set are likely very accurate, if not overestimated. In Figure 1, we show the joint posterior PDF for fgood and n (normalized to a peak value of 1); the posterior obtained by marginalizing over ${{{\rm \dot{M}}}_{\star }}$ is maximized where fgood and n each approach unity. Figure 2 shows the marginalized posterior probability for fgood for each model which allows it to be a free parameter. It is clear from the plot that modifying the error bars of "bad" measurements in any way yields little preference for a particular value of fgood, whereas throwing out "bad" estimates favors values near one. This occurs because for values of n near 1 (or Q or F near 0), fgood has very little effect, so a broad range of fgood values yield similar results.

Figure 1.

Figure 1. The joint posterior probability distribution function (PDF), $P({{f}_{{\rm good}}},n|{{{\mathcal{D}}}_{{\rm SFR}}})$, describing the probability of each possible value of the fraction of "good" estimates, fgood, (i.e., ones with accurate error bars) included in our Milky Way star formation rate (SFR) data set, ${{{\mathcal{D}}}_{{\rm SFR}}}$, and the scale factor n needed to expand the error bars for any measurement treated as not "good." This two-dimensional probability distribution is produced from a hierarchical Bayesian (HB) analysis, where we have marginalized over all possible true values for the Milky Way's SFR. For ease of reading, we have normalized the peak value to 1. This plot clearly shows how strongly the HB analysis favors a model with minimally adjusted error bars on the individual estimates included in our analysis: the probability given the observed data set is maximized if either n is ∼1, and/or fgood is ∼1.

Standard image High-resolution image
Figure 2.

Figure 2. The posterior PDF, $P({{f}_{{\rm good}}}|{{{\mathcal{D}}}_{{\rm SFR}}})$, for the fraction of "good" estimates (i.e., ones with accurate error bars), fgood, included in our Milky Way star formation rate (SFR) data set, ${{{\mathcal{D}}}_{{\rm SFR}}}$, for each model of "bad" measurements we consider. Each curve is produced from a hierarchical Bayesian analysis, where we have marginalized over all other free parameters in the model. Each corresponds to a different model of how to remedy the inclusion of "bad" measurements included in ${{{\mathcal{D}}}_{{\rm SFR}}}$: the solid red curve corresponds to a case where we must multiply the error bars for "bad" measurements by a free parameter n; the dashed blue curve is produced by adding a fractional amount (given by the free parameter Q) of the median value of all estimates included in ${{{\mathcal{D}}}_{{\rm SFR}}}$ in quadrature to their nominal error bars; the green dash-dotted curve results from imposing a floor on "bad" measurements' nominal error bars equal to a fraction amount (given by the free parameter F) of the median estimated SFR; and the purple triple-dot-dashed curve corresponds to a case where we model "bad" measurements as entirely wrong, and replace them with uniform probability distributions over the entire parameter space. We see that all models peak at ${{f}_{{\rm good}}}=1$, indicating that there is minimal tension between the measurements included in ${{{\mathcal{D}}}_{{\rm SFR}}}$. Smaller values of fgood can be consistent with the data for most models, but only if Q or F is ∼0 or n ∼ 1, corresponding to no difference between "good" and "bad" measurements.

Standard image High-resolution image

Table 1.  Combined SFR Results For Various Model Assumptions

Model Optimal Combined ${{{\rm \dot{M}}}_{\star }}\pm 1\sigma $ k b ΔAIC ΔBIC ${\Delta}{{{\rm log} }_{10}}K$
Valuea $({{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}})$
fgood Free—Some of the measurements have inaccurate error bars.
Free-n 1.00 1.65 ± 0.20 3 4.0 4.4 −0.73
Free-Q 0.00 1.66 ± 0.21 3 4.0 4.4 −0.31
Free-F 0.26 1.67 ± 0.22 3 4.0 4.4 −0.25
${{P}_{{\rm bad}}}$-flat N/A 1.65 ± 0.20 2 2.0 2.2 −0.73
fgood = 0 —All of the measurements have inaccurate error bars.
Free-n 1.00 1.65 ± 0.22 2 2.0 2.2 −1.59
Free-Q 0.00 1.67 ± 0.23 2 2.0 2.2 −0.58
Free-F 0.16c 1.69 ± 0.26 2 2.0 2.2 −0.49
fgood = 1 —None of the measurements have inaccurate error bars.
All-"good"d N/A 1.65 ± 0.19 1

Notes.

aThe value of n, Q, or F marking the peak of marginalize posterior PDF for these quantities in each model. bThe number of free parameters in the model. See Section 2.3.4. cThis corresponds to the lowest allowed value of F in the model, ${{F}^{{\rm MIN}}}$, where the smallest allowed error on any estimate in ${{{\mathcal{D}}}_{{\rm SFR}}}$ is just the minimum error estimate; that is, ${{\sigma }_{F,i}}={{F}^{{\rm MIN}}}\mu _{i}^{{\rm MED}}=\sigma _{i}^{{\rm MIN}}$. Since ${{F}^{{\rm MIN}}}$ affects no estimates in ${{{\mathcal{D}}}_{{\rm SFR}}}$, this is equivalent to setting fgood to unity (see Equation (4)). In fact, the HB analysis most strongly supports making no adjustment to the nominal error bars (i.e., the all-"good" model where we always set ${{f}_{{\rm good}}}=1$). dSince we use flat priors that are much broader the the likelihood PDF, this is equivalent to the inverse-variance weighted mean.

Download table as:  ASCIITypeset image

Similar analyses to that in Figure 1 for each of the models of "bad" measurements yield the same overall message: the measurements are sufficiently consistent with each other that little, if any, adjustment of the nominal errors is demanded. In fact, we find that for any model for how to remedy "bad" estimates within the HB formalism, the data always drives toward a point in k-dimensional parameter space with few points treated as discrepant; i.e., ${{f}_{{\rm good}}}\approx 1$, and ${{{\rm \dot{M}}}_{\star }}\approx 1.66$ ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}$; or, alternatively, ${{f}_{{\rm good}}}\lt 1$ but n = 1 or Q or F = 0, which has equivalent effect. The best model of the data (i.e., the one with largest Bayes Factor, K) turns out to be where we set ${{f}_{{\rm good}}}=1$, uniformly treating every estimate as "good" and hence requiring the fewest free parameters in the fit (only ${{{\rm \dot{M}}}_{\star }}$ itself). This yields an aggregate SFR for the Milky Way of ${{{\rm \dot{M}}}_{\star }}=1.65\pm 0.19$ ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}},$ which we choose as our fiducial result. For comparison, we overlay the individual measurements in the SFR data set on our fiducial hierarchical result in Figure 3. We note that, as we are using flat priors, this scenario gives the same result as the inverse-variance weighted mean (IVWM). The finding that the data are sufficiently compatible that a simple model is sufficient matches well with what we would conclude using the chi-squared statistic to judge goodness of fit. Specifically, the IVWM for our sample of SFR measurements yields ${{\chi }^{2}}=3.83$, and we would expect a 95% chance of this statistic falling in the range [2.18, 17.53].

Figure 3.

Figure 3. The posterior PDF, $P({{{\rm \dot{M}}}_{\star }}|{{{\mathcal{D}}}_{{\rm SFR}}})$, for the Milky Way's star formation rate (SFR) is shown as a solid black line. This result is produced from the hierarchical Bayesian (HB) analysis where we assume all measurements have accurately estimated error bars; since we have assume flat priors over the entire parameter space, this is equivalent to the inverse-variance weighted mean. This model is the simplest one that provides a good fit to the data, and hence is favored by both the Bayesian evidence and information criteria. For comparison, we overlay the individual estimates from our SFR data set as dashed/dotted colored lines. We see that our methods yield a more tightly constrained estimate of the SFR of the Milky Way, while also being consistent with each individual estimate incorporated into the combined result.

Standard image High-resolution image

Chomiuk & Povich present arguments suggesting that the Robitaille & Whitney (2010) measurement may more accurately be treated as a lower limit rather than a best-fit value for the SFR. Given that uncertainty, we investigate the effect of doubling the nominal errors in that measurement within our calculation. Making this change yields a $\lt 1\sigma $ shift in the mean of the posterior distribution. With this deweighting, the inclusion or exclusion of this measurement has little impact on our results (changing the consensus Milky Way SFR by no more than 0.17 ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}$). A second concern arises from that fact that Equations (2)–(5) are contingent on the individual measurements being independent of one another. We note that, even though Bennett et al. (1994) and McKee & Williams (1997) utilize the same COBE data, the errors in their SFR estimates appear to be dominated by the differences between the assumptions made by the authors, and so we expect that we can treat them as independent measurements. We show how our aggregate SFR estimate varies under these different treatments of the Robitaille & Whitney (2010) measurement, as well as when excluding the Bennett et al. (1994) measurement, in Table 2.

Table 2.  Combined SFR Results For Various Data Assumptions

Model Treatment of Data Combined ${{{\rm \dot{M}}}_{\star }}\pm 1\sigma $
$({{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}})$
all-"good" Including RW10a with nominal errors (Fiducial) 1.65 ± 0.19
all-"good" Including RW10 with errors doubled 1.77 ± 0.21
all-"good" Excluding RW10 from the calculation 1.82 ± 0.21
all-"good" Excluding B94b from the calculation 1.63 ± 0.19

Notes.

aRobitaille & Whitney (2010). bBennett et al. (1994).

Download table as:  ASCIITypeset image

If all of the SFR estimates employed suffer from a common systematic error, this would not be reflected in the HB result. For instance, if the Kroupa IMF is unlike the actual IMF in the Milky Way, our SFR estimates could all be systematically off from the true value in a similar way. However, this error would cancel out when the Galaxy is compared to extragalactic objects, for which SFRs and stellar masses are generally calculated assuming Kroupa-like IMFs. Similarly, Chomiuk & Povich have recalibrated each of the Milky Way ${{{\rm \dot{M}}}_{\star }}$ estimates to the Kennicutt (1998) assumption that an SFR of 1 ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}$ produces a Lyman continuum photon rate of $9.26\times {{10}^{52}}$ photon s−1 for a Salpeter IMF (this relationship is then reduced by a factor of 1.44 to convert it to the Kroupa IMF). If this assumption is in error, then all the SFRs in our data set would be affected. However, once more, extragalactic SFRs would be off by the same factor, so that this systematic will drop out when comparing the Milky Way to other galaxies.

Apart from the IMF and the Lyman continuum rate to SFR relationship, common-mode systematics among all the SFR measurements appear to be unlikely. This is due to both the diversity of techniques used to estimate the Galactic SFR and the wide range of assumptions made by the different studies that utilize the same basic techniques. However, one might still worry that there are common underlying assumptions that may systematically offset SFR results from one method in comparison to all the others. For instance, two of the nine studies we use estimate the Galactic SFR by modeling the population of young stellar objects (YSOs) found from infrared (IR) surveys of the Galaxy. While these studies employ entirely different IR data, they are both contingent on assumptions about the properties of YSOs, which do not affect the other measurements, and in turn can systematically shift these results in unison with respect to the measurements utilizing different methods. Our HB analysis assumes that the errors from each study are random compared to each other and hence does not address this type of common-mode effect (much as the IVWM would not).

To test whether method-to-method variations are significant, we have performed bootstrap resampling of the SFR data by randomly drawing only one measurement out of those utilizing each measurement technique (e.g., all those based on the measured ionization rate or YSO counts). Hence, each resampled data set comprises four SFR estimates, each obtained from a unique measurement method, which we then use to calculate the HB posterior using the all-"good" model (allowing no extra compensation for systematic errors). We repeat this process 1000 times, each time measuring the mean from the posterior distribution. We then do a similar analysis, but where we instead draw four measurements at random from the entire data set each time. We find the standard deviation of the mean to be 0.22 if we use four measurements from different methods and 0.25 if we select four measurements completely at random. This indicates that inter-method variations are negligible compared to intra-method random variations—if anything, SFR measurements from different methods are more similar to each other than those which utilize the same technique. We therefore can safely conclude that common-mode systematics do not have a large impact.

4. THE MASS OF THE MILKY WAY'S STELLAR COMPONENTS

In this section we describe the methods we use to produce improved estimates of the total stellar mass, ${{{\rm M}}_{\star }}$, in the Milky Way and its components. To do so, we first make independent estimates of the stellar mass contained in the disk and bulge+bar components, ${\rm M}_{\star }^{D}$ and ${\rm M}_{\star }^{B}$ respectively. For ${\rm M}_{\star }^{D},$ we assume the single-exponential model of the Galactic disk from Bovy & Rix (2013), and use MC simulations to incorporate updated estimates of the Sun's Galactocentric radius, R0. We constrain ${\rm M}_{\star }^{B}$ using our HB formalism, similar to the analysis done in Section 3 for ${{{\rm \dot{M}}}_{\star }},$ but now using the MC simulations to propagate uncertainties in the value of R0 into the bulge mass posterior. Lastly, we combine self-consistent realizations of ${\rm M}_{\star }^{B}$ and ${\rm M}_{\star }^{D}$ to yield a probability distribution describing the total stellar mass.

4.1. The Stellar Disk Model

To model the structure of the Milky Way, we assume the stellar material of the disk is distributed in a single-exponential surface density profile. Integrating this profile over all radii yields the total stellar mass,

Equation (17)

where ${{{\Sigma}}_{\star }}({{R}_{0}})$ is the surface density of stellar mass in the local neighborhood and Ld is the exponential scale length of the disk. Specifically, we constrain these parameters to be consistent with the measurements made by Bovy & Rix (2013, see Appendix; for a discussion of alternative disk models).

Using SDSS/SEGUE spectroscopic measurements, these authors have segregated a uniform sample of ∼16,000 G-type dwarf stars into 43 mono-abundance populations (MAPs) based on their position in [α/Fe]-[Fe/H] space. G-type dwarfs were considered to be the optimal tracers of the structure of the disk as they are most luminous stars whose main-sequence lifetimes are larger than the age of the disk at practically all metallicities. Separated in this way, the spatial distribution of each MAP is well fit by a single-exponential profile both radially from the Galactic center and perpendicularly from the plane of the disk, indicating that the disk is likely composed of a more continuous distribution of populations rather than just the classical separation into thin and thick disk components (see also Bovy et al. 2012; Rix & Bovy 2013).

By independently fitting an action-based distribution function and Galactic potential to each MAP in position–velocity phase-space, the authors are able to measure the vertical force at $|Z|\approx 1$ kpc as a function of Galactocentric radius in the region $4\lesssim R\lesssim 9$ kpc; this quantity is directly proportional to the surface density of stellar mass, ${{{\Sigma}}_{\star }}$. Including the contribution from all MAPs, the authors are able to make the first dynamical determination of the total stellar surface-mass density at the Galactocentric radius of the Sun, ${{{\Sigma}}_{\star }}({{R}_{0}})=38\pm 4$ ${{{\rm M}}_{\odot }}$ pc−2; this is measured directly from the mass distribution of substellar objects, main-sequence stars, and stellar remnants, as opposed to inferring it from their luminosity (cf. Jurić et al. 2008). Similarly, the mass-weighted scale length determined from all MAPs is ${{L}_{d}}=2.15\pm 0.14$ kpc.

Assuming ${{R}_{0}}=8$ kpc and accounting for the uncertainty and covariance in Ld and ${{{\Sigma}}_{\star }}({{R}_{0}})$, Bovy & Rix (2013) find ${\rm M}_{\star }^{D}=4.6\pm 0.3\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ They also find that increasing R0 to 8.5 kpc in their model produces a $1.5\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ increase in ${{{\rm M}}_{\star }};$ this effect approximately scales linearly for intermediate radii. The scale length and local surface density of the disk are independent of R0. For our purposes, it is important to include the current uncertainties in R0 into our model of the Milky Way, and so we employ the following process for this study:

  • (I)  
    We model the covariance between the local surface density and scale length in this model by calculating ${{{\Sigma}}_{\star }}({{R}_{0}})$ as a function of Ld. To do so, we first fit for a linear mapping between these two parameters based on the 2D joint-posterior PDF that describes them (provided by J. Bovy 2013, private communication). This relation gives the appropriate value of ${{{\Sigma}}_{\star }}({{R}_{0}})$ for a given value of Ld. We then determine what uncertainty in ${{{\Sigma}}_{\star }}({{R}_{0}})$ would yield the Bovy & Rix (2013) error of $0.3\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ in ${\rm M}_{\star }^{D}$ using Equation (17) after this covariance and the errors in Ld are accounted for.
  • (II)  
    We assume ${{R}_{0}}=8.33\pm 0.35$ kpc from Gillessen et al. (2009), which is estimated by fitting the orbital parameters of 28 stars in near-orbit of the Galaxy's central massive black hole, building upon 16 years of observations, and taking into account both random and systematic errors. This result is in excellent agreement with other recent measurements of R0 (e.g., Ghez et al. 2008; Vanhollebeke et al. 2009; Sato et al. 2010; Chatzopoulos et al. 2015), with a large enough error to encompass the variation in results among different methods.
  • (III)  
    Given $dM_{\star }^{D}/d{{R}_{0}}$ from Bovy & Rix, we can then predict both the mass that would have been measured by Bovy & Rix for different R0 and the uncertainty in that estimate due to measurement errors in Ld and ${{{\Sigma}}_{\star }}({{R}_{0}})$ alone.

The overall goal of this study is to produce updated and accurate estimates of the total stellar mass and SFR of the Milky Way that may be directly compared with those properties measured for any external galaxy. In particular, we aim to be consistent with the definition of ${{{\rm M}}_{\star }}$ and ${{{\rm \dot{M}}}_{\star }}$ as measured for external galaxies in the MPA-JHU catalogs (Brinchmann et al. 2004), which assume the Kroupa IMF and that ${{{\rm M}}_{\star }}$ includes the contribution from both main-sequence stars and remnants, but not brown dwarfs (BDs), in accordance with the assumptions made for the stellar spectral evolution models of Bruzual & Charlot (2003). The dynamical estimate of ${{{\Sigma}}_{\star }}({{R}_{0}})$ from Bovy & Rix (2013) effectively includes BDs, so one way to get the density we want is to subtract them off.

The BD mass function (i.e., $\xi (M)\propto {{M}^{-\alpha }}$ where $0.005\lt M/{{{\rm M}}_{\odot }}\lt 0.1$) appears to have a power-law index in the range $-0.5\lesssim \alpha \lesssim 0.5$ (Cruz et al. 2007; Kirkpatrick et al. 2012; Burningham et al. 2013; Day-Jones et al. 2013). We normalize the mass function so that the portion corresponding to L0–L3 BDs, i.e., objects with masses in the range $0.03\lt M/{{{\rm M}}_{\odot }}\lt 0.05$ (following the models of D'Antona & Mazzitelli 1997, updated 1998), integrates to a total number density of $1.7\times {{10}^{-3}}$ pc−3 (matching Cruz et al. 2007). Accounting for the range of possible α values, we then integrate $\xi (M)\times M$ over the entire BD mass range to find a mass density of ∼2.5–3 × 10−4 ${{{\rm M}}_{\odot }}$ pc−3. Lastly, as required for a single-exponential disk profile, we multiply this by 2 times a scale height of the disk of hz = 400 pc (Bovy & Rix 2013), yielding predicted BD surface densities of ∼5–6 ${{{\rm M}}_{\odot }}$ pc−2. Earlier measurements, however, have yielded lower surface densities at ∼0.5–2 ${{{\rm M}}_{\odot }}$ pc−2 (Fuchs et al. 1998; Flynn et al. 2006, and references therein). Based on these analyses, we conservatively assume that the local surface density of BDs is in the range $0.5\lesssim {{{\Sigma}}_{{\rm BD}}}({{R}_{0}})\lesssim 6$ ${{{\rm M}}_{\odot }}$ pc−2; assuming a uniform distribution over this range, we find ${{{\Sigma}}_{{\rm BD}}}({{R}_{0}})=3.25\pm 1.59$ ${{{\rm M}}_{\odot }}$ pc−2. Subtracting this from Bovy & Rixʼs dynamical estimate, we find ${{{\Sigma}}_{\star }}({{R}_{0}})=34.75\pm 4.30$ ${{{\rm M}}_{\odot }}$ pc−2.

Alternatively, we could compare to the stellar surface density estimate from Bovy et al. (2012), which omits BDs but also white dwarfs (WDs) that we want to include. To remedy this, we collect a number of literature estimates of the spatial density of local WDs: Oswalt et al. (1996) find 0.0076 pc−3; Leggett et al. (1998) find 0.0034 pc−3; Jahreiß & Wielen (1997) find 0.005 pc−3; Knox et al. (1999) find 0.00416 pc−3; Holberg et al. (2008) find 0.0048 pc−3; and Sion et al. (2009) find 0.0049 pc−3. Using the mean and standard deviation of this sample, we assume the local volume density of WDs is ${{\rho }_{{\rm WD}}}=5.0\pm 1.4\times {{10}^{-3}}$ pc−3. Following our disk model, we multiply this by 2 times a scale height of hz = 400 pc (Bovy & Rix 2013, and ascribing 10% error) and an average WD mass of ${{\langle M\rangle }_{{\rm WD}}}=0.65\pm 0.01$ ${{{\rm M}}_{\odot }}$ (Falcon et al. 2010) to find the local surface density of WDs ${{{\Sigma}}_{{\rm WD}}}({{R}_{0}})={{\rho }_{{\rm WD}}}\times 2{{h}_{z}}\times {{\langle M\rangle }_{{\rm WD}}}=2.6\pm 0.8$ ${{{\rm M}}_{\odot }}$ pc−2. As a consistency check, we add this to the Bovy et al. (2012) photometric estimate of 32.0 ± 3.2 ${{{\rm M}}_{\odot }}$ pc−2 (due to main-sequence stars, assuming the Kroupa IMF and 10% uncertainty), yielding ${{{\Sigma}}_{\star }}({{R}_{0}})=34.6\pm 3.3$ ${{{\rm M}}_{\odot }}$ pc−2; this is in excellent agreement with the corrected dynamical estimate, which we adopt for the remainder of this study, given its more conservative error estimate. For convenience, we tabulate all parameters of our disk model, and their interdependencies, in Table 3. Plugging these expressions into Equation (17), any realization of the total stellar disk mass from our MC simulations is a function of the Galactocentric radius of the Sun and the scale length (independently drawn from their attendant probability distributions):

Equation (18)

Table 3.  Disk Model Parameters

Parameter Value Units Description Reference
R0 8.33 ± 0.35 kpc Galactocentric radius of the Sun Gillessen et al. (2009)
Ld 2.15 ± 0.14 kpc Scale length of exponential disk Bovy & Rix (2013)
${{{\Sigma}}_{\star }}({{R}_{0}})$ $31.75{{L}_{d}}/{\rm kpc}-$ ${{{\rm M}}_{\odot }}$ pc−2 Local surface mass density of MS (J. Bovy 2013, private communication)
$\quad 33.5125\pm 2.89$ stars and remnants for a given Lda
${\rm M}_{\star }^{D}({{R}_{0}},{{L}_{d}},{{{\Sigma}}_{\star }}({{R}_{0}}))$ See Equation (18) ${{{\rm M}}_{\odot }}$ The total stellar disk mass Bovy & Rix (2013)
from Equation (17), given R0 and Ld

Notes.

aSee Section 4.1. This relationship accounts for the covariance between the dynamical estimates of Ld and ${{{\Sigma}}_{\star }}({{R}_{0}})$ from Bovy & Rix (2013), while also subtracting the contribution from brown dwarfs, ${{{\Sigma}}_{{\rm BD}}}({{R}_{0}})$.

Download table as:  ASCIITypeset image

4.2. Monte Carlo Techniques

As we shall see in Section 4.3, many estimates included in our Galactic bulge+bar mass data set are dependent on assumptions made about the values of R0 and/or ${\rm M}_{\star }^{D},$ and hence depend upon all of our disk model parameters. However, as denoted in Table 3, the true value of these parameters is not perfectly known; we can describe our prior knowledge of each one by a probability density function. In order to propagate the uncertainties in these parameters' values into the resulting posteriors describing the stellar mass of the disk and bulge, as well as their combination for the total stellar mass of the Galaxy, we perform a series of MC simulations.

Unlike the parameters describing our "bad"-measurement models (Θ), we are uninterested in producing new results for our disk model parameters. We have much greater confidence in the priors we have chosen for R0, Ld, and ${{{\Sigma}}_{\star }}({{R}_{0}})$ based on the more direct observations described in the previous section than anything we would infer about those quantities from the set of bulge mass measurements. Therefore, in this application we want to ensure that the results of our HB formalism will reflect the information contained in the priors in Table 3. We will collectively denote the set of disk parameters as ${\mathcal{N}},$ as we are uninterested in allowing the ${\rm M}_{\star }^{B}$ data to modify any conclusions about what the values of those "nuisance" parameters may be. Formally, we are making the assumption that $P({\mathcal{N}}|{\mathcal{D}})=P({\mathcal{N}})$—i.e., that our state of knowledge of the nuisance parameters is not affected by the bulge data set—and we refer to this as the "strict-prior" method below. In this framework, our procedure is as follows:

  • (I)  
    We independently and randomly generate a sample of 103 values of R0 and Ld, which collectively reproduce the respective mean and standard deviation given in Table 3. Then for each realization of the scale length of the disk, ${{L}_{d,i}}$, we randomly draw a corresponding value of ${{{\Sigma}}_{\star }}{{({{R}_{0}})}_{i}}$ based on the relationship listed in Table 3.
  • (II)  
    For each of the 103 sets of randomly drawn parameters, ${{{\mathcal{N}}}_{i}}={{\left\{ {{R}_{0}},{{L}_{d}},{{{\Sigma}}_{\star }}({{R}_{0}}) \right\}}_{i}}$, we calculate the corresponding disk mass, ${{{\rm M}}_{\star ,{{i}^{d}}}},$ using Equation (18). The result is a distribution of possible Galactic disk masses, which we normalize to produce $P({\rm M}_{\star }^{D}|{\mathcal{D}})$. Note that this is consistent with our strict-prior assumption that $P({\mathcal{N}}|{\mathcal{D}})=P({\mathcal{N}})$, and so the fraction of times that ${{{\rm M}}_{\star ,{{i}^{d}}}}$ calculated from the ${{{\mathcal{N}}}_{i}}$ occurs in the MC will be proportional to $P({\mathcal{N}})$.
  • (III)  
    We determine the value of each literature ${\rm M}_{\star }^{B}$ measurement (listed in Table 4) that would have been obtained assuming these parameters are correct. This ensures that all measurements make consistent assumptions about the structure of the Galaxy, allowing them to be combined fairly. We refer the reader to the following section for these details.
  • (IV)  
    For each iteration of III, along with the prior chosen for ${\rm M}_{\star }^{B},$ which we will detail in Section 4.4, we calculate the joint k-dimensional posterior $P({\Theta}|{\mathcal{D}},{{{\mathcal{N}}}_{i}})$ via an HB analysis, as described in Section 2. After marginalizing out the other parameters in Θ, each realization of the bulge mass posterior can be written
    Equation (19)
    applying Bayes theorem as usual. By the definition of marginalization, $P({\rm M}_{\star }^{B}|{\mathcal{D}})=\int P({\rm M}_{\star }^{B},{\mathcal{N}}|{\mathcal{D}})d{\mathcal{N}}$; and applying the definition of joint probability and our strict-prior assumption, this is equal to $\int P({\rm M}_{\star }^{B}|{\mathcal{D}},{\mathcal{N}})P({\mathcal{N}})d{\mathcal{N}}$. If we draw values ${{{\mathcal{N}}}_{i}}$ from our prior distribution $P({\mathcal{N}})$, this integral will be equal to
    Equation (20)
    since the fraction of times that ${{{\mathcal{N}}}_{i}}$ occurs in the MC will be proportional to $P({\mathcal{N}})$. Note also that Bayes' theorem allows us to rewrite the denominator in Equation (19) as $P({\mathcal{D}}|{{{\mathcal{N}}}_{i}})=P({{{\mathcal{N}}}_{i}}|{\mathcal{D}})P({\mathcal{D}})/P({{{\mathcal{N}}}_{i}})$, and by applying our strict-prior assumption this reduces to simply $P({\mathcal{D}})$. Thus, in order to construct the combined ${\rm M}_{\star }^{B}$ result for a given model of "bad" measurements, we are able to simply average the individual posteriors from the MC realizations:
    Equation (21)
    The posterior PDF for each of the other parameters of the "bad"-measurement model are calculated in the same way. In addition, we record the AIC, BIC, and K for each iteration, yielding a distribution of values. In practice, we find that 103 realizations produces a standard error in the median ${\Delta}{{{\rm log} }_{10}}\;K$/AIC/BIC values much smaller than 0.5, which is sufficient to assess differences in the utility of different models securely (compared to our ${\Delta}=2$ criterion for significance).
  • (V)  
    We also produce a posterior for the total stellar mass from each iteration, $P({{{\rm M}}_{\star }}|{\mathcal{D}},{{{\mathcal{N}}}_{i}})$, in a model-consistent manner by simply defining ${{{\rm M}}_{\star ,i}}={\rm M}_{\star }^{B}+{{{\rm M}}_{\star ,{{i}^{d}}}}$. Again, we can average the individual posteriors, $P({{{\rm M}}_{\star }}|{\mathcal{D}},{{{\mathcal{N}}}_{i}})$, to obtain $P({{{\rm M}}_{\star }}|{\mathcal{D}})$, similar to Equation (21). Lastly, we also calculate the model-consistent posterior describing the bulge-to-total mass ratio by normalizing the distribution of values ${{(B/T)}_{i}}=\langle P({\rm M}_{\star }^{B}|{\mathcal{D}},{{{\mathcal{N}}}_{i}})\rangle /\langle P({{{\rm M}}_{\star }}|{\mathcal{D}},{{{\mathcal{N}}}_{i}})\rangle $ to integrate to unity, where angled brackets denote the mean of the enclosed posterior distribution.

Table 4.  The Galactic Bulge+Bar Mass Data Set

Reference ${\rm M}_{\star }^{B}\pm 1\sigma $ R0 Assumed Constraint type β a ${\rm M}_{\star }^{B}\pm 1\sigma ({{R}_{0}}=8.33\;{\rm kpc})$
(1010 ${{{\rm M}}_{\odot }}$) (kpc) (1010 ${{{\rm M}}_{\odot }}$)
Kent (1992) 1.69 ± 0.85 8.0 Dynamical 1 1.76 ± 0.88
Dwek et al. (1995) 2.11 ± 0.81 8.5 Photometric 2 2.02 ± 0.78
Han & Gould (1995) 1.69 ± 0.85 8.0 Dynamical 1 1.76 ± 0.88
Blum (1995) 2.63 ± 1.32 8.0 Dynamical 1 2.74 ± 1.37
Zhao (1996) 2.07 ± 1.03 8.0 Dynamical 1 2.15 ± 1.08
Bissantz et al. (1997) 0.81 ± 0.22 8.0 Microlensing 0 0.81 ± 0.22
Freudenreich (1998) b 0.48 ± 0.65 ... Photometric ... 0.48 ± 0.65
Dehnen & Binney (1998) 0.61 ± 0.38 8.0 Dynamical $1/2$ 0.62 ± 0.38
Sevenster et al. (1999) 1.60 ± 0.80 8.0 Dynamical 1 1.66 ± 0.83
Klypin et al. (2002) 0.94 ± 0.29 8.0 Dynamical 1 0.98 ± 0.31
Bissantz & Gerhard (2002) c 0.84 ± 0.09 8.0 Dynamical 1 0.87 ± 0.09
Han & Gould (2003) 1.20 ± 0.60 8.0 Microlensing 0 1.20 ± 0.60
Picaud & Robin (2004) 0.54 ± 1.11 8.5 Photometric 0 0.54 ± 1.11
Hamadache et al. (2006) 0.62 ± 0.31 None Microlensing 0 0.62 ± 0.31
Wyse (2006) 1.00 ± 0.50 None Historical review 0 1.00 ± 0.50
López-Corredoira et al. (2007) 0.60 ± 0.30 8.0 Photometric 2 0.65 ± 0.33
Calchi Novati et al. (2008) 1.50 ± 0.38 8.0 Microlensing 0 1.50 ± 0.38
Widrow et al. (2008) 0.90 ± 0.11 7.94 Dynamical 1 0.95 ± 0.12

Note. Bulge mass estimates, ${\rm M}_{\star }^{B},$ listed in this table have been converted to the Kroupa IMF. See Section 4.3 for further notes on individual estimates.

a β denotes the assumed relationship between each ${\rm M}_{\star }^{B}$ estimate, based on the constraint type, and the R0 assumed; i.e., ${\rm M}_{\star }^{B}\propto R_{0}^{\beta }$. bValue (in both cases) calculated assuming ${\rm M}_{\star }^{D}=5.17\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ and R0 = 8.33 kpc. B/D results are published as a function of R0, so no scaling relation needs to be assumed. cValues provided by McMillan (2011).

Download table as:  ASCIITypeset image

One way of thinking about the strict-prior method is in terms of the Bayesian definition of probability as a degree of belief: $P({\rm M}_{\star }^{B}|{\mathcal{D}},{{{\mathcal{N}}}_{i}})$ represents what we would conclude about ${\rm M}_{\star }^{B}$ based on living in a Galaxy with parameters ${{{\mathcal{N}}}_{i}}$. The prior $P({\mathcal{N}})$ represents our belief of how likely the parameter values ${{{\mathcal{N}}}_{i}}$ are to be correct compared to other possible values, and hence is the correct weighting to determine what we would believe about $P({\rm M}_{\star }^{B}|{\mathcal{D}})$, taking into account all possible values of ${\mathcal{N}}$ and how probable we believe each of those values are. This assumption is appropriate here, given that the bulge mass determinations depend only very indirectly on disk parameters, so we would place little faith in constraints on disk properties that came from the bulge mass data set as opposed to the much more direct methods now available.

An alternative method would be to sum the likelihoods, $P({\mathcal{D}}|{\rm M}_{\star }^{B},{{{\mathcal{N}}}_{i}})$, and normalize that result to unity (which we will refer to as the "weak-prior" method). This is equivalent to calculating the evidence-weighted average of the posteriors calculated in the course of the strict-prior method. In this case, the posterior we calculate incorporates the assumption that $P({\mathcal{N}}|{\mathcal{D}})\ne P({\mathcal{N}})$: i.e., that our state of knowledge of the nuisance (disk) parameters should be influenced by the set of bulge mass measurements. Hence, all the parameters in Table 3 would be treated in the same way as those contained in Θ, differing only in the informativeness and nature of the priors applied. Effectively, the weak-prior method assumes that values of the parameters ${\mathcal{N}}$ under which the data were most likely to have been observed should be given greatest weight, even if they are disfavored by our priors.

We have much greater confidence, however, in the priors we have chosen for R0, Ld, and ${{{\Sigma}}_{\star }}({{R}_{0}})$ based on more direct observations than anything we would infer from the set of bulge mass measurements, and so the weak-prior method appears to be inappropriate here. In practice, the posteriors for the disk mass, $P({\rm M}_{\star }^{D}|{\mathcal{D}})$, and hence also for the total mass, $P({{{\rm M}}_{\star }}|{\mathcal{D}})$, differ significantly (with a mean differing by ∼1σ) between the two methods; however, the bulge mass estimate differs little between the strict- and weak-prior methods (with a mean differing by ∼0.1σ).

4.3. A Uniform Sample of Bulge+Bar Mass Measurements

In this study, we define ${\rm M}_{\star }^{B}$ as the excess mass over a single-exponential disk in the total stellar mass budget for the Galaxy. We begin by searching the literature for measurements of the combined stellar mass of the bulge, pseudo-bulge, and/or bar components of the Milky Way, which collectively fall into the category of ${\rm M}_{\star }^{B}.$ A diverse set of methods, models, and observations has been used to make these estimates. For instance, Dwek et al. (1995) photometrically determined the Galactic bulge morphology, luminosity, and mass using triaxial bar-like models constrained by the COBE/DIRBE observations, measuring ${\rm M}_{\star }^{B}=1.3\pm 0.5\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ In contrast, Klypin et al. (2002) consider ΛCDM-based models for the Milky Way, accounting for the mass and angular momentum of the DM halo, constrained by a variety of kinematic measurements, to estimate that ${\rm M}_{\star }^{B}\approx 1\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ Alternatively, Picaud & Robin (2004) use MC techniques to fit the Besançon model of SPS to the observed near-IR luminosity of the bulge using observations from the DENIS survey, finding ${\rm M}_{\star }^{B}=2.4\pm 0.6\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ These are only a few of the alternatives; we provide the entire list of studies included in our Galactic bulge+bar mass data set, hereafter denoted as ${{{\mathcal{D}}}_{M}}$, in Table 4 for reference.

The incorporated studies use a heterogeneous mixture of assumptions and models for the Galaxy; we follow the basic model of Chomiuk & Povich (2011) and attempt to place them on a uniform basis here. One of the most common sources of variation is the adopted value of the Galactocentric radius of the Sun's orbit, R0. For instance, using the virial theorem one can demonstrate that kinematic estimates of the Galactic bulge+bar mass will be directly proportional to R0. On the other hand, photometric estimates of the luminosity of the bulge, based on the flux measured from our location, should scale as $R_{0}^{2}$.

A bulge mass inferred from the microlensing event rate observed toward the Galactic center, however, has a more complex dependence on R0. The measured rate of microlensing events can be directly converted into an optical depth, $\tau ={{\tau }_{{\rm bulge}}}+{{\tau }_{{\rm disk}}}$, which is the sum of the contribution from the stars in the bulge and the intervening part of the disk. Hamadache et al. (2006) show that ${\rm M}_{\star }^{B}\propto {{\tau }_{{\rm bulge}}}\times {{R}_{{\rm bulge}}}$, where ${{R}_{{\rm bulge}}}$ is the radius of the bulge, which itself should be proportional to R0 based on geometric arguments (i.e., ${{R}_{{\rm bulge}}}={{R}_{0}}{\rm tan} \theta /2$, where θ is the angular diameter of the bulge). Unfortunately, the typical contribution from disk stars, which we would need to subtract in order to obtain ${{\tau }_{{\rm bulge}}}$, is $\sim \tau /3$ (Sumi et al. 2003; Hamadache et al. 2006, and references therein) and turns out to be highly sensitive to the chosen value of R0. As an example, if we are to assume that ${{\tau }_{{\rm disk}}}$ is proportional to the mass of disk stars between us and the bulge, which in turn depends on the total mass of the disk, the disk model described above yields $d\;{\rm ln} \;{\rm M}_{\star }^{B}/d\;{\rm ln} \;{{R}_{0}}\approx 5.7$; i.e., ${{\tau }_{{\rm disk}}}$ should scale roughly as $R_{0}^{6}$. Based on this complexity, we assume for the purposes of this study that estimates of ${\rm M}_{\star }^{B}$ based on measurements of τ do not scale well with R0.

In order to combine the ${\rm M}_{\star }^{B}$ estimates in our data set into one aggregate result, after choosing a value of R0 from our Gillessen et al. (2009) prior, we renormalize each result to that R0 value using the appropriate scaling relation. Unless otherwise noted, the central value and error bar are scaled by the same factor in order to ensure the fractional error remains unchanged (this is equivalent to holding the error bars constant in log space), as essentially none of the literature estimates include the uncertainty in R0 in their error estimates (a problem that our MC technique will fix). For reference, we list the type of observational constraint and appropriate R0 power-law index used to scale to each ${\rm M}_{\star }^{B}$ estimate in Table 4. We note that if we assume that microlensing-rate-based measurements of ${\rm M}_{\star }^{B}$ scale as $R_{0}^{1}$ (as bulge stars are the dominant contribution to τ), rather than treating them as independent of R0, the change in our results proves to be negligible.

As discussed in Section 4.1, our definition of stellar mass includes the contribution from remnants, but not that from substellar objects. Kinematically derived measurements of the bulge mass in our data set will not reflect this distinction. Therefore, we multiply the results from dynamical measurements by a normalization factor of 0.94 ± 0.02 to exclude the contribution from brown dwarfs. We derive this scale factor by varying the power-law index of the IMF in the brown dwarf mass range ($0.005\lt M/{{{\rm M}}_{\odot }}\lt 0.01$) from −0.5 to 0.5 (Cruz et al. 2007; Kirkpatrick et al. 2012; Burningham et al. 2013; Day-Jones et al. 2013) and including their contribution within the Bruzual & Charlot (2003) model using solar metallicity and the Kroupa IMF over the main-sequence mass range.

Unlike dynamically constrained models of the Milky Way, which are indifferent to the mass distribution of the stellar populations, those which rely on photometric constraints (and thus mass-to-light ratios) to estimate ${\rm M}_{\star }^{B}$ will depend strongly on the IMF of choice. It is important then to set all measurements of this type on the same footing before applying the HB analysis by converting them to the same IMF. As previously discussed, we choose that to be the Kroupa IMF in accordance with the MPA-JHU spectrum measurements (Brinchmann et al. 2004).

The HB analysis we apply to estimate ${\rm M}_{\star }^{B}$ requires an error estimate for each independent measurement. Many of the studies in the literature, however, do not provide error estimates. We therefore must estimate the uncertainty in each measurement lacking an error bar in a uniform and unbiased manner; we do this in either one of two ways. First, for any study that lists a single ${\rm M}_{\star }^{B}$ result sans an error bar, we conservatively choose the error to be 50% of the value. However, if the authors instead provide a list of results corresponding to a variety of models or parameters explored, we then find the standard deviation of these listed values and add this in quadrature to 25% of their favored value to produce an error estimate.

Lastly, as we will detail below, the central value of many of the ${\rm M}_{\star }^{B}$ estimates in our data set will vary with the disk parameters that we draw for each MC simulation. In such a case, it becomes problematic to use the median estimated value (which will vary) as a reference value that multiplies Q or F in those models of unreliable data. Consequentially, for the stellar mass portion of this study we always multiply Q and F by 1010 ${{{\rm M}}_{\odot }}$ when augmenting the errors estimates of "bad" measurements. For example, this means that Q = 0.5 corresponds to adding $0.5\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ in quadrature to the nominal error bars when accounting for the possibility that a measurement is inaccurate (compare to Equation (3)).

In the following list, we detail any studies included in our analysis of ${\rm M}_{\star }^{B}$ that require further special treatment to be comparable to the others in Table 4:

  • Picaud & Robin (2004): This work uses the Besançon model of SPS (see also Robin et al. 2003) to simultaneously constrain the mass of the bulge and thin disk through direct comparison with near-IR star counts in ∼100 windows of low extinction in the Galaxy. The thin disk is divided into seven distinct age components with a two-slope IMF, while the bulge is modeled as a single older population with a Salpeter IMF; all of these populations are converted to follow the Kroupa IMF as the first step in our renormalization process. Additionally, the Besançon model features a double-exponential profile for the thin disk. This disk model has a hole at the center, so the same amount of mass at the center of the Galaxy would correspond to a greater bulge mass than in a non-holed model. For consistency, we integrate the holed density profile to determine the mass of the Besançon thin disk within the bulge radius of 3.71 kpc, and subtract this value from the single-exponential disk mass within the same radius for each realization of our disk model. The difference we calculate corresponds to the portion of the Picaud & Robin bulge mass estimate ($2.4\pm 0.6\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$) that has already been included in the Bovy disk, so we subtract it off. Given that this correction is uncertain, we also add in quadrature to the nominal errors an amount equal to 50% of the holed-disk correction when assuming R0 = 8.33 kpc; i.e., 50% of $1.86\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ Picaud & Robin find that changes in the model mass when R0 is changed are comparable in impact to the other variations among the models studied, and so we do not apply any R0 scaling beyond this correction. Changes in R0 affect their models in multiple ways, rendering any simple scaling of their ${\rm M}_{\star }^{B}$ value unsuitable. In the end, the results of Picaud & Robin (2004) are included in our data set as a value of ${\rm M}_{\star }^{B}=0.54\pm 1.11\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ for R0 = 8.33 kpc and best-fit Bovy disk parameters; for simplicity of analysis, this error estimate is used regardless of disk parameters.
  • Zhao (1996): This work models the disk with the Miyamoto & Nagai (1975, hereafter NM) potential, whose profile roughly resembles that of the double-exponential disk. Ideally, we would make the same correction using our disk model as we have done with Picaud & Robin (2004). However, the authors have normalized the NM profile to produce a total disk mass ${\rm M}_{\star }^{D}=8{\rm M}_{\star }^{B}$, chosen so as to incorporate the dynamical effects of the dark halo and produce a flat rotation curve at $1\lt R\lt 3$ kpc from the Galactic center. We find, however, that this produces an extremely high surface density at nearly all radii within the disk, and indeed the authors note that this model does not fit the COBE map data except near the center. Since we cannot disentangle the dark mass from the stellar mass in this model, we do not attempt to do a correction as for Picaud & Robin (2004), since we would not have confidence in its accuracy. To account for that inability to correct, we increase the errors on this estimate to 50% of the derived bar mass. Lastly, as this is a dynamical measurement, we remove the contribution of brown dwarfs to the bulge mass budget by multiplying by a factor of 0.94 ± 0.02. It is therefore included in our data set as a value of ${\rm M}_{\star }^{B}=2.07\pm 1.03\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ We note that changes in the treatment of this measurement yield a much smaller difference in our combined results than our final estimated errors.
  • Dwek et al. (1995): As mentioned above, this work constrains the mass of the Galactic bulge using photometry taken from the COBE/DIRBE observations, measuring ${\rm M}_{\star }^{B}=1.3\pm 0.5\times {{10}^{10}}\;{{{\rm M}}_{\odot }},$ where a Salpeter IMF has been assumed. We convert this measurement to the Kroupa IMF by multiplying it by 1.62, derived using our definition of stellar mass and the assumptions made for the stellar models of Bruzual & Charlot (2003). Hence, we include this measurement in our data set as a value of ${\rm M}_{\star }^{B}=2.11\pm 0.81\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$
  • Freudenreich (1998): This work favors a bar-to-disk ($B/D$) luminosity ratio of 0.33 based upon matching their Galactic model to IR observations of the Milky Way from COBE. This model again includes a holed stellar disk. We integrate this profile both with and without the hole implemented, under the assumption that the mass gained by the disk when excluding the hole was previously incorporated into the bar, in order to calculate a hole-less B/D value. We note that the results of this study are published as a function of R0 in their Table 4, allowing us to interpolate from this table to obtain B/D at a given R0, instead of assuming a scaling relation. Thus, for each MC simulation we include within our data set a value of ${\rm M}_{\star }^{B}={{{\rm M}}_{\star ,{{i}^{d}}}}\times (B/D)$, where (B/D) represents the hole-less model bar-to-disk ratio corresponding to ${{R}_{0,i}}$. We ascribe a 50% error bar to this value, accounting for uncertainty in the mass-to-light ratios of the bar and disk. Similarly to the Picaud & Robin (2004) estimate, we conservatively add in quadrature to this error estimate a value of 50% of the holed-disk correction at R0 = 8.33 kpc. As a nominal value, we list ${\rm M}_{\star }^{B}=0.47\pm 0.65\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ in Table 4, the result from assuming our standard disk model values. This overall error estimate is held constant for other values for the disk parameters.
  • Dehnen & Binney (1998): This study provides a parameterized model of the mass distribution in the Galaxy fitted to several observational constraints. They assume a single-exponential profile for the stellar disk, much like Bovy & Rix (2013). Tables 3 and 4 of their paper list the resulting bulge mass when varying a large number of parameters in the model. Comparing models 2, 2a, and 2b we find that ${\rm M}_{\star }^{B}$ in their model approximately scales as $\sim {{({{R}_{0}}/8\;{\rm kpc})}^{1/2}}$. To provide the central value of this estimate we interpolate among ${\rm M}_{\star }^{B}$ results from their models 1–4 using the value ${{L}_{d}}/{{R}_{0}}=0.26$ from our standard disk model detailed in Section 4.1. We estimate the random uncertainty in this value via propagation of errors: $\sigma {{({\rm M}_{\star }^{B})}^{2}}\simeq \frac{1}{4}{{\left[ {\rm M}_{\star }^{B}({{L}_{d}}+\sigma ({{L}_{d}}))-{\rm M}_{\star }^{B}({{L}_{d}}-\sigma ({{L}_{d}})) \right]}^{2}}$. In addition, we estimate the systematic error to be the standard deviation of the results from Dehnen & Binneyʼs models 2c–i, which we add in quadrature to the random error. Lastly, as this is a dynamical measurement, we remove the contribution of brown dwarfs to the bulge mass budget by multiplying by a factor of 0.94 ± 0.02. Overall, this is included in our data set a value of ${\rm M}_{\star }^{B}=0.65\pm 0.38\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$

4.4. Setting a Prior on ${\rm M}_{\star }^{B}$

To place a prior on the mass of the Galactic bulge, we again consider the properties of spiral galaxies previously observed in the local universe. Some spirals appear to have no bulge component, whereas in extreme cases the bulge-to-total ratio can be as large as $B/T\sim 0.8$ (e.g., Simien & de Vaucouleurs 1986; Gadotti 2009). Given our prior understanding of the total mass of the Milky Way to be of order $\sim 5\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ (e.g., McMillan 2011 and references therein) we assume the mass of the Galactic bulge can be anywhere in the range 0–4 × 1010 ${{{\rm M}}_{\odot }}$ with equal weighting. This is represented by a flat distribution, such that

Equation (22)

4.5. Stellar Mass Results

Table 5 shows the results for each model of "bad" measurements from our HB analysis. We list the optimal value of n, Q, or F, as appropriate for each model, which we take to be the value corresponding to the peak of the marginalized posterior PDF for these parameters. In Figure 4, we marginalize out ${\rm M}_{\star }^{B}$ and show the resulting joint posterior for fgood and F normalized to a peak value of 1. We can see in this case that the likelihood is maximized near ${{f}_{{\rm good}}}\approx 1$ and $F\approx 0$, similar to the SFR results in Section 3.3 which favored ${{f}_{{\rm good}}}\approx 1$ and minimally adjusted error bars. If we are to then marginalize out fgood from the joint posterior, this yields the result $P(F|{\mathcal{D}})$ shown in Figure 5 as a solid green curve. Assuming that ${{f}_{{\rm good}}}=0$ (i.e., that all included measurements are inaccurate to some extent) produces the the dashed green curve in Figure 5; this is equivalent to the curve produced by cutting through the ${{f}_{{\rm good}}}=0$ plane in Figure 4.

Figure 4.

Figure 4. The joint posterior PDF, $P({{f}_{{\rm good}}},F|{{{\mathcal{D}}}_{M}})$, for the free parameters describing the Milky Way bulge+bar mass data set, ${{{\mathcal{D}}}_{M}}$, when applying a hierarchical Bayesian analysis. The parameter fgood quantifies the probability that any one measurement included in ${{{\mathcal{D}}}_{M}}$ is "good" (i.e., has accurately estimated error bars), and F denotes the fraction of 1010 ${{{\rm M}}_{\odot }}$ used as a floor on each individual error estimate when treated as not "good" in the model. For ease of comparison, we have normalized the peak of this distribution to 1. We see, similar to the results for the Milky Way SFR, that the likelihood, and thus the posterior, peaks near ${{f}_{{\rm good}}}\approx 1$ and $F\approx 0$, indicating little tension among the Milky Way bulge+bar mass estimates.

Standard image High-resolution image
Figure 5.

Figure 5. The posterior PDF, $P(F|{{{\mathcal{D}}}_{M}})$, for the floor value, F, defined as a fraction of 1010 ${{{\rm M}}_{\odot }}$ imposed as a minimum error estimate for "bad" measurements in our Milky Way bulge+bar mass data set, ${{{\mathcal{D}}}_{M}}$. These results are obtained by marginalizing over all possible values of fgood (solid curve) or else by setting ${{f}_{{\rm good}}}=0$ (dashed curve). Here we see the competition between multiple possible solutions within the HB analysis. We note that both curves are marginalized over all possible values of the stellar mass contained in the bulge+bar component of the Galaxy, ${\rm M}_{\star }^{B}.$

Standard image High-resolution image

Table 5.  Combined ${\rm M}_{\star }^{B}$ Results For Various Model Assumptions

Model Optimal ${\rm M}_{\star }^{B}\pm 1\sigma $ k b ΔAICc ΔBIC ${\Delta}{{{\rm log} }_{10}}\;K$
Valuea (1010 ${{{\rm M}}_{\odot }}$)
fgood free—Some of the measurements have inaccurate error bars.
Free-n 1.00 0.91 ± 0.07 3 4.00 ± 0.00 5.78 ± 0.00 −0.66 ± 0.00
Free-Q 0.00 0.91 ± 0.08 3 4.00 ± 0.00 5.78 ± 0.00 −0.45 ± 0.00
Free-F 0.08 0.92 ± 0.08 3 4.00 ± 0.00 5.78 ± 0.00 −0.42 ± 0.00
${{P}_{{\rm bad}}}$ flat N/A 0.91 ± 0.07 2 2.00 ± 0.00 2.89 ± 0.00 −0.86 ± 0.00
fgood = 0—All of the measurements have inaccurate error bars.
Free-n 1.00 0.91 ± 0.08 2 2.00 ± 0.00 2.89 ± 0.00 −1.20 ± 0.00
Free-Q 0.00 0.93 ± 0.09 2 2.00 ± 0.00 2.89 ± 0.00 −0.83 ± 0.00
Free-F 0.08 0.94 ± 0.10 2 2.00 ± 0.00 2.89 ± 0.00 −0.85 ± 0.00
fgood = 1—None of the measurements have inaccurate error bars.
All-"good"d N/A 0.91 ± 0.07 1
Non-hierarchical combinations of the data.
IVWMe 0.88 ± 0.06 ${{\chi }^{2}}=14.03$
IVWM(R0 = 8.33 kpc) 0.91 ± 0.06 ${{\chi }^{2}}=13.69$

Note. The ${\rm M}_{\star }^{B}$ results are well described by Gaussian distributions, and the values listed in this column represent the mean and 1σ parameter of fits to these distributions. The distributions we find for ΔAIC, ΔBIC, and ${\Delta}{{{\rm log} }_{10}}\;K$ from the MC simulations are strongly peaked but also highly asymmetric, and so in these columns we quote the median and standard deviation of the median obtained by bootstrapping these sets of values. Errors of 0.00 indicate that the standard deviation of the median is $\ll 0.01.$

aThe value of n, Q, or F corresponding to the peak of marginalize posterior PDF for these quantities in each model. bThe number of free parameters in the model. See Section 2.3.4. cFor each iteration of the HB analysis, we record the AIC and BIC. We then calculate the mean and standard error from the distribution of AIC and BIC values produced from all 103 iterations for each model. ΔAIC and ΔBIC reflects the difference in the mean AIC and BIC value measured from the model which has the lowest AIC/BIC (here, this is the all-"good" model). We see 103 iterations yields sufficiently small standard errors (i.e., less than 0.5) in order to securely assess differences of 2, which would indicate a statistically significant difference between models. dEquivalent to the combined inverse-variance weighted mean (IVWM) from 103 MC simulations, uniformly scaling each estimate to the same R0 value from our Galactic model (see Table 3). eThe IVWM obtained from the nominal estimate values listed in Column 2 of Table 4. Essentially, since we have used flat priors, these are the same values we would find in the all-"good" scenario if we did not use MC simulations to renormalize each estimate to the same choice of R0. IVWM(R0 = 8.33 kpc) shows the results from the same calculation for the ${\rm M}_{\star }^{B}$ estimates after scaling each to R0 = 8.33 kpc, as in the last column of Table 4.

Download table as:  ASCIITypeset image

All model-comparison criteria listed in Table 5 result from comparing values with the all-"good" model for each realization. For example, ${\Delta}{{{\rm log} }_{10}}\;K$ for the free-n model reflects the mean and standard error of the set $\left\{ {{{\rm log} }_{10}}{{({{K}_{{\rm free}-n}}/{{K}_{{\rm all}-{\rm good}}})}_{i}} \right\}$, and similarly for ΔAIC and ΔBIC. In accordance with the likelihood peaking on the ${{f}_{{\rm good}}}\approx 1$ plane, as seen in Figures 4 and 6, all criteria values indicate that the best fit of the data results when the fewest free parameters are employed in the HB analysis. That said, several of the other models listed in Table 5 have criteria values that do not differ enough to be statistically significant (i.e., their differences in ${\Delta}{{{\rm log} }_{10}}\;K$, ΔAIC, or ΔBIC are less than 2). Again, similar to the SFR results, the model with highest K is the one where all measurements are treated as accurate, indicating that once differences in the assumed Galactic model, IMF, and definition of ${{{\rm M}}_{\star }}$ are accounted for, no correction for systematic effects is needed in order to relieve any tension in these measurements. This is supported by the AIC and BIC values as well. Taking the all-"good" model as our fiducial measurement, we therefore find the stellar mass of the Galactic bulge+bar to be ${\rm M}_{\star }^{B}=0.91\pm 0.07\times {{10}^{10}}$ ${{{\rm M}}_{\odot }};$ a comparison with the other values in Table 5 shows that this estimate is quite robust to any way we account for "bad" estimates in the HB analysis. We have also tested the impact that the Bissantz & Gerhard (2002) and Widrow et al. (2008) ${\rm M}_{\star }^{B}$ estimates make on our final results, as these two measurements are more strongly peaked than the other estimates and are centered at similar bulge mass values (see Figure 7). We find that doubling the nominal errors on both of these estimates produces only a 25% increase in the error of our aggregate result and a negligible change to the mean value, indicating that these two estimates are not dominating our final estimate.

Figure 6.

Figure 6. The posterior PDF, $P({{f}_{{\rm good}}}|{{{\mathcal{D}}}_{M}})$, for the fraction of "good" ${\rm M}_{\star }^{B}$ measurements, fgood, (i.e., ones having accurately estimated errors) in our bulge+bar mass data set, ${{{\mathcal{D}}}_{M}}$. Each model shown accounts for the inclusion of "bad" measurements in a different way: the free-n model (solid red) remedies underestimated errors by multiplying them by a scaling factor of n; the free-Q model (dashed blue) adds extra error in quadrature to the errors on a measurement when treated as "bad"; the free-F model (dash-dotted green) places a floor on the errors of "bad" measurements; while the ${{P}_{{\rm bad}}}$-flat model (triple-dot-dashed purple) works to completely disregard a measurement from ${{{\mathcal{D}}}_{M}}$ when considering it "bad." Similar to the SFR results, the PDF for all "bad"-measurement models are peaked around ${{f}_{{\rm good}}}=1$, more strongly so for the free-n and ${{P}_{{\rm bad}}}$-flat scenarios than the free-Q and free-F scenarios. Regardless, it turns out that the best model of ${{{\mathcal{D}}}_{M}}$, i.e., the one favored by the Bayesian evidence and information criteria, is one where fgood is assumed to be 1, instead of allowing it as a free parameter—i.e., an all-"good" model.

Standard image High-resolution image
Figure 7.

Figure 7. The solid black curve shows the aggregate ${\rm M}_{\star }^{B}$ result as determined from our HB analysis when using an all-"good" model and Monte Carlo simulations to incorporate up-to-date information for the Galactocentric radius of the Sun, R0, the scale length of the single-exponential disk, Ld. Here, we have assumed ${{R}_{0}}=8.33\pm 0.35$ kpc from the work of Gillessen et al. (2009) and ${{L}_{d}}=2.15\pm 0.14$ kpc based on the measurements by Bovy & Rix (2013). For comparison, we overlay the individual estimates from our Galactic bulge+bar mass data set as dashed/dotted colored lines. Some of these measurements are in tension with others; the HB methods allows us to account for that tension in obtaining consensus results. Doubling the error bars on the Bissantz & Gerhard (2002) and Widrow et al. (2008) ${\rm M}_{\star }^{B}$ estimates (the relatively strongly peaked dashed red and blue curves) yields only a 25% increase in the error in our aggregate result, indicating that these estimates are not dominating in our final estimate.

Standard image High-resolution image

Lastly, we show in Table 5 the IVWM of the nominal values from our ${\rm M}_{\star }^{B}$ data set (Table 4) with no corrections to uniform R0 and ${\rm M}_{\star }^{D}.$ We can see that the IVWM is more strongly pulled by lower-valued outliers in comparison to the HB results. This is likely in large part the result of the heterogeneous mixture of R0 values in the sample, as the all-"good" model is equivalent to 103 MC simulations of the IVWM, but rescaling all measurements to reflect ${{R}_{0}}=8.33\pm 0.35$ kpc (Gillessen et al. 2009); in this case, the all-"good" model corresponds to the average of the IVWM from our 103 MC simulations, rather than the IVWM of the set of original measurements. With 18 ${\rm M}_{\star }^{B}$ estimates in our data set, and thus 17 degrees of freedom, we expect 68% chance of finding ${{\chi }^{2}}$ in the range [11.31, 22.68] and a 95% chance of it being in the range [7.56, 30.19]. Hence, the observed ${{\chi }^{2}}=14.03$ indicates only modest tension among the measurements; ∼2/3 of the time one would observe a ${{\chi }^{2}}$ this large or larger.

Similarly to our SFR analysis, we have tested whether common-mode systematics have a large impact on our bulge mass estimate, an effect our HB method is unable to account for as we assume that errors from each study are random compared to each other. As initially discussed in Section 3.3, this involves applying the HB method using the all-"good" model to a bootstrap resampling of the ${\rm M}_{\star }^{B}$ data set in two different ways. In the first case, we draw only one measurement at random from all those which use a common measurement technique (i.e., one of the four categorizations listed in Table 4 under the column labeled "Constraint type"), yielding four estimates from a unique measurement method. In the second case, we draw four measurements at random from the entire list of estimates in Table 4. For each case, we perform this process 1000 times and measure the mean from the posterior distribution $P({\rm M}_{\star }^{B}|{\mathcal{D}})$ for each set of measurements. We find the standard deviation of the mean to be 0.19 when resampling with only one measurement of each type from the ${\rm M}_{\star }^{B}$ data set and 0.23 when resampling at random from the entire data set. Again, there is greater scatter between measurements of the same type than between random measurements in the data set. Hence, similarly to our SFR results, we can safely conclude that common-mode systematics make a negligible impact on our bulge mass results.

The results from our HB analysis and MC simulations for $P({\rm M}_{\star }^{D}|{\mathcal{D}})$ and $P({{{\rm M}}_{\star }}|{\mathcal{D}})$ are well-described by Gaussian distributions. These correspond to ${\rm M}_{\star }^{D}=5.17\pm 1.11\times {{10}^{10}}\;{{{\rm M}}_{\odot }}$ and ${{{\rm M}}_{\star }}=6.08\pm 1.14\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ We note that the mass of the stellar halo component of the Galaxy ($\sim {{10}^{9}}$ ${{{\rm M}}_{\odot }}$) is negligible compared to the uncertainties in our estimate (Bullock & Johnston 2005; Bell et al. 2008), and thus we can disregard its contribution. The uncertainty in ${{{\rm M}}_{\star }}$ is dominated by that in ${\rm M}_{\star }^{D},$ making our ${{{\rm M}}_{\star }}$ result highly insensitive to the choice of model used in the HB analysis of the Galactic bulge+bar mass estimates. The largest sources of uncertainty in ${\rm M}_{\star }^{D}$ and ${{{\rm M}}_{\star }}$ thus come from the values of R0 and Ld that we adopt (Table 3). The constraints on these parameters are likely to improve with upcoming Galactic surveys, such as Gaia, and so we also calculate derivatives of our mass estimates with respect to each so that they may be easily adjusted to reflect any improved information. To do so, we simply redo our analysis after independently offsetting either parameter by one-half of its error estimate above and below the central value. The derivative is calculated from the curve yielded from Lagrangian interpolation of the resulting three data points. For convenience, we tabulate our stellar mass results for each component and their derivatives in Table 6.

Table 6.  Milky Way Properties and Derivatives

Property Fiducial Result $\partial /\partial {{R}_{0}}$ $\partial /\partial {{L}_{d}}$
${\rm M}_{\star }^{B}$ $0.91\pm 0.07\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ $0.093\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ kpc−1 $0.004\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ kpc−1
${\rm M}_{\star }^{D}$ $5.17\pm 1.11\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ $3.000\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ kpc−1 $0.469\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ kpc−1
${{{\rm M}}_{\star }}$ $6.08\pm 1.14\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ $3.093\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ kpc−1 $0.473\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ kpc−1
$B/T$ $0.150_{-0.019}^{+0.028}$ −0.061 kpc−1 −0.011 kpc−1
SFR 1.65 ± 0.19 ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}$ 0 ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}$ kpc−1 0 ${{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}$ kpc−1
sSFR $2.71\pm 0.59\times {{10}^{-11}}$ yr−1 $-1.381\times {{10}^{-11}}$ yr−1 kpc−1 $-2.111\times {{10}^{-12}}$ yr−1 kpc−1

Note. Derivatives of ${{{\rm M}}_{\star }}$ are not simply the sum of the derivatives for ${\rm M}_{\star }^{B}$ and ${\rm M}_{\star }^{D}$ due to covariances between the parameters in our model. Derivatives of B/T represent the change in the median bulge-to-total mass ratio with respect to the appropriate parameter, measured independently from any other quantity in this table. The sSFR and its derivatives, however, are calculated directly from the ${{{\rm \dot{M}}}_{\star }}$ and ${{{\rm M}}_{\star }}$ results; e.g., here $\partial ({{{\rm \dot{M}}}_{\star }}/{{{\rm M}}_{\star }})/\partial {{R}_{0}}=-{{{\rm \dot{M}}}_{\star }}\;{\rm M}_{\star }^{-2}\times \partial {{{\rm M}}_{\star }}/\partial {{R}_{0}}$ since $\partial {{{\rm \dot{M}}}_{\star }}/\partial {{R}_{0}}=0.$

Download table as:  ASCIITypeset image

Lastly, we display the PDF for the bulge-to-total mass ratio of our model in Figure 8 as a solid black curve, which indicates a median with the 1σ error estimate of $B/T=0.150_{-0.019}^{+0.028}$. This is obtained from the distribution of values, ${{(B/T)}_{i}}={{{\rm M}}_{\star ,{{i}^{B}}}}/({{{\rm M}}_{\star ,{{i}^{B}}}}+{{{\rm M}}_{\star ,{{i}^{d}}}})$, resulting from each self-consistent realization of the Galaxy. Calculating $P(B/T)$ in a model-consistent way (i.e., accounting for covariances within our model) yields measurably tighter constraints than when doing so by combining independent estimates of ${\rm M}_{\star }^{B}$ and ${{{\rm M}}_{\star }}.$ For example, the blue dashed curve in Figure 8 is the result of combining randomly drawn pairs of ${\rm M}_{\star }^{B}$ and ${\rm M}_{\star }^{D}$ values drawn from the PDFs for each mass component in this study and assuming ${{{\rm M}}_{\star }}={\rm M}_{\star }^{B}+{\rm M}_{\star }^{D}$. The red dash-dotted curve in Figure 8 shows the distribution of bulge-to-total luminosity ratios for a sample of 212 Sbc-Sc galaxies measured from SDSS by Oohama et al. (2009). The mass-to-light ratio of the older stellar population in the bulge is higher than that of the younger stellar disk and so converting this to a bulge-to-total mass ratio would shift the distribution toward slightly higher values; however, we can safely assess that Milky Way lies well within the range of B/T values. Graham & Worley (2008) find the average bulge-to-disk K-band flux ratio for a sample of 79 Sbc galaxies to be ${\rm log} (B/D)=-0.82$, which converts to an average B/T of 0.13; this compares well with our Milky Way value.

Figure 8.

Figure 8. The solid black curve shows the PDF for the bulge-to-total mass ratio, B/T, of the Milky Way as determined by our HB analysis. This is produced through Monte Carlo simulations where each realization of the bulge+bar mass $({{{\rm M}}_{\star ,{{i}^{B}}}})$ and disk mass $({{{\rm M}}_{\star ,{{i}^{d}}}})$ are determined using the same values for structural parameters for the disk, ${{\left\{ {{R}_{0}},{{L}_{d}},{{{\Sigma}}_{\star }}({{R}_{0}}) \right\}}_{i}}$, randomly drawn from their respective distributions listed in Table 3. The histogram of values ${{(B/T)}_{i}}={{{\rm M}}_{\star ,{{i}^{B}}}}/({{{\rm M}}_{\star ,{{i}^{B}}}}+{{{\rm M}}_{\star ,{{i}^{d}}}})$ is then normalized to integrate to unity. The dashed blue curve shows the result of simply drawing from the fiducial estimates of $P({\rm M}_{\star }^{B}|{\mathcal{D}})$ and $P({\rm M}_{\star }^{D}|{\mathcal{D}})$ from our model independently. Calculating B/T in a model-consistent manner yields a noticeably tighter constraint than when not accounting for covariances between ${\rm M}_{\star }^{B}$ and ${\rm M}_{\star }^{D}.$ Lastly, we overlay the distribution of B/T luminosity ratios for a sample of 212 Sbc-Sc galaxies measured from SDSS by Oohama et al. (2009). Although converting to B/T mass ratios would produce a slight shift toward higher values, we see the B/T ratio for the Milky Way is not unusual for galaxies of its Hubble type.

Standard image High-resolution image

5. SUMMARY AND DISCUSSION

In this paper we have developed improved constraints on several of the Milky Way's global properties. We build upon the prior measurements found in the literature, joining them into consensus results using the power of the HB method. This method (Press 1997, pp. 49–60; Lang & Hogg 2012) takes into account the possibility of inaccurate measurements being included in our data sets, and has proven to be quite robust when varying how we deal with such "bad" measurements. By incorporating all the information contained in the individual measurements, we have obtained significantly improved constraints on the Milky Way properties we investigate. At the same time, given the expectation that there could be systematics affecting the Milky Way data, the HB method has given us confidence in the robustness of our results, even though Occam's razor favored the simplest model (i.e., the IVWM) in the end. For convenience, we tabulate the main results from this study in Table 6, as well as their derivatives with respect to the Galactocentric radius of the Sun, R0, and the exponential scale length of the disk, Ld, allowing for our results to be updated if constraints on these parameters improve.

In our first application, we capitalize on the work of Chomiuk & Povich (2011), who provide a tabulation of the SFR estimates in the literature over the last several decades. The authors make huge strides in placing each measurement on equal footing by renormalizing them all to the same choice of IMF and SPS model. Using these updated estimates as our data set, we find the SFR of the Milky Way to be ${{{\rm \dot{M}}}_{\star }}=1.65\pm 0.19\;{{{\rm M}}_{\odot }}\;{\rm y}{{{\rm r}}^{-1}}$ (assuming a Kroupa IMF and Kroupa-normalized Kennicutt 1998 ionizing photon rate).

Next we investigate the stellar mass contained in each of the major components of the Milky Way. In Table 4, we have compiled an extensive list of Galactic bulge, pseudo-bulge, and/or bar mass estimates from the literature. We assume the single-exponential density profile for the disk laid forth by Bovy & Rix (2013); we tabulate the adopted probability distributions for all relevant parameters in Table 3. We then combine our HB analysis with MC simulations that uniformly scale each estimate to the same value of R0, ensuring propagation of R0 errors into the hierarchical ${\rm M}_{\star }^{B}$ result. The MC calculations also yield estimates of the stellar mass of the bulge+bar, the stellar mass of the disk, and the total stellar mass in the Milky Way.

Our combined estimate of the bulge+bar mass is ${\rm M}_{\star }^{B}=0.91\pm 0.07\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ This estimate has substantially smaller errors than the individual estimates used to derive it. We note that the error given not only reflects the random and systematic uncertainties in each individual estimate, but also current uncertainties in R0. Our results show that once we have renormalized each estimate to reflect the same choice of IMF, stellar distribution profile for the disk, and definition of stellar mass that likely there are minimal systematics to further correct for. We adopt the Kroupa IMF in this study and include the contributions of main-sequence stars and remnants, but not substellar material, in our definition of stellar mass in accord with the MPA-JHU measurements of SFR and ${{{\rm M}}_{\star }}$ for external galaxies found in SDSS.

Under consistent assumptions, we find that the disk mass of the Milky Way is ${\rm M}_{\star }^{D}=5.17\pm 1.11\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ Our model of the Galactic stellar disk is based on that of Bovy & Rix (2013), but we directly incorporate current uncertainties in the Galactocentric radius of the Sun. Our disk mass estimate is in broad agreement with other measurements found in the literature. One of the earliest models of the Milky Way by Bahcall & Soneira (1980), primarily constrained by star counts from photometric observations of the Galaxy, yields a stellar disk mass of $5.6\times {{10}^{10}}\;{{{\rm M}}_{\odot }}.$ The ΛCDM-based Milky Way models by Klypin et al. (2002) favor a range of 4–5 × 1010 ${{{\rm M}}_{\odot }}.$ Similarly, the kinematically constrained models from Dehnen & Binney (1998) find the mass of the disk to lie in the range of 4.2–5.1 × 1010 ${{{\rm M}}_{\odot }}.$

We find that the total stellar mass in the Milky Way is ${{{\rm M}}_{\star }}=6.08\pm 1.14\times {{10}^{10}}$ ${{{\rm M}}_{\odot }})$ is negligible compared to our uncertainty). The overall error in our ${{{\rm M}}_{\star }}$ estimate is dominated by that contributed from the disk component, rendering our final result highly insensitive to any assumptions involved with combining bulge mass estimates to determine $P({\rm M}_{\star }^{B}|{\mathcal{D}})$. In comparison, McMillan (2011) find a total baryonic mass of $6.43\pm 0.63\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ from which we can subtract the atomic and molecular phase gas mass of $9.5\pm 3.0\times {{10}^{9}}\;{{{\rm M}}_{\odot }}$ (Dame 1993, corrected for helium contributions by Flynn et al. 2006), giving ${{{\rm M}}_{\star }}=5.48\pm 0.70\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}.$ In addition, Flynn et al. (2006) give a back-of-the-envelope calculation for ${{{\rm M}}_{\star }}$, finding it to be in the range 4.85–5.5 × 1010 ${{{\rm M}}_{\odot }}.$ Our estimate for the total stellar mass compares well with the results of these two recent studies, but takes advantage of a large sample of bulge mass measurements found in the literature and is founded on improved knowledge of several of the properties of the Galactic disk from SDSS. Again, we stress that all of our mass results assume a Kroupa IMF and an exponential profile for the Galactic disk to match the assumptions used in studies of extragalactic objects.

Ultimately, the constraints we are able to place on the stellar mass of the disk component as well as the total stellar mass of the Galaxy depend strongly on the value we assume for R0. For our main results, we have chosen the Gillessen et al. (2009) estimate of 8.33 ± 0.35 kpc for its direct determination of the distance to Sgr A* (i.e., the center of the Milky Way) based on the orbits of nearby stars, its thorough treatment of systematic errors, and (thanks in part to the breadth of its errors) its consistency with a large variety of other R0 measurements (both direct and indirect) in the literature, which range as low as ≲8 kpc or as high as ≳8.5 kpc. It is likely, however, that constraints on R0 will tighten as geometric methods are continually improving, while indirect methods that are plagued by systematic errors become obsolete (Genzel et al. 2010). For instance, the most recent measurement by Chatzopoulos et al. (2015) finds ${{R}_{0}}=8.30\pm 0.09{{|}_{{\rm stat}}}\pm 0.10{{|}_{{\rm syst}}}$ kpc by dynamically modeling the nuclear star cluster dynamics. The constraints from that analysis in the distance versus mass of the black hole plane are in modest tension with those from monitoring stellar orbits by Gillessen et al. (2009); a joint analysis of the two ignoring this tension yields ${{R}_{0}}=8.36\pm 0.11$ kpc. If we are to use this result for our study, the uncertainty in both the disk mass and the total mass are reduced to $0.5\times {{10}^{10}}$ ${{{\rm M}}_{\odot }},$ and our bulge mass error is reduced to $0.06\times {{10}^{10}}$ ${{{\rm M}}_{\odot }},$ while the central values are changed only slightly.

From the distribution of model-consistent realizations of ${\rm M}_{\star }^{B}$ and ${\rm M}_{\star }^{D},$ accounting for covariance between the two, we find that the Milky Way has a bulge-to-total mass ratio of $B/T=0.150_{-0.019}^{+0.028}$. As seen in Figure 8, this result makes our Galaxy typical among galaxies of similar morphological type in the local universe. Finally, combining our results for ${{{\rm \dot{M}}}_{\star }}$ and ${{{\rm M}}_{\star }},$ we find that the specific SFR of the Milky Way is ${{{\rm \dot{M}}}_{\star }}/{{{\rm M}}_{\star }}=2.71\pm 0.59\times {{10}^{-11}}$ yr−1.

In a paper soon to follow, we will continue toward our goal of producing a better global picture of the Milky Way. With the improved constraints placed on ${{{\rm M}}_{\star }}$ and ${{{\rm \dot{M}}}_{\star }}$ in this work, we will next show that we can convert this information into accurate predictions for the integrated photometric properties of the Milky Way; i.e., the brightness and color of our Galaxy as they would be observed by alien astronomers from across cosmological distances. All of this work will culminate in a newfound ability to accurately place the Milky Way in context—i.e., we now have tight constraints on where our Galaxy falls compared to observational trends we find for other galaxies.

We are grateful to Simon Mutch, Harry Ferguson, Sandra Faber, Leo Blitz, Jo Bovy, Hongsheng Zhao, Renbin Yan, Rachel Somerville, and Alister Graham for helpful discussions. We thank Chad Schafer especially for a detailed reading and comments, as well as assistance with clarifying the strict-prior assumptions used in Section 4.2. We also thank the anonymous referee for leading us to a number of improvements in this work. T.C.L. and J.A.N. are supported by the National Science Foundation (NSF) through grant NSF AST 08-06732.

APPENDIX: ALTERNATIVE MODELS OF THE GALACTIC DISK

If the exponential disk model used by Bovy & Rix (2013) is far from reality, the total mass estimates in this paper will, of course, be incorrect. The true global structure of the Galactic disk remains an active area of research, and the direct measurements available in the literature that probe the (luminosity and mass) distribution of its stars have generally been limited to the range of $3\lesssim R\lesssim 9$ kpc (e.g., Jurić et al. 2008; Bovy & Rix 2013). Limiting the range of radii in this way mitigates the problem of needing to distinguish bulge/bar stars from disk stars where they overlap. These components may be separated with kinematical information, but that is available for few stars and is of course impossible for studies based on aggregate light (e.g., Freudenreich 1998; Drimmel & Spergel 2001). Similarly, studies that investigate the inner stellar mass at $R\lesssim 3$ kpc, such as all of those in our Table 4, must account for the bulge+bar and disk components either by fitting for them simultaneously or subtracting off the contribution from stars in the inner disk based on the model that is assumed. We remind the reader that, as described in Section 4.3, we have attempted to renormalize all of the bulge+bar mass measurements in Table 4 to reflect uniform assumptions about the disk. Ultimately, all of this data is consistent with an exponential mass density profile, such as the one we have used for this study; in general, structural decompositions and model magnitude measurements for external galaxies also assume such a profile for disks.

In order to assess the impact that a non-exponential mass profile would have on the inferred total disk mass, we investigate the impact of allowing the mass profile to be described by a Sérsic model (Sérsic 1968) in the radial direction, rather than a pure exponential, which in the radial (but not azimuthal) direction is equivalent to a Sérsic profile with an index of n = 1. Sérsic indices from global fits to the light from star-forming galaxies in the Sloan Digital Sky Survey cluster around a value just above 1, with a tail to larger values whose strength increases to redder colors (Blanton et al. 2003). This is consistent with a picture where disks are indeed exponential and larger Sérsic indices are obtained in earlier-type spirals with greater bulge contributions (a de Vaucouleurs-profile bulge would have a Sérsic index of n = 4).

To investigate this further, we have employed the NYU Valued-Added Galaxy Catalog (Blanton et al. 2005), which includes Sérsic radial profile fits to the r-band 2D images of a sample of galaxies from the Seventh Data Release (DR7; Abazajian et al. 2009) from the Sloan Digital Sky Survey (York et al. 2000). These fits provide the Sérsic index (n), scale radius (r0), and the covariance matrix between n and r0. From Data Release 8 (DR8; Aihara et al. 2011) we have also obtained the spectroscopically measured redshift (z), the fraction of light from a de Vaucouleurs (bulge-like) profile when combined with a pure exponential (n = 1) profile that best fits the 2D image (fracDeV), the axis ratio (b/a) from the best-fit exponential profile to the 2D image, and the $g-r$ color from model magnitudes that are extinction- and K-corrected to z = 0 (which we denote $^{0}(g-r)$; Blanton & Roweis 2007). We restrict to only those galaxies found in both the NYU-VAGC and DR7/8 data sets that are also part of the cleanly measured volume-limited sample described in (T. C. Licquia et al. 2015, in preparation). Overall, this ensures that all galaxies lie in the range $0.03\lt z\lt 0.09$ and have cleanly measured images. From this sample we then reduce to only those that meet the following criteria: $^{0}(g-r)\lt 0.5$, $\sigma (n)\lt 0.25$, $\sigma ({{r}_{0}})\lt 0.25$, fracDeV < 0.1, and $b/a\gt 0.7$. This yields a set of 5,533 star-forming galaxies with minimal bulge-like components that appear ∼face-on and that have well-measured radial Sérsic profile fits. We expect that the Sérsic indices for this sample can be used to broadly constrain the plausible values for the Galactic disk.

We find the median of this distribution to be 1.08 and the 1σ range to be [0.9, 1.3]. Given that this sample still includes objects with a nonzero bulge mass (even a small de Vaucouleurs bulge would increase the combined Sérsic index), values of n near but larger than 1 can be easily explained by bulge contributions to the light. Apparent values of n below 1 are predicted to be observed for intrinsically pure exponential disks due to projection effects and dust absorption (see Pastrav et al. 2013a, 2013b, and references therein). Furthermore, one expects substantial scatter when fitting Sérsic models to objects with visible substructure (e.g., spiral arms). As a result, the Sérsic index distributions of disk-dominated galaxies in SDSS appear to be highly consistent with a scenario where the effective Sérsic index for disks, if measured with no bulge contribution, would in fact be n = 1 (i.e., exponential). We therefore do not find any compelling evidence that would cause us to discard the assumption that the Milky Way disk is exponential so that a more complicated model is needed.

Although we have not found any compelling evidence that the exponential disk assumption should be discarded, in order to assess what the impact of non-exponential profiles could be on our results, we have explored the effect that varying the Sérsic index has on the distribution of stellar mass in the Milky Way compared to a pure exponential (n = 1) profile, while keeping fixed the mean values of the disk parameters listed in Table 3. For fixed disk parameters, decreasing values of n increasingly add more stellar mass toward the center of the Galaxy, while increasing values of n remove it. For instance, decreasing from n = 1 to 0.9 effectively ∼doubles ${{{\Sigma}}_{\star }}$ at the Galactic center as well as the total disk mass at $R\lt 3$ kpc (from $1.97\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$ to $3.74\times {{10}^{10}}$ ${{{\rm M}}_{\odot }}$), which would imply that the disk is the dominant component over the bulge at all radii. The possibility that the disk may be described by a radial Sérsic index that is non-negligibly below 1 can therefore be discarded, as such models would be in strong tension with direct measurements of the Galactic rotation curve at these inner radii, as well as the mass measurements of the bulge we use in this study.

In contrast, we find that the mass of the disk between R = 3 kpc (roughly where the bulge truncates) and R = 25 kpc (well beyond the likely truncation radius of a Sérsic-model disk) is rather stable for $n\geqslant 1$; increasing n anywhere from 1 up to 1.5 yields changes that are always ≲5 × 109 ${{{\rm M}}_{\odot }},$ which is modest compared to the overall uncertainties in the disk mass estimate we present in Section 4.5. For changes at $R\lt 3$ kpc, disk mass and bulge/bar mass are traded off against each other (due to our definition of the central mass), so total masses are minimally affected. Of course, given that we have renormalized all of the bulge+bar mass measurements in Table 4 to reflect an exponential (n = 1) disk, it would not be consistent to combine our HB bulge+bar mass result with a generalized Sérsic model of the disk. Hence, the effects we have explored here only constitute a first-order correction; doing better would require determining how a wide variety of historical bulge mass measurements would change if a very different disk model were used in fitting, which goes far beyond the scope or purpose of this paper. We note that, since our definition of the bulge/bar mass is the additional mass in excess of an extrapolated exponential disk, even if the disk equivalent n were in fact less than 1, we would have included that mass in our bulge component, so our total mass would be minimally affected (with only the fraction of that mass assigned to the bulge and disk changing).

We note that the ultimate goal of this study is to produce measurements of the mass of the Milky Way that may be directly compared to those for other galaxies, particularly the mass determinations from the MPA-JHU SDSS catalog. The photometry in that catalog is based on model fits to galaxies that assume de Vaucouleurs bulges and pure exponential disks; if a very different model were used for determining the Milky Way mass, it is unlikely the results would be directly comparable. We also note that studies of the Galactic disk find that its vertical structure is well described by an exponential distribution with roughly constant scale height (e.g., Jurić et al. 2008; Bovy & Rix 2013), inconsistent with a Sérsic model. Given all this, we consider an exponential mass profile, rather than a generalized Sérsic profile, the best option for modeling the Galactic disk. Should this assumption prove incorrect in the future (e.g., based on additional tests that will be provided by Gaia (Gilmore et al. 2012) or APOGEE-2 (Sobeck et al. 2014) measurements), we have nonetheless found that changing the disk model within reasonable bounds should have negligible impact on our total stellar mass estimates.

Please wait… references are loading.
10.1088/0004-637X/806/1/96