ABSTRACT
We infer distances and their asymmetric uncertainties for two million stars using the parallaxes published in the Gaia DR1 (GDR1) catalogue. We do this with two distance priors: a minimalist, isotropic prior assuming an exponentially decreasing space density with increasing distance, and an anisotropic prior derived from the observability of stars in a Milky Way model. We validate our results by comparing our distance estimates for 105 Cepheids that have more precise, independently estimated distances. For this sample we find that the Milky Way prior performs better (the rms of the scaled residuals is 0.40) than the exponentially decreasing space density prior (rms of 0.57), although for distances beyond 2 kpc the Milky Way prior performs worse, with a bias in the scaled residuals of −0.36 (versus −0.07 for the exponentially decreasing space density prior). We do not attempt to include the photometric data in GDR1 due to the lack of reliable color information. Our distance catalog is available at http://www.mpia.de/homes/calj/tgas_distances/main.html as well as at CDS. It should only be used to obtain distances to individual stars. Combining data or testing models should be done with the original parallaxes, paying attention to correlated and systematic uncertainties.
1. INTRODUCTION
The European Space Agency (ESA) Gaia mission (Gaia Collaboration et al. 2016b) is obtaining highly accurate parallaxes and proper motions of over one billion sources brighter than G ≈ 20. The first data release (Gaia DR1), based on early mission data, was released to the community on 2016 September 14 (Gaia Collaboration et al. 2016a). The primary astrometric data set in this release lists the positions, parallaxes, and proper motions of 2,057,050 stars that are in the Tycho-2 catalogue (Høg et al. 2000); 93,635 of these are also Hipparcos sources (Perryman et al. 1997; van Leeuwen 2007). This data set is called the Tycho-Gaia astrometric solution (TGAS; Michalik et al. 2015; Lindegren et al. 2016).
The five-parameter astrometric solutions for TGAS stars were obtained by combining Gaia observations with the Tycho-2 positions and their uncertainties (at an observation epoch of around J1991) as prior information (Lindegren et al. 2016). This was necessary because the time baseline of the early Gaia data was insufficient for a Gaia-only solution. The resulting solutions have a median parallax uncertainty of ∼0.3 mas, with an additional systematic uncertainty of about 0.3 mas (Gaia Collaboration et al. 2016a; Lindegren et al. 2016).
Using the TGAS parallaxes ϖ and their uncertainties σ_ϖ, we infer here the distances to all TGAS stars, along with (asymmetric) distance uncertainties given as Bayesian credible intervals. The motivation and methods for estimating distances from parallaxes are described in our earlier works (Bailer-Jones 2015; Astraatmadja & Bailer-Jones 2016, henceforth Papers I and II, respectively). We will not repeat that discussion here, except to remind readers that inverting parallaxes to estimate distances is only appropriate in the absence of noise. Because parallax measurements have uncertainties, and for many TGAS stars very large ones, distance estimation should always be treated as an inference problem.
2. PROPERTIES OF TGAS PARALLAXES AND THEIR MEASUREMENT UNCERTAINTIES
Panels (a)–(e) of Figure 1 show the distribution of σ_ϖ as a function of ϖ, as well as histograms of ϖ and σ_ϖ. The distribution of σ_ϖ covers a narrow range between 0.2 and 1 mas (cf. Figure 13 of Paper II, which shows the same plot for GUMS data; see footnote 3), which reflects the preliminary nature of GDR1. The upper limit of 1 mas is due to the imposed σ_ϖ < 1 mas cutoff to reject unreliable astrometric solutions, while the lower limit is due to the ∼0.2 mas noise floor, which is dominated by the satellite attitude and calibration uncertainties (Lindegren et al. 2016). Future data releases will be much more precise (Gaia Collaboration et al. 2016a).
In Panels (f)–(g) of Figure 1, we show the distribution of the fractional parallax uncertainties σ_ϖ/ϖ of TGAS stars, compared with those of Hipparcos and GUMS stars. Interestingly, the combination of TGAS ϖ and σ_ϖ produces a distribution of σ_ϖ/ϖ that is similar to that of Hipparcos stars.
3. METHOD, PRIORS, AND DATA PRODUCTS
The inferred distance of a star depends not only on the observed parallax and its uncertainty, but also on the prior. In this paper, we infer distances using two priors: a minimalist, isotropic exponentially decreasing space density prior and a more complex, anisotropic Milky Way prior. The properties of the exponentially decreasing space density prior were discussed in Paper I, and in Paper II we found that for an end-of-mission Gaia-like catalog the optimum scale length L is 1.35 kpc. We use this value to derive distances here, even though it was optimized for the end-of-mission catalog and TGAS stars may have a different true distance distribution.
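For reference, the exponentially decreasing space density prior has the form P(r) ∝ r² exp(−r/L). Combined with a Gaussian parallax likelihood, setting the derivative of the log posterior to zero gives a cubic in r whose positive real roots are the candidate modes. The sketch below is our own illustration, not the code used to build the catalog; it assumes distances in kpc and parallaxes in mas (so that 1/r is in mas), and when several stationary points exist it simply keeps the one with the highest posterior density.

```python
import numpy as np


def edsd_log_posterior(r_kpc, parallax_mas, sigma_mas, L_kpc=1.35):
    """Unnormalized log posterior: prior r^2 exp(-r/L) times a Gaussian likelihood in parallax."""
    return (2.0 * np.log(r_kpc) - r_kpc / L_kpc
            - (parallax_mas - 1.0 / r_kpc) ** 2 / (2.0 * sigma_mas ** 2))


def edsd_mode(parallax_mas, sigma_mas, L_kpc=1.35):
    """Posterior mode (kpc) under the exponentially decreasing space density prior.

    d/dr of the log posterior vanishes where
        r^3/L - 2 r^2 + (parallax/sigma^2) r - 1/sigma^2 = 0.
    The cubic is negative at r = 0 and positive for large r, so at least one
    positive real root always exists.
    """
    coeffs = [1.0 / L_kpc, -2.0, parallax_mas / sigma_mas ** 2, -1.0 / sigma_mas ** 2]
    roots = np.roots(coeffs)
    # Keep roots that are numerically real and positive.
    is_real = np.abs(roots.imag) <= 1e-9 * np.maximum(1.0, np.abs(roots))
    candidates = roots[is_real].real
    candidates = candidates[candidates > 0]
    # If there are several stationary points, take the highest-density one.
    return float(max(candidates,
                     key=lambda r: edsd_log_posterior(r, parallax_mas, sigma_mas, L_kpc)))


# Precise parallax: the mode is close to 1/parallax (10 mas gives about 0.1 kpc).
print(edsd_mode(10.0, 0.01))
# Uninformative parallax: the mode approaches the prior mode 2L = 2.7 kpc.
print(edsd_mode(0.0, 1000.0))
# Negative parallax: still a well-defined, positive distance estimate.
print(edsd_mode(-1.0, 0.5))
```

Working with the cubic avoids a grid search for the mode, but the full posterior is still needed for the quantiles.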
Although we do not analyze them here, our catalog also provides distances using the exponentially decreasing space density prior with L = 0.11 kpc. This value was found by fitting the prior to the true distance distribution of GUMS stars brighter than the V-band magnitude at which Tycho-2 is 99% complete.
The derivation and parameters of the Milky Way prior were discussed in Paper II, and illustrations of the resulting posterior for several parallaxes ϖ and uncertainties σ_ϖ can be seen in Figures 6 and 7 of Paper II. Here, we retain the parameters of the Milky Way model as well as the Drimmel et al. (2003) extinction map, with the exception of the limiting magnitude (Equation (6) in Paper II) used to calculate the faint end of the luminosity function. In this paper, we set the limiting magnitude to the 99.9th percentile of the magnitude distribution of all TGAS stars.
For every star, we compute the posterior probability density function over distance. The distance estimate we report here is the mode of the posterior, r_mode. We do not use the median distance as our estimator because, as we have seen in Paper II, it performs worse for the priors used here.
In addition to the mode, we report in our catalog the 5% and 95% quantiles of the posterior, r_5 and r_95. Note that many of the posteriors are asymmetric about the mode (and the mean and median). The difference between these quantiles gives a 90% credible interval, which we divide by 2s to produce

$$\sigma_r = \frac{r_{95} - r_5}{2s},$$

where s = 1.645 is the ratio of the widths of the 90% and 68.3% credible intervals of a Gaussian distribution. Thus, σ_r is a simplified (symmetric) uncertainty in our distance estimate, which is equivalent, in some sense, to a 1σ Gaussian uncertainty.
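A short sketch of these quantities, for illustration only. It evaluates the unnormalized posterior for the exponentially decreasing space density prior on a uniform distance grid (the grid extent `r_max_kpc` and resolution `n` are our own illustrative choices, not those used for the catalog), reads off r_5, r_50, and r_95 from the cumulative sum, and forms σ_r. It also checks that, for a Gaussian posterior, σ_r recovers the Gaussian σ, and that s = 1.645 is essentially the 95% quantile of the standard normal.

```python
import math
from statistics import NormalDist

S = 1.645  # ratio of the 90% to the 68.3% credible-interval width for a Gaussian


def sigma_r(r5, r95, s=S):
    """Simplified symmetric uncertainty from the 5% and 95% posterior quantiles."""
    return (r95 - r5) / (2.0 * s)


def edsd_quantiles(parallax_mas, sigma_mas, L_kpc=1.35, r_max_kpc=50.0, n=20000,
                   probs=(0.05, 0.50, 0.95)):
    """Posterior quantiles (kpc) on a uniform grid in r (distances in kpc, parallaxes in mas).

    Assumes the posterior mass beyond r_max_kpc is negligible.
    """
    dr = r_max_kpc / n
    rs = [(i + 0.5) * dr for i in range(n)]
    logp = [2.0 * math.log(r) - r / L_kpc
            - (parallax_mas - 1.0 / r) ** 2 / (2.0 * sigma_mas ** 2) for r in rs]
    peak = max(logp)
    weights = [math.exp(lp - peak) for lp in logp]  # rescaled to avoid overflow
    total = sum(weights)
    quantiles, cumulative, k = [], 0.0, 0
    for r, w in zip(rs, weights):
        cumulative += w
        while k < len(probs) and cumulative >= probs[k] * total:
            quantiles.append(r)
            k += 1
    return quantiles


# Gaussian check: sigma_r recovers sigma (up to the rounding of s).
gauss = NormalDist(mu=1.0, sigma=0.2)
print(sigma_r(gauss.inv_cdf(0.05), gauss.inv_cdf(0.95)))
print(NormalDist().inv_cdf(0.95))

# Asymmetric case: parallax 1 mas with a 30% fractional uncertainty.
r5, r50, r95 = edsd_quantiles(1.0, 0.3)
print(r5, r50, r95, sigma_r(r5, r95))
```

In the asymmetric example the 95% quantile lies much farther above the median than the 5% quantile lies below it, which is why σ_r is only a convenient summary and the quantiles themselves are also listed.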
We use neither apparent magnitudes nor colors to help infer the distance, even though we have shown in Paper II that this significantly improves the distance estimation in many cases. This is because GDR1 does not contain color information. We chose not to use the Tycho photometric data on account of its low precision (median photometric uncertainties in BT and VT are, respectively, 136 and 96 mmag).
In the analyses that follow, we have not included in our inference the ∼0.3 mas systematic uncertainty reported for the TGAS parallaxes. This is partly because this figure is known to be a very rough estimate of the systematics, and it may be an overestimate. We do, however, provide a second catalog on the website mentioned above that includes this systematic error. It is included by adding it in quadrature to the random parallax error and then repeating the inference. In general, this changes both the mode of the posterior (the distance estimate) and its quantiles (the uncertainty).
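The quadrature combination described above amounts to replacing the random parallax uncertainty by a larger total uncertainty before the inference is repeated. A minimal sketch, with a hypothetical example star (the 0.3 mas value is the rough systematic quoted in the text):

```python
import math

SIGMA_SYS_MAS = 0.3  # rough TGAS systematic parallax uncertainty quoted in the text


def total_parallax_sigma(sigma_random_mas, sigma_sys_mas=SIGMA_SYS_MAS):
    """Random and systematic parallax uncertainties added in quadrature (mas)."""
    return math.hypot(sigma_random_mas, sigma_sys_mas)


# Hypothetical star: parallax 2 mas with a 0.3 mas random uncertainty.
parallax, sigma_rand = 2.0, 0.3
sigma_tot = total_parallax_sigma(sigma_rand)
print(sigma_tot)                                # combined uncertainty in mas
print(sigma_rand / parallax, sigma_tot / parallax)  # fractional uncertainty before and after
```

For this star the fractional parallax uncertainty rises from 0.15 to about 0.21, so the posterior becomes broader and more prior-dependent, which is why both the mode and the quantiles shift.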
4. DISTANCE ESTIMATION RESULTS
The results of the distance estimation are shown in Figure 2, and the statistics of the uncertainties are summarized in Table 1. In Panel (a) of Figure 2, we show the distribution of the distances estimated as the mode of each of the two posteriors described above. The red line in that panel is for a third posterior, which uses the uniform distance prior (Paper I) with a large upper distance cutoff. This posterior is equivalent to inverting the parallax to get a distance, except when the parallax is very small or negative, in which case the mode of the posterior lies at the cutoff distance. This is the reason for the peak at the cutoff in the distribution in Panel (a); it contains 43,673 stars, which is 2.1% of TGAS. For the exponentially decreasing space density prior, we also see a peak, at around 2L = 2.7 kpc (it is not very prominent due to the log scale). This is the mode of that prior, and the mode of the posterior is very close to it for stars with large parallax uncertainties. The Milky Way prior also has a mode, but because it is an anisotropic prior, the mode varies with the line of sight. Nonetheless, its most prominent peak can be seen, which corresponds to the mode of the prior toward the Galactic center, and thus to poorly measured stars in that direction.
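The two pile-ups described above follow directly from the priors. With a flat prior on (0, r_lim], the posterior mode maximizes the likelihood alone, so it equals 1/ϖ when that lies inside the support and sits at the cutoff otherwise; with the exponentially decreasing space density prior, poorly measured stars collect near the prior mode 2L. A minimal sketch (distances in kpc, parallaxes in mas; the cutoff `R_LIM_KPC = 100` is a hypothetical value, not the one used for Figure 2):

```python
def uniform_prior_mode(parallax_mas, r_lim_kpc):
    """Posterior mode (kpc) for a uniform distance prior on (0, r_lim].

    With a Gaussian likelihood in parallax and a flat prior, the mode maximizes
    the likelihood alone: r = 1/parallax if that lies inside the support,
    otherwise the cutoff r_lim (this covers small and negative parallaxes).
    The parallax uncertainty drops out entirely.
    """
    if parallax_mas <= 1.0 / r_lim_kpc:
        return r_lim_kpc
    return 1.0 / parallax_mas


def edsd_prior_mode(L_kpc):
    """Mode of the prior P(r) proportional to r^2 exp(-r/L), which is 2L."""
    return 2.0 * L_kpc


R_LIM_KPC = 100.0  # hypothetical cutoff, for illustration only
for p in (2.0, 0.005, -0.5):
    print(p, uniform_prior_mode(p, R_LIM_KPC))
print(edsd_prior_mode(1.35))
```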
Table 1. Statistical Summary of the Distance Estimation of Two Million Sources in the Primary Data Set of GDR1
Data set | TGAS 10% | TGAS 50% | TGAS 90% | Hipparcos subset 10% | Hipparcos subset 50% | Hipparcos subset 90% |
---|---|---|---|---|---|---|
Exponentially Decreasing Space Density | | | | | | |
All stars | 0.067 | 0.378 | 1.315 | 0.021 | 0.078 | 0.656 |
Nearby stars (distance cut) | 0.023 | 0.045 | 0.095 | 0.021 | 0.077 | 0.365 |
Milky Way | | | | | | |
All stars | 0.066 | 0.273 | 0.874 | 0.021 | 0.077 | 0.365 |
Nearby stars (distance cut) | 0.023 | 0.046 | 0.096 | 0.013 | 0.035 | 0.069 |
Note. Columns with headings 10%, 50%, and 90% give the lower decile, median, and upper decile of the fractional distance uncertainty (the symmetric distance uncertainty divided by the mode distance) for all 2,057,050 sources in the primary data set, as well as for the subset of 93,635 sources in common with Hipparcos.
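The entries of Table 1 are order statistics of the per-star fractional distance uncertainty. A minimal sketch of how such a row can be computed from catalog columns, using toy input rather than catalog data; the optional `r_cut` argument is a hypothetical stand-in for the distance-limited subset and does not reproduce the cut used for the table:

```python
import numpy as np


def fractional_uncertainty_deciles(sigma_r, r_mode, r_cut=None):
    """Lower decile, median, and upper decile of sigma_r / r_mode.

    If r_cut is given, only stars with r_mode <= r_cut are used
    (hypothetical distance-limited subset).
    """
    sigma_r = np.asarray(sigma_r, dtype=float)
    r_mode = np.asarray(r_mode, dtype=float)
    if r_cut is not None:
        keep = r_mode <= r_cut
        sigma_r, r_mode = sigma_r[keep], r_mode[keep]
    return np.percentile(sigma_r / r_mode, [10, 50, 90])


# Toy input: 11 stars at 10 kpc with fractional uncertainties 0.1, 0.2, ..., 1.1.
print(fractional_uncertainty_deciles(np.arange(1, 12), np.full(11, 10.0)))
```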
For distances up to about 200 pc, the distributions of the mode distances for both priors are similar to each other. Looking again at Panel (d) of Figure 1, we see that most stars with parallaxes larger than about 5 mas (i.e., closer than about 200 pc) have small fractional parallax uncertainties. We showed in Paper II that for stars with positive parallaxes and sufficiently small fractional parallax uncertainties, the distance estimate is largely independent of the choice of prior. Beyond 200 pc, however, the distributions for all priors diverge. For distances of more than 1 kpc, most stars have large fractional parallax uncertainties, and the distance estimate becomes much more prior-dependent.
The distribution of the fractional distance uncertainties is shown in Panels (d) and (e) of Figure 2. For both priors, these distributions are similar for stars with small fractional parallax uncertainties. Both distributions peak at about 0.15, but beyond that a second peak, corresponding to poorly measured stars, can be seen at larger values for both the exponentially decreasing space density prior and the Milky Way prior.
We compare the distances estimated using the two priors with each other, and with distances estimated using the uniform distance prior, in Figure 3. We see again that up to ∼200 pc the distances from all priors are similar. Beyond ∼200 pc, we start to see elongations that correspond to the modes of the respective priors, as discussed above.
5. VALIDATION WITH CEPHEID VARIABLES
To test how consistent our estimated distances are with other, more precise estimates for distant stars, we compare them with the distances of Cepheid variables. We took 170 Cepheids from Groenewegen (2013) and cross-matched them with GDR1 using SIMBAD, finding 105 Cepheids in common. The Groenewegen (2013) distances have a median fractional uncertainty of about 0.054. Almost all of these Cepheids are Hipparcos sources.
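Figure 4 compares our distance estimates for both priors with those of Groenewegen (2013). The bottom row of that figure shows this using scaled differences, that is, the difference between our mode distance and the Groenewegen (2013) distance, scaled by the uncertainties of the two estimates.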
The uncertainties in the Cepheid distances are taken from Groenewegen (2013), where they were computed in a Monte Carlo simulation that takes into account uncertainties in the spectrophotometry, the projection factor, and the phase measurements. We multiply these uncertainties by s = 1.645 to scale them to 90% credible intervals, so as to make a fair comparison with our own 90% credible intervals.
To summarize the differences seen in Figure 4, we calculate the bias (the mean of the scaled residuals), the rms of the scaled residuals, and the standard deviation of the scaled residuals for all Cepheids, for both priors. We also do this separately for near (closer than 2 kpc) and distant (beyond 2 kpc) Cepheids. These results are summarized in Table 2.
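The three summary statistics can be sketched as below. We assume the standard deviation uses 1/N normalization; with that convention rms² = bias² + sd² holds exactly, and the entries in Table 2 satisfy it to rounding (e.g. 0.151² + 0.547² ≈ 0.567²), which is consistent with this assumption. The residuals in the example are toy values, not the Cepheid data.

```python
import math


def residual_summary(x):
    """Bias (mean), rms, and standard deviation (1/N normalization) of scaled residuals."""
    n = len(x)
    bias = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    sd = math.sqrt(sum((v - bias) ** 2 for v in x) / n)
    return bias, rms, sd


b, r, s = residual_summary([1.0, -1.0, 3.0])  # toy residuals
print(b, r, s)
# With 1/N normalization, rms^2 = bias^2 + sd^2.
print(math.isclose(r ** 2, b ** 2 + s ** 2))
```

Because of this identity, a small standard deviation combined with a large bias (as for the Milky Way prior beyond 2 kpc) still yields a sizeable rms.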
Table 2. The Bias as Well as the rms and Standard Deviation of the Scaled Residuals of Cepheid Stars in the TGAS Catalogue
Prior and sample | Bias | rms | Standard deviation |
---|---|---|---|
Exponentially decreasing space density | | | |
All Cepheids | 0.151 | 0.567 | 0.547 |
Cepheids closer than 2 kpc | 0.298 | 0.678 | 0.608 |
Cepheids beyond 2 kpc | −0.070 | 0.340 | 0.333 |
Milky Way | | | |
All Cepheids | −0.133 | 0.404 | 0.382 |
Cepheids closer than 2 kpc | 0.022 | 0.395 | 0.394 |
Cepheids beyond 2 kpc | −0.364 | 0.418 | 0.205 |
Inspecting Figure 4 and Table 2, and assuming the Groenewegen (2013) distances to be "true" (for simplicity), we see that overall the Milky Way prior performs better than the exponentially decreasing space density prior, with a smaller rms and standard deviation. It is also slightly less biased, although its bias is in the opposite direction: it tends to underestimate distances. This is due to the assumption the Milky Way prior makes in the face of poor data, namely that a star is more likely to reside in the disk than farther away. Hence, this prior becomes mismatched when we consider only the distant Cepheids (beyond 2 kpc): distance estimates using the Milky Way prior have a bias of −0.36 for these stars, as is also apparent from Figure 4. When the fractional parallax uncertainty is large and the data are therefore poor, the posterior based on this prior has a mode at about 2 kpc, which roughly corresponds to the radial scale length of the thick disk in our Milky Way model. For Cepheids closer than 2 kpc, however, the Milky Way prior performs well in terms of bias, rms, and standard deviation.
The Milky Way prior also gives more reasonable credible intervals than the exponentially decreasing space density prior, as can be seen in the top row of Figure 4. Most of our TGAS-based distance uncertainties are large, because the Cepheids are distant and have large fractional parallax uncertainties, with a median of about 0.48 (versus ∼0.2 for all TGAS stars). Furthermore, the posteriors (and therefore the credible intervals) are highly asymmetric, with a long tail to large distances. This is a natural consequence of the nonlinear transformation from parallax to distance.
The stars used in this validation are intrinsically bright and relatively distant compared with the typical Milky Way stars used to build the Milky Way prior. Because our distance estimation is based solely on measured parallaxes, with no photometry involved, the Milky Way prior is not well matched to these stars: in the absence of precise parallaxes it tells us that stars are more likely to be in the disk than farther away. This explains the poorer behavior of this prior for distant Cepheids. The exponentially decreasing space density prior performs better in this regime because of the adopted scale length L, which puts the mode of the prior at 2L = 2.7 kpc.
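The location of that prior mode follows from a one-line derivation, which also gives the mode for the shorter scale length listed in the catalog:

```latex
P(r) \propto r^2 e^{-r/L}, \qquad
\frac{d}{dr}\ln P(r) = \frac{2}{r} - \frac{1}{L} = 0
\;\Longrightarrow\; r_{\rm mode} = 2L .
```

Thus L = 1.35 kpc gives a prior mode of 2.7 kpc, and L = 0.11 kpc gives 0.22 kpc. As the parallax uncertainty grows the posterior approaches the prior, so the distance estimates of poorly measured stars drift toward this value.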
6. CONCLUSIONS
We have inferred the distances of two million stars in the Gaia DR1 catalogue using Bayesian inference. The priors used are the exponentially decreasing space density prior with scale length L = 1.35 kpc, and the Milky Way prior with the same parameters as in Paper II. The median fractional distance uncertainties are 0.38 and 0.27 for the exponentially decreasing space density prior and the Milky Way prior, respectively. If we consider only the nearby, distance-limited subset listed in Table 1, the median fractional distance uncertainty improves to about 0.045 for both priors. This applies to about 193,000 stars (the exact number differs slightly between the two priors), or about 9% of TGAS.
We validated our distance estimates using more precise distances for Cepheids in TGAS taken from Groenewegen (2013). We found that for distances closer than 2000 pc, the Milky Way prior performs better than the exponentially decreasing space density prior. Beyond 2000 pc, the Milky Way prior performs worse for this sample (which consists of intrinsically bright, distant stars), because it assumes that stars are more likely to be nearby in the disk than farther away. Our exponentially decreasing space density prior has a longer scale length and thus performs better for this sample when faced with the same poor measurements. Overall, however, the Milky Way prior performs better.
Due to the lack of reliable colors, we do not use photometry in combination with the parallaxes to estimate distances. Rather than using the Tycho magnitudes, significant improvements could be achieved by taking spectrophotometric information from other surveys. Here we choose to present purely astrometric distances.
The distance estimates presented in this paper are useful for individual stars. To obtain the mean distance to a group of stars, such as a cluster, one should do a combined inference using the original parallaxes and taking into account the correlated parallax uncertainties for stars observed in a small field. Note, however, that this combination will still not reduce the uncertainty in the mean below the limit presented by the TGAS systematic parallax error. Similarly, if one wishes to compare a model for distances to the TGAS data, this is normally best done by projecting the model-predicted distances into the parallax domain, rather than using individual estimated distances.
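As a simplified illustration of working in the parallax domain for a group of stars, the sketch below forms an inverse-variance weighted mean parallax. It is not a full combined inference: it assumes uncorrelated random errors (a proper treatment would use the parallax covariance for stars in a small field) and treats the systematic error as a single offset shared by all members, so that it does not average down. The cluster members are hypothetical.

```python
import math


def combined_parallax(parallaxes_mas, sigmas_mas, sigma_sys_mas=0.3):
    """Inverse-variance weighted mean parallax of a group of stars (mas).

    Assumptions: random errors are uncorrelated, and the systematic error is a
    single offset common to all stars, so it is added once in quadrature.
    """
    weights = [1.0 / s ** 2 for s in sigmas_mas]
    mean = sum(w * p for w, p in zip(weights, parallaxes_mas)) / sum(weights)
    sigma_random = 1.0 / math.sqrt(sum(weights))
    sigma_total = math.hypot(sigma_random, sigma_sys_mas)
    return mean, sigma_random, sigma_total


# 100 hypothetical cluster members at 2 mas, each with a 0.3 mas random error.
mean, sig_rand, sig_tot = combined_parallax([2.0] * 100, [0.3] * 100)
print(mean, sig_rand, sig_tot)
```

The random part shrinks as 1/√N (here to 0.03 mas), but the total never drops below the 0.3 mas systematic floor. A distance to the group would then be inferred from this combined parallax, rather than by averaging individual distance estimates.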
This work has made use of data from the European Space Agency (ESA) mission Gaia (http://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, http://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. We also made use of NASA's Astrophysics Data System and the SIMBAD database, operated at CDS, Strasbourg, France.
APPENDIX: DATA CATALOGS
This appendix lists the distances and uncertainties of all TGAS sources in Table 3 and, for the subset of Groenewegen (2013) Cepheids, in Table 4. These distance catalogs are also available at http://www.mpia.de/homes/calj/tgas_distances/main.html.
Table 3. Distances and Uncertainties of All TGAS Sources
Column | Description |
---|---|
1 | Hipparcos identifier |
2 | Tycho-2 identifier |
3 | Source identifier |
4 | Galactic longitude at epoch 2015.0 |
5 | Galactic latitude at epoch 2015.0 |
6 | Absolute barycentric stellar parallax at the reference epoch |
7 | Standard error of the absolute barycentric stellar parallax |
8 | Gaia G-band mean magnitude |
9 | Mode distance of posterior using the exponentially decreasing space density prior with L = 0.11 kpc |
10 | 5th percentile of posterior using the exponentially decreasing space density prior with L = 0.11 kpc |
11 | 50th percentile of posterior using the exponentially decreasing space density prior with L = 0.11 kpc |
12 | 95th percentile of posterior using the exponentially decreasing space density prior with L = 0.11 kpc |
13 | Standard error in distance using the exponentially decreasing space density prior with L = 0.11 kpc |
14 | Mode distance of posterior using the exponentially decreasing space density prior with L = 1.35 kpc |
15 | 5th percentile of posterior using the exponentially decreasing space density prior with L = 1.35 kpc |
16 | 50th percentile of posterior using the exponentially decreasing space density prior with L = 1.35 kpc |
17 | 95th percentile of posterior using the exponentially decreasing space density prior with L = 1.35 kpc |
18 | Standard error in distance using the exponentially decreasing space density prior with L = 1.35 kpc |
19 | Mode distance of posterior using Milky Way prior |
20 | 5th percentile of posterior using Milky Way prior |
21 | 50th percentile of posterior using Milky Way prior |
22 | 95th percentile of posterior using Milky Way prior |
23 | Standard error in distance using Milky Way prior |
Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.
Table 4. Distances and Uncertainties of the Groenewegen (2013) Cepheids in TGAS
Column | Description |
---|---|
1 | Hipparcos identifier |
2 | Tycho-2 identifier |
3 | Source identifier |
4 | Galactic longitude at epoch 2015.0 (a) |
5 | Galactic latitude at epoch 2015.0 (a) |
6 | Absolute barycentric stellar parallax at the reference epoch (b) |
7 | Standard error of the absolute barycentric stellar parallax |
8 | Gaia G-band mean magnitude |
9 | Mode distance of posterior using the exponentially decreasing space density prior (c) |
10 | 5th percentile of posterior using the exponentially decreasing space density prior (c) |
11 | 50th percentile of posterior using the exponentially decreasing space density prior (c) |
12 | 95th percentile of posterior using the exponentially decreasing space density prior (c) |
13 | Standard error in distance using the exponentially decreasing space density prior (c) |
14 | Mode distance of posterior using the exponentially decreasing space density prior (d) |
15 | 5th percentile of posterior using the exponentially decreasing space density prior (d) |
16 | 50th percentile of posterior using the exponentially decreasing space density prior (d) |
17 | 95th percentile of posterior using the exponentially decreasing space density prior (d) |
18 | Standard error in distance using the exponentially decreasing space density prior (d) |
19 | Mode distance of posterior using the Milky Way prior (e) |
20 | 5th percentile of posterior using the Milky Way prior (e) |
21 | 50th percentile of posterior using the Milky Way prior (e) |
22 | 95th percentile of posterior using the Milky Way prior (e) |
23 | Standard error in distance using the Milky Way prior (e) |
24 | Groenewegen (2013) Hipparcos identifier |
25 | Groenewegen (2013) Tycho-2 identifier |
26 | Groenewegen (2013) Cepheid distance |
27 | Error in the fit of the Groenewegen (2013) Cepheid distance |
28 | Error from the Monte Carlo simulation of the Groenewegen (2013) Cepheid distance |
Notes.
(a) At epoch 2015.0.
(b) At the reference epoch.
(c) Using the exponentially decreasing space density prior with L = 0.11 kpc.
(d) Using the exponentially decreasing space density prior with L = 1.35 kpc.
(e) Using the Milky Way prior.
Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.
Footnotes
3. GUMS, the Gaia Universe Model Snapshot (Robin et al. 2012), is a mock catalog that simulates the expected content of the final Gaia catalogue.