A Gaia Early DR3 Mock Stellar Catalog: Galactic Prior and Selection Function

Jan Rybizki; Markus Demleitner; Coryn Bailer-Jones; Piero Dal Tio; Tristan Cantat-Gaudin; Morgan Fouesneau; Yang Chen; René Andrae; Léo Girardi; Sanjib Sharma

doi:10.1088/1538-3873/ab8cb0

1. Introduction

The Gaia mission (Gaia Collaboration et al. 2016) and its second data release (GDR2; Gaia Collaboration et al. 2018) have provided positions, parallaxes, proper motions, and photometry in three bands for 1.3 billion sources across the sky. GDR2 also provided effective temperatures, luminosities, extinctions, and radial velocities for various subsets of these sources. While this has led to an unprecedentedly rich view of our Milky Way system (e.g., Belokurov et al. 2018; Cantat-Gaudin et al. 2018; Helmi et al. 2018), it is at the same time hard to understand the limits of this data set. To help with this, the community has produced mock stellar catalogs that have similar selections to Gaia and provide the same observables, but in which the underlying truth is known. Grand et al. (2018) and Sanderson et al. (2020) used N-body cosmological simulations of Milky Way-like galaxies. These have been used to interpret patterns in the stellar phase-space structure seen in GDR2 in terms of our Galaxy's merger history (Belokurov et al. 2020; Grand et al. 2020) and to estimate the mass of our Galaxy (Grand et al. 2019). A slightly different approach was taken by some of the present authors in Rybizki et al. (2018), where we used an underlying Milky Way model (Robin et al. 2003) to produce a mock stellar catalog with galaxia (Sharma et al. 2011), a tool to sample stars from density distributions or N-body data. We published this catalog in the same way as GDR2, namely via the Astronomy Data Query Language (ADQL) and mimicking the GDR2 data model. This proved useful for testing the Gaia selection function (Bailer-Jones et al. 2018a; Conroy et al. 2019b) and for estimating false positive rates in common proper motion pairs (El-Badry & Rix 2018; Tian et al. 2020). It also served as a Galaxy prior (Bailer-Jones et al. 2018b) and provided an easy way to query a Milky Way model (Yuan et al. 2019; Amarante et al. 2020) or to estimate starcounts for future surveys (Conroy et al. 2019a).

In this paper we present the Gaia early DR3 mock (GeDR3mock), a simulated Gaia catalog with entries for 1,573,457,319 individual stars brighter than G = 20.7 mag. It is intended as a community service for the preparation of the upcoming Gaia Early Data Release 3 (Gaia EDR3). Compared to our GDR2mock catalog (Rybizki et al. 2018), for GeDR3mock we have updated the Milky Way model (Czekaj et al. 2014) and have added the Magellanic Clouds and over 1000 open clusters (Kharchenko et al. 2013; Cantat-Gaudin et al. 2018). We simulate observational uncertainties empirically using GDR2 uncertainties scaled to the longer baseline of 34 months for Gaia EDR3 (compared to 22 months in GDR2). We again mimic the GDR2 data model, and additionally provide all underlying stellar parameters, e.g., teff, logg, feh, age, extinctions in all bands, initial and current mass, and the Galactic component to which the star belongs. All values provided in the catalog are noise-free and we provide all values for all stars in the catalog, i.e., including those that will be absent from Gaia EDR3 or even Gaia DR3. For example, we provide radial velocities for all stars, which means that the user has to apply an appropriate selection. We assist with the selection function by providing both maps of limiting magnitudes for four different GDR2 selections⁸ (all sources, all with parallax, all with BP-RP, all with parallax and BP-RP) and the tools we used to create the maps (Rybizki 2019). We provide example ADQL queries to illustrate how to access the data.

The paper is structured as follows: In Section 2 we sketch the generation of GeDR3mock. Section 3 discusses selection effects of the Gaia instrument, followed by a comparison to GDR2 in Section 4. In Section 5 we discuss the catalog content and limitations. We provide example queries in Section 6.

2. Catalog Generation

Our catalog has been generated using galaxia (Sharma et al. 2011), a tool that turns an underlying chemo-dynamical Milky Way model into a synthetic or "mock" stellar catalog via stellar isochrones. It can also turn N-body data into mock stellar particles, which we use to generate the Magellanic Clouds and open clusters. We use version 0.8.1 of galaxia (Sharma et al. 2019) with some modifications to the code that we explain below. We have linked the final version of our galaxia code in the galaxia_wrap⁹ python package. The two can be used together to reproduce or customize our catalog.

2.1. The Milky Way Model

The underlying Galaxy model of galaxia is based on the Besançon (Robin et al. 2003) model. Since 2003 the Besançon model has seen many updates for various Galactic components. We have implemented a selection of these changes and list them in the following subsections. For each Galactic component we indicate the population ID (popid) which can be used to only select stars of a specific component. Basic information on age and local mass normalization of the thin- and thick-disk components can be inspected in Table 4.

2.1.1. Thin Disk—Popid = 0–6

We use a thin disk scale length of 2.2 kpc for popid 1 to 6 (Reylé et al. 2009), but use the fiducial 5 kpc for the youngest disk population (popid = 0) as in Czekaj et al. (2014). The star formation rate (SFR) is modeled as exp(−0.12τ), where τ is the time from 10 Gyr ago, in accordance with Czekaj et al. (2014).¹⁰ We use the KH-v6 initial mass function (IMF) from Czekaj et al. (2014, Table 1) for the thin disk. For the metallicity we implemented the values from Robin et al. (2012a, Table 5).

2.1.2. Thick Disk—Popid = 7

For the thick disk, we implemented an age spread of 1 Gyr (Sharma et al. 2019) and left the mean at 11 Gyr. The thick disk metallicity is set to −0.48 ± 0.3 dex (Czekaj et al. 2014).

2.1.3. Halo—Popid = 8

We set the age of the halo to 13 Gyr instead of the default 14 Gyr because of isochrone limitations. The metallicity is −1.5 ± 0.5 dex (Robin et al. 2012a, Table 5). The velocity dispersion is taken from Robin et al. (2012a, Table 7).

2.1.4. Bulge—Popid = 9

The metallicity is 0.0 ± 0.2 dex (Robin et al. 2012a, Table 5). The velocity dispersion is also taken from Robin et al. (2012a, Table 7).

2.1.5. Magellanic Clouds—Popid = 10

The most prominent extragalactic features in the sky density maps of GDR2 data are the Magellanic Clouds (MCs) (see the middle panel of Figure 4). We include a simple model of the MCs in order to study first-order selection effects that occur in such dense regions at large distances. If the user does not want to include the MCs when querying GeDR3mock, they can be excluded by adding the following to a query:

WHERE popid != 10 -- this is part of an ADQL query


To generate N-body particles (from which galaxia then produces mock stellar particles) that represent the MCs, we use the parameters of the MC model tabulated in Robin et al. (2012a, Table 10) and assume a constant star formation rate for both MCs. The sky position is taken from Paturel et al. (2003). We arbitrarily set the velocity dispersions to 20 and 10 km s⁻¹ for the LMC and SMC, respectively. The stellar masses are set to 3.8 × 10⁹ M_⊙ for the LMC and to 6 × 10⁸ M_⊙ for the SMC, values that approximately reproduce the starcounts in those regions of the sky. To build the LMC we drew 10⁵ particles from a spherically symmetric Gaussian distribution with a standard deviation of 1.075 kpc. We assume spherical symmetry for the SMC as well, for which we drew 10⁴ particles with a standard deviation of 0.525 kpc. The velocity distribution is randomly applied to each particle (relative to the 3D velocity of the center-of-mass of the LMC given in Robin et al. 2012a, Table 10) using a normal distribution and neglecting the position of the particle within the MCs (which we accept is dynamically inconsistent). This means that the GeDR3mock MC kinematics should not be compared to the detailed Gaia observations, which, among other things, allow the rotation field of these galaxies to be inferred. For galaxia to generate mock stellar particles from these 10⁵ and 10⁴ N-body particles we calculate a 6D smoothing length using EnBiD¹¹ (Sharma & Steinmetz 2006).
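The particle draw described above can be sketched in a few lines of numpy. This is only an illustration, not the galaxia_wrap implementation: the function name is hypothetical, the center-of-mass position and velocity are placeholders (the real values come from Paturel et al. 2003 and Robin et al. 2012a), and we assume that the quoted standard deviations and velocity dispersions apply per Cartesian axis.

```python
import numpy as np

def draw_cloud_particles(n_particles, sigma_pos_kpc, sigma_vel_kms,
                         center_pos_kpc, center_vel_kms, seed=0):
    """Draw positions and velocities for a spherically symmetric Gaussian cloud.

    Assumption: sigma_pos_kpc and sigma_vel_kms are per-axis standard deviations.
    """
    rng = np.random.default_rng(seed)
    # Isotropic Gaussian positions around the center of mass.
    pos = rng.normal(loc=center_pos_kpc, scale=sigma_pos_kpc, size=(n_particles, 3))
    # Velocities are drawn independently of position, which is
    # dynamically inconsistent, as noted in the text.
    vel = rng.normal(loc=center_vel_kms, scale=sigma_vel_kms, size=(n_particles, 3))
    return pos, vel

# Placeholder centers (zeros); particle numbers and dispersions are from the text.
lmc_pos, lmc_vel = draw_cloud_particles(100_000, 1.075, 20.0, np.zeros(3), np.zeros(3))
smc_pos, smc_vel = draw_cloud_particles(10_000, 0.525, 10.0, np.zeros(3), np.zeros(3), seed=1)
```

The resulting arrays would still need a smoothing length per particle before galaxia can turn them into mock stars.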

2.1.6. Open Cluster—Popid = 11

Our underlying Milky Way model is smooth, but we know that the real Galaxy has many localized overdensities in phase-space, such as moving groups and open and globular clusters. Unraveling and cataloging such structures with the help of Gaia data is an active topic of research (e.g., Liu & Pang 2019; Castro-Ginard et al. 2020). We add mock open star clusters to our catalog, so that the astronomical community can train their algorithms to detect them and to extract their underlying astrophysical parameters.

As an input catalog we use 1118 real clusters from Cantat-Gaudin et al. (2018) and Kharchenko et al. (2013). We mock up the unknown astrophysical parameters of these in order to create an underlying truth, from which we can sample stars. The exact procedure can be inspected in notebook 7a of galaxia_wrap (where a FITS file with the exact values can also be found), but in brief, the procedure for assigning parameters to individual objects is as follows.

1. The metallicity, [Fe/H], is set to 0.1 dex in the inner disk and −0.35 dex in the outer disk, with a linear transition between 8 and 12 kpc galactocentric radius. We add Gaussian noise of ±0.1 dex to these [Fe/H] values.
2. Mass values of the clusters are drawn from a truncated normal distribution (300 < M/M_⊙ < 2050; most clusters have low masses), then sorted and assigned according to the number of member stars in GDR2. We chose the mass distribution to roughly reproduce the overall number of cluster members, which is of order 400,000 stars.
3. We assume solid-body rotation for the stellar clusters with a random spin axis. The rotational velocity is also correlated with the number of member stars (more stars mean higher rotational velocity). Velocities range from 0.1 to 0.7 km s⁻¹ and are given at the cluster radius.
4. Cluster center-of-mass positions and velocities are taken directly from the input catalog of 1118 clusters.
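Two of the steps above, the radial metallicity profile and the solid-body rotation, can be illustrated with a short numpy sketch. The function names are hypothetical and this is not the code of notebook 7a. We assume that the ±0.1 dex noise is a Gaussian standard deviation and that the rotation speed is reached at a distance of one cluster radius from the spin axis.

```python
import numpy as np

def cluster_feh(r_gc_kpc, rng, r_inner=8.0, r_outer=12.0,
                feh_inner=0.1, feh_outer=-0.35, sigma=0.1):
    """Mean [Fe/H] from a piecewise-linear radial profile plus Gaussian scatter."""
    r = np.asarray(r_gc_kpc, dtype=float)
    # np.interp keeps the end values constant outside [r_inner, r_outer].
    mean_feh = np.interp(r, [r_inner, r_outer], [feh_inner, feh_outer])
    # Assumption: "noise of +-0.1 dex" is read as a standard deviation of 0.1 dex.
    return mean_feh + rng.normal(0.0, sigma, size=r.shape)

def solid_body_velocity(pos_pc, spin_axis, v_rot_kms, r_cluster_pc):
    """Velocity (km/s) added to each particle by solid-body rotation about spin_axis."""
    axis = np.asarray(spin_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Angular speed chosen so that the speed equals v_rot_kms
    # at a distance r_cluster_pc from the spin axis (assumption).
    omega = v_rot_kms / r_cluster_pc
    return omega * np.cross(axis, np.asarray(pos_pc, dtype=float))

rng = np.random.default_rng(42)
feh = cluster_feh([6.0, 10.0, 14.0], rng)
v_add = solid_body_velocity([[5.0, 0.0, 0.0]], [0.0, 0.0, 1.0], 0.5, 5.0)
```

The rotation term is meant to be added on top of the self-consistent Plummer velocities, with positions given relative to the cluster center.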

The mock data was generated using notebook 7. The particles contained in each cluster were distributed in a Plummer sphere using amuse (Portegies Zwart et al. 2009). To the resulting self-consistent velocity distribution we added the internal rotation depending on the position of each stellar particle with respect to the spin axis. The open cluster population can be easily queried¹² via:

SELECT GAVO_NORMAL_RANDOM(pmra, pmra_error) AS pmra_obs, -- noise added values
GAVO_NORMAL_RANDOM(pmdec, pmdec_error) AS pmdec_obs
FROM gedr3mock.main
WHERE popid = 11 -- selects only open cluster stars
-- takes about 10 minutes


This query was used to generate the data for Figure 1, where observational noise has already been added via the GAVO_NORMAL_RANDOM function.¹³ We show the proper motions for mock and GDR2 cluster members (Cantat-Gaudin & Anders 2020) in orange and blue, respectively. Although the real clusters and their mock counterparts differ on a star-by-star basis, their statistical properties are (by design) in overall agreement.

Finding and characterizing the mock clusters might be a good exercise to test the capabilities of detection methods to be used on the Gaia EDR3 data. If the user is not interested in those mock clusters, they can be excluded from a query via the statement:

WHERE popid != 11 -- de-selects the open clusters


If users would like to mock up their own N-body data, e.g., clusters including tidal tails, streams, or whole galaxies, they can adjust the procedure used in galaxia_wrap notebooks 6 and 7.

2.1.7. Galactic Warp and Flare

We update the parameterization of the warp, based on Gyuk et al. (1999), following Reylé et al. (2009). Their comparison to 2MASS starcounts reveals that the displacement of the mid plane, characterized by the term γ_warp in the expression

$$z_{\mathrm{warp}}(R) = \gamma_{\mathrm{warp}} \times (R - R_{\mathrm{warp}}) \times \sin(\phi - \phi_{\mathrm{warp}}) \tag{1}$$

needs to be lowered from 0.18 to 0.09. Here $z_{\mathrm{warp}}(R)$ denotes the height of the warp above the plane. The starting galactocentric radius of the warp, R_warp, is left at 8.4 kpc. For the warp angle, ϕ_warp, we change the value from 0° to 15° in line with Yusifov (2004), which had no major effect on the fitting in Reylé et al. (2009).
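As a quick numerical illustration of Equation (1) with the adopted values, the following sketch evaluates the mid-plane displacement. The function name is hypothetical, and we assume that the displacement is zero inside R_warp, which the text implies by calling it the starting radius but does not state explicitly.

```python
import numpy as np

def z_warp(r_kpc, phi_deg, gamma_warp=0.09, r_warp_kpc=8.4, phi_warp_deg=15.0):
    """Height of the warped mid-plane (kpc) following Equation (1)."""
    r = np.asarray(r_kpc, dtype=float)
    phi = np.radians(np.asarray(phi_deg, dtype=float) - phi_warp_deg)
    displacement = gamma_warp * (r - r_warp_kpc) * np.sin(phi)
    # Assumption: the disk is flat inside the starting radius of the warp.
    return np.where(r > r_warp_kpc, displacement, 0.0)

# 2 kpc beyond the starting radius, at the azimuth of maximum displacement
z_max = float(z_warp(10.4, 105.0))
```

Because γ_warp is dimensionless, the returned height has the same unit as the radius, here kpc.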

2.1.8. Thin- and Thick-disk Normalization

The various changes to the default galaxia MW model outlined above, especially to the SFR and IMF, result in a substantial change in the starcount distribution over the whole sky in our updated model. To calibrate the new model against GDR2 data we produced models with different thin- and thick-disk normalizations, i.e., we rescaled the density distribution of the underlying model by a linear factor for the thin and thick disk separately. We compared to local densities, which are based upon Jahreiß & Wielen (1997) data (see Table 4), and to global starcounts (see Figure 4). For the latter we applied HEALpix¹⁴-dependent G magnitude limits (Rybizki & Drimmel 2018; as explained in Section 3.2) to the mock and the real data and cut out the MCs. We also inspected how well the mock data fit the real data, using a Poisson likelihood based on binned CMDs per HEALpix, where the HEALpix level is variable in order to have a similar number of stars in each HEALpix (and therefore CMD). The exact procedure and algorithms can be looked up in galaxia_wrap notebook 5.¹⁵ A compromise between the overall starcounts, the local mass normalization, and the CMD likelihood was then chosen by eye,¹⁶ resulting in a thin disk normalization of 0.9 and a thick disk normalization of 0.8. The new thin disk normalization of 0.9 applies to all thin disk populations,¹⁷ i.e., popid ∈ [0, 1, 2, 3, 4, 5, 6].
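For readers unfamiliar with the CMD comparison, a generic Poisson log-likelihood for binned counts, ln L = Σ (n ln m − m − ln n!), can be sketched as follows. This is not the code of notebook 5: the function name and the floor on empty model bins are assumptions made for illustration.

```python
import numpy as np
from math import lgamma

def poisson_loglike(observed_counts, model_counts, floor=1e-3):
    """Poisson log-likelihood of observed CMD bin counts given model bin counts."""
    n = np.asarray(observed_counts, dtype=float).ravel()
    # Assumption: empty model bins are floored to avoid log(0).
    m = np.clip(np.asarray(model_counts, dtype=float).ravel(), floor, None)
    # ln(n!) via the log-gamma function
    log_factorial = np.array([lgamma(k + 1.0) for k in n])
    return float(np.sum(n * np.log(m) - m - log_factorial))

# Toy 2x2 binned CMD: a model matching the data scores higher than a poor one.
data = np.array([[5, 3], [2, 0]])
good = poisson_loglike(data, [[5.0, 3.0], [2.0, 0.1]])
poor = poisson_loglike(data, [[1.0, 9.0], [6.0, 4.0]])
```

Summing this quantity over all HEALpix gives one number per trial normalization, which can then be weighed against the starcount and local-density comparisons.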

2.2. PARSEC-COLIBRI Isochrones

The set of isochrones is the main astrophysical input that turns the underlying density distribution into mock stellar observations. Therefore we included the latest updates on these, as well as white dwarf tracks.¹⁸ The basic isochrones come from Marigo et al. (2017), and are built joining the PARSEC evolutionary tracks from Bressan et al. (2012) with the thermally pulsing asymptotic giant branch (AGB) from Pastorelli et al. (2019).¹⁹ To these tracks, we add WD tracks from Miller Bertolami (2016), using cooling sequences of initial metallicity Z = 0.01 from Renedo et al. (2010). These grids of white dwarfs have been extrapolated up to a final WD mass of 1.1 M_⊙ by using fitting relations. The derived isochrones are converted into the Gaia DR2 magnitudes by means of synthetic photometry performed with the YBC software (Chen et al. 2019),²⁰ in this case using the Weiler (2018) filter transmission curves for Gaia, which provides two BP bands, i.e., one for bright and one for faint magnitudes with a limit of G = 10.87 mag.

For the generation of Gaia photometry galaxia uses the complete isochrone set. In order to calculate the extinction in all bands (Gaia, SDSS, 2MASS, UBV) and the photometry in other systems (SDSS, 2MASS, UBV) we use a gridded version of the isochrones, reducing the total number of model stars from 8,102,858 to 243,238. We create a grid with the following number of bins (step size in parentheses) [boundaries in brackets]: [Fe/H] 36 (0.05 dex) [−1.5, 0.34]; $\log_{10}(T_{\mathrm{eff}})$ 162 (0.02) [2.45, 5.68]; $\log_{10}(L/L_{\odot})$ 217 (0.05) [−4.60, 6.24]. A combination of those three dimensions determines the index_parsec (LLLTTTFFF, L = lum, T = teff, F = feh). The median of all stars that fall into a specific gridpoint is taken and the standard deviation is also inspected.²¹ We report here the 50th and 99th percentiles of the standard deviation for all bins of this grid for the other stellar parameters: log(age) [0.03, 1.22]; initial mass [0.10, 2.96]; current mass [0.03, 4.14]; log(g) [0.04, 0.51]; G [0.04, 0.78]; G_BP [0.04, 1.12]; G_RP [0.04, 0.68]; G_RVS [0.04, 0.63]. These photometric bands and extinctions can be queried via a separate table: gedr3mock.parsec_props. An example is given in Section 6.5. Due to the nonlinear scaling of extinction with dust column density (reddening of an already reddened spectrum is weaker), extinction values are given for 6 different A0 values: 1, 2, 3, 5, 10, 20 mag. For the extinction law we used Cardelli et al. (1989) plus O'Donnell (1994), with R_V = 3.1, and higher order bolometric corrections have been taken into account. Links to the raw and reduced isochrone data are given in galaxia_wrap. Notebooks on the generation of the grid can be found here.²²

2.3. New 3D Extinction Map

An integral part of mock stellar catalog generation is the application of interstellar reddening due to dust, because most stars which would have a G magnitude brighter than 20.7 in the absence of dust have a fainter G magnitude after extinction has been added. We build upon our experience with the Bovy et al. (2016) combined dust map, which was put together using different 3D extinction maps (in order to get full-sky coverage). We replace the Bayestar 2015 map (Green et al. 2015) by Bayestar 2019 (Green et al. 2019) for $|b| < 20^\circ$ and by Bayestar 2017 (Green et al. 2018) above that (Bayestar 2017 has less clustering in low dust regions). Toward the Galactic center the combined map uses Marshall et al. (2006), which goes deeper since it is based on infrared data, whereas Bayestar requires photometric measurements in the optical as well. Parts that are not covered by the Pan-STARRS (Chambers et al. 2016) footprint are filled with Drimmel et al. (2003). See Figure 1 of Bovy et al. (2016) for the footprint of each map. The resolution was increased to HEALpix level 9 (nside = 512, area of 47 arcmin²) from HEALpix level 7 (nside = 128, 755 arcmin²) in GDR2mock, and the distance sampling was refined to 120 bins logarithmically sampled from 60 pc to 60 kpc.²³ The data cube is linked in galaxia_wrap and methods for application are present in the library/add_extinction.py file of galaxia_wrap. A₀ (monochromatic extinction at λ = 547.7 nm in mag) values are interpolated linearly in distance, while adjacent HEALpix values are not interpolated (the HEALpix footprint is visible, as are the borders between the different extinction maps).

As reported before, the extinction in specific bands (G, BP_bright, BP_faint, RP and RVS; these two BP bands will be merged in a later step, described in Section 2.4) has been precalculated for 6 different values of A₀: 1, 2, 3, 5, 10, 20 mag. In order to calculate the extinction in a specific band for a specific value of A₀ (that comes from the 3D extinction map), we make a cubic fit to those 6 values onto a finer grid between 0 and 20 mag and then interpolate linearly to the exact A₀ (this two step interpolation is a compromise between accuracy and speed, when operating with large arrays of extinction).²⁴ For values of A₀ that are larger than 20 mag, we linearly scale the value for A₀ = 20 mag. The procedure outlined above illustrates the steps taken in library/util.py:apply_extinction_curves() and gives the extinction in the respective photometric band which we add to the apparent magnitude of the unreddened stars as generated by galaxia.
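The two-step interpolation can be sketched as below. This is not a copy of library/util.py:apply_extinction_curves(): the function name, the size of the fine grid, and the reading of "cubic fit" as a least-squares cubic polynomial are assumptions, and the tabulated band extinctions in the example are synthetic.

```python
import numpy as np

A0_NODES = np.array([1.0, 2.0, 3.0, 5.0, 10.0, 20.0])  # mag, as listed in the text

def band_extinction(a0, a_band_at_nodes, n_fine=201):
    """Extinction in one band for arbitrary A0, given its values at A0_NODES."""
    a0 = np.asarray(a0, dtype=float)
    # Step 1: cubic fit to the six tabulated values, evaluated on a finer grid.
    # Assumption: "cubic fit" means a least-squares polynomial of degree 3.
    coeffs = np.polyfit(A0_NODES, a_band_at_nodes, deg=3)
    fine_a0 = np.linspace(0.0, 20.0, n_fine)
    fine_ext = np.polyval(coeffs, fine_a0)
    # Step 2: linear interpolation on the fine grid to the exact A0.
    ext = np.interp(np.clip(a0, 0.0, 20.0), fine_a0, fine_ext)
    # Beyond 20 mag the value at A0 = 20 mag is scaled linearly.
    return np.where(a0 > 20.0, fine_ext[-1] * a0 / 20.0, ext)

# Synthetic, mildly nonlinear band extinction at the six nodes (illustration only)
nodes_ext = 0.8 * A0_NODES - 0.005 * A0_NODES ** 2
a_band = band_extinction([0.5, 4.0, 25.0], nodes_ext)
```

In a production setting the fine grid would be computed once per isochrone gridpoint and reused, which is where the speed gain over repeated cubic evaluations comes from.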

2.4. Apparent Magnitude Cut

For GeDR3mock we compute all stars with G brighter than 20.7 mag using galaxia. Afterwards we apply our 3D extinction map and add absorption to each band. Thereafter we remove stars with G > 20.7 mag, which reduces the number of stars by a factor of 4. Until this point we have used the BP bright and faint bands separately, but now we use either of these as BP depending on whether the source is brighter or fainter than G = 10.87 mag (Weiler 2018). Note that some of the sources in our catalog can have BP or RP magnitudes much fainter than 20.7 mag in their respective bands. Thus, in order to retrieve sources from our catalog that would have BP and RP measurements in GDR2 (or Gaia EDR3), the user may want to apply cuts to our catalog on these magnitudes. We do not model BP or RP excess flux spilling over from nearby sources, which will brighten up those bands for faint stars in dense areas in the Gaia data.
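The cut and the merging of the two BP bands amount to a mask and a conditional selection. The sketch below uses a hypothetical function name and assumes that the extincted G magnitude decides which BP band is used.

```python
import numpy as np

G_LIMIT = 20.7       # mag, faint limit of the catalog
G_BP_SWITCH = 10.87  # mag, boundary between the bright and faint BP passbands

def merge_bp_and_cut(g, bp_bright, bp_faint):
    """Pick the BP band by G magnitude, then drop stars fainter than the G limit."""
    g = np.asarray(g, dtype=float)
    # Assumption: the extincted G magnitude selects the BP passband.
    bp = np.where(g < G_BP_SWITCH, bp_bright, bp_faint)
    keep = g <= G_LIMIT  # stars with G > 20.7 mag are removed
    return g[keep], bp[keep]

g_out, bp_out = merge_bp_and_cut([9.0, 15.0, 21.0],
                                 [9.1, 15.1, 21.1],
                                 [9.2, 15.2, 21.2])
```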

2.5. Uncertainty Model

In GDR2mock we used the pre-launch nominal sky-averaged error model. This underestimates uncertainties, especially in the bright regime. To simulate the Gaia measurement metrics (e.g., number of observations, parallax or photometric uncertainties) more accurately for GeDR3mock, we use GDR2 data to fit a predictive model of a metric as a function of parameters that we can simulate from galaxia (e.g., magnitude, color, position). Specifically, we select 0.5% of GDR2 data at random and use this to train ExtraTree models (Geurts et al. 2006; Pedregosa et al. 2011). The first model uses Galactic longitude and latitude as inputs to predict the number of visibility_periods_used (VPU) and phot_g_n_obs (NOBS). These are multiplied by 34/22 (the longer baseline of Gaia EDR3 compared to GDR2) and rounded to the nearest integer. We then train separate models with G, BP-RP, VPU and NOBS (all values are still coming from GDR2) as inputs to predict the parallax_error and phot_g_mean_mag_error (using approximate values computed from the symmetrized flux uncertainties). These are then scaled by $\sqrt{22/34}$ to account for the longer baseline. This scaling factor assumes that the dominant noise factor is source noise rather than systematics, which is not actually the case at the bright end. The factor of $\sqrt{22/34}$ is the factor for flux uncertainties, not magnitude uncertainties. Similarly, the radial_velocity_error is predicted from G, BP-RP, and Teff and the same rescaling is applied. The procedure outlined above can be inspected in notebook 8.
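The baseline rescaling itself is simple arithmetic and can be written out as follows. The function names are hypothetical; the ExtraTree predictions are not reproduced here, and np.rint rounds exact halves to the nearest even integer, which may differ from the rounding used in notebook 8.

```python
import numpy as np

BASELINE_GDR2 = 22.0  # months
BASELINE_EDR3 = 34.0  # months

def scale_counts(n_gdr2):
    """Scale GDR2 observation counts (VPU, NOBS) to the EDR3 baseline."""
    scaled = np.asarray(n_gdr2, dtype=float) * BASELINE_EDR3 / BASELINE_GDR2
    return np.rint(scaled).astype(int)

def scale_errors(err_gdr2):
    """Scale GDR2 uncertainties by sqrt(22/34), assuming source-noise dominance."""
    return np.asarray(err_gdr2, dtype=float) * np.sqrt(BASELINE_GDR2 / BASELINE_EDR3)

nobs_edr3 = scale_counts([10, 22, 200])
parallax_error_edr3 = scale_errors([0.1, 1.0])
```

The square-root factor is about 0.80, so the predicted EDR3 uncertainties are roughly 20% smaller than their GDR2 counterparts.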

From NOBS we derive NOBS for BP and RP by fitting a linear relation that minimizes the least squares. We similarly derive the photometric uncertainties in the other bands from linear relations on phot_g_mean_mag_error, and uncertainties in positions and proper motions from linear relations²⁵ on parallax_error. The fitted relations are listed in Table 1 and the procedure to obtain those values can be inspected in notebook 8a. We account for the ${\left(22/34\right)}^{1.5}$ uncertainty scaling for the proper motions. We produce the mock errors this way in order to save storage in the ADQL database, because simple scaling relations with other columns do not require additional space.

Table 1. Empirical Scaling Relations that Evaluate the Quantity in the First Column as a Function of the Quantity (from GDR2) in the Second Column

Derived Quantity	Scaling Relation
phot_bp_n_obs	0.092 phot_g_n_obs
phot_rp_n_obs	0.096 phot_g_n_obs
phot_bp_mean_mag_error	19.85 phot_g_mean_mag_error
phot_rp_mean_mag_error	9.12 phot_g_mean_mag_error
pmra_error	1.71 parallax_error
pmdec_error	1.52 parallax_error
ra_error	0.81 parallax_error
dec_error	0.75 parallax_error

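Applied client-side, the relations of Table 1 reduce to a lookup of one factor per derived column. The helper below is a hypothetical sketch that applies the factors exactly as tabulated; it makes no statement about whether the proper-motion factors already contain the baseline scaling.

```python
# Factors copied from Table 1: derived column -> (source column, factor)
TABLE1_FACTORS = {
    "phot_bp_n_obs": ("phot_g_n_obs", 0.092),
    "phot_rp_n_obs": ("phot_g_n_obs", 0.096),
    "phot_bp_mean_mag_error": ("phot_g_mean_mag_error", 19.85),
    "phot_rp_mean_mag_error": ("phot_g_mean_mag_error", 9.12),
    "pmra_error": ("parallax_error", 1.71),
    "pmdec_error": ("parallax_error", 1.52),
    "ra_error": ("parallax_error", 0.81),
    "dec_error": ("parallax_error", 0.75),
}

def derive_table1_columns(row):
    """Return the derived quantities of Table 1 for one catalog row (a dict)."""
    return {name: factor * row[source]
            for name, (source, factor) in TABLE1_FACTORS.items()}

# Illustrative input values, not taken from the catalog
example = derive_table1_columns({"phot_g_n_obs": 100,
                                 "phot_g_mean_mag_error": 0.001,
                                 "parallax_error": 0.1})
```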

2.6. Catalog Entries are Reported Noise-free

All quantities that we report in GeDR3mock are noise-free. Noise can be added based on the uncertainty estimates derived in Section 2.5 from within ADQL: see the example in Section 6. As in the GDR2 data model, GeDR3mock contains the phase space distribution in the following observables: ra, dec, l, b, parallax, pmra, pmdec, radial_velocity and similarly for the photometry, though we add an extra G_RVS column. A few stellar parameters have been estimated in GDR2 (Andrae et al. 2018) and these are reported together with all the other known quantities in GeDR3mock: teff_val, ag_val, a_g_val, e_bp_min_rp_val, radius_val, lum_val, feh, a0, initial_mass, current_mass, age, logg, popid, a_bp_val, a_rp_val, a_rvs_val. The column descriptions can be inspected here.²⁶
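For data that have already been downloaded, noise can also be added on the client side. The sketch below mirrors what a Gaussian draw per row does; the function name is hypothetical and the input values are illustrative, not taken from the catalog.

```python
import numpy as np

def add_gaussian_noise(values, errors, seed=None):
    """Draw one noisy realization per entry from N(value, error)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    return rng.normal(loc=values, scale=errors)

# Illustrative noise-free parallaxes (mas) and their uncertainties (mas)
parallax_true = np.array([1.0, 0.5, 2.0])
parallax_error = np.array([0.05, 0.20, 0.02])
parallax_obs = add_gaussian_noise(parallax_true, parallax_error, seed=1)
```

Fixing the seed makes a noisy realization reproducible, which is convenient when comparing methods on the same mock sample.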

3. Selection Function

Here we explain and investigate two effects that prevent stars from entering into the real Gaia catalog, even though they would be brighter than G = 20.7 mag and therefore inside the GeDR3mock catalog. For a proper comparison between mock and data, these selection effects should be taken into account.

3.1. Contrast Sensitivity

When two sources in Gaia are close to each other, the fainter one might not get allocated an observational window by Gaia, depending on their separation and magnitude difference (de Bruijne et al. 2015). This effect, dubbed "contrast sensitivity," has been quantified to some degree for GDR2 by Brandeker & Cataldi (2019). We used their Table 1 to calculate, for each source in GeDR3mock, its probability to be seen, which we call "visibility." We compute and add to GeDR3mock a quantity d11y that gives an integer from 0 to 100, where 0 means no visibility. The ADQL query that pre-selects close pairs and calculates d11y is linked to the galaxia_wrap repository. As can be seen from Table 2, 69 million sources have issues with too bright and too near neighbors. When accounting for the magnitude limits from GDR2 this number drops to 33 million. GeDR3mock does not include binaries or globular clusters, which would otherwise increase those numbers.

Table 2. Number of Sources in GeDR3mock with Certain Visibility Values, for all Sources (Second Column), and for Sources Brighter than the Magnitude Limits Given in Table 3 (Third Column)

Visibility (%)	GeDR3mock (Million)	GeDR3mock with G Maglim (Million)
0	34	16
1–50	20	10
51–99	15	7
100	1505	1304


3.2. Magnitude Limit of GDR2

The effective magnitude limit along a line of sight can be shifted toward brighter magnitudes by a combination of crowding (Gaia Collaboration et al. 2016) and a limited number of scans. The latter can, for faint sources, drop below the number of observations required for specific Gaia data products to be included in a release, e.g., parallax (Lindegren et al. 2018), G, BP, RP, or RVS. This magnitude limit can be approximated by the mode of the magnitude distribution within a specific HEALpix. When, in the following, we speak of the magnitude limit, we refer to the mode of the magnitude distribution binned in 0.1 mag bins. To illustrate how this manifests itself in the real data we show in Figure 2 such maps for GDR2 for G < 20.7. The top panel shows the magnitude limits when only requiring G measurements. We see that in the bulge and Magellanic Clouds the magnitude limits are brighter than everywhere else. Away from the disk the limit becomes rather noisy. In the middle panel we require that a parallax measurement be available. This makes the bulge limits brighter, and satellite scanning patterns become visible. In the bottom panel we show the same map again (requiring parallax measurement), but this time we only set the limit from the G-magnitude distribution for a HEALpix if it has more than 1e5 sources per deg². In all other HEALpix the limits are set to 20.7.

As we can see, the mode estimator (upper panel of Figure 2) has two main failure modes: (a) the starcount in a specific HEALpix is low, such that the magnitude distribution gets noisy due to Poisson sampling, and (b) a peak in the magnitude distribution is produced by some localized stellar population in a distant overdensity, e.g., red clump stars in the Magellanic Clouds, that is not characteristic of the crowding limit. An easy fix for (a) is to only apply the magnitude limits in dense areas and to set the magnitude limit to 20.7 in all HEALpix that have a small stellar density. This is what we do in the bottom panel of Figure 2 for a density threshold of 1e5 sources per deg².²⁷
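A minimal version of the mode estimator with the optional density threshold could look as follows. It is a sketch, not the gdr2_completeness implementation: the function name is hypothetical, and we assume that the density is computed from the sources with G < 20.7 mag in the pixel.

```python
import numpy as np

def magnitude_limit(g_mags, pixel_area_deg2, g_max=20.7, density_threshold=None):
    """Approximate the limiting magnitude of one HEALpix as the mode of its G distribution."""
    g = np.asarray(g_mags, dtype=float)
    g = g[g < g_max]
    if g.size == 0:
        return g_max
    # Assumption: the density uses the sources with G < g_max in this pixel.
    if density_threshold is not None and g.size / pixel_area_deg2 <= density_threshold:
        return g_max  # sparse pixel: fall back to the nominal limit
    # Bin to 0.1 mag by rounding to one decimal, then take the most populated bin.
    bins, counts = np.unique(np.round(g, 1), return_counts=True)
    # In case of ties np.argmax returns the brightest of the tied bins.
    return float(bins[np.argmax(counts)])

# Toy pixel: the most populated 0.1 mag bin is at G = 19.0
toy = [19.0] * 5 + [18.0] * 3 + [20.0] * 2
limit_mode = magnitude_limit(toy, pixel_area_deg2=0.84)
limit_thresholded = magnitude_limit(toy, pixel_area_deg2=0.84, density_threshold=1e5)
```

With only ten sources the toy pixel is far below the threshold, so the thresholded variant returns 20.7 mag while the plain mode returns 19.0 mag.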

To illustrate those failure modes further we plot in Figure 3 the G magnitude distributions for three different HEALpix at level 6, namely toward Baade's window, the LMC, and a low density field at l = 20 and b = 30. The following query exemplifies the data acquisition for Figure 3:

SELECT COUNT(*) AS ct, ROUND(phot_g_mean_mag, 1) AS mag
FROM gaia.dr2light
WHERE source_id BETWEEN 4657847914607935488 AND 4657988652096290815
-- healpix level 6 pointing on Baade's window
GROUP BY mag


**Figure 3.** GDR2 G magnitude distributions in different directions of the sky. The pointings are toward Baade's window, the LMC, and a low density field at l = 20, b = 30. Each curve corresponds to one HEALpix at level 6. From these magnitude distributions we approximate the limiting magnitude by taking the mode.

We can see how the red clump peak (blue points) in the LMC at around G = 19 mag can yield an incorrect magnitude limit estimate. The low density field (red points) is not yet too noisy, so the mode of the distribution is still a good estimator for the magnitude limit, but one can see how the Poisson noise in the magnitude distribution can produce modes at brighter magnitudes if the stellar density gets even lower or the HEALpix level increases. In Baade's window (green points) a brighter magnitude limit of about G = 19 mag is reached and sources peter out beyond that. The red clump peak in the luminosity function of Baade's window at G = 16 mag is clearly visible, but does not bias our magnitude limit estimation in this particular case, since the mode is at still fainter magnitudes.

We provide both variants, i.e., with and without a density threshold applied, in the tables; the latter has no suffix and the former has _density_threshold added. We also provide magnitude limits for BP (also under the condition that G < 20.7). For each band the magnitude limit comes in four flavors (and each flavor is available with and without the density threshold applied):

(1) G < 20.7 mag (applies to all variants)
(2) parallax measurement is available
(3) BP and RP measurement is available
(4) parallax, BP and RP are available.

In Table 3 we list the number of sources that are included in the G magnitude limits for both GeDR3mock and GDR2. In total, GeDR3mock has 1573 M sources compared to 1451 M in GDR2.²⁸ When considering only sources that are brighter than the HEALpix-dependent magnitude limit the numbers are more similar. The mock catalog has more sources than GDR2 because of the density limit of the Gaia instrument of about 1.05 M sources deg⁻² (Gaia Collaboration et al. 2016). The highest density area in GeDR3mock has 5.6 M sources deg⁻². An illustration of this can be seen in Figure 8, the CMD of Baade's window, where the magnitude limit that also requires the existence of color and parallax measurements is depicted too.

Table 3. Number of Sources in GeDR3mock and GDR2 for G < 20.7 mag

Magnitude Limit (Column Name)	GeDR3mock (Million)	GDR2 (Million)
No magnitude limit	1573	1451
maglim_g	1332	1321 (131)
maglim_g_parallax	1146	1100 (168)
maglim_g_color	1231	1123 (131)
maglim_g_parallax_color	1111	1012 (158)
maglim_g_density_threshold	1361	1358 (94)

Note. Starcounts are shown for stars brighter than the limiting G magnitude given in the online table gedr3mock.maglim_6. In parentheses the numbers are given for the stars that are outside of the magnitude limit but are still brighter than G = 20.7 mag and fulfill the selection criteria, e.g., need parallax measurement for maglim_g_parallax.


We generated HEALpix maps of those magnitude limits for HEALpix levels 5, 6, and 7 (nside = 32, 64, and 128, with pixel areas of 3.36, 0.84, and 0.21 deg², respectively) using the gdr2_completeness package²⁹ (Rybizki & Drimmel 2018). They can be accessed via gedr3mock.maglim_X, where X is the HEALpix level.
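The quoted pixel areas follow directly from the HEALPix geometry, in which level k corresponds to nside = 2^k and 12 nside² equal-area pixels over the full sky. The helper names in this short check are hypothetical.

```python
import math

# Full sky in square degrees, about 41253 deg^2
FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2

def healpix_nside(level):
    """nside of a HEALPix tessellation at the given level."""
    return 2 ** level

def healpix_npix(level):
    """Number of equal-area pixels at the given level."""
    return 12 * healpix_nside(level) ** 2

def healpix_pixel_area_deg2(level):
    """Area of one pixel in square degrees."""
    return FULL_SKY_DEG2 / healpix_npix(level)

areas = {level: healpix_pixel_area_deg2(level) for level in (5, 6, 7)}
```

The same functions give about 47 arcmin² for level 9, the resolution of the extinction map.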

Notebooks 3 and 4 of gdr2_completeness illustrate how to generate those maps. We encourage users to produce maps for their specific use cases, e.g., accounting for quality cuts or for the existence of measurements such as radial velocity.

We did not provide RP magnitude limits because those are mainly governed by the condition that G is brighter than 20.7 mag. Since RP is usually brighter than G, sources are usually lost because they get too faint in G, not because they get too faint in RP.

We also advise caution when using the BP magnitude limit, because in dense areas faint sources can acquire very bright BP (and RP) magnitudes due to flux contamination from neighboring sources, something that is not modeled in GeDR3mock. BP maps might still be useful when comparing to other data or when modeling the BP and RP flux excess.

More details on all bands and a comparison to GeDR3mock magnitude limits can be found in Appendix A.

Once the real data, Gaia EDR3, comes out we will provide updated magnitude limit maps in the TAP service.

An example of how to count all stars in GDR2 that are brighter than the maglim_g magnitude limit for HEALpix level 6 is given below.

SELECT COUNT(*)
FROM gaia.dr2light AS g
JOIN gedr3mock.maglim_6 AS lim
ON (g.source_id/140737488355328 = lim.hpx)
-- matches catalogs on HEALpix number (level 6)
WHERE phot_g_mean_mag < lim.maglim_g
-- takes about 1 to 2 hours
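The join condition in the query above maps each source onto its level-6 HEALpix by integer division of the source_id and then compares its G magnitude with the limit of that pixel. The following Python sketch illustrates the same logic on a small in-memory sample; the function `count_within_limit` and all input values are hypothetical and are not part of the catalog or of any package.

```python
def count_within_limit(sources, maglim_by_hpx, level=6):
    """Count sources brighter than the magnitude limit of their HEALPix.

    sources: iterable of (source_id, phot_g_mean_mag) tuples
    maglim_by_hpx: dict mapping HEALPix index -> limiting G magnitude
    """
    divisor = 2 ** 35 * 4 ** (12 - level)  # 140737488355328 for level 6
    count = 0
    for source_id, g_mag in sources:
        limit = maglim_by_hpx.get(source_id // divisor)
        # Sources in pixels without a limit are skipped, like rows dropped by an inner join.
        if limit is not None and g_mag < limit:
            count += 1
    return count


# Made-up example values, not catalog data.
divisor = 2 ** 35 * 4 ** 6
sources = [(10 * divisor + 5, 19.0), (10 * divisor + 9, 20.6), (11 * divisor, 18.0)]
maglim = {10: 20.0, 11: 17.5}
print(count_within_limit(sources, maglim))  # 1
```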


A python package with a more rigorous method providing completeness as a function of magnitude per HEALpix (Boubert & Everall 2020) can be found here.³⁰ One drawback of this is that the magnitude limits seem to depend on the authors' all-sky partition into equal density areas.

4. Comparison to GDR2

As a first quality assessment and to get an overview of the catalog parameters, we compare GeDR3mock with GDR2. This also serves to illustrate how the catalog can be queried via TAP services using ADQL queries.

4.1. Sky Distribution

In order to compare the source density over the sky between GeDR3mock and GDR2, we apply the contrast sensitivity and the magnitude limits from the previous section to GeDR3mock:

SELECT COUNT(*) AS ct, hpx
FROM gedr3mock.main AS g
JOIN gedr3mock.maglim_6 AS lim
ON (g.source_id/140737488355328 = lim.hpx)
WHERE phot_g_mean_mag < lim.maglim_g_density_threshold
AND d11y - RANDOM()*100 > 0
-- samples the visibility probability
GROUP BY hpx ORDER BY hpx
-- starcounts per hpx are returned
-- takes about 1 to 2 hours (varies with load)


This returns 1336 M stars for GeDR3mock. The corresponding magnitude-limited query for GDR2 returns 1358 M stars.
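The condition d11y - RANDOM()*100 > 0 in the query above keeps each star with a probability equal to its visibility d11y (given in percent). A minimal Python sketch of this sampling step follows; the helper name `is_visible` and the example values are ours, and we assume that RANDOM() is uniform in [0, 1).

```python
import random


def is_visible(d11y_percent, rng):
    """Keep a star with probability d11y_percent / 100 (illustrative helper).

    Mirrors the ADQL condition d11y - RANDOM()*100 > 0, assuming that
    RANDOM() is uniformly distributed in [0, 1).
    """
    return d11y_percent - rng.random() * 100.0 > 0


rng = random.Random(42)  # fixed seed so that the sketch is reproducible
visibilities = [100.0, 100.0, 50.0, 0.0]  # made-up example values, not catalog data
kept = [d for d in visibilities if is_visible(d, rng)]
```

Stars with d11y = 100 are always kept and stars with d11y = 0 are always dropped. Because the draw is random, repeated runs of the ADQL query return slightly different starcounts.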

In Figure 4 we show the stellar densities in HEALpix level 6 for GeDR3mock and GDR2 and a comparison map at the bottom. GDR2 has more sources toward the poles, but overall the agreement is reasonably good. Globular clusters and the Sagittarius stream are visible, and the Magellanic Clouds show more structure in GDR2. When looking at the comparison map, we see the footprint of the Marshall et al. (2006) extinction map transitioning into Drimmel et al. (2003) (see Figure 1 of Bovy et al. 2016), where there is a discrete jump in the model starcounts (color getting redder) toward the right in the galactic plane. The warp is more prominent in the model, and the bulge structure is not well reproduced. The fit to the Magellanic clouds is poor, owing to the simplistic Gaussian distribution in GeDR3mock.

4.2. Color–Magnitude Diagram (CMD)

Another insightful test is the CMD comparison. Here we do not apply HEALpix-dependent magnitude limits to either catalog, as those do not change the basic structure of the distributions. The query is:

SELECT COUNT(*) AS N,
AVG(phot_bp_rp_excess_factor) AS excess,
ROUND(phot_bp_mean_mag - phot_rp_mean_mag, 2) AS color,
ROUND(phot_g_mean_mag, 1) AS mag
FROM gaia.dr2light TABLESAMPLE(50)
-- this only uses 50% of the data
WHERE phot_g_mean_mag < 20.7
GROUP BY color, mag
-- this query takes between 1 and 2 hours


We ran this query for GDR2 and an analogous one for GeDR3mock.³¹ These queries count the stars and average the phot_bp_rp_excess_factor³² in color-magnitude bins (the excess factor has not been modeled in GeDR3mock). The data are shown in Figure 5, where the density distribution is given for GeDR3mock and GDR2 in the left and middle panels, respectively. We see that GDR2 lacks sources³³ below the gray dashed line. The line indicates where the number of stars drops sharply when cutting on G_BP < 22 mag. It appears to mark the limit beyond which the bulk of the sources is lost in GDR2 (with G < 20.7 mag). In the right panel of Figure 5 we see that sources below that line suffer from contaminated BP and RP measurements. Similarly, the very blue stars in the GDR2 data have no counterpart in GeDR3mock. Again, these stars have a high phot_bp_rp_excess_factor, which is not modeled in GeDR3mock. The other structures in the CMD are fairly well reproduced. With respect to the catalog selection function, there are only 1.6 M sources in GDR2 (5 M if including sources with G > 20.7) with G_BP > 22 mag, while GeDR3mock has 36 M.

**Figure 5.** Color–magnitude diagram for GeDR3mock in the left and GDR2 in the middle panel, color-coded by number of sources per CMD bin. The right panel shows again the GDR2 CMD but this time the average `phot_bp_rp_excess_factor` per bin is depicted. The dashed gray line (its functional form is $G=-0.9(\mathrm{BP}-\mathrm{RP})+23$) indicates a sharp limit in GeDR3mock and GDR2, below which no stars are left if we cut on $G_{\mathrm{BP}}<22$ mag.

5. Catalog Content and Limitations

The catalog contains 1,573,457,319 stars. It is hosted at GAVO³⁴ and can be queried via gedr3mock.main. For example queries, see Section 6. A bulk download is also available.³⁵

5.1. Data Model and Catalog Content

Our catalog, by design, mimics the GDR2 data model, which will be similar in Gaia EDR3. Some fields are filled with NULLs rather than omitted so that GDR2 ADQL queries do not throw errors. Values like phot_bp_rp_excess_factor or ruwe are not easy to model because they depend on the actual measurement, but one could train models on the real data to predict those values for the mock catalog, using the method presented in Section 2.5 (notebook 8).

Entries in GeDR3mock that have no counterpart in the GDR2 data model are now explained:

1. phot_g_mean_mag_error: For convenience we provide magnitude errors for all photometric bands. These are only good approximations of the flux error for small values.
2. phot_rvs_mean_mag: Since we have the isochrone models with an approximate RVS band,³⁶ we also provide the RVS magnitude (simply computed assuming a Vegamag zero-point) because it is useful for selecting magnitude-complete RVS samples.
3. popid: The popid from the Besançon model (see Table 4; halo = 8 and bulge = 9), with the additional values Magellanic Clouds = 10 and open clusters = 11.
4. d11y: The visibility, given in percent. It can be lower than 100 due to bright sources in the near vicinity (see Section 3.1).
5. index_parsec: An index for joining the main mock catalog to other photometric bands/extinctions in the gedr3mock.parsec_props table.
6. a_bp_val, a_rp_val, a_rvs_val: Extinctions in the specified bands, in analogy to a_g_val in the G band.
7. source_id: The most significant bits identify the HEALpix number, as with the Gaia source_id. The rest of the source_id is a running number. The source_id can easily be turned into the HEALpix number for any HEALpix level, n, smaller than or equal to 12 (level 12 corresponding to Nside = 4096) via division:
$$\mathrm{HEALpix}(\mathrm{level}=n)=\mathrm{FLOOR}\left(\frac{\mathtt{source\_id}}{2^{35}\times 4^{(12-n)}}\right). \tag{2}$$
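As an illustration of Equation (2), the following Python sketch converts a source_id into its HEALpix index. The function name `source_id_to_healpix` is ours; the example source_id is the lower bound used in the query of Section 6.2, which belongs to HEALpix 7876 at level 5.

```python
def source_id_to_healpix(source_id, level):
    """HEALPix index at `level` encoded in a Gaia-style source_id (illustrative helper)."""
    if not 0 <= level <= 12:
        raise ValueError("level must be between 0 and 12")
    # Integer (floor) division implements the FLOOR in Equation (2) exactly;
    # floating-point division could lose precision for 64-bit source_ids.
    return source_id // (2 ** 35 * 4 ** (12 - level))


print(source_id_to_healpix(4433793833146253312, 5))  # 7876
```

Using integer arithmetic throughout avoids rounding problems, since source_ids exceed the range in which double-precision floats represent integers exactly.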

A few additional stellar parameters are not listed above but can be found in Section 2.6. General information on the catalog and its columns can be inspected here.³⁷

5.2. Limitations

The underlying Galaxy model is a simple approximation of reality with known shortcomings; see the lower panel of Figure 4 and the discussion thereof in Section 4. There have been improvements in the thick disk, halo (e.g., Robin et al. 2014) and bulge (Robin et al. 2012b) components of the Milky Way model, but these updates did not build on each other, so we decided to stay with the basic model update from Czekaj et al. (2014). The LMC and SMC have only Gaussian distributions with an inconsistent velocity prescription. We only simulate single stars. The star formation in GeDR3mock is smooth (not clumpy) and independent of the 3D extinction model; therefore the two do not show the correlations one observes in the real Milky Way. The all-sky 3D extinction map is up-to-date but not perfect, especially where different maps have been joined together.

5.3. Updates when Gaia EDR3 is Released

We plan to update our mock catalog after the release of Gaia EDR3, foreseen for late 2020. The update will contain magnitude limit maps as well as error, nobs, ruwe and contrast sensitivity columns based on Gaia EDR3 data. As some of those columns already exist, we will add abbreviations to the new column names indicating that these were derived using Gaia EDR3 data. Updates to GeDR3mock will be announced here.³⁸

5.4. Extension to GDR3 Content

The "full" Gaia DR3 currently planned for late 2021 will include many more data products. To assist the use and analysis of that catalog, we plan to augment GeDR3mock in a follow-up study with:

1. binaries
2. galaxies and quasars
3. models of BPRP and RVS spectra (if publicly available)
4. chemical abundances using chemical evolution models.

6. Example Use Cases with ADQL Queries

6.1. Distance Prior

The user might be interested in producing a distance prior for the GDR2 RVS sample to be used in a Bayesian parameter estimation similar to the distance estimation in Bailer-Jones et al. (2018b) (see also McMillan 2018). The following query mimics the GDR2 RVS sample selection and returns the mean distance per HEALpix:

SELECT AVG(1000/parallax) AS mean_distance,
ivo_healpix_index(5, ra, dec) AS healpix
FROM gedr3mock.main
WHERE phot_rvs_mean_mag < 12
AND teff_val < 6900
AND teff_val > 3550
-- selection mimicking RVS sample
GROUP BY healpix
-- takes about half an hour


The function ivo_healpix_index(5, ra, dec) shown here computes HEALpix indices based on R.A. and decl. For Gaia and related data products this is in general not necessary, because by construction of the source_id column one can obtain the HEALpix (in this case, of order 5) somewhat faster by computing FLOOR(source_id/$\left(2^{35}\times 4^{(12-5)}\right)$), as in Equation (2). The function might nevertheless be useful for tables without source_id.

We cannot use the statement "WHERE radial_velocity IS NOT NULL" because in GeDR3mock all radial velocities are known. Therefore the selection function needs to be approximated.

Figure 6 shows the mean distance per HEALpix, which could be used directly as a prior parameterization. GeDR3mock returns 7.1 M sources, more than the 5.3 M that GDR2 has below G_RVS = 12 (G_RVS needs to be approximated using Equations (2) and (3) from Gaia Collaboration et al. 2018). The reason is that the effective magnitude limit is brighter in the dense parts of the sky, so cutting on G_RVS < 12 is only a first-order approximation. For refinement we recommend producing a custom magnitude limit map for the RVS sample using the gdr2_completeness package.

**Figure 6.** Mean distances over the sky in galactic coordinates in GeDR3mock with $G_{\mathrm{RVS}}<12$ and $3550<\mathtt{teff\_val}<6900$. The color encodes mean distance logarithmically. In total this selection returns 7.1 M sources.

6.2. Parallax Uncertainty

Because measured parallaxes can have very large uncertainties, the distribution of measured parallaxes can be quite different from that of mock (true) parallaxes. We show this for HEALpix 7876 (at level 5), which is a low-density out-of-plane field at l = 20°, b = 30°. GDR2 contains 46k sources (G < 20.7) and GeDR3mock has 39k sources in that HEALpix. Figure 7 shows, from left to right, the inverted parallax versus G magnitude for GeDR3mock, for GeDR3mock with parallax noise added, and for GDR2. In the absence of measurement uncertainty (left panel) we see a bimodal distribution in parallax at G = 20.7: the peak at 1 kpc consists mainly of lower main sequence stars, while the one at about 8 kpc consists mainly of upper main sequence and turn-off stars. These two sequences merge when the parallax uncertainty is added (the G magnitude error is negligible in this diagram). Similarly, the diagonal line in the top right of the three panels, which corresponds to the red clump, becomes blurred once noise is added. Noise can be added from within ADQL using:

SELECT parallax, phot_g_mean_mag,
GAVO_NORMAL_RANDOM(parallax, parallax_error) AS parallax_obs
FROM gedr3mock.main
WHERE source_id BETWEEN 4433793833146253312 AND 4434356783099674623
-- only a low-density HEALpix of level 5
-- takes a few seconds


The numbers in the "WHERE" statement are $2^{35}\times 4^{(12-5)}\times 7876$ and $2^{35}\times 4^{(12-5)}\times 7877-1$. The above ADQL query, together with the analogous query for GDR2, produces the data for Figure 7.
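The source_id bounds of a HEALpix and the noise step can be reproduced offline. The sketch below is a minimal Python illustration: the helper names are ours, and we assume that GAVO_NORMAL_RANDOM(mu, sigma) draws from a normal distribution with mean mu and standard deviation sigma, which is here emulated with the standard-library function random.gauss.

```python
import random


def source_id_range(healpix, level):
    """Inclusive (first, last) source_id of one HEALPix at the given level."""
    width = 2 ** 35 * 4 ** (12 - level)
    return healpix * width, (healpix + 1) * width - 1


def add_gaussian_noise(value, sigma, rng):
    """Draw one 'observed' value from a normal distribution N(value, sigma)."""
    return rng.gauss(value, sigma)


first, last = source_id_range(7876, 5)
print(first, last)  # 4433793833146253312 4434356783099674623

rng = random.Random(0)  # fixed seed for reproducibility
parallax_obs = add_gaussian_noise(1.0, 0.5, rng)  # made-up parallax and error in mas
```

Selecting on a source_id range is equivalent to selecting on the HEALpix index and lets the database use the index on source_id, which is presumably why the query above completes within seconds.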

**Figure 7.** Inverted parallaxes (in mas) vs. G magnitude for GeDR3mock, GeDR3mock with noise added, and GDR2 (panels from left to right).

6.3. CMD in Baade's Window—Magnitude Limit

Here we look at a CMD in a high density area, namely Baade's window. This time we add noise to the mock photometry and compare to GDR2 data. The G magnitude limit, when parallax and BP and RP measurement are required, is 18.9 mag. We only query in a circle of radius 0.1° to keep the runtime short (query runs in synchronous mode). GDR2 contains 13k sources, whereas GeDR3mock contains 134k sources. When applying the magnitude cut, these numbers change to 12k and 18k, respectively. The GeDR3mock data for Figure 8 comes from the following query:

SELECT phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
GAVO_NORMAL_RANDOM(phot_g_mean_mag, phot_g_mean_mag_error) AS g_obs,
GAVO_NORMAL_RANDOM(phot_bp_mean_mag, phot_bp_mean_mag_error) AS bp_obs,
GAVO_NORMAL_RANDOM(phot_rp_mean_mag, phot_rp_mean_mag_error) AS rp_obs
FROM gedr3mock.main
WHERE DISTANCE(270.879, -30.022, ra, dec) < 0.1
-- takes a few seconds


The left (blue) plume contains upper main sequence and turn-off stars, while the right (red) plume contains giant stars. The overdensity at G = 16 mag is the red clump (see Figure 3). Both plumes seem to have merged at fainter magnitudes in GDR2, whereas even with noise applied they remain distinct in the mock catalog. Only at fainter magnitudes does the noise become visible, as seen in the spreading in color.

**Figure 8.** CMD of Baade's Window (a circle of radius 0.1°) for GeDR3mock, GeDR3mock with noise added, and GDR2 (panels from left to right). The empirically determined G magnitude limit is indicated as a gray line. The density distribution is renormalized above the magnitude limit.

6.4. Local 50 pc Sample

The local normalization, i.e., the local stellar mass density, is a common benchmark for Galaxy models. We query the 50 pc sample using:

SELECT initial_mass, current_mass, age, popid,
feh, bp_rp, phot_g_mean_mag, parallax,
GAVO_NORMAL_RANDOM(parallax, parallax_error) AS parallax_obs
FROM gedr3mock.main
WHERE 1/parallax < 0.05
-- takes about 10 minutes


This returns 49,934 sources.

In Table 4 we compare our local mass density to Model B of Czekaj et al. (2014, their Table 7). The mass values agree well, except that the thick disk is only about 5% of the local mass density, compared to their 9%. Figure 9 shows the age distribution of the local 50 pc sample. The piecewise flat but exponentially decreasing SFR (for thin-disk popid 0 to 6) is visible, as well as a local overdensity of very young (dynamically cold) stars. All stars of the 50 pc sample are depicted in Figure 10 together with their respective metallicities (color coded). The sample contains 4162 white dwarfs (WDs), for which the current mass is much lower than the initial mass. Extrapolating to a 100 pc sample, 10,856 of these WDs would be within the completeness range of Jiménez-Esteban et al. (2018). They find 8555 stars, which is only a 20% difference.

**Figure 9.** The age distribution of the 50 pc sample.

**Figure 10.** The color absolute magnitude diagram of the 50 pc sample, with metallicity color coded. In total we have 50k sources with 4k being white dwarfs.

Table 4. Contribution of all Galactic Components to the Local Stellar Mass Density

Population	Popid	Age (Gyr)	GeDR3mock (10⁻³ M_⊙ pc⁻³)	Model B (10⁻³ M_⊙ pc⁻³)
Thin disk	0	0–0.15	1.7	1.9
Thin disk	1	0.15–1	4.9	5.0
Thin disk	2	1–2	3.6	4.1
Thin disk	3	2–3	3.1	2.8
Thin disk	4	3–5	5.4	4.9
Thin disk	5	5–7	5.7	5.0
Thin disk	6	7–10	11.1	9.3
Total thin disk	0–6	0–10	35.5	33.0
White dwarfs	–	0–12	5.0	7.1
Thick disk	7	10–12	1.7	2.9

Note. We compare to Model B from Czekaj et al. (2014).


The stellar distribution in the CMD looks reasonable, but the pre-main sequence might be a bit too pronounced, as it was in GDR2mock (Rybizki et al. 2018). We find that 214 (0.4%) mainly faint sources would have scattered out of our 50 pc sample if cutting on observed parallax. Conversely, 236 sources that are truly outside of 50 pc would have scattered in when cutting on observed parallax. The 10% increase for the in-scattering stars is due to the asymmetric volume at the border of the 50 pc sphere, given that the stellar density is almost isotropic.
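The in- and out-scattering counts can be obtained by comparing the cut on true parallax with the cut on observed parallax. The following Python sketch shows the bookkeeping on made-up values; the function `scatter_counts` is a hypothetical helper and not part of any package. Note that 1/parallax < 0.05 (in kpc, with parallax in mas) is equivalent to parallax > 20 mas.

```python
def scatter_counts(parallax_true, parallax_obs, limit_pc=50.0):
    """Count stars scattering out of / into a distance-limited sample.

    Parallaxes are in mas, so a distance limit of `limit_pc` parsec
    corresponds to parallax > 1000 / limit_pc (20 mas for 50 pc).
    """
    min_parallax = 1000.0 / limit_pc
    scattered_out = 0
    scattered_in = 0
    for true, obs in zip(parallax_true, parallax_obs):
        truly_inside = true > min_parallax
        observed_inside = obs > min_parallax
        if truly_inside and not observed_inside:
            scattered_out += 1
        elif observed_inside and not truly_inside:
            scattered_in += 1
    return scattered_out, scattered_in


# Made-up example values (mas), not catalog data.
true = [25.0, 20.5, 19.8, 10.0]
obs = [24.0, 19.9, 20.3, 10.5]
print(scatter_counts(true, obs))  # (1, 1)
```

To count in-scattering stars in practice, the queried sample has to extend somewhat beyond the true 50 pc limit, because the query in this section only returns stars that are truly inside 50 pc.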

6.5. Other Photometric Bands

It is possible to query the absolute magnitudes and extinctions in other bands (UBV, 2MASS, SDSS) for specific values of A₀ via the gedr3mock.parsec_props table. An example query for apparent magnitudes in 2MASS bands could be:

SELECT phot_g_mean_mag, bp_rp AS color,
tmass_j - 5*LOG10(parallax/100) + a0*A0_1_tmass_j AS tmass_j,
tmass_ks - 5*LOG10(parallax/100) + a0*A0_1_tmass_ks AS tmass_ks
FROM gedr3mock.main
JOIN gedr3mock.parsec_props
USING (index_parsec)
-- crossmatching with the PARSEC isochrone table
WHERE source_id BETWEEN 4433793833146253312 AND 4434356783099674623
-- takes a few seconds


Beware that all values in the parsec_props table are binned according to the procedure outlined in Section 2.2 (they are mapped onto the catalog stars via index_parsec), which means that the CMD distribution will be somewhat discretized. Also, by using A0_1_tmass_j in the above query we approximate the extinction with a low A₀ value, which means that for large A₀ values the extinction will be overestimated, since the absorption does not scale linearly with A₀ as the source spectrum gets redder with dust column density.
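The expression used in the query above combines the absolute magnitude, the distance modulus and a linearly scaled extinction. A minimal Python sketch of the same formula is given below; the function name and argument names are ours, and `ext_coeff` stands for a column such as A0_1_tmass_j, which we take to be the band extinction per unit A₀ in the low-extinction regime.

```python
import math


def apparent_magnitude(abs_mag, parallax_mas, a0, ext_coeff):
    """Apparent magnitude from absolute magnitude, parallax and extinction.

    -5*log10(parallax/100) is the distance modulus for a parallax in mas
    (it vanishes at 100 mas, i.e. 10 pc); a0 * ext_coeff is the band
    extinction in the linear approximation used in the query above.
    """
    distance_modulus = -5.0 * math.log10(parallax_mas / 100.0)
    return abs_mag + distance_modulus + a0 * ext_coeff


# Made-up example values, not catalog data.
print(apparent_magnitude(5.0, 100.0, 0.0, 0.3))  # star at 10 pc: distance modulus is zero
print(apparent_magnitude(5.0, 1.0, 2.0, 0.3))    # star at 1 kpc: 10 mag modulus plus 0.6 mag extinction
```

The linear term a0 * ext_coeff is exactly the approximation discussed above, so the sketch inherits the overestimate of the extinction at large A₀.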

7. Summary

We have presented the generation and content of a Gaia early DR3 mock stellar catalog (GeDR3mock). With respect to the previous version, GDR2mock (Rybizki et al. 2018), we have updated the thin disk model (Czekaj et al. 2014) as well as the 3D extinction map (Green et al. 2019) and isochrones (Marigo et al. 2017), which now also include white dwarfs (Miller Bertolami 2016). We also added a simple model of the Magellanic Clouds and open clusters, the latter including internal rotation.

We refined the uncertainty model by training empirically on GDR2 data and scaled it to the longer time baseline of Gaia EDR3. A main focus of our investigation is modeling the selection function of the Gaia instrument and DPAC filtering. We provide all-sky magnitude limit maps (Rybizki & Drimmel 2018) approximated empirically by the mode of the magnitude distribution in a specific line of sight. A better comparison between model and data is achieved when applying those cuts to the relevant subsets. Similarly we investigate how many sources in GeDR3mock would suffer from decreased visibility due to contrast sensitivity (Brandeker & Cataldi 2019) and flag those stars in GeDR3mock.

In order for the user to be able to create their own synthetic stellar catalog from N-body data or a galaxy model, we provide the routines we used for generating our catalog in the python package galaxia_wrap (Rybizki 2019), as well as the isochrones and the modified galaxia software, and the jupyter notebooks that illustrate their use.³⁹

We provided some example ADQL queries to show the many possible catalog interactions and to compare GDR2 to our mock stellar catalog. We plan to add columns/tables that update the magnitude limits and uncertainty estimates once Gaia EDR3 is released. These additions will be announced on the GAVO site of the catalog.⁴⁰ In preparation for the "full" Gaia DR3, we plan to augment GeDR3mock with data products that will be new in full Gaia DR3, including binaries, extragalactic objects, and chemical abundances.

This work made use of the following software packages: topcat (Taylor 2005), HEALpix (Górski et al. 2005), astropy (Astropy Collaboration et al. 2018), mwdust (Bovy et al. 2016), dustmaps (Green 2018), amuse (Portegies Zwart et al. 2009).

We estimate the CO₂ footprint of this publication as follows: 6 person-months of work (MPIA yearly average per employee: 9 tons) = 4.5 tons. Data access: 3 yr × 1 kW (conservative estimate of server electricity consumption) × 5% (GeDR3mock consumed data volume) = 1.3 MWh, corresponding to 0.6 tons CO₂ with the average German energy mix. I will not travel anywhere by plane for the purpose of promoting this paper.

We thank the anonymous referee for their thorough inspection and helpful comments.

We thank the German Astrophysical Virtual Observatory⁴¹ for the publishing platform and for fruitful discussions on the technical aspects of this endeavor.

YC acknowledges support from the ERC Consolidator Grant funding scheme (project STARKEY, G.A. n. 615604).

This research has made use of the VizieR catalog access tool, CDS, Strasbourg, France (doi:10.26093/cds/vizier). The original description of the VizieR service was published in A&AS 143, 23.

This work has made use of data from the European Space Agency (ESA) mission Gaia, processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.

This work was supported by the MINECO (Spanish Ministry of Economy) through grant ESP2016-80079-C2-1-R and RTI2018-095076-B-C21 (MINECO/FEDER, UE), and MDM-2014-0369 of ICCUB (Unidad de Excelencia "María de Maeztu"). T.C.G. acknowledges support from Juan de la Cierva—Formación 2015 grant, MINECO (FEDER/UE). This work was funded by the DLR (German space agency) via grant 50 QG 1403.

Appendix A: Magnitude Limits Derived from GDR2 and GeDR3mock

Here we show the magnitude limit maps for G, BP and RP derived as explained in Section 3.2 at HEALpix level 7 for GeDR3mock and GDR2. For GDR2 we required: a parallax measurement, a color measurement, and G < 20.7 (which corresponds to gedr3mock.maglim_7.maglim_g_parallax_color). For GeDR3mock we required nothing, which implies G < 20.7.

Starting with the G band, we see in Figure A1 that in GeDR3mock the magnitude limit is generally G = 20.7, which is expected from the way the catalog is generated. Exceptions to this are the central parts of the LMC and low-density areas toward the Galactic poles. The former is due to the red clump producing a peaked luminosity function at the distance modulus of the LMC, and the latter is due to Poisson noise in the magnitude distribution in low-density HEALpix. For GDR2 we also see magnitude limits brighter than G = 20.7 in low-density areas and in the Magellanic Clouds, albeit with a different pattern to the GeDR3mock LMC. Additionally we see values as bright as G = 15 mag in the bulge, and also a few scanning patterns, which come exclusively from the parallax measurement requirement.

For the BP band, which is depicted in Figure A2, we have to remember that we conditioned our queries on G < 20.7 mag, and also that BP is usually fainter than G. Therefore the magnitude limits of BP can be fainter than for G, which is apparent in the disk for GeDR3mock. This time the outskirts of the LMC and also the SMC have bright magnitude limits. GDR2 looks similar with respect to the Magellanic Clouds and also has quite faint limits in the high-extinction areas of the disk, but again the bulge and the scanning-law patterns have brighter magnitude limits. We also have to keep in mind that in high-density areas sources in GDR2 experience BP and RP flux excess, which can also drive the magnitude limits to the brighter end.

RP for most stars is usually brighter than G. Therefore the magnitude limits are somewhat brighter as well, as can be seen in Figure A3, since we condition on G < 20.7. Again the LMC sticks out in the GeDR3mock footprint, but this time we also see a small band in the high-extinction areas in the mock catalog. In the high-extinction areas the G limit of 20.7 cuts out fainter sources that would still have been seen in RP but did not make it into the catalog. This effect can also be seen in the real GDR2 data, as can the usual bulge and scanning-law patterns. The SMC and LMC can also be picked out, and the LMC has a highly lopsided feature that is also visible in the G and BP magnitude limit maps.

Appendix B: Popid Queries

As a supplement to the general overview of GeDR3mock we show here three different plots for each of the following populations:

1. 0 = young thin disk
2. 1–6 = thin disk
3. 7 = thick disk
4. 8 = halo
5. 9 = bulge
6. 10 = Magellanic clouds
7. 11 = open cluster

In Figures B1–B7 we show, from left to right, the all-sky stellar density distribution, a binned CMD colored by number of sources, and a binned CMD using reddened absolute magnitudes, also colored by number of sources. The following queries, shown here for popid = 0, produce the data for these three panels, respectively:

-- All sky map
SELECT COUNT(*) AS ct,
ivo_healpix_index(6, ra, dec) AS hpx
FROM gedr3mock.main
WHERE popid = 0
GROUP BY hpx ORDER BY hpx
-- duration depends on population


-- CMD
SELECT COUNT(*) AS ct,
ROUND(phot_bp_mean_mag - phot_rp_mean_mag, 2) AS color,
ROUND(phot_g_mean_mag, 1) AS mag
FROM gedr3mock.main
WHERE popid = 0
GROUP BY color, mag
-- duration depends on population


-- CMD using reddened absolute magnitudes
SELECT COUNT(*) AS ct,
ROUND(phot_bp_mean_mag - phot_rp_mean_mag, 2) AS color,
ROUND(phot_g_mean_mag + 5*LOG10(parallax/100), 1) AS mag
FROM gedr3mock.main
WHERE popid = 0
GROUP BY color, mag
-- duration depends on population


**Figure B1.** Overview of young thin disk (popid = 0). Left an all-sky view in galactic coordinates and aitoff projection of starcounts per HEALpix in level 6. Middle and right panels are the CMD and the CMD with reddened absolute magnitudes, also colored by sources per bin.

**Figure B2.** As Figure B1 but for the remaining thin disk, i.e., 0 < `popid` < 7.

**Figure B3.** As Figure B1 but for the thick disk, i.e., `popid` = 7.

**Figure B4.** As Figure B1 but for the halo, i.e., `popid` = 8.

**Figure B5.** As Figure B1 but for the bulge, i.e., `popid` = 9.

**Figure B6.** As Figure B1 but for the Magellanic clouds, i.e., `popid` = 10.

**Figure B7.** As Figure B1 but for the open clusters, i.e., `popid` = 11.
