CoSHA: Code for Stellar properties Heuristic Assignment -- for the MaStar stellar library

We introduce \cosha{}: a Code for Stellar properties Heuristic Assignment. In order to estimate the stellar properties, \cosha{} implements a Gradient Tree Boosting algorithm to label each star across the parameter space ($T_\mathrm{eff}$, $\log{g}$, $[\mathrm{Fe}/\mathrm{H}]$, and $[\alpha/\mathrm{Fe}]$). We use \cosha{} to estimate these stellar atmospheric parameters for $22\,$k unique stars in the MaNGA Stellar Library (MaStar). To quantify the reliability of our approach, we run both internal tests, using the G\"ottingen Stellar Library (GSL, a theoretical library) and the first data release of MaStar, and external tests, comparing the resulting distributions in the parameter space with the APOGEE estimates of the same properties. In summary, our parameter estimates span the ranges: $T_\mathrm{eff}=[2900,12000]\,$K, $\log{g}=[-0.5,5.6]$, $[\mathrm{Fe}/\mathrm{H}]=[-3.74,0.81]$, $[\alpha/\mathrm{Fe}]=[-0.22,1.17]$. {We report internal (external) uncertainties of the properties of $\sigma_{T_\mathrm{eff}}\sim43\,(240)\,$K, $\sigma_{\log{g}}\sim0.2\,(0.4)$, $\sigma_{[\mathrm{Fe}/\mathrm{H}]}\sim0.16\,(0.24)$, $\sigma_{[\alpha/\mathrm{Fe}]}\sim0.09\,(0.08)$.} These uncertainties are comparable to those of other methods with similar objectives. Although \cosha{} is not aware of the spatial distribution of these physical properties in the Milky Way, we are able to recover the main trends known in the literature. The catalog of physical properties for MaStar can be accessed at \url{http://ifs.astroscu.unam.mx/MaStar}.


INTRODUCTION
The interpretation of galaxy observations into stellar physical properties ultimately relies on a set of ingredients, namely: a selection of stellar evolutionary tracks, an initial mass function and a stellar spectral library (e. g., Bruzual & Charlot 2003; Maraston 2005; Conroy et al. 2009). While the first two are better constrained through our theoretical knowledge of stellar interiors and stellar evolution, the latter provides the fundamental link between such theoretical knowledge and the observable quantities. In essence, a stellar library is a collection of stellar spectra that is as homogeneous (in sampling and resolution) and as curated (from instrumental and flux calibration artifacts) as possible in the spectral space (e. g., Terndrup et al. 1990; Lancon & Rocca-Volmerange 1992; Fluks et al. 1994). In the parameter space, usually defined by T eff, log g, [Fe/H], and [α/Fe], the distributions of stars in the library are expected to span the range of plausible physics according to the theory, also in a homogeneous way.
These requirements present several challenges since stellar surveys are mostly limited to our galaxy and, in some cases, further limited to the solar neighborhood (e. g., Le Borgne et al. 2003; Valdes et al. 2004; Sánchez-Blázquez et al. 2006; Chen et al. 2014). Such limitations in our ability to acquire large samples of stars eventually lead to a poor sampling of the parameter space. Furthermore, these limitations are exacerbated by the fact that most galaxies (including our own) usually display heterogeneous distributions in their stellar contents (e. g., Hayden et al. 2015; Fernández-Alvar et al. 2017; Helmi et al. 2018; Barbuy et al. 2021). Therefore, it is of paramount importance to better sample our own galaxy, and eventually other galaxies, with a large spatial coverage and a spectral resolution high enough to measure spectral line abundances (e. g., García Pérez et al. 2016). To mitigate these restrictions in the parameter space sampled by the so-called empirical stellar libraries compiled from observations, several groups of authors have implemented theoretical stellar libraries (e. g., Lejeune et al. 1997; Lejeune et al. 1998; Coelho et al. 2007; Mészáros et al. 2012). Conceptually, theoretical libraries should be able to overcome the issue of sampling in parameter space. In practice, however, this flexibility comes at a cost: spectral model imperfections. Theoretical libraries can only model a limited observable space, mainly due to incomplete theoretical knowledge about stellar atmosphere opacities and/or incomplete atomic and molecular data (see Conroy 2013; Coelho et al. 2020, and references therein).
Empirical and theoretical libraries are clearly complementary approaches (see Coelho et al. 2020). As a matter of fact, the most commonly used stellar population synthesis models combine both in order to improve the accuracy of their predictions (e. g., Cid Fernandes et al. 2014). In this sense, the MaNGA Stellar Library (MaStar; Yan et al. 2019), comprising 24.4 k stars in the same footprint as the MaNGA survey, promises to be a huge leap forward towards a better sampling of the parameter space while preserving the completeness of spectral features in the optical range typical of empirical libraries, when compared with other widely used libraries like MILES (Sánchez-Blázquez et al. 2006). Furthermore, MaStar offers the possibility of analyzing MaNGA IFS galaxies with a stellar library observed and reduced with the same instrumental settings. Such a possibility would reduce the sources of uncertainty in the analysis of physical properties extracted from MaNGA galaxies.
A major challenge is to calibrate such a data set in T eff, log g, [Fe/H] and [α/Fe], since a high-quality data set with a similar coverage in spectral and parameter space would be needed in order to fit a predefined model or train a ML algorithm. We could use a theoretical model, but those are still far from being consistent with one another, as they usually rely on ad hoc assumptions with a huge impact on stellar atmospheres (e. g., Coelho et al. 2020; Lançon et al. 2021). Hence, we have to accept a priori the limitations of the chosen theoretical library. An alternative would be to use observed and well-calibrated stars (e. g., MILES, APOGEE), but those are usually restricted in spectral/parameter space and often rely themselves on theoretical models.
More recently, given the highly nonlinear nature of this problem, departures from conventional fitting techniques have flourished in the literature (e. g., Singh et al. 2006; Ness et al. 2015; Sharma et al. 2020; Ting et al. 2019). For instance, Ness et al. (2015) fit a second-degree polynomial function in parameter space to a small (∼ 500) set of training stars. They use this as a generative model to infer the physical properties (labels) of stars in the APOGEE survey. A limitation of the Ness et al. (2015) approach is that it assumes a functional form to model the spectrum using the known labels in the training set, under the supposition that these labels are exhaustive in describing the stellar spectra and that the adopted functional model is accurate. To overcome those limitations, Ting et al. (2019) trained an artificial neural network (ANN), dubbed The Payne, with a similar objective: to build a generative model and predict stellar parameters from observed spectra. Even though they managed to train with a relatively small sample of (∼ 2000) stars, as a consequence their model is restricted in the parameter space. To mitigate such limitation, Xiang et al. (2021) introduced HotPayne to predict stellar abundances for observed hot stars (> 7500 K). Both Xiang et al. (2015) and Ting et al. (2019) are examples of machine learning approaches overcoming the limitation of imposing a functional form of the model a priori, hence improving the likelihood of capturing non-linear behaviour when mapping from labels to spectral space. While training in the parameter space allowed the authors to substantially reduce the size demand on the training set compared with training on the spectral space, this advantage came at the expense of generalization of the model. Another limitation of both methods (The Cannon and The Payne) is that they rely on spectral fitting to make the actual prediction of the labels, hence hampering the opportunity of using ML to its fullest potential, i. e., by training in the spectral space to predict the labels directly. In this paper we seek to fill this gap by using a conventional (non-ANN) ML approach to predict labels directly, without requiring spectral fitting.
Conventional machine learning (ML) algorithms (e. g., ensemble methods) usually demand less training data than ANNs, while retaining the flexibility of a trained model without a predefined shape (e. g., Géron 2017, see also Appendix A). We introduce a new method to directly determine atmospheric physical parameters of the MaStar spectra by implementing a ML algorithm called Gradient Tree Boosting (GTB). GTB is an ensemble method that successively trains a predefined number of decision trees, each improving upon its predecessor's errors. In early experiments we tested other ML regression methods, namely: decision tree, naive Bayes, support vector, etc. Because our approach was heuristic from the beginning, we did not set out to optimize hyperparameters. We adopted GTB as it was the first method that returned reasonable results when evaluated on the testing set and compared to APOGEE. This paper is organized as follows. In § 2 we describe the inputs for the training and testing sets; in § 3 we present the physical parameter estimator and training process; in § 4 we evaluate the physical parameters using a set of internal and external tests, and in § 5 we present the MaStar stellar library physical properties; finally, we conclude in § 6.
Figure 1. Examples of the spectra in the MaStar library color coded according to their T eff as predicted by CoSHA. The typical 1σ band in each spectrum is also shown. In light grey we show the closest (in parameter space) GSL spectra to each MaStar spectrum. The mismatch in 27-1429 may be attributed to a mismatch in the actual properties of both spectra. The first data release of the library spans a wide range of the physical parameter space suitable for training a machine learning (ML) model to predict the properties of the rest of the stellar spectra.

TRAINING AND TESTING SETS
In this section we describe the samples used to build the training and testing subsets.

Input from the theoretical library GSL
Since we intend to use a ML method, we have to be aware that one of the most important challenges of this approach is the need for a clean, complete and already labelled sample (e. g., Géron 2017). In this particular case, in order to predict reliable estimates of T eff, log g, [Fe/H] and [α/Fe] for the MaStar spectra, we need a sample of stellar spectra with reliable estimates of these same properties. In the previous section we sketched out the difficulties of having such an empirical library at our disposal. However, state-of-the-art theoretical libraries do prove to be a suitable option.
In the literature there are plenty of theoretical libraries, most of which are focused on particular types of stars (e. g., Kirby 2011; Coelho 2014). However, we require a multi-purpose theoretical stellar library in order to build models as general as possible. For this purpose we choose the Göttingen Spectral Library (GSL; Husser et al. 2013, ∼ 27 k stars). The GSL is a stellar spectral library based on the latest version of the stellar atmospheric parameter code phoenix (Hauschildt et al. 1999). This library offers the following advantages for our study: (i) it samples a wide range of stellar stages, from subdwarf up to supergiant stars (log g = 0.0 to 6.0), and spectral types (T eff = 2300 to 12000 K); (ii) it predicts non-solar abundance patterns ([Fe/H] = −4.0 to +1.0 and [α/Fe] = −0.2 to +1.2); (iii) it implements the latest atomic and molecular data, and opacity solutions to predict high-resolution spectra (R ∼ 500 k in λ = 3000 to 25000 Å, see Husser et al. 2013; Lançon et al. 2021, and references therein). We acknowledge that perhaps two important weaknesses of the GSL are the lack of stars hotter than 12000 K and the lack of α-enhanced/depleted atmospheres for stars outside the ranges −3 ≤ [Fe/H] ≤ 0 and 3500 ≤ T eff ≤ 8000 K. For more details on the GSL, we refer the reader to Husser et al. (2013) and references therein.
In an attempt to mitigate some of the above-mentioned deficiencies of the GSL, we also adopt the physical properties included in the first data release of MaStar (Yan et al. 2019, ∼ 3 k stars). The MaStar stellar properties presented in Yan et al. (2019, hereafter MaStarDR1) are a compilation of the properties reported by the APOGEE (Majewski et al. 2017), SEGUE (Yanny et al. 2009) and LAMOST (Cui et al. 2012) surveys, all of which are based on different kinds of observations and analysis methods (García Pérez et al. 2016; Lee et al. 2008; Xiang et al. 2015, respectively). These methods mostly implement algorithms based on template matching of spectroscopic and/or photometric data to other (well-determined) observed or theoretical stellar libraries, or a combination thereof. In particular, Lee et al. (2008) also implement an Artificial Neural Network algorithm in part of their analysis to estimate atmospheric stellar properties. We stress that the atmospheric parameters reported in Yan et al. (2019) are intended to aid the target selection of the MaStar stellar library and not to be used in stellar population analysis. Furthermore, since most of these stellar properties are derived from different wavelength ranges and signal-to-noise levels, and using different types of analysis methods, we may expect different sources of uncertainty (not present in a theoretical library such as GSL) to propagate during the training stage.

Input from the empirical library the MaNGA Stellar Library: MaStar
In this section we briefly describe the aspects of the MaStar library that are relevant to this paper.

Observations
MaNGA is an integral field spectroscopy survey of 10,000 nearby galaxies (Bundy et al. 2015; Law et al. 2015; Yan et al. 2016; Wake et al. 2017), carried out as part of SDSS-IV along with two other surveys: eBOSS (Dawson et al. 2016) and APOGEE-2 (Majewski et al. 2017). It uses the Baryon Oscillation Spectroscopic Survey spectrographs (Smee et al. 2013), fiber-fed using different fiber bundle configurations for science targets, standard stars and sky observations (Drory et al. 2015). The combined wavelength coverage of the spectrographs is ∼ 2600 to 10000 Å, with an average resolution of R = 1800. In order to observe different fields during one night, the fiber-plugged plates are stored in housings called cartridges. MaNGA shares six out of nine cartridges with APOGEE-2. This configuration presented the opportunity to piggyback on the APOGEE survey to build MaStar. The observing strategy, target selection and data reduction are detailed in Yan et al. (2019).

Quality control
Before we can analyse the MaStar spectra, we need to ensure the quality of the observations. In this section we explain the cuts we take in order to have the best possible spectra to train the model.
In the MaStar library, the stars have been observed under different atmospheric conditions, using different total integration times, and most of them have been observed in several visits (∼ 3 per star on average). In total there are ∼ 68.2 k visits. It is expected that some visits to the same star may have better quality than others. In order to ensure that we keep only the best quality data, we use a series of internal quality flags and estimated parameters provided by MaStarDR1, namely:
MJDQUAL: describes the quality of the spectroscopic calibration.
MNGTARG2: describes the quality of the photometric selection.
RADVEL: the radial velocity. Stars with no measure of this parameter are flagged as bad.
The first two are bit mask flags. MJDQUAL is required not to have bits 1 (good sky subtraction), 4 (good point spread function fitting), 5 (good estimates of the extinction), 6 (good estimation of the radial velocity, with scatter lower than 10 km s −1 across multiple exposures), 7 (good calibration after visual inspection), 8 (no emission line in Hα with equivalent width greater than 0.6 Å) and 9 (good signal-to-noise per pixel, with median > 50) activated. MNGTARG2 is required not to have bit 15 activated, which means that the photometric selection is reliable. To have good quality distance estimates from Bailer-Jones et al. (2018), we further select only those stars in MaStar also present in Gaia DR2 (Brown et al. 2018). We use these distance estimates to calculate dust extinction values A V for the MaStar spectra using the 3D dust map from Green et al. (2015) and the extinction curve model from Fitzpatrick (1999) with R V = 3.1. The typical (median) extinction for MaStar is A V ∼ 0.1 mag.
Figure 2. The spatial distribution of the MaStar stars in the cleaned sample (purple) compared to that of the Gaia DR2 (orange). Top: the distribution in right ascension (RA) and declination (DEC) shows the inhomogeneous spatial sampling of MaStar. Bottom: the distribution of MaStar in the galactic projections x-y (left) and x-z (right) demonstrates that MaStar extends well beyond the solar neighborhood. Lower panel plots were generated using the Python package https://pypi.org/project/mw-plot/0.3.0/.
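The bit-mask cuts on MJDQUAL and MNGTARG2 described above can be sketched as follows. This is only an illustration: it assumes that bit n corresponds to the integer value 2^n (zero-based numbering) and that missing radial velocities are stored as NaN, neither of which is specified in the text. The function names are hypothetical.

```python
import numpy as np

# Bits named in the text; a set bit marks a failed quality check.
# Assumption: bit n corresponds to the integer value 2**n.
MJDQUAL_BAD_BITS = (1, 4, 5, 6, 7, 8, 9)
MNGTARG2_BAD_BITS = (15,)


def bits_to_mask(bits):
    """Combine bit positions into a single integer mask."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def select_good_visits(mjdqual, mngtarg2, radvel):
    """Return a boolean array that is True for visits passing all cuts.

    Assumption: a missing radial velocity is stored as NaN.
    """
    mjdqual = np.asarray(mjdqual, dtype=np.int64)
    mngtarg2 = np.asarray(mngtarg2, dtype=np.int64)
    radvel = np.asarray(radvel, dtype=float)
    ok_calib = (mjdqual & bits_to_mask(MJDQUAL_BAD_BITS)) == 0
    ok_phot = (mngtarg2 & bits_to_mask(MNGTARG2_BAD_BITS)) == 0
    ok_rv = np.isfinite(radvel)
    return ok_calib & ok_phot & ok_rv
```

A visit is kept only if none of the listed bits is set in either flag and a radial velocity is available.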
After selecting the best quality data according to the aforementioned flags, our sample comprises ∼ 23 k good quality spectra of unique stars (∼ 98.5% of the original sample). In Fig. 1 we show several examples of the spectra in the MaStar library, color-coded according to their T eff as reported in MaStarDR1. As a qualitative measure of how compatible the MaStar and GSL spectra are, we also show the closest (in parameter space) GSL spectra in light grey. This comparison shows reasonable compatibility between both libraries. However, the observed differences, especially in the case of the coolest star, can be attributed to a mismatch in abundance parameters and/or the above-mentioned limitations of theoretical libraries.
Even though we effectively clean the initial spectra sample from obvious artifacts using the internal flags described above, some spectra may still show notable artifacts due to imperfect flagging in the first place. Several common issues are still noticeable in some spectra: emission lines, noisy spectra, bad continuum calibration and missing pixels. Some of those problems can be addressed using external information, while others remain part of the limitations of the method.
In order to mitigate possible issues with the spectrophotometric calibration, we use the Gaia DR2 (Brown et al. 2018) colour distribution. We select only those stars from the MaStar library that have been catalogued by Gaia and that have reliable astrometry and photometry (Evans et al. 2018). In addition, we derive the synthetic magnitudes in the Gaia photometric system (G bp, G rp and G) from the spectra of the matched MaStar library. Then we select those stars with: (i) a realistic blue Gaia color (G bp − G rp ≥ −0.76 mag), hence excluding stars with possible flux calibration issues; (ii) a parallax > 0.05 mas, to mitigate uncertain distance estimates; and (iii) a median signal-to-noise ratio S/N ≥ 20 along the entire wavelength range. After cleaning the sample, we keep ∼ 22 k stars with reliable flux calibration and an S/N distribution that peaks near S/N ∼ 100, with median ∼ 140 and mean > 200 (e. g., Chen et al. 2020). For the remainder of this study, we use this cleaned version of the MaStar sample, unless otherwise stated.
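The three selection cuts listed above amount to a simple boolean mask. The sketch below assumes the synthetic Gaia magnitudes, parallaxes (in mas) and median S/N values are already available as arrays; the function and argument names are hypothetical.

```python
import numpy as np


def clean_sample_mask(g_bp, g_rp, parallax_mas, median_snr):
    """True for stars passing the colour, parallax, and S/N cuts."""
    colour = np.asarray(g_bp, dtype=float) - np.asarray(g_rp, dtype=float)
    parallax_mas = np.asarray(parallax_mas, dtype=float)
    median_snr = np.asarray(median_snr, dtype=float)
    # Thresholds quoted in the text: colour >= -0.76 mag,
    # parallax > 0.05 mas, median S/N >= 20.
    return (colour >= -0.76) & (parallax_mas > 0.05) & (median_snr >= 20.0)
```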
In Fig. 2 we show the spatial distribution of the MaStar stars, compared to that of Gaia. It is clear that MaStar presents a more incomplete coverage of the MW than Gaia, due to the nature of the former survey. Nonetheless, MaStar samples several regions of the Milky Way and is not limited to the solar neighborhood, within the declination limits of the survey.

Pre-processing
In this section we implement the ML algorithms. The unfamiliar reader may refer to § A to get briefly acquainted with the common ML jargon and the notation adopted throughout this study. We encourage the seasoned reader to skip § A altogether.
To solve the problems related to the presence of emission lines and missing pixels, we implement two separate unsupervised ML algorithms: an outlier detection method and a missing features (pixels) filler. Both of these algorithms are based on the k-nearest neighbors algorithm (KNN, Fix & Hodges 1951). In a nutshell, the KNN algorithm consists in finding the k closest samples (e. g., in a Euclidean sense) to each sample spectrum. In the following we elaborate on the usage of the KNN to remove emission lines and to fill in missing pixels.
For the emission line problem, we implement the Local Outlier Factor (LOF) algorithm (Breunig et al. 2000), from the scikit-learn Python library (Pedregosa et al. 2011a). The problem is posed as follows: given an observed spectrum f obs = (λ i, f i) for i = 1, . . . , n, this method looks for under-densities in the locality of each pixel (λ i, f i). Such locality is defined by the k-nearest neighbors to (λ i, f i) and the surrounding density is measured using the distance between those neighbors. In this case we define a Euclidean distance between each pair (λ i, f i) and the rest of the spectrum pixels. A pixel is then flagged as an outlier (and turned into a missing pixel) if its local density is smaller than a factor of the density of its k neighbors. Given the nature of stellar spectral energy distributions, this algorithm is prone to produce a large fraction of false positives (e. g., absorption lines). In order to regularize the LOF algorithm, we set k = 5 and the expected contamination for emission lines to 0.4%. This way we improve the accuracy of the emission line detection.
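As an illustration of this step, the following sketch applies scikit-learn's `LocalOutlierFactor` to the pixels of a single spectrum with k = 5 and a contamination of 0.4%, as quoted above. The rescaling of both axes to the unit interval and the function name `mask_outlier_pixels` are assumptions of this sketch; the text does not state how wavelength and flux are scaled before computing distances.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def mask_outlier_pixels(wavelength, flux, n_neighbors=5, contamination=0.004):
    """Return a copy of `flux` with LOF-flagged pixels set to NaN."""
    wavelength = np.asarray(wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Each pixel is a point (lambda_i, f_i). Rescaling both axes to [0, 1]
    # is an assumption of this sketch, so that neither axis dominates the
    # Euclidean distance.
    points = np.column_stack([
        (wavelength - wavelength.min()) / np.ptp(wavelength),
        (flux - flux.min()) / np.ptp(flux),
    ])
    lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination)
    labels = lof.fit_predict(points)  # -1 marks outliers, +1 marks inliers
    cleaned = flux.copy()
    cleaned[labels == -1] = np.nan
    return cleaned
```

The flagged pixels become missing values, which can then be handled by the missing-pixel filler.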
For the missing pixel problem, we implement the KNN "imputer" (filler) algorithm (Troyanskaya et al. 2001). Essentially, given an observed spectrum f obs = (λ i, f i) for i = 1, . . . , n with missing values in an arbitrary subset of wavelengths λ missing ∈ {λ m } ⊂ λ obs, this method looks for the KNN spectra in the library with no missing values in λ missing and then computes a distance-weighted average spectrum defined in λ missing to fill in the missing values.
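A minimal sketch of this filling step, using scikit-learn's `KNNImputer` with distance weighting, is given below. The text does not state which implementation or which number of neighbors was used, so both the choice of `KNNImputer` and `n_neighbors=5` are assumptions, and `fill_missing_pixels` is a hypothetical name.

```python
import numpy as np
from sklearn.impute import KNNImputer


def fill_missing_pixels(spectra, n_neighbors=5):
    """Fill NaN pixels in a (n_spectra, n_pixels) array with a
    distance-weighted average over the nearest spectra.

    Assumption: n_neighbors=5 is illustrative, not taken from the text.
    """
    imputer = KNNImputer(n_neighbors=n_neighbors, weights="distance")
    return imputer.fit_transform(np.asarray(spectra, dtype=float))
```

Distances between spectra are computed only over the pixels that both spectra have observed, so a spectrum with gaps can still be matched to its neighbors.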

Training and testing sets
Once we have cleaned the sample and pre-processed the spectra, we select the training and the testing subsets. We arrange the spectra (MaStarDR1+GSL) into an N spec × N pix matrix, so that each row is a spectrum and each column is a wavelength pixel, where N spec ∼ 30 k (MaStarDR1: 2.7 k; GSL: 27 k) and N pix = 6351. In order to build such a matrix, we resampled both libraries to ∆λ = 1 Å in the range λ = 3650 to 10000 Å and downgraded the GSL spectral resolution to the MaStarDR1 spectral resolution. The chosen wavelength sampling is arbitrary, but covers most of the original MaStar wavelength range. We further normalize the spectra to a common flux scale defined by f λ /f 5500, so that only the shape of the spectra informs the algorithm about the physical properties. We will refer to this N spec × N pix matrix as the features matrix, denoted by X. Correspondingly, each spectrum in the features matrix is characterized by the set of parameters (T eff, log g, [Fe/H], [α/Fe]), arranged into an N spec × 4 labels matrix, Y. The goal of this study is to train a model that uses the features matrix to translate new observations (rows in the features matrix) into their corresponding physical properties.
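The construction of the features matrix can be sketched as follows. The 1 Å grid between 3650 and 10000 Å and the normalization by the flux at 5500 Å come from the text; the use of linear interpolation for the resampling, and the omission of the resolution downgrade, are simplifying assumptions of this sketch. The function names are hypothetical.

```python
import numpy as np

# Common grid from the text: 1 Angstrom steps over 3650-10000 Angstrom.
WAVE_GRID = np.arange(3650.0, 10000.0 + 1.0, 1.0)


def to_feature_row(wavelength, flux, grid=WAVE_GRID, norm_wave=5500.0):
    """Resample one spectrum onto `grid` and normalise it by f(5500 A).

    Assumption: plain linear interpolation stands in for the resampling.
    """
    resampled = np.interp(grid, wavelength, flux)
    norm = np.interp(norm_wave, grid, resampled)
    return resampled / norm


def build_feature_matrix(spectra):
    """Stack (wavelength, flux) pairs into an N_spec x N_pix matrix X."""
    return np.vstack([to_feature_row(w, f) for w, f in spectra])
```

With this grid the number of columns is 6351, matching the N pix quoted in the text.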
It is important that the training and the testing subsets are not biased against common types of stars (e. g., dwarfs). This is to ensure that the trained model is as general as possible and can predict the stellar properties of any kind of star. From the stellar spectral libraries discussed in § 2, only GSL fulfills these requirements, albeit with the intrinsic limitations of a synthetic stellar library. Therefore, we select a random subset comprising 90% (∼ 25 k) of the stars in the GSL to be part of the training set. In order to avoid hampering the estimator during training with stellar spectra lacking the common instrumental and calibration artifacts (e. g., low S/N, emission lines, sky subtraction and flux calibration), we also include as part of the training a random subset of 90% (∼ 2.4 k) of the MaStarDR1 spectra. The training subset comprises ∼ 27 k out of the total of ∼ 30 k stellar spectra with good estimation of the physical parameters. The remaining 10% (∼ 3 k) of the GSL+MaStarDR1 (∼ 2.7 + 0.3 k) stars are devoted to testing the parameter estimator after training. Since most of the stars in these subsets have well-known physical properties, the testing selection is suitable for the set purposes. In Fig. 3 we show the distribution of the parameters in the GSL+MaStarDR1 sample (grey), and the corresponding segregation by stellar library (GSL: blue, MaStarDR1: red) and by subset (training: solid line, testing: dashed line), both represented by 99% confidence contours. As expected, the GSL stellar library follows a nearly uniform distribution across the parameter space, while MaStarDR1 draws a distribution that resembles the observed trends in stellar populations (e. g., in the solar neighborhood). The fact that the training and testing subsets have similar distributions, regardless of the stellar library, indicates that the testing procedure will be robust, despite the much smaller size of this subset.
In the next section we elaborate on the algorithm adopted for the parameter estimator, the training and the testing procedures. We will refer to the corresponding subsets as X train , Y train and X test , Y test , respectively.
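The 90%/10% random split, drawn separately within each library, can be sketched with scikit-learn's `train_test_split`. The per-library loop, the fixed random seed and the function name `split_by_library` are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_by_library(X, Y, library, test_size=0.1, seed=0):
    """Randomly hold out `test_size` of each library separately.

    `library` holds one label per row of X (e.g. "GSL" or "MaStarDR1").
    """
    X, Y, library = np.asarray(X), np.asarray(Y), np.asarray(library)
    train_idx, test_idx = [], []
    for name in np.unique(library):
        idx = np.flatnonzero(library == name)
        tr, te = train_test_split(idx, test_size=test_size, random_state=seed)
        train_idx.append(tr)
        test_idx.append(te)
    train_idx = np.concatenate(train_idx)
    test_idx = np.concatenate(test_idx)
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]
```

Splitting within each library keeps the GSL-to-MaStarDR1 proportion the same in the training and testing subsets.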

THE PROPERTIES ESTIMATOR: CoSHA
Most ML algorithms can be mathematically expressed as a functional F(θ, φ), where θ is a vector of parameters that define the trained (fitted) model and φ is the vector of hyper-parameters that define how the model will be trained (e. g., the merit or loss function, optimization algorithm, etc.). Once the hyper-parameters are set, F ≡ F(θ, φ), the algorithm is ready to train a model $\hat{F} = F(X_\mathrm{train}, Y_\mathrm{train})$. The training process to build $\hat{F}$ consists in maximizing (minimizing) the score (loss) function in order to find the optimal set of parameters, $\hat{\theta}$. Once trained, the fitted model should be able to provide reliable predictions given a set of features from new observations (not seen during the training). In reality, most problems require further exploration of the hyper-parameter space during the training phase in order to ensure robust results. However, in order to save computational time, we adopt a heuristic approach and train with an ad hoc selection of hyper-parameters (see e. g., Géron 2017; Ivezić et al. 2019).
In choosing our preferred ML algorithm, we distinguish between conventional ML (e. g., decision trees, support vector machine) and Deep Learning (e. g., ANNs) algorithms. Although the latter are becoming increasingly frequent in the astronomical community (e. g., Ting et al. 2019; Xiang et al. 2015), one important caveat of ANNs is that they usually require high computational power and/or large training samples to reach a desired level of accuracy without compromising generalisation (e. g., Chollet 2017). There are ways to overcome these limitations when training an ANN, namely: reducing the number of neurons/layers in the ANN, changing the architecture of the ANN (e. g., by limiting the number of connections per neuron) and/or constraining model predictions to a smaller label/feature space. Even though these measures may effectively reduce the training time required to reach a robust ANN prediction, they impose a trade-off with the generalization of the model (e. g., Ting et al. 2019).
In this study we implement a conventional ML algorithm, which requires less training data and computational power, whilst retaining model flexibility and generalization (e. g., Géron 2017). We therefore choose a Gradient Tree Boosting (GTB) algorithm to train a model suitable for predicting the physical properties of the MaStar spectra. The GTB has two interesting characteristics, namely: it is an ensemble method and it is based on decision trees. Given the relatively small training set (27.4 k spectra), a decision-tree-based method is a convenient option. We choose to base our code on the scikit-learn package (Pedregosa et al. 2011b). This allows us to keep our code base small, as we only have to implement the data munging and bookkeeping procedures, namely: reading the sample, preprocessing, splitting into training and testing, storing the trained model and storing its predictions. Another important advantage of the scikit-learn implementation of GTB is that it allows for quantile regression, a type of regression in which it is possible to estimate any quantile of the probability distribution of Y conditional on X, P (Y | X). Hence, this type of regression is more general, as it is not limited to finding the mean of that distribution, $\hat{F}$. In order to train a quantile regression estimator, we change the hyper-parameters of the GTB algorithm accordingly and set it to predict the 16th and 84th percentiles (P 16 and P 84, respectively). In Table 1 we show the hyper-parameters used to train the different estimators. In the following we use the mode estimator to predict the stellar labels, unless otherwise stated.
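To make the quantile-regression setup concrete, the sketch below trains three scikit-learn `GradientBoostingRegressor` models for a single label: one with the default loss and two with `loss="quantile"` and `alpha` set to 0.16 and 0.84. Since `GradientBoostingRegressor` predicts a single target, this sketch assumes one set of estimators per label; how the four labels are handled in CoSHA is not stated in this section. All other hyper-parameters are left at the library defaults, and the dictionary keys are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def train_estimators(X_train, y_train, seed=0):
    """Train a central estimator plus P16 and P84 quantile estimators
    for a single label (the regressor is single-output)."""
    estimators = {
        # Default loss (squared error): targets the conditional mean.
        "central": GradientBoostingRegressor(random_state=seed),
        # Quantile loss: `alpha` selects the quantile to be predicted.
        "p16": GradientBoostingRegressor(loss="quantile", alpha=0.16,
                                         random_state=seed),
        "p84": GradientBoostingRegressor(loss="quantile", alpha=0.84,
                                         random_state=seed),
    }
    for est in estimators.values():
        est.fit(X_train, y_train)
    return estimators
```

Calling `predict` on the `"p16"` and `"p84"` estimators returns, for each spectrum, the two percentiles that bracket the central prediction.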
Moreover, the fact that GTB is an ensemble algorithm means that it is the combination of several individual estimators (in this instance, decision trees). The boosting character of this algorithm comes from the fact that each trained decision tree improves upon its predecessor. This last trait of the GTB algorithm entails more robust results than single-estimator algorithms. A GTB is therefore a type of ensemble composed of a predefined number $n$ of decision trees, which can be expressed as the summation:
$$\hat{F}(X) = \sum_{i=0}^{n-1} f_i(X), \tag{1}$$
where $f_0$ is the first decision tree, trained on the original training subset ($X_\mathrm{train}$ and $Y_\mathrm{train}$), and the successive $f_i$ ($i > 0$) are decision trees trained on the original training spectra ($X_\mathrm{train}$) but using as labels the residual between the original labels and the labels predicted by the preceding trees,
$$Y_i = Y_\mathrm{train} - \sum_{j=0}^{i-1} f_j(X_\mathrm{train}).$$
We acknowledge that GTB carries some caveats. For instance, since it is built upon the combination of several trained models, it is harder to interpret than a single decision tree. Still, GTB methods are easier to interpret than ANNs (e. g., Chollet 2017). Furthermore, the flexibility of ANNs implies that a large hyper-parameter space needs to be defined/constrained: number of layers, number of neurons per layer, number of neuron connections, type of connection response, etc. From Table 1, the hyper-parameters that we actually changed during the development of this work are loss and alpha. The rest of the parameters were left at their default values according to scikit-learn v0.23.2.
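The idea of summing trees that are each trained on the residual left by their predecessors can be illustrated with a deliberately simplified boosting loop for a squared-error loss. This is not the scikit-learn implementation, which also starts from a constant initial prediction and scales each tree by a learning rate; the tree depth, number of trees and function names here are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=20, max_depth=3):
    """Fit trees f_0, f_1, ... where each f_i (i > 0) is trained on the
    residual left by the sum of the trees before it."""
    trees = []
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        # Update the residual with what this tree could explain.
        residual = residual - tree.predict(X)
        trees.append(tree)
    return trees


def predict_boosted(trees, X):
    """Ensemble prediction: the plain sum of all tree predictions."""
    return np.sum([tree.predict(X) for tree in trees], axis=0)
```

Each additional tree can only lower (or leave unchanged) the squared error on the training set, which is what makes the ensemble more accurate than its first member.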
Finally, we use the trained model $\hat{F}$ to predict the stellar properties of the entire clean spectral library, building the matrix Y model. We also use the estimators trained for the percentiles P 16 and P 84 to estimate the precision of CoSHA by computing σ Y = (P 84 − P 16)/2, which, under the assumption that P (Y | X) is Gaussian, is approximately equal to the standard deviation. In the following sections we evaluate the internal reliability of the resulting distributions in Y model in the stellar parameter space.
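The relation between the percentile half-width and the standard deviation can be checked directly for a Gaussian. In the sketch below the mean of 5000 and standard deviation of 43 are arbitrary illustrative values, and `sigma_from_percentiles` is a hypothetical helper name.

```python
from statistics import NormalDist


def sigma_from_percentiles(p16, p84):
    """Half the P84 - P16 interval, used as a 1-sigma precision estimate."""
    return (p84 - p16) / 2.0


# Check against a Gaussian with known (illustrative) mean and sigma.
gauss = NormalDist(mu=5000.0, sigma=43.0)
p16, p84 = gauss.inv_cdf(0.16), gauss.inv_cdf(0.84)
sigma_est = sigma_from_percentiles(p16, p84)
```

For a Gaussian the half-width recovers the true standard deviation to within about 1%, because the 16th and 84th percentiles lie very close to one standard deviation on either side of the mean.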

MODEL EVALUATION
In this section we evaluate the reliability of the method described in the previous sections. We run two types of tests: internal and external. In the internal tests we assess the model accuracy on the testing set. We also look for trends between the residuals of our method and the true values in this subset.
Quantitatively, we measure the accuracy of the model through the residual defined as:
$$\delta = Y_\mathrm{model} - Y_\mathrm{true}, \tag{2}$$
where $Y_\mathrm{model} \equiv \hat{F}(X)$, X is any set of spectral features for which we know the corresponding set of true properties, $Y_\mathrm{true}$. We adopt the mean and the standard deviation of these residuals as an estimation of the systematic and random errors, respectively.
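The systematic and random errors described above can be computed per label as in the following sketch. It takes the residual as model minus true, which is an assumed sign convention, and the function name is hypothetical.

```python
import numpy as np


def residual_statistics(y_model, y_true):
    """Per-label mean (systematic error) and standard deviation (random
    error) of the residuals; rows are stars, columns are labels.

    Assumption: the residual is defined as model minus true.
    """
    residual = np.asarray(y_model, dtype=float) - np.asarray(y_true, dtype=float)
    return residual.mean(axis=0), residual.std(axis=0)
```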
In the external tests, on the other hand, we compare the predictions of CoSHA with previously published results, namely: MaStar DR1 (MaStarDR1) and APOGEE DR14 (Majewski et al. 2017;Holtzman et al. 2018). We restricted this comparison to those stars covered by both samples. This comparison only accounts for the consistency between the methods being compared, and not for the absolute accuracy of the procedure. We estimate the level of consistency through the discrepancy (not to be confused with the residual above) between the two predictions, defined as:
$$\delta_\mathrm{other} \equiv Y_\mathrm{model} - Y_\mathrm{other}, \qquad (3)$$
where $Y_\mathrm{other}$ is the estimation performed by other authors on the same sample of stars.
Since the MaStarDR1 data set is part of the training/testing subsets, this is neither an independent nor a fair comparison. Nonetheless, the consistency between CoSHA predictions and those of MaStarDR1 is still interesting and deserves some exploration.

In this section we use the testing subset of stars to measure the reliability of CoSHA. Since a large percentage (∼ 90%) of this testing subset corresponds to theoretical predictions in the GSL, for these stars we can safely assume that we know the true values of the atmospheric properties studied in this work. Hence, we use the testing subset of GSL stars to quantify the internal accuracy and precision of our method. The rest of the testing subset corresponds to MaStarDR1's estimates and will be used to quantify the consistency between both methods. In Fig. 4 we show the residuals (c. f. Eq. 2) in the parameter space. The grey distributions represent the testing subset, while the GSL and MaStarDR1 subsets are represented (as in Fig. 3) in blue and red, respectively. The contours enclose 1σ of the corresponding probability distributions. This kind of plot can reveal potential correlations amongst the residuals of the physical properties, or the lack thereof. These correlations are commonly known as degeneracies.
Figure 4. Residuals (Eq. 2) in the parameter space. The results for the testing set are shown in grey. The residuals for the GSL and MaStarDR1 subsets are shown in blue and red, respectively, as in Fig. 3. The univariate residuals are represented in the diagonal panels (histograms and solid lines). A Gaussian distribution with the intrinsic mean and standard deviation of the residuals is also represented (dashed lines) in the diagonal panels. The contours in the off-diagonal panels enclose 1σ of the probability distribution. We note that the intrinsic errors from MaStarDR1 are larger than our CoSHA internal errors and show a larger deviation from Gaussian behaviour. The lack of correlation in almost all projections of the residual space suggests a striking lack of degeneracies. See text in § 4.1 for details. Fig. Set 4. Precision and accuracy versus S/N. The complete figure set (9 images) is available in the online journal and includes similar figures of the precision and accuracy versus S/N.
In general, the projections of the entire residual space (grey distributions) show almost no correlations. The accuracy is ∼ −1.4 K in T eff and ∼ 0.004 dex (at most) for the rest of the properties. However, the segregation of the testing subset into GSL (solid blue) and MaStarDR1 (solid red) subsets reveals that their individual accuracies are rather different. The comparison between these distributions (blue and red) shows that our method is both more accurate and more precise than MaStarDR1 across the parameter space. We recall, though, that the GSL residuals represent the real (internal) errors of our method, while the residuals in the MaStarDR1 subset combine both CoSHA and MaStarDR1 errors. We can estimate the MaStarDR1 intrinsic residuals as a Gaussian distribution with $\mu = \mu_\mathrm{Y19} - \mu_\mathrm{GSL}$ and $\sigma = \sqrt{\sigma_\mathrm{Y19}^2 - \sigma_\mathrm{GSL}^2}$ (dashed red). Since the comparison with a Gaussian can visually hint at asymmetries and other non-Gaussian behaviors, we also show, for the sake of completeness, the corresponding Gaussian distribution for the GSL subset (dashed blue), using $\mu = \mu_\mathrm{GSL}$ and $\sigma = \sigma_\mathrm{GSL}$. The residuals with respect to the GSL show an accuracy and precision across the parameter space characterized by ∆T eff ∼ −3 ± 48 K, ∆ log g ∼ 0.00 ± 0.20, ∆[Fe/H] ∼ 0.00 ± 0.13 and ∆[α/Fe] ∼ 0.00 ± 0.09. On the other hand, for MaStarDR1 the intrinsic residuals are, in general, larger: ∆T eff ∼ 3 ± 240 K, ∆ log g ∼ 0.02 ± 0.38, ∆[Fe/H] ∼ −0.02 ± 0.24 and ∆[α/Fe] ∼ −0.01 ± 0.08. Since these residuals, and in particular those quoted for the GSL testing set, are small, we can safely rule out that CoSHA is over-fitting the training data. Otherwise, these residual distributions would display large biases. In order to investigate how the accuracy behaves as a function of the noise level in the spectrum, we run the following experiment.
In the testing subset of GSL spectra (∼ 2.7 k), we add random Gaussian noise at several S/N levels (50, 100, 200, 300 and ∞) and predict in each case the corresponding stellar parameters using CoSHA. In Table 2 we show a summary of our results.
We find that the accuracy is independent of the level of noise of the input spectra. This further supports our conclusion in the last paragraph: CoSHA is not suffering from over-fitting. We complement these results with figures similar to Fig. 4 for the different S/N values adopted.
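The noise experiment described above can be sketched as follows. We assume here that S/N is defined per pixel, so that the standard deviation of the added Gaussian noise is the pixel flux divided by S/N; the function name and the flat test spectrum are hypothetical, and the actual noise model used for Table 2 may differ.

```python
import numpy as np


def add_gaussian_noise(flux, snr, rng):
    """Return a noisy copy of flux with per-pixel sigma = |flux| / snr."""
    flux = np.asarray(flux, dtype=float)
    if np.isinf(snr):
        # S/N = infinity corresponds to the noiseless spectrum.
        return flux.copy()
    sigma = np.abs(flux) / snr
    return flux + rng.normal(0.0, 1.0, size=flux.shape) * sigma


rng = np.random.default_rng(3)
flux_demo = np.ones(20000)  # flat, normalized toy spectrum
snr_levels = (50, 100, 200, 300, np.inf)
noisy = {snr: add_gaussian_noise(flux_demo, snr, rng) for snr in snr_levels}
```

Each noisy realization would then be passed to the trained model, and the residuals with respect to the true GSL parameters compared across S/N levels.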

Precision
We quantify the precision of CoSHA using the quantile predictions as outlined in § 3. In Fig. 5 we show the map of CoSHA precision in the log g versus T eff plane. The contours represent the density distribution of stars. Clearly, the loci of the most precise determinations do not coincide with the locus of the highest star density. This seems to indicate that the origin of such (im)precision is not number statistics. The fact that the most imprecise predictions correspond to the temperature boundaries indicates that stars at extreme values of the parameter space are likely to have unreliable determinations. This result illustrates one important limitation of ML (and arguably of any stellar parameter estimation method): the determination of atmospheric properties depends on the existence of a comprehensive set of spectra (either theoretical or observed) with good quality stellar properties spanning as wide a parameter space as possible. From the distributions in Fig. 5 we quote the following typical (median) precision for each parameter in the cleaned ∼ 22 k (and in the MaStarDR1) sample: σ T eff ∼ 179(148) K, σ log g ∼ 0.42(0.42), σ [Fe/H] ∼ 0.27(0.21) and σ [α/Fe] ∼ 0.14(0.09). When comparing these numbers (in parentheses) with those obtained from the S/N simulations described above (S/N = ∞), we find that CoSHA is actually underestimating its precision. On the other hand, when compared to the case S/N = 100 (closer to the typical value for MaStar spectra), both estimates of the precision are in closer agreement, with CoSHA still underestimating the precision for T eff and log g but notably overestimating it for the abundance parameters. Finally, we find that the determination of the precision by CoSHA for the MaStarDR1 testing subset is generally congruent, with only T eff having a precision overestimated by ∼ 60 %.

Consistency with MaStarDR1
Fig. Set 6. Consistency with GSL
Fig. 6 shows the discrepancy in T eff , log g, [Fe/H] and [α/Fe], as defined in Eq. 3, between the values estimated in this study and those reported in MaStarDR1. We only compare those stars in our clean sample for which MaStarDR1 made estimations (∼ 3 k stars). It is clear from these comparisons that T eff is the most robust property (i. e., the least dependent on the methodology), having the best consistency (< 10% discrepancy) across the whole range. Both the mean and the standard deviation of the marginal distributions are in agreement within ∼ 2 K and ∼ 22 K for T eff , and ∼ 0.03 and ∼ 0.09 dex for the rest of the properties, respectively. Since most of these stars belong to the training set (∼ 90%), we expect an overall high consistency in this particular comparison. However, it is interesting to note that some trends may appear given the different methodologies adopted by MaStarDR1 and in this study. We note that these discrepancies are comparable to the MaStarDR1 intrinsic errors estimated in the previous section. This indicates that, even though we used the MaStarDR1 spectra to train CoSHA, we are able to disentangle the typical (intrinsic) errors of both methods. Furthermore, the observed discrepancies in Fig. 6 are likely to be dominated by MaStarDR1 errors. The T eff and log g discrepancies show almost no deviation from a Gaussian distribution. However, [Fe/H] and [α/Fe] display clear trends with respect to the MaStarDR1 estimates. In the particular case of [α/Fe], the distribution of discrepancies displays a negative slope with respect to the values reported by MaStarDR1: the higher the [α/Fe], the more negative the discrepancy between our estimate and that of MaStarDR1. This inconsistency may originate in either method or (most likely) in both. Therefore we need to compare to external estimates of these parameters in order to find clues on the origin of this trend.
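The disentangling of intrinsic errors mentioned above relies on subtracting the CoSHA internal error in quadrature from the residuals observed on the MaStarDR1 subset (§ 4.1). A minimal sketch is given below; the function name is hypothetical, and the observed MaStarDR1 mean and standard deviation used in the example are illustrative inputs chosen by us, not values reported in this work.

```python
import math


def intrinsic_gaussian(mu_obs, sigma_obs, mu_internal, sigma_internal):
    """Remove the internal (GSL) error from the observed residual distribution.

    Assumes both error sources are independent and Gaussian, so that means
    subtract linearly and standard deviations subtract in quadrature.
    """
    variance = sigma_obs**2 - sigma_internal**2
    if variance < 0:
        raise ValueError("internal scatter exceeds the observed scatter")
    return mu_obs - mu_internal, math.sqrt(variance)


# Illustrative T_eff example [K]: GSL residuals of -3 +/- 48 K (quoted in
# the text) and a hypothetical observed MaStarDR1 residual of 0 +/- 245 K.
mu_intrinsic, sigma_intrinsic = intrinsic_gaussian(0.0, 245.0, -3.0, 48.0)
```

The subtraction is only meaningful when the observed scatter exceeds the internal one, which is why the sketch raises an error otherwise.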

Consistency with APOGEE
The APOGEE survey offers a great validation tool since it is known to provide accurate distributions of T eff , log g and abundances of several chemical species, including α-elements and Fe (e. g., Jönsson et al. 2018). APOGEE DR14 reports T eff , log g, [Fe/H] and [α/Fe] values for most of the stars in the survey using two different methods: ASPCAP (García Pérez et al. 2016) and the Cannon (Ness et al. 2015). In the following we compare only with ASPCAP, the original method developed for APOGEE (but see the supplementary material). Since the target selection for the MaStar survey results from the combination of piggybacking on the APOGEE plates and cherry-picking to improve the sampling of the parameter space (especially in T eff ), some stars in our clean sample of MaStar belong to the APOGEE survey by construction. However, the sample cleaning described in § 2 and the match with Gaia to retrieve distances cut the matching subset between MaStar and APOGEE down to just ∼ 400 stars. In Fig. 7 we show the comparison between the physical properties derived by CoSHA and those published by APOGEE for the stars in common between both surveys.
As in the previous sections, we find that the T eff determinations are the most consistent results, with most of the stars within the boundary of a 10% discrepancy (c. f. T eff inset plot). Overall, we report a discrepancy with APOGEE in T eff , in the mean and the standard deviation of the marginal distributions, of ∼ −45 and ∼ 101 K, respectively. The distribution of log g shows the largest scatter around the line of perfect consistency. However, the overall shapes of the marginal distributions are rather consistent. These distributions disagree in their mean and standard deviation by at most ∼ 0.18 and ∼ 0. The discrepancies in T eff and log g show no trend with respect to the APOGEE estimate. Nevertheless, both T eff and log g show a systematic discrepancy, whereby CoSHA seems to predict cooler and more dwarf stars than APOGEE. We expect the APOGEE error estimations to account for some of the observed discrepancy (∆T eff ∼ 79 K and ∆ log g ∼ 0.05, respectively). However, the remaining discrepancy is likely to have its origin in CoSHA.
[Fe/H] shows a mild systematic discrepancy and almost no trend with the values reported by APOGEE. [α/Fe], on the other hand, shows a trend between its discrepancy and the values reported by APOGEE: the higher the APOGEE estimates, the larger the discrepancy in the negative sense. It is worth mentioning that a similar trend was observed in the discrepancies with MaStarDR1. It is encouraging, though, that the mean and standard deviation of the distribution of discrepancies are overall rather small.
The observed trends in the abundance parameters when comparing CoSHA to APOGEE and MaStarDR1 (largely based on APOGEE) have been reported before by other authors using different methods (e. g., Ting et al. 2019;Nandakumar et al. 2020). Ting et al. (2019) in their Figs. 11 show similar trends for O, Mn, Ca, Ti. Nandakumar et al. (2020) in their Fig. 1 show similar trends for [α/Fe] (based on the Cannon). Interestingly, when training the Cannon using the APOGEE labels (their Fig. 2) those trends do not appear or are, at least, mitigated. Neither group of authors explains these trends in detail. However, the fact that such trends are stronger when training with labels different from those predicted for APOGEE may indicate a mismatch between the APOGEE labels and the spectra predicted for those labels using generative models such as The Payne and the Cannon. As a matter of fact, theoretical recipes for stellar atmosphere predictions are known to make ad hoc assumptions that may turn into inconsistencies in the output stellar spectrum. We recall that CoSHA (GSL) and APOGEE/MaStar (ATLAS9, Mészáros et al. 2012) are based on different theoretical libraries to make physical property predictions. Lançon et al. (2021), for example, compared GSL with the X-Shooter stellar library and found several discrepancies across the HR diagram. They argue that such inconsistencies may originate from the use of different theoretical prescriptions.
Figure 7. Similar to Fig. 6 but comparing our estimates to those derived by the APOGEE spectral analysis pipeline (ASPCAP). The systematic discrepancies with APOGEE are larger than those found with MaStarDR1. In summary, CoSHA predicts systematically cooler, more dwarf and marginally more Fe-poor stars than APOGEE, with essentially no bias in the [α/Fe] estimates. See text in § 4.4 for details.

A partial volume correction for MaStar
The MaStar library is intended to provide a homogeneous and complete coverage of the parameter space in T eff , log g, [Fe/H], and [α/Fe]. Hence, by construction, the corresponding physical properties are not representative of the distribution of stellar populations in the Milky Way, i. e., intrinsically rare stars may be overrepresented. In order to reliably compare the distributions of these physical properties with those already known from other (volume-complete) surveys, we first need to make a volume correction. In principle, we would need a sample of stars representative of all the Milky Way stellar populations. However, to the best of our knowledge such a data set does not exist. Our best choice to date is the Gaia survey DR2 (Brown et al. 2018), containing well over 60 M unique stars and complete down to 12 mag in the G band. Since the Gaia survey is not volume-complete and the MaStar sample is not representative of all plausible stars, this is a partial volume correction (Bailer-Jones et al. 2018). Therefore, the volume-corrected MaStar sample is only representative of the stellar populations (as seen in the color-magnitude distribution) sampled by the Gaia survey (Evans et al. 2018;Arenou et al. 2018).
We compute the volume correction using the distribution of stars in the color-magnitude diagram (CMD). Mathematically, we express the volume correction in terms of PDF MaStar and PDF Gaia , the probability density distributions of observing stars in the CMD for the corresponding catalog. We implement a kernel density estimation (KDE) to estimate these PDFs from the MaStar and Gaia samples. We downloaded the Gaia source catalogs from the archives (http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source/csv/) and applied the cuts documented in Arenou et al. (2018, § 2) in order to build the H-R diagram. To correct the CMD for the Milky Way extinction effects, we followed Alzate et al. (2021, and references therein). The absolute magnitude, M G , was calculated from the parallax estimation using the relation in Arenou et al. (2018). After applying these cuts we have a sample of ∼ 66 M stars from Gaia. In Fig. 8 we show the original distribution in MaStar (left), the resulting Gaia distribution (middle) and the corrected MaStar distribution (right).
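A KDE-based reweighting of this kind can be sketched as below. This is an illustration under explicit assumptions rather than the procedure used for Fig. 8: we assume that each MaStar star receives a weight proportional to the ratio PDF Gaia / PDF MaStar evaluated at its position in the CMD, we use scipy.stats.gaussian_kde with its default bandwidth, and the function name and the synthetic CMD samples are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def volume_correction_weights(cmd_mastar, cmd_gaia):
    """Normalized weights for MaStar stars from the ratio of CMD densities.

    Both inputs have shape (n_stars, 2): color and absolute magnitude.
    """
    cmd_mastar = np.asarray(cmd_mastar, dtype=float)
    cmd_gaia = np.asarray(cmd_gaia, dtype=float)
    pdf_mastar = gaussian_kde(cmd_mastar.T)  # gaussian_kde expects (n_dim, n_points)
    pdf_gaia = gaussian_kde(cmd_gaia.T)
    # Assumed form of the correction: reference density over sample density.
    weights = pdf_gaia(cmd_mastar.T) / pdf_mastar(cmd_mastar.T)
    return weights / weights.sum()


# Synthetic CMDs: a concentrated reference sample and a broader library.
rng = np.random.default_rng(7)
cmd_gaia_demo = rng.normal([1.0, 5.0], [0.3, 1.0], size=(2000, 2))
cmd_mastar_demo = rng.normal([0.5, 3.0], [0.6, 2.0], size=(500, 2))

weights = volume_correction_weights(cmd_mastar_demo, cmd_gaia_demo)
raw_mean_color = cmd_mastar_demo[:, 0].mean()
corrected_mean_color = np.sum(weights * cmd_mastar_demo[:, 0])
```

In this toy case the weighted mean color of the library moves towards that of the reference sample. For a catalog as large as the Gaia one, evaluating a KDE on every star is expensive, so a random subsample or a binned density estimate would likely be needed in practice.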

The MaStar properties distributions
In the previous sections we explored the MaStar distribution of the stellar properties through its marginal distributions (c. f. Figs. 6 and 7). In Fig. 9 we introduce the final joint distribution of the stellar properties for the cleaned version of the (whole; ∼ 22 k) MaStar library. The light purple distributions represent the raw estimates from our model. Since by construction these distributions are not derived for a representative sample (see Fig. 8), we cannot confidently validate our results by comparing with the currently known trends in the Milky Way (MW) provided by APOGEE (e. g., Hayden et al. 2015). To address this issue, we implement the (partial) volume correction described in § 5.1.
In Fig. 9 we show the volume-corrected sample of MaStar parameters in dark purple. Interestingly, the T eff distribution becomes flatter after correction, so that the peak around T eff ∼ 6000 K becomes less pronounced in favor of cooler stars. In the log g distribution, dwarf stars (log g ∼ 5) become more relevant. The raw MaStar [Fe/H] distribution shows an oversampling of low iron abundances ([Fe/H] ≲ −1) that is redistributed towards solar abundance after volume correction. [α/Fe] is perhaps the least changed distribution, with a slight redistribution of [α/Fe] ∼ 0.5 towards solar values. It is worth noting that well-known distributions such as log g versus T eff and [α/Fe] versus [Fe/H] seem to follow the expected trends: relatively cool dwarf (main sequence) stars dominate the distributions, with abundances biased towards solar values (e. g., Hayden et al. 2020). It would be more interesting to see whether we can recover the spatial trends in the abundance subspace.

Spatial distributions
We set out to investigate the spatial trends in the [α/Fe] versus [Fe/H] plane resulting from CoSHA (on MaStar spectra) and compare them to APOGEE. We match the APOGEE sample (∼ 250 k stars) with Gaia DR2 in order to have distances for each star in APOGEE. Then we match the resulting APOGEE subset with estimated distances (∼ 64 k) to the MaStar sample, resulting in ∼ 400 stars. In Fig. 10 we show the spatial distribution in the z direction, with respect to the Galactic plane, for the stars in the MaStar (purple) and APOGEE (grey) surveys. In the top panels we show the matched subset of stars between both surveys (APOGEE and MaStar). In all the panels the expected trends appear, whereby stars become more Fe-poor and α-enhanced towards the thick disk (higher |z|) and present abundances closer to solar towards the thin disk (e. g., Hayden et al. 2015;Nandakumar et al. 2020). In the bottom panels we compare the full samples to see if the aforementioned trends remain in MaStar. Although the scatter is clearly larger in this case, the trends are still significant. It is interesting that we retrieve these trends using CoSHA since, in principle, the training process is unaware of the spatial distribution of the stars. This result reinforces the robustness of CoSHA, already drawn from the internal and external tests (c. f., § 4).

SUMMARY AND CONCLUSIONS
We present the novel Code for Stellar properties Heuristic Assignment (CoSHA) to estimate atmospheric parameters from the MaStar stellar optical spectral energy distributions, namely: T eff , log g, [Fe/H] and [α/Fe]. CoSHA implements a conventional machine learning (ML) approach named Gradient Tree Boosting (GTB), which consists of training a predefined number of decision trees sequentially, each improving on its predecessor. Because the code base of CoSHA is small (once the training sample is cleaned), the main strength of our approach is that it is easier to implement, to interpret (as it is based on decision trees) and to scale than more commonly used approaches such as ANNs, as there are fewer hyper-parameters to tune (e. g. Géron 2017;Chollet 2017;Ivezić et al. 2019). Once trained, CoSHA only requires the input spectra to be in the same spectral/parameter space as the training sample to predict reliable atmospheric properties.
For the training and testing samples we used a combination of empirical (MaStarDR1) and theoretical (GSL) libraries. Both have advantages and disadvantages, as explained in § 2. Despite those, we were able to train a model that yields reliable results. Based on the internal tests (i. e., comparing the input with the output) using the testing subset, we find an overall performance of CoSHA of: ∆T eff ∼ −1.4 ± 90.0 K, ∆ log g ∼ 0.002 ± 0.246, ∆[Fe/H] ∼ −0.004 ± 0.174 and ∆[α/Fe] ∼ 0.004 ± 0.088. Moreover, the performance of CoSHA on the segregated MaStarDR1 and GSL subsets showed a systematically more imprecise estimate of the physical properties on the former, even after the internal errors from CoSHA were removed. The fact that the errors on the GSL properties are smaller than those on MaStarDR1 points towards an origin in the MaStarDR1 subset itself. We consider several possibilities: instrumental systematics, random noise and/or internal errors introduced by the different methods adopted by MaStarDR1. None of these issues are present in the GSL subset. When introducing random noise in the GSL spectra, we are able to reconcile these differences. The main results can be summarized as follows:
• We found no statistically significant difference between the distributions of the parameters derived by CoSHA and those published by MaStarDR1. The systematic and random discrepancies between such distributions are: δT eff ∼ 10 ± 264 K, δ log g ∼ 0.
• The comparison between CoSHA and APOGEE-ASPCAP estimates on a subset of (∼ 400) stars in common between both surveys revealed systematic discrepancies comparable to those in the previous item: δT eff ∼ −37 ± 113 K, δ log g ∼ 0.17 ± 0.35, δ[Fe/H] ∼ −0.02 ± 0.17 and δ[α/Fe] ∼ 0.01 ± 0.08. The errors reported by ASPCAP can only account for a fraction of the total systematic discrepancy. We interpreted this to mean that CoSHA systematically predicts slightly cooler and more dwarf stars than APOGEE. The trend found for [α/Fe] in the MaStarDR1 discrepancy remains present when comparing to APOGEE, although with a less steep slope. Since similar trends have been found by other authors when comparing with APOGEE, and MaStarDR1 is largely based on APOGEE estimates, we interpreted this finding as evidence that this trend most likely originates in a mismatch between the prescriptions used to label APOGEE stars and those used in other studies (including the present one).
• We predicted the atmospheric properties of the entire (cleaned) set of MaStar stellar spectra using CoSHA and characterized the resulting distribution in the parameter space: the most common stars in the library seem to have T eff ∼ 6000 K (similar to the Sun's) and to be close to the turn-off point. The iron abundance is slightly sub-solar and the α-element abundance relative to iron is slightly above the solar value. This result highlights a deficit of thin disk stars in MaStar, according to CoSHA predictions. The volume-corrected distributions support these conclusions.
• In summary, we have assigned parameters to the MaStar stellar library using a simple, heuristic, non-exhaustive machine learning approach to predict the atmospheric parameters T eff , log g, [Fe/H] and [α/Fe] from the optical spectra of ∼ 22 k unique stars. Our method, dubbed CoSHA, allowed us to expand the state-of-the-art empirical libraries not only in size, but also in the dynamical range of the parameter space. The robustness of CoSHA predictions is clear since, without any information about the spatial distribution of the physical properties of the stars in the library, it reproduces the known trends, at least at a qualitative level. The version of MaStar presented in this work will improve our ability to analyze resolved and unresolved stellar populations with unprecedented precision.
(Oliphant 2006), scipy (Virtanen et al. 2020), matplotlib (Hunter 2007), seaborn (Waskom et al. 2017), scikit-learn (Pedregosa et al. 2011b), astroML (Vanderplas et al. 2012), astropy (Price-Whelan et al. 2018), astroquery (Ginsburg et al. 2019), dust-extinction (https://pypi.org/project/dust-extinction/), dustmaps (Green 2018)
APPENDIX

A. MACHINE LEARNING: A BRIEF INTRODUCTION
Machine Learning (ML) can be described as a series of statistical and calculus algorithms combined in order to uncover patterns in a data set, without making strong assumptions on the shape of those patterns. The process of uncovering such structures in a given data set is the so-called training process, and its result is the model itself, which can then be used to make predictions on new observations. In broad terms, ML algorithms can be classified, regarding the training process, into supervised, semi-supervised, unsupervised and reinforcement learning. In this study we implement the two most common ones: supervised and unsupervised algorithms. We expand on those in the following.
Supervised learning : The main goal of these algorithms is to reveal the shape of the relationship between a given data set arranged in a matrix X and a set of variables arranged in a matrix Y, such that each record (sample) in X corresponds to a record in Y. The supervised character of these algorithms comes from the fact that the mentioned relationship is learned from a controlled sample, for which the target variable, Y, is well-known. Examples of these algorithms are classification and regression.
Unsupervised learning : In this instance no knowledge is required about the variables Y; instead, the learning process consists of finding patterns among the variables in X. Clustering is an example of this kind of algorithm.
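The distinction between the two families can be made concrete with a toy sketch. It is unrelated to the algorithms used in CoSHA: the supervised part is an ordinary least-squares regression that learns from known targets Y, the unsupervised part is a two-cluster k-means that only sees X, and all data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Supervised: learn the X -> Y relation from a sample with known targets.
X_sup = rng.uniform(0.0, 1.0, size=(200, 1))
Y_sup = 2.0 * X_sup[:, 0] + 1.0 + rng.normal(0.0, 0.05, size=200)
design = np.column_stack([X_sup[:, 0], np.ones(len(X_sup))])
(slope, intercept), *_ = np.linalg.lstsq(design, Y_sup, rcond=None)

# Unsupervised: find structure in X alone (two-cluster k-means).
X_unsup = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),
    rng.normal(5.0, 0.5, size=(100, 2)),
])
# Initialize the centers at the two most extreme points along the diagonal.
coordinate_sum = X_unsup.sum(axis=1)
centers = X_unsup[[np.argmin(coordinate_sum), np.argmax(coordinate_sum)]]
for _ in range(10):
    distances = np.linalg.norm(X_unsup[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)  # assign each point to its nearest center
    centers = np.array([X_unsup[labels == k].mean(axis=0) for k in range(2)])
```

The regression recovers the slope and intercept used to generate Y, whereas the clustering recovers the two group centers without ever being told which point belongs to which group.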

A.1. Precision for GSL
In Fig. 11 we show the precision of the recovered parameters for GSL using CoSHA. As expected, the best determination of T eff and [Fe/H] occurs in the loci of high density in this plane, becoming worse at extreme values of temperature. Interestingly, log g seems to follow a pattern similar to that of T eff , albeit with a slight improvement towards higher temperatures. The observed pattern for T eff and [Fe/H] is reversed for [α/Fe], with the best determinations of this parameter being at extreme values of temperature and slightly worsening towards the higher density loci. We remark that, despite the different behaviours in this plot compared to Fig. 5, the level of precision in all parameters is maintained.

B. COMPARISON WITH OTHER STELLAR LIBRARIES
In this section we explore to what extent the MaStar library, as analyzed by CoSHA, improves the sampling of the parameter space upon previous empirical libraries. This is not meant to be an exhaustive exploration, but merely a comparison with widely used stellar libraries. For this purpose we choose the IndoUS (Valdes et al. 2004) and MILES (Sánchez-Blázquez et al. 2006;Cenarro et al. 2007) stellar libraries. Since both of these libraries are freely available to download, we analyze the corresponding spectra using CoSHA and then compare the resulting parameter distribution with the one distributed with those libraries. There is a caveat though: the wavelength range coverage. As a matter of fact, MaStar, MILES and IndoUS do not have the same coverage of the optical wavelength range, nor the same sampling and resolution. We use the same pre-processing procedure described in § 2.2.3. We fill in the missing pixels in MILES (λ ∼ 7,500-10,000 Å) and in IndoUS (λ ∼ 9,500-10,000 Å) using the MaStar spectra as a reference. We compared the performance of CoSHA with and without filling in the missing pixels in MILES and IndoUS and found a considerable improvement in the parameter space after extending the spectral range in those libraries. Another difference is that neither of these libraries has a publicly available [α/Fe] estimate (however, see Knowles et al. 2021;Coelho et al. 2020, in the case of MILES), hence we limit our consistency tests to T eff , log g and [Fe/H] only.
In Fig. 12 we show the parameter space coverage of the MILES, IndoUS and MaStar libraries, for comparison purposes. The lowest contours for MILES, IndoUS and MaStar enclose 75% of the density distribution. For completeness we show the [α/Fe] distributions of MILES and IndoUS computed using CoSHA (dashed distributions). Clearly MILES and IndoUS span a wider range in T eff , reaching ∼ 30,000 K as opposed to 12,000 K for MaStar. In log g all libraries have similar coverage, with MaStar extending the limits only marginally. In [Fe/H] MaStar extends the lower end of the distribution, reaching down to ∼ −4 dex as opposed to ∼ −3 dex for MILES and IndoUS. We recall that the limitations in the MaStar parameter coverage are likely to have their origin in the training set and can be lifted by including hotter stars. Despite such weaknesses, the version of MaStar presented throughout this study has an important advantage: since MaStar has over one order of magnitude more stars than these stellar libraries, it provides the best sampling of the parameter space to date by any empirical library.