A Self-consistent Data-driven Model for Determining Stellar Parameters from Optical and Near-infrared Spectra

Data-driven models, which apply machine learning to infer physical properties from large quantities of data, have become increasingly important for extracting stellar properties from spectra. In general, these methods have been applied to data in one wavelength regime or another. For example, APOGEE Net has been applied to near-IR spectra from the Sloan Digital Sky Survey (SDSS)–V APOGEE survey to predict stellar parameters (T eff, log g, and [Fe/H]) for all stars with T eff from 3000 to 50,000 K, including pre-main-sequence stars, OB stars, main-sequence dwarfs, and red giants. The increasing number of large surveys across multiple wavelength regimes provides the opportunity to improve data-driven models through learning from multiple data sets at once. In SDSS-V, a number of spectra of stars will be observed not just with APOGEE in the near-IR, but also with BOSS in the optical regime. Here, we aim to develop a complementary model, BOSS Net, that will replicate the performance of APOGEE Net in these optical data through label transfer. We further improve the model by extending it to brown dwarfs, as well as white dwarfs, resulting in a comprehensive coverage between 1700 < T eff < 100,000 K and 0 < log g < 10, to ensure BOSS Net can reliably measure parameters of most of the commonly observed objects within this parameter space. We also update APOGEE Net to achieve a comparable performance in the near-IR regime. The resulting models provide a robust tool for measuring stellar evolutionary states, and, in turn, enable characterization of the star-forming history of the Galaxy.


Introduction
The Sloan Digital Sky Survey in its fifth iteration (SDSS-V) aims to obtain spectra of several million stars across the Galaxy, covering a wide range of ages and masses (Kollmeier et al. 2017). This necessitates deriving their stellar properties efficiently and homogeneously to enable subsequent analyses.
Although a number of pipelines exist to derive parameters of stars, usually through comparing spectra to theoretical templates (e.g., García Pérez et al. 2016), such approaches usually have significant limitations. These templates do not always accurately describe the data, and as a result they produce a number of systematic features that complicate interpretation of the data. This affects certain types of stars to a greater extent than others, as models cannot always incorporate complex features of, e.g., late-type stars in full (Cottaar et al. 2014; Kounkel et al. 2018). More boutique types of data processing pipelines also exist, but they are most efficient when focusing on a narrow parameter space (e.g., Souto et al. 2022).
Data-driven pipelines are an alternative approach. They make it possible to generalize across several boutique solutions for stars of different stellar types in order to create a more self-consistent solution that may improve on the original.
A number of data-driven pipelines have been developed for SDSS data products, each with a somewhat different approach, incrementally improving on its predecessor through expanding the resulting parameter space. The Cannon (Ness et al. 2015) primarily focused its efforts on red giants. The Payne (Ting et al. 2019) has improved processing of T eff, log g, and abundances of the solar-type main-sequence stars. APOGEE Net has incorporated pre-main-sequence stars and K and early-M dwarfs (Olney et al. 2020), as well as OB stars (Sprague et al. 2022), into the mix as well. However, the bulk of these pipelines have been developed specifically for the spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE), which is the instrument that has conducted the bulk of the observations of stellar objects in the previous iterations of the survey.
In addition to APOGEE, SDSS also utilizes a second spectrograph, the Baryon Oscillation Spectroscopic Survey (BOSS). Previously, BOSS has primarily been used to observe extragalactic objects, with only a sparse stellar program (e.g., Yan et al. 2019; Imig et al. 2022). In SDSS-V, the scope of stellar observations with BOSS has been significantly expanded (Almeida et al. 2023): in the first year of operations alone it has increased the number of stars for which it has obtained a spectrum by an order of magnitude compared to the entirety of SDSS-IV. In some cases, both the APOGEE and BOSS spectra can be obtained for the same stars, thus making it beneficial to produce a self-consistent solution for both instruments.
In this paper we present a data-driven pipeline, BOSS Net, that takes advantage of previous efforts to measure stellar parameters across different surveys in order to create a model for characterization of T eff, log g, and [Fe/H] in optical (BOSS and LAMOST) spectra. We also provide an update to APOGEE Net to take advantage of newly available training sets. In Section 2, we describe the data used and the manner in which the labels have been derived for training the pipeline. In Section 3, we describe the neural network model. In Section 4, we present the resulting parameters, and discuss them in Section 5. We conclude in Section 6.

Spectra
BOSS is an optical spectrograph covering a wavelength range of 3622-10354 Å with a resolution of R ∼ 1800 (Smee et al. 2013). APOGEE is a near-IR spectrograph covering a range of 1.51-1.7 μm with R ∼ 22,500 (Wilson et al. 2010; Majewski et al. 2017; Wilson et al. 2019). Both of them are installed at the Apache Point Observatory 2.5 m telescope (APO; Gunn et al. 2006; Blanton et al. 2017), which is capable of obtaining up to 500 BOSS and 300 APOGEE spectra simultaneously in a given field (Pogge et al. 2020). A similar setup has also been mounted at the Las Campanas Observatory DuPont 2.5 m telescope (LCO; Bowen & Vaughan 1973); combined they offer a full view of the entire sky. APO has a field of view of 3° with a fiber diameter of 2″, while LCO has a field of view of 2° with a fiber diameter of 1.3″. BOSS can position its fibers using a robotic positioner, enabling rapid reconfiguration, and it operates simultaneously with the APOGEE spectrograph.
To date, BOSS has obtained more than 500,000 stellar spectra of >300,000 objects, rapidly increasing their census by the night. Despite this, so far there is not a significant overlap between BOSS and APOGEE observations. In part, this is due to the initial bright limit of BOSS, which has been able to observe only stars with G > 13 mag. Strategies for overcoming this limitation are currently in place, which would allow more spectra to be obtained in common between the two instruments. In part, however, this can also be attributed to the targeting strategy: as BOSS can observe fainter stars, it includes many targets that are inaccessible to APOGEE in the first place, including white and brown dwarfs.
The limited overlap makes it difficult to transfer labels for T eff, log g, and [Fe/H] from APOGEE to BOSS directly. However, it can be done through an intermediate step.
The Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) is a spectrograph operated by the National Astronomical Observatories, Chinese Academy of Sciences. It has R ∼ 1800, covering the wavelength range of 3700-9000 Å, and is capable of obtaining up to 4000 spectra in a single exposure (Yan et al. 2022). Having been operated for over a decade, it has obtained spectra of 10 million stars to date, and it has hundreds of thousands of stars in common with APOGEE, making it possible to transfer the labels. In addition, a number of pipelines have been developed for LAMOST specifically to characterize different corners of the parameter space (e.g., Lee et al. 2015; Ho et al. 2017; Du et al. 2021), although so far there has not been a comprehensive pipeline that is able to take advantage of all of its data in a self-consistent manner.
In many respects, with the exception of the wavelength coverage, LAMOST is a very similar instrument to BOSS, down to the reduction pipelines utilized by the two surveys and the resulting data architecture. As such, a data processing pipeline can be built to operate on spectra of both instruments. This enables different sets of labels both for BOSS and LAMOST spectra to be used in a complementary manner in constructing the necessary model for their characterization.
It should be noted that despite the similarity between the two instruments, there is some difference between the signal-to-noise ratios (SNRs) of these two respective data sets. Since transitioning to the fiber positioner system, SDSS has limited the exposure time to only 15 min. BOSS has also been unable to observe stars brighter than G RP ∼ 12 mag due to the saturation point of the instrument. As such, very few sources have SNR >30. By contrast, SNR >100 is not uncommon in LAMOST.

Stellar Parameters
To construct the training set of sources with previously measured T eff, log g, and [Fe/H], we utilized a variety of approaches. Here, we describe the data sets used for training BOSS Net. The comparison between these data sets when multiple labels for the same stars are available is discussed in Appendix A. Updates to the APOGEE Net training set are presented in Appendix B.

FGK stars
A number of red giants and solar-type main-sequence stars observed by APOGEE have previously had their parameters estimated using the Payne (Ting et al. 2019), which is a neural net trained on synthetic spectra. This catalog includes 1676 stars observed by BOSS and 73,403 stars observed by LAMOST. It is complemented by the sample produced by APOGEE Net (Olney et al. 2020; Sprague et al. 2022), which in part has been built on the Payne, expanding it into a greater range of the parameter space. The Payne has processed spectra only through SDSS DR14 (Abolfathi et al. 2018), while APOGEE Net includes all the spectra released through SDSS DR17 (Abdurro'uf et al. 2022); as such, the latter includes a greater number of recently observed sources. APOGEE Net provides parameters for an additional 441 stars observed by BOSS and 92,016 stars observed by LAMOST that, alongside the Payne parameters, we adopt for the training set.
However, while sources previously observed and characterized by APOGEE offer a good starting place for the initial model, this sample does have several blind spots. Training a model without filling in these gaps biases the resulting predictions. The recently released Gaia DR3 includes estimates of T eff, log g, and [Fe/H] from its spectra (Fouesneau et al. 2023). Cross-matching against this catalog shows that the SDSS-IV APOGEE sample has a systematic deficit of sources with log g > 4.4 at T eff ∼ 6000 K relative to the sources observed in SDSS-V, owing to the previous targeting strategy of the survey.
Because of this, we adopt Gaia-derived parameters for stars with 5500 K < T eff < 7500 K, excluding those found on the red giant branch with T eff < 6300 K and log g < 3.8. We also exclude all of the sources targeted by the APOGEE and BOSS Young Star Survey (ABYSS; Kounkel et al. 2023b), as Gaia does not accurately derive their spectroscopic parameters.
We also adopt Gaia parameters for sources with 3800 K < T eff < 6000 K and 3.8 < log g < 5, limiting the sample to the programs within SDSS-V that have specifically targeted evolved stars with a cleanly defined main sequence. Combined, parameters for 57,763 LAMOST and 5903 BOSS spectra were obtained from Gaia; the number of sources was limited so as to prevent this parameter space from overwhelming the rest of the sample.
To further improve the [Fe/H] < -1 sample, we have added sources in the halo that were observed in SDSS-V, the spectra of which have been processed with MINESweeper (Cargile et al. 2020; V. Chandra 2024, in preparation). We utilized the internal mwmhalo_clean_rcat_V0.07_MSG version of the catalog. This produced labels for 19,160 BOSS spectra, and they were cross-matched with LAMOST to yield an additional 1949 spectra.
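The temperature window and the two exclusions described above amount to a boolean selection. The following is a minimal sketch of that cut, assuming NumPy arrays of Gaia T eff and log g and a per-source ABYSS flag; the function name `gaia_fgk_mask` and the toy values are illustrative and not part of the original pipeline.

```python
import numpy as np

def gaia_fgk_mask(teff, logg, is_abyss):
    """Keep sources with 5500 K < Teff < 7500 K, dropping red giants and ABYSS targets."""
    teff = np.asarray(teff, dtype=float)
    logg = np.asarray(logg, dtype=float)
    is_abyss = np.asarray(is_abyss, dtype=bool)
    in_window = (teff > 5500.0) & (teff < 7500.0)
    # Red giant branch exclusion quoted in the text: Teff < 6300 K and log g < 3.8.
    red_giant = (teff < 6300.0) & (logg < 3.8)
    return in_window & ~red_giant & ~is_abyss

# Toy inputs (hypothetical values, for illustration only).
teff = np.array([5800.0, 6000.0, 6800.0, 5000.0, 6100.0])
logg = np.array([4.4, 3.2, 4.1, 4.6, 4.3])
is_abyss = np.array([False, False, False, False, True])
mask = gaia_fgk_mask(teff, logg, is_abyss)
```

In this toy sample only the first and third sources survive: the second is a red giant, the fourth is outside the temperature window, and the fifth is an ABYSS target.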

Pre-main-sequence Stars
One of the areas where APOGEE Net has significantly improved over the Payne is in its characterization of pre-main-sequence stars. In particular, the resulting log g values are sensitive to the age of a young star, enabling it to be used independently of photometry in characterizing the star-forming history of a given population (e.g., Kounkel et al. 2022). Nonetheless, the absolute calibration is imperfect: the shape of the resulting observed isochrones runs parallel to the main sequence, which is inconsistent with the theoretical isochrones or the mass and radius measurements from the young eclipsing binaries. In part, this is due to the limited sample of young stars that have been available for training, as well as imprecise initial labels.
We improve on these efforts. In the initial iteration of training BOSS Net we adopt APOGEE Net solutions for the young stars that were in common, and we then apply this model to BOSS and LAMOST data. The resulting T eff is well calibrated to the older stars, and it appears to match well the properties of the pre-main-sequence stars (Kounkel et al. 2023a). We thus adopt the resulting T eff for the low-mass sources targeted by ABYSS with SDSS-V (Kounkel et al. 2023b), but we renormalize other parameters. In particular, we estimate the ages of these stars using Sagitta (McBride et al. 2021). We interpolate the combination of ages and T eff values against MIST isochrones (Choi et al. 2016) in order to estimate log g. The resulting T eff and log g have then been used in subsequent iterations of the BOSS Net model. This amounts to 2808 young stars in LAMOST and 1948 stars in BOSS.
We also improve on the [Fe/H] determination for these stars. Previously, APOGEE Net has adopted solar metallicity for all pre-main-sequence stars, as its training set has included only a handful of nearby star-forming regions that all appear to be chemically homogeneous: star-forming regions that are spread out over ∼500 pc and even more evolved young clusters such as the Pleiades all have almost precisely solar [Fe/H] (Soderblom et al. 2009; Spina et al. 2017; Kos et al. 2021). With the inclusion of a larger number of populations across the Galaxy, this approximation is no longer appropriate. Additionally, as the special treatment of the pre-main-sequence stars was initially limited to the low-mass stars, this created a gradient in [Fe/H] as a function of T eff (Román-Zúñiga et al. 2023).
There is a significant [Fe/H] gradient in the Galaxy as a function of radius R, reflective of the inside-out star-forming history (e.g., Pilkington et al. 2012). Young stars are expected to have [Fe/H] values most closely correlated with this gradient, as they did not have sufficient time to migrate from their birth sites and thus act as excellent tracers of the chemical composition of the gas in which they formed. Thus, we adopt the relationship between [Fe/H] and R from Hayden et al. (2015). At R = 8 kpc, this does reproduce the solar [Fe/H] of the nearby young populations, and the gradient is sufficiently shallow that at larger distances the uncertainties in parallax should have a negligible effect. We adopt the resulting [Fe/H] for all ABYSS targets regardless of their T eff; this was done for 2068 stars in BOSS and 4045 stars in LAMOST.

OBA Stars
APOGEE Net has previously been enabled to estimate spectroscopic parameters of OBA stars. While the resulting T eff values were reliable and log g did show some ability to differentiate between dwarfs and giants, it struggled to fully populate the entire log g space that is expected by the theoretical models. This is in part due to the previous targeting strategy of the survey and the lower sensitivity to log g of near-IR (NIR) spectra, as well as imprecise labels, which were typically obtained through interpolating the spectral type and luminosity class to T eff and log g. Furthermore, no [Fe/H] information was available. Although in the absence of other labels we adopt APOGEE Net parameters, as well as parameters derived through a similar interpolation from the spectral type, it is necessary to compensate for this bias.
Since then, however, there has been a significant increase in the number of high-mass stars with accurately derived properties. In particular, HotPayne (Xiang et al. 2022) has provided T eff, log g, and abundances for a number of stars in LAMOST spectra. We adopt these parameters for sources in LAMOST spectra that reported a precision in T eff of <4% and a precision in log g of <0.2 dex. Since we began this work, there has also been a separate release of the parameters of hot stars in SDSS-V data: zeta Payne (Straumit et al. 2022). We did not include these labels, as the parameter distribution showed a significantly larger number of systematic features compared to HotPayne.
The sample in HotPayne has a very sharp edge at 7000 K; it may include sources that are somewhat cooler that got aliased toward ∼7000 K due to the edge effects. We cross-matched these sources with Gaia DR3 parameters and have overwritten the parameters for the subset where Gaia has reported T eff between 6000 K and 7000 K.
Since the distribution of T eff in the full training set is not smooth due to the inhomogeneity of the catalogs, we randomly downsampled the remaining hot stars very close to the boundary so that the transition between the sources in HotPayne and cool stars is more continuous. In total, our training set of these sources consists of 128,430 LAMOST and 663 BOSS spectra.

Low-mass Stars and Brown Dwarfs
APOGEE Net is unable to provide stellar parameters for stars cooler than 3000 K, as, at the time it was initially developed, APOGEE did not generally obtain spectra of such objects. In recent years, APOGEE has begun to observe these stars more routinely, and they are also common in BOSS spectra due to its higher sensitivity.
Since the goal of this project is to create a pipeline that is capable of running on all stellar optical spectra to provide a homogeneous set of parameters without any unintended edge effects, we need to account for these cool stars. Unfortunately, at the moment of assembling the labels, there have been very few reliable stellar parameters for these stars. Instead, to incorporate them into the model, we rely on photometric relations to derive their parameters (Figure 1). Specifically, we examine the properties of cool dwarfs from Cifuentes et al. (2020) that were measured from the CARMENES spectra. We obtain an interpolation that is valid for stars with (G RP − K S) > 2.3 mag; the scatter in the fit relative to the available data is 85 K in T eff and 0.11 dex in log g. Unfortunately, no [Fe/H] information can be inferred from this sample.
We adopt parameters from this interpolation for 2853 BOSS and 9934 LAMOST spectra that satisfy (G RP − K S) > 2.3 mag, H − 5 log(1000/π) + 5 > 6 mag, and π > 10 mas (for LAMOST), or were targeted as a part of the Solar Neighborhood Census (for BOSS), to select nearby faint and red stars that are most likely to be cool dwarfs.
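As an illustration of the photometric and astrometric cuts quoted for the LAMOST sample, the sketch below computes the absolute H magnitude from the apparent magnitude and the parallax in mas (distance in pc = 1000/π) and combines the three conditions. The function names and the toy photometry are hypothetical; the handling of BOSS sources selected through the Solar Neighborhood Census is not modeled here.

```python
import numpy as np

def absolute_h_mag(h_mag, parallax_mas):
    """Absolute H magnitude: M_H = H - 5 log10(1000 / parallax) + 5, parallax in mas."""
    return h_mag - 5.0 * np.log10(1000.0 / parallax_mas) + 5.0

def cool_dwarf_mask(grp, ks, h_mag, parallax_mas):
    """Red color, faint absolute magnitude, and nearby (parallax > 10 mas)."""
    color_cut = (grp - ks) > 2.3
    faint_cut = absolute_h_mag(h_mag, parallax_mas) > 6.0
    nearby_cut = parallax_mas > 10.0
    return color_cut & faint_cut & nearby_cut

# Toy photometry (hypothetical values, for illustration only).
grp = np.array([14.0, 12.0, 15.0])
ks = np.array([11.0, 10.5, 12.0])
h_mag = np.array([11.3, 10.7, 12.3])
parallax = np.array([50.0, 50.0, 5.0])
mask = cool_dwarf_mask(grp, ks, h_mag, parallax)
```

Only the first toy source passes: the second is too blue, and the third is both too distant and too luminous in H.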

Subdwarfs and White Dwarfs
Spectra of a number of compact objects have been observed both by LAMOST and APOGEE. Although they are substantively different from stars and are usually treated separately, their spectral features follow the same T eff and log g relations as those of normal stars. As such, including these objects in the training set enables the development of a more self-consistent stellar model that can help compensate for imperfections in the labels of different classes.
We incorporate T eff and log g parameters of 874 LAMOST and 12 BOSS spectra of hot subdwarfs from Luo et al. (2019). A number of hot subdwarfs have also been incorporated in the catalog from HotPayne (Xiang et al. 2022). We also include 2654 LAMOST and 218 SDSS-V BOSS spectra of white dwarfs (WDs) from Kepler et al. (2019). More recently, Gentile Fusillo et al. (2021) produced a more comprehensive catalog of spectral parameters of WDs in the legacy SDSS data, including not only hydrogen-rich DA types, but also helium-rich DB types, and several others. We adopt these parameters for 6821 legacy SDSS I-IV BOSS spectra and 536 cross-matched LAMOST spectra. No [Fe/H] is available for these sources, although, given their evolved status, it is difficult to define this value on the same meaningful scale as for regular stars.
The catalogs of WDs from Kepler et al. (2019) and Gentile Fusillo et al. (2021) have only a few sources with T eff < 7000 K. The SDSS-V WD selection function appears to include many cooler sources in its targeting. If they are not included in the training set, the model predictions produce a sharp transition of the WD sequence onto the main sequence at T eff ∼ 7000 K, as the model does not recognize very cool sources with high surface gravity as a valid parameter space. In the preliminary training of the model, we identified these cool sources: while their log g was not accurate, their T eff was informed by the training set of stars. Following that preliminary training, we selected sources targeted as WDs that have landed on the main sequence and adopted them into the training set, preserving their T eff but setting their log g = 8 (as their sizes should not change significantly as they cool down). Although the absolute calibration of T eff for these sources may systematically differ by a few hundred kelvin at these log g, we expect that the overall ordering of sources from hottest to coldest is sufficiently self-consistent. This was done for 1945 sources observed with BOSS; LAMOST targeting appears to lack such sources.

Summary
In total, the training set consists of 412,099 spectra, of which 371,072 are in LAMOST and 41,027 are in BOSS. It covers 1700 K < T eff < 100,000 K and 0 < log g < 10 (Figure 2). Of them, 349,135 and 30,256 spectra, respectively, have labels for [Fe/H], with 99% of the sources occupying the range of -2 < [Fe/H] < 0.5; the sources with valid [Fe/H] occupy log g < 5 and T eff > 3200 K.

Radial Velocity
In the commissioning of SDSS-V, the wavelength solution of BOSS spectra has been rigorously tested to ensure self-consistency. Those tests have shown that the radial velocities (RVs) derived using the pipeline from the previous generation of the survey can carry a significant offset of 5-10 km s −1 relative to high-resolution spectra such as APOGEE. The performance in the red part of the detector was somewhat stable, but in the blue part, especially at <4000 Å, it significantly degraded. Furthermore, this offset had a dependence on the pointing angle of the telescope, as it was not consistent across the sky. This amounts only to a fraction of a pixel on the detector, and the typical resolution of the instrument is only ∼5 km s −1. Nonetheless, such an offset was considered a significant detriment to SDSS-V. Previously, BOSS had focused on extragalactic redshifts, for which such an offset is negligible.
With an improved arc lamp line list and a better observing strategy for the calibration, the wavelength solution has subsequently stabilized. The pipeline PyXCSAO (Kounkel 2022) was developed to improve the quality of the RVs, limiting the range of wavelengths over which cross-correlation was performed. The resulting RVs are consistent with RVs derived from APOGEE, and they are stable across all stars in various clusters (where all stars should have the same velocity), regardless of T eff (Kounkel et al. 2023b).
However, because of the limited wavelength coverage, while PyXCSAO is effective for cool stars, it does not function well for sources where the bulk of their light is in the blue portion of the spectrum. Among the sources for which it is unable to derive RVs are WDs, which necessitates an alternate approach.
The issue of poor RVs is not unique to BOSS. Both LAMOST and BOSS pipelines share common origins, and although both underwent significant modifications through the present day, issues with the wavelength solution appear to date back to SEGUE data (Yanny et al. 2009), which was the original survey on which both of them were based. It is unclear to what degree these data can be reprocessed to improve RVs. As such, while PyXCSAO can offer some improvement in the RV stability over the native LAMOST pipeline, some systematic offsets remain, not least because the wavelength coverage of the instrument does not extend as far into the red.
As part of the stellar parameter determination, we aim to improve somewhat the RV determination for both instruments. In our training set, we include stable RVs with R > 6 reported by PyXCSAO for BOSS. We also cross-match the sources in the training set with APOGEE and other high-resolution surveys collated in Tsantaki et al. (2022) to adopt RVs for some sources. While there are spectroscopic binaries that introduce RV variability, their overall fraction is expected to be <10% of the total sample (Price-Whelan et al. 2020), of which significantly fewer would exhibit RV variability >5 km s −1 (the resolution of BOSS and LAMOST); as such, their presence should not significantly skew the model. We also adopt RVs from the most confident and stable determinations from Gaia DR3 (Katz et al. 2023). For WDs from legacy SDSS spectra, we adopt RVs from Anguiano et al. (2017).
In total, RV data were available for 194,978 LAMOST spectra and 36,351 BOSS spectra that were already included in our training set.

Data Processing
BOSS and LAMOST data have a comparable resolution (Figure 3), and they cover a similar wavelength range, but they are not identical. Furthermore, since both of them are multi-object spectrographs, a given spectrum can be somewhat offset in wavelength depending on where it is positioned on a detector. In training of a model, this is not optimal, as the data need to be standardized with a consistent shape. To do this, we interpolate all of the data onto a common 3900 element wavelength grid ranging uniformly from 3800 to 8900 Å, approximating the typical resolution of the data within that range. Because of this standardization, BOSS Net may have applications outside of just these two data sets and may function as a general tool for optical spectra.
To compress the dynamical range of the data in order to improve the performance of the model and ensure numerical stability, we apply log scaling to the flux; however, we do not perform continuum normalization, as it is nontrivial to do it self-consistently across all types of stars over such a large wavelength regime. The data set comprising LAMOST and BOSS stars was randomly segregated into distinct train, validation, and test sets, at an 80:10:10 ratio, respectively.
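The resampling step can be sketched as follows. This is a minimal example that assumes the grid is uniform in linear wavelength and that simple linear interpolation is adequate; the text does not specify the interpolation scheme, and the names `GRID` and `resample_to_grid` are illustrative.

```python
import numpy as np

# Common grid: 3900 elements spanning 3800-8900 Angstrom (assumed uniform in wavelength).
GRID = np.linspace(3800.0, 8900.0, 3900)

def resample_to_grid(wave, flux, grid=GRID):
    """Linearly interpolate a spectrum onto the common wavelength grid."""
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    order = np.argsort(wave)  # np.interp requires increasing x values
    # Outside the observed range np.interp holds the edge values constant.
    return np.interp(grid, wave[order], flux[order])

# Toy spectrum with a linear slope on a native grid covering 3700-9000 Angstrom.
wave = np.linspace(3700.0, 9000.0, 3000)
flux = 1.0 + 1e-4 * (wave - 3700.0)
resampled = resample_to_grid(wave, flux)
```

Both the BOSS (3622-10354 Å) and LAMOST (3700-9000 Å) ranges quoted earlier fully cover this grid, so no extrapolation should be needed for complete spectra.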
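A minimal sketch of the flux scaling and the random 80:10:10 split is given below. The base of the logarithm and the treatment of non-positive fluxes are not stated in the text; the base-10 logarithm and the `floor` parameter here are assumptions, as are the function names.

```python
import numpy as np

def log_scale_flux(flux, floor=1e-3):
    """Log-scale the flux; values below `floor` are clipped first (an assumption)."""
    return np.log10(np.clip(np.asarray(flux, dtype=float), floor, None))

def split_indices(n, seed=0):
    """Randomly partition n spectra into train/validation/test at 80:10:10."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = n * 8 // 10
    n_val = n // 10
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

scaled = log_scale_flux([100.0, 0.0, -5.0])
train_idx, val_idx, test_idx = split_indices(1000)
```

The three index arrays are disjoint by construction, so no spectrum appears in more than one subset.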

Model Architecture and Training
BOSS Net is a 1D residual convolutional network consisting of a series of 1D convolutional Residual Network blocks and a final linear network for prediction.
The BOSS Net model takes a star's spectrum as input. To regularize and avoid overfitting, data augmentation techniques are employed, such as randomly removing continuous segments of flux, dropping specific values in the flux, and adding noise to the flux by scaling a normal distribution sample by the error.
The model starts with a single convolutional block that includes a 1D convolutional layer, batch normalization, an Exponential Linear Unit (ELU) activation function, and a 1D max pooling layer (Figure 4). Batch normalization helps to improve the speed and stability of the training process by normalizing the inputs to each layer, reducing covariate shift, which is the change in input distribution of model layers as the model trains (Ioffe & Szegedy 2015). The ELU activation function enhances the performance and convergence speed of the model compared to other activation functions such as Rectified Linear Units (ReLU; Clevert et al. 2015). The max pooling layer helps reduce the number of parameters in the model, which can prevent overfitting and improve generalization to unseen data. Additionally, the model includes a positional encoding as a channel to the first convolutional layer of the first block, allowing for a more effective capturing of local patterns in the spectra at a given wavelength.
The output from the initial convolutional block flows into the first of many Residual Neural Network (ResNet) blocks. Each block consists of two 1D convolutional layers, each with batch normalization and an ELU activation function. The residual connection in this block allows the network to learn the residual mapping from the input to the output rather than the complete mapping, which can help with the vanishing gradient problem during training. ResNets were designed to allow for easier flow of information and gradients throughout the network, enabling the training of deeper models (He et al. 2016).
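The three augmentations listed above can be sketched as follows. This is an illustrative version only: the maximum segment length, the per-pixel drop fraction, and the choice to represent removed flux as zeros are all assumptions not specified in the text.

```python
import numpy as np

def augment(flux, error, rng, max_gap=200, drop_frac=0.01):
    """Return a noisy copy of `flux` with a masked segment and dropped pixels."""
    flux = np.array(flux, dtype=float)  # copy so the input is not modified
    error = np.asarray(error, dtype=float)
    n = flux.size
    # 1) Add noise: a standard normal sample scaled by the per-pixel error.
    flux = flux + rng.normal(size=n) * error
    # 2) Remove one contiguous segment of random length and position.
    gap = int(rng.integers(0, max_gap + 1))
    start = int(rng.integers(0, n - gap + 1))
    flux[start:start + gap] = 0.0
    # 3) Drop individual pixels at random.
    flux[rng.random(n) < drop_frac] = 0.0
    return flux

rng = np.random.default_rng(1)
augmented = augment(np.ones(3900), np.zeros(3900), rng)
```

Because a fresh random realization is drawn on every call, the network sees a slightly different version of each spectrum in every epoch.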
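To make the structure of the initial block and the residual blocks concrete, the following PyTorch sketch assembles them into a small feature extractor. It is not the published BOSS Net implementation: the channel count, kernel sizes, number of blocks, the form of the positional channel (a normalized pixel index), and the placement of the final activation after the skip connection are all assumptions, and the class names are hypothetical.

```python
import torch
from torch import nn

class ResBlock1d(nn.Module):
    """Two Conv1d + BatchNorm layers with ELU and an identity skip connection."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2  # keeps the sequence length unchanged
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
            nn.ELU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ELU()

    def forward(self, x):
        # The block learns a residual that is added back to its input.
        return self.act(x + self.body(x))

class SpectrumEncoder(nn.Module):
    """Initial convolutional block followed by a stack of residual blocks."""
    def __init__(self, n_pix=3900, channels=16, n_blocks=3):
        super().__init__()
        # Positional encoding supplied as a second input channel (assumed form).
        position = torch.linspace(-1.0, 1.0, n_pix).view(1, 1, n_pix)
        self.register_buffer("position", position)
        self.stem = nn.Sequential(
            nn.Conv1d(2, channels, kernel_size=7, padding=3),
            nn.BatchNorm1d(channels),
            nn.ELU(),
            nn.MaxPool1d(2),
        )
        self.blocks = nn.Sequential(*[ResBlock1d(channels) for _ in range(n_blocks)])

    def forward(self, flux):
        x = flux.unsqueeze(1)  # (batch, 1, n_pix)
        pos = self.position.expand(x.shape[0], -1, -1)
        x = torch.cat([x, pos], dim=1)  # (batch, 2, n_pix)
        return self.blocks(self.stem(x))

encoder = SpectrumEncoder().eval()
with torch.no_grad():
    features = encoder(torch.randn(5, 3900))
```

With these settings a batch of five 3900-pixel spectra yields a feature map of shape (5, 16, 1950), the max pooling layer having halved the length.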
After the output of the residual blocks is obtained, it is passed through an adaptive average pooling layer. This layer serves to adjust the dimensions of the outputs to a fixed size, allowing for easier integration with the final linear network. The outputs of the adaptive pooling layer are then fed into several linear layers, each followed by the ReLU activation function. The final linear network provides the model prediction for the star's T eff, log g, [Fe/H], and RV.
BOSS Net is evaluated with the mean squared error between the predicted and actual values. To encourage accurate modeling of the less populated regions of the parameter space, the loss of stars was adjusted, either by increasing or decreasing their weight in the loss calculation during training. The weight of specific regions is determined by the reciprocal of the Kernel Density Estimation (KDE) values computed from the training labels. This allows the model to prioritize the learning of less common samples, as indicated by their low-density regions in the KDE, while reducing the impact of well-modeled samples, which are associated with high-density regions. As shown in Figure 5, the white dwarfs and the hot subdwarfs are weighted higher in the loss calculation, as they are in less dense regions of the parameter space. In addition, [Fe/H] of metal-poor stars is weighted higher in comparison to the metal-rich stars, and RVs of WDs or stars with RV > 200 km s −1 are weighted higher than for other sources, once again due to their rarity in the sample. The model was trained with a learning rate of 0.0001 using the Adamax optimizer (which is more robust to the presence of outliers; Kingma & Ba 2014), and a batch size of 512.
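The inverse-density weighting can be illustrated with a small NumPy sketch. It uses a one-dimensional Gaussian KDE over a single label for simplicity; the dimensionality of the KDE, the bandwidth, and the normalization of the weights are assumptions, and the function names are hypothetical.

```python
import numpy as np

def kde_density(values, bandwidth):
    """Gaussian KDE evaluated at each sample (O(n^2) memory; fine for a sketch)."""
    values = np.asarray(values, dtype=float)
    diff = (values[:, None] - values[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diff**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=1)

def inverse_density_weights(values, bandwidth):
    """Weights proportional to 1 / density, normalized to a mean of one."""
    w = 1.0 / kde_density(values, bandwidth)
    return w / w.mean()

def weighted_mse(pred, target, weights):
    """Mean squared error with per-sample weights."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))

# Toy labels: 95 main-sequence-like values and 5 white-dwarf-like values of log g.
logg = np.concatenate([np.full(95, 4.5), np.full(5, 8.0)])
weights = inverse_density_weights(logg, bandwidth=0.3)
loss = weighted_mse([1.0, 2.0], [1.0, 4.0], [1.0, 3.0])
```

In this toy case the five rare high-gravity sources receive roughly 19 times the weight of the common ones, which is the intended effect of the reciprocal-density scheme.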

Results
In Table 1, we present BOSS Net-derived parameters for LAMOST DR8 and stellar objects observed with BOSS through SDSS DR18, which also includes the legacy SEGUE data, as well as those in the MaStar library (Imig et al. 2022). SDSS-V BOSS data are still proprietary to the collaboration. The code is integrated into Astra, the primary stellar data processing pipeline for SDSS-V. As such, its data products will be included in the subsequent data releases.
As the model was trained on the data in the training set, it was regularly evaluated against the validation set to ensure an adequate ability to generalize on the unseen data, and to prevent the model from overfitting. The training is terminated at the point when the performance on the validation set stops improving.
Since the validation set does end up influencing the model, once the model is finalized, it is evaluated against the test set. The resulting comparison between the predictions and the labels for these data is shown in Figure 6. The detailed comparison of the predictions with respect to each subset of labeled data is available in Appendix A.
In comparison to the initial labels, some of the cool WDs may have their log g underestimated, placed in between the main and WD sequences; these tend to be lower SNR sources where confident spectroscopic determination of their log g is difficult. Similarly, there are some sources with very metal-poor labels, the [Fe/H] of which is overestimated in the predictions; these tend to be hotter stars that intrinsically have very weak metal lines, and these labels typically originate from Gaia (near the edge of their [Fe/H] grid), which may not have as much sensitivity to [Fe/H] as higher resolution spectra.
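The stopping criterion can be sketched as a simple patience rule applied to the per-epoch validation loss. The text states only that training ends when validation performance stops improving; the `patience` and `min_delta` parameters below are assumptions introduced for illustration.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it (a real loop would checkpoint the model here).
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Toy validation curve (hypothetical values, for illustration only).
val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.75]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=3)
```

For this toy curve the best epoch is 2 and training stops at epoch 5, after three epochs without improvement; the weights saved at the best epoch would be the ones retained.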
We apply BOSS Net to all of the stellar spectra observed by BOSS to date. This does include the data originally used for training, but, as BOSS obtains spectra of several thousand stars on a nightly basis, the majority of the data are new.
We derive the uncertainties in the predictions following a similar implementation to the previous iterations of APOGEE Net. In particular, we generate 20 different realizations of the same spectrum by scattering the input fluxes by the reported uncertainties. To ensure the stability of the model, if uncertainties are larger than 5 times the median across the spectrum (e.g., in the regions dominated by the telluric lines, or near the edges of the spectrum), they are capped to that level. All these realizations are separately passed through the model. The scatter in the resulting predictions is then evaluated and adopted as the uncertainties. These uncertainties are model dependent and are not representative of systematic errors that can be assessed through comparison to external data sets. However, the reported uncertainties do provide meaningful variance.
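The procedure above can be sketched in a few lines. This is an illustration under stated assumptions rather than the pipeline code: the perturbations are drawn from a Gaussian, the scatter is taken to be the sample standard deviation, and `model` is a hypothetical callable that maps a flux array to a single predicted value (here replaced by a toy function).

```python
import random
import statistics

def cap_uncertainties(sigma, factor=5.0):
    """Cap per-pixel uncertainties at `factor` times their median across the spectrum."""
    limit = factor * statistics.median(sigma)
    return [min(s, limit) for s in sigma]

def monte_carlo_uncertainty(flux, sigma, model, n_realizations=20, seed=0):
    """Perturb the spectrum by its (capped) uncertainties and measure the prediction scatter."""
    rng = random.Random(seed)
    capped = cap_uncertainties(sigma)
    predictions = []
    for _ in range(n_realizations):
        # Gaussian perturbation of every pixel (the noise distribution is an assumption).
        noisy = [f + rng.gauss(0.0, s) for f, s in zip(flux, capped)]
        predictions.append(model(noisy))
    return statistics.mean(predictions), statistics.stdev(predictions)

def toy_model(spectrum):
    """Stand-in for the trained network: returns the mean flux."""
    return sum(spectrum) / len(spectrum)

# Flat toy spectrum with one badly determined pixel, as near a telluric band.
flux = [1.0] * 100
sigma = [0.1] * 99 + [50.0]
value, uncertainty = monte_carlo_uncertainty(flux, sigma, toy_model)
```

The cap limits the outlying pixel uncertainty to 0.5 (5 times the median of 0.1), so a single noisy pixel cannot dominate the scatter of the realizations.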
The example spectra plotted as a function of the predicted T eff are shown in Figure 7. The resulting distribution of T eff and log g is shown in Figure 8. It provides a good coverage of the underlying parameter space that is present in the training data. There is some degree of scatter in log g of cool stars with extremely low SNR being stranded in the parameter space between the main and the WD sequences; these sources can be identified through log g uncertainties. Figure 8 also highlights the difference in the selection function between SDSS-V BOSS observations, those conducted in prior SDSS iterations, and those done with LAMOST. LAMOST does not go to as faint magnitudes as BOSS, lacking very cool brown and white dwarfs. In large part, it is also less targeted toward the more exotic objects; while, e.g., pre-main-sequence stars or some of the largest blue supergiants are present in LAMOST data due to the sheer number of sources it observed, they compose a significantly smaller fraction of the catalog. By contrast, legacy SDSS optical observations were primarily motivated by extragalactic sources: they are pointed high above the Galactic plane. They are preferentially more metal-poor, with almost half the sample with [Fe/H] < -1. They lack high-mass stars, since they are unlikely to migrate far out of the disk due to their short lifetimes. Moreover, in such a deep survey, high-mass stars would also be preferentially excluded due to their brightness. There is also significant representation from compact objects. In particular, subdwarfs can be clearly seen as a continuation of the horizontal branch, which is consistent with the interpretation of their origin (Heber 2008).
The typical uncertainties in log g for the sources with SNR > 15 are 0.09 dex for the cool stars (T eff < 6700 K) and 0.13 dex for hot stars. In T eff, they are typically 0.007 and 0.02 dex in cool and hot stars, respectively. In [Fe/H], they are 0.07 and 0.16 dex, respectively. In RV, they are 7 and 12.5 km s −1 (Figure 9). Several sources have been observed multiple times, enabling testing the consistency of the predictions and the accuracy of the resulting uncertainties (Figure 10). In general, the resulting scatter is well replicated by the reported errors, with the full width at half-maximum (FWHM) being almost exactly 1σ. Comparing parameters in the sources observed by both BOSS and LAMOST, the scatter is somewhat larger, typically 1.2σ-1.5σ, but, nonetheless, both sets of parameters are very consistent with one another (Figure 6).
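The repeat-visit test can be written out as a short sketch: the difference between two predictions for the same star is divided by the two reported uncertainties added in quadrature, as in Figure 10. The function name and the visit values below are hypothetical and serve only to show the arithmetic.

```python
import math
import statistics

def normalized_difference(value_1, sigma_1, value_2, sigma_2):
    """Difference between two measurements in units of their combined uncertainty."""
    return (value_1 - value_2) / math.sqrt(sigma_1 ** 2 + sigma_2 ** 2)

# Hypothetical pairs of log g predictions for stars observed twice:
# (value in visit 1, uncertainty, value in visit 2, uncertainty).
repeat_visits = [
    (4.50, 0.09, 4.41, 0.09),
    (4.32, 0.10, 4.40, 0.08),
    (2.51, 0.12, 2.45, 0.11),
    (4.66, 0.07, 4.71, 0.09),
]
pulls = [normalized_difference(*pair) for pair in repeat_visits]
pull_scatter = statistics.pstdev(pulls)
```

If the reported uncertainties capture the true visit-to-visit scatter, the normalized differences of a large sample should have a width of about unity; a noticeably larger width, such as the 1.2σ-1.5σ found between BOSS and LAMOST, indicates additional scatter not captured by the reported errors.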
In comparing SDSS-V and LAMOST RVs to those of other surveys such as APOGEE, BOSS Net appears to be significantly more stable in the low SNR regime than the native data reduction pipelines for these instruments, and there do not appear to be any sky-dependent systematic offsets such as those that were originally apparent in LAMOST data. They do, however, remain in the legacy SDSS I-IV BOSS spectra, with typical offsets on the order of ∼5 km s −1. Despite their being interpolated onto a common grid, BOSS Net was able to learn how to differentiate between BOSS and LAMOST spectra, in order to correct for LAMOST offsets. But, since SDSS I-IV spectra use the same instrument as SDSS-V, the inconsistency in the data reduction strategy has resulted in an inconsistent calibration of RVs.
In comparing the parameters produced by BOSS Net to other pipelines developed within SDSS-V, we generally achieve good agreement in most cases. In some cases, e.g., RVs of WDs, BOSS Net solutions have higher uncertainties than what is possible to achieve through using more specialized pipelines. Additionally, since BOSS Net does not normalize the continuum prior to determining parameters, caution should be exercised with early-type stars that are very extincted (T eff > 7000 K and A G > 3). Extinction is not as significant a concern for later-type stars, since they are very rich in spectral features; moreover, with a high degree of extinction they are unlikely to be observed within the magnitude limit.

Discussion
SDSS-V consists of several dedicated programs, each focused on different types of stars (Figure 11), e.g., nearby dwarfs, young stars, compact binaries, distant red giants, OB stars, and many others (Almeida et al. 2023). Each program has been responsible for producing a catalog of likely candidates that represent their sources of interest; these catalogs are used in targeting for the survey. As such, every spectrum observed by SDSS-V has a flag specifying which program it has been targeted by, separating objects into different classes. Of course, this preliminary tagging is imperfect and prone to contamination, but examining the derived spectroscopic properties of the sources grouped by their targeted program in bulk enables the evaluation of the performance of BOSS Net for different types of stars. Here, we examine sources from just a few programs that are representative of the underlying parameter space.
The stars observed by the young star program (Kounkel et al. 2023b) are particularly useful in testing log g. The ages of these stars (typically <30 Myr) can be estimated through photometry alone, be it through isochrone fitting or through a data-driven approach (McBride et al. 2021). As such, given the rapid evolution of the pre-main-sequence low-mass stars, their parameters can be evaluated from the comparison to the theoretical models of the stars of that age. Indeed, spectroscopically derived log g values presented here demonstrate an excellent agreement with the MIST isochrones (Choi et al. 2016; Figure 12). The calibration here is significantly improved and does not appear to show any systematic trends relative to the models that were present in the previous iterations of APOGEE Net (Olney et al. 2020).
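The reason log g traces age in pre-main-sequence stars follows from the definition of surface gravity, g = GM/R²: as a star of fixed mass contracts toward the main sequence, its radius shrinks and log g rises. The sketch below scales from the solar value (log g ≈ 4.438 in cgs units); the mass and radii used in the example are hypothetical and are not taken from the MIST models.

```python
import math

LOGG_SUN = 4.438  # log10 of the solar surface gravity in cm s^-2

def log_g(mass_msun, radius_rsun):
    """Surface gravity (log10, cgs) from mass and radius in solar units, using g = GM/R^2."""
    return LOGG_SUN + math.log10(mass_msun) - 2.0 * math.log10(radius_rsun)

# A hypothetical 0.5 Msun star at two stages of its contraction.
logg_young = log_g(0.5, 2.0)  # inflated radius early on
logg_old = log_g(0.5, 0.5)    # radius after contraction
```

For these illustrative radii, log g increases by about 1.2 dex during the contraction, which is large compared to the typical log g uncertainty of 0.09 dex quoted for cool stars.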
High-mass stars quickly settle onto the main sequence; thus, it is difficult to observe them in a pre-main-sequence phase. However, comparing the position of the young stars to those targeted by the OB star program (Zari et al. 2021) shows that the former tend to be found at higher log g values (Figure 11). High-mass stars have short lifetimes; younger sources will be found on the main sequence, but they will evolve away from it in <100 Myr. As they transition to giants, they will become more luminous and as such are more likely to be targeted. Given that, it is unsurprising that the stars within the OB program tend to be predominantly more evolved than those in the young star program.
Sources observed by the solar neighborhood census program (targeting sources within 100 pc based on Gaia parallaxes) are predominantly M dwarfs and brown dwarfs. Earlier type stars are unlikely to have been selected, as, due to their proximity, they are brighter than the (current) bright limits of BOSS, which would result in saturation. By contrast, other programs are unlikely to contain brown dwarfs, since at larger distances they become increasingly too faint.
Some observations were conducted of members of various moving groups from Kounkel et al. (2020). These populations are found within 3 kpc, with ages ranging between 30 Myr and 4 Gyr. The observed sources are predominantly found on the low-mass part of the main sequence, since they are old enough to have settled there but are still sufficiently young to have not evolved along the red giant branch. Similarly to the above, hotter stars tend to be too bright for BOSS to currently observe. The resulting sequence is relatively tightly concentrated in log g, with some scatter predominantly from the [Fe/H] spread. To date, there are only a few groups with spectroscopic data for more than a handful of sources. With a growing census in the future, it would be possible to use them to evaluate the stability in [Fe/H].
Stars in the halo have been targeted through several different methods; here, we focus on just a few of them, restricting the sample to distant stars with distances >2 kpc. Sources that have been identified as metal-poor dwarfs are typically found with higher log g than the more metal-rich main-sequence stars of comparable T eff in other programs. Similarly, metal-poor giants are hotter than the metal-rich giants. Both of these are expected given evolutionary models (e.g., Choi et al. 2016), and the resulting [Fe/H] also reflect this difference.
Compact binaries have been typically selected based on their UV excess to preferentially select systems containing WDs. A sizeable fraction of these sources have log g consistent with being main-sequence stars, since the main-sequence star would typically be the significantly brighter component of the system. However, sources where the spectrum is dominated by the flux of WDs or hot subdwarfs are also easily apparent in the sample.
Similarly, we compare the parameters for the legacy SDSS spectra for WDs to the classification from Gentile Fusillo et al. (2021) in Figure 13. We recover the expected temperature differences between different classes well. Almost all of the sources in that catalog that are confirmed to be WDs (as opposed to stars, hot subdwarfs, and other types of objects) are indeed distinguishable based on their log g, although DZ (in contrast to similarly cool DQ) type WDs may be most susceptible to being caught in between the WD and main sequences.
Most of these trends are not unexpected, and indeed many of these types of sources were originally present in the training set, even though in some cases particular classes of sources might have been split across different catalogs. Nonetheless, the volume of the underlying data set and the self-consistency of the derived labels do make the comparison between the sources more illuminating.

Conclusions
We present BOSS Net, a model for evaluating spectroscopic stellar parameters from optical spectra, namely, from BOSS and LAMOST. This model is capable of deriving T eff, log g, and [Fe/H] across all of the stellar objects observed by these instruments in the range of 1700 K < T eff < 100,000 K and 0 < log g < 10 in a self-consistent manner. This includes main-sequence stars (from OB stars to brown dwarfs), pre-main-sequence stars, evolved stars (both hot and red giants), and hot subdwarfs, as well as white dwarfs. In pre-main-sequence stars, the resulting parameters are calibrated to the stellar evolutionary models, and log g can be used as an independent indicator of the age of the star.
This model has been built using the training set consisting of carefully assembled catalogs produced by a number of different studies that specialize in specific types of sources. The narrow focus of these studies has provided crucial expertise in characterizing the specific features in the parameter space of these stars. For example, comparing the spectra to the synthetic templates makes it possible to understand the fundamental physics behind various stellar objects; however, such an approach has its limitations. Each study has a boundary in the parameter space it explores, and joining the resulting parameters can be nontrivial without introducing systematics that complicate the interpretation of these parameters near the boundaries. Synthetic spectra may also miss vital spectroscopic features that are present in the real data, which further adds systematics.
The data-driven approach presented here enables us to bridge across all of these studies, allowing the model to improve on the self-consistency of the input parameters. Stars of different types are still governed by the same physics, even though they occupy different regions of the parameter space; their inclusion makes the model more general. The ability to characterize all of the stellar spectra within a single model is vital in large surveys, as this enables the model to take advantage of all available data. A data-driven approach also allows the model to translate the parameter space between different surveys and different instruments. Furthermore, such models are fast and efficient, which is also an important consideration for data processing in a large survey. BOSS Net will be available publicly on GitHub, and it is also being incorporated into Astra, which is the analysis framework for the Milky Way Mapper within SDSS-V, and which manages various data processing pipelines. As a result, the parameters that it produces will be made available in the subsequent data releases.
There are a number of more specialized pipelines within Astra, including pipelines focused on low-mass stars, white dwarfs, hot stars, metal-poor stars, etc. At the time of the data releases, an assessment by the working groups within the survey will be made regarding the recommended set of parameters for each stellar class.
We do note that while this pipeline was designed to be capable of processing all of the stellar spectra in SDSS to provide a homogeneous set of stellar parameters, it was primarily developed to support pre-main-sequence stars, and it may not be a one-size-fits-all solution for all applications. It may be unreliable for hot stars that are highly extincted (A G > 3 and T eff > 7000 K). The reported parameter space for the cool WDs likely has significant systematics; thus, using stellar parameters from specialized pipelines would be recommended for these sources. Reasonable caution should also be exercised with regard to the more exotic types of stars that have not been accounted for in the training set: if they overlap with the previously explored parameter space, there may be systematic offsets in their absolute calibration. If they are outside of the parameter space covered by the training set, the predicted parameters could become unphysical, or they could be aliased to the stars with the most similar features. In such cases, a comparison with an independently derived set of parameters from different pipelines may be advised. In the future, it may be possible to isolate these sources to a separate catalog.
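For a user of the catalog, the quantitative caveats above can be turned into simple checks. The sketch below is hypothetical and not part of the released pipeline: the function name and flag strings are invented for illustration, and only the thresholds stated in the text (the covered ranges of T eff and log g, and the extinction limit for hot stars) are encoded. No threshold is given in the text for the cool WDs, so none is assumed here.

```python
def caution_flags(teff, logg, a_g=None):
    """Return a list of caution flags for a single set of predicted parameters.

    teff: effective temperature in K; logg: log10 surface gravity (cgs);
    a_g: extinction in the Gaia G band in mag, or None if unknown.
    """
    flags = []
    # Parameter range covered by the model, as stated in the text.
    if not (1700.0 < teff < 100000.0 and 0.0 < logg < 10.0):
        flags.append("outside_training_range")
    # Hot, highly extincted stars: the continuum is not normalized by the model.
    if a_g is not None and teff > 7000.0 and a_g > 3.0:
        flags.append("hot_and_extincted")
    return flags

example_hot = caution_flags(9000.0, 4.0, a_g=3.5)
example_cool = caution_flags(5000.0, 4.5, a_g=3.5)
```

A cool star with the same extinction is not flagged, in line with the statement that extinction is less of a concern for later-type stars.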
In Figure 14 we show the location on the Kiel diagram of various subsets that were used to train BOSS Net. Most of them are located in well-isolated corners of the parameter space, which provides safeguards against the model predictions being systematically distorted in the overlap. However, among red giants and intermediate-mass main-sequence stars, there are four subsets, APOGEE Net, the Payne, Gaia, and MINESweeper, which do have some overlap.
We thus compare labels produced by these pipelines in Table 2, to ensure they are sufficiently self-consistent. We note that in training each star is assigned only one label, but the comparison is done wherever independent measurements from both pipelines are available. APOGEE Net and the Payne have good agreement between each other; this is to be expected, since APOGEE Net initially was based on the Payne.
Stellar parameters produced by the Gaia photometric pipeline have substantial systematic offsets with respect to these two pipelines for the red giants, but for sources with log g > 3.5 (since the vast majority of all sources for which labels have been adopted from Gaia are dwarfs) there is a much better agreement for both T eff and log g, with only a slight scatter in [Fe/H].
MINESweeper parameters are primarily computed for metal-poor stars of the halo, which have not been observed with APOGEE; however, its catalog also includes a small subset of other sources that have been observed with both instruments. These sources are almost exclusively found on the red giant branch. Although there is a systematic offset in log T eff of 0.01 dex between the MINESweeper model and both APOGEE Net and the Payne, which propagates to nonlinear systematic differences in log g and inflates the scatter, there is nonetheless good agreement between the two sets of labels.
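A pairwise comparison of this kind can be sketched as follows for stars that have labels from two pipelines. The statistics used for Table 2 are not specified in the text, so the choice of the median for the offset and a scaled median absolute deviation for the scatter is an assumption, as are the function name and the log T eff values.

```python
import statistics

def offset_and_scatter(labels_a, labels_b):
    """Median offset and robust scatter (1.4826 x MAD) between two sets of labels."""
    diffs = [a - b for a, b in zip(labels_a, labels_b)]
    offset = statistics.median(diffs)
    mad = statistics.median(abs(d - offset) for d in diffs)
    return offset, 1.4826 * mad

# Hypothetical log T eff labels for five red giants measured by two pipelines.
logt_pipeline_a = [3.70, 3.69, 3.65, 3.72, 3.69]
logt_pipeline_b = [3.69, 3.67, 3.65, 3.71, 3.68]
offset, scatter = offset_and_scatter(logt_pipeline_a, logt_pipeline_b)
```

In this toy example the median offset is 0.01 dex, matching the size of the log T eff offset quoted for MINESweeper; the factor of 1.4826 converts the median absolute deviation into an equivalent Gaussian standard deviation.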
Finally, we compare the outputs of the final model of BOSS Net to the labels in each of the subsets. In general, the performance across them is reasonably consistent, although some do have more scatter, for example, log g measurements of young stellar objects; since the labels were derived photometrically, they were somewhat less precise than the spectroscopic labels in other subsets. Gaia [Fe/H] values also appear to have more scatter relative to the predictions than other subsets, as XP spectra from which they are derived have very low resolution and may not have as much sensitivity toward very subtle changes. Finally, the parameters of subdwarfs and WDs have sizeable scatter relative to the predictions because these spectra tend to have low SNRs.
T eff > 20,000 K, BOSS Net produces somewhat more optimal log g, for which there are multiple reasons. First, BOSS has observed a significantly larger number of such sources for which labels are available. Additionally, optical spectra contain larger numbers of spectral features associated with OB stars than what is available in the H band.
Finally, we examine the sample differences between DR18 and SDSS-V sources that have been observed so far. The primary difference is in pre-main-sequence stars: SDSS-V lacks extremely young YSOs in the APOGEE sample because, owing to the targeting strategy, the initial observations pointed specifically at several notable star-forming regions have preferentially observed low-mass YSOs with BOSS. Additionally, as previously mentioned, SDSS-V has shorter exposure times to enable the large volume of the survey; thus, these YSOs do not reach T eff values as cool as those in the legacy era. There are also some differences in the targeted blue and yellow supergiants, although in both cases they are somewhat rare in comparison to BOSS data.

Figure 1. Photometric interpolation of T eff and log g values for cool dwarfs using the data from Cifuentes et al. (2020).

Figure 2. Distribution of the T eff and log g of the sources that form the training set; BOSS objects are in red and LAMOST in blue.

Figure 3. Zoomed-in comparison of BOSS and LAMOST spectra for some of the sources for which spectra from both instruments are available. All spectra have been interpolated onto a common wavelength grid. The shown sources have SNR ∼ 100 for BOSS, and ∼30 for LAMOST. Both BOSS and LAMOST spectra were normalized through dividing out the median flux within the presented wavelength range, and different sources were arbitrarily offset.

Figure 4. Architecture of the neural net model used in this paper.

Figure 5. Distribution of the T eff and log g from the training set colored according to the weight applied to the loss.

Figure 6. Left: the performance of BOSS Net on the withheld test set, showing the consistency between the original labels for T eff, log g, and [Fe/H] vs. the resulting predictions. The x-axis shows the labels. Right: comparison between the derived parameters from BOSS and LAMOST spectra of the same sources. The x-axis shows the BOSS measurements.

Figure 7. Example BOSS spectra for stars of different T eff.

Figure 8. The distribution of the derived T eff and log g in the BOSS spectra, color coded by SNR.

Figure 9. Typical uncertainties in T eff, log g, [Fe/H], and RV across the full parameter space.

Figure 10. Difference between the predictions obtained for the same stars with multiple visits, divided by the uncertainties in the predictions (added in quadrature). Blue curve shows the resulting distribution of the scatter in log g, yellow in T eff, and red in [Fe/H].

Figure 11. The distribution of T eff and log g of sources by various SDSS-V programs, including young stars (<30 Myr), OB stars, sources within 100 pc, members of moving groups (>30 Myr), sources in the halo with distance >2 kpc (including K giants, horizontal branch stars, RR Lyr, and metal-poor dwarfs and giants), and compact binaries.

Figure 12. The distribution of T eff and log g of the pre-main-sequence stars, color coded by photometrically derived ages using Sagitta (McBride et al. 2021). The lines with white outlines show MIST isochrones (Choi et al. 2016) with ages of 6.2, 6.6, 7.0, 7.4, and 7.8 dex (from top to bottom). Note that log g values independently show a strong correlation with age, and that the spectroscopic parameters are consistent with the photometric estimates. Top panel: outputs from BOSS Net. Bottom panel: outputs from APOGEE Net (Appendix B); the apparent difference at the youngest end is driven primarily by the selection function of the sources observed in each wavelength regime to date.

Figure 13. Derived spectroscopic parameter space of white dwarfs in legacy SDSS I-IV data, color coded by the source classification from Gentile Fusillo et al. (2021).

Figure 14. Subsets of the different samples of the labeled data used in training.

Figure 15. Top: sample of labels used to train APOGEE Net. Bottom: predicted parameters for SDSS I-IV DR18 and for SDSS-V APOGEE data.

Figure 16. Comparison between the derived parameters from APOGEE Net and BOSS Net (including both BOSS and LAMOST spectra) of the same sources, using the withheld test sample not used in training.

Table 1
Stellar Properties from LAMOST, Legacy SDSS Spectra, and MaStar Stellar Library
Table 1 is published in its entirety in the electronic edition of the Astronomical Journal. This table describes the contents of each column in the full table. It is shown here for guidance regarding its form and content. The data are also available in three FITS tables in Zenodo at doi: 10.5281/zenodo.10641761.
(This table is available in its entirety in machine-readable form.)
BOSS, lacking very cool brown and white dwarfs. In large part, it is also less targeted toward the more exotic objects, while, e.g., pre-main-sequence stars or some of the largest blue supergiants are present in LAMOST data due to the sheer number of sources it observed; they compose a significantly smaller fraction of the catalog. By contrast, legacy SDSS optical observations were primarily motivated by extragalactic sources: they are pointed

Table 2
Comparison of Scatter between Labels from Different Subsets
Scatter between the final model and the labels from different subsets (a)
(a) Reported only for the withheld test set.

Table 3
Stellar Properties from APOGEE DR18 Spectra
Uncertainty in [Fe/H]
Note. Table 3 is published in its entirety in the electronic edition of the Astronomical Journal. This table describes the contents of each column in the full table. It is shown here for guidance regarding its form and content. The data are also available in three FITS tables in Zenodo at doi: 10.5281/zenodo.10641761. (This table is available in its entirety in machine-readable form.)