A Model RRNet for Spectral Information Exploitation and LAMOST Medium-resolution Spectrum Parameter Estimation

This work proposes a residual recurrent neural network (RRNet) for synthetically extracting spectral information and estimating stellar atmospheric parameters together with 15 chemical element abundances for medium-resolution spectra from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST). The RRNet consists of two fundamental modules: a residual module and a recurrent module. The residual module extracts spectral features driven longitudinally by the parameters, while the recurrent module recovers spectral information and restrains the negative influence of noise based on Cross-band Belief Enhancement. RRNet is trained on the spectra of common stars between LAMOST DR7 and the APOGEE-Payne catalog. The 17 stellar parameters and their uncertainties are predicted for 2.37 million medium-resolution spectra from LAMOST DR7. For spectra with a signal-to-noise ratio ≥ 10, the precision of the T eff and log g estimates is 88 K and 0.13 dex, respectively; that of the elements C, Mg, Al, Si, Ca, Fe, and Ni is 0.05–0.08 dex; that of N, O, S, K, Ti, Cr, and Mn is 0.09–0.14 dex; and that of Cu is 0.19 dex. Compared with StarNet and SPCANet, RRNet shows higher accuracy and robustness. In comparison with the Apache Point Observatory Galactic Evolution Experiment and Galactic Archaeology with HERMES surveys, RRNet manifests good consistency within a reasonable range of bias. Finally, this work releases a catalog for the 2.37 million medium-resolution spectra from LAMOST DR7, together with the source code, the trained model, and the experimental data, for astronomical science exploration and as a reference for data-processing algorithm research.


INTRODUCTION
With the gradual development of various large-scale spectroscopic surveys (e.g. Steinmetz et al. 2006; Yanny et al. 2009; Gilmore et al. 2012; De Silva et al. 2015; Luo et al. 2015; Majewski et al. 2017), a large number of spectra have been observed. The spectra of these surveys provide important data support for astronomers to investigate fundamental astronomical problems. The spectra carry a wealth of stellar information, such as effective temperature (T eff ), surface gravity (log g), and elemental abundances. This information can be used in studying the formation and evolution of the Milky Way (Frankel et al. 2018; Bland-Hawthorn et al. 2019).
The majority of spectroscopic surveys are equipped with pipelines for obtaining stellar parameters and element abundances. These pipelines commonly match observed spectra with theoretical (or empirical) spectra by minimizing a χ2 distance, and use the labels of the best-matched theoretical spectra as the parameter estimates for the observed spectra. For example, LAMOST developed the stellar parameter pipeline LASP (Wu et al. 2011; Luo et al. 2015) based on the ULySS package (Koleva et al. 2009); APOGEE is equipped with the stellar parameters and chemical abundance pipeline ASPCAP (García Pérez et al. 2016; Jönsson et al. 2018) for near-infrared spectra; GALAH applied the Spectroscopy Made Easy tool to obtain stellar parameters (Piskunov & Valenti 2017). However, the application of the above-mentioned methods to LAMOST spectra at medium resolution (R ∼ 7500) or low resolution (R ∼ 1800) faces the following challenges: the theoretical spectra come from stellar atmosphere models which depend on overly simplified physical assumptions, resulting in gaps between theoretical and measured spectra; in addition, the presence of blended features in the spectra makes matching the correct theoretical spectra more difficult.
With the arrival of artificial intelligence and the big-data era, deep learning methods have been applied to the estimation of stellar parameters. These methods usually estimate parameters by approximating the mapping from observed spectra to stellar parameters (e.g. Fabbro et al. 2017; Bialek et al. 2020; Leung & Bovy 2018; Wang et al. 2020), or from stellar parameters to observed spectra (e.g. Ting et al. 2019; Xiang et al. 2019; Rui et al. 2019), using neural networks. However, most of the above models use simple fully-connected or convolutional neural networks to establish the mapping, and such networks have difficulty extracting deep spectral features from low signal-to-noise spectra.
To this end, we designed a Residual Recurrent Neural Network (RRNet), and established a mapping from LAMOST medium-resolution spectra to stellar atmospheric parameters and elemental abundances using deep ensembling. The RRNet consists of two fundamental modules to comprehensively extract spectral features: the residual module and the recurrent module. The residual module is longitudinally driven by the spectrum labels in the training data set to extract features, while the recurrent module exploits spectral features and restrains the negative influence of noise based on Cross-band Belief Enhancement (CBE) between the observations on different wavelength subbands. The CBE refers to the maximum information recovery and extraction achieved by fusing the observations on different wavelength subbands, given the existence of correlations between them and differences in the disturbances on them (see Fig. 4). Some LAMOST spectra with APOGEE-Payne labels were extracted as reference spectra. These spectra were observed from 28,523 common stars between LAMOST DR7 and the APOGEE-Payne catalog (Ting et al. 2019). The reference spectra are used for training and testing the RRNet model. Finally, stellar atmospheric parameters, elemental abundances, and the 1σ uncertainties of the corresponding parameter estimates are derived for 2,377,510 medium-resolution spectra from LAMOST DR7 by RRNet.
This paper is organized as follows: Section 2 introduces the APOGEE-Payne catalog, the LAMOST DR7 medium-resolution spectra, and the associated data pre-processing methods. Section 3 describes the RRNet model and its validation experiments. Section 4 presents the results of our study. A summary and outlook are given in Section 5. The computed catalog for 2.37 million medium-resolution spectra from LAMOST DR7, the source code, the trained model, and the experimental data are released at: https://github.com/Chan-0312/RRNet.

DATA
To learn the model parameters of RRNet, a reference set is needed. The reference dataset consists of LAMOST DR7 medium-resolution spectra labeled with stellar parameters (T eff , log g) and 15 elemental abundances from the APOGEE-Payne catalog. This reference set is established by cross-matching the LAMOST DR7 medium-resolution spectra with the APOGEE-Payne catalog. The establishment of the reference set and the related pre-processing procedures are described further in the following two subsections.

Reference dataset
LAMOST provides a large number of precious spectra for researchers. LAMOST began its Phase II medium-resolution survey in July 2017 and released 5.6 million medium-resolution spectra in LAMOST DR7, with a total of 2.4 million spectra with a signal-to-noise ratio (S/N) greater than 10 (Rui et al. 2019). During the survey, two spectra are obtained for each exposure, one for the blue part and the other for the red part, with wavelength coverages of [4950, 5350] Å and [6300, 6700] Å, respectively. Following Wang et al. (2020), we cross-matched the LAMOST DR7 medium-resolution spectra with the APOGEE-Payne catalog and obtained 161,447 LAMOST DR7 spectra from 34,372 common stars. To ensure the reliability of the dataset, the LAMOST spectra with S/N < 10 are eliminated from the cross-matched data set. In addition, some LAMOST spectra are affected by cosmic rays and other disturbances, which result in a large number of outliers (bad pixels). Therefore, the spectra with more than 100 outliers or more than 30 consecutive outliers are rejected in this work. In the APOGEE-Payne catalog, we kept the information from the spectra with quality flag = good and T eff ∈ [4000, 6500] K. Through the above processing, we finally obtained 28,523 common stars and 114,853 medium-resolution spectra of these common stars from LAMOST DR7. These spectra are used as the reference data for our model.

Data pre-processing
To facilitate the optimization of the model, we pre-process the LAMOST spectra as follows:

Wavelength correction: The radial velocity (RV) shifts, and can thereby broaden or narrow, the spectral lines in the observed spectrum. To account for these effects, a much larger reference set would otherwise be needed to train a machine learning parameter estimation scheme to a similar accuracy. To simplify the spectral parameter estimation problem based on the available reference set, we perform wavelength correction on each spectrum by shifting it to its rest frame using the RV provided by the LAMOST catalog:

λ′ = λ / (1 + RV/c),    (1)

where λ′ is the corrected spectral wavelength in the rest frame, λ is the observed spectral wavelength, RV is the radial velocity of the corresponding spectrum, and c is the speed of light. (For more discussion see appendix C.)

Spectral resampling: According to the distribution of the spectral wavelength ranges in the reference set, the common part of the corrected spectral wavelength ranges is computed. The common wavelength ranges of the blue and red parts are [4968, 5328] Å and [6339, 6699] Å, respectively. On the common wavelength range, each spectrum is resampled using linear interpolation with a step of 0.1 Å. Finally, 7200 fluxes (3600 in the blue part and 3600 in the red part) are obtained for each observed spectrum.
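As a minimal sketch of these two pre-processing steps (the NumPy usage, the toy RV value, and the flat test spectrum are our own illustrative assumptions, not part of the released pipeline):

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def to_rest_frame(wavelength_obs, rv_km_s):
    """Shift observed wavelengths to the rest frame using the radial velocity."""
    return wavelength_obs / (1.0 + rv_km_s / C_KM_S)

def resample(wavelength, flux, start, stop, step=0.1):
    """Linearly interpolate a spectrum onto a common wavelength grid."""
    grid = np.arange(start, stop + step / 2, step)
    return grid, np.interp(grid, wavelength, flux)

# Example: a blue-arm spectrum shifted to its rest frame and resampled
wl = np.linspace(4950.0, 5350.0, 4000)
flux = np.ones_like(wl)                 # flat toy spectrum
wl_rest = to_rest_frame(wl, rv_km_s=30.0)
grid, flux_resampled = resample(wl_rest, flux, 4968.0, 5327.9)
```

A 0.1 Å grid on [4968.0, 5327.9] Å yields exactly the 3600 blue-part fluxes described above.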
Spectral normalization: Consistent with Wang et al. (2020), each resampled spectrum is divided by its pseudo-continuum, which is computed using a fifth-order polynomial fit. The normalized spectra are used as inputs to the RRNet model.
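A sketch of this normalization step, assuming NumPy and a synthetic smooth spectrum; the rescaled fit axis is an implementation choice of ours for numerical stability, not prescribed by the pipeline:

```python
import numpy as np

def normalize(wavelength, flux, degree=5):
    """Divide a spectrum by its pseudo-continuum (a fifth-order polynomial fit)."""
    # Fit on a shifted/scaled axis in [-1, 1] so the polynomial is well conditioned.
    x = (wavelength - wavelength.mean()) / (np.ptp(wavelength) / 2)
    coeffs = np.polyfit(x, flux, deg=degree)
    continuum = np.polyval(coeffs, x)
    return flux / continuum

# Toy example: a smooth linear "continuum" normalizes to unity everywhere
wl = np.linspace(4968.0, 5327.9, 3600)
flux = 2.0 + 0.001 * (wl - 5000.0)
normalized = normalize(wl, flux)
```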

AN RRNET MODEL FOR SPECTRAL PARAMETER ESTIMATION
To extract features from stellar spectra and restrain the negative influence of noise, we propose a neural network, RRNet. The RRNet consists of a residual module, a recurrent module, and an uncertainty prediction module. Experiments conducted on the reference data (section 2) show that RRNet performs well. In addition, RRNet has higher accuracy and better generalization capability compared to typical models such as StarNet (Fabbro et al. 2017; Bialek et al. 2020).

RRNet: Residual Recurrent Neural Network
Earlier works by Bailer-Jones et al. (1997) and Manteiga et al. (2010) applied neural networks to atmospheric parameter estimation on synthetic stellar spectra. However, the performance of both works was limited by the data and hardware resources of that era. Fabbro et al. (2017) proposed a convolutional neural network (StarNet), consisting of two convolutional layers and three fully-connected layers. StarNet is trained on ASSET (Koesterke et al. 2008) synthetic spectra and APOGEE observed spectra, and then used to estimate stellar parameters from APOGEE observed spectra. Subsequently, Bialek et al. (2020) improved StarNet using deep ensembling to give StarNet the ability to predict parameter uncertainty. Wang et al. (2020) developed a residual-like network (SPCANet) consisting of three convolutional layers and three fully-connected layers, and applied it to LAMOST DR7 medium-resolution spectra for stellar parameter and chemical abundance estimation.
The above-mentioned models only employ a few layers of convolutional and fully-connected computations to approximate the mapping from spectra to stellar parameters. Unfortunately, there is considerable room to improve the quality of the extracted features, based on the experiments in section 3.4. To this end, we develop a Residual Recurrent Neural Network (RRNet) specifically for extracting spectral features and use it to estimate stellar parameters from the medium-resolution spectra in LAMOST. The RRNet mainly consists of three kinds of modules: a residual module, a recurrent module, and an uncertainty prediction module. The structure of RRNet is shown in Figure 1. The input layer takes a pre-processed spectrum (section 2.2). Immediately following the input layer are Nr residual blocks. The residual blocks share a common structure and are used to reinforce the spectral features and restrain the negative effects of noise and irrelevant components, based on their correlations with the parameters to be estimated. Subsequently, the spectrum is reshaped into sequence data S of length Ns, and further processed by a recurrent layer to obtain S′. The recurrent module extracts spectral information by analyzing the correlations between the spectral features on various wavelength subbands and fusing their information. The final step is to establish a mapping from the spectral features to the Probability Density Function (PDF) of stellar parameters and elemental abundances.
Residual module: The residual module explores the correlation between the spectral features and the reference labels (the parameters to be estimated). Based on this correlation, features sensitive to the parameters are enhanced, and the negative effects of noise or irrelevant components are restrained. The residual module contains Nr residual blocks (He et al. 2016) with the same structure. A typical configuration in a residual module is the use of batch normalization (BN) to accelerate training by reducing internal covariate shift (He et al. 2016; Ioffe & Szegedy 2015). However, our experiments on stellar parameter estimation show much larger fluctuations in the training procedure with BN than without it. As a result, the training time increases by approximately 127%, and the accuracy of the spectral parameters from the learned model is somewhat reduced. These phenomena possibly come from the existence of a certain number of bad pixels in many LAMOST spectra. The fluxes of the bad pixels usually deviate enormously from those of normal pixels and have evident negative influences on the estimates of the mean and variance in BN (Ioffe & Szegedy 2015). Therefore, this work removed BN from the traditional residual blocks. In addition, our stellar parameter estimation experiments show better performance with an average-pooling layer than with a max-pooling layer. Therefore, the proposed residual module uses an average-pooling layer in each residual block.
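To make the block structure concrete, a NumPy forward-pass sketch of one such residual block (no batch normalization, average pooling) is given below; the channel count, kernel size, pooling width, and random weights are toy assumptions of ours, and a real implementation would use a deep learning framework with trainable parameters.

```python
import numpy as np

def conv1d_same(x, kernels):
    """'Same'-padded 1-D convolution: x is (channels_in, length),
    kernels is (channels_out, channels_in, k)."""
    c_out, c_in, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for i in range(c_in):
            # correlate each input channel with its kernel and accumulate
            out[o] += np.correlate(xp[i], kernels[o, i], mode="valid")
    return out

def avg_pool(x, size=2):
    """Average pooling along the wavelength axis."""
    length = x.shape[1] // size * size
    return x[:, :length].reshape(x.shape[0], -1, size).mean(axis=2)

def residual_block(x, k1, k2):
    """conv -> ReLU -> conv, identity skip connection, ReLU, then
    average pooling; batch normalization is deliberately omitted."""
    h = np.maximum(conv1d_same(x, k1), 0.0)
    h = conv1d_same(h, k2)
    return avg_pool(np.maximum(h + x, 0.0))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3600))            # 4 toy channels, 3600 fluxes
k1 = rng.normal(size=(4, 4, 3)) * 0.1
k2 = rng.normal(size=(4, 4, 3)) * 0.1
y = residual_block(x, k1, k2)             # shape (4, 1800) after pooling
```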
Recurrent module: The recurrent module extracts the features of various subbands from the spectrum and analyzes the correlations between them. These correlations enable maximum information recovery and noise suppression. Neither convolutional networks nor fully-connected networks can achieve this kind of information extraction across subbands. For this reason, the spectral vectors are reshaped into sequence data S = {S1, S2, · · · , SNs−1, SNs} of length Ns, and then S is processed across wavelength by a recurrent layer to obtain S′. Through this processing, RRNet can achieve information recovery and noise suppression from the spectral features on different wavelengths.
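The reshape-and-recur idea can be sketched as follows; the sub-band count, sub-band length, hidden size, and the plain tanh (Elman) cell standing in for a practical gated RNN are all illustrative assumptions of ours.

```python
import numpy as np

def recurrent_pass(features, n_subbands, w_in, w_rec, bias):
    """Reshape a feature vector into n_subbands chunks and run a minimal
    tanh recurrent cell across them, so each sub-band's representation is
    informed by its neighbours (the cross-band fusion idea)."""
    chunks = features.reshape(n_subbands, -1)   # sequence S_1 .. S_Ns
    hidden = np.zeros(w_rec.shape[0])
    outputs = []
    for s in chunks:
        hidden = np.tanh(w_in @ s + w_rec @ hidden + bias)
        outputs.append(hidden)
    return np.stack(outputs)                    # S' = fused sub-band features

rng = np.random.default_rng(1)
features = rng.normal(size=7200)                # toy flattened spectrum features
n_subbands, subband_len, hidden_dim = 40, 180, 64
w_in = rng.normal(size=(hidden_dim, subband_len)) * 0.05
w_rec = rng.normal(size=(hidden_dim, hidden_dim)) * 0.05
bias = np.zeros(hidden_dim)
s_prime = recurrent_pass(features, n_subbands, w_in, w_rec, bias)  # (40, 64)
```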
Uncertainty prediction module: Following Bialek et al. (2020), we use three fully-connected layers to predict the Probability Density Function (PDF) of stellar parameters and chemical abundances. The PDF is approximated by a Gaussian distribution; therefore, RRNet only needs to output estimates of the mean µ and variance σ2. In addition, we added Dropout layers (Hinton et al. 2012) to the uncertainty prediction module to avoid overfitting during model training. Dropout deals with overfitting by reducing co-adaptations between neurons (especially in the same layer) and preventing a neuron from relying on the presence of particular other neurons. It should be noted that, to ensure the stability of the model output, Dropout is only enabled in the training phase and is disabled in the inference phase.

Model training
The reference set is randomly divided into a training set, a validation set, and a test set at the ratio of 7:1:2. The three data sets respectively consist of 80,812 spectra from 19,996 stars, 11,473 spectra from 2,852 stars, and 23,198 spectra from 5,705 stars. The training set is used for training the RRNet model (see section 3.2), the validation set for selecting the model hyperparameters (see section 3.3), and the test set for evaluating the model (see section 3.4). In LAMOST observations, some spectra come from the same sources. If spectra from the same source simultaneously appeared in the training, validation, and test sets, the objectivity of the model evaluation could be reduced. Therefore, this work divides the reference set based on sources instead of spectra.
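A source-based split of this kind can be sketched as follows; the function name, random seed, and toy star IDs are our own illustrative choices.

```python
import numpy as np

def split_by_source(star_ids, ratios=(0.7, 0.1, 0.2), seed=42):
    """Split spectrum indices into train/val/test so that all spectra of a
    given star land in the same subset, avoiding information leakage."""
    star_ids = np.asarray(star_ids)
    unique = np.unique(star_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)
    n_train = int(ratios[0] * unique.size)
    n_val = int(ratios[1] * unique.size)
    train_stars = set(unique[:n_train])
    val_stars = set(unique[n_train:n_train + n_val])
    subsets = ([], [], [])
    for i, sid in enumerate(star_ids):
        if sid in train_stars:
            subsets[0].append(i)
        elif sid in val_stars:
            subsets[1].append(i)
        else:
            subsets[2].append(i)
    return subsets

# Toy example: 6 spectra from 3 stars; each star stays in a single subset
train, val, test = split_by_source(["a", "a", "b", "b", "c", "c"])
```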
In order to enable the RRNet model to predict the PDF of the physical parameters, consistent with Bialek et al. (2020), we use the negative log-likelihood of the normal distribution as the loss function of the model:

L(θ) = (1/2) log σ²θ(x) + (y − µθ(x))² / (2 σ²θ(x)) + constant,    (2)

where x and y are the input spectrum and the corresponding reference label, µθ(x) and σ²θ(x) are the mean and variance of the Gaussian distribution predicted by the model, and θ denotes the RRNet model parameters to be optimized. To learn θ, Adam (Kingma & Ba 2014) is used as the optimizer to speed up the convergence of the learning procedure. During training, data augmentation by perturbing reference spectra with Gaussian noise improves the robustness of the model.
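The Gaussian negative log-likelihood (dropping additive constants, as is conventional) can be computed as in this sketch, assuming NumPy arrays of labels and predicted Gaussian parameters (the toy values are ours):

```python
import numpy as np

def gaussian_nll(y, mu, sigma2):
    """Negative log-likelihood of y under N(mu, sigma2), averaged over the
    batch; additive constants are dropped."""
    return np.mean(0.5 * np.log(sigma2) + 0.5 * (y - mu) ** 2 / sigma2)

# Toy batch: two Teff labels with predicted means and variances
y = np.array([5700.0, 5800.0])
mu = np.array([5690.0, 5810.0])
sigma2 = np.array([100.0, 100.0])
loss = gaussian_nll(y, mu, sigma2)
```

Note how the first term penalizes over-large predicted variances while the second penalizes confident but wrong means, which is what lets the network learn calibrated uncertainties.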
Finally, in order to accurately estimate the PDF of the physical parameters, M instances of the RRNet are trained with different random initializations (M = 6 in this paper). The ensemble mean µ̂(x) is obtained by averaging the predicted means of the M RRNet instance models. The ensemble variance σ̂²(x) is determined by:

σ̂²(x) = (1/M) Σm [σ²θm(x) + µ²θm(x)] − µ̂²(x).    (3)

Table 1. Experimental results for determining the hyperparameters Nr and Ns of RRNet. These two hyperparameters respectively represent the numbers of residual blocks and wavelength sub-bands in RRNet. The model performance is measured using the Mean Absolute Error (MAE, defined in equation 4) and computed on the validation set.
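The ensemble aggregation can be sketched as follows, assuming NumPy and toy predictions from two members; note how disagreement between member means inflates the ensemble variance:

```python
import numpy as np

def ensemble_predict(mus, sigma2s):
    """Combine M Gaussian predictions (mu_m, sigma2_m) into the mixture
    mean and variance used in deep ensembling: the mean of the member
    variances plus the variance of the member means."""
    mus, sigma2s = np.asarray(mus), np.asarray(sigma2s)
    mu_hat = mus.mean(axis=0)
    sigma2_hat = (sigma2s + mus ** 2).mean(axis=0) - mu_hat ** 2
    return mu_hat, sigma2_hat

# Two members agreeing on variance (100) but disagreeing on the mean:
# the 100 K disagreement contributes an extra 2500 to the variance.
mu_hat, s2_hat = ensemble_predict([[5700.0], [5800.0]], [[100.0], [100.0]])
```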

Model selection
In the application of machine learning, the choice of model hyperparameters has a strong influence on model performance. The depth of the RRNet is determined by the number of residual blocks. After the residual blocks, the information of a spectrum is represented using a vector v and further processed by the recurrent module in RRNet (Fig. 1). In the recurrent module, the v is divided into a series of wavelength sub-bands (Fig. 1). The numbers of residual blocks and wavelength sub-bands are denoted by Nr and Ns respectively; they are the hyperparameters of RRNet. To explore the effects of the hyperparameters Nr and Ns on model performance, we compared the RRNet with its variants, designed using different configurations of Nr and Ns. In these experiments, the performance measures are computed on the validation set to determine the model hyperparameters Nr and Ns. This work uses the Mean Absolute Error (MAE) to evaluate the performance of a stellar atmospheric parameter estimation model:

MAE = (1/n) Σi |ŷi − yi|,    (4)

where ŷi and yi denote the value predicted by RRNet and the corresponding reference value for a spectrum xi respectively, and n denotes the number of samples. The experimental results are presented in Table 1.
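For completeness, the MAE criterion can be computed as in this sketch (the toy values are ours):

```python
import numpy as np

def mae(y_pred, y_true):
    """Mean Absolute Error between predictions and reference labels."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))

# Toy example: two Teff predictions off by 50 K and 10 K -> MAE of 30 K
error = mae([5700.0, 4800.0], [5750.0, 4790.0])
```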
It is shown that the model performance improves as Nr and Ns increase. However, after these hyperparameters reach certain thresholds, further improvement becomes trivial or performance even degrades. These phenomena indicate that more residual blocks and more wavelength sub-bands in an RRNet model help improve the spectral information extraction capability and increase the parameter estimation performance up to a certain point. More residual blocks result in an RRNet with more complexity and more model parameters to be optimized. Such models need a training set with more reference spectra. However, the scale of the training set cannot be expanded without limit in a real application. Therefore, the performance of the RRNet first improves gradually with increasing Nr, and then decreases once the complexity of the RRNet exceeds what the available training data can support. Accordingly, the performance of the RRNet could be improved further by adding more residual blocks if a larger training set were available.
Another characteristic of the RRNet is that the model can exploit spectral features and restrain the negative influences of noise based on CBE between the observations on different wavelengths (Fig. 1). Two non-directly adjacent sub-bands can communicate indirectly through a series of directly adjacent sub-bands between them, and thereby conduct belief enhancement. In the recurrent module, the sub-bands can be numbered and indexed by integers from 1 to Ns from left to right, and the distance between two sub-bands can be measured by the difference between their indexes (Fig. 1). The greater the distance between two non-directly adjacent sub-bands, the weaker the communication and information fusion between them. As the number of sub-bands grows from 1 to 2, 3, and more, the number of sub-band pairs that are directly adjacent or at a small distance from each other increases gradually. Therefore, the CBE ability and prediction performance of RRNet improve gradually in this procedure. At the same time, with the increase of Ns, more and more long-distance sub-band pairs appear. Such sub-band pairs suffer from poor communication and unsatisfactory belief enhancement. Therefore, the performance of the RRNet first increases and then decreases as Ns grows. Ultimately, this work adopted the configuration Nr = 3 and Ns = 40 based on the experimental investigations.

Model evaluation
After determining the optimal hyperparameters of the model, we demonstrate the performance of RRNet on the test set and compare it with other models. Figure 2 shows the distribution of the inconsistencies, and the dependence of the 1σ uncertainty predicted by RRNet on S/N and the inconsistencies. The inconsistency refers to the difference between the RRNet predictions and the APOGEE-Payne catalog on the test set. On the whole, the RRNet predictions have high accuracy and precision. The precision of the stellar parameters T eff and log g is 88 K and 0.13 dex, respectively; that of the elements C, Mg, Al, Si, Ca, Fe, and Ni is 0.05 dex to 0.08 dex; that of N, O, S, K, Ti, Cr, and Mn is 0.09 dex to 0.14 dex; and that of Cu is 0.19 dex. In addition, although higher spectrum quality helps improve the parameter estimation performance (smaller bias and standard deviation) on the whole, the standard deviation of the RRNet predictions varies within a small range. Therefore, RRNet is robust against noise and disturbances. This robustness benefits from the CBE of the recurrent module, which enables the RRNet model to fuse feature information from different wavelength sub-bands and restrain the negative influences of noise and disturbances based on the correlations between sub-bands.
To further demonstrate the performance of RRNet, we compared RRNet with models such as StarNet. It is shown that RRNet has significant advantages over the other three models (Figure 3). The performance of ResNet is lower than that of RRNet because ResNet only extracts flux features longitudinally from the spectrum, based on the correlation between the fluxes and the parameters to be estimated, and lacks the capability of CBE. StarNet consists of several convolutional layers and fully-connected layers, and does not work as well as RRNet on LAMOST spectra. After adding a recurrent module, a variant of StarNet is obtained and referred to as StarNet-R. StarNet-R therefore has the capability of CBE and shows performance superior to the original StarNet (Figure 3). RRNet contains both the residual module and the recurrent module. Therefore, RRNet achieves 1) better feature extraction than StarNet and StarNet-R, owing to the residual module; and 2) better information recovery and noise and disturbance abatement than StarNet and ResNet, owing to the CBE of the recurrent module. Overall, RRNet is more accurate and stable than the compared models.
To visualize why the RRNet modules are effective, Figure 4 presents the average partial derivatives of the model outputs with respect to the input fluxes for RRNet, StarNet, and StarNet-R, computed from spectra in several T eff intervals (e.g., [6000, 6300] K) in the test set.
[Figure 2 caption: The black line is the distribution curve of the inconsistencies (differences) between the RRNet predictions and the APOGEE-Payne catalog, and the three colored curves (blue, green, and red) present the uncertainties (standard deviations) of the inconsistencies. µr and σr are respectively the mean and standard deviation of the difference, and σm is the mean of the 1σ uncertainty predicted by RRNet.]
The results show that, on the whole, the three models are consistent with each other very well in their average partial derivatives. The comparisons among RRNet, StarNet, and StarNet-R and their visualizations are summarized as follows. 1) Both StarNet-R and RRNet have a recurrent module, which enhances some hidden sub-band features compared to StarNet, which lacks one (see Figure 4 for the sub-bands shaded in red). These phenomena indicate that the recurrent module does help recover spectral features through CBE and thus improves the performance of the model. 2) Both StarNet-R and RRNet have recurrent modules; the difference is that StarNet-R uses a traditional convolutional neural network to extract spectral features while RRNet uses a stack of residual blocks. The results show that the residual module reinforces denser features than the traditional convolutional neural network (see Figure 4 for the sub-bands with borders shaded in green). These phenomena indicate that the residual module does provide stronger feature extraction than the traditional convolutional neural network.

APPLICATION ON LAMOST DR7
In this section, we applied RRNet to the medium-resolution spectra from LAMOST DR7, computed a LAMOST-RRNet catalog and evaluated the catalog.The establishment of the LAMOST-RRNet catalog and the related validation are further described in the following subsections.

LAMOST DR7 parameter estimation
After training and testing the RRNet model (sections 3.2 and 3.4), the stellar parameters and elemental abundances are derived for the medium-resolution spectra from LAMOST DR7 by RRNet. Based on the distribution range of the stellar parameters in the reference set, we kept only the LAMOST DR7 spectra with LASP (Luo et al. 2015; Wu et al. 2011) estimates T eff ∈ [3500, 7000] K, and processed these spectra using the pre-processing procedures in section 2.2. Ultimately, the LAMOST-RRNet catalog is obtained by RRNet. This catalog contains stellar atmospheric parameters, chemical abundances, and the corresponding 1σ uncertainties for 2,377,510 medium-resolution spectra in LAMOST DR7 estimated by RRNet.
The T eff − log g distributions for the spectra with S/N in multiple intervals are shown in Fig. 5, and three MIST stellar isochrones (Dotter 2016; Choi et al. 2016) with stellar ages of 7 Gyr are added to the figure for reference. The stellar parameters estimated by RRNet are consistent with the three MIST stellar isochrones, and the distribution of T eff − log g becomes "cleaner" as S/N increases. The density distributions of [X/Fe] relative to [Fe/H] for giant and dwarf stars are shown in Figure 6. In general, the elemental abundance distributions estimated by RRNet are more concentrated, especially for Si, S, Ca, and Ni. In addition, the α elements (O, Mg, Si, S, Ca, Ti) show a more obvious bimodal structure in the giant star samples.

Some comparisons with other surveys
To verify the accuracy and precision of the LAMOST-RRNet catalog, we investigated its consistency with the SPCANet catalog, the APOGEE-ASPCAP DR16 catalog, and the GALAH DR3 catalog on the spectra from the common stars between them. Wang et al. (2020) proposed a residual-like neural network model (SPCANet) to estimate stellar parameters and elemental abundances for 1,472,211 medium-resolution spectra from LAMOST DR7. Both the SPCANet catalog (Wang et al. 2020) and the LAMOST-RRNet catalog are computed from parameter estimation models learned from the spectra of common stars between the APOGEE-Payne catalog and LAMOST observations; therefore, the two catalogs are highly comparable. The comparison tests are performed on the LAMOST observations from 24,347 common stars. It should be noted that there are three kinds of estimates for the spectra of these common stars: the estimates of the SPCANet model, the estimates of the RRNet model, and the reference values of the APOGEE-Payne catalog.
APOGEE (Majewski et al. 2017) is a medium-high resolution (R ∼ 22500) spectroscopic survey, which uses the Sloan telescope at Apache Point Observatory in New Mexico, USA, to observe stellar spectra. The APOGEE spectral band covers the near-infrared 1.51–1.70 µm, and its stellar parameters and elemental abundances are derived by the ASPCAP pipeline.
Note—The µr, σr are the mean value and the standard deviation of the difference between the two catalogs, respectively.
Compared with the SPCANet catalog (see Table 2, columns SPCANet-Payne and RRNet-Payne), LAMOST-RRNet adds estimates for [K/H] and [Mn/H], and reduces the overall MAE by 14% on their common parameters. In addition, LAMOST-RRNet has smaller bias and standard deviation than the SPCANet catalog. Therefore, RRNet has better estimation performance. For the elements Ti and Cu, the precision improvement of the RRNet model is not significant, which may be caused by the lack of strong metal lines in the blue part of the LAMOST spectra. In addition, the SPCANet input spectrum has 8000 data points (fluxes), while the RRNet input spectrum has only 7200 data points (fluxes); the smaller input leaves RRNet a smaller feature space to extract from than SPCANet. Compared to the GALAH and APOGEE-ASPCAP catalogs (see Table 2), the inconsistencies with APOGEE-ASPCAP, in addition to those from the RRNet model estimation, may partly originate from the APOGEE-Payne labels.

Uncertainty analysis
The RRNet is able to predict the PDF of the parameters more accurately through deep ensembling. The uncertainty of the predicted parameters, σ pred , is obtained from the PDF of the parameter estimates. In addition, in the LAMOST sky survey, some stars are observed multiple times at various epochs and under different observation conditions. Therefore, we can use these repeat observations to analyze the uncertainty caused by observation errors, denoted by σ obs . Suppose we have ns repeated observations {x1, · · · , xns} of a source. From these observations, the RRNet gives ns estimates {y1, · · · , yns} for any stellar parameter X. The standard deviation of {y1, · · · , yns} is the corresponding uncertainty σ obs .
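A sketch of computing σ obs from repeat observations, assuming NumPy and toy parameter estimates grouped by star (the star name and values are ours):

```python
import numpy as np

def observational_uncertainty(estimates_by_star):
    """sigma_obs per star: the standard deviation of the parameter
    estimates obtained from that star's repeated observations."""
    return {star: float(np.std(vals)) for star, vals in estimates_by_star.items()}

# Toy example: three repeat Teff estimates for one star
sigma_obs = observational_uncertainty({"star1": [5700.0, 5720.0, 5710.0]})
```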
Figure 7 shows the dependence of the uncertainties of the LAMOST-RRNet catalog on S/N. The dots indicate the uncertainty σ pred predicted by RRNet, and the length of the line segment centered on each dot indicates the uncertainty σ obs . The low uncertainties indicate the strong robustness and generalization ability of the RRNet model. When S/N ≥ 20, the σ pred of the parameters T eff , log g, [Fe/H], and [Cu/H] are 146 K, 0.18 dex, 0.07 dex, and 0.22 dex, respectively, and those of the remaining elements are 0.06 dex ∼ 0.17 dex. In addition, both σ pred and σ obs decrease with increasing S/N. This indicates that σ pred and σ obs are rational indicators of uncertainty. At the same time, the σ pred trend curve changes mildly. Therefore, the RRNet model is stable for estimating parameters from LAMOST spectra.

Test on open clusters
To further examine the accuracy of the element abundances from LAMOST-RRNet, we performed additional tests on open clusters. Since the stars in an open cluster form almost simultaneously from the same gas cloud, open clusters are chemically homogeneous stellar populations (Bovy 2016; Ness et al. 2018). Therefore, we performed tests using the 8,811 cluster member stars published by Zhong et al. (2020). We selected the three open clusters (Melotte 22, NGC 2682, and NGC 2632) with the largest number of matches to LAMOST-RRNet and removed parameter estimations with large uncertainties σ pred (section 4.3).
Figure 8 shows the dependencies of the elemental abundances (from LAMOST-RRNet) on T eff in the above-mentioned clusters. In agreement with Ting et al. (2019), LAMOST-RRNet does not show any evident [X/H]-T eff trend in the three aforementioned clusters, and the chemical abundances exhibit a low standard deviation. In addition, Figure 9 shows a comparison of the chemical abundances of the LAMOST-RRNet and SPCANet catalogs in these clusters. Overall, the standard deviation of the chemical abundances from LAMOST-RRNet is slightly lower than that of the SPCANet catalog. The overall chemical homogeneities from LAMOST-RRNet in the three clusters are 0.055 ± 0.017 dex, 0.064 ± 0.022 dex, and 0.047 ± 0.016 dex, respectively. These results indicate that RRNet has higher accuracy than SPCANet.

LAMOST DR7 RRNet catalog
Finally, the LAMOST-RRNet catalog of stellar atmospheric parameters and elemental abundances for 2,377,510 medium-resolution spectra from LAMOST DR7 is made publicly available online. This catalog contains the following information: the identifier of the observed spectrum (obsid), the FITS file name corresponding to the observed spectrum (filename), coordinate information (ra, dec), the extension names of the spectrum (extname_blue, extname_red), the S/N of the spectrum (snr_blue, snr_red), the effective temperature (Teff[K]), the surface gravity (Logg), 15 elemental abundances (XH), the 1σ uncertainties of the parameters (X_err), and recommended flags (flag). The detailed catalog description is given in Table 3, and the complete parameter catalog can be downloaded from https://github.com/Chan-0312/RRNet/releases.

SUMMARY AND OUTLOOK
This paper designed a novel model, the Residual Recurrent Neural Network (RRNet), for estimating stellar parameters, and computed a LAMOST-RRNet catalog by estimating stellar atmospheric parameters and elemental abundances from LAMOST DR7 medium-resolution spectra. The LAMOST-RRNet catalog for 2.37 million medium-resolution spectra from LAMOST DR7, the source code, the trained model, and the experimental data are released at: https://github.com/Chan-0312/RRNet. In the future, with the increase of spectral observations and experienced labels, the RRNet can be re-trained to improve its performance and be used to estimate stellar atmospheric parameters and elemental abundances from more survey spectra.

B. Z-SCORE RESIDUALS IN FUNCTION OF STELLAR ATMOSPHERIC PARAMETERS
To further demonstrate the accuracy of the RRNet predictions across the parameter range, Figure 11 shows the Z-score residuals (differences) between the RRNet predictions and the APOGEE-Payne catalog for the stellar atmospheric parameters (T eff, log g, [Fe/H]). Overall, the RRNet predictions exhibit little bias from the APOGEE-Payne catalog. This results from the Cross-band Belief Enhancement (CBE) capability of the recurrent module in RRNet. The CBE allows the model to filter out some random noise, and enhances the spectral features based on belief propagation between the observations at different wavelengths. Furthermore, there is a slight underestimation by RRNet of log g in the case of log g > 4.4 dex. This phenomenon is consistent with the results of Wang et al. (2020) and may result from the intrinsic complexity of the relationship between stellar spectra and log g, as well as the scarcity of training examples in this parameter range.

To simplify the spectral parameter estimation problem based on the available reference set, we perform wavelength correction on each spectrum using the Radial Velocity (RV) provided by the LAMOST catalog. However, some slight uncertainties may be introduced by the uncertainty of the RV. To explore the effects of the RV uncertainty on the RRNet model, we defined four variants of RRNet and conducted comparison experiments: 1) RRNet-u is trained using uncorrected spectra, and its model structure is consistent with RRNet; 2) RRNet-ur is trained using uncorrected spectra, similarly to RRNet-u, but this variant estimates one more parameter, the RV; 3) RRNet-c is the method of this paper, trained using RV-corrected spectra; 4) RRNet-cp is trained on RV-corrected spectra with some augmentation data generated by RV perturbations. Table 4 shows the MAE of the above four models on the validation set.
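A common convention for such Z-score residuals, assumed here since the paper does not state the exact normalization, divides the raw difference by the predicted 1σ uncertainty, so that well-calibrated predictions scatter like a standard normal:

```python
import numpy as np

def zscore_residuals(y_pred, y_ref, sigma_pred):
    """Z-score residuals between model predictions and catalog labels.
    If the predicted uncertainties are well calibrated, the values
    should be approximately distributed as N(0, 1)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    sigma_pred = np.asarray(sigma_pred, dtype=float)
    return (y_pred - y_ref) / sigma_pred
```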
Overall, the models trained with uncorrected spectra (RRNet-u, RRNet-ur) perform worse than the models trained with corrected spectra (RRNet-c, RRNet-cp). Therefore, it is necessary to perform wavelength correction on the spectra at the scale of the current training dataset. Comparing RRNet-u with RRNet-ur, estimating the additional parameter RV reduced the performance of the model. Comparing RRNet-c with RRNet-cp, data augmentation with RV perturbations does not significantly improve the performance of the RRNet model. This may be because the recurrent module of RRNet already has the ability to learn features across different bands. Therefore, small RV perturbations do not have any significant effect on RRNet.
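The wavelength correction step can be sketched as below; the non-relativistic Doppler formula and the resampling of the shifted flux back onto the original grid are assumptions about the preprocessing, not the paper's exact implementation:

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def rv_correct(wavelength, flux, rv_kms):
    """Shift an observed spectrum to the rest frame using the catalog
    RV (non-relativistic Doppler: lambda_rest = lambda_obs / (1 + v/c)),
    then linearly resample the flux back onto the original wavelength
    grid so that all input spectra share a common grid."""
    wavelength = np.asarray(wavelength, dtype=float)
    rest_wave = wavelength / (1.0 + rv_kms / C_KMS)
    return np.interp(wavelength, rest_wave, np.asarray(flux, dtype=float))
```

With rv_kms = 0 the spectrum is returned unchanged, which makes the RRNet-u versus RRNet-c comparison a test of whether the network can absorb this shift on its own.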
Å and [6300, 6800] Å, respectively. Ting et al. (2019) analyzed the spectra from APOGEE DR14 based on the Kurucz theoretical model and obtained a high-precision APOGEE-Payne catalog. The catalog gives stellar parameters for 222,702 stars, including T eff, log g, and 15 elemental abundances. The coverages of the stellar atmospheric parameters in the APOGEE-Payne catalog are [3050, 7950] K in T eff, [0, 5] dex in log g, and [−1.45, 0.45] dex in [Fe/H]. The accuracies of the three parameters are 30 K, 0.05 dex, and 0.05 dex, respectively.

Figure 1. A diagram of the RRNet model. The input layer is a pre-processed spectrum (section 2.2). Immediately following the input layer are Nr residual blocks. The residual blocks share a common structure and are used to reinforce the spectral features and restrain the negative effects of noise and irrelevant components based on their correlations with the parameters to be estimated. Subsequently, the spectrum is reshaped into sequence data S of length Ns and further processed by a recurrent layer to obtain S′. The recurrent module extracts spectral information by analyzing the correlations between the spectral features in various wavelength subbands and fusing their information. The final step is to establish a mapping from the spectral features to the Probability Density Function (PDF) of the stellar parameters and elemental abundances.
Fabbro et al. (2017) mentioned that StarNet could be applied to other spectral studies. Therefore, this work investigated its application to LAMOST DR7 by training StarNet using the same training set as RRNet, and compared its results with our model. In addition, to verify the effectiveness of the combination of the residual module and the recurrent module, we constructed a model, StarNet-R, by adding the same recurrent module as in RRNet to StarNet, and compared RRNet with ResNet and StarNet-R. Figure 3 shows the MAE (equation 4) of the parameter predictions of RRNet, ResNet, StarNet, and StarNet-R on the test set.

Figure 2. The consistency between the RRNet predictions and the APOGEE-Payne catalog, and the robustness against data quality. The black line is the distribution curve of the inconsistencies (differences) between the RRNet predictions and the APOGEE-Payne catalog, and the three colored curves (blue, green, and red) present the uncertainties (standard deviations) of the inconsistencies. µr and σr are respectively the mean and standard deviation of the differences, and σm is the mean of the 1σ uncertainty predicted by RRNet.

Figure 4. The average partial derivatives of the stellar parameters by the StarNet, StarNet-R, and RRNet models over cool and hot stars. The upper and lower panels present the average partial derivatives for cool stars (T eff ∈ [4000, 4300] K) and hot stars (T eff ∈ [6000, 6300] K), respectively. Each average partial derivative is computed from 1000 randomly selected spectra in the test set. The sub-bands shaded in red indicate that the model can extract more weak spectral features with the recurrent module, and the sub-bands with borders shaded in green show that the model can extract more detailed features with the residual module.

Figure 6. Distribution of the elemental abundances [X/Fe] relative to [Fe/H] from the LAMOST-RRNet catalog. The left panel is for giants (log g < 4) and the right panel is for dwarfs (log g > 4).

Figure 7. Dependencies of the parameter estimation uncertainties on S/N. The dots indicate the uncertainty predicted by RRNet, and the lengths of the line segments centered on the dots indicate the uncertainty estimated from repeated observations (> 5 times).

Figure 8. The variations of the elemental abundances (from LAMOST-RRNet) with T eff in the three open clusters Melotte 22, NGC 2682, and NGC 2632. The three colors in the figure correspond to the three open clusters, and the mean µ and standard deviation σ of the chemical abundances are given in each panel. The reference line is obtained by a linear regression fit.

Figure 10. The loss curves of RRNet on the training set and validation set. The solid line indicates the mean of the M model losses (M = 6 in this paper), and the shaded area indicates the 1σ range of the M models.

Figure 11. Performance evaluation of RRNet in estimating the three stellar atmospheric parameters (T eff, log g, [Fe/H]). The vertical axis presents the distribution of the Z-score residuals (differences) between the RRNet predictions and the APOGEE-Payne catalog on the test set.

Table 2. Some comparisons of the LAMOST-RRNet catalog with other catalogs.

Table 3. The description of the LAMOST-RRNet catalog.

We computed the LAMOST-RRNet catalog from the LAMOST DR7 medium-resolution spectra using RRNet. The RRNet model is trained and tested on reference data from common stars between the LAMOST observation spectra and the high-precision APOGEE-Payne catalog. With the trained RRNet model, we estimated the stellar atmospheric parameters, chemical abundances, and corresponding uncertainties from 2,377,510 medium-resolution spectra in LAMOST DR7. In the case of S/N ≥ 10, the precisions of the parameters T eff, log g, [Fe/H], and [Cu/H] are 88.5 K, 0.13 dex, 0.05 dex, and 0.19 dex, respectively, while the precisions of the other elemental abundances are 0.05 dex ∼ 0.14 dex. To verify the performance of the RRNet model, we conducted a series of comparison experiments with other neural network models and other surveys. These experiments demonstrate that RRNet has higher accuracy and robustness, and good consistency with other surveys.

Table 4. Experimental results of RRNet with different preprocessing methods. The model performance is measured using the Mean Absolute Error (MAE, defined in equation 4) computed on the validation set. RRNet-u, RRNet-ur, RRNet-c, and RRNet-cp are four variants of RRNet defined by considering the uncertainties of the RV, as described in Appendix C.