StellarGAN: Classifying Stellar Spectra with Generative Adversarial Networks in SDSS and APOGEE Sky Surveys

Extracting precise stellar labels is crucial for large spectroscopic surveys like the Sloan Digital Sky Survey (SDSS) and APOGEE. In this paper, we report the newest implementation of StellarGAN, a data-driven method based on generative adversarial networks (GANs). Using 1D operators like convolution, the 2D GAN is modified into the 1D StellarGAN, which allows it to learn the relevant features of 1D stellar spectra without needing labels for specific stellar types. We test the performance of StellarGAN on stellar spectra of different types drawn from the SDSS and APOGEE data sets. Our result reveals that StellarGAN attains the highest overall F1-score on SDSS data sets (F1-score = 0.82, 0.77, 0.74, 0.53, 0.51, 0.61, and 0.55, for O-type, B-type, A-type, F-type, G-type, K-type, and M-type stars) when the signal-to-noise ratio (S/N) is low (90% of the spectra have an S/N < 50), with 1% of labeled spectra used for training. Using 50% of the labeled spectral data for training, StellarGAN consistently demonstrates performance that surpasses or is comparable to that of other data-driven models, as evidenced by the F1-scores of 0.92, 0.77, 0.77, 0.84, 0.84, 0.80, and 0.67. In the case of APOGEE (90% of the spectra have an S/N < 500), our method is also superior regarding its comprehensive performance (F1-score = 0.53, 0.60, 0.56, 0.56, and 0.78 for A-type, F-type, G-type, K-type, and M-type stars) with 1% of labeled spectra for training, manifesting its ability to learn from a limited number of labeled spectra. Our proposed method is also applicable to other types of data that need to be classified (such as gravitational-wave signals, light curves, etc.).


Introduction
Past decades have witnessed the rapid development of large spectroscopic surveys, e.g., the Sloan Digital Sky Survey (SDSS; York et al. 2000), the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST; Cui et al. 2012), and other massive surveys like the European Space Agency Gaia mission (Gilmore et al. 2012) and the forthcoming Vera Rubin Observatory (Ivezić et al. 2019). Large spectroscopic surveys have achieved high-resolution (R ∼ 20,000), high signal-to-noise ratio (S/N) spectra for 10⁵-10⁶ stars, for which deriving the stellar labels is of extreme importance. For instance, the observed stellar spectra are valuable resources for the discovery of astronomically intriguing stars, which increases the statistical significance of proper spectral classification methods. Currently, large astronomical surveys commonly observe stellar spectra across a wide range of spectral types. This underscores the importance of developing automatic spectral classification processes. This issue has been extensively discussed in the framework of expert computer programs (Gray & Corbally 2014), and there are also numerous studies proposing other automatic stellar classification methods.
The Morgan-Keenan (MK) classification (Morgan & Keenan 1973; Luo et al. 2015) of stellar spectra categorizes stars based on their spectral characteristics and temperature, subdividing them into seven main categories: O, B, A, F, G, K, and M. Each category is further divided based on finer details of temperature and spectral features, allowing detailed classification of stars. Although the MK classification can effectively improve the efficiency of stellar spectral classification by automatically selecting the optimal band combination, it does not fully utilize the broad spectral features of stars (Gray & Corbally 2014), such as the derived temperature, surface gravity, and chemical abundance. Covey et al. (2007) developed an algorithm for identifying point sources whose colors differ significantly from those of normal stars. Based on a seven-dimensional color distance (Covey et al. 2007) that calculates a point source's minimum separation from the stellar locus in a seven-dimensional color space, their results showed that different types of stellar spectra can be identified according to the physical parameters. Regarding the MK classification, not only are the volume and S/N of concern for large data sets, but so is the appearance of rare spectral features that do not fall into the classification templates. These cases become more frequent in large data sets, and multiple studies have highlighted this issue (Wei et al. 2013).
As a robust automatic method for data classification, machine learning has achieved considerable success in deriving the labels of stellar spectra, using the mappings from a training set to predict stellar labels for the observed spectra. Bailer-Jones & Lamm (2003) proposed an improved data analysis method for infrared photometry, offering solutions for enhanced sensitivity and reliability in future infrared variability searches. Transitioning toward pattern classification, Liu et al. (2016) made significant strides in adaptive imputation, addressing the challenge of missing values. Furthermore, Zhang et al. (2020) offered an innovative approach to derive stellar labels from LAMOST spectra using the Stellar LAbel Machine (SLAM). Due to its unique ability to leverage support vector regression for handling highly nonlinear problems, SLAM provides unprecedented precision over a large spectral range. Meanwhile, the random forest (RF), a valid astronomical target classification method, shows its potential in multiwavelength data classification (Gao et al. 2009). For instance, Jones et al. (2017) extracted the infrared spectrum characteristics of point sources and used a decision-tree algorithm to realize stellar classification. The idea of using pattern recognition algorithms to provide parallel and hierarchical classification frameworks for the data set from the European Space Agency's Gaia mission was discussed in Bailer-Jones & Lamm (2003). The RF has also been applied to predict specific active galactic nucleus subclasses via the detected gamma-ray spectral properties, combined with the support vector machine (SVM; Hassan et al. 2013). In order to reduce the cost of computation, the spectra should be preprocessed by principal component analysis (PCA) or locally linear embedding (LLE) before being classified by SVM (Jiang et al. 2013). Machine-learning approaches, based on artificial neural networks (ANNs) that emulate the human brain's information-processing techniques, are incrementally being implemented in astronomical data processing (Liu et al. 2021). While the data propagate into deeper layers, more abstract high-level data representations are extracted and the essential features can be obtained from the original data (Kuntzer et al. 2016).
As a cornerstone of machine learning, supervised learning provides a powerful tool to classify and process data using machine language. In supervised learning, the labeled training data are used to infer a learning algorithm, which is subsequently applied to predict the classification of other unlabeled data. An excellent demonstration of this concept can be found in Singh et al. (1998), which successfully implemented an automated spectral classification method utilizing convolutional neural networks (CNNs). This novel approach surpassed traditional shallow machine-learning techniques both in accuracy and generalization. Recent analysis proved that helium-burning red giant stars could be effectively separated from the rising red giant stars (at an accuracy of 99%) with 1D CNNs (Hon et al. 2017). In astronomy, deep feature extraction methods based on convolutional networks (Li et al. 2022) and autoencoder networks (Hinton & Salakhutdinov 2006) are also widely used as general tools for spectral analysis. Based on a deep neural network trained with a fast layer-wise learning algorithm, Wang et al. (2017) demonstrated the superior performance of such a scheme on LAMOST spectral data, both in spectral classification and in defective spectra recovery. However, it is worth noting that supervised methods demand a large data set with labeled true types. Recent efforts have been made to apply deep neural networks in stellar spectral processing. Navarro et al. (2012) proposed an automatic spectral classification process using ANNs, which allows the classification of stellar spectra with a considerably low S/N. More recently, Villavicencio-Arcadia et al. (2020) presented the application of deep neural networks to the LAMOST spectra, yielding reliable and accurate spectral classification over the LAMOST data.
The generative and discriminative views have been blended to create a novel machine-learning framework known as the generative adversarial network (GAN). The GAN architecture consists of two "adversarial" models: a generator (G) that creates data, and a discriminator (D) that evaluates whether data come from G or from the real training data (Goodfellow et al. 2014). The GAN has been used as the foundation for a number of semisupervised deep-learning (SSL) and unsupervised-learning techniques. For example, a series of works explored GAN-based SSL methods and showed their applicability to skin and heart disease image classification (Madani et al. 2018; Yi et al. 2018). Although several GAN-based techniques have demonstrated potential in the semisupervised classification of 2D images, they cannot be directly applied to stellar spectra classification due to the 1D structure of the spectra. Interestingly, recent studies have also shifted toward the application of GANs to 1D data, particularly in the realm of time-series data generation (Donahue et al. 2018; Huang et al. 2018). For instance, the bidirectional long short-term memory-CNN GAN model was proposed to align with existing clinical records while preserving patient privacy (Zhu et al. 2019). In astronomy, GANs also demonstrate great potential in spectral denoising and data augmentation for improved astronomical object classification. In particular, they are employed to generate synthetic light curves from variable stars, thus enhancing the classification accuracy when training with synthetic data and testing with real data (García-Jara et al. 2022). A feed-forward neural network called Spectra Generative Adversarial Nets (Spectra-GANs) outperforms traditional methods such as PCA, wavelet analysis, and restricted Boltzmann machines in solving spectral denoising challenges (Wu et al. 2020). The variational autoencoder (VAE) model, another type of generative model, is more effective in analyzing complex astronomical data (such as high-resolution galaxy spectra from SDSS). It also outperforms PCA in reconstructing the SDSS spectra with fewer latent parameters (Portillo et al. 2020). The generation of realistic light curves of periodic variables has been achieved using a physics-enhanced VAE model, offering a new methodology for creating synthetic time series with varying cadences (Martínez-Palomera et al. 2022). Concurrently, a dimensionality reduction has been accomplished for high-resolution galaxy spectra from the SDSS using VAEs, unveiling compact and interpretable latent spaces while outperforming traditional techniques like PCA (Portillo et al. 2020). Furthermore, the noise that hinders gravitational-wave (GW) detectors, specifically glitches mimicking GW signals, has been tackled using GANs to learn and generate artificial populations of glitches (Lopez et al. 2022). Additionally, the DVGAN, a unique three-player Wasserstein GAN, has shown its potential in simulating time-domain signals, emphasizing the generation of smoother signals that resonate closely with actual data (Dooney et al. 2022).
In this paper, we present a new and robust 1D semisupervised classification scheme (StellarGAN) that allows us to classify spectra with only 1% of labeled spectra for training. The first (to the best of our knowledge) proposal of this approach can be traced back to Hippel et al. (1994), based on a supervised back-propagation algorithm. The main advantage of the present paper is that we employ a network first pretrained with unlabeled spectra and then fine-tuned with a small number of labeled spectra. Therefore, the information embedded in both unlabeled and labeled stellar spectra is derived simultaneously. The paper is organized as follows. In Section 2, we give a brief introduction to the spectral data used in this paper. In Section 3, we provide a detailed description of our StellarGAN scheme. The experimental results and feature visualization are presented in Section 4. Finally, we summarize the conclusions in Section 5.

Observations of Spectra to be Classified
In order to perform comparative experiments, we need to define the different data sets to be used. The yearly data releases from SDSS, milestones in the field of astronomy since the first release, provide a robust statistical foundation for our studies.
The SDSS started routine operations in 2000 April, with the original goals of obtaining imaging in five broad bands over 10,000 deg² of high-latitude sky, as well as spectroscopy of 1,000,000 galaxies and 100,000 quasars (York et al. 2000; Abazajian et al. 2004). The optical quasar catalog of SDSS enables the Sloan Digital Sky Survey Quasar Lens Search (SQLS), which has provided a large statistical lens sample apposite for cosmological studies (Cao et al. 2012a, 2012b; Cao & Zhu 2012). In this study, we utilize data from the seventh data release of SDSS (DR7), also marking the end of the second phase of SDSS (SDSS-II). In addition to completing the original SDSS science goals, SDSS-II has also carried out extensive stellar spectroscopy of 460,000 stars (Abazajian et al. 2009). Such data can also be used to probe the chemical evolution, stellar kinematics, and substructure of the Galaxy, covering distances from the solar neighborhood to ∼100 kpc. In this paper, we attempt to assess the accuracy of stellar spectra classification on a subset of SDSS-II stars with a wide variety of spectral types (A-type, F-type, G-type, K-type, M-type, O-type, and B-type stars).
The fourth phase of SDSS (SDSS-IV) includes three main surveys: the Extended Baryon Oscillation Spectroscopic Survey, Mapping Nearby Galaxies at Apache Point Observatory (APO), and the APO Galactic Evolution Experiment 2 (APOGEE-2; Blanton et al. 2017). APOGEE-2 is the only near-infrared spectroscopic survey to investigate the composition and dynamics of stars in the Galaxy, based on multiplexed high-resolution spectrographs (Majewski et al. 2017). In order to build up the S/N for faint stars in APOGEE fields, the multiple-visit spectra are combined into one spectrum for each star (which is called the "visit combination"). In this paper, we turn to the complete release of the APOGEE-2 survey from SDSS DR17, which contains infrared spectra of over 650,000 stars (Abdurro'uf et al. 2022). We restrict our analysis to a subsample of stars selected on the basis of a series of APOGEE flags (A-type, F-type, G-type, K-type, and M-type stars). In particular, such a choice of the stellar spectral types could helpfully contribute to the differentiation between dwarfs and giants and their respective proportions.
Following the reduction process described in Nidever et al. (2015), the 3D SDSS/APOGEE raw data are reduced into well-sampled, combined, sky-subtracted, telluric-corrected, and wavelength-calibrated 1D spectra. In the first stage, the individual spectra of each star on a plate are derived from the raw spectra of consecutive, spectrally dithered exposures of one visit. The second stage includes dark subtraction, flat-fielding, wavelength and flux calibration, and removal of sky emission and absorption within the Earth's atmosphere. Finally, the individual spectrally dithered exposures are combined into a single spectrum for each star. The reduced spectra of the SDSS and APOGEE are available through the Science Archive Server. The information about the observations and the stellar parameters determined from the spectra can also be found on the SDSS website. We randomly select 59,240 unlabeled stellar spectra and 107,800 labeled stellar spectra from the full SDSS sample. Similarly, 21,810 unlabeled and 28,200 labeled stellar spectra are selected from the APOGEE sample. In Figure 1 we plot examples of the observations from both SDSS and APOGEE, in order to illustrate the differences in wavelength coverage and the features of different stellar-type spectra. The SDSS spectra used to train our model cover the wavelength range of 3700-9100 Å (with a resolution of R ∼ 1800), while the APOGEE data have a wavelength range of 15100-15600 Å (with a resolution of R ∼ 22,500). Note that the spectral absorption lines contain important information to enable us to perform accurate classification. For O-type and B-type stars with higher stellar surface temperatures, hydrogen is completely ionized and the fraction of the hydrogen atoms excited to the n = 3 level increases very rapidly. The lines of helium (He), which originate from levels much higher in energy than the ionization potential of hydrogen, take over the spectra of these hottest stars. For A-type stars, prominent hydrogen Balmer lines (H) dominate the visual region of the spectra, due to the strong absorption out of the n = 2 level of hydrogen. For F-type and G-type stars with lower temperatures, their spectra are dominated by H lines and an increasing number of neutral metal lines (Ca, Fe), generated by various trace elements with lower energy levels. For the coolest K-type and M-type stars, the absorption lines from neutral metals (Fe, Ca, and Mg) dominate the spectral appearance, with the survival of molecules of these species in the stellar photosphere.
To achieve the ambitious goal of performing accurate classification of spectra with different S/Ns, a large number of stellar spectra with a wide range of S/Ns are needed. The S/N distributions for both SDSS and APOGEE are shown in Figure 2. In our case, more than 90% of the SDSS spectra have an S/N < 50, and 90% of the APOGEE sample has an S/N < 500. Therefore, the selection of the SDSS and APOGEE samples as experimental data is beneficial due to their diversified S/N, enhancing the generalization capability of machine-learning algorithms. In the preprocessing of stellar spectra for AI-driven analyses, standardization is paramount to accelerate neural network convergence (LeCun et al. 2002). This is achieved by applying the formula x′ = (x − μ)/σ, where x represents a spectral data point, μ is the mean of the data set, and σ is its standard deviation.
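As a concrete illustration, this standardization step can be sketched in a few lines of Python; the array names and batch shapes here are illustrative, not taken from the paper's pipeline.

```python
import numpy as np

def standardize(flux: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance scaling of one spectrum: x' = (x - mu) / sigma."""
    mu = flux.mean()
    sigma = flux.std()
    return (flux - mu) / sigma  # assumes sigma > 0, true for any real spectrum

# Example: standardize a batch of spectra, each sampled to 2000 flux points.
spectra = np.random.rand(8, 2000)  # placeholder for real SDSS/APOGEE fluxes
normalized = np.stack([standardize(s) for s in spectra])
```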

Classification Algorithms and the Methodology
The prevalent classification tasks primarily utilize two paradigms: traditional machine-learning methods, such as SVM (Hassan et al. 2013) and RF (Gao et al. 2009), and deep neural network learning. Most deep neural network methods mainly use the CNN architecture (Hershey et al. 2017), including feature extraction operators such as convolution, pooling, and normalization, and require a large number of parameters to learn data features. The common feature of traditional machine-learning methods is that they do not require a large number of parameters to participate in model training. For instance, SVM constructs a hyperplane or a set of hyperplanes in a high-dimensional or infinite-dimensional space, which can be used for classification tasks.
The aforementioned algorithms are generally used as supervised classification approaches, using labeled data for model training. In this analysis, we focus on the GAN, a prevalent SSL network, outstanding in its ability to exploit "hidden" structures within unlabeled data to enhance learning from the labeled segment. This becomes crucial, since real-world data, including stellar spectra for classification, are often only partially labeled due to the prohibitive costs and time required for full labeling. Meanwhile, the lack of labeled data for rare stellar spectra also limits the performance of classification models. SSL methodologies provide a solution to such issues. Our paper specifically adopts a generative model, inferring an underlying distribution from unlabeled data to supplement learning from labeled data.

The StellarGAN Model
In this section, we first elucidate the training process of GANs, followed by an in-depth discussion of the unique attributes of our proposed model, StellarGAN. The procedure of training and fine-tuning StellarGAN is shown in Algorithm 1.
The training of GANs involves a systematic alternation between the training of the generator network (denoted as G) and the discriminator network (denoted as D) (Goodfellow et al. 2014). Initially, the generator G produces fake outputs. The discriminator is optimized by maximizing its score,

$$\max_D V(D) = \mathbb{E}_{x \sim p_{\rm data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))], \quad (1)$$

where x represents real data and z denotes the noise vector that is input to G. Subsequently, G's training process is completed by minimizing the following function:

$$\min_G \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]. \quad (2)$$

In the early stages of learning, it is easy for D to separate the real data from the fake data generated by G. However, as the training advances, the synthetic data output by the network begin to more closely mimic genuine data. In the event that the generated data successfully deceive the discriminator network D, one can expect that the GAN training process has converged.
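A condensed PyTorch sketch of this alternating update is given below. It assumes G and D are defined as in Table 1 (with D ending in a sigmoid) and uses the common non-saturating surrogate for Equation (2), i.e., maximizing log D(G(z)) rather than minimizing log(1 − D(G(z))); the function and variable names are ours, not the paper's.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, z_dim=100):
    """One alternating update of Equations (1) and (2) via binary cross-entropy."""
    batch = real.size(0)

    # Discriminator update: ascend Eq. (1) by scoring real spectra as 1
    # and generated spectra as 0.
    z = torch.rand(batch, z_dim)                  # uniform noise input to G
    fake = G(z).detach()                          # block gradients into G
    loss_d = (F.binary_cross_entropy(D(real), torch.ones(batch, 1))
              + F.binary_cross_entropy(D(fake), torch.zeros(batch, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: non-saturating form of Eq. (2), pushing D(G(z)) toward 1.
    z = torch.rand(batch, z_dim)
    loss_g = F.binary_cross_entropy(D(G(z)), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```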
Our StellarGAN is built on a GAN structure that allows it both to learn to generate stellar spectra and to identify stellar spectra from real observations. As shown in Figure 3, the proposed network consists of a generator and a discriminator subnetwork. We first use the generator to generate fake stellar spectra with the same dimensions as the real spectrum, as described in Equation (2). Then, the real spectrum and the generated spectrum are fed into the discriminator, which learns to recognize whether its input is a real spectrum or a generated one, as described in Equation (1). The G utilizes ConvTranspose1d (1D transposed convolution) layers, culminating in a sigmoid-activated output layer. We incorporate a batch normalization (BN) layer to normalize the activations of preceding layers (Ioffe & Szegedy 2015). Through the application of BN, the mean activated values consistently approximate 0 (with a standard deviation of 1) (Ioffe & Szegedy 2015). The details of the network structure of the generator and the discriminator are shown in Table 1. As can be seen in Figure 3 and Table 1, we propose the 1D models G and D by adopting and adapting the CNN architecture to stellar spectra.
For the generator network G, with the input of a 100-dimensional random noise vector drawn from a uniform distribution, the dimension of its output should be equal to that of the authentic stellar spectra. Therefore, we utilize ConvTranspose1d to rescale the data to the same dimension (2000) as the authentic spectra. The generated fake spectrum is sent to the input layer of D. As for the discriminator network D, we feed into it both authentic unlabeled spectra and fake spectra generated by G for pretraining.
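The exact layer configuration is given in Table 1; the sketch below is only a schematic PyTorch rendering of the design described above (100-dimensional uniform noise upsampled by ConvTranspose1d to a 2000-point spectrum with a sigmoid output, plus a 1D convolutional discriminator). The channel counts, kernel sizes, strides, and LeakyReLU activations are illustrative assumptions, not the published values.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a 100-dim uniform noise vector to a 2000-point synthetic spectrum."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 256 * 125)        # seed a coarse feature map
        self.net = nn.Sequential(
            nn.ConvTranspose1d(256, 128, kernel_size=8, stride=4, padding=2),
            nn.BatchNorm1d(128), nn.ReLU(),          # length 125 -> 500
            nn.ConvTranspose1d(128, 1, kernel_size=8, stride=4, padding=2),
            nn.Sigmoid(),                            # length 500 -> 2000
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 125))

class Discriminator(nn.Module):
    """1D convolutional network; sigmoid head for the pretraining (real/fake) stage."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=8, stride=4, padding=2), nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, kernel_size=8, stride=4, padding=2), nn.LeakyReLU(0.2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(128 * 125, 1), nn.Sigmoid())
    def forward(self, x):
        return self.head(self.features(x))

G, D = Generator(), Discriminator()
fake = G(torch.rand(4, 100))             # shape (4, 1, 2000), like real spectra
print(fake.shape, D(fake).shape)         # torch.Size([4, 1, 2000]) torch.Size([4, 1])
```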
Upon completing the training of G and D, D initially functions as a binary classifier during the pretraining phase, using a sigmoid function in its classification layer to assess the authenticity of stellar spectra. After the convergence of this pretraining, the generator network G is removed, and D is transformed into a multiclass classifier. To facilitate K-category star classification tasks (e.g., five or seven categories), we modify the neuron count in the network's classification layer from 1 to K and replace the sigmoid with a softmax. This adjustment, detailed in the 15th layer of Table 1, enables the network to differentiate and categorize inputs into the specified number of star categories. The likelihood of a spectrum being attributed to a particular category is determined according to the method described in Chen et al. (2016).
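Continuing the discriminator sketch above, this conversion amounts to replacing only the final head while keeping the pretrained convolutional backbone; the attribute names and the flattened feature size are carried over from our illustrative sketch, not from Table 1.

```python
import torch.nn as nn

K = 7  # O, B, A, F, G, K, and M classes for SDSS (K = 5 for APOGEE)

# Keep the pretrained feature extractor; swap the 1-unit sigmoid head for a
# K-unit softmax head (the change made at layer 15 of Table 1).
classifier = Discriminator()        # in practice, load the pretrained weights here
classifier.head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(128 * 125, K),        # flattened feature size from the sketch above
    nn.Softmax(dim=1),              # class probabilities, as in Equation (3)
)
```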
Supposing $o \in \mathbb{R}^K$ is the output of the last layer of the D network, the softmax function is expressed as

$$\hat{Y}_i = \frac{\exp(o_i)}{\sum_{j=1}^{K} \exp(o_j)}, \quad i = 1, \ldots, K. \quad (3)$$

For D we use the cross-entropy loss function

$$L = -\sum_{i=1}^{K} Y_i \log \hat{Y}_i, \quad (4)$$

Figure 3. The schematic structure of StellarGAN in the pretraining phase. The StellarGAN architecture comprises two essential stages, each contributing uniquely to its training. Initially, in the pretraining stage, noise vectors serve as inputs, channeled into the generator (G), culminating in the creation of synthetic spectra. These generated entities are distinctly labeled "0," signifying their nonauthentic origins. In succession, these synthetically generated spectra, along with the genuine spectra (labeled "1"), are directed toward the discriminator (D), fostering its ability to distinguish synthetic from authentic spectra. Progressing to the second stage, fine-tuning is pivotal. Herein, D transitions into a sophisticated multiclassification network, seamlessly integrating parameters acquired from the preliminary pretraining stage.
Note. The prefixes G- or D- represent the generator and discriminator layers.
where Ŷ represents the predicted probability of the input stellar spectrum's class, while Y denotes its actual class label. In summary, there are three loss functions in StellarGAN: Equations (1), (2), and (4). Equations (1) and (2) are the loss functions of the first (pretraining) stage, and Equation (4) is the second-stage loss function designed for multiclass classification.
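In code, because the softmax of Equation (3) already sits inside the network, the cross-entropy of Equation (4) reduces to a negative log-likelihood over the predicted probabilities. A minimal fine-tuning step might look like the following sketch (the optimizer and batch shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def fine_tune_step(classifier, spectra, labels, optimizer):
    """One supervised update with the cross-entropy loss of Equation (4).

    spectra: (batch, 1, 2000) standardized fluxes; labels: (batch,) class indices.
    """
    probs = classifier(spectra)                    # Y_hat: softmax probabilities
    # Cross entropy -sum_i Y_i log Y_hat_i, written as NLL on log-probabilities
    # because the softmax is inside the network (with raw logits one would use
    # nn.CrossEntropyLoss instead).
    loss = F.nll_loss(torch.log(probs + 1e-12), labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```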

How to Train StellarGAN
We summarize the overall training and fine-tuning process as follows. In the pretraining phase, StellarGAN learns the flux features from unlabeled data: we use both the unlabeled stellar spectra and the spectra fabricated by G to train both D and G, with the well-trained models automatically saved. Subsequently, in the fine-tuning phase, the labeled stellar spectra are divided into the fine-tuning set and the test set. The annotated data set utilized is detailed in Table 2, wherein it is partitioned into a training set and a test set for the purposes of fine-tuning. Specifically, 1%, 10%, or 50% of the total annotated data set is used for fine-tuning the pretrained D, with the remaining spectra forming the test set. It is necessary to highlight that the data engaged during the fine-tuning stage differ from those used in the pretraining phase.
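The labeled-set partition can be reproduced along these lines; the arrays below are random placeholders standing in for the labeled SDSS spectra, and scikit-learn is used purely for illustration (the paper does not name a specific tool).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder labeled set: N spectra with 2000 flux points and 7 SDSS classes.
rng = np.random.default_rng(0)
X = rng.random((10_000, 2000), dtype=np.float32)
y = rng.integers(0, 7, size=10_000)

# Randomly reserve 1% of the labeled spectra for fine-tuning the pretrained D;
# the remaining 99% forms the test set (the 10% and 50% cases are analogous).
X_tune, X_test, y_tune, y_test = train_test_split(
    X, y, train_size=0.01, random_state=0)
```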
We use the PyTorch library to build StellarGAN. In both the pretraining and fine-tuning stages, the stochastic gradient descent optimizer is employed (Jin et al. 2014). We run this method for 100 epochs in pretraining and 200 epochs in fine-tuning, each with a learning rate of 10⁻⁴. In our experiments utilizing 10% of the SDSS data set, the training time required for the pretraining phase is approximately 8 hr, while the fine-tuning phase is completed in roughly 3 hr. The above experiments are performed on an Intel Core i5-4590 3.3 GHz CPU with 20 GB of random access memory and a Titan X GPU.

Experimental Results and Analysis
In Figure 4, the spectra generated by StellarGAN during its pretraining phase are presented. These spectra, which were produced at various epochs, highlight the model's capability to evolve and refine its generative power as it continues training on the unlabeled data set. It is revealed that during the initial training iterations, the model has not yet learned to capture salient spectral features. This results in spectra that predominantly exhibit random noise (see Figure 4(a)). However, as the model undergoes additional training iterations, its capacity to approximate the inherent characteristics of authentic stellar spectra improves markedly. The generated data increasingly resemble authentic stellar spectra, as depicted in Figures 4(b)-(d). This indicates that the generator G has effectively learned the underlying data distribution. The convergence of the model over time is further substantiated by the loss curves presented in Figure 5. They demonstrate how well G performs in generating synthetic spectra that resemble the true data distribution. Initially, the loss is relatively high as G starts its attempts with naive synthetic outputs. As training progresses, the generated spectra become increasingly similar to the real ones, thereby making it more challenging for D to differentiate. This graphical representation also offers a comparative view of the adversarial dynamics between the generative and discriminative components throughout the training process. The discriminative network's loss curve quantifies D's ability to distinguish genuine samples from the synthetic ones produced by G. In the early phases of training, a rapid decline in loss can be observed, signaling that D is quickly learning to classify the synthetic spectra generated by a yet unrefined G. As G's generative capability is enhanced, a stabilization in D's loss curve becomes evident. This plateauing suggests the onset of a Nash equilibrium, a state where neither network can realize further advancements without corresponding modifications to its counterpart. The network parameters after D pretraining are taken as the initial parameters for spectral identification.
We use a precision metric (P) to quantify the performance of our classification model. P is defined as the ratio of true positive (TP) predictions to the total number of positive predictions made by the model. We evaluate two scenarios: TP, where actual positives are rightly predicted, and false positives (FP), representing misclassified negatives. For example, when F-type spectra are taken as the positive sample, other types of spectra (such as G-type and K-type) are viewed as negatives. Now the precision P and the recall R can be expressed as

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN},$$

where FN stands for false negatives, i.e., the number of positive instances incorrectly classified as negative. In this work, the classification performance of StellarGAN is assessed with the F1-score (which ranges between 0 and 1),

$$F1 = \frac{2PR}{P + R}.$$

To assess the capability of the model to classify across diverse categories, we display the confusion matrices for multiple algorithms on the SDSS data set in Figure 6. Due to the scarcity of samples for O- and B-type stars, all models exhibit comparatively diminished performance for these two spectral classes. Notably, StellarGAN consistently surpasses the performance of other algorithms in our evaluations. As depicted in Figure 7, when trained with only 1% of the data set, StellarGAN exhibits enhanced precision and expedited convergence speed compared to the CNN model. The CNN considered in our analysis corresponds to a D network that has not undergone pretraining. Meanwhile, the mean F1-scores of SVM, RF, PCA, LLE, CNN, and StellarGAN, trained using fine-tuning sets (1%, 10%, and 50%) on the SDSS and APOGEE data sets, are shown in Tables 3 and 4. The overall scores are derived as the mean of scores across all categories in the test set, corresponding to each varied proportion of the data set used for training. For the network training, we implement a k-fold cross-validation method with k set to 10. It should be noted that within these fine-tuning sets, 90% is allocated for training purposes, while the remaining 10% is used for validation. In our comparative analysis, we initially implement both PCA and LLE to reduce the dimensionality of the stellar spectrum from 2000 dimensions to 800. Subsequently, the features obtained from this dimensionality reduction are fed into an SVM classifier for the purpose of categorization. In order to make a fair algorithm comparison, we use the same numbers of training and test spectra for SVM, RF, PCA, LLE, CNN, and StellarGAN. Furthermore, StellarGAN utilizes distinct data sets for the pretraining and fine-tuning phases. Our experiments demonstrate that StellarGAN achieves much higher classification accuracy on the SDSS data sets, compared with other supervised-learning techniques in astronomical spectral processing, i.e., SVM, RF, and CNN. More specifically, Table 3 reveals that StellarGAN attains the highest average F1-score on the SDSS data set (0.63) utilizing only 1% labeled data, which is 0.15 higher than SVM, 0.13 higher than RF, and 0.14 higher than CNN on the same training set. Benefiting from the heuristic knowledge provided by unlabeled spectra and learned in the pretrained D, StellarGAN generalizes well on the unseen test data after being fine-tuned with a few labeled data. In contrast, the learning ability of SVM and RF is confined by the limited labeled data, and the features embedded in unlabeled spectra are neglected. We remark here that our method requires a substantial volume of data, but only a small fraction of labeled data compared to other approaches. The first step of StellarGAN consists of unsupervised training over the entire data set, followed by a supervised step over a small training set.
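The per-class metrics defined above can be computed directly from one-vs-rest confusion counts; the sketch below is a plain NumPy rendering (equivalent to sklearn.metrics.f1_score with average=None), with toy labels for illustration.

```python
import numpy as np

def per_class_f1(y_true, y_pred, n_classes):
    """Per-class F1 from one-vs-rest TP/FP/FN counts."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0   # precision
        r = tp / (tp + fn) if tp + fn else 0.0   # recall
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return scores  # the reported overall score is the mean over classes

y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 2])
print(per_class_f1(y_true, y_pred, 3))
```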
In the case of APOGEE, StellarGAN still excels compared to other models, to a different extent, as can be seen from the average F1-scores of the different models in Table 4. Furthermore, given that the volume of data from SDSS during the fine-tuning phase surpasses that of APOGEE, the model exhibits superior performance on the SDSS data set.
In order to explore the impact of the S/N on the classification performance of StellarGAN, we display the relation between S/N and classification accuracy in Figure 8. In our evaluation (with the model trained on 10% of the total data set), we use the SDSS and APOGEE data sets with varying S/Ns to assess the performance of StellarGAN. The experimental results reveal a direct correlation between S/N and the F1-score. Incorporating the data distribution illustrated in Figure 2, Figure 8 demonstrates lower F1-scores at reduced S/N. This trend is likely attributable to a confluence of factors: the increased noise inherent in lower-S/N scenarios and the diminished volume of data available for fine-tuning. To elucidate the influence of the number of convolutional layers in the model on its performance, we performed ablation studies specifically targeting the convolutional layers within D. Figure 9 delineates the relationship between the number of these layers in D and the corresponding F1-score. Given the richer data volume in the SDSS data set compared to the APOGEE data set, StellarGAN exhibits superior performance on the former. We experimented with convolutional layers ranging from three to seven. There is a discernible uptrend in the F1-score as we augment the number of layers, but after reaching a count of five, the score plateaus. To strike an optimal balance between model performance and computational efficiency, we have elected to set the number of convolutional layers at five.

Feature Visualization
In this subsection, inspired by the network-in-network approach (Lin et al. 2013), we aim to provide an intuitive explanation of how different regions of the stellar spectrum influence the final classification. In our research, we employed the gradient-weighted class activation mapping technique (Selvaraju et al. 2017) with the discriminator D to visualize the model's focus on various wavelengths. A forward pass is conducted using the stellar spectra as input, enabling the computation of gradients with respect to the feature maps in the final convolutional layer of network D. Following this, global average pooling (Lin et al. 2013) is performed on these gradients to determine the weights of the feature maps. We then generate a localization map by combining the weighted feature maps, apply ReLU activation, and normalize this map.
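A schematic of this 1D Grad-CAM computation is sketched below. It assumes the classifier exposes its convolutional stack as classifier.features and its softmax head as classifier.head, following our earlier sketches; those attribute names are illustrative, not the paper's.

```python
import torch
import torch.nn.functional as F

def grad_cam_1d(classifier, spectrum, target_class):
    """Gradient-weighted class activation map over wavelength for one spectrum.

    spectrum: tensor of shape (1, 1, 2000).
    """
    fmap = classifier.features(spectrum)             # (1, C, L) feature maps
    fmap.retain_grad()                               # keep gradients of a non-leaf tensor
    score = classifier.head(fmap)[0, target_class]   # predicted class probability
    score.backward()
    weights = fmap.grad.mean(dim=2, keepdim=True)    # global average pooling of gradients
    cam = F.relu((weights * fmap).sum(dim=1))        # weighted sum of maps + ReLU
    cam = F.interpolate(cam.unsqueeze(1), size=spectrum.shape[-1],
                        mode="linear", align_corners=False).squeeze()
    return cam / (cam.max() + 1e-12)                 # normalize for the heat map
```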
In Figure 10, we present the truncated spectra obtained from SDSS and APOGEE alongside the generated heat maps for various types of stars. The color shading indicates the contribution of different wavelengths in determining a specific type of stellar spectrum. It can be seen from the heat map that the absorption lines of the stellar spectra play an important role in distinguishing different stellar types. For instance, the hydrogen Balmer lines at the wavelength of ∼4000 Å have a great influence on the model classification of F-type and G-type spectra. For K-type spectra, when the absorption lines are not obvious (Mg at ∼5200 Å and Na at ∼6000 Å), their characteristics in this range have little influence on spectral classification. Such an approach enables a more detailed exploration of stellar populations and their spectral properties.

Table 3
The F1-score of SVM, RF, CNN, and StellarGAN Trained by 1%, 10%, and 50% of Labeled Data in SDSS

Conclusions
In this paper, we proposed a novel 1D StellarGAN for stellar classification and explicitly visualized the stellar spectral features it learned, in order to manifest its representational learning ability out of a limited number of labeled spectra. Furthermore, we evaluated the performance of StellarGAN on the observational data from both the SDSS and APOGEE surveys. Here we summarize our main conclusions as follows:

1. The 1D StellarGAN consists of a generator G that endeavors to generate a realistic stellar spectra distribution, and a discriminator model D that distinguishes whether the stellar spectra are authentic or generated ones. The training of StellarGAN undergoes both the pretraining and fine-tuning phases. In the pretraining phase, it learns the features from unlabeled data to accumulate heuristic knowledge of stellar spectra. Afterwards, the pretrained D is modified into a multiclass classifier to be fine-tuned on a small number of labeled spectra (1%, 10%, or 50% of labeled spectra randomly selected from the full sample). The performance of the fine-tuned D is tested on the rest of the labeled spectra (i.e., the testing set). Upon utilizing only 1% of the labeled spectra for training, our findings demonstrate that StellarGAN's performance on the SDSS data set significantly outpaces conventional machine-learning approaches.
2. We employ heat maps as a visual tool to effectively illustrate the learning outcomes of the model. Visual analysis through the heat map clearly indicates that StellarGAN exhibits varying degrees of attention to identical wavelengths across different types of stellar spectra.

3. Our findings indicate that in certain scenarios, the G might not yield optimal data owing to unstable training phases, exemplified by the incomplete convergence of G in Figure 5. Consequently, viable solutions could include employing advanced techniques like W-GAN (Arjovsky et al. 2017). In our future research endeavors, we aim to investigate the influence of the generative model's stability on the overall classification performance.

4. Our proposed method could be used not only for stellar spectra, but could also be applicable to many other types of data that need to be classified. This also opens the way to detecting unusual spectra in future classifications and demonstrates that it is possible to train StellarGAN to identify peculiar objects. Furthermore, we anticipate that the methodology we have proposed could be effectively adapted to classify other forms of data, such as GW signals, light curves, and similar data sets, broadening its applicability in various scientific domains.

Figure 1 .
Figure 1. Examples of spectra observed by SDSS and APOGEE. The left panels cover different types of stars from SDSS (from top to bottom: O, B, A, F, G, K, and M types). The right panels cover different types of stars from APOGEE (from top to bottom: A, F, G, K, and M types). Prominent hydrogen Balmer lines (H) are visible for A-type stars, while H lines and neutral metal lines (Ca, Fe) are prominent for F-type and G-type stars. The absorption lines from neutral metals (Fe, Ca, and Mg) dominate the spectra of K-type and M-type stars.

Algorithm 1 .
The procedure of training and fine-tuning StellarGAN. Here D(sigmoid) represents the discriminator network with sigmoid activation as the last layer for binary classification, while D(softmax) denotes the discriminator network with softmax activation for multiclass classification.
1: Train G and D(sigmoid).
2: for number of training epochs do
3:   Sample a minibatch of spectra from the unlabeled spectra data set.
4:   Sample a minibatch of fake stellar spectra from p_g.
5:   Update D by ascending its stochastic gradient.
6:   Update G by descending its stochastic gradient.
7: end for
8: Save the D and G models. Replace the last activation layer of D(sigmoid) with softmax, generating D(softmax) for multiclass classification.
9: Fine-tune D(softmax).
10: for number of fine-tuning epochs do
11:   Sample a minibatch of M labeled samples.
12:   Update the classifier by ascending its stochastic gradient.
13: end for

Figure 2 .
Figure 2. Signal-to-noise ratio (S/N) distribution of SDSS spectra (left panel) and APOGEE spectra (right panel). A total of 107,800 labeled and 59,240 unlabeled data are selected from SDSS, while 28,200 labeled and 21,810 unlabeled data are selected from APOGEE. The blue and orange histograms denote the labeled and unlabeled data, respectively.

Figure 4 .
Figure 4. Generated spectra from StellarGAN trained on the SDSS data set. Panels (a)-(d) display the data generated by G at different epochs, with each spectrum maintaining a consistent dimensionality of 2000. It can be seen from the figure that the spectra generated by G are random noise at the beginning. As the number of epochs increases, G can generate data similar to real spectra.

Figure 5 .
Figure 5. The training loss curves of G (blue solid line) and D (red solid line) in StellarGAN. In StellarGAN training, observing the D loss indicates its improving accuracy in discerning real from generated samples. Concurrently, the G loss oscillation around 10 epochs exemplifies the adversarial adjustment in response to D's refinements, portraying the typical adversarial dynamic where G continuously strives to enhance data generation quality against D's discriminations.

Figure 6 .
Figure 6. The comparison of confusion matrices for different algorithms on the SDSS data set. During the fine-tuning process, a random subset comprising 10% of the labeled data set is utilized for training, while the remaining 90% serves as the test data.

Figure 7 .
Figure 7. Comparative analysis of loss curves and F1-scores for StellarGAN and CNN using the SDSS data set, with only 1% of the data set employed in the training process. Utilizing the SDSS data set, StellarGAN, benefiting from pretraining, exemplifies accelerated convergence relative to CNN in terms of loss evolution. Notably, in a majority of equivalent epochs, StellarGAN outperforms, demonstrating superior F1-scores that underscore its enhanced precision and recall capabilities.

Figure 8 .
Figure 8. The performance of StellarGAN on the SDSS data set (left) and the APOGEE data set (right) evaluated across different S/Ns. It is noticeable that there is a correlation between the S/N and the model's performance on the SDSS and APOGEE data sets. Specifically, an increase in S/N is associated with an enhancement in the model's performance, elucidating the model's sensitivity to the quality of the input data.

Figure 9 .
Figure 9. Variation of the F1-score with convolutional layer count. During the fine-tuning phase, a random subset comprising 10% of the labeled data set is employed for training, while the rest serves as the test set. The depicted blue curve captures the trajectory of StellarGAN's F1-score on the SDSS data set as a function of convolutional layers. Similarly, the red curve charts the performance trend of StellarGAN on the APOGEE data set against the convolutional layer count.

Figure 10 .
Figure 10. Comparative heat maps for the SDSS and APOGEE data sets. The left panels cover different types of stars from SDSS (from top to bottom: O, B, A, F, G, K, and M types), and the right panels cover different types of stars from APOGEE (from top to bottom: A, F, G, K, and M types). Color variations represent the model's varying attention to specific spectral wavelengths.

Table 1
The Structure of StellarGAN

Table 2
The Data Set from SDSS and APOGEE
Note. The SDSS data set includes a diverse range of seven distinct star types, encompassing a total of 107,800 individual stars. In contrast, the APOGEE data set focuses on five specific star types, with an aggregate count of 28,200 stars.

Table 4
The Comparison of the F1-score Obtained by Different Algorithms on the APOGEE Data