Metallic-line Stars Identified from Low-resolution Spectra of LAMOST DR5

Published 2019 May 29 © 2019. The American Astronomical Society. All rights reserved.
Citation: Li Qin et al. 2019, ApJS, 242, 13. DOI: 10.3847/1538-4365/ab17d8

0067-0049/242/2/13

Abstract

The Large Sky Area Multi-Object Fibre Spectroscopic Telescope data release 5 (DR5) released more than 200,000 low-resolution spectra of early-type stars with a signal-to-noise ratio > 50. This paper presents a search for metallic-line (Am) stars in this large database and a study of their statistical properties. Six machine-learning algorithms were tested on known Am spectra, and both the empirical criteria method and the MKCLASS package were also investigated. In a comparison of their performance, the random forest (RF) algorithm came out best, not only because it has a high success rate but also because it can rank features by importance. The RF was then applied to the early-type stars of DR5, and 15,269 Am candidates were picked out. Manual identification was conducted based on the spectral features derived from the RF algorithm; 9372 Am stars and 1131 Ap candidates were compiled into a catalog. Statistical studies were conducted, including temperature distribution, space distribution, and infrared photometry. The spectral types of Am stars are mainly between F0 and A4 with a peak around A7, which is similar to previous works. With the Gaia distances, we calculated the vertical height Z from the Galactic plane for each Am star. The distribution of Z suggests that the incidence rate of Am stars shows a descending gradient with increasing $|Z|$. On the other hand, Am stars do not show a noteworthy pattern in the infrared band. As the wavelength gets longer, the infrared excess of Am stars decreases, until there is little or no excess in the W1 and W2 bands.

1. Introduction

As a class of chemically peculiar (CP) stars, metallic-line (Am) stars show weaker Ca ii K lines and stronger metallic lines in their spectra than normal A-type stars. They were first described by Titus & Morgan (1940) and were formalized into the MK system by Roman et al. (1948). Conti (1970) gave a more detailed definition of Am stars that describes their nature: Am stars are those whose atmospheres show an underabundance of calcium (or scandium) and/or an overabundance of iron-group elements. According to this definition, Conti (1970) divided Am stars into three subgroups: stars with both weak Ca ii K lines and strong metallic lines, stars with only weak Ca ii K lines, and stars with only strong metallic lines. What the three subgroups have in common is that the spectral subtype derived from the Ca ii K line is earlier than that derived from the metallic lines.

The Am phenomenon is quite common among A- and early F-type main-sequence stars. The incidence of Am stars exceeds 30% (Smith 1971; Abt 1981; Gray et al. 2016). Such a high incidence has attracted the attention of many researchers. How do Am stars evolve? What characteristics do they exhibit in multiple bands? Are they pulsating? What is their pulsation mechanism? Because the number of known Am stars is limited, many of these questions do not have satisfactory answers. For example, Smalley et al. (2017) used 864 Am stars to study pulsation and metallicism. Abt (2017) researched Am stellar evolution on the basis of 462 Am stars. Chen et al. (2017) conducted an infrared photometric study of 426 known Ap and Am stars. Balona et al. (2015) investigated light variations of 29 Am stars. However, the numbers of Am stars used in the above studies are still small relative to their incidence, and the shortage of known Am stars has become a bottleneck in understanding the Am phenomenon.

After the first catalog of CP stars (Renson et al. 1991), Renson & Manfroid (2009) collected about 4000 Am stars (confirmed or probable) from the literature and presented another catalog of CP stars, in which 116 stars have been well studied. Using an empirical separation curve (ESC) derived from the line indices of the Ca ii K line and nine groups of Fe lines, Hou et al. (2015) found 3537 Am candidates in the Large Sky Area Multi-Object Fibre Spectroscopic Telescope data release 1 (LAMOST DR1). This was the first search for Am stars in a large database of low-resolution spectra. However, Am stars and normal stars cannot be distinguished in their marginal region using a separation curve alone. In the following year, Gray et al. (2016) employed the MKCLASS program to classify spectra in the LAMOST-Kepler field and obtained a total of 1067 Am stars with hydrogen line types ranging between A4 and F1. MKCLASS is spectral classification software that classifies spectra by mimicking human reasoning, but it was designed for high-quality spectra of common types and sometimes fails on low-quality spectra or rare objects.

A number of sky survey projects, such as the Radial Velocity Experiment (RAVE; Steinmetz et al. 2006), LAMOST, the Sloan Extension for Galactic Understanding and Exploration (SEGUE; Yanny et al. 2009), and Gaia-ESO (Gilmore et al. 2012), have collected a massive number of stellar spectra and provide opportunities to search for Am stars. Traditional astronomical research methods, such as manual operation and human identification, are no longer sufficient for data of this size. Many machine-learning algorithms have been used in the analysis of astronomical data because of their ability to search efficiently for and recognize certain types of stars. In this paper, we search for Am stars using machine-learning methods. Compared with various other sophisticated classification algorithms, the random forest (RF) algorithm performs best in both success rate and efficiency. We still need a manual inspection step to guarantee the correctness of the Am stars obtained by machine learning, since the metallic lines in low-resolution spectra are very weak and are easily affected by noise. Therefore, we adopt the RF algorithm to design a classifier, and manual examination is used to further check the results.

A key issue is to rule out contamination by Ap stars, which also belong to the CP stars. In Ap stars, only one or a few elements (including silicon, chromium, strontium, and europium) are extremely enhanced in the stellar atmosphere. Since some Ap stars also exhibit the abundance characteristics of Am stars to a certain degree (Smith 1996; Romanyuk 2007), the obtained Am data set may contain a small number of Ap stars. The largest difference between Am and Ap stars is that Ap stars have intense magnetic fields. However, it is difficult to distinguish between Am and Ap stars in low-resolution spectra, which lack the spectral features caused by the magnetic effect. In this work, we can only label spectra that show an extreme abundance of elements such as silicon, chromium, strontium, or europium. Since those elements are also generally enhanced in Am stars (Fossati et al. 2008a; Gebran et al. 2008), a follow-up analysis with high-resolution spectra is needed to determine whether they are Ap stars or Am stars.

A large sample of Am stars from a single survey, free of instrumental or processing differences, is useful for statistical study. In this paper, we searched for Am stars in LAMOST DR5 using machine-learning methods in conjunction with manual inspection. The paper is organized as follows. In Section 2, we describe the data sets used in this study and the data preprocessing steps. In Section 3, we compare various classification methods, show the advantage of the RF algorithm in searching for Am stars, and present the Am stars confirmed through manual checking, as well as possible Ap stars. In Section 4, we conduct some physical statistical analyses of the Am stars. A discussion is presented in Section 5. Finally, we summarize this work in Section 6.

2. Data

2.1. LAMOST Data

LAMOST is a reflecting Schmidt telescope with a 4 m effective aperture and a 5° field of view. Four thousand fibers are mounted on the focal plane, enabling it to observe 4000 objects simultaneously. The telescope is dedicated to a spectral survey over the entire available northern sky and is located at the Xinglong Observatory of Beijing, China (Cui et al. 2012; Luo et al. 2012, 2015; Zhao et al. 2012). Compared with the Sloan Digital Sky Survey's spectroscopic observations, the LAMOST survey is more concentrated on the Galactic disk, where more young stars exist, which is advantageous for searching for Am stars.

By the end of the first five-year regular survey, LAMOST had obtained 9,017,844 spectra of stars, galaxies, and quasi-stellar objects (QSOs) with a spectral resolution of R ≈ 1800, wavelength coverage from 370 to 900 nm, and a limiting magnitude of about r ≈ 17.8 mag for stars. The total number of stellar spectra reaches 8,171,443, making it a gold mine waiting to be exploited. All the above numbers can be found in the LAMOST spectral archive.

Before searching for Am stars, we limited the search scope to a certain range through the following conditions:

  1. Because the spectral types of Am stars are generally A and early F, we limited the search to A-, F0-, F1-, and F2-type stars in LAMOST DR5, whose spectral types come from the LAMOST 1D pipeline.
  2. The spectral features of Am stars mainly appear in the blue wavelength band, and their metallic lines are relatively weak and susceptible to noise, so we only retained spectra with a g-band signal-to-noise ratio greater than 50 (a signal-to-noise ratio ≥ 50).
  3. To ensure the accuracy and stability of the classification, we eliminated spectra containing zero flux in the blue part of the spectrum.

More than 10% of the objects have been observed multiple times by LAMOST. For such targets, we only retain the spectrum with the highest signal-to-noise ratio. Finally, we obtain 193,345 stellar spectra as our search data set. We need labeled data to form the training and testing data sets, and all the labeled data are also processed with the three operations above, except for Evaluation Set II.
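
The three selection conditions and the removal of repeat observations can be sketched as a simple filter. This is a minimal illustration only: the `Spectrum` record and its field names are hypothetical and do not correspond to actual LAMOST catalog columns, and because the text quotes both "greater than 50" and "≥ 50", the sketch arbitrarily uses ≥.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Spectrum:
    # Hypothetical record layout, not the real LAMOST catalog schema.
    obj_id: str                 # identifier shared by repeat observations
    subclass: str               # spectral type from the 1D pipeline, e.g. "A7"
    snr_g: float                # signal-to-noise ratio in the g band
    blue_flux: Sequence[float]  # fluxes in the blue part of the spectrum


def is_candidate_type(subclass: str) -> bool:
    """Condition 1: A-type stars plus F0, F1, and F2."""
    return subclass.startswith("A") or subclass[:2] in {"F0", "F1", "F2"}


def select_spectra(spectra: List[Spectrum], snr_min: float = 50.0) -> List[Spectrum]:
    """Apply the three conditions, then keep the best repeat per object."""
    best: Dict[str, Spectrum] = {}
    for s in spectra:
        if not is_candidate_type(s.subclass):
            continue  # condition 1: spectral type
        if s.snr_g < snr_min:
            continue  # condition 2: g-band S/N (assumed >= threshold)
        if any(f == 0 for f in s.blue_flux):
            continue  # condition 3: no zero flux in the blue
        kept = best.get(s.obj_id)
        if kept is None or s.snr_g > kept.snr_g:
            best[s.obj_id] = s  # keep the highest-S/N observation
    return list(best.values())
```

Keeping a dictionary keyed by object identifier makes the "highest signal-to-noise ratio wins" rule a single pass over the data.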

2.2. Labeled Data

In order to train and test the classifier, we collected known Am stars and non-Am stars. We first selected Am samples with confidence greater than 0.5 and all non-Am stars from the work of Hou et al. (2015) and then removed those close to the empirical separation curve to ensure the purity of the positive (Am) and negative (non-Am) sample sets. This yielded 1805 Am stars as positive samples. The non-Am stars were further screened using the MKCLASS software. We randomly chose the same number of non-Am stars as negative samples, and the positive and negative samples were split evenly between the training and test sets. Finally, we obtained 1806 training samples and 1804 test samples.

In addition, we chose some Am stars from Gray et al. (2016) and Renson & Manfroid (2009) as evaluation sets in order to evaluate the performance of a variety of methods. For the 1067 Am stars presented by Gray et al. (2016), we applied the preprocessing described in Section 2.1. Then, according to their K-line and metallic-line spectral subtypes, those samples were classified into classical Am stars and marginal Am stars. For the former, the K-line type is earlier than the metallic-line type by at least five spectral subtypes. For the latter, the difference between the two types is less than five spectral subtypes (Morgan et al. 1978; Gray & Corbally 2009). We obtained 357 classical Am stars and 76 marginal Am stars as Evaluation Set I. The 116 known Am stars from the catalog of Renson & Manfroid (2009) were crossmatched with LAMOST DR5, and only 4 counterparts were found, which make up Evaluation Set II. All labeled data sets are summarized in Table 1.
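
The classical/marginal split described above can be illustrated with a short sketch. It assumes a simple linear subtype scale (A0 to A9 followed by F0 to F9), which is a simplification introduced here for illustration; the function names are hypothetical.

```python
def subtype_index(sptype: str) -> int:
    """Map 'A0'..'A9' and 'F0'..'F9' to 0..19 (assumed linear scale)."""
    offsets = {"A": 0, "F": 10}
    return offsets[sptype[0].upper()] + int(sptype[1])


def am_subclass(k_type: str, metal_type: str) -> str:
    """Classify by how much earlier the K-line type is than the metallic-line type."""
    diff = subtype_index(metal_type) - subtype_index(k_type)
    if diff >= 5:
        return "classical"  # K-line type earlier by at least five subtypes
    if diff > 0:
        return "marginal"   # K-line type earlier by fewer than five subtypes
    return "not Am"         # K-line type is not earlier than the metallic type
```

For instance, a star typed kA6hF1mF2 has a K-line type of A6 and a metallic-line type of F2, a difference of six subtypes on this scale, so `am_subclass("A6", "F2")` returns `"classical"`.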

Table 1.  The Labeled Data Sets

Data Sets          Number of Samples  Function                                    Data Source
Training Set       1806               Training classifiers                        Hou et al. (2015)
Test Set           1804               Internal test classifiers                   Hou et al. (2015)
Evaluation Set I   433                External evaluating performance of methods  Gray et al. (2016)
Evaluation Set II  4                  External evaluating performance of methods  Renson & Manfroid (2009)

Note. The first column lists the name of the data set. The second column contains the number of samples in the data set screened out by various criteria. The third column is the functional description of these data sets in this work. The last column lists the sources of these data sets.

3. Searching for Am Stars

3.1. Input Feature Selection

Based on the underabundance of Ca and the overabundance of Fe-group elements in the atmosphere of an Am star, Hou et al. (2015) classified Am stars using the ESC, which is derived from line indices of the K line and nine groups of Fe lines. We used Evaluation Set I to evaluate the classification ability of the ESC, as shown in Figure 1, in which there are 357 stars labeled as classical Am stars (green dots in the figure) and 76 stars labeled as marginal Am stars (blue dots). We found that only 345 classical and 52 marginal Am stars were judged as Am stars by the ESC (red curve), i.e., the recall of the ESC is 345/357 = 0.966 for classical Am stars and 52/76 = 0.684 for marginal Am stars.

Figure 1. Overview of classification performance of the ESC for classical and marginal Am stars. The yellow area indicates the range of EW(Ca ii K) $\in [1.5,6.5]$ Å. The red curve is the ESC. Those points in the yellow area above the curve are classified as Am stars by Hou et al. (2015). The classification accuracy is 0.966 for classical Am stars and 0.684 for marginal Am stars.

Clearly, the ESC based on the line index method is somewhat inadequate for distinguishing marginal Am stars from normal early-type stars. This is mainly because the chemical peculiarity of marginal Am stars is much weaker than that of classical Am stars, so marginal Am stars differ less from normal early-type stars than classical Am stars do. In addition, some spectral lines, such as the Fe lines, are weak, and the batch calculation of their line indices leads to large errors, which reduces the recall for marginal Am stars.

Based on the above reasons, we decided to use the fluxes of spectra as the input features of the classifier. Considering that the spectral line characteristics of Am stars are more densely concentrated in the blue wavelength range than the red, we choose the normalized fluxes in the wavelength range between 3800 and 5600 Å as the input values of the classifier model.

3.2. Input Feature Normalization

Before selecting a classification model, we must remove the influence of the pseudocontinuum on the classifier. This is the key to successfully distinguishing Am stars. We improved the pseudocontinuum fitting technique of Lee et al. (2008) by adding automatic identification of strong lines. The details of this procedure are as follows. Step 1: all spectra were truncated to the wavelength range from 3800 to 5600 Å. Step 2: a ninth-order polynomial was fitted to each spectrum, and points more than 3σ away from the fitted function were masked; these include strong spectral lines, cosmic rays, and sky emission residuals from data reduction. Step 3: a ninth-order polynomial was fitted iteratively to each spectrum, with points more than 3σ below the fitted function rejected at each iteration. The purpose is to find the approximate upper envelope of each spectrum as its pseudocontinuum. Step 4: the pseudocontinuum was removed by dividing the observed spectrum by it. The intensity of each spectrum was rectified using this method.
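
The four steps can be sketched with NumPy as follows. This is a hedged illustration of the procedure as described, not the authors' code: the function name `rectify`, the iteration cap `max_iter`, and the use of `numpy.polynomial.Polynomial.fit` (chosen because it rescales the wavelength axis internally, which keeps a ninth-order fit well conditioned) are all assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial


def rectify(wave, flux, deg=9, nsigma=3.0, max_iter=20):
    """Return (wave, rectified flux, pseudocontinuum) on 3800-5600 A."""
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)

    # Step 1: truncate to the 3800-5600 A range.
    sel = (wave >= 3800.0) & (wave <= 5600.0)
    wave, flux = wave[sel], flux[sel]
    domain = [wave.min(), wave.max()]

    # Step 2: first fit; mask points more than nsigma from the fit
    # on either side (strong lines, cosmic rays, sky residuals).
    fit = Polynomial.fit(wave, flux, deg, domain=domain)
    resid = flux - fit(wave)
    keep = np.abs(resid) <= nsigma * resid.std()

    # Step 3: refit iteratively, rejecting points more than nsigma
    # below the fit so the curve approaches the upper envelope.
    for _ in range(max_iter):
        fit = Polynomial.fit(wave[keep], flux[keep], deg, domain=domain)
        resid = flux - fit(wave)
        low = resid < -nsigma * resid[keep].std()
        if not np.any(low & keep):
            break
        keep &= ~low

    # Step 4: divide the observed spectrum by the pseudocontinuum.
    continuum = fit(wave)
    return wave, flux / continuum, continuum
```

Rejecting only the low side in Step 3 is what pushes the fit toward the upper envelope, since absorption lines always pull a symmetric fit downward.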

Figure 2 shows the results obtained with the improved method. One can see from Figure 2 that the pseudocontinuum is well removed from the spectrum.

Figure 2. Example of a rectified flux. The top panel shows the truncated original spectrum, and the red line shows the fitted continuum obtained with the procedure in Section 3.2. The bottom panel shows the rectified flux. Note that the emission line in the spectrum does not affect the continuum fitting.

3.3. Classifier Selection

We selected six sophisticated classification algorithms from the scikit-learn website: K-nearest neighbors (KNN), support vector classification (SVC), a Gaussian process (GP), a decision tree (DT), the RF, and Gaussian naive Bayes (GNB). Using the parameter values recommended by the website, we trained these classifiers on the training set separately and tested their performance on the test set. We then performed two external evaluations of the three best-performing classifiers and compared them with previous methods of searching for Am stars, namely the ESC and the MKCLASS software, and finally chose the RF algorithm as the classifier in our work.

3.3.1. Classifier Evaluation Criteria

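We used precision, accuracy, recall, and the F1 score as the criteria to evaluate the classifiers. The four evaluation criteria are defined as follows:

$$\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}},$$

$$\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FP}+\mathrm{TN}+\mathrm{FN}},$$

$$\mathrm{Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},$$

$$F1=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}.$$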

Since the test set is composed of labeled samples, it is easy to judge whether the classifier's classification results are correct on the test set. TP is the number of true positive samples that are correctly classified as Am stars by the classifier. FP is the number of false positive samples that are misclassified as Am stars by the classifier. Similarly, TN is the number of true negative samples that are correctly classified as non-Am stars by the classifier, and FN is the number of false negative samples that are misclassified as non-Am stars by the classifier. Precision is the fraction of true positive samples among the set of Am classified by the classifier. Accuracy measures the fraction of samples that are correctly classified in the entire set. Recall measures the fraction of Am that are correctly classified over the total amount of Am. F1 score is the harmonic mean of precision and recall.
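
As a worked illustration of these four criteria, the sketch below computes them from the counts TP, FP, TN, and FN. The function name and the convention of returning 0 when a denominator vanishes are assumptions made here for illustration.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, accuracy, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0  # assumed convention when there are no true positives
    return {"precision": precision, "accuracy": accuracy,
            "recall": recall, "f1": f1}


# ESC on the classical Am stars of Evaluation Set I: 345 of 357 recovered.
# The set contains only positive samples, so FP = TN = 0.
esc_classical = classification_metrics(tp=345, fp=0, tn=0, fn=12)
```

For a set made up only of Am stars, precision is 1 and accuracy equals recall, so F1 reduces to 2 × recall / (1 + recall); with recall 345/357 this gives about 0.983.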

3.3.2. Internal Testing

The samples in the training set and the test set come from Hou et al. (2015), among which the positive samples (Am stars) were labeled by the ESC while the negative samples (non-Am stars) were labeled by both the ESC and the MKCLASS software. According to the catalog of Hou et al. (2015), the positive samples in the training set consist of 490 (54.3%) classical Am stars and 413 (45.7%) marginal Am stars, without considering the uncertainty of the spectral subtypes of the K line and metallic lines. The classification performance on the test set of the aforementioned classifiers, trained on the training set, is listed in Table 2. This table is ordered in terms of the F1 scores. Clearly, the first three classifiers (the GP, KNN, and RF) show better performance in the internal test.

Table 2.  Performance of the Classifiers for Test Set

Classifier  Test Set                             Classical Am     Marginal Am      non-Am
            Precision  Accuracy  Recall  F1      Accuracy  F1     Accuracy  F1     Accuracy
GP          0.998      0.996     0.994   0.996   1.000     1.000  0.986     0.993  0.998
KNN         0.998      0.995     0.992   0.995   1.000     1.000  0.980     0.990  0.998
RF          0.991      0.987     0.982   0.987   0.998     0.999  0.957     0.978  0.991
SVC         0.997      0.977     0.958   0.977   0.971     0.985  0.937     0.968  0.997
DT          0.973      0.962     0.950   0.961   0.982     0.991  0.900     0.947  0.973
GNB         0.893      0.911     0.935   0.913   0.995     0.970  0.840     0.913  0.888

Note. Here, we only show the accuracy and F1 for the classical Am set and the marginal Am set because the recall equals the accuracy and the precision equals 1 for positive samples. Meanwhile, only the accuracy is listed for the non-Am set because the precision, the recall, and F1 are all equal to zero for negative samples.

We also divided the test set into three subsets—classical Am, marginal Am, and non-Am—and tested the performance of the classifiers on them separately. The detailed information is listed in Table 2. As can be seen from the table, the first three classifiers also have good classification performance for the marginal Am stars.

3.3.3. External Evaluation I

Evaluation Set I comes from Gray et al. (2016); it consists of 357 classical and 76 marginal Am spectra, which were labeled by the MKCLASS software. We tested the classification performance of the GP, KNN, and RF on this data set and compared them with the ESC. The results are shown in Table 3. For comparison purposes, the table is ordered in terms of the F1 scores. It can be seen from Table 3 that the classification performance of the RF is more stable than that of the other machine-learning algorithms. In addition, the RF also outperforms the ESC method for both classical and marginal Am stars.

Table 3.  Performance of Classifiers for Evaluation Set I

Classifier  Classical Am Stars  Marginal Am Stars
            Accuracy  F1        Accuracy  F1
RF          0.978     0.989     0.789     0.882
GP          0.972     0.986     0.724     0.840
ESC         0.966     0.983     0.684     0.812
KNN         0.966     0.983     0.671     0.803

Note. Here, we only show accuracy scores because accuracy equals recall and precision equals one. The table is sorted in descending order according to F1 scores.

3.3.4. External Evaluation II

Evaluation Set II was used to compare the RF algorithm with the MKCLASS software. The samples in Evaluation Set II come from the catalog of Renson & Manfroid (2009), of which only four counterparts can be found in LAMOST DR5. The RF algorithm and the MKCLASS package were used to classify these four well-studied Am stars, and the results are listed in Table 4. These stars were also recognized as Am stars in the literature based on analyses of their chemical abundances, and those references are also listed in Table 4. The RF algorithm classified all four stars as Am stars, which is consistent with the results from the literature. However, the MKCLASS software classified only HD 73818 as an Am star. The RF classifier is therefore the more suitable tool for searching for Am stars, which is not surprising given that the MKCLASS software was not developed specifically for Am stars.

Table 4.  The Results of the RF and the MKCLASS with the Evaluation Set II

HD Number  LAMOST FITS File Name                        RF  MKCLASS            Spectral Type of References
HD 108486  spec-55959-B5595905_sp01-168.fits            Am  A2 IV-V SrSi       Am(1, 7, 8)
HD 108642  spec-55959-B5595905_sp05-141.fits            Am  A1 IV SrSi         Am(1, 7, 8)
HD 108651  spec-55959-B5595905_sp01-134.fits            Am  A7 mA0 metal weak  Am(6, 7, 8)
HD 73818   spec-57392-KP083141N185915V01_sp13-195.fits  Am  kA6hF1mF2 Eu       Am(2, 3, 4, 5)

Note. The first column indicates the identifiers of the stars in the HD catalog. The second column lists the names of FITS files of the LAMOST counterparts. The next two columns show the results of the RF algorithm and MKCLASS software, respectively. The last column gives the literature that identify the four stars as Am based on element abundance. Articles numbered 1, 2, 3, 4, 5, 6, 7, and 8 correspond to Gebran et al. (2008), Fossati et al. (2008a, 2008c, 2008b, 2007), Iliev et al. (2006), Monier & Richard (2004), and Burkhart & Coupry (2000), respectively.

For visual inspection of the four Am stars, we plot in Figure 3 their spectra together with normal stellar templates of the same spectral types as given by their H lines. The best-matching Kurucz template (Castelli & Kurucz 2003) for each spectrum was obtained through cross correlation. The black curve in each panel is the normalized Am spectrum, while the red curve is the best-matching template. One can see that the Balmer lines of the four Am spectra fit well with their best templates, but the K lines are weaker than those of the templates, whereas the metallic lines are stronger. This is in line with the characteristics of the first subgroup of Am stars, which have weak K lines and strong metallic lines.

Figure 3. Comparison between the Am spectra and their best-matching templates in Evaluation Set II. The black curves show the Am spectra, and the red curves are the best-matching templates. The name of the Am stars and the atmospheric parameters are also listed next to their corresponding curves in black and red. The blue, red, and green vertical dashed lines indicate the positions of the Ca ii K line, Balmer lines, and some Fe lines from Hou et al. (2015), respectively. All spectra and templates are normalized, and the templates have been offset vertically by 0.6 continuum units for clarity. There is an abnormally strong absorption line at around 4480 Å of panel (d), which was caused by bad charge-coupled device pixels present in the raw data. The bad pixels were not removed by the data reduction pipeline.

Compared with the ESC, MKCLASS, GP, KNN, SVC, DT, and GNB methods, the RF algorithm is the best choice for searching for Am stars. After obtaining Am candidates using the RF algorithm, we visually checked them against their best-matching templates.

3.4. RF-based Classifier

The RF algorithm is a kind of bagging algorithm in ensemble learning. N training samples are randomly selected from the original sample set using the bootstrap method (sampling with replacement), and K training sets are obtained by K rounds of extraction. The K training sets are independent of each other, and elements can be duplicated. K decision tree models are trained on the K training sets and vote to produce the classification result.
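
The bootstrap-and-vote idea can be illustrated with a toy sketch in plain Python. It is deliberately simplified and is not a random forest implementation: each "tree" here is a one-feature threshold rule (a depth-1 decision tree), there is no random feature selection at the splits, and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap(samples, rng):
    """Draw len(samples) items from samples with replacement."""
    n = len(samples)
    return [samples[rng.randrange(n)] for _ in range(n)]


def fit_stump(samples):
    """Fit a threshold rule on (x, label) pairs with labels 0 or 1."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for above in (0, 1):  # label predicted when x >= threshold
            errors = sum(
                (above if x >= threshold else 1 - above) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return lambda x: above if x >= threshold else 1 - above


def fit_bagging(samples, k, seed=0):
    """Train k stumps, each on its own bootstrap training set."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap(samples, rng)) for _ in range(k)]


def predict(trees, x):
    """Majority vote of the individual trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]
```

Because each bootstrap set omits some samples and duplicates others, the individual rules differ slightly, and the vote averages out their individual errors.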

The number of decision trees, K, is a key parameter in the RF algorithm: the larger the number of decision trees, the better the classification results but the longer the running time. After multiple attempts, we used 1800 as the value of the number of decision trees as well as the number of input features. The remaining parameters were set to their default values.

One advantage of the RF algorithm is that it can be used to evaluate the importance of each feature. Figure 4 shows the importance and accumulative importance of all features. The importance decreases sharply with the number of features and is almost negligible after number 300. The first 300 features play important roles in classification, and their accumulated importance reaches 91.2%. Figure 5 shows the distribution of the first 300 features in a spectrum. Those features basically fall on the absorption lines of Ca ii K, H, and transition metal elements, which are considered to be very important elements for distinguishing Am stars.
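
The ranking and accumulated importance described above can be computed from any list of per-feature importance values (in scikit-learn, for example, a fitted `RandomForestClassifier` exposes them as `feature_importances_`). The sketch below is a minimal illustration with hypothetical function names and made-up toy values.

```python
def rank_features(importances):
    """Return feature indices sorted by importance and the running
    fraction of total importance they account for."""
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    total = sum(importances)
    cumulative, running = [], 0.0
    for i in order:
        running += importances[i]
        cumulative.append(running / total)
    return order, cumulative


def n_features_for(cumulative, fraction):
    """Smallest number of top-ranked features reaching the given fraction."""
    for n, c in enumerate(cumulative, start=1):
        if c >= fraction:
            return n
    return len(cumulative)


# Toy values for illustration only, not taken from the paper.
order, cumulative = rank_features([0.10, 0.50, 0.05, 0.35])
```

Reading off the accumulated importance at a given rank is how a statement such as "the first 300 features account for 91.2%" would be obtained.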

Figure 4. Level of importance and accumulated importance as a function of the feature number. The x-axis is the feature number, and the left y-axis shows the scale of the red solid line, indicating level of the importance of each feature. The right y-axis shows the scale of the blue dashed line, representing the level of the accumulated importance as a function of the feature number.

Figure 5. Overview of the distribution of the first 300 features. The cyan and magenta vertical dashed lines indicate the locations of the Ca ii K line and the H lines, respectively. Red dots indicate the first 50 features. Blue stars indicate the next 50 features. Green squares indicate features numbering from 100 to 300.

We identified the spectral lines on which the first 50 feature points are located by consulting the line table of Moore et al. (1966) and other literature on early-type stars (Smith 1973; Coupry et al. 1986; Adelman 1994; Przybilla et al. 2017). The details are listed in Table 5. We only list the main elements contributing to each spectral line since metal lines in low-resolution spectra are mostly blended. Note that the feature numbers and importance values are not absolute. Different RF classifiers will produce different results because the data used to construct each decision tree are randomly selected from the training set. Fortunately, the ranking and importance of most of the important features do not change much. It should also be noted that the wavelengths of the features are all vacuum wavelengths because the LAMOST spectra are all converted to vacuum wavelengths; the relevant keyword "VACUUM" can be found in the FITS header of the spectra.

Table 5.  Identification of Relevant Spectral Lines Based on the Location of the First 50 Feature Points

ID            Name               Vacuum Wavelength (Å)         Importance
1, 11, 24     Ba ii, Fe i        4555.292, 4557.394            2.121, 1.61, 1.187
2, 3, 18, 26  Fe i               4921.871                      2.050, 2.021, 1.289, 0.997
4, 16, 36     Fe i, Ni i         5478.086, 5478.432            1.882, 1.295, 0.741
5, 9, 38      Fe i               4384.766                      1.87, 1.701, 0.729
6, 10         Fe ii, Cr i, Co i  5277.457, 5277.526, 5277.629  1.853, 1.620
7, 17, 19     Sr ii              4078.849                      1.797, 1.289, 1.279
8, 33, 43     Fe i               5234.395                      1.77, 0.828, 0.621
12, 13        Fe i, Ni i         4715.671, 4715.720            1.579, 1.551
14, 46        Fe i, Ni i         5372.982, 5372.82             1.543, 0.576
15            Fe i, Fe ii        4523.775, 4523.885            1.506
20, 23, 49    Fe i               4133.207                      1.273, 1.197, 0.564
21, 22        Fe i, Ba ii        4935.391, 4935.456            1.232, 1.21
25            Fe i               4178.763                      1.174
27            Fe i               5416.703                      0.964
28, 40        Fe i               4145.021                      0.949, 0.638
29            Fe i, Ni i         5100.113, 5101.343            0.916
30, 35        Fe i               5407.276                      0.892, 0.778
31            Fe i               5425.576                      0.871
32            Fe i               4669.36                       0.855
34            Fe ii              5318.086                      0.800
37            Fe i               5228.634                      0.736
39            Mn i, Fe i         4824.844, 4825.473            0.642
41            Fe ii              4925.288                      0.636
42            Fe ii              5019.834                      0.633
44            Ca ii K            3934.767                      0.592
45            Fe i               4272.952                      0.580
47            Fe i, Ni i         5080.631, 5081.94             0.570
48            Fe i               5431.203                      0.566
50            Fe i               4199.4                        0.560

Note. The first column lists the feature ID. In order to make the table more concise, features that fall on the same absorption line are placed in the same entry. The second column shows the name of the line in which feature points are located. The third column lists the vacuum wavelength corresponding to each spectral line. The fourth column shows the importance of the corresponding feature determined with the RF algorithm.

3.5. Manual Inspection

Three reasons require manual inspections of the Am candidates obtained using the RF method. First, the intensities of metal lines in Am spectra are very weak and are easily masked by noise, which would lead to errors in the results. Second, although the spectra were rectified, the residual continua still could affect the classification. Third, the precision of the RF algorithm is 0.991, which means there is still a small fraction of stars that might be wrongly recognized.

The manual inspection compares the spectral lines of each candidate with those of its best-matching synthetic template. A set of quantitative standards is as follows:

where Kspe and Kmod are the Ca ii K lines of a spectrum and its matching template, respectively. Mspe and Mmod are metallic lines of a spectrum and its matching template, respectively. EW(·) is the equivalent width of a spectral line.

In this work, we adopted the same EW definition of the Ca ii K line as Liu et al. (2015): the line is in the window [3927.7–3939.7 Å], and the blue and red sidebands are in [3903–3923 Å] and [4000–4020 Å], respectively. For metallic lines, the conventional EW calculation is not suitable because the Fe absorption lines in A-type stars are generally weak and too narrow to provide the wavelength bands that the EW calculation needs. We therefore used the method proposed by Hou et al. (2015) to calculate their equivalent widths. We selected some of the Fe i lines listed in Table 5, eliminating several Fe i lines that are blended with lines of ionized elements. We merged adjacent Fe lines into 15 Fe-group lines and list the left and right ends of these groups in Table 6. To calculate the EW, the blue and red sidebands are defined as [Left_End-5 Å, Left_End] and [Right_End, Right_End+5 Å], respectively. We limited the sidebands to 5 Å to get the best local pseudocontinuum for each of the 15 Fe-group lines and to avoid contamination by other lines. Figure 6 shows an example of the local pseudocontinua of the 15 Fe-group lines.
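
A generic sideband-based equivalent width can be sketched as follows. This is an illustration under stated assumptions, not the exact method of Hou et al. (2015): the local pseudocontinuum is assumed to be the straight line through the mean fluxes of the two 5 Å sidebands (placed at the sideband centers), the integral uses the trapezoidal rule, and the function names are hypothetical.

```python
def mean_flux(wave, flux, lo, hi):
    """Mean flux of the pixels with lo <= wavelength <= hi."""
    vals = [f for w, f in zip(wave, flux) if lo <= w <= hi]
    return sum(vals) / len(vals)


def equivalent_width(wave, flux, left_end, right_end, side=5.0):
    """EW (in the wavelength unit) of the band [left_end, right_end]."""
    # Sideband centers and mean fluxes define a linear pseudocontinuum.
    xb = left_end - side / 2
    yb = mean_flux(wave, flux, left_end - side, left_end)
    xr = right_end + side / 2
    yr = mean_flux(wave, flux, right_end, right_end + side)
    slope = (yr - yb) / (xr - xb)

    # Fractional depth below the pseudocontinuum inside the band.
    pts = [
        (w, 1.0 - f / (yb + slope * (w - xb)))
        for w, f in zip(wave, flux)
        if left_end <= w <= right_end
    ]
    # Trapezoidal integration of the depth over wavelength.
    return sum(
        0.5 * (d0 + d1) * (w1 - w0)
        for (w0, d0), (w1, d1) in zip(pts, pts[1:])
    )
```

With the band limits of Table 6, the first Fe-group line would be measured as `equivalent_width(wave, flux, 4128.0, 4138.0)`.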

Figure 6. Local continuum spectrum of 15 Fe-group lines.

Table 6.  Wavelength Ranges of 15 Fe-group Lines

Number Left End of Center Band (Å) Right End of Center Band (Å)
1 4128 4138
2 4142 4146
3 4171 4192
4 4194 4207
5 4268 4278
6 4383 4390
7 4668 4671
8 4714 4718
9 5079 5087
10 5097 5104
11 5225 5240
12 5362 5375
13 5405 5420
14 5424 5439
15 5477 5485

Note. All values are in vacuum wavelengths.

Although there are some uncertainties in calculating the equivalent width in batches, such as noise interference in Fe lines or mixing of other metal lines in the Ca ii K line wing, the line ratio of the Ca ii K to or can be verified as important criteria during manual inspection.

3.6. Labeling Ap Stars

Because Ap stars have similar spectral features, they contaminate the Am candidates to some extent. Most Ap stars are actually B-type stars, while a small portion are found among A- and F-type stars. Some of those cooler Ap stars also exhibit an overabundance of Fe and an underabundance of Ca. Therefore, a small number of Ap stars will be mixed with the Am stars.

Following the definition of an Ap star, we treated objects with an extreme overabundance of Sr, Cr, Eu, or Si as Ap candidates. In general, the abundance of Sr, Cr, Eu, and Si in Am stars rarely exceeds 2.0 relative to the abundance of the Sun (Lane & Lester 1987; Smith 1996; Catanzaro & Ripepi 2014; Catanzaro et al. 2015). Therefore, stars with an Sr, Cr, Eu, or Si abundance exceeding 2.0 are likely to be Ap stars. The detailed method of finding Ap candidates is as follows. First, based on the prominent spectral lines in the blue-violet spectra of Ap stars (Gray & Corbally 2009), and excluding lines that suffer interference from nearby Fe i lines or other lines, we selected the 4077 Å line as a reference line; in LAMOST spectra it is a blend of Sr ii, Cr ii, and Si ii. Second, we generated the corresponding synthetic templates by setting [Sr/H], [Cr/H], [Eu/H], and [Si/H] to 2.0 in the spectrum generator SPECTRUM. The stellar parameters required by the Kurucz models, such as Teff, $\mathrm{log}\,g$, and [Fe/H], were taken from the corresponding Am models that have a normal abundance of the above-mentioned heavy elements. Finally, we used the method of Hou et al. (2015) to calculate the EW of the 4077 Å blend for both the templates, EW(tem), and the observed Am spectra, EW(obs), and marked objects as Ap candidates if EW(obs) > EW(tem).
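The final step of this procedure reduces to a single comparison, which the following Python sketch makes explicit. The function name `flag_ap_candidate` and the treatment of unmeasurable lines are assumptions. The three input pairs are the last two values of three example rows in Table 8, in the order observed EW, template EW.

```python
import math

def flag_ap_candidate(ew_obs, ew_tem):
    """Return 1 if the observed 4077 Å blend is stronger than the template's."""
    if math.isnan(ew_obs) or math.isnan(ew_tem):
        return 0  # assumption: a line that cannot be measured is never flagged
    return int(ew_obs > ew_tem)

# (observed EW, template EW) pairs copied from three example rows of Table 8
rows = [(0.863, 0.830), (0.717, 0.830), (0.672, 0.691)]
flags = [flag_ap_candidate(obs, tem) for obs, tem in rows]
```

Only the first pair exceeds its template value, so only that spectrum would be flagged.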

3.7. Result

For the 193,345 spectra of the search data set described in Section 2.1, we fitted Kurucz templates in the wavelength range [3900 Å, 5600 Å] to obtain Teff and $\mathrm{log}\,g$. With these stellar parameters, we further constrained the search data set to 6500 K < Teff ≤ 11,000 K and $\mathrm{log}g\,\geqslant \,4.0$ dex, since the Am phenomenon often occurs in A- and early F-type main-sequence stars. This constraint retained 98,202 spectra, to which the RF classifier was applied to identify 15,269 Am candidates, for which we carried out manual inspections. After these inspections, we discarded 4766 spectra: 1338 candidates (28%) do not meet the reference criteria of Section 3.5; 2585 spectra (54%) have peculiarities too small to be recognized by eye; and 843 spectra (18%) are of bad spectral quality and are insufficient to identify the Fe lines. In addition, using the method described in Section 3.6, we found 1131 objects with an extreme overabundance of Sr, Cr, or Si and labeled them as Ap candidates in the catalog. Whether they are truly Ap stars needs to be determined by a subsequent analysis of their magnetic field strength. In total, the catalog has 10,503 entries, including 9372 Am stars and 1131 Ap candidates. In the statistical analysis below, we excluded the Ap candidates from the Am sample.
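The counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
candidates = 15269                      # RF-selected Am candidates
discarded = {
    "fail_section_3_5_criteria": 1338,
    "peculiarity_too_small": 2585,
    "bad_spectral_quality": 843,
}
ap_candidates = 1131

n_discarded = sum(discarded.values())              # total rejected spectra
catalog_entries = candidates - n_discarded         # entries kept in the catalog
am_stars = catalog_entries - ap_candidates         # Am stars after removing Ap candidates
# Share of each rejection reason, rounded to whole percent
shares = {k: round(100 * v / n_discarded) for k, v in discarded.items()}
```

The three rejection categories sum to the 4766 discarded spectra, and the remaining 10,503 entries split into 9372 Am stars and 1131 Ap candidates, as stated.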

For each Am star in the catalog, we also determined three spectral subtypes, from its K line, H lines, and metallic lines, using the template-matching method. The band used to match the spectral subtype of the metallic lines is the combined band of [4140 Å, 4300 Å], [4410 Å, 4600 Å], and [4900 Å, 5400 Å]. The matching band for H lines is a combination of the , , and bands. It is worth noting that the spectral subtypes obtained by matching with templates in these specific wavelength ranges alone are not completely accurate, and the spectral subtypes of some Am stars do not conform to the common characteristic of Am stars, i.e., that the K-line spectral subtype is earlier than the metallic-line spectral subtype. This is why we cannot use this criterion directly to search for Am stars.

The complete catalog of identified Am stars can be downloaded in the online journal and from  http://paperdata.china-vo.org/Qinli/2018/dr5_Am.csv; the example catalog is presented in the Appendix.

4. Statistical Analysis

4.1. Effective Temperature Distribution

We analyzed the effective temperature distribution of the Am sample. Figure 7 shows the distribution of Am stars and the incidence of Am stars in different effective temperature bins. As can be seen from Figure 7, the results are consistent with the conclusion presented by Smalley et al. (2017), namely that the temperatures of Am stars are mostly distributed between 7250 K (F0) and 8250 K (A4), peaking near 7750 K. Because of our strict screening, the fraction of Am stars among all A- and early F-type stars is smaller than the values reported in previous studies (Smith 1971; Abt 1981; Gray et al. 2016). The incidence of Am stars that we give can therefore be taken as a lower limit.

Figure 7. Distribution and incidence of Am stars as a function of the effective temperature. The y-axis on the left indicates the distribution of Am stars, and the y-axis on the right is the frequency of the occurrence of Am stars. The red points are the incidence of Am stars in each bin; they are displayed to the left of each bin.

4.2. Space Distribution

The spatial distribution of Am stars in the Galactic coordinate plane is plotted in Figure 8. The blue points indicate all A- and early F-type stars, and the red points indicate Am stars. As shown in Figure 8, the number of Am stars on the Galactic disk is significantly higher than that in other regions because most LAMOST observations are concentrated on the Galactic disk, especially in the anti-center direction. The LAMOST Spectroscopic Survey of the Galactic Anti-center (LSS-GAC), which covers Galactic longitudes of $150^\circ \,\leqslant {\ell }\leqslant 210^\circ $ and latitudes of $| b| \leqslant 30^\circ $, is a unique component of the LAMOST Experiment for Galactic Understanding and Exploration spectroscopic survey (Luo et al. 2015).

Figure 8. Spatial distribution of Am stars on the Galactic coordinate plane. The blue and red dots correspond to all early-type stars and Am stars, respectively.

In order to further understand the spatial distribution of Am stars, we analyze the frequency of occurrence of Am stars as a function of the vertical distance from the Galactic plane (Z). We obtained the parallax (ω) of most spectra by crossmatching with Gaia DR2. For spectra with a parallax ≥0, we then crossmatched with the catalog in Bailer-Jones et al. (2018) and got their estimated distances. Eventually, we obtained the distances of 92,870 early-type stars and 8951 Am stars. The vertical distance Z for each star can be calculated with the following formula:
$Z=r\,\sin b,$

where b is the Galactic latitude, and r is the estimated distance. In Figure 9, the blue and green histograms show the distribution of early-type stars and Am stars along the vertical distance, $| Z| $, respectively. In each bin, we calculated the incidence of Am stars and represented them as red points. Figure 9 suggests that the incidence of Am stars increases as $| Z| $ decreases.
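As a rough illustration of this step, the Python sketch below computes Z from r and b, assuming the simple relation Z = r sin b implied by the two quantities defined above, and then derives the incidence of Am stars per |Z| bin. The function names, the bin edges, and the toy sample are hypothetical. The threshold of 10 early-type stars per bin follows the caption of Figure 9.

```python
import numpy as np

def vertical_height(r_pc, b_deg):
    """Height above the Galactic plane, assuming Z = r * sin(b)."""
    return np.asarray(r_pc, dtype=float) * np.sin(np.radians(b_deg))

def incidence_by_bin(z_all, is_am, edges, min_count=10):
    """Counts and Am incidence per |Z| bin; bins with too few stars give NaN."""
    abs_z = np.abs(np.asarray(z_all, dtype=float))
    is_am = np.asarray(is_am, dtype=bool)
    n_all, _ = np.histogram(abs_z, bins=edges)
    n_am, _ = np.histogram(abs_z[is_am], bins=edges)
    frac = np.full(n_all.shape, np.nan)
    ok = n_all >= min_count
    frac[ok] = n_am[ok] / n_all[ok]
    return n_all, n_am, frac

# Toy sample: 20 stars at |Z| = 50 pc (4 of them Am) and 5 stars at 150 pc (1 Am)
z_demo = np.array([50.0] * 20 + [-150.0] * 5)
am_demo = np.array([True] * 4 + [False] * 16 + [True] + [False] * 4)
n_all_demo, n_am_demo, frac_demo = incidence_by_bin(z_demo, am_demo, [0.0, 100.0, 200.0])
```

In the toy sample the second bin holds only five stars, so its incidence is left undefined rather than reported as a noisy fraction.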

Figure 9. Distribution and incidence of Am stars as a function of the vertical distance from the Galactic plane, Z. The red points to the left of each bin are the incidence of Am stars; we did not compute the incidence for bins containing fewer than 10 early-type stars because it would have no statistical meaning.

4.3. Infrared Photometry

We also performed an infrared photometric analysis of these Am stars. First, we crossmatched them with the Two Micron All-Sky Survey (2MASS) and Wide-field Infrared Survey Explorer (WISE) catalogs with a matching radius of 3.0 arcsec and obtained the corresponding magnitudes in the J, H, K, W1, W2, W3, and W4 bands. Because the WISE satellite has low angular resolutions of 6.1, 6.4, 6.5, and 12.0 arcsec in the W1, W2, W3, and W4 bands, we often found that the 2MASS counterpart is a different object from the WISE counterpart. To avoid this, targets with multiple WISE sources within a search radius of 10.0 arcsec were eliminated. To improve the accuracy of the result, the photometric errors were limited to less than 0.1 mag in all three 2MASS bands (Skrutskie et al. 2006) and 0.05 mag in the W1 and W2 bands (Wright et al. 2010). Then, the color excess in the color (a − b) was estimated using the formula E(a − b) = R(a − b) × E(B − V), where the E(B − V) values are given in Schlafly & Finkbeiner (2011), and the R(a − b) values are the reddening coefficients of the color (a − b) from Yuan et al. (2013). We calculated the color excess and dust reddening of 7799 Am stars for the ($J-H$), ($H-K$), and ($W1-W2$) colors. Finally, our results are shown in Table 7. One can see a very clear downward trend in the incidence of infrared excess from the near-infrared to the mid-infrared, and the incidence reduces to 0.15% in the W1 − W2 color.

Table 7.  Results of an Infrared Excess Analysis

Criterion Number of Am Stars Incidence
J − H > 0.1 1652 21.68%
H − K > 0.1 163 2.09%
W1 − W2 > 0.1 12 0.15%
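The dereddening and counting behind Table 7 can be sketched in Python as follows. The reddening coefficients in `R_PLACEHOLDER` are illustrative placeholders, not the values of Yuan et al. (2013). The function names are hypothetical, and applying the photometric-error cut inside the counting function is an assumption about how the selection is organized.

```python
import numpy as np

# Hypothetical reddening coefficients R(a - b); placeholders for illustration only
R_PLACEHOLDER = {"J-H": 0.26, "H-K": 0.16, "W1-W2": 0.02}

def intrinsic_color(mag_a, mag_b, ebv, r_coeff):
    """Dereddened color: (a - b) - R(a - b) * E(B - V)."""
    return (mag_a - mag_b) - r_coeff * ebv

def excess_incidence(color0, err_a, err_b, max_err, threshold=0.1):
    """Count stars with color0 > threshold among those passing the error cut."""
    color0 = np.asarray(color0, dtype=float)
    good = (np.asarray(err_a) < max_err) & (np.asarray(err_b) < max_err)
    n_good = int(good.sum())
    n_excess = int((color0[good] > threshold).sum())
    frac = n_excess / n_good if n_good else float("nan")
    return n_excess, n_good, frac

# J and H magnitudes and E(B - V) of one example row in Table 8
jh0_demo = intrinsic_color(14.627, 14.532, 0.033, R_PLACEHOLDER["J-H"])
```

With the placeholder coefficient, this example row yields a dereddened (J − H) color close to the 0.086 listed for it in Table 8, which suggests that the sketch follows the same convention, although it does not confirm the coefficient itself.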

This contradicts the conclusion about Am stars from Chen et al. (2017). They found that over half of Am stars have a clear infrared excess ($(W1-W2)\gt 0.1$) in the W1 − W2 color and little to no infrared excess in the remaining regions, including the J, H, K, and IRAS regions. We checked the data set from Chen et al. (2017) and found that they did not restrict the photometric precision to $W{1}_{\mathrm{error}}\,\lt $ 0.05 mag and $W{2}_{\mathrm{error}}\lt 0.05$ mag. When we add this constraint, only three sources in Chen's Am data set show an infrared excess. Thus, we statistically conclude that Am stars have no infrared excess in the W1 − W2 color.

5. Discussion

[Fe/H] is commonly used as a proxy for the metallicity of a star. However, compared with normal stars, the atmosphere of an Am star is Fe-enriched and Ca-deficient, so the metallicity of an Am star obtained with conventional methods may be larger than the true value. Given that Am stars comprise a significant fraction of early-type stars, researchers should treat the metallicities of Am stars given by the pipelines of spectroscopic surveys with care, especially in statistical studies.

To understand the degree of metallicity overestimation in Am stars, we analyzed the metallicity distribution of Am stars using the LAMOST atmospheric parameters. The metallicity given by the LAMOST pipeline (Luo et al. 2015) causes Am stars as a whole to be biased toward metal enrichment relative to normal early-type stars. The detailed metallicity distribution ([Fe/H]) is shown in Figure 10. The blue histogram shows the distribution of [Fe/H] for all A- and early F-type stars with LAMOST atmospheric parameters. The yellow histogram corresponds to the Am stars with LAMOST atmospheric parameters. Note that the right region of Figure 10 is dominated by Am stars. The conclusion that most metal-rich stars are Am stars is obviously unreasonable.

Figure 10. Distribution of metal abundance in early-type stars and Am stars. The metallicity parameters were taken from the LAMOST catalog.

6. Summary

Eight classification methods (GP, KNN, RF, SVC, DT, GNB, ESC, and MKCLASS) were compared in this study. The RF algorithm was chosen to search for Am stars among the early-type stars in LAMOST DR5, and 15,269 Am candidates were obtained. We analyzed the top 50 classification features given by the RF classifier, whose total importance reached 57.57%, and identified the spectral lines on which the RF classification depends. These lines are mostly iron lines and were used to identify Am stars in the manual inspection step. In addition, we compared the differences between Am and Ap stars and labeled Ap candidates in the final catalog. Finally, we found 9372 Am stars and 1131 Ap candidates and provided an Am star catalog. We performed a statistical analysis of the temperature distribution, spatial distribution, and infrared photometry of these Am stars. The distribution of the effective temperature shows that Am stars are mainly concentrated between F0 and A4, with a peak near A7, which is consistent with previous works. The spatial distribution suggests that the frequency of the occurrence of Am stars is inversely related to the vertical distance from the Galactic plane ($| Z| $). We also conducted an infrared photometric study of the Am stars and found that the incidence of infrared excess in Am stars gradually decreases from the near-infrared to the mid-infrared range.

We would like to thank the referee for their valuable comments. This work is supported by the National Basic Research Program of China (973 Program, 2014CB845700), China Scholarship Council, the National Natural Science Foundation of China (grant No. 11390371), and the Joint Research Fund in Astronomy (U1531119) under cooperative agreement between the National Natural Science Foundation of China and Chinese Academy of Sciences.

Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

Appendix: The Catalog of Am Stars

The catalog has 10,503 entries, including 9372 Am stars and 1131 Ap candidates. Only the top 10 entries are shown in Table 8. The complete Am catalog can be downloaded from the online journal and from http://paperdata.china-vo.org/Qinli/2018/dr5_Am.csv. The first column shows the name of the FITS file of each spectrum. The next two columns are the J2000 R.A. and decl. in degrees. Teff is the effective temperature obtained by matching with the Kurucz grid in the [3900 Å, 5600 Å] wavelength range. Fe_EW and K_EW are the equivalent widths of the Fe-group lines and the Ca ii K line in the observed spectrum, respectively. Fe_EW_m and K_EW_m are the equivalent widths of the Fe-group lines and the Ca ii K line of the corresponding Kurucz template, respectively. The smaller the value of K_EW − K_EW_m and the larger the value of Fe_EW − Fe_EW_m, the more obvious the Am phenomenon is. K_type, H_type, and m_type are the spectral subtypes of the Ca ii K line, Balmer lines, and metallic lines, respectively. The column Z shows the vertical distance from the Galactic plane. Ap_flag is a flag column, and Ap_flag=1 indicates that the star is an Ap candidate. The columns rest, rlo, and rhi come from the catalog in Bailer-Jones et al. (2018) and are the estimated distance, lower limit distance, and upper limit distance, respectively. Parallax and parallax_error are taken from the Gaia DR2 catalog. The next six columns list the magnitudes and errors of the J, H, and K bands from the 2MASS catalog. The next four columns are the magnitudes and errors of the W1 and W2 bands from the WISE catalog. The EBV values are taken from Schlafly & Finkbeiner (2011). The next three columns show the ($J-H$), ($H-K$), and ($W1-W2$) colors corrected for dust extinction. The column FeH_lamost is the metal abundance provided by the LAMOST pipeline. The last two columns are the EWs of the 4077 Å line in the observed spectra and in the templates. The greater the value of 4077_EW − 4077_EW_m, the more likely the star is to be an Ap star.

Table 8.  The Catalog of Am Stars in LAMOST DR5

Name R.A.(2000) Decl.(2000) Teff Fe_EW Fe_EW_m K_EW K_EW_m K_type H_type m_type Z
Ap_flag rest rlo rhi parallax parallax_error Jmag e_Jmag Hmag e_Hmag Kmag e_Kmag
  W1mag e_W1mag W2mag e_W2mag EBV (J−H)o (H−K)o (W1−W2)o FeH_lamost 4077_EW 4077_EW_m
spec-56994-HD053243N321131V01_sp06-044 85.285 31.919 7500 11.186 4.761 2.158 2.905 A5V A7V A7V 907.046
0 1338.367 1290.222 1390.170 0.719 0.028 12.304 0.021 12.108 0.023 12.001 0.023
  11.909 0.022 11.112 0.022 0.962 −0.054 −0.047 0.772 0.717 0.830
spec-56658-GAC096N23B1_sp10-050 94.486 22.630 7250 11.333 5.398 2.419 3.452 A6IV A7V A7V −88.073
0 1514.845 1415.680 1628.588 0.633 0.045 13.052 0.026 12.887 0.033 12.775 0.027
  12.682 0.032 12.356 0.036 1.720 −0.282 −0.163 0.281 0.620 0.999
spec-56718-HD152630N280739B01_sp07-141 233.581 26.746 7500 11.409 2.875 1.326 2.003 A1V A6V F5 −229.736
0 527.862 519.542 536.447 1.866 0.030 13.489 0.026 12.932 0.024 12.827 0.037
  12.749 0.024 12.479 0.023 0.046 0.545 0.098 0.269 0.127 0.347 0.569
spec-57035-HD064910N271125V01_sp03-179 101.314 27.159 7750 11.183 4.160 1.778 2.444 A6V A3IV A7V −1700.226
0 1746.482 1655.268 1848.073 0.543 0.031 12.221 0.020 12.170 0.028 12.101 0.027
  11.676 0.020 11.418 0.023 0.094 0.026 0.054 0.256 0.672 0.691
spec-57393-M31025N38B2_sp13-112 27.934 39.496 7500 14.324 4.761 1.165 2.905 A6V A7V A7V −228.052
0 2756.721 2471.088 3109.958 0.316 0.043 14.627 0.039 14.532 0.066 14.500 0.097
  14.327 0.028 14.109 0.043 0.033 0.086 0.027 0.217 0.458 0.678 0.830
spec-57044-HD020328N513905V01_sp01-023 31.425 49.274 8250 8.412 5.101 2.009 2.777 A5V A6IV A6IV 1350.067
0 1993.598 1848.858 2162.054 0.470 0.039 12.858 0.023 12.860 0.023 12.794 0.024
  12.630 0.025 12.462 0.025 0.173 −0.047 0.038 0.164 0.067 0.402 0.786
spec-57651-HD005305N571308V01_sp09-160 13.906 57.945 8250 12.201 7.701 2.496 4.074 A6IV A6IV A7V 1121.101
0 1146.341 1098.812 1198.080 0.845 0.037 11.811 0.026 11.704 0.030 11.640 0.021
  11.572 0.023 11.406 0.021 0.329 0.021 0.011 0.157 0.614 1.175
spec-57328-HD020325N544136B01_sp16-175 29.186 55.993 7500 15.029 4.761 1.551 2.905 A5V A7V A7V 2265.005
1 4253.456 3630.378 5093.715 0.186 0.041 14.563 0.039 14.366 0.064 14.287 0.070
  14.166 0.034 14.082 0.049 0.317 0.115 0.028 0.076 0.863 0.830
spec-57396-GAC073N15B1_sp08-007 74.779 14.537 8000 10.622 3.601 0.985 1.997 A1IV A6IV A7V 2448.776
0 2614.076 2334.988 2962.641 0.340 0.046 14.316 0.027 14.123 0.038 14.025 0.053
  13.876 0.030 13.725 0.043 0.348 0.102 0.042 0.142 0.524 0.578
spec-57034-GAC103N38B1_sp03-062 101.347 39.148 8250 9.439 3.105 0.957 1.560 A3IV A6IV A7V 215.197
0 3377.005 3036.260 3796.369 0.254 0.034 13.951 0.024 13.910 0.032 13.930 0.050
  13.820 0.029 13.706 0.047 0.105 0.014 −0.037 0.111 0.260 0.337 0.482

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.

10.3847/1538-4365/ab17d8