Spectral Feature Extraction for DB White Dwarfs Through Machine Learning Applied to New Discoveries in the SDSS DR12 and DR14

Xiao Kong; A-Li Luo; Xiang-Ru Li; You-Fen Wang; Yin-Bi Li; Jing-Kun Zhao

doi:10.1088/1538-3873/aac7a8

1. Introduction

At the final stage of stellar evolution for main sequence stars, white dwarfs (WDs) simply cool off in the absence of nuclear reactions. The energy of most WDs is generated by the radiation of the residual gravitational contraction, instead of nuclear fusion. Generally, the initial masses of the progenitors of WDs are approximately between 0.07 and 8 M_⊙, and their radii are often of the same order as that of the Earth, implying that they need extremely long cooling times. It is believed that over 97% of the stars in the Galaxy will eventually end up as WDs (Fontaine et al. 2001). The luminosity function of WDs, containing information on the stellar death rate in the local Galactic disk, can be used to estimate the density of the matter in the Galaxy. A statistically complete sample is required to measure the luminosity function of WDs (Limoges & Bergeron 2010).

Approximately 80% of all observed WDs belong to the DA type with hydrogen-dominated atmospheres, with the remaining 20% falling into the DB (He i) or DO (He ii) categories with atmospheres dominated by helium. As these stars line up along the WD cooling sequence, those observed with temperatures of approximately 45,000 K are categorized as hot DO stars, with He ii lines in their spectra, and those with effective temperatures (T_eff) mostly below 30,000 K are categorized as DB WDs (DBWDs), with only He i lines in their spectra. When the temperature drops to 10,000 K, helium becomes spectroscopically invisible, and the star shows, e.g., a featureless smooth DC, carbon-bearing DQ, or metal-rich DZ spectrum (Voss et al. 2007).

With nearly pure helium in the neutral form in their atmospheres, the DBWDs represent the best example of hydrogen-deficient stars in the universe. Many hydrogen-dominated DA WDs transform into DBWDs with helium atmospheres, and the ratio of DA to non-DA WDs varies as a function of T_eff along the cooling sequence (Fontaine et al. 2001). Expanding the DBWD sample therefore allows a better understanding of the evolution of WDs.

The high photospheric purity of DBWDs was first revealed by atmosphere models (Bues 1970). Only about 80 optical spectra and 25 ultraviolet spectrophotometries were investigated in the 1900s (Beauchamp et al. 1996). Later, with the help of the Sloan Digital Sky Survey (SDSS) spectral surveys, systematic searches were completed and a larger sample of DBWDs was obtained, holding great potential for the exploration of the chemical evolution of DB degenerates. Kleinman et al. (2013) provided 922 DBWDs from SDSS Data Release (DR) 7 (Abazajian et al. 2009), and Kepler et al. (2015) added another 450 in DR10 (Eisenstein et al. 2011). DR12 (Alam et al. 2015) increased this number by 121 (Kepler et al. 2016), among which Koester & Kepler (2015) selected 1267 spectra with signal-to-noise ratios (S/N) greater than 10, and of these the atmospheric parameters of 1107 objects were analyzed. Based on a sample of 150 DBWDs and 1733 DA WDs, Kepler et al. (2007) provided average masses of 0.711 ± 0.009 M_⊙ and 0.593 ± 0.016 M_⊙ for the DB and DA types, respectively. Koester & Kepler (2015) reported that the masses of DB types show a significant increase below T_eff = 16,000 K, possibly caused by the imperfect implementation of line broadening by neutral helium atoms, and analyzed the distributions of DBA WDs and DBWDs with height, z, above the Galactic plane, which differ toward lower T_eff. Eisenstein et al. (2006b) presented 28 stars as candidate hot DB or cool DO WDs, some of which are the first helium atmosphere WDs found in the range 30,000–45,000 K, in the DB gap.

However, the majority of known DB spectra are obtained through parameter measurement, which may lead to incompleteness in the DBWD findings because of bad data quality or spectral fitting failures. For a data set as large as SDSS DR12, which includes 4,355,200 spectra, some DBWDs will not be identified, especially among data with low S/N. The SDSS catalog lacks a DB class, so most known DBWDs are misclassified as "O," "B," "A," "QSO," or some other type in DR12. It should be noted that there is a "WD" class in SDSS DR12, which mainly includes DA WDs, although a few other subtypes of WD are mixed in.

We search for DBWDs in all of DR12 and DR14 without color or other limits, relying on machine learning (ML). After a first manual check, we discard those without obvious He i lines (4471.5 and 5875.6 Å). Then we arrange the remaining spectra in descending order of the S/N of the g band (S/N_g), where the majority of He i lines exist, and select the top 300 as our positive samples, which are used to extract the DB features using the LASSO method (Tibshirani 1996). To analyze the features, the wavelet transform (Daubechies 1992) is used to examine the decomposed spectra at different scales. The wavelet transform cuts a signal up into different frequency components, so that each component can be studied with a resolution matched to its scale. We directly employ two built-in functions of MATLAB, "wavedec" and "wrcoef," to aid the wavelet analysis of the spectrum. Li et al. (2015) described the basic properties of wavelets, explained the process of wavelet decomposition, and employed it in combination with LASSO to estimate stellar atmospheric parameters. Luo & Zhao (2001) used it to obtain spectral classification information of galaxies. Afterwards, in the derived feature space rather than on the original spectra, a support vector machine (SVM) is employed to distinguish DB from other types of objects. "Feature" here means the flux at some particular wavelength or any specific location of a spectrum.

This paper is organized as follows. Section 2 describes the spectral data used in this paper. The ML methods applied in this paper are explained in detail in Section 3, including the preprocessing of the data set, feature model construction of DBWDs through the LASSO algorithm, feature analysis using the wavelet transform, and classifier establishment by the SVM. Then, we apply the method to detect DBWDs from the SDSS data set, as introduced in Section 4. In Section 5, we compare our results with those of the literature and calculate the relevant parameters, such as T_eff, surface gravity (log g), and ultraviolet color (FUV–NUV). Finally, we summarize our work in Section 6.

2. Data Sets

In the Koester & Kepler (2015) catalog, there are a total of 1107 DBWD objects from SDSS DR10 and DR12, which were originally classified as O, QSO, B, WD, galaxy, or other types by the SDSS 1D pipeline; the quantities of each type are given in Table 1. This table suggests that we can pick out more DB spectra from all types of SDSS spectra, excluding those with a high confidence of classification.

Table 1. Classification and Quantities of 1107 DBWDs in the SDSS DR12 Catalog

Class^a	Subclass^a	Number	Class^a	Subclass^a	Number	Class^a	Subclass^a	Number
QSO	null	416	galaxy	null	12	star	K	2
Star	OB	400	star	A	11	star	T	2
Star	O	117	star	CV	9	star	carbon	2
Star	B	62	star	F	7	star	G	1
QSO	broadline	30	star	L	4	galaxy	broadline	1
Star	WD	28	star	M	3	⋯	⋯	⋯

Note.

^a"Class" and "subclass" are adopted from the data archive of SDSS.


We attempt an ML method to search for more DBWDs, with a focus on the low quality data. The basic idea is to use SVM as a classifier to sort out the DBWDs from the spectral data of SDSS DR12 and DR14, based on features found by LASSO.

We construct two subsets for training and testing from SDSS DR12. The training set is used for learning, that is to fit the parameters (features and hyper-planes) of the classifier. The testing set is used for the parameter adjustment of the classifier, e.g., to choose the best features and most suitable kernel function of SVM. A validation process of 10-fold cross-validation is built into both the LASSO and SVM packages and conducted automatically using the training set. We select a candidate data set, named experimental data (ED), from the SDSS DR14 and explain the procedure of selection in Section 2.2. Table 2 lists the roles of the training, testing, and ED sets.

Table 2. Roles of the Three Data Sets

Data Set	Roles
Training Data	To be used in the training process, i.e.,
	Detecting features by LASSO (Section 3.2);
	Estimating the parameterizing model by LIBSVM (Section 3.4).
Testing Data	To be used in the training process, i.e.,
	Determining the parameters in LASSO (Section 3.2);
	Determining the hyper-planes in LIBSVM (Section 3.4).
Experimental Data	Application of Section 3, to be used in
	Searching DB spectra from Experimental Data (Section 4).


In the SDSS data archive, spectra are grouped into "Class" and "Subclass," which are shown in Table 3. As the basic SVM is a binary-classification algorithm, the classifications between DB and each "Subclass" are performed in parallel. For convenience, we abbreviate "Class + Subclass" (CPS) as the ID of each subclass in this experiment, such as "star+O" or "QSO+AGN."

Table 3. Types of Spectra Applied in the Experiment

Class^a	Subclass^a
Star	O, B, A, F, G, K, M, L, T, WD, carbon, CV
Galaxy	AGN, broadline, starburst, star-forming, null
QSO	AGN, broadline, starburst, star-forming, null

Note.

^a"Class" and "Subclass" are adopted from the data archive of SDSS.


2.1. Data for Training Process

After a visual inspection, we select the 300 DBWD spectra with the highest S/N_g as the positive samples of the training set. This is the only group of positive samples, which means it is compared with all groups of negative samples. The redshifts of the positive samples from SDSS DR12 are incorrect because they were measured using non-DB templates. Hence, we need to re-measure their z values using DB templates, and then move the spectra to the rest frame.

Next, we apply full spectral template matching to accomplish this process. This is the most widely used method in spectral classification and measurement and is also the core algorithm in the "one-dimensional" pipeline software of SDSS (Lee et al. 2008). It is posed as a χ² minimization problem. We first reshape the pseudo-continua of the templates to ensure they are consistent with the spectrum, then calculate the distance between a template and the spectrum at each step within a specific redshift range. Finally, z is derived from the template and redshift step that give the minimum χ², which is called the best fit.
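
The χ² search over redshift can be illustrated with a minimal Python sketch. It is not the SDSS or LAMOST pipeline code: it assumes a single template on a rest-frame wavelength grid, a uniform redshift grid, a known per-pixel error, and a free multiplicative scale standing in for the pseudo-continuum reshaping. The function name `best_redshift` and the toy spectrum are hypothetical.

```python
import numpy as np

def best_redshift(wave, flux, err, t_wave, t_flux, z_grid):
    """Return (z, chi2) minimizing chi-square between a spectrum and a shifted template."""
    best_z, best_chi2 = None, np.inf
    for z in z_grid:
        # Shift the rest-frame template to redshift z and resample it on the observed grid.
        model = np.interp(wave, t_wave * (1.0 + z), t_flux)
        # Analytic least-squares scale factor (assumed stand-in for continuum reshaping).
        a = np.sum(flux * model / err**2) / np.sum(model**2 / err**2)
        chi2 = np.sum(((flux - a * model) / err) ** 2)
        if chi2 < best_chi2:
            best_z, best_chi2 = z, chi2
    return best_z, best_chi2

# Toy data: one Gaussian absorption line at 4471.5 A, observed at z = 0.001.
t_wave = np.linspace(3800.0, 9200.0, 5401)
t_flux = 1.0 - 0.5 * np.exp(-0.5 * ((t_wave - 4471.5) / 10.0) ** 2)
wave = np.linspace(3900.0, 8900.0, 5001)
z_true = 0.001
flux = 2.0 * np.interp(wave, t_wave * (1.0 + z_true), t_flux)
err = np.full_like(wave, 0.05)
z_grid = np.arange(-0.003, 0.0031, 0.0005)

z_fit, chi2_min = best_redshift(wave, flux, err, t_wave, t_flux, z_grid)
print(z_fit, chi2_min)
```

With several templates, the same loop would be repeated per template and the template with the overall minimum χ² kept as the best fit.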

Meanwhile, for each CPS, the negative samples are the spectra selected from the total set of spectra, ranked by S/N_g in descending order. SVM has known limitations when it is applied to binary classification on imbalanced data sets, in which the negative instances heavily outnumber the positive ones. Although many approaches have been proposed to overcome this issue (Akbani et al. 2004), we decided to keep a balance between positive and negative samples to ensure the correctness of the classifications, meaning that only 300 spectra are kept in each group of negative samples. Furthermore, five groups are built as negative samples for each CPS in order to obtain more comprehensive results, giving a total of 1500 spectra for every CPS in Table 3.
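
A short Python sketch of how such balanced groups might be drawn for one CPS is given here. The text does not state how the 1500 highest-S/N_g spectra are divided among the five groups, so the consecutive 300-spectrum blocks used below are an assumption, and the function name `negative_groups` and the random S/N_g values are hypothetical.

```python
import numpy as np

def negative_groups(snr_g, n_groups=5, group_size=300):
    """Split the highest-S/N_g spectra of one CPS into equally sized index groups."""
    order = np.argsort(snr_g)[::-1]              # indices sorted by descending S/N_g
    top = order[: n_groups * group_size]         # keep the best 1500 spectra
    # Assumption: groups are consecutive blocks of the ranked list.
    return [top[k * group_size:(k + 1) * group_size] for k in range(n_groups)]

rng = np.random.default_rng(0)
snr_g = rng.uniform(1.0, 60.0, size=5000)        # hypothetical S/N_g values
groups = negative_groups(snr_g)
print([len(g) for g in groups])
```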

2.2. Data for Recognition

DR12 and DR14 contain huge amounts of spectral data, most of which have a high quality and correct classification. It would be inefficient if all spectra were included in the searching process, especially for "GALAXY" that corresponds to the largest number in the catalog.

We adopt the full spectral template matching program to classify all spectra from the SDSS data set. In contrast to the steps introduced in Section 2.1, the DB templates are replaced by all templates of the LAMOST 1D Pipeline (Luo et al. 2015). In order to obtain reliable classification results, the relationship of χ² between the best and second-best fit is also taken into consideration. After this preprocessing, spectral data that are not DBWDs with a high degree of confidence are excluded, while the others remain as the ED, from which DBWDs are recognized in Section 4. The amounts of spectral data from each CPS within the ED and the catalog are illustrated in Figure 1, using the red and blue bars, respectively. Compared with the SDSS DR14 data set, the ED is reduced in quantity by an average of 75 percent.

**Figure 1.** Data usage of the experiment. The numbers of spectra from the SDSS DR14 catalog in each CPS are shown as the blue lines, while those from the ED are in red.

3. Method in the Training Process

The flowchart of the training process is given in Figure 2. All of the prepared data begin with the preprocessing procedure, including the normalization and redshift measurement. The spectral features are then extracted through LASSO for each group, followed by a binary-classification module of SVM to separate the DBWDs from all of the other types. If the accuracy (see Section 3.4.2 for details) is not high enough, an optimization process needs to be performed, i.e., some contamination by DB spectra is removed from the negative sample sets and the loop is restarted. Once the training process is completed, the unique features of the DBWDs and the hyper-planes of each group are derived by LASSO and SVM, respectively. In addition, we analyze the features extracted by LASSO using wavelets and provide a multi-scale explanation.

3.1. Data Preprocessing

3.1.1. Normalization for Positive and Negative Samples

To ensure the consistency of different spectral data, a simple normalization is required before feature extraction. Let a vector ${\boldsymbol{x}}={({x}_{1},{x}_{2},\ldots ,{x}_{n})}^{T}$ represent a spectrum, where n (n > 0) is the number of points. The component x_i represents the flux of the spectrum ${\boldsymbol{x}}$ at the ith point, $i\in \{1,2,\,\ldots ,\,n\}$ . We simply standardize all of the flux values to zero mean and unit dispersion,

$\begin{eqnarray*}&&\hat{{\boldsymbol{x}}}=\displaystyle \frac{{\boldsymbol{x}}-\bar{{\boldsymbol{x}}}}{{\sigma }_{{\boldsymbol{x}}}}\end{eqnarray*}$

where $\bar{{\boldsymbol{x}}}$ and σ_x are the mean value and standard deviation of the components of ${\boldsymbol{x}}$ , respectively.
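
As a small illustration of this normalization, the following Python sketch applies the formula to a toy flux vector. It takes σ_x to be the standard deviation of the flux values, which is an assumption about the intended definition; the function name `normalize_spectrum` is hypothetical.

```python
import numpy as np

def normalize_spectrum(flux):
    """Subtract the mean flux and divide by the flux dispersion (standard deviation)."""
    flux = np.asarray(flux, dtype=float)
    return (flux - flux.mean()) / flux.std()

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy flux values
x_hat = normalize_spectrum(x)
print(x_hat.mean(), x_hat.std())
```

The result has zero mean and unit dispersion; individual values are not bounded, so strong lines or noise spikes can fall outside the interval from −1 to 1.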

3.1.2. Redshift Measurement for Positive Sample Groups

In our investigation, we find that most DB spectra from the SDSS data set have incorrect redshifts, especially for those with large redshifts that are classified as "GALAXY" and "QSO" in the catalog. To ensure consistency of the feature extraction, all samples in the training set should be in the rest frame.

First, the DBWD spectra with high quality are selected as DB templates that are used only for the redshift measurements. Then, we apply full spectral template matching, which is described in Section 2.1, to measure all samples in the training set.

3.2. Feature Extraction

DBWDs account for about 20% of all WDs and have atmospheres dominated by neutral helium, represented by the He i lines in the spectrum. Compared with all other spectra, the most significant spectral line in a DBWD spectrum is He i at 4471 Å (Kepler et al. 2016). The spectral lines shown in Figure 3 were checked using the atomic line table from the National Institute of Standards and Technology (NIST 2017) to aid the analysis of the feature extraction results. In this plot, we show a typical DB spectrum with all of the He i lines ranging from 3800 to 7400 Å.

**Figure 3.** Wavelength of the main He i lines in a DB spectrum. The positions of the line center, shown as the black dashed lines, are accurate to one decimal place.

Instead of full spectral template matching, ML can be applied to imitate visual recognition to detect the features in noisy spectral data. The line wings, rather than the line center, may be more sensitive to the distinction between DB and non-DB. The LASSO algorithm is able to pick out such positions, at particular wavelengths, as features, which meets this need. Conversely, some dominant spectral lines are of no use when the spectra are similar: DB and B spectra, for example, both have a He i line at 5015.7 Å, which therefore cannot be used as a component of the classifier.

3.2.1. LASSO

LASSO is an interpretable model that minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being smaller than a constant (Tibshirani 1996). This model has been successfully applied to extract linearly supporting features from stellar spectra and to automatically estimate the atmospheric parameters, including T_eff, log g, and [Fe/H] (Li et al. 2015).

In simple terms, consider a sample consisting of N spectra, each of which includes n points. Let y_i be the outcome and ${{\boldsymbol{x}}}^{i}$ be the covariate vector for the ith case. Then, the objective of LASSO is to solve

$\begin{eqnarray*}&&\hat{{\boldsymbol{w}}}=\arg \mathop{\min }\limits_{{\boldsymbol{w}}}\left\{\displaystyle \sum _{i=1}^{N}{({y}_{i}-f({{\boldsymbol{x}}}^{i};{\boldsymbol{w}}))}^{2}+\lambda \parallel {\boldsymbol{w}}{\parallel }_{1}\right\},\end{eqnarray*}$

where

$\begin{eqnarray*}\parallel {\boldsymbol{w}}{\parallel }_{1} & = & \displaystyle \sum _{j=1}^{n}| {w}_{j}| ,\\ f({{\boldsymbol{x}}}^{i};{\boldsymbol{w}}) & = & \displaystyle \sum _{j=1}^{n}{w}_{j}{x}_{j}^{i},\end{eqnarray*}$

and λ > 0 is a tuning parameter that controls the number of non-zero components of ${\boldsymbol{w}}$ and hence the complexity of the model.

James et al. (2017) have shown that LASSO can effectively filter out most of the irrelevant or redundant variables by reducing the number of non-zero parameters w_i. We use the LASSO program based on MATLAB (Efron et al. 2004), in which the parameter λ can be equivalently replaced with the number m of non-zero parameters w_i.
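
To make the sparsity effect concrete, the following Python sketch minimizes the LASSO objective above on toy data. It uses cyclic coordinate descent with soft thresholding, which is a different solver from the MATLAB program of Efron et al. (2004) used in this work; the function names, the random "spectra", and the value of λ are all hypothetical.

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho toward zero by t; values with |rho| <= t become exactly zero."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize sum_i (y_i - w . x^i)^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n = X.shape[1]
    w = np.zeros(n)
    col_sq = (X ** 2).sum(axis=0)
    resid = y - X @ w
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * w[j]            # remove the contribution of coordinate j
            rho = X[:, j] @ resid
            # The factor 1/2 comes from the unhalved squared-error term in the objective.
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            resid -= X[:, j] * w[j]            # add the updated contribution back
    return w

# Toy data: N = 200 "spectra" with n = 50 points; only points 10 and 30 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 2.0 * X[:, 10] - 3.0 * X[:, 30] + 0.01 * rng.standard_normal(200)

w = lasso_cd(X, y, lam=100.0)
selected = [int(j) for j in np.flatnonzero(w)]   # indices kept as "features"
print(selected)
```

Increasing λ drives more coefficients to exactly zero, which is why fixing the number m of non-zero coefficients can play the same role as fixing λ.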

3.2.2. Feature Selection

Here, we adopt LASSO to extract features between the DBWD and other types of spectra. At first, we build five groups of negative samples for each CPS with the highest S/N_g (see Table 3 for details), as the features can be affected by data quality or parameters. Each group has 300 negative samples, and we combine all features within a CPS into one as the final output. Features from different wavelength ranges may vary, and most DB spectral lines are, in theory, mainly on the blue band of a spectrum. Thus, positions of the features within 3900–5900 Å and 3900–8900 Å need to be analyzed.

1.
Features with different CPSs. Features from different CPSs are not similar; each set represents the difference between spectra of that type and DB spectra. Figure 4 is one example that illustrates the distinct features extracted from the A and QSO groups, which are shown by the short solid red and blue lines, respectively. From the experiment, we note that the features extracted from QSO cover almost all wavelengths, indicating that the difference between QSO and DB is large. This is also the main reason why the majority of QSO spectra with high confidence are excluded from the ED in advance. Conversely, features from the CPSs of the A star groups are mainly near 4026.2, 4471.5, 4713.1, 4921.9, 5875.6, and 4341.7 Å, which are the positions of the He i and Hγ lines. Given that no He i lines exist in normal early A stars, as panel (a) of Figure 2 in Takeda et al. (2007) shows, these locations represent the major differences between A and DB stars.
There are five groups in the CPS in which negative samples are all classified as O or OB in the DR14 catalog. Figure 5 shows the features extracted from two of these groups. The main features of the same classification are similar, but the details are slightly different. All these spectra were observed with a ground-based telescope, which may lead to uncertainties in the data due to noise from sky light or instrument efficiencies. Besides, ML is a data-based approach, providing practical results that may differ from those obtained from the theory of spectral analysis. Therefore, a few more wavelengths that are not characteristic spectral lines are recognized as features between O and DB. We show these features in Table 10 in Section 6.
2.
Features within different wavelength ranges. The characteristics in the blue band (3900–5900 Å) are not exactly the same as those in the full spectrum (3900–8900 Å). A possible cause is the change in the set of original data points as the wavelength range increases. Figure 6, for instance, compares features within 3900–5900 Å (blue, above) and 3900–8900 Å (red, below).
In Figure 6, the features within the blue band are almost the same for the two wavelength ranges, except for some minor differences. For example, He i 5015.7 Å is an important feature within the wavelength range 3900–5900 Å, but it is not so significant within 3900–8900 Å. In addition, many data points, such as 7010 Å, are also significant characteristics that cannot be ignored. Despite the importance of the blue band in early-type stars and DBWDs, there clearly exist features at wavelengths redder than 5900 Å, such as Hα (6564.6 Å), He i (6678.2, 7065.2, and 7281.4 Å), and other positions that theoretically have no spectral lines in low-resolution spectra.
In conclusion, we use the full spectrum to extract features when performing the following experiment.

**Figure 4.** Different features from the A and QSO groups within the wavelength range from 3900 to 5900 Å. The features that represent the distinction between A and DB are plotted as red lines (below), and those between QSO and DB as blue lines (above).

**Figure 5.** Features from different groups of identical type. The blue and red short lines indicate the wavelengths of the features detected from two data groups separately. Most of the positions within these two groups are similar.

**Figure 6.** Features between B and DB in different wavebands. The blue lines above the spectrum represent features extracted from 3900 to 5900 Å, while the red lines are those from 3900 to 8900 Å.

3.3. Features at Multi-scale

Wavelet decomposition is adopted to analyze features of DBWD at multi-scales.

Li et al. (2015) evaluated the performance of various wavelet basis functions and decomposition levels for the estimation of stellar atmospheric parameters, using several evaluation methods. In some situations, the most essential difference between wavelet applications is the selection of the basis function and decomposition level, and the efficiency can differ when these are changed. For example, Meyer and Biorthogonal wavelets can sometimes lead to a large distinction when estimating T_eff of some specific stellar spectra (Panel (e) of Figure 7 in Li et al. 2015).

When it comes to analyzing the distribution of DBWD features extracted by LASSO, the wavelet basis function becomes less important, since the classification algorithm is only concerned with the wavelength locations of the features. We compare different basis functions and obtain nearly identical locations of the features, which are relatively fixed at some wavelengths in the fourth or fifth wavelet coefficients. As a result, we employ the simplest basis, the Haar wavelet, to conduct the decomposition.

Some of the most important features stay at the same position in the line wings on the same scale for DBWDs with various temperatures and gravities. Features extracted from one group serve as an example, shown in Figure 5, in which the line wings of He i (4026.2, 4471.5, 4921.9, and 5875.6 Å), instead of the line centers, are recognized as features.

We decompose a spectrum into a low-frequency approximation signal and high-frequency details by wavelet transform, and discover that most features fall on the crest of the detail coefficients at the fourth layer of the wavelet domain. For example, 4471.5, 5875.6, and 6678.2 Å in Figure 7 are the three main typical features in DB.

**Figure 7.** Wavelet decomposition of a DB spectrum. In the upper panel, the data from top to bottom are the original flux of the spectrum and the approximation coefficients after the fourth level decomposition. The detail coefficients at the first, second, third, and fourth levels are shown in the three bottom panels.

The features are located at or near the 16th point from the line center, which coincides with the scale of the coefficients at the fourth layer of the wavelet decomposition (2⁴ = 16). This part of the spectral line should become a major characteristic of WDs when distinguishing them from other types of spectra. We believe that spectral line broadening is a significant characteristic of WDs and that these positions are where the most dramatic changes in the spectral data occur.
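
The link between the fourth decomposition level and a scale of 2⁴ = 16 pixels can be illustrated with a small Haar decomposition written in Python. This is a stand-in sketch and not the MATLAB "wavedec"/"wrcoef" calls used in this work; the function name `haar_wavedec`, the coefficient ordering, and the toy Gaussian absorption line are assumptions made for the example.

```python
import numpy as np

def haar_wavedec(signal, level):
    """Multi-level orthonormal Haar decomposition; returns [cA_level, cD_level, ..., cD_1]."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(level):
        if approx.size % 2:                      # pad odd lengths by repeating the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # high-frequency detail coefficients
        approx = (even + odd) / np.sqrt(2.0)          # low-frequency approximation
    return [approx] + details[::-1]

# Toy spectrum: flat continuum with one broad absorption line centered on pixel 512.
pix = np.arange(1024)
signal = 1.0 - 0.5 * np.exp(-0.5 * ((pix - 512) / 8.0) ** 2)

coeffs = haar_wavedec(signal, level=4)
cD4 = coeffs[1]                                  # each coefficient spans 2**4 = 16 pixels
peak_block = int(np.argmax(np.abs(cD4)))
print(cD4.size, peak_block, peak_block * 16)
```

In this toy case the largest level-4 detail coefficient falls in one of the two 16-pixel blocks flanking the line center, i.e., on the line wings rather than at the core.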

3.4. SVM

3.4.1. Hyper-plane

As a supervised ML method, SVM is used in classification and regression analysis. In a two-class problem, given a set of training examples that are each marked as belonging to the positive or negative category, an SVM training algorithm builds a binary linear classifier model that assigns new data to one category or the other.

We apply the LIBSVM (Chang & Lin 2011) software to pick DB spectra from the whole data set with the features obtained in Section 3.2. LIBSVM is an integrated software package for support vector classification, regression, and distribution estimation. It supports multi-class classification.

Let $({{\boldsymbol{x}}}_{i},{y}_{i}),i=1,2,\,\ldots ,\,N$ represent a training set of N samples, where ${{\boldsymbol{x}}}_{i}\in {R}^{n}$ and ${\boldsymbol{y}}\in \{-1,1\}{}^{N}$ are the spectral data at the feature points and the labels, respectively. LIBSVM seeks a linear separating hyper-plane with the maximal margin in a higher dimensional space by solving the following optimization problem:

$\begin{eqnarray*}\mathop{\min }\limits_{{\boldsymbol{w}},b,\xi } & \displaystyle \frac{1}{2}{{\boldsymbol{w}}}^{T}{\boldsymbol{w}}+C\displaystyle \sum _{i=1}^{N}{\xi }_{i}\\ \mathrm{subject}\ \mathrm{to} & {y}_{i}({{\boldsymbol{w}}}^{T}\phi ({{\boldsymbol{x}}}_{i})+b)\geqslant 1-{\xi }_{i},\\ & {\xi }_{i}\geqslant 0,\ i=1,\ldots ,N.\end{eqnarray*}$

Here, the training vectors ${{\boldsymbol{x}}}_{i}$ are mapped into a higher dimensional space by the function ϕ, and C > 0 is the penalty parameter of the error term. Furthermore, $K({{\boldsymbol{x}}}_{i},{{\boldsymbol{x}}}_{j})=\phi {({{\boldsymbol{x}}}_{i})}^{T}\phi ({{\boldsymbol{x}}}_{j})$ is the kernel function. LIBSVM provides the four basic kernels below:

1.
Linear: $K({{\boldsymbol{x}}}_{i},{{\boldsymbol{x}}}_{j})={{\boldsymbol{x}}}_{i}^{T}{{\boldsymbol{x}}}_{j}$ .
2.
Polynomial: $K({{\boldsymbol{x}}}_{i},{{\boldsymbol{x}}}_{j})={(\gamma {{\boldsymbol{x}}}_{i}^{T}{{\boldsymbol{x}}}_{j}+r)}^{d},\gamma > 0$ .
3.
Radial basis function (RBF): $K({{\boldsymbol{x}}}_{i},{{\boldsymbol{x}}}_{j})=\exp (-\gamma \parallel {{\boldsymbol{x}}}_{i}-{{\boldsymbol{x}}}_{j}{\parallel }^{2}),\gamma > 0$ .
4.
Sigmoid: $K({{\boldsymbol{x}}}_{i},{{\boldsymbol{x}}}_{j})=\tanh (\gamma {{\boldsymbol{x}}}_{i}^{T}{{\boldsymbol{x}}}_{j}+r)$ .

Here, γ, r, and d are the kernel parameters.
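
The four kernels can be written directly from their definitions, as in the following Python sketch. These are plain NumPy functions for illustration, not LIBSVM calls, and the default parameter values and the two toy vectors are arbitrary assumptions.

```python
import numpy as np

def linear_kernel(xi, xj):
    return float(xi @ xj)

def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=3):
    return float((gamma * (xi @ xj) + r) ** d)

def rbf_kernel(xi, xj, gamma=1.0):
    return float(np.exp(-gamma * np.sum((xi - xj) ** 2)))

def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return float(np.tanh(gamma * (xi @ xj) + r))

# Two toy feature vectors (hypothetical normalized fluxes at two feature points).
xi = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])

k_lin = linear_kernel(xi, xj)
k_poly = polynomial_kernel(xi, xj, gamma=0.5, r=1.0, d=2)
k_rbf = rbf_kernel(xi, xj, gamma=0.5)
k_sig = sigmoid_kernel(xi, xj, gamma=0.1, r=0.0)
print(k_lin, k_poly, k_rbf, k_sig)
```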

The kernel function and its parameters must be tuned for the SVM algorithm. Experiments show that the linear and RBF kernels provide better discrimination results for spectral data. Ten-fold cross-validation is utilized to automatically determine all parameters using the LIBSVM software.
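
For the linear kernel, where ϕ is the identity, the soft-margin objective above can be evaluated for any candidate hyper-plane, which shows how the slack variables ξ_i and the penalty C enter. The Python sketch below only evaluates the objective and does not train a classifier; the function name `primal_objective`, the four toy samples, and the candidate (w, b) are hypothetical.

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """Soft-margin objective 0.5 * w.w + C * sum(xi) for a linear kernel."""
    margins = y * (X @ w + b)
    # Smallest slack satisfying y_i (w.x_i + b) >= 1 - xi_i and xi_i >= 0.
    xi = np.maximum(0.0, 1.0 - margins)
    return float(0.5 * (w @ w) + C * xi.sum()), xi

X = np.array([[2.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [0.5, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0

objective, xi = primal_objective(w, b, X, y, C=1.0)
print(objective, xi)
```

Only the last sample lies on the wrong side of the candidate hyper-plane, so it alone receives a non-zero slack; a larger C penalizes such violations more heavily.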

3.4.2. Verification

In this section, we verify the reliability of the algorithm by labeling all of the testing set. Several measures from information retrieval and statistical classification are used to evaluate the quality of the algorithm: accuracy, precision, and recall. The accuracy shows how many of all the samples are classified correctly. The precision is based on our prediction and shows how many of the positive predictions are true positives (TPs). The recall rate shows how many positive examples in the sample were predicted correctly. We use the following terms: TP for a correct prediction of the positive category; false positive (FP) for an incorrect prediction of the positive category; and false negative (FN) and true negative (TN) for incorrect and correct predictions of the negative category, respectively. During the training process, almost all of the positive samples, except a few, have been recognized correctly. Therefore, the recall, TP/(TP+FN), is approximately 100%. Generally, the mean accuracy, (TP+TN)/(TP+FN+FP+TN), and precision, TP/(TP+FP), over all of the CPSs reach 99.9% and 99.7%, respectively, indicating very high stability and reliability for this algorithm.
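
The three measures follow directly from the four counts, as the short Python sketch below shows. The counts in the example are hypothetical and are not results from this work.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Hypothetical counts for one balanced group of 300 positive and 300 negative spectra.
m = classification_metrics(tp=299, fp=1, fn=1, tn=299)
print(m)
```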

4. Recognition

4.1. Input of the SVM

We collect all of the features derived from Section 3.2, and demonstrate some of them obtained from the CPSs of O, B, A, F, WD, and QSO (from top to bottom) in Figure 8.

**Figure 8.** Features of DB vs. O, B, A, F, WD (DA), and QSO, from top to bottom in order. To make it more explicit, the wavelengths of the features are marked with red lines below each DB spectrum (black); the other six types of spectra are plotted in blue.

The features are marked with short red lines between the DB spectra and those of the other CPSs. In general, features on either side of the spectral lines are not perfectly symmetrical. Distinctions between DB and O near He i 4026.2 Å, for instance, appear only on the right side of this He i line. It can be seen that a large number of features are either single data points or narrow ranges of wavelengths. When they extend over a relatively long range of wavelength, this part should display the most dramatic changes in the spectral lines, as discussed below. The content of metal elements increases along the sequence of stellar types O, B, A, and F, corresponding to a rise in the number of features within the red band.

Afterwards, the DB candidates are identified from all of the spectra during the searching process. The feature models derived from the training process (Section 3.2) are employed as the input of SVM. A flowchart is presented in Figure 9 to demonstrate the start-to-end flow of this procedure.

**Figure 9.** Flowchart of the recognition procedure. This is the application stage of Figure 2 and Section 3. The final catalog will also be generated in this part.

4.2. Recognition and Results

After the reduction procedure described in Section 2.2, the ED is generated as the original data set to conduct recognition. Similar to the preprocessing in Section 3.1, we normalize the ED spectra and then move all of them to the rest frame using cross-correlation.

Then a hyper-plane in feature space is applied by SVM to distinguish the DBWDs from the ED. We inspect the output and find a sample of 2808 spectra of 2029 different objects from SDSS DR12 and DR14, of which 58 are newly identified objects. In Table 4, we list those spectra classified as DBWDs to evaluate the performance of the algorithm model. Here, we cannot provide the precise ratios mentioned above because we cannot confirm how many real DBWDs reside in the predicted negative category. Assuming all samples labeled negative are non-DB, the mean percentage of correctly identified samples using the algorithm reaches 99.5%.

Table 4. Results of the Experiment and Evaluation of the Algorithm Model

Class^a	Subclass^a	Numbers in ED^b	DB Candidate^c	DB^d	Ratio^e
Star	O	6497	1569	1522	99.9%
Star	B	14,759	324	115	98.6%
Star	A	85,468	82	10	99.7%
Star	F	192,387	1750	14	99.1%
Star	G	101,230	62	2	98.9%
Star	K	79,775	559	3	99.3%
Star	M	69,675	181	101	99.7%
Star	L	5678	19	7	99.9%
Star	T	1676	9	6	99.9%
Star	WD	31,776	423	64	98.9%
Star	CV	9788	248	19	97.6%
Star	Carbon	3088	26	2	99.2%
Galaxy	Broadline	6587	113	1	98.3%
Galaxy	Null	192,319	198	34	99.9%
QSO	Broadline	97,867	237	12	99.4%
QSO	Null	87,980	1264	862	99.4%

Total		883,644	6952	2774	99.5%

Notes.

^a"Class" and "subclass" are adopted from the data archive of SDSS. ^bNumber of spectra of every CPS in the ED. ^cNumber of positive samples in every CPS directly derived from the SVM. ^dNumber of positive samples in every CPS after visual inspection. ^eApproximation of the identification precision when the predicted negative samples are all correct, i.e., the proportion of correct classifications.
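One way to read the "Ratio" column is sketched below. It assumes that the only errors are SVM candidates rejected by visual inspection, so that the correct fraction is one minus the false positives divided by the number of spectra. This formula is an interpretation of note e rather than a definition given in the text, and the function names are hypothetical. Under this reading the B-star row and the total row of Table 4 are reproduced.

```python
def correct_fraction(n_spectra, n_candidates, n_confirmed):
    """Fraction of correct classifications, assuming every predicted
    negative is truly non-DB (interpretation of note e of Table 4)."""
    false_positives = n_candidates - n_confirmed
    return (n_spectra - false_positives) / n_spectra


def candidate_purity(n_candidates, n_confirmed):
    """Fraction of SVM candidates that survive visual inspection."""
    return n_confirmed / n_candidates


# B-star row of Table 4: 14,759 spectra, 324 candidates, 115 confirmed
print(f"{100 * correct_fraction(14759, 324, 115):.1f}%")    # 98.6%
# Total row of Table 4: 883,644 spectra, 6952 candidates, 2774 confirmed
print(f"{100 * correct_fraction(883644, 6952, 2774):.1f}%")  # 99.5%
```

The high ratio is dominated by the very large number of negatives; `candidate_purity` gives the complementary view of how many SVM positives were kept after inspection.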


Clearly, most DBWDs are identified from the O, B, WD, and QSO classes in the DR14 data set. This also indicates that the properties of DBWDs, such as their spectral lines, are most similar to those of these types; or perhaps they are easily confused with them when matched against the full spectrum instead of particular wavelengths (or features).

The target selections of all DBWDs are given in Table 5. Many spectra whose sources are stars, such as those targeted as "WHITEDWARF_NEW" in Table 5, are mis-classified as QSO. We believe this is due to a weakness in the pipeline's algorithm: without efficient feature wavelengths, many spectra may not be correctly classified by full spectral template matching. There are also some quite broad spectral lines in both the QSO and WD (DB) spectra that may mislead the classification.

Table 5. Target Selection of DBWDs in the SDSS DR12 and DR14

Source^a	QSO^b	Galaxy^b	Star^b	Total
AMC	1	1	⋯	2
ELG	⋯	⋯	1	1
HOT_STD	22	3	514	539
LRG	2	⋯	1	3
NONLEGACY	24	3	413	440
Null	4	⋯	3	7
QA	⋯	⋯	2	2
QSO	1	⋯	20	21
QSO_EBOSS_W3_ADM	1	⋯	2	3
QSO_VAR	⋯	⋯	1	1
QSO_VAR_SDSS	1	⋯	1	2
ROSAT_D	⋯	⋯	2	2
SEGUE1	5	⋯	1	6
SEQUELS_TARGET	1	1	3	5
SERENDIPITY_BLUE	29	5	180	214
SERENDIPITY_DISTANT	23	3	207	233
STAR	⋯	⋯	2	2
STAR_CATY_VAR	1	⋯	11	12
STAR_WHITE_DWARF	6	3	84	93
WHITEDWARF_NEW	462	3	236	701
WHITEDWARF_SDSS	295	2	188	485

Total	878	24	1872	2774

Notes.

^aTarget selection of SDSS DR12 and DR14. ^b"Class" and "subclass" are adopted from the data archive of SDSS.


5. Analysis

5.1. Comparison with the Literature

Altogether, there are 1309 pure DB objects, including double stars, in the literature (see Section 5.1.1). In this paper, we present 1999 objects of DBWDs (including, but not limited to, DB, DBA, or DBZ) with 2774 spectra in SDSS DR12 and DR14, among which 58 objects are newly spectroscopically confirmed.

A total of 176 pure DB spectra from the literature are omitted from our catalog. Most of them (96 spectra) have no apparent He i lines or only one possible He i line, so our method could not recognize them. One example (SDSS Plate–MJD–Fiber, p–m–f, 0804–52286–0262) is presented in Figure 10. Another 38 spectra are generally of very poor quality, for which the hyper-planes in the feature spaces can be ineffective. In addition, an incorrect radial velocity (RV) may lead to a failure of recognition, which is the case for 33 of these spectra. For example, when the RV of an A-type star is measured with DB templates, errors arise because the A-type spectrum has Balmer lines but no He i lines, while the DBWD spectrum has only He i lines, so Balmer lines in the A-type spectrum are matched to He i lines in the DB template.

**Figure 10.** Three typical spectra. The spectrum in panel "a" is a DBWD in the literature but not in our catalog. Panel "b" is a newly spectroscopically confirmed AM CVn star. One of the DB plus M double stars is presented in panel "c," with templates of DB and M in red and green, respectively.

Nine spectra appear to be DAB rather than DB. It is worth mentioning that some spectra, for example SDSS J092604.91+264225.0, which is mis-classified as DA in Kleinman et al. (2013), are typical DBWDs. More details can be found in Table 6 and the online table. Table 7 lists the columns of data provided in our online catalog, Table 6.

Table 6. Newly Spectroscopically Confirmed DBWDs from SDSS DR12 and DR14

Designation	P–M–F	Type	RV_DB	RV_M	T_eff	log g	FUV	NUV	S/N	Mass	Age	Ref
			(km s⁻¹)	(km s⁻¹)	(K)	(cgs)	(mag)	(mag)		(M_⊙)	(Myr)
J094038.80+364645.6	1275–52996–0037	DB	449 ± 0	⋯	18200 ± 2166	8.09 ± 0.223	19.8 ± 0.2	19.1 ± 0.1	2.5	0.60	106.0	0
J115601.31+293115.4	2224–53815–0171	DB	−41 ± 31	⋯	42897 ± 1577	7.51 ± 0.088	19.5 ± 0.1	19.7 ± 0.1	3.2	−9999	−9999	0
J000801.20+272906.1	2824–54452–0037	DBAZ	149 ± 32	⋯	16673 ± 757	7.92 ± 0.225	⋯	⋯	4.7	0.59	130.7	0
J222646.14+061921.3	4410–56187–0506	DB	70 ± 37	⋯	20120 ± 1567	8.75 ± 0.159	20.0 ± 0.1	19.8 ± 0.1	10.8	0.91	204.3	0
J012752.18+140622.9	4665–56209–0726	DBA	−28 ± 16	⋯	32695 ± 441	8.87 ± 0.023	⋯	⋯	22.1	1.20	148.2	0
J222711.11+073510.7	5057–56209–0276	DB	50 ± 12	⋯	15391 ± 82	8.88 ± 0.029	19.7 ± 0.1	18.9 ± 0.1	22.7	1.19	1100.0	0
J095403.47+223919.2	5787–56254–0254	DB	118 ± 59	⋯	13661 ± 339	8.17 ± 0.111	20.5 ± 0.2	19.5 ± 0.1	13.2	0.59	245.9	0
J094852.66+233004.1	5787–56254–0500	DB	58 ± 14	⋯	17018 ± 157	8.75 ± 0.044	18.8 ± 0.1	18.4 ± 0.1	11.1	1.19	822.0	0
J094852.66+233004.1	5788–56255–0028	DB	68 ± 12	⋯	17615 ± 129	8.77 ± 0.030	18.8 ± 0.1	18.4 ± 0.1	16.7	1.19	705.9	0
J090730.35+270413.6	5780–56274–0018	DB	113 ± 22	⋯	18200 ± 275	8.88 ± 0.048	19.8 ± 0.1	19.3 ± 0.0	10.1	1.19	705.9	0
J100104.94+302543.5	5800–56279–0890	DB	27 ± 33	⋯	13939 ± 110	8.92 ± 0.053	20.2 ± 0.2	19.3 ± 0.1	12.2	1.19	1268.0	0
J135815.93+290525.5	6009–56313–0624	DB	46 ± 3	⋯	22148 ± 208	8.76 ± 0.013	17.1 ± 0.1	16.9 ± 0.0	30.1	1.19	372.9	11
J091256.90+430023.0	4687–56338–0324	DB	4 ± 24	⋯	13893 ± 184	9.03 ± 0.073	⋯	⋯	11.1	1.19	1268.0	0
J091256.90+430023.0	4687–56369–0326	DB+M1	181 ± 2893	1147 ± 52	17954 ± 409	9.32 ± 0.087	⋯	⋯	11.8	−9999	−9999	0
J091638.24+475253.6	5813–56363–0640	DB	−7 ± 7	⋯	17900 ± 99	8.67 ± 0.024	18.4 ± 0.1	18.0 ± 0.0	17.5	0.91	278.0	0
J142046.13+554201.4	6803–56402–0201	DB	217 ± 60	⋯	17200 ± 778	8.34 ± 0.179	⋯	⋯	3.0	0.91	326.3	0
J091534.70+513610.3	5729–56598–0121	DB	37 ± 17	⋯	16071 ± 115	8.77 ± 0.041	19.4 ± 0.2	19.0 ± 0.1	12.1	1.19	951.5	0
J092540.36+511229.6	5730–56607–0940	DB	45 ± 30	⋯	17247 ± 293	8.78 ± 0.049	⋯	⋯	9.6	1.19	822.0	0
J012644.96-025633.9	7877–56898–0048	DB	172 ± 2893	⋯	13162 ± 949	8.89 ± 0.460	⋯	⋯	2.6	1.19	1458.0	0
J080710.33+485259.6	7324–56935–0828	AMCVn	462 ± 56	⋯	8260 ± 23	8.18 ± 0.032	⋯	⋯	3.5	−9999	−9999	0
J231213.74+185713.8	7611–56946–0897	DBO	1260 ± 593	⋯	26200 ± 18432	6.33 ± 1.958	⋯	⋯	1.9	−9999	−9999	0
J022756.30-044504.2	8127–56957–0899	DB	1245 ± 107	⋯	7653 ± 7143	7.27 ± 38.056	⋯	⋯	0.7	−9999	−9999	0
J012920.84+191241.5	7628–56978–0465	DB	877 ± 48	⋯	9228 ± 221	7.23 ± 0.141	⋯	⋯	0.8	0.19	351.9	0
J235607.30+025254.8	7849–56980–0914	DBZ	99 ± 38	⋯	16037 ± 236	8.86 ± 0.087	20.5 ± 0.2	20.0 ± 0.1	6.8	1.19	951.5	0
J020022.48+242343.3	7692–57064–0409	DB	1245 ± 1076	⋯	18200 ± 6743	6.08 ± 3.551	⋯	⋯	−0.0	−9999	−9999	0
J074325.35+432027.7	8276–57067–0470	DB	149 ± 44	⋯	13841 ± 1083	8.88 ± 0.500	⋯	⋯	2.7	1.19	1268.0	0
J005436.04–041940.6	7912–57310–0460	DBA	268 ± 28	⋯	16939 ± 508	6.58 ± 0.162	18.9 ± 0.1	19.4 ± 0.1	5.5	−9999	−9999	0
J013634.37–001109.9	8792–57364–0358	DBA	27 ± 58	⋯	16415 ± 200	8.88 ± 0.056	20.3 ± 0.2	20.0 ± 0.1	9.4	1.19	951.5	0
J234924.30–025209.5	7851–56932–0403	DB	1152 ± 70	⋯	17200 ± 4598	6.19 ± 1.638	⋯	⋯	1.6	−9999	−9999	0
J232933.23+212015.6	7604–56947–0864	DB	1127 ± 77	⋯	18273 ± 1310	8.95 ± 0.137	⋯	⋯	2.2	1.19	705.9	0
J082623.07+555006.2	7375–56981–0144	DB	−299 ± 87	⋯	15200 ± 1122	9.46 ± 0.366	⋯	⋯	6.7	−9999	−9999	0
J081453.04+555033.1	7375–56981–0487	DB	−149 ± 74	⋯	17200 ± 738	8.95 ± 0.178	⋯	⋯	3.2	1.19	822.0	0
J234527.43+215712.3	7600–56984–0082	DB	1315 ± 696	⋯	28200 ± 3881	8.77 ± 0.184	⋯	⋯	5.2	1.19	184.3	0
J234131.56+224240.6	7600–56984–0313	DB+M9	67 ± 35	1174 ± 30	21428 ± 791	8.84 ± 0.049	⋯	⋯	9.5	1.19	436.0	0
J025818.61–004131.3	7820–56984–0106	DB	308 ± 134	⋯	20158 ± 1915	7.90 ± 0.203	⋯	⋯	3.2	0.60	68.8	0
J025720.06–003812.0	7820–56984–0182	DB	884 ± 59	⋯	15400 ± 992	7.84 ± 0.282	⋯	⋯	2.1	0.59	198.7	0
J225345.44+223258.6	7613–56988–0380	DB+M9	29 ± 37	1164 ± 41	12473 ± 486	9.35 ± 0.190	22.0 ± 0.3	21.1 ± 0.1	7.8	−9999	−9999	0
J093021.79+544359.5	7285–56991–1000	DB	−2098 ± 81	⋯	17194 ± 761	8.95 ± 0.222	⋯	⋯	3.3	1.19	822.0	0
J080128.06+554004.7	7281–57007–0548	DB	29 ± 795	⋯	17707 ± 1093	7.66 ± 0.262	21.4 ± 0.4	21.2 ± 0.3	2.0	0.36	56.2	0
J023303.84-022104.8	7829–57011–0397	DB	1245 ± 1578	⋯	16429 ± 1429	8.35 ± 0.493	⋯	⋯	1.8	0.91	385.3	0
J084704.97+511056.0	7303–57013–0675	DB	1245 ± 2586	⋯	15650 ± 1067	8.58 ± 0.502	⋯	⋯	1.5	0.91	385.3	0
J104624.26+490908.9	7387–57038–1000	DB	173 ± 41	⋯	36204 ± 1786	8.60 ± 0.099	20.7 ± 0.3	20.5 ± 0.2	3.2	0.93	16.1	0
J091854.84+515603.2	7289–57039–0049	DB	−1498 ± 76	⋯	20200 ± 3648	7.45 ± 0.617	⋯	⋯	2.8	0.37	39.1	0
J010633.14+203043.1	7624–57039–0927	DB	1191 ± 23	⋯	21763 ± 1440	8.59 ± 0.111	⋯	⋯	3.0	0.91	152.4	0
J090702.69+430612.5	8282–57041–0092	DB	1140 ± 97	⋯	15436 ± 1381	7.21 ± 0.535	⋯	⋯	3.5	0.23	77.4	0
J085309.15+584336.3	8197–57064–0537	DBAZ	29 ± 62	⋯	16715 ± 466	9.23 ± 0.112	⋯	⋯	9.2	1.19	822.0	0
J103107.17+520854.7	8167–57071–0548	DB	216 ± 24	⋯	16535 ± 531	8.76 ± 0.146	21.3 ± 0.5	20.7 ± 0.2	3.1	1.19	822.0	0
J112752.26+565539.5	8176–57131–0392	DB	449 ± 98	⋯	17200 ± 878	7.94 ± 0.206	⋯	⋯	2.2	0.59	130.7	0
J120156.39+493707.4	7423–57135–0578	DB+M2	29 ± 37	1146 ± 8	29415 ± 3758	8.80 ± 0.168	21.8 ± 0.5	20.9 ± 0.1	3.0	1.19	166.8	0
J121027.14+502735.7	7423–57135–0833	DBZ	172 ± 37	⋯	15530 ± 307	9.07 ± 0.096	20.4 ± 0.3	19.7 ± 0.1	13.5	1.19	951.5	0
J214544.31+270923.4	7641–57307–0622	DB	1245 ± 364	⋯	21503 ± 2935	7.90 ± 0.210	⋯	⋯	2.5	0.60	43.8	0
J223318.14+244812.3	7654–57330–0204	DB	618 ± 43	⋯	36971 ± 2568	8.74 ± 0.135	⋯	⋯	2.0	0.93	13.6	0
J231304.25+265057.9	7703–57333–0554	DB	659 ± 43	⋯	14268 ± 1085	6.83 ± 0.488	⋯	⋯	1.7	0.22	96.8	0
J020910.57-043943.1	7885–57336–0410	DB	1245 ± 1704	⋯	35349 ± 25761	7.20 ± 1.613	⋯	⋯	0.1	0.32	10.5	0
J005242.64+285411.0	7674–57359–0834	DB	570 ± 45	⋯	18200 ± 840	9.19 ± 0.180	21.6 ± 0.4	21.1 ± 0.3	3.1	1.19	705.9	0
J001334.89+264245.4	7694–57359–0405	DB	−599 ± 23	⋯	16025 ± 343	9.44 ± 0.161	⋯	⋯	4.3	−9999	−9999	0
J001627.53+281843.5	7694–57359–0737	DB	703 ± 66	⋯	16495 ± 606	7.90 ± 0.165	⋯	⋯	2.6	0.59	161.0	0
J075925.84+414454.4	8291–57391–0933	DB+M9	29 ± 68	1169 ± 61	16303 ± 473	8.23 ± 0.108	18.8 ± 0.1	19.5 ± 0.1	8.4	0.59	161.0	0
J014803.85+005317.7	8793–57391–0826	DB	29 ± 55	⋯	16485 ± 510	8.85 ± 0.143	22.1 ± 0.4	21.7 ± 0.2	2.3	1.19	951.5	0
J131316.23+511428.8	8210–57426–0532	DBA	−2 ± 41	⋯	14982 ± 554	8.61 ± 0.261	⋯	⋯	2.0	0.91	459.1	0

A machine-readable version of the table is available.


Table 7. Columns Provided in Table 6 and Online Table

Column No.	Heading	Description
1	Designation	SDSS object name (SDSS 2000J+)
2	P–M–F	SDSS Plate number–Modified Julian date–Fiber
3	Type	Classification of objects derived from ML method
4	RV_DB	Radial velocity and uncertainty of each spectrum (km s⁻¹)
5	RV_M	Radial velocity and uncertainty of M companions (km s⁻¹)
6	T_eff	Effective temperature (K)
7	log g	Surface gravity (cgs)
8	FUV	Magnitude of FUV from GALEX, −9999: there is no corresponding value (mag)
9	NUV	Magnitude of NUV from GALEX, −9999: there is no corresponding value (mag)
10	S/N	Median S/N from catalog of SDSS DR14
11	Mass	Obtained from Bergeron & Gilles Fontaine (2016) (M_⊙)
12	Age	Obtained from Bergeron & Gilles Fontaine (2016) (Myr)
13	Ref	ID of the literature, 0: newly identified in this paper, see Section 5.1.1 for detail


Furthermore, we add 15 pure DB spectra of 14 objects in the SDSS DR14. Considering the number of DB objects presented in the literature and in our catalog for SDSS DR12, the completeness of our ML method should be about 96.0%. Strong noise in the wavelength ranges of the features may cause mis-classifications, which is the main disadvantage of the SVM and is in need of improvement.

Table 8 lists the number of each type of DBWD identified in this study.

Table 8. Numbers of Identified DBWD Types

Type	No. of Spectra	No. of Objects
DB	1895	1395
DB+M^a	89	79
DB:DC	23	21
DBA^b	627	465
DBO	23	18
DBQ	5	4
DBZ	112	81

Notes.

^aThe subtype and RV of the M companion can be found in the online table. ^bSome of the DBA are actually DBAZ or DBAQ; they are all counted in "DBA."


5.1.1. ID of Literature

In the last column of Table 6, the numbers represent the IDs of specific references, which are listed as follows.

0: first reported in this paper; 1: Kleinman et al. (2013); 2: Koester & Kepler (2015); 3: Kepler et al. (2015); 4: Kepler et al. (2016); 5: Atlee & Gould (2007); 6: Eisenstein et al. (2006a); 7: Croom et al. (2004); 8: Croom et al. (2001); 9: Rebassa-Mansergas et al. (2010); 10: West et al. (2008); 11: Girven et al. (2011); 12: Stepanian (2005); 13: Bicay et al. (2000); 14: Gentile Fusillo et al. (2015); 15: Levitan et al. (2015); 16: Drake et al. (2014); 17: Vennes et al. (2011); 18: Rau et al. (2010); 19: Carter et al. (2013); 20: Jura & Xu (2012); 21: Girven et al. (2012); 22: Bergeron et al. (2011); 23: Zuckerman et al. (2010); 24: Voss et al. (2007); 25: Koester et al. (2005); 26: McCook & Sion (1999); 27: Bradley (2000); 28: Bradley (1998); 29: Lépine & Shara (2005); 30: Stepanian et al. (1999); 31: Stark & Wade (2003); 32: Calcaferro et al. (2017); 33: Kleinman et al. (2004).

5.2. Noteworthy Individual Objects

Panel "b" in Figure 10 shows the AM Canum Venaticorum (AM CVn) type spectrum (p–m–f 7324–56935–0828) that we identified spectroscopically; the ML method recovers it because it relies only on the intensity of the change in the spectrum. AM CVn binaries are rare ultra-compact double degenerate systems, and only 43 such objects are known (Campbell et al. 2015; Levitan et al. 2015).

We provide 66 DB spectra with M-type stars as companions. However, there are more than 30 DB+M double stars in the literature (Kleinman et al. 2013; Kepler et al. 2015, 2016) that our method does not recover. The reason is that the flux of the M star exceeds that of the DB, which leads to much weaker DB features in the spectrum. After a visual inspection, we select 23 "DB+M" double stars of relatively good quality from the literature. For these 89 double stars, we provide the subtype and RV of the M companion in Table 6. One DB+M spectrum (p–m–f 1057–52522–0613), together with the DB and M templates, is illustrated in panel "c" of Figure 10.

5.3. Parameter Measurement

Koester & Kepler (2015) have provided and analyzed parameters of DBWDs with a theoretical model. Besides the selection of DB samples and the research on the ML algorithm, we also provide the parameters of the newly discovered DB spectra based on the DB parameter templates provided by Koester & Kepler (2015). With the method of full spectral template matching mentioned in Section 2.1, we measure T_eff and log g on several He i lines and present the results in Table 6. The average errors of T_eff and log g are 30.1% and 10.6%, respectively.
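Template matching of this kind can be sketched as a grid search. The sketch below assumes a chi-square statistic with a free multiplicative flux scale and a grid stored as a dictionary keyed by (T_eff, log g); the statistic actually used is described in Section 2.1 and may differ, and all names here are hypothetical.

```python
def chi_square(flux, error, model):
    """Chi-square between a spectrum and a template after fitting the
    best multiplicative flux scale analytically (weighted least squares)."""
    num = sum(f * m / e ** 2 for f, m, e in zip(flux, model, error))
    den = sum(m * m / e ** 2 for m, e in zip(model, error))
    scale = num / den
    return sum(((f - scale * m) / e) ** 2
               for f, m, e in zip(flux, model, error))


def best_template(flux, error, grid):
    """Return the (teff, logg) key of the template with the lowest chi-square.

    `grid` maps (teff, logg) tuples to model fluxes sampled on the same
    pixels as `flux` (e.g., only the pixels around selected He i lines).
    """
    return min(grid, key=lambda params: chi_square(flux, error, grid[params]))
```

Restricting `flux`, `error`, and the templates to pixels around the He i lines, as the text describes, keeps the fit sensitive to line strength and shape rather than to the continuum.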

5.4. T_eff and Ultra-violet Color

WDs are a type of star with strong emission in the ultraviolet waveband. The FUV–NUV color from the Galaxy Evolution Explorer (GALEX) is almost reddening-free (Bianchi et al. 2017). All DB objects with GALEX photometric data are cross-matched, and those objects with errors of less than 0.3 mag in both FUV and NUV are selected. From Figure 11, we conclude that the relation between T_eff and FUV–NUV color is roughly linear and is described by Equation (1). The scatter about the fit is σ ≈ 0.19:

$$y = -0.90 \times 10^{-4}\,x + 2.05, \tag{1}$$

where x is T_eff and y is the FUV–NUV color. The majority of sources fall within the ±3σ region of Equation (1), which is illustrated by the red dashed line in Figure 11.
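As a worked illustration of Equation (1), the following sketch evaluates the predicted color and checks whether a source lies inside the ±3σ band. The coefficients and σ are taken from the text; the function names are hypothetical.

```python
SLOPE = -0.90e-4   # mag per K, from Equation (1)
INTERCEPT = 2.05   # mag, from Equation (1)
SIGMA = 0.19       # scatter about the fit quoted in the text


def predicted_color(teff):
    """FUV-NUV color predicted by Equation (1) for a given T_eff in K."""
    return SLOPE * teff + INTERCEPT


def within_3sigma(teff, fuv, nuv):
    """True if the observed FUV-NUV color lies within 3 sigma of Equation (1)."""
    return abs((fuv - nuv) - predicted_color(teff)) <= 3 * SIGMA
```

For example, the Table 6 entry J135815.93+290525.5 (T_eff = 22148 K, FUV = 17.1, NUV = 16.9) has an observed color of 0.2 mag against a predicted value of about 0.06 mag, so `within_3sigma(22148, 17.1, 16.9)` holds.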

**Figure 11.** Magnitude of FUV–NUV as a function of T_eff. The red x-marks represent objects from the literature, while the blue plus symbols are from our catalog.

6. Conclusion and Discussion

We have spectroscopically identified 1999 DBWDs in the SDSS DR12 and DR14, including 58 newly identified objects, using ML, i.e., LASSO and SVM. A total of 176 DB objects from the literature are not included in our catalog; their T_eff mostly lies around 11,000 K, with log g fixed at 8.0. DB spectra in this parameter range have almost no He i lines; hence, our method fails to identify them.

Features of DB versus several other types of spectra were also extracted by LASSO using this procedure. Although we cannot guarantee the completeness of our samples, we have proposed a significant scheme to extract linearly supporting features from spectra to identify DBWDs.

Furthermore, we define all of the features illustrated in Figure 12 and listed in Tables 9 and 10. In Figure 12, a DB spectrum with high S/N is plotted in blue, and the features are marked in red on a light gray background. The features within the area of most He i lines are asymmetric about the line center, illustrating a characteristic of He i lines in a DB spectrum. The extracted features listed in Table 9 are each associated with one or two atomic lines; the name and position of the spectral line are also given in this table. We consider the flux in the range 4693.5–4699.0 Å as a characteristic of He i 4713.1 Å, although it appears to lie beyond the range of this spectral line, because the flux of a DB spectrum begins to decrease slightly in this part. This kind of tiny variation is usually overlooked by the human eye but is detectable when programs carry out the classification. The positions of other features are defined in a similar fashion. The width of the feature to the left of some spectral lines, such as He i 5875.6 Å, is smaller than that on the right. On the other hand, many features that are not linked to any specific spectral line are presented in Table 10. These features are purely data based or due to the residual sky background.

Table 9. Features of DB Located Near Familiar Spectral Lines

He i Line (Å)	Wavelength (Å)	Feature ID^a	Level^b
3888.6	3903.0–3908.4	r144He i	5

3964.7	3963.6–3978.3	cHe i	4

4026.2	4009.5–4019.8	l64He i	4, 5
	4026.2–4029.0	cHe i
	4031.8–4044.8	r56He i

4120.8	4180.2–4206.3	r594He i	4, 5

4387.9	4369.1–4372.2	l157He i	4
	4399.4–4404.6	r115He i

4471.5	4447.3–4492.6	cHe i	4, 5
	4513.3–4574.1	r418He i

4713.1	4693.5–4699.0	l141He i	4, 5
	4711.9	l12He i
	4714.1–4718.5	r10He i
	4729.3	r162He i
	4739.1–4740.2	r260He i
	4750.1–4755.5	r370He i

4921.9	4894.4–4911.4	l105He i	4
	4922.6–4923.8	cHe i
	4935.1–4940.8	r132He i

5015.7	5001.5	l142He i	4, 5
	5017.6–5018.8	r19He i
	5026.9–5029.2	r112He i

5047.7	5038.4–5039.7	l80He i	5
	5062.9–5085.1	r152He i

5875.6	5851.9–5857.4	l182He i	4
	5873.5–5879.0	cHe i
	5885.7–5903.4	r101He i

6678.2	6652.7–6657.4	l208He i	4
	6692.6–6709.7	r144He i

7065.2	7035.5–7038.9	l263He i	4
	7071.3–7082.8	r69He i

7281.4	7279.5	l19He i	4
	7289.5	r81He i

Notes.

^aFeature ID is defined to demonstrate the relations between features and spectral lines. For example, cHe i represents the line center, and l123He i and r123He i represent the left and right 12.3 Å to the line center, respectively. ^bDetailed scale of DB spectral wavelet decomposition, in which features can be detected.
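The naming scheme in note a can be illustrated with a short sketch. It assumes that the number in the ID is the distance, in units of 0.1 Å, from the line center to the nearest edge of the feature range, which is inferred from the values in Table 9 rather than stated explicitly. The function name is hypothetical.

```python
def feature_id(line_center, lo, hi=None, species="He i"):
    """Build a Table 9 style feature ID for the range [lo, hi] in Å.

    Pass only `lo` for a single-pixel feature.
    """
    if hi is None:
        hi = lo
    if lo <= line_center <= hi:
        return "c" + species          # range contains the line center
    if hi < line_center:              # range lies to the left (blueward)
        side, offset = "l", line_center - hi
    else:                             # range lies to the right (redward)
        side, offset = "r", lo - line_center
    return f"{side}{round(offset * 10)}{species}"


# The three He i 4026.2 Å entries of Table 9
print(feature_id(4026.2, 4009.5, 4019.8))  # l64He i
print(feature_id(4026.2, 4026.2, 4029.0))  # cHe i
print(feature_id(4026.2, 4031.8, 4044.8))  # r56He i
```

This simple rule assigns "c" only when the range contains the line center. Table 9 also labels the range 4922.6–4923.8 Å as cHe i for the 4921.9 Å line, so the table evidently allows a small tolerance that the sketch does not model.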


Table 10. Features of DB Not Located Near the Familiar Spectral Lines

Wavelength (Å)	Length (Å)
4240.3, 4252.1, 4273.7, 4281.5, 4288.5	⋯
4585.6, 4587.8, 4593.0, 4596.2, 4599.4	⋯
4608.9, 4647.3, 4648.4, 4658.0, 4665.5	⋯
4666.6, 4763.2, 5107.4, 5141.6, 5578.3	⋯
5592.4, 6264.7, 7184.6, 7340.1, 8830.8	⋯
4232.5–4233.5	1.0
4257.9–4266.8	8.9
4277.6–4279.6	2.3
4602.5–4605.8	3.3
4624.9–4625.9	1.0
4638.7–4642.0	3.3
4651.5–4654.8	3.3
5272.3–5277.0	4.7


Finally, we measure the parameters T_eff and log g of DBWDs using DB templates from Koester & Kepler (2015). The consistency of T_eff of DB objects between Koester & Kepler (2015) and our catalog is demonstrated through the FUV–NUV colors from GALEX. The distribution of mag_g also indicates the capability of our method to search for DBWDs among fainter objects.

The authors would like to thank Drs. Hai-Feng Yang, Peng Wei and Zhen-Ping Yi for valuable discussion, and also appreciate Dr. Anthony E. Lynas-Gray for the helpful suggestions. This work was funded by the National Basic Research Program of China (973 program, 2014CB845700) and National Natural Science Foundation of China (grant No. 11390371/4). This work is supported by the Astronomical Big Data Joint Research Center, co-founded by the National Astronomical Observatories, Chinese Academy of Sciences and the Alibaba Cloud. The SDSS-III web site is http://www.sdss3.org/. SDSS-III is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS-III Collaboration including the University of Arizona, the Brazilian Participation Group, Brookhaven National Laboratory, Carnegie Mellon University, University of Florida, the French Participation Group, the German Participation Group, Harvard University, the Instituto de Astrofisica de Canarias, the Michigan State/Notre Dame/JINA Participation Group, Johns Hopkins University, Lawrence Berkeley National Laboratory, Max Planck Institute for Astrophysics, Max Planck Institute for Extraterrestrial Physics, New Mexico State University, New York University, Ohio State University, Pennsylvania State University, University of Portsmouth, Princeton University, the Spanish Participation Group, University of Tokyo, University of Utah, Vanderbilt University, University of Virginia, University of Washington, and Yale University.
