Applying Random Forest Classification to Ultracool Dwarf Discovery in Deep Surveys. I. Color Classification with SDSS, UKIDSS, and WISE Photometry

Zijie Gong; Adriana Nava Vega; Eduardo Gauna Gutierrez; Arantxa Mendiola Maytorena; Carlos Verdaguer; Christian Aganze; Christopher Danner; Adam J. Burgasser

doi:10.3847/2515-5172/ac6521

The following article is Open access

Zijie Gong¹, Adriana Nava Vega², Eduardo Gauna Gutierrez³, Arantxa Mendiola Maytorena⁴, Carlos Verdaguer⁵, Christian Aganze⁶, Christopher Danner⁶,⁷, and Adam J. Burgasser⁶

Published April 2022 • © 2022. The Author(s). Published by the American Astronomical Society.
Research Notes of the AAS, Volume 6, Number 4
Citation: Zijie Gong et al 2022 Res. Notes AAS 6 74
DOI: 10.3847/2515-5172/ac6521

Author e-mails

aburgasser@ucsd.edu

Author affiliations

¹ Tecnológico Nacional de México, Campus Tijuana, Tijuana, BC, México

² Universidad Autónoma de Baja California, Calzada Universidad Tijuana, Tijuana, BC, México

³ Universidad Autónoma de Baja California, Carretera Transpeninsular Ensenada, Ensenada, BC, México

⁴ Mater Dei Catholic High School, Chula Vista, CA, USA

⁵ Instituto Tecnológico y de Estudios Superiores de Monterrey—Campus Sonora Norte, Hermosillo, Sonora, México

⁶ Center for Astrophysics and Space Sciences, UC San Diego, La Jolla, CA, USA; aburgasser@ucsd.edu

⁷ San Diego State University, Department of Astronomy, San Diego, CA, USA

ORCID iDs

Christian Aganze https://orcid.org/0000-0003-2094-9128

Adam J. Burgasser https://orcid.org/0000-0002-6523-9536

Dates

Received November 2021
Revised March 2022
Accepted April 2022
Published April 2022

Unified Astronomy Thesaurus concepts

Brown dwarfs; L dwarfs; T dwarfs; Stellar classification; Surveys; Random Forests; Classification; Low mass stars

Abstract

In this first of two studies, we apply a random forest model to classify ultracool dwarfs (UCDs) from broadband color information. Using the Skrzypek et al. ultracool dwarf sample and a set of background sources, we trained a random forest classifier based on 28 colors derived from optical and infrared photometry from SDSS, UKIDSS, and WISE. Our model achieves 99.7% accuracy in segregating L- and T-type UCDs from background sources, and 97% accuracy in separating spectral subgroups. A separate random forest regressor model achieved a spectral classification precision of 1.3 subtypes. We applied these models to a 12.6 deg² region with overlapping SDSS, UKIDSS, and WISE coverage and identified 35 UCD candidates. Five of these have been previously reported, four of them as photometrically or spectroscopically classified UCDs. Our random forest model can be applied to multiple surveys to greatly expand the known census of UCDs.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Ultracool dwarfs (UCDs) are the lowest-mass stars and brown dwarfs, with effective temperatures T_eff ≲ 3000 K and spectral classifications ≳M6 (Kirkpatrick 2005). As intrinsically faint sources, these objects are relatively rare in large imaging surveys. Nevertheless, several thousand UCDs have been found in wide-field optical and infrared imaging surveys such as SDSS (York et al. 2000), UKIDSS (Lawrence et al. 2007), and WISE (Wright et al. 2010), and in deep imaging surveys such as the Dark Energy Survey (Carnero Rosell et al. 2019). Identifying rare sources in rich data sets is an ideal problem for machine learning methods such as random forest (RF) classification (Breiman 2001), which has previously been deployed to classify M dwarfs (Hardegree-Ullman et al. 2019) and to perform star-galaxy classification (Miller et al. 2017; Clarke et al. 2020) using photometric data. Here, we explore the application of a hierarchical RF model to segregate and classify UCDs using multi-color photometry.

2. Methods

Our UCD training set was drawn from the compilation of Skrzypek et al. (2016), which includes 1341 photometrically classified late-M, L, and T dwarfs with photometry from the SDSS, UKIDSS, and WISE surveys. From this sample, we selected a subset of 233 sources with L and T photometric classifications and complete photometry in eight photometric bands: SDSS i, z; UKIDSS Y, J, H, K; and WISE W1, W2. We also drew a sample of 4055 "background" (non-UCD) sources from a 1° radius circular field centered at α = 12^h, δ = +10° with overlapping SDSS, UKIDSS, and WISE coverage. Our classifying data comprised 28 colors derived from the measured photometry. We constructed two RF classifier models: one to segregate UCDs from non-UCDs, and a second to classify UCDs into four spectral type groups: L0–L4.5, L5–L9.5, T0–T4.5, and T5–T9.5. We also trained an RF regressor model to derive decimal classifications for the UCD sample. We used the scikit-learn package (Pedregosa et al. 2011) to design and train these RF models. From our sample, 76% was used as the training set, 9% as the validation set, and 15% as the test set.
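The count of 28 colors matches the number of distinct band pairs that can be formed from eight bands. The sketch below illustrates that bookkeeping, assuming each color is a simple magnitude difference between two bands; the function name `make_colors` and the example magnitudes are hypothetical and are not taken from the paper or its data.

```python
from itertools import combinations

# Band order follows the text: SDSS i, z; UKIDSS Y, J, H, K; WISE W1, W2.
BANDS = ["i", "z", "Y", "J", "H", "K", "W1", "W2"]


def make_colors(mags):
    """Return every pairwise color (earlier band minus later band in BANDS)."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in combinations(BANDS, 2)}


# Illustrative magnitudes only (hypothetical values, not a real source).
example_mags = {"i": 21.5, "z": 19.6, "Y": 18.4, "J": 17.3,
                "H": 16.5, "K": 15.9, "W1": 15.4, "W2": 15.1}

colors = make_colors(example_mags)
print(len(colors))  # 28 pairwise colors from 8 bands
```

Each source then contributes one row of 28 color values to the feature table passed to the classifier.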

The validation set was used to tune the hyperparameters of the RF models, including the number of trees (30), tree depth (no limit), split criterion (Gini impurity), and use of bootstrapping. This initial training was used to guard against both underfitting and overfitting, and to maximize our classification/regression metrics of precision, accuracy, recall, and F₁ score (Chinchor 1992). The RF models were trained in the Google Colab environment⁸ (Carneiro et al. 2018).
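A minimal sketch of how the quoted split fractions and hyperparameters might map onto scikit-learn calls is given below. It runs on synthetic stand-in data, not the survey photometry. The two-stage split, the stratification, and the random seeds are assumptions made for illustration, not the authors' documented procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the 28-color feature table (NOT the survey data):
# "background" sources scatter around 0, "UCDs" are offset to redder colors.
n_bg, n_ucd, n_colors = 400, 40, 28
X = np.vstack([rng.normal(0.0, 0.3, (n_bg, n_colors)),
               rng.normal(1.5, 0.3, (n_ucd, n_colors))])
y = np.concatenate([np.zeros(n_bg, dtype=int), np.ones(n_ucd, dtype=int)])

# 76% train / 9% validation / 15% test via two successive splits (assumed).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.09 / 0.85, stratify=y_rest, random_state=0)

# Hyperparameters quoted in the text: 30 trees, unlimited depth,
# Gini criterion, bootstrapping enabled.
clf = RandomForestClassifier(n_estimators=30, max_depth=None,
                             criterion="gini", bootstrap=True,
                             random_state=0)
clf.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, clf.predict(X_val))
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"validation accuracy = {val_accuracy:.3f}, test accuracy = {test_accuracy:.3f}")
```

In practice the validation score would be compared across candidate hyperparameter settings, with the test set held back until the settings are fixed.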

3. Random Forest Performance

Figure 1 displays the confusion matrices for the UCD/non-UCD and UCD spectral group classifiers. The former achieved an accuracy of 99.7% on the test sample, while the latter achieved an accuracy of 97%. For the UCD/non-UCD classifier, our feature importance analysis found that the i − z color had the most predictive power for identifying UCD candidates. For the UCD spectral group classifier (Figure 1(c)), we found that the W1 − W2, K − W2, and i − J colors had the most predictive power for determining UCD spectral type. These colors also have clear monotonic trends with spectral type across the L and T dwarf sequence. We found reasonable agreement between the spectral types predicted by the RF regressor model and the types reported in Skrzypek et al. (2016), with an average classification error of 1.3 subtypes.
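The sketch below shows one way a confusion matrix, precision/recall/F₁ scores, and a feature importance ranking can be obtained from a fitted scikit-learn forest. It uses a toy data set in which, by construction, only the i − z column separates the two classes, so the ranking it returns reflects that assumption and not the paper's measurements. All data values and seeds here are hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

bands = ["i", "z", "Y", "J", "H", "K", "W1", "W2"]
color_names = [f"{a}-{b}" for a, b in combinations(bands, 2)]

rng = np.random.default_rng(1)
n_bg, n_ucd = 600, 60
X = rng.normal(0.0, 0.3, (n_bg + n_ucd, len(color_names)))
y = np.concatenate([np.zeros(n_bg, dtype=int), np.ones(n_ucd, dtype=int)])

# Toy assumption: only the i-z color differs between the two classes.
X[y == 1, color_names.index("i-z")] += 2.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=1)
clf = RandomForestClassifier(n_estimators=30, random_state=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes (0 = non-UCD, 1 = UCD).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0)

# Impurity-based importances, sorted from most to least informative.
ranked = sorted(zip(color_names, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)

print(cm)
print(f"precision = {precision:.2f}, recall = {recall:.2f}, F1 = {f1:.2f}")
print("top colors:", [name for name, _ in ranked[:3]])
```

Impurity-based importances can be biased when features are strongly correlated, as pairwise colors built from shared bands are, so a ranking of this kind is best read as indicative.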

4. Application as Discovery Tool

Once the RF models were re-trained on the entire Skrzypek et al. (2016) sample, we applied them to a sample of 13,483 sources with overlapping SDSS, UKIDSS, and WISE photometry in a 2° radius circular field (12.6 deg²) centered at α = 10^h, δ = +5°. Our UCD/non-UCD classifier selected 35 sources as candidate UCDs; our UCD spectral group classifier and spectral type regressor identified most of these as early L dwarfs with classifications between L1 and L6. Five of these sources have SIMBAD entries. One, J095924.95+061628.2, was identified as a candidate white dwarf by Gentile Fusillo et al. (2019), and hence is unlikely to be a UCD. The other four sources are all identified as spectroscopically confirmed or candidate UCDs, suggesting a ≈80% reliability for our RF model. Additional follow-up of the other candidates will allow us to quantify this reliability more accurately. At 80% reliability, the corresponding surface density of UCDs (2.2 deg⁻²) yields ∼9000 L and T dwarfs in the roughly 4000 deg² of overlap area between UKIDSS, SDSS, and WISE (Lodieu et al. 2017).
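The surface density and total yield quoted above follow from simple arithmetic on the numbers in the text. The short check below reproduces them, assuming a flat-sky area πr² for the 2° field; the variable names are illustrative.

```python
import math

radius_deg = 2.0            # search field radius
n_candidates = 35           # sources selected by the UCD/non-UCD classifier
reliability = 4 / 5         # 4 of the 5 SIMBAD-matched candidates are UCDs
overlap_area_deg2 = 4000.0  # approximate UKIDSS/SDSS/WISE overlap area

# Flat-sky approximation, adequate for a field this small.
area_deg2 = math.pi * radius_deg ** 2
surface_density = reliability * n_candidates / area_deg2
expected_ucds = surface_density * overlap_area_deg2

print(f"area = {area_deg2:.1f} deg^2")
print(f"surface density = {surface_density:.1f} per deg^2")
print(f"expected UCDs in overlap area = {expected_ucds:.0f}")
```

The result lands slightly under 9000, consistent with the rounded ∼9000 figure. Because the reliability rests on only five cross-matched sources, the estimate carries a large statistical uncertainty.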

This research was conducted as part of the ENLACE bi-national summer research program at UC San Diego. We thank Dr. Olivia Graeve for organizing this program and for her mentorship. This research has made use of the SIMBAD database, operated at CDS, Strasbourg, France.

Software: astropy (Astropy Collaboration et al. 2018), astroquery (Ginsburg et al. 2019), scikit-learn (Pedregosa et al. 2011).

Footnotes

⁸ https://colab.research.google.com/
