
Data-driven Derivation of Stellar Properties from Photometric Time Series Data Using Convolutional Neural Networks


Published 2022 July 19 © 2022. The Author(s). Published by the American Astronomical Society.
Citation: Kirsten Blancato et al 2022 ApJ 933 241. DOI: 10.3847/1538-4357/ac7563


Abstract

Stellar variability is driven by a multitude of internal physical processes that depend on fundamental stellar properties. These properties are our bridge to reconciling stellar observations with stellar physics and to understanding the distribution of stellar populations within the context of galaxy formation. Numerous ongoing and upcoming missions are charting brightness fluctuations of stars over time, which encode information about physical processes and properties such as the rotation period, evolutionary state (via the effective temperature and surface gravity), and mass (via asteroseismic parameters). Here, we explore how well we can predict these stellar properties, across different evolutionary states, using only photometric time-series data. To do this, we implement a convolutional neural network, and with data-driven modeling we predict stellar properties from light curves of various baselines and cadences. Based on a single quarter of Kepler data, we recover the stellar properties, including the surface gravity for red giant stars (with an uncertainty of ≲0.06 dex) and the rotation period for main-sequence stars (with an uncertainty of ≲5.2 days, and unbiased from ≈5 to 40 days). Shortening the Kepler data to a 27 day Transiting Exoplanet Survey Satellite–like baseline, we recover the stellar properties with only a small decrease in precision, ∼0.07 dex for log g and ∼5.5 days for Prot, unbiased from ≈5 to 35 days. Our flexible data-driven approach leverages the full information content of the data, requires minimal or no feature engineering, and can be generalized to other surveys and data sets. This has the potential to provide stellar property estimates for many millions of stars in current and future surveys.


1. Introduction

In the coming years the number of stars with photometric time-series observations is projected to increase by several orders of magnitude. The ongoing Transiting Exoplanet Survey Satellite (TESS) mission (Ricker et al. 2014) will deliver light curves for of order $10^5$ stars, while the Legacy Survey of Space and Time (LSST; LSST Science Collaboration et al. 2009) is planned to deliver light curves for an unprecedented number of ∼$10^8$ stars. The large stellar samples covered by these space- and ground-based surveys will enable further probing of known, and possibly reveal new, empirical connections between time-domain variability and stellar physics. In combination with Gaia parallaxes (Gaia Collaboration et al. 2016, 2018), these observations have the additional potential to markedly extend the characterization of stellar properties and populations throughout the Milky Way. However, while high-quality light curves have been used to infer a number of stellar properties, fast and automated methods that can be employed on shorter-baseline and sparser-cadence observations will be crucial to maximize insights from the forthcoming volume of time-domain data.

Brightness variability in the time domain encodes information about stellar properties through physical processes including oscillations, convection, and rotation. With high-cadence time-domain data from the Kepler (Borucki et al. 2008) and CoRoT (Baglin et al. 2006) missions, solar-like oscillations have been detected in a large ensemble of stars. These oscillations are a result of turbulent convection near the stellar surface, which induces acoustic standing waves in the interiors of both main-sequence and evolved stars, generating stellar fluctuations across a range of timescales (e.g., Aerts et al. 2010). Solar-like oscillations are typically parameterized through two average parameters, ${\nu }_{\max }$ and Δν, which can be precisely measured in power spectra computed from high-cadence time-series data (e.g., Hekker et al. 2009; De Ridder et al. 2009; Gilliland et al. 2010; Bedding et al. 2010; Mosser et al. 2010; Stello et al. 2013; Yu et al. 2018). The frequency of maximum power, ${\nu }_{\max }$, is dependent on the temperature and surface gravity of a star (Brown et al. 1991; Belkacem et al. 2011), while the large frequency separation between consecutive overtones, Δν, is dependent on the stellar density (Ulrich 1986). The combination of ${\nu }_{\max }$ and Δν thus allows a direct measurement of stellar masses (M*) and radii (R*; e.g., Kjeldsen & Bedding 1995; Stello et al. 2009, 2009c; Kallinger et al. 2010; Huber et al. 2011).

In stars with convective envelopes, stellar granulation is also imprinted in a star's photometric variability. The circulation of convective cells produces brightness fluctuations at the stellar surface, where brighter regions correspond to hotter, rising material (granules) and darker regions correspond to cooler, sinking material (intergranule lanes). Because the size of the granules is dependent on the pressure scale height (Freytag & Steffen 1997; Huber et al. 2009; Kjeldsen & Bedding 2011), the variability timescale of granulation has been demonstrated to scale with the surface gravity (log g) of a star (Mathur et al. 2011; Kallinger et al. 2016; Pande et al. 2018). The relationship between granulation timescale and surface gravity has led to the development of the "Flicker method," in which brightness variations on timescales less than 8 hr are used to estimate log g (Bastien et al. 2013, 2016; Cranmer et al. 2014). With this estimate of log g, and a probe of effective temperature (Teff; e.g., from spectroscopy, broadband photometry), a star's relative position on the Hertzsprung–Russell (HR) diagram, and thus evolutionary state, can be determined.

In addition to oscillations and granulation, stellar rotation also contributes to variability in the observed brightness of a star. Star spots on the surface of magnetically active stars quasiperiodically cross the observable stellar face, imprinting semi-regular patterns in the photometric time series (Strassmeier 2009; García et al. 2010). Based on these modulations, stellar rotation periods (Prot) have been estimated by examining light curves from ground-based surveys (e.g., Irwin et al. 2009), the Kepler and CoRoT missions (e.g., Mosser et al. 2009; do Nascimento et al. 2012; Reinhold et al. 2013; Nielsen et al. 2013; García et al. 2014; Santos et al. 2019), as well as more recently the K2 and TESS missions (e.g., Curtis et al. 2019; Reinhold & Hekker 2020). As rotation at the surface is linked to processes occurring in the stellar interior (e.g., dynamos, turbulence; e.g., Zahn 1992; Mathis et al. 2004; Browning et al. 2006; Decressin et al. 2009; Wright et al. 2011), there is a prospect of using rotation period measurements to probe fundamental stellar properties, as well as the magnetic and dynamical evolutionary history of stars.

Of particular value is the connection between the rotation period and stellar age. As main-sequence stars evolve, stellar winds transport angular momentum away from the star, slowing the rate at which it rotates (Weber & Davis 1967; Kawaler 1988; Bouvier et al. 1997). The empirical relationship between the stellar age and rotation period was first realized by Skumanich (1972), which prompted the development of gyrochronology (Barnes 2003) as a tentative tool for estimating stellar ages from rotation and color alone. Recent theoretical work has focused on deriving the gyrochronology relations from stellar physics (e.g., Matt et al. 2012; Reiners & Mohanty 2012; Gallet & Bouvier 2013), while open clusters and other stellar samples, for which precise and independent measurements of both the stellar age and rotation period can be made, have been used to calibrate these relationships (e.g., Kawaler 1989; Barnes 2003, 2007; Cardini & Cassatella 2007; Meibom et al. 2009; Mamajek & Hillenbrand 2008; Agüeros et al. 2018; Douglas et al. 2016, 2019).

However, as illustrated in Angus et al. (2015), a robust empirical calibration of gyrochronology has proven to be challenging. Based on Kepler stars with asteroseismic age estimates, they find that multiple age–period–color relationships are necessary to describe the properties of the stellar sample, which suggests that the gyrochronology relationship is underspecified. Furthermore, in Angus et al., accepted for publication in AJ, it is demonstrated that empirically calibrated gyrochronology models are not able to sufficiently reproduce the ages of rotating stars, particularly of late K- and early M-type dwarfs, which suggests that the simple gyrochronology relation as proposed by Skumanich (1972) is unable to capture the full complexity of stellar spin-down. Adding additional physics appears necessary: in the semiempirical modeling of rotational evolution pursued in Spada & Lanzafame (2020), a mass- and age-dependent core–envelope coupling timescale is needed to reproduce the rotation periods of stars in old open clusters (e.g., Curtis et al. 2019).

The uncertainty of gyrochronology relations has also been revealed from a theoretical perspective. For instance, van Saders et al. (2016) find that weakened magnetic braking limits the predictive capability of gyrochronology, specifically for stars in the second half of their main-sequence lifetimes. In the context of theoretical stellar rotation models, Claytor et al. (2020) determine the biases associated with the inference of the stellar age from the rotation period for lower-main-sequence stars based on current theoretical models of stellar angular momentum spin-down. Furthermore, combining theoretical models of stellar rotation with expected observational biases, van Saders et al. (2019) use forward modeling to probe rotation periods across a population of stars, finding that current models of magnetic braking fail at longer rotation periods, and that particular care is necessary to correctly interpret the stellar ages from the rotation period distributions.

As a result of the physical processes described above, high-cadence stellar photometry contains rich information at multiple timescales about fundamental stellar properties including the mass, radius, and age. Careful analysis of light curves from missions like Kepler and CoRoT has revealed the potential of these data and enabled the determination of stellar properties for thousands of stars. However, the imminent volume of time-domain data that will be delivered by surveys like TESS and LSST necessitates the development of new methods for estimating the stellar properties from shorter-baseline and sparser-cadence data. Automated pipelines to measure the asteroseismology parameters ${\nu }_{\max }$ and Δν have been developed and applied to large samples of Kepler stars (Huber et al. 2009), and Bayesian methods for inferring these parameters have been tested on small (<100 stars) samples (Davies et al. 2016; Lund et al. 2017), and on ∼13,000 K2 Campaign 1 stars (Zinn et al. 2019).

Automated methods for extracting rotation periods from Kepler photometry have also been put forth. Producing the largest catalog of homogeneously derived rotation periods to date, McQuillan et al. (2014) derive rotation periods for ∼30,000 main-sequence stars with a peak identification procedure in the autocorrelation function (ACF) domain, based on a minimum baseline of ∼2 yr of observational coverage (see McQuillan et al. 2013). Taking a probabilistic approach instead, Angus et al. (2018) infer posterior probability distribution functions (PDFs) for the rotation periods of ∼1000 stars based on a Gaussian process model. This method has the benefit of not assuming strictly sinusoidal periodicities, and compared to traditional methods it provides more robust credible intervals on the inferred rotation periods. However, Angus et al. (2018) find that the posteriors still underestimate the true uncertainties, and the method relies on computationally expensive posterior sampling. Most recently, Lu et al. (2020) implement a random forest model to predict rotation periods from light curves and Gaia data, with a particular focus on deriving the long periods of M dwarfs from TESS data.

Data-driven techniques have shown promise in their capability to efficiently identify red-giant-branch (RGB) stars with solar-like oscillations, and to estimate fundamental stellar properties like Teff and log g from time-domain data. Learning a generative model for RGB stars, Ness et al. (2018) use The Cannon (Ness et al. 2015) to model the ACF amplitude at each lag as a polynomial function of stellar properties (Teff, log g, ${\nu }_{\max }$, Δν). Trained on ∼4 yr baseline data, Ness et al. (2018) find the variance of their log g estimator to be <0.1 dex and the variance of their Teff estimator to be <100 K, with the information required to learn these properties being contained in ACF lags up to 35 days and 370 days, respectively, for log g and Teff. Taking a similar approach, Sayeed et al. (2021) learn a local linear regression model between the power density at each frequency of smoothed Kepler power spectra and stellar properties. For upper-main-sequence and RGB stars that do not exhibit rotation, Sayeed et al. (2021) learn a log g estimator with a variance <0.07 dex based on the 10 nearest neighbors in the frequency domain of the training set. Neural networks have also been implemented for RGB asteroseismology. Training a convolutional neural network (CNN) on an image representation of Kepler power spectra, Hon et al. (2017) classify RGB stars versus core helium-burning stars to an accuracy of 99%, and in Hon et al. (2018b) their approach predicts ${\nu }_{\max }$ to an uncertainty of about 5%. In Hon et al. (2018a) it is found that based on power spectra images derived from 4 yr, 356, 82, and 27 day data, the classification accuracy decreases from ∼98% based on 4 yr data to ∼93% based on 27 day data.

In this work, we pursue systematically and consistently estimating a set of stellar properties directly from photometric time-series data. We do this by fitting a flexible one-dimensional (1D) CNN to the data, which is able to capture the structure of the data in the time domain on multiple scales and requires minimal or no feature engineering. Using a single quarter of Kepler data and asteroseismology-quality stellar measurements as our training set, we build models to classify the stellar evolutionary state and a set of stellar properties across the RGB (including red giants and red clump stars) and main sequence from light curves of various baselines and cadences, and compare these results to models based on the ACF and frequency-domain transformations of the data. The CNN classification model distinguishes RGB stars from main-sequence and subgiant stars to an accuracy of ∼90%, and for RGB stars we demonstrate that the CNN regression model trained on 27 day Kepler light curves is able to predict log g to an rms precision of ∼0.07 dex, Δν to an rms precision of ∼1.1 μHz, ${\nu }_{\max }$ to an rms precision of ∼17 μHz, and Teff to an rms precision of ∼300 K. For main-sequence stars, we predict rotation periods up to Prot ∼ 35 days based on 27 day and even 14 day data, with an rms precision of ∼6 days. We also find that for observations spaced 1 day apart (over 97 days), we can recover Prot from ≈5 to 40 days with an rms precision of ∼6.2 days. Our approach, which leverages the full information content of the data, serves as a proof of concept in the pursuit of estimating stellar properties for many millions of stars from variable-quality time-domain data.

2. Training Data

2.1. The Kepler Data

To predict stellar and asteroseismology parameters from time-domain variability, we build models trained on long-cadence (29.4 minute sampling) Kepler data. We download all available Q9 light curves from the Kepler mission archive, which supplies ∼97 days of time-domain observations for 166,899 stars. To minimize the amount of data processing, we train our models based on this single quarter of observations.

Light curves are often transformed to different representations, in the frequency and time-lag (i.e., ACF) domains. This is commonly done in order to extract signals that concentrate in these forms: ${\nu }_{\max }$ and Δν from the frequency spectrum, and the rotation period from the peak in the ACF. In this work, we do not collapse the data to a few measurable signatures; we leverage the entire set of flux observations to predict the stellar properties. It is therefore unclear whether a particular choice of data representation will be better than another. We examine how well we can derive the stellar properties using (i) time, (ii) frequency, and (iii) time-lag representations of the data. We report the differences between the approaches in Section 5. We note, however, that any preferential representation, in terms of prediction performance, may simply reflect the compatibility of the data representation with our modeling choice.

2.2. Light-curve Processing

2.2.1. The Time Domain

The flux measurements we use are the Pre-search Data Conditioning Simple Aperture Photometry (PDCSAP) flux values, which have been corrected for systematic errors and anomalies caused by the spacecraft and instrument (Jenkins et al. 2010; Twicken et al. 2010). Before transforming the time-series data to different domains, we apply three processing steps to each light curve. First, we remove observations with a SAP_QUALITY flag greater than 0. We then apply a local sigma clipping algorithm to each light curve, removing observations with flux values more than three standard deviations away from the mean flux computed in a sliding window of 50 consecutive observations. Finally, we transform the light curves to be in units of relative flux, Δf/f. These three preprocessing steps are applied to the light curves before any further processing and transformation into other data domains.

For the models we build based on the data in the original time domain, we apply the following additional processing steps. First, we place the light curves on a common time grid, so that the structure of the input data is standardized. As the cadence of the Kepler data is mostly regular, with flux measurements every 29.4 minutes, we set any missing flux values in the time grid to zero. In Section 7 we discuss alternative imputation choices that can be explored; however, our model succeeds with the simplest zero-imputation approach. We then normalize the relative flux values of each light curve by subtracting the mean (μ) and dividing by the standard deviation (σ) of the relative flux values, so that fscaled = (f − μ)/σ. Because the standard deviation of each individual light curve provides useful information in comparing across the collection of light curves, we supply this as an additional feature to the model, as discussed in Section 3.2.
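As a concrete illustration of the preprocessing chain just described, the sketch below implements the quality cut, the local sigma clip, the conversion to relative flux, the zero-imputed regular time grid, and the standardization in NumPy. The function and its details (e.g., the median-based relative flux) are our own assumptions for illustration, not the authors' released pipeline.

```python
import numpy as np

def preprocess_light_curve(time, flux, quality, window=50, n_sigma=3.0,
                           cadence=0.0204):
    """Sketch of the Section 2.2.1 preprocessing (hypothetical helper).

    time: days; flux: PDCSAP flux; quality: SAP_QUALITY flags.
    """
    # 1. Remove observations with a nonzero quality flag.
    good = quality == 0
    time, flux = time[good], flux[good]

    # 2. Local sigma clip in a sliding window of 50 consecutive observations.
    keep = np.ones(len(flux), dtype=bool)
    for i in range(len(flux)):
        lo, hi = max(0, i - window // 2), min(len(flux), i + window // 2)
        mu, sd = flux[lo:hi].mean(), flux[lo:hi].std()
        keep[i] = abs(flux[i] - mu) <= n_sigma * sd
    time, flux = time[keep], flux[keep]

    # 3. Convert to relative flux, Delta f / f.
    rel_flux = flux / np.median(flux) - 1.0

    # 4. Place on a regular 29.4 minute grid, imputing missing cadences with zero.
    n_grid = int(np.round((time.max() - time.min()) / cadence)) + 1
    gridded = np.zeros(n_grid)
    idx = np.round((time - time.min()) / cadence).astype(int)
    gridded[idx] = rel_flux

    # 5. Standardize; sigma is retained as an extra input feature (Section 3.2).
    mu, sigma = gridded.mean(), gridded.std()
    return (gridded - mu) / sigma, sigma
```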

In Figure 1 we illustrate how the time-domain data vary for stars across the HR diagram. In the main panel of this figure, we show (in gray) the distribution of stars in the stellar radius against effective temperature plane (R*–Teff) for the set of ∼150,000 stars from Berger et al. (2018) that have Kepler Q9 light curves available. The main sequence is distributed across log(R*/R⊙) ≲ 0.5 and 3000 K ≲ Teff ≲ 6500 K, and the RGB is located at log(R*/R⊙) ≳ 0.5 and 3000 K ≲ Teff ≲ 5500 K, with the red clump resolved at log(R*/R⊙) ∼ 1 and Teff ∼ 4800 K. Stars with Teff ≳ 6500 K have thin convective or entirely radiative envelopes, and thus include "classical" pulsators such as delta Scuti or gamma Doradus stars. To demonstrate how the shape of the light curves varies across the regions of the HR diagram, the middle signal in the inset panels shows the time-domain data for two stars with different property values on the main sequence, as well as for two stars on the RGB/red clump with different property values. For the main-sequence stars (at about the same stellar radius), the light curves clearly indicate a stellar rotation signal, with the hotter, upper-main-sequence star having a shorter rotation period of 10 days and the cooler, lower-main-sequence star having a longer rotation period of 39 days. For the two RGB stars with Teff ∼ 5000 K, we see that, unlike the main-sequence stars, the light curves do not exhibit a rotation signal within the 97 day baseline that is shown, but the amplitudes of the short-timescale variations differ between the stars at different R*. From these example light curves on the main sequence and RGB we see that the time-domain data vary across the HR diagram, with stars of different properties exhibiting distinctive light-curve characteristics. This suggests that from the light curves alone we can place stars (to some degree of precision) on the HR diagram and predict other fundamental stellar properties.

Figure 1. Demonstration of how the Kepler light curves and stellar properties jointly vary across different regions of the HR diagram, illustrating the potential to learn the fundamental properties of a star from its light curve alone. The HR diagram (shaded in gray) is from the Berger et al. (2018) catalog of derived stellar radii and Teff for ∼150,000 stars. In the main panel, the insets show the original light curve, as well as the ACF and power spectrum computed from the light curve, at various positions along the RGB and main sequence. The lower panels indicate how various stellar properties (log g, Δν, Prot, and M*) from different catalogs also vary across the HR diagram.


2.2.2. The ACF

In addition to working with data in the original light-curve space, we also test building models based on the ACF of the time series. The ACF describes the strength of periodic signals present in time-series data by measuring the similarity of the time series with itself at different lags. The ACF has been shown to be an effective domain for measuring the surface gravity of stars (Kallinger et al. 2016), the rotation periods of main-sequence stars (McQuillan et al. 2014), as well as the temperatures, surface gravities, and asteroseismology observables of RGB stars (Ness et al. 2018).

For observations evenly spaced in time, $t_k = (k - 1)\,\Delta t$, the ACF at each lag $k$ is,

Equation (1):

$$\mathrm{ACF}(k) = \frac{\sum_{j=1}^{N-k} \left(f_j - \bar{f}\right)\left(f_{j+k} - \bar{f}\right)}{\sum_{j=1}^{N} \left(f_j - \bar{f}\right)^{2}}$$

where $f_j$ is the relative flux at time $t_j$, $\bar{f}$ is the mean flux, and $N$ is the number of observations. The numerator is the covariance between the time series and itself at lag k, and the denominator is the variance of the time series, which normalizes the ACF to be 1 at lag k = 0 and to be defined over the range [−1, 1] (e.g., see Ivezić et al. 2014, Chapter 10). To compute the ACF according to Equation (1), we first linearly interpolate the flux of each light curve to a common, evenly spaced time grid defined from 0 to 97.4 days with Δt = 0.0204 days (i.e., the long-cadence sampling).
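The normalized ACF of Equation (1) can be computed in a vectorized way; a minimal sketch (our own helper, not taken from the released code) is:

```python
import numpy as np

def acf(flux):
    """Normalized ACF of an evenly sampled light curve, per Equation (1)."""
    f = flux - flux.mean()
    n = len(f)
    # Covariance of the series with itself at lags k = 0, ..., n-1 ...
    cov = np.correlate(f, f, mode="full")[n - 1:]
    # ... divided by the lag-0 term (the variance), so that ACF(0) = 1.
    return cov / cov[0]
```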

The main panel of Figure 1 shows example ACFs for stars on the main sequence and RGB. For the two stars on the main sequence, we see that the second peak of the ACF corresponds to the rotation period of the star, with the peaks at later lags being integer multiples of the period. For the two RGB stars, however, the ACF shows less visible structure. For these stars, which do not exhibit strong rotation over the baseline of the data, the information contained in the ACF is more subtle. For example, granulation, as a stochastic process, is much less coherent than rotation, which results in a less structured imprint of this signal in the ACF.

2.2.3. The Frequency Domain

Another representation of stellar time-series data is in the frequency domain. The power spectrum of a star's light curve quantifies the strength of the flux signal across a range of timescales (T), represented as the spectral density (P) as a function of frequency (f = 1/T). The primary asteroseismology observables, ${\nu }_{\max }$ and Δν, are defined and identified in the power-spectrum representation of stellar light curves (e.g., Bedding et al. 2010; Yu et al. 2018). For discretely sampled data the fast Fourier transform (FFT) algorithm, which represents the light curves as a summation of sinusoidal functions, is typically used to compute the power spectrum of stellar time series. However, the FFT algorithm requires that the time series be regularly sampled over the entire observation window. In the case of unevenly sampled or missing data, an alternative method for generating a frequency-domain representation of time-series data is to compute a periodogram as an estimate of the true power spectrum. A commonly used algorithm in astronomy is the Lomb–Scargle (LS) periodogram (Lomb 1976; Scargle 1982), which is a least squares method for detecting sinusoidal periodic signals in time-series data.

To compute the LS periodogram of the Kepler light-curve data, we use the implementation provided by the astropy package. Following the recommendations of VanderPlas (2018), we compute the periodogram on a frequency grid with a minimum frequency of ${f}_{\min }$ = 0, a maximum frequency of ${f}_{\max }$ = 1/(2δt), and a frequency spacing of Δf = 1/(no T), where T is the baseline of the observations (e.g., 97.39 days for Q9) and no is the oversampling factor, which we set to no = 10. The value for the Nyquist frequency, ${f}_{\max }$, is a pseudo-windowing limit, where we take δt to be the most frequent spacing of the time-series observations (0.0204 days).
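A minimal sketch of this computation with astropy's LombScargle is given below; since the times are in days, the frequencies come out in day⁻¹, and we start the grid one spacing above zero because the power at exactly zero frequency is not meaningful:

```python
import numpy as np
from astropy.timeseries import LombScargle

def ls_periodogram(time, rel_flux, n_o=10, dt=0.0204):
    """LS periodogram on the frequency grid recommended by VanderPlas (2018)."""
    T = time.max() - time.min()           # baseline (~97.39 days for Q9)
    f_max = 1.0 / (2.0 * dt)              # pseudo-Nyquist limit, ~24.5 / day
    df = 1.0 / (n_o * T)                  # oversampled frequency spacing
    frequency = np.arange(df, f_max, df)  # grid from ~0 up to f_max
    power = LombScargle(time, rel_flux).power(frequency)
    return frequency, power
```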

In the main panel of Figure 1 we show example periodograms for stars at different locations in the HR diagram. Considering the two RGB stars, the frequency of the maximum power is a prominent feature of the power spectra. For the star with the larger stellar radius, ${\nu }_{\max }$ is at a lower frequency of 29 μHz, while the ${\nu }_{\max }$ of the star with a smaller stellar radius is at a higher frequency of 179 μHz. For main-sequence stars, ${\nu }_{\max }$ is not visible. For these stars, the frequency of the maximum power resides at frequencies greater than the range permitted by the Nyquist frequency (⪆240 μHz). Even though ${\nu }_{\max }$ lies beyond the frequency grid of the power spectra for these stars, the overall shape and other features of the spectrum contain useful information that can potentially be indicative of the properties of the star.

2.3. Stellar Property Catalogs

There are a number of catalogs in the literature providing stellar property estimates for Kepler stars. Many of these catalogs have stars in common, but no joint database exists. Here we systematically explore the intersection of several important and relevant catalogs for data-driven inference work. Figure 2 shows the coverage and set intersections (e.g., catalog 1 ∩ catalog 2) of six stellar property catalogs with stars that have Kepler Q9 light curves available. As seen in the figure, the Berger et al. (2018) catalog includes a majority of the Kepler stars, delivering estimates of R* and evolutionary state across the HR diagram. The catalog that provides stellar property estimates for the next greatest number of stars is the McQuillan et al. (2014) rotation period catalog for ∼30,000 main-sequence stars; following this, the Yu et al. (2018) and Pande et al. (2018) catalogs provide ${\nu }_{\max }$, Δν, M*, R*, and log g for ∼13,000 stars and ${\nu }_{\max }$, log g, and Teff for ∼10,000 stars, respectively, primarily for stars on the RGB. The remaining catalogs shown in Figure 2 provide stellar properties for fewer stars, with a minimum of ∼4000 required to be included in the figure.

Figure 2. UpSet plot (Lex et al. 2014) showing the set intersections of the Kepler stars with Quarter 9 light curves and the various stellar property catalogs available in the literature. The histogram indicates the number of stars contained in the set defined in each column, where the shaded circles indicate which catalogs are intersected. For conciseness, only the intersections that contain a minimum of 4000 stars are displayed. This plot demonstrates the various data sets that can be used to train a model to predict stellar properties from light curves.


As an initial proof of concept of our modeling approach, we focus on the three catalogs covering the greatest number of stars (McQuillan et al. 2014; Yu et al. 2018; Pande et al. 2018) where the stellar properties are homogeneously derived. However, models can certainly be tested on the other catalogs, as well as on a set of stellar properties combining the estimates from multiple catalogs. The stellar property catalogs compiled in Figure 2 demonstrate the various data sets that can be constructed and used to train data-driven models of stellar properties.

We now provide a brief description of how the stellar properties we predict in Section 5 are derived. For the Yu et al. (2018) sample, which includes RGB stars, we successfully recover the asteroseismology observables, Δν and ${\nu }_{\max }$, as well as log g, each of which is derived as follows:

  • 1.  
    Δν: derived from the Kepler 29.4 minute cadence data across available quarters using the SYD pipeline described in Huber et al. (2009), which considers the light curves in both the frequency and ACF domains of the data (see Huber et al. 2009 for details). The mean of the reported uncertainties on Δν is ∼0.05 μHz, and the mean fractional uncertainty is ∼1%.
  • 2.  
    ${\nu }_{\max }$: derived with the same pipeline as Δν (see Huber et al. 2009 for details). The mean of the reported uncertainties on ${\nu }_{\max }$ is ∼0.9 μHz, and the mean fractional uncertainty is ∼2%.
  • 3.  
    log g: derived, along with the mass and radius, from the scaling relations of Kjeldsen & Bedding (1995). The mean of the reported uncertainties on log g is ∼0.01 dex, and the mean fractional uncertainty is ∼0.5%.

We highlight that, while these labels have been determined using 4 yr of data, we test our inference using only one quarter of data, to simplify our analyses.

For the Pande et al. (2018) sample, which includes RGB as well as subgiant stars, we successfully recover Teff and log g, which are derived as follows:

  • 1.  
    Teff: taken from Mathur et al. (2017), who compiled temperatures from various sources including spectroscopic and photometric based measurements (see Mathur et al. 2017 for details). The mean of the reported uncertainties on Teff is ∼140 K, and the mean fractional uncertainty is ∼2.5%.
  • 2.  
    log g: determined from Kepler 29.4 minute cadence data based on an empirical relationship between log g, Teff, and ${\nu }_{\max }$, which has been established using the Fourier transform of the 1 minute cadence Kepler benchmark data set, consisting of ∼500 stars (Huber et al. 2011; Bastien et al. 2013). The mean of the reported uncertainties on log g is ∼0.25 dex, and the mean fractional uncertainty is ∼8%.

and finally for the McQuillan et al. (2014) sample, which covers main-sequence stars, we successfully recover the stellar rotation period, and weakly recover M*, which are derived as follows:

  • 1.  
    M*: derived from the Baraffe et al. (1998) isochrone models taking Teff as input, where Teff is either from the Kepler Input Catalog (KIC) or Dressing & Charbonneau (2013), if available. As reported in McQuillan et al. (2014), given a ∼200 K precision for the Teff estimates, the typical uncertainty on M* is ∼0.1 M⊙. Assuming a 0.1 M⊙ uncertainty across the entire stellar mass range, this translates to a mean fractional uncertainty of ∼12%.
  • 2.  
    Prot: derived from a minimum of 8 of the 12 Kepler 29.4 minute cadence quarters from Q3 to Q14. The rotation period for each star is identified using an automated peak identification procedure in the ACF domain (see McQuillan et al. 2013), excluding from the sample stars that are eclipsing binaries or Kepler objects of interest, as well as stars without convective envelopes (Teff > 6500 K). The mean of the reported uncertainties on Prot is ∼0.6 days, and the mean fractional uncertainty is ∼3%.

3. Methods

In this section we discuss our modeling approach, as well as outline our training and evaluation procedures. The modeling code is made publicly available on GitHub at https://github.com/kblancato/theia-net.

3.1. Modeling Approach

As demonstrated in Figure 1, the properties of stars and the traits of their light curves vary jointly across the HR diagram. Given these correlations, our goal is to predict the properties of a star based on its light curve alone. To achieve this, the model we choose should capture the time structure of the data. We use the word model to describe the infrastructure the neural network builds that connects the time-domain data to the physical parameters that describe it. We capture, with this model, how each flux value is related to other values in the time series. Typically, the time structure of light curves is characterized by transforming the data to either the ACF domain or the frequency domain, described in Sections 2.2.2 and 2.2.3, respectively. After performing these data transformations, informative features in these domains are identified and used to infer the stellar properties that the features are known to correlate with. These data transformations require additional computational time and preconceptions of how to transform the data to produce the features of interest. Transformations of the data may also result in information loss. Given these considerations, our goal in this paper is to build a model that can learn directly from the time-series data itself, requiring minimal preprocessing and no handcrafted feature engineering of the raw data.

To do this, we implement a 1D CNN to accomplish the supervised learning task of mapping light-curve data to stellar properties. CNN-based models have been used very successfully for many supervised learning tasks, particularly for image classification (e.g., Krizhevsky et al. 2017; He et al. 2015; Simonyan & Zisserman 2014; Goodfellow et al. 2014; Ronneberger et al. 2015). They are built from a hierarchy of artificial neural networks, known as "universal function approximators" (Hornik et al. 1990; Hornik 1991), which learn increasingly abstract representations of the input data, $\vec{X}$, by nonlinearly transforming the data through a series of hidden layers that relate $\vec{X}$ to an output prediction $\vec{Y}$. CNNs are a special class of neural network architecture that differ from fully connected neural networks by their inclusion of only partially connected, so-called convolutional, layers, which detect the topological structure of the input data, capturing how neighboring image pixels are related spatially, or how adjacent time-series measurements are related temporally. The convolution operation relates elements of the input data to each other through weight sharing. This makes the modeling more efficient and less prone to overfitting than the fully connected counterpart, by effectively reducing the number of model parameters that need to be learned. CNN models have been successfully used for a variety of tasks in astronomy, including the classification of galaxy morphology and properties based on galaxy images (e.g., Dieleman et al. 2015; Huertas-Company et al. 2018; Domínguez Sánchez et al. 2018), the prediction of characteristics of stellar feedback in CO emission maps (Van Oort et al. 2019; Xu et al. 2020), and the prediction of the 3D distribution of galaxies from the underlying dark matter distribution in large-volume cosmological simulations (Yip et al. 2019; Zhang et al. 2019).

With the CNN as our model of choice, the modeling approach we take is a so-called end-to-end discriminative approach. A model is learned from a set of objects for which both the input data (light curves) and the labels (stellar properties) that describe them are defined. The model takes the time-series light-curve data as input, and through the training process learns an informative set of features from the data that optimize the stellar property predictions. This procedure requires no handcrafted transformations or feature engineering of the data as a separate procedure before model training. For the task that we tackle here, of predicting the stellar properties from time-series data, there are a number of alternative models that can capture the time dependence of the data. We discuss an alternative method that is also suited to this problem, recurrent neural networks (RNNs), in the discussion.

3.2. Model Architecture

The CNN model architecture we implement has two convolutional layers, followed by three fully connected layers, which together perform the stellar property prediction. Given that the sizes of our training sets are of the order of $10^4$ examples, we define a relatively small network architecture so as to minimize the number of network parameters that need to be learned and to prevent overfitting. For comparison, AlexNet (Krizhevsky et al. 2017), with five convolutional layers and three fully connected layers, had a total of 60 million network parameters and was trained on 1.2 million images.

Figure 3 is a visual representation of the model, showing the operations performed to transform the light-curve data into a stellar property prediction. The left-most block represents the light-curve data itself, which has been preprocessed and scaled as described in Section 2.2.1. The first operation applied to the time-series data is a 1D convolution with one input channel, i.e., the scaled flux values at each time, and a specified number of output channels, NK, which corresponds to the number of learned kernels, each having its own weight matrix and bias. This makes the number of parameters to learn for each convolutional layer [(KW × KH) + 1] × NK, where KW is the kernel width and KH is the kernel height (in the 1D case KH = 1). The addition of one accounts for the single bias parameter learned per kernel. The convolution operation takes the input vector, $\vec{X}$, of length n(Xin), and transforms it into a new vector of length n(Xout), which is computed as:

Equation (2):

$$n(X_{\mathrm{out}}) = \left\lfloor \frac{n(X_{\mathrm{in}}) + 2P - D\,(K_W - 1) - 1}{S} \right\rfloor + 1$$

where P is the number of zeros padded to either side of the time series, D is the dilation factor, and S is the stride over which the convolution is taken. In Figure 3, the block to the immediate right of the light-curve data represents the output of the first convolution layer, where each of the NK output channels has a length described by Equation (2).
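Equation (2) is the same output-size formula used by torch.nn.Conv1d, and can be written as a small helper; the example input length below is approximate, assuming ∼97.4 days at the 0.0204 day cadence:

```python
def conv1d_output_length(n_in, kernel_width, padding=0, dilation=1, stride=1):
    """Output length of a 1D convolution, per Equation (2)."""
    return (n_in + 2 * padding - dilation * (kernel_width - 1) - 1) // stride + 1

# Example: a Q9 light curve of roughly 4774 cadences through the first
# convolution with K_W1 = 3, P1 = 4, S1 = 3 (one of the settings in Table 1).
print(conv1d_output_length(4774, kernel_width=3, padding=4, stride=3))  # 1594
```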

Figure 3. Schematic of the CNN architecture implemented to predict stellar properties from light curves. The left-most panel represents the input time-series data, the next two panels indicate the two convolutional layers with different kernel widths and output channels, the fourth and fifth layers show the two fully connected layers where each circle represents a hidden unit, and the last layer is the stellar property prediction. The symbols describing the network architecture are defined in Table 1. For the classification of evolutionary state, the last layer is replaced with a prediction of the probability of the star belonging to the RGB, subgiant branch, and the main sequence.


After each convolution, three additional operations are performed on the data before they are passed to the next layer of the model. First, an activation function is applied to introduce nonlinearities into the model; this captures nonlinear relationships between the data and the labels that describe them. We implement the commonly used Rectified Linear Unit (ReLU) activation function, defined as ${\max }(0,{\vec{X}}_{\mathrm{out}})$. Following the activation function, a pooling operation is applied. Pooling, or "downsampling," reduces the dimensionality of the data vector that will be passed to the following convolutional layer and aids in the prevention of overfitting. The pooling operation slides over the data vector and typically takes either the maximum or the average of the data values within each window, resulting in an output vector of length $n({\vec{X}}_{\mathrm{out}})$ = $n({\vec{X}}_{\mathrm{in}})$/Kpool when the stride is set equal to the pooling kernel width, Kpool. Lastly, batch normalization (Ioffe & Szegedy 2015) is applied to each output channel. Batch normalization addresses the problem of "internal covariate shift," in which the distribution of each hidden-layer value changes during training as the parameters of the previous layers are updated. To enforce that the distribution of the hidden-layer values is similar throughout the training process, the batch normalization operation standardizes the values of each hidden layer by subtracting the batch mean and dividing by the batch standard deviation. This operation adds two new parameters for the model to learn, which scale and shift the normalized vector, but it leads to faster and more stable training and also acts to regularize the model.

After the activation function, pooling, and batch normalization operations, a second convolution is performed on each of the NK output channels from the previous layer, operating in the same manner as the first convolution described above. Following this second convolution operation, an activation function, pooling, and batch normalization are again applied to the data. After the two convolution layers, the output channels produced by the second convolution are flattened to a single dimension, and the data are then passed to the fully connected part of the network. The fully connected part of the network, represented in the last four panels of Figure 3, is a typical multilayer perceptron (MLP). Each element of the flattened data vector produced by the second convolution layer is mapped to N1 hidden units in the first MLP layer, with each hidden unit having its own learnable weight parameter, w1. With the addition of a bias parameter, b1, the output of the first MLP layer is $\vec{h}(X) = f\left({\sum }_{i=0}^{n({\vec{X}}_{\mathrm{out}})}{w}_{1,i}{X}_{i}+{b}_{1}\right)$, where f is the specified activation function and n(w1,i) = N1. To this layer we also pass the scaled standard deviation of the flux values (light curve σ) for each light curve, to capture how the amplitude of the light curves varies across the sample. The second and third fully connected layers take the output of the layer immediately preceding them and perform the same operation, with each layer learning its own set of weights and biases.

The last operation of the model architecture, shown as the right-most panel of Figure 3, is the prediction of the output stellar property $\vec{Y}$. In the case of regression, $\vec{Y} = {\sum }_{i=0}^{{N}_{3}}{w}_{4,i}\,g{(X)}_{i}+{b}_{4}$, where n(w4,i) = 1. We experiment with one hyperparameter describing the fully connected part of the model architecture, the dropout probability, DFC, applied to the first and second MLP layers. Dropout is a form of regularization in which, during each training iteration, the values of a number of hidden units are randomly set to zero with a probability p (Hinton et al. 2012). The two dropout probabilities we consider are DFC = [0.0, 0.3].

The architecture described above is a smaller-capacity, 1D regression version of the "vanilla" end-to-end CNN architectures that are commonly used for the task of image classification. In Table 1, we summarize the parameters of the model architecture, and note which (hyper)parameters we experiment with varying, which we will discuss in Section 3.5. In the following sections we also describe how we split the data for training, validation, and testing, and we describe our training procedure and model evaluation metrics.
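To make the architecture concrete, the following PyTorch sketch assembles the layers in the order described above (convolution, ReLU, average pooling, batch normalization, then the fully connected stack with the extra light-curve σ feature). The class name, the dummy-input trick used to infer the flattened size, and other small details are our own assumptions, not a transcription of the released code:

```python
import torch
import torch.nn as nn

class LightCurveCNN(nn.Module):
    """Sketch of the Figure 3 / Table 1 architecture (regression version)."""

    def __init__(self, n_in, k_w1=3, p1=4, s1=3, k_w2=5, p2=1, s2=1,
                 dropout=0.3):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 64, kernel_size=k_w1, stride=s1, padding=p1)
        self.pool1 = nn.AvgPool1d(4)
        self.bn1 = nn.BatchNorm1d(64)
        self.conv2 = nn.Conv1d(64, 16, kernel_size=k_w2, stride=s2, padding=p2)
        self.pool2 = nn.AvgPool1d(2)
        self.bn2 = nn.BatchNorm1d(16)
        self.act = nn.ReLU()

        # Infer the flattened length by tracing a dummy input through the
        # convolutional stack.
        with torch.no_grad():
            n_flat = self._features(torch.zeros(1, 1, n_in)).shape[1]

        # Three fully connected layers (2048, 1024, 256 hidden units), with
        # dropout on the first two; +1 input for the light-curve sigma feature.
        self.fc = nn.Sequential(
            nn.Linear(n_flat + 1, 2048), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(2048, 1024), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 1),  # for classification: 3 outputs + softmax
        )

    def _features(self, x):
        x = self.bn1(self.pool1(self.act(self.conv1(x))))
        x = self.bn2(self.pool2(self.act(self.conv2(x))))
        return x.flatten(start_dim=1)

    def forward(self, x, sigma):
        # x: (batch, 1, n_in) scaled flux; sigma: (batch, 1) light-curve std.
        return self.fc(torch.cat([self._features(x), sigma], dim=1))
```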

Table 1. Model and Training Parameters

Parameter | Value(s)/Setting(s) | Description

Architecture:
  Nconv | 2 | number of convolutional layers
  NK,1 | 64 | number of output kernels
  KW,1 | 3, 5, 6, 8, 12, 20 | kernel width
  P1 | 4, 5, 2, 3, 1, 5 | padding
  S1 | 3, 3, 2, 2, 2, 2 | convolution stride
  Tpool,1 | average | pooling type
  Kpool,1 | 4 | width of pooling kernel
  fconv,1 | ReLU | convolution activation function
  NK,2 | 16 | same as above, for the second convolution
  KW,2 | 5, 8, 10, 12, 16, 30 |
  P2 | 1, 0, 0, 1, 1, 1 |
  S2 | 1, 2, 2, 2, 1, 1 |
  Tpool,2 | average |
  Kpool,2 | 2 |
  fconv,2 | ReLU |
  NFC | 3 | number of fully connected layers
  N1 | 2048 | number of hidden units in fully connected layer
  fFC,1 | ReLU | fully connected activation function
  DFC,1 | 0.0, 0.3 | dropout probability applied to fully connected layer
  N2 | 1024 | same as above, for the second fully connected layer
  fFC,2 | ReLU |
  DFC,2 | 0.0, 0.3 |
  N3 | 256 | same as above, for the third fully connected layer
  fFC,3 | ReLU |
  DFC,3 | 0.0 |

Optimization:
  optimizer | AdamW |
  α | 10^−5, 10^−4, 10^−3 | learning rate
  λ | 10^−5, 10^−1 | weight decay parameter
  ε | 10^−8, 10^−2 | numerical stability term
  ${ \mathcal L }$ | mean squared error (MSE) | loss function

Training:
  Nbatch | 256 | training batch size
  Nepochs | 800 | maximum number of training epochs
  Nstop | 50 | number of epochs without improvement before stopping
  Ntol | 10^−2 | early stopping tolerance


3.3. Data Sets

We split each of the data samples into three sets to form a training set (72%), a validation set (13%), and a test set (15%). This split of the data was chosen to include as many stars as possible in the training sets, while having at least a thousand stars in the validation and test sets so that they are representative of the entire parameter range. We test 50%-25%-25% and 90%-5%-5% train-validate-test splits for two parameters, Δν and Prot, and find only marginal differences in model performance, with variations in the r2 score of the best models being of the order of a few percent.

The full Yu et al. (2018) sample includes 10,755 stars with 7,769 in the training set, 1,372 in the validation set, and 1,614 in the test set. The full Pande et al. (2018) sample includes 13,439 stars with 9,709 in the training set, 1,714 in the validation set, and 2,016 in the test set. The full McQuillan et al. (2014) sample includes 27,001 stars with 19,507 in the training set, 3,443 in the validation set, and 4,051 in the test set. Figure 9, in the Appendix, shows the distribution of the stellar properties we learn for each sample, including Δν, ${\nu }_{\max }$, and log g for the Yu et al. (2018) sample, log g and Teff for the Pande et al. (2018) sample, and Prot and M* for the McQuillan et al. (2014) sample. In forming the training, validation, and test sets, we draw stars evenly from the underlying stellar property distribution.
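A minimal sketch of such a split is given below. It uses a simple random shuffle; drawing stars evenly from the underlying stellar property distribution, as described above, would correspond to a stratified variant of this. The helper name is our own:

```python
import numpy as np

def split_indices(n, train_frac=0.72, val_frac=0.13, seed=0):
    """72%-13%-15% train/validation/test split by shuffled indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(train_frac * n), int(val_frac * n)
    return (idx[:n_train],                  # training set
            idx[n_train:n_train + n_val],   # validation set
            idx[n_train + n_val:])          # test set
```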

The training sets are used to train a given model, i.e., to learn the optimal weights and biases of the network. The validation sets, which do not contribute to learning the network parameters, are used to evaluate the performance of the network throughout the training process, as well as to perform the hyperparameter selection. To prevent overfitting, we implement early stopping based on monitoring the loss of the validation set, which we describe further in Section 3.4. The test sets, which are independent of learning the network parameters, are used to evaluate the performance of the model after training has been terminated. We describe the model evaluation and selection procedure in more detail in Section 3.6.

For model training we scale the distribution of each stellar property we predict to the range [0, 1] by computing ${\vec{Y}}_{\mathrm{scaled}}$ as,

Equation (3):

$${\vec{Y}}_{\mathrm{scaled}} = \frac{\vec{Y} - {Y}_{\min }}{{Y}_{\max } - {Y}_{\min }}$$

where this operation is performed separately for the training, validation, and test sets to prevent information leakage. In this context, information leakage refers to when the distribution of one data set is incorrectly used to inform the scaling of another data set, making them no longer independent, which often leads to inflated model performance.
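In code, Equation (3) and its inverse (for reporting predictions in physical units) might look like the following sketch, applied to each set separately as described above; the helper names are our own:

```python
import numpy as np

def scale_labels(y):
    """Min-max scale labels to [0, 1], per Equation (3)."""
    y_min, y_max = y.min(), y.max()
    return (y - y_min) / (y_max - y_min), y_min, y_max

def unscale_labels(y_scaled, y_min, y_max):
    """Invert Equation (3) to recover physical units."""
    return y_scaled * (y_max - y_min) + y_min
```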

3.4. Training Procedure

The models are trained using NVIDIA Tesla GPUs. We implement our model architecture and training procedure in the machine-learning library PyTorch (Paszke et al. 2017), which includes the nn module that can be used to define a variety of network architectures, as well as compute model gradients and perform tensor computations with GPU support.

For our training task of predicting continuous stellar properties, the loss function, ${ \mathcal L }$, we optimize is the mean squared error (MSE), a common choice for regression problems. The mean squared difference between the true and predicted target values is computed as,

Equation (4):

$${ \mathcal L } = \frac{1}{{N}_{\mathrm{batch}}}\sum _{i=1}^{{N}_{\mathrm{batch}}}{\left({Y}_{i}-{\hat{Y}}_{i}\right)}^{2}$$

where Nbatch is the number of data examples in the batch, Yi is the true stellar property of interest, and ${\hat{Y}}_{i}$ is the predicted stellar property, computed through the series of convolution and fully connected network operations as described in Section 3.2.

For each model we train, the training data are batched into sets of Nbatch = 256 stars. For the Yu et al. (2018) sample this results in 31 training batches, for the Pande et al. (2018) sample this results in 38 training batches, and for the McQuillan et al. (2014) sample this results in 77 training batches. During each training iteration, the model is evaluated, and the model parameters are updated. One epoch of training has been completed once all of the training batches have been passed through the network. Batching the training data reduces the memory requirements during each training iteration, decreases the training time as the weights are updated more frequently, and acts to improve how well the model generalizes to unseen data.

The training procedure, which is typical for neural network models, can be summarized as follows: (1) forward pass of the batch through the network architecture to compute ${\vec{\hat{Y}}}_{\mathrm{batch}}$, (2) compute ${ \mathcal L }$ according to Equation (4), (3) backpropagation of ${ \mathcal L }$ through each layer of the network architecture, (4) compute ${{\rm{\nabla }}}_{\vec{\theta }}{ \mathcal L }$, the gradient of the loss function with respect to each model parameter $\vec{\theta }$, and (5) update the value of each model parameter to minimize ${ \mathcal L }$. Steps 1 through 5 are repeated for every training batch iteration, and the model is trained for Nepochs = 800 epochs or until an early stopping criterion is met. During training, we also compute the loss function for the validation set described in Section 3.3. At the beginning of each training epoch, steps 1 and 2 listed above are carried out on the validation data set, and the loss is monitored as the model trains. As the validation set is not used to update the model weights, the performance of the model on this data set is diagnostic of how generalizable the model is to new data. To combat overfitting, we implement an early stopping criterion based on the validation loss as a function of epoch: if the validation loss does not improve for Nstop = 50 consecutive epochs within a tolerance of Ntol = $10^{-2}$, then training is terminated and the model parameters from before the validation loss ceased to improve are saved as the final model.
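A sketch of steps 1-5 with the validation-based early stopping is given below; the loaders are assumed to yield batches of scaled flux, light-curve σ, and scaled labels, and the variable names are our own, not taken from the public code:

```python
import copy
import torch

def train(model, train_loader, val_loader, optimizer,
          n_epochs=800, n_stop=50, tol=1e-2):
    loss_fn = torch.nn.MSELoss()                       # Equation (4)
    best_loss, best_state, stall = float("inf"), None, 0

    for epoch in range(n_epochs):
        # Monitor the validation loss at the start of each epoch (steps 1-2).
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x, s), y).item()
                           for x, s, y in val_loader) / len(val_loader)
        if val_loss < best_loss - tol:
            best_loss, stall = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stall += 1
            if stall >= n_stop:                        # N_stop epochs, no gain
                break

        # One epoch over the training batches (steps 1-5).
        model.train()
        for x, s, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x, s), y)             # forward pass + loss
            loss.backward()                            # backpropagation
            optimizer.step()                           # parameter update

    model.load_state_dict(best_state)                  # restore best model
    return model
```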

To update the values of the model parameters during training (i.e., step 5 above), we use PyTorch's implementation of the adaptive moment estimation (Adam) optimization method (Kingma & Ba 2014) with "Decoupled Weight Decay Regularization" (AdamW; Loshchilov & Hutter 2017). AdamW is an adaptive learning rate optimization method that computes individual learning rates for each model parameter based on the exponential moving averages of the first and second moments of the loss function gradient, ${{\rm{\nabla }}}_{\vec{\theta }}{ \mathcal L }$, with two parameters, β1 and β2, that set the exponential decay rates for each moment. For the models we train, we fix the exponential decay parameters to their defaults in the original Adam paper, β1 = 0.9 and β2 = 0.999. Adam differs from traditional stochastic gradient descent, which uses a single learning rate for all parameters throughout the duration of the training process. Each model parameter, θi, is updated at time step t:

Equation (5):

$${\theta }_{i}^{t+1} = {\theta }_{i}^{t} - \alpha \,\frac{{\hat{m}}_{i}^{t}}{\sqrt{{\hat{v}}_{i}^{t}} + \epsilon }$$

where ε = $10^{-8}$ is typically added to promote numerical stability, ${\hat{m}}_{i}^{t}$ is the bias-corrected exponential average of the first moment of the gradient with respect to parameter ${\theta }_{i}^{t}$, and ${\hat{v}}_{i}^{t}$ is the bias-corrected exponential average of the second moment of the gradient with respect to parameter ${\theta }_{i}^{t}$, both computed as defined in Kingma & Ba (2014).

The initial learning rate, α, controls the step size at which the model parameters are updated. The optimal learning rate is problem specific, but it is typically set in the range [$10^{-4}$, $10^{0}$]. A learning rate that is too low can result in training that takes many iterations to find a minimum of the loss function, and without sufficient training time the parameter space may not be adequately explored and a local-minimum solution is returned. A learning rate that is too high can overstep minima of the loss function and ultimately fail to converge on a desirable solution. As will be described in Section 3.5, we experiment with three different initial learning rate values, α = [$10^{-5}$, $10^{-4}$, $10^{-3}$], and choose the one that leads to the best results for predicting a given stellar property.

Lastly, to introduce regularization into the optimization routine, we add a weight decay term to the loss function described in Equation (4). The weight decay term, λθ², is a typical L2 regularization that penalizes model parameters that become too large, with the strength of the penalty set by λ. As described in the following section, we test two different weight decay parameters, λ = [$10^{-5}$, $10^{-1}$].
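In PyTorch these optimization choices map directly onto torch.optim.AdamW; one point in the hyperparameter grid of the next section would be configured as follows (assuming a model as sketched in Section 3.2):

```python
import torch

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,             # initial learning rate, alpha
    betas=(0.9, 0.999),  # exponential decay rates for the two moments
    eps=1e-8,            # numerical stability term, epsilon
    weight_decay=1e-5,   # weight decay strength, lambda
)
```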

3.5. Model Hyperparameters

As is evident in Table 1, there are numerous parameters that must be set to define the model architecture as well as the training procedure. These "hyperparameters" are parameters whose values are determined before training begins and are not updated through the course of the training process. As the dimensionality of the hyperparameter space is large, it is not feasible to evaluate all possible hyperparameter combinations and the effect each has on model performance. However, as an improvement over choosing values ad hoc or purely empirically, we heuristically choose a small set of hyperparameters that are varied systematically and perform a grid search over the combinations. We train one model with each hyperparameter combination defined in the grid and select the preferred hyperparameter values based on how the model performs on the validation data set. Limiting the search to five hyperparameters, we test varying KW, α, λ, ε, and DFC. We define a grid over the values of these parameters and train a model with each combination of hyperparameters. For each run, all other architecture and training hyperparameters are set to the values listed in Table 1.

As described in Section 3.4, the optimal learning rate is problem specific, and choosing too low or too high a rate can result in poor model performance. We therefore experiment with three values for the initial learning rate, α = [$10^{-5}$, $10^{-4}$, $10^{-3}$]. We also experiment with two values for the weight decay parameter, λ = [$10^{-5}$, $10^{-1}$]. We prioritize varying this parameter because the amount of regularization in the optimization procedure directly impacts the values of the model parameters and controls how well the model generalizes to unseen data. We also experiment with two values of the numerical stability term ε, testing both $10^{-8}$ and $10^{-2}$.

In addition to the optimization-related hyperparameters, we also test varying one of the model architecture parameters. Motivated by our physical understanding of how information about different stellar properties is encoded at different timescales in the light curves, we test varying the kernel widths of the convolution layers. Presumably, smaller kernel widths are more sensitive to information encoded on shorter timescales, while larger kernel widths will pick up information imprinted on longer timescales. We choose six different kernel widths to test for the first convolution layer, KW,1 = [3, 5, 6, 8, 12, 20], which corresponds to convolution over timescales of tconv,1 = [0.061, 0.102, 0.123, 0.163, 0.245, 0.408] days, respectively. For the second convolution layer we choose larger kernel widths, KW,2 = [5, 8, 10, 12, 16, 30], where each element in KW,2 is paired with its corresponding element in KW,1. After the second kernel is applied, these kernel widths result in time series that are convolved over timescales of tconv,2 = [0.306, 0.817, 1.23, 1.96, 3.92, 12.25] days, respectively. To ensure that each convolution and pooling operation results in an integer number of output data elements, we modify the zero-padding and stride parameters for each layer as necessary. The values of P1, P2, S1, and S2 for each element of KW,1 and KW,2 are listed in Table 1 and paired explicitly in the sketch below.
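The full grid can be enumerated directly; the kernel/padding/stride pairings below follow Table 1, and the product recovers the 144 models per stellar property quoted in Section 3.6:

```python
from itertools import product

conv_settings = [  # (K_W1, P1, S1, K_W2, P2, S2), paired as in Table 1
    (3, 4, 3, 5, 1, 1), (5, 5, 3, 8, 0, 2), (6, 2, 2, 10, 0, 2),
    (8, 3, 2, 12, 1, 2), (12, 1, 2, 16, 1, 1), (20, 5, 2, 30, 1, 1),
]
learning_rates = [1e-5, 1e-4, 1e-3]  # alpha
weight_decays = [1e-5, 1e-1]         # lambda
epsilons = [1e-8, 1e-2]              # epsilon
dropouts = [0.0, 0.3]                # D_FC

grid = list(product(conv_settings, learning_rates, weight_decays,
                    epsilons, dropouts))
print(len(grid))  # 6 * 3 * 2 * 2 * 2 = 144 models per stellar property
```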

3.6. Model Evaluation and Selection

As described in Section 3.3, we select the best model (over the grid of hyperparameters tested) based on the model's performance on the validation set. The validation set is not used to train the model, and thus yields a more realistic report of how the model performs on unseen data. To assess the performance of a given model, we compute three evaluation metrics: the coefficient of determination (r2), the bias (Δ), and the rms. The r2 score is computed as,

$$r^{2} = 1 - \frac{\sum_{i=1}^{N}\left(Y_{i} - \hat{Y}_{i}\right)^{2}}{N\sigma^{2}} \tag{6}$$

where Y and $\hat{Y}$ are the true and model-predicted values of the dependent variable, N is the number of observations in the validation or test set, and σ2 is the variance of Y. An r2 score closer to 1 indicates that the model predicts the variation in Y well, whereas an r2 score of 0 indicates that the model does not capture any of the variation. The bias and rms of the estimator are computed as,

$$\Delta = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{Y}_{i} - Y_{i}\right) \tag{7}$$

and

$$\mathrm{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{Y}_{i} - Y_{i}\right)^{2}} \tag{8}$$

respectively. Both of these metrics are in units of the stellar property, Y, where smaller bias and rms values indicate better model performance.
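As a concrete sketch of these three metrics (with the sign convention assumed to be predicted minus true, matching the fractional-difference panels discussed later):

```python
# Sketch of the evaluation metrics in Equations (6)-(8).
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination, Equation (6)."""
    return 1.0 - np.mean((y - y_hat) ** 2) / np.var(y)

def bias(y, y_hat):
    """Bias of the estimator, Equation (7); assumed sign: predicted - true."""
    return np.mean(y_hat - y)

def rms(y, y_hat):
    """Root-mean-square error, Equation (8)."""
    return np.sqrt(np.mean((y_hat - y) ** 2))
```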

For each stellar property that we predict for the Yu et al. (2018), Pande et al. (2018), and McQuillan et al. (2014) samples, we train 144 models with the hyperparameter choices described in Section 3.5. Each of these models is trained according to the procedure outlined in Section 3.4, with the model architecture described in Section 3.2. Based on the same validation set for each stellar property, we compute the r2, Δ, and rms for each of the 144 models we train to predict the property. For each stellar property, we select the best model according to a two-step procedure. If possible, we first eliminate all models that do not satisfy both of the following criteria, i.e., models with bias greater than 10% of the mean, or rms greater than 50% of the standard deviation, of the validation set distribution of the stellar property:

$$\left|\Delta\right| < 0.1\,\mu\left(Y_{\mathrm{val}}\right) \tag{9}$$

$$\mathrm{rms} < 0.5\,\sigma\left(Y_{\mathrm{val}}\right) \tag{10}$$

After eliminating models that do not meet both of the criteria above, we rank the remaining models according to their r2 scores. We then visually inspect the performance of the top 10 models trained for each property and select the model with the highest r2 score that does not exhibit structure in the true-versus-predicted plots for the validation set. As is evident from comparing Equations (6), (7), and (8), the three evaluation metrics are closely related, so a high r2 score is correlated with small Δ and rms values. Depending on the specific use case of the stellar property predictions, this model selection process can be easily modified to emphasize a different evaluation metric. The final performance results we show in the following sections are based on the test set performance, which are data that were not used to train, validate, or select the best model.
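A minimal sketch of this two-step selection, with `models` a hypothetical list of dictionaries holding precomputed validation metrics:

```python
# Sketch of the two-step model selection: apply the cuts of Equations (9) and
# (10), then rank by r^2; the top-ranked models are inspected visually.
import numpy as np

def shortlist(models, y_val, n_top=10):
    mu, sigma = np.mean(y_val), np.std(y_val)
    passed = [m for m in models
              if abs(m["bias"]) < 0.10 * abs(mu) and m["rms"] < 0.50 * sigma]
    candidates = passed if passed else models   # "if possible" fallback
    return sorted(candidates, key=lambda m: m["r2"], reverse=True)[:n_top]
```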

4. Classification of Evolutionary State

Before attempting the regression problem described in Section 3, we start with the broader task of predicting a star's evolutionary state based on its light curve. By determining a star's general location on the HR diagram, this classification task serves as an initial probe of our modeling capabilities, before we move on to the task of predicting continuous (as opposed to categorical) stellar properties. This classification model could also serve as a front end to an automated stellar property derivation pipeline.

4.1. Data

To train the classification model we build a data set based on the overlap between the stars listed in the Berger et al. (2018) catalog and the stars with Kepler Q9 light curves, which includes ∼150,000 stars as shown in Figure 2. Stars in the Berger et al. (2018) catalog are classified into three evolutionary states: main sequence, subgiant, or RGB, based on fitting solar-metallicity evolutionary tracks to the transition between the end of the main sequence and the start of the RGB in the temperature–stellar radius plane, as shown in Figure 5 of Berger et al. (2018). Of the total catalog, 67% of stars are classified as main-sequence stars, 21% as subgiant stars, and 12% as RGB stars. We randomly sample the same number of stars from the three classes to ensure a balanced classification problem, with the data set including 13,355 stars each from the main sequence, subgiant branch, and RGB (which includes red clump stars), totaling 40,065 stars. We split this data set into three parts as described in Section 3.3, which results in 28,945 stars in the training set, 5,109 stars in the validation set, and 6,010 stars in the test set.
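A sketch of this balancing and splitting follows; the catalog column name and random seed are assumptions for illustration:

```python
# Sketch: draw a class-balanced sample and split it into training, validation,
# and test sets. The "evol_state" column name and the seed are assumptions.
import pandas as pd

def build_splits(catalog: pd.DataFrame, n_per_class=13355, seed=0):
    balanced = (catalog.groupby("evol_state", group_keys=False)
                       .apply(lambda g: g.sample(n=n_per_class, random_state=seed)))
    shuffled = balanced.sample(frac=1, random_state=seed)   # shuffle all rows
    n_train, n_val = 28945, 5109                            # remainder is the test set
    return (shuffled.iloc[:n_train],
            shuffled.iloc[n_train:n_train + n_val],
            shuffled.iloc[n_train + n_val:])
```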

4.2. Methods

For the classification problem we make two main modifications to the model, one to the model architecture described in Section 3.2 and one to the training procedure described in Section 3.4. First, instead of the output of the final fully connected layer of the model being a single value (as shown in Figure 3), for the classification problem the output of the model is equal to the number of distinct classes C (in this case, C = 3). To convert the model output to a prediction probability over classes we apply the softmax function, $\sigma(y)_{i} = e^{y_{i}}/\sum_{j=1}^{C} e^{y_{j}}$, and assign each star to the class with the highest probability. The second change we make is to the loss function. Instead of computing the mean square error described by Equation (4), we compute the cross entropy loss, which is appropriate for training multiclass classification problems. The cross entropy over C classes is computed as,

$$\mathrm{CE} = -\sum_{c=1}^{C} Y_{c}\hat{Y}_{c} + \log\sum_{j=1}^{C} e^{\hat{Y}_{j}} \tag{11}$$

where the first term contains Yc, the indicator variable of the star's true class membership, and the second term is the log of the sum of the unnormalized class probabilities ${\hat{Y}}_{j}$ over the C classes output from the model. In addition to the above changes to the model architecture and loss function, we also evaluate the model performance with metrics that are relevant for classification models, which differ from the metrics used in Section 3.6. We focus on three performance metrics: the accuracy, the average precision, and the area under the receiver operating characteristic curve. The multiclass accuracy, which quantifies the number of correct predictions averaged across the C classes, is computed as,

$$\mathrm{Accuracy} = \frac{1}{C}\sum_{j=1}^{C}\frac{\mathrm{TP}_{j} + \mathrm{TN}_{j}}{N} \tag{12}$$

where N is the total number of stars in the validation or test set, TPj is the number of stars correctly identified as belonging to class j (i.e., true positives), and TNj is the number of stars correctly identified as not belonging to class j (i.e., true negatives). An accuracy closer to unity indicates better model performance.

We also compute the average precision across classes (in a one-versus-rest manner), which summarizes the precision–recall curve. Given the class probabilities output by the model as described above, different probability thresholds can be placed to define the boundary between the classes. Precision, defined as P = TP/(TP + FP), where FP is the number of false positives, measures how many correct predictions are made for stars belonging to a certain class at a given threshold. Recall, R = TP/(TP + FN), where FN is the number of false negatives, measures how many stars belonging to a class are recovered from the total population of that class. The precision–recall curve describes the trade-off between precision and recall at different class threshold boundaries, with the best threshold being one that produces both high precision and high recall. The average precision, computed as,

$$\mathrm{AP} = \sum_{n}\left(R_{n} - R_{n-1}\right)P_{n} \tag{13}$$

where Pn, Rn, and Rn−1 are the precision and recall values at the nth and (n−1)th probability thresholds, is the weighted mean of the precisions at each recall threshold, with AP closer to unity indicating better model performance. In addition to accuracy and average precision, we also measure model performance by computing the area under the receiver operating characteristic curve (AUROC). At different probability thresholds, the receiver operating characteristic (ROC) curve shows the true positive rate, TPR = TP/(TP + FN) (i.e., the recall), as a function of the false-positive rate, FPR = FP/(FP + TN), which describes the number of stars incorrectly classified as belonging to a class relative to the total number of stars that do not belong to the class. Models with low FPRs and high TPRs indicate good performance, which corresponds to an AUROC closer to unity.
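As a sketch, all three metrics can be computed with scikit-learn from the softmax probabilities; note that `accuracy_score` returns the overall accuracy, a close relative of the per-class average in Equation (12). The integer label encoding below is an assumption:

```python
# Sketch: classification metrics from softmax outputs, one-vs-rest for AP and
# AUROC. Assumes integer labels y_true in {0: MS, 1: SUB, 2: RGB} and an
# (N, 3) array of class probabilities.
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score
from sklearn.preprocessing import label_binarize

def classification_metrics(y_true, probs):
    y_pred = probs.argmax(axis=1)                 # class with highest probability
    onehot = label_binarize(y_true, classes=[0, 1, 2])
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "AP": [average_precision_score(onehot[:, c], probs[:, c]) for c in range(3)],
        "AUROC": [roc_auc_score(onehot[:, c], probs[:, c]) for c in range(3)],
    }
```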

While the various classification metrics described above are related, they each emphasize different aspects of the model performance. The accuracy is the most general, measuring the fraction of total correct predictions. While this is a good overall metric of model performance, for more specific use cases of the predictions it is often not detailed enough. The precision captures how often the model is correct when it predicts a specific class instance, which is relevant when the consequences of a false-positive prediction are high. On the other hand, the recall, which captures the fraction of a class that is correctly identified, is relevant when the consequences of a false-negative prediction are high. Depending on the specific application of the classification model, it can be important to consider these different metrics together, and not the accuracy alone. For example, if the classifier is used to select targets for follow-up spectroscopy of one class, low precision would waste observing time on false positives, while low recall would leave much of the class unobserved; either shortcoming leads to an inefficient observing program.

For the classification problem, we perform the same hyperparameter grid search as described in Section 3.5, training a total of 144 models. In the following section we report all of the metrics described above; however, as our goal here is to demonstrate the general performance of the classification model, we select the best model based on which hyperparameter combination results in the best overall accuracy on the validation set. The model performance reported in the next section is on the independent test set.

4.3. Results

Figure 4 shows the performance of the best model we train, evaluated using the metrics described above, to classify stars as main sequence, subgiant, or RGB based on their light curves. The middle panel of the figure shows the confusion matrix, with the true class labels along the y-axis and the predicted class labels along the x-axis. Stars that fall into the diagonal bins are correctly classified by the model, while stars that fall into the off-diagonal bins are incorrectly classified. As evident from the confusion matrix, the model performs best at distinguishing RGB stars from the other evolutionary states, at an accuracy of 91%. Of the remaining true RGB stars, 3% are misclassified as main-sequence stars and 6% are misclassified as subgiant stars. Examining the predictions for main-sequence stars, 56% are correctly classified, while 40% are misclassified as subgiants and 4% as RGB. For the subgiant stars, only 57% are classified correctly, while 34% are incorrectly classified as main-sequence stars and 9% are misclassified as RGB stars.


Figure 4. Evolutionary state classification performance of the CNN model for a test set of stars. Left panel: precision–recall curve showing the one vs. rest classification of stars as belonging to the main sequence (MS), subgiant branch (SUB), or RGB. Middle panel: confusion matrix showing the number of false positives and false negatives for each class on the off-diagonal entries. Right panel: one vs. rest receiver operating characteristic curve for each evolutionary state.


The precision–recall and ROC curves in Figure 4 show how varying the class discrimination threshold based on the prediction probabilities results in classifiers with different performance properties. The precision–recall curve shows that the RGB stars are clearly separable from the main-sequence and subgiant stars, with high precision values maintained at most recall thresholds, resulting in an average precision of AP = 0.95. The main-sequence and subgiant stars exhibit worse performance, with average precisions of AP = 0.65 and AP = 0.57, respectively. The ROC curve shows similar behavior with regard to the classification performance. The TPR for the RGB stars is high across nearly the entire range of FPR thresholds, with an AUROC = 0.97. For the main-sequence and subgiant branch stars, high TPRs are only achieved along with higher FPRs. The TPR of main-sequence stars reaches ∼0.95 at FPRs greater than 0.5, with an AUROC = 0.81, and the TPR of subgiant stars reaches ∼0.95 at FPRs greater than 0.6, with an AUROC = 0.77. This is still better than the performance of a random classifier, characterized by an AUROC = 0.5.

To summarize, we find that the model does well at distinguishing between main-sequence and RGB stars, but mixes up the identification of a significant portion of main-sequence and subgiant branch stars. The performance of the classification model likely reflects the fact that the light curves of main-sequence and RGB stars differ enough to be informative of these evolutionary states, but that light curves vary across large regions of the HR diagram in a continuous (rather than discrete) manner. Part of the reason for the poorer results may also be the quality of the classifications. Due to the lack of spectroscopy, Berger et al. (2018) used solar-metallicity isochrones to separate evolutionary stages, which will introduce significant noise, as the exact border between main-sequence and subgiant stars is sensitive to metallicity. In contrast, the subgiants and red giants are clearly separated by luminosity with relatively small dependence on metallicity, thus yielding more accurate classifications.

5. Predicting Stellar Properties

5.1. Results of CNN Stellar Property Recovery

In the previous section, we demonstrated the potential of using a 1D CNN model in the time domain to classify a star's evolutionary state. We now turn our attention to the main goal of this paper, which is to predict stellar properties from light-curve data. As shown in Figure 2, there are many possible training sets that can be constructed to predict a variety of stellar properties given the catalogs that are available in the literature. Here, we focus on the three catalogs with the largest numbers of stars available: the Yu et al. (2018) catalog, which includes 10,757 stars; the Pande et al. (2018) catalog, which includes 13,441 stars; and the McQuillan et al. (2014) catalog, which includes 32,920 stars. We split each of these stellar samples into a training set, a validation set, and a test set as described in Section 3.3, and train individual models for each sample and stellar property combination according to the procedure described in Section 3.4. We perform the hyperparameter search as described in Section 3.5, and for each parameter we present the predictions resulting from the best of the 144 models trained, selected as described in Section 3.6.

Figure 5 shows the stellar property predictions for each sample's test set derived from the best-trained models. For each stellar property, we show in the top panel the true stellar property value versus the model-predicted value, where the one-to-one line indicates a perfect prediction. In the bottom panels, we show the fractional difference between the model-predicted and the true stellar property values, as a function of the true values. The bottom panel therefore more clearly highlights the regions of parameter space where the model is biased. In this panel we also indicate the regions of 3 and 5 standard deviations from a perfect prediction (i.e., from zero fractional difference), except for the Prot panel, which shows −0.5σ to 1σ. For each stellar property the mean (μ) and standard deviation (σ) of the true test set values are also indicated, as well as the model evaluation metrics, r2, Δ, and rms, with the fractional bias and rms in parentheses.


Figure 5. Performance of the best CNN model (as selected in Section 3.6) in predicting ${\nu }_{\max }$, Δν, and log g for the test set of Yu et al. (2018) RGB stars; log g and Teff for the test set of Pande et al. (2018) stars; and Prot and M* for the test set of McQuillan et al. (2014) stars. For each predicted stellar property, we show both the predicted values (top panels) and the fractional difference between the predicted and true values (bottom panels) as a function of the true stellar property. The r2, Δ, and rms of the predictions are indicated in each panel, as well as the bias and rms of the fractional differences, shown in parentheses. We also report the mean (μ) and standard deviation (σ) of the distribution of the true property values. The red line indicates a perfect prediction, and the shaded regions in the bottom panels indicate the standard deviations of the fractional difference (3σ light gray, 5σ dark gray). For Prot, the y-axis spans −0.5σ to 1σ to show the prediction quality across the entire range of true values. We note that the fractional metrics for the Prot predictions are greatly inflated by the overprediction of short-period stars. If we remove the stars with fractional differences >2.5, the fractional bias and fractional rms become 0.016 and 0.30, respectively.


First, we examine the predicted stellar properties based on the Yu et al. (2018) RGB stellar sample, showing the ${\nu }_{\max }$, Δν, and log g predictions in the top row of Figure 5. As seen in the figure, we recover all three of these stellar properties well, with r2 scores greater than 0.95. Demonstrating the importance of the hyperparameter search, the worst-performing models for these three parameters result in r2 values of ∼0.8–0.85. Examining the best ${\nu }_{\max }$ model, the overall bias of Δ = −3.5 μHz is ∼5% of the mean of the true test set values, while the rms of the predictions is 11.85 μHz, over the range of ${\nu }_{\max }$ values from 5 to 250 μHz. Considering the prediction quality as a function of ${\nu }_{\max }$, we see that the predictions for ${\nu }_{\max }$ values less than ∼10 μHz and greater than ∼150 μHz are more biased. This is seen most clearly in the bottom panel of Figure 5, which shows the fractional difference, with some predictions falling in the 5σ range (and eight examples, comprising 0.5% of the test set, falling outside the plot limits). As seen in Figure 9 in the Appendix, the Yu et al. (2018) sample includes far fewer stars with these smaller and larger ${\nu }_{\max }$ values, meaning there are fewer examples for the model to learn this region of the parameter space well during training.

Similar to the ${\nu }_{\max }$ prediction, Δν for the Yu et al. (2018) sample is also recovered well. The overall bias of Δ = −0.17 μHz is ∼2.5% of the mean of the true test set values, and the rms of the predictions is 0.89 μHz over the range of Δν values from 0.9 to 18.8 μHz. As with ${\nu }_{\max }$, the predictions are more biased at the smallest and largest values, for Δν less than ∼2 μHz and greater than ∼12 μHz. This is seen most clearly in the fractional difference plot, with a few predictions falling in the 5σ range (and eight examples, comprising 0.5% of the test set, falling outside the plot limits). Again, as with ${\nu }_{\max }$, the Yu et al. (2018) sample includes far fewer stars with these smaller and larger Δν values, which means that there are fewer examples for the model to learn this region of the parameter space well at training time.

We note that the failure of the model at the lowest Δν and ${\nu }_{\max }$ (and also the lowest log g values) in particular may not be a consequence of the sparsity and low number of training examples in this region of parameter space. The model may break down for these stars due to a degeneracy between the data and the labels that describe the data, or because the data–label relationship changes discretely rather than smoothly, due to underlying physical processes. We note that the mass determination is very poor at the low-mass end; in effect, mass cannot be inferred below about 0.7 solar masses, as this information is not captured in the data. Follow-up work seeking to physically interpret the label inference (where the information comes from, and how and why) will elucidate this failure mode.

The final stellar property we predict for the Yu et al. (2018) stellar sample is log g. This property is also recovered well by the best-trained model, with an overall bias of Δ = 0.01 dex and an rms of the predictions of 0.06 dex, over the range of log g values from 1.6 to 3.3 dex. Considering the prediction quality across the range of log g values, we again find that the predictions are more biased in the parameter space regions with fewer representative stars in the training set, as shown in Figure 9 of the Appendix. As evident in the bottom panel of Figure 5, there is more bias in the predictions for stars with log g values less than 2 dex and also greater than 3.2 dex (and 11 examples comprising 0.7% of the test set fall outside the plot limits).

We now discuss the property recovery for the Pande et al. (2018) stellar sample, which, as shown in Figure 1, includes stars from the RGB as well as the subgiant branch and the upper main sequence. The first property we consider is log g. As seen in Figure 5, the best CNN model recovers log g with an r2 score of 0.89, an overall bias of Δ = 0.07 dex, which is ∼2% of the mean of the true test set values, and an rms of 0.22 dex over the range of log g values from 2 to 4.8 dex. In the fractional difference plot (which excludes six examples, comprising 0.3% of the test set, given the axis limits), we see that for log g values less than ∼3.75 dex, there is a systematic positive bias. This is perhaps caused by the model trying to correctly predict the larger number of less evolved stars (log g ∼ 4) at the expense of biasing the log g predictions for the red giants. Compared to the recovery of log g for the Yu et al. (2018) sample of RGB stars, log g is recovered less precisely for the Pande et al. (2018) sample, as evident from both the difference in r2 scores between the two models (0.89 for Pande et al. 2018 versus 0.97 for Yu et al. 2018) and the higher rms of the Pande et al. (2018) model, at 0.22 dex compared to 0.06 dex for the Yu et al. (2018) sample. One reason for the difference in the log g prediction quality between these two stellar samples is the precision of the stellar properties used to train the models. As mentioned in Section 2.3, Yu et al. (2018) use asteroseismology, with an uncertainty of 0.01 dex on the derived log g values, while the reported uncertainty on the Pande et al. (2018) log g values based on granulation is much higher, at ∼0.25 dex.

The other property we successfully predict for the Pande et al. (2018) stellar sample is Teff. With an r2 score of 0.79, the bias of the best Teff model is Δ = −66.4 K, which is ∼1% of the mean of the true test set values, and the rms is 310 K over a range of temperatures from 4520 K to 7123 K. As seen in the fractional difference plot of Figure 5 (which excludes two examples comprising 0.1% of the test set), the prediction quality varies within each mode of the bimodal Teff distribution: for the cluster of stars with Teff ∼ 5000 K, the bias is larger at both cooler and hotter temperatures, and similarly for the cluster of stars with Teff > 5500 K. Of the properties we have discussed so far, for both the Yu et al. (2018) and Pande et al. (2018) samples, the prediction of Teff is the least precise, achieving an r2 score of ∼0.8 compared to the r2 scores greater than 0.9 achieved for log g, ${\nu }_{\max }$, and Δν. This is expected due to the more indirect relation of Teff to the physical processes causing brightness variations. Granulation and oscillation amplitudes are predominantly determined by the evolutionary state (i.e., log g, radius, and luminosity), which is only indirectly traced by the effective temperature of a star. This is particularly the case for main-sequence and subgiant stars, which can have a wide range of temperatures for a given log g. It is also consistent with the larger spread toward hotter Teff in Figure 5.

Finally, the last stellar sample we make predictions for is the McQuillan et al. (2014) sample, which, as shown in Figure 1, includes stars from across the main sequence with temperatures ranging from Teff = 3500–7000 K. The first stellar property we consider for this sample is the rotation period, which, as discussed in Section 1, is of particular interest for its potential use as a probe of stellar age. With an r2 score of 0.77, the bias of the best Prot model is Δ = −0.34 days, which is ∼2% of the mean of the true test set values, and the rms is ∼5 days over the range of periods from 0.2 to 66 days. In the bottom panel of Figure 5, we show the fractional difference of the predictions as a function of Prot, spanning −0.5σ to 1σ from the line of perfect prediction, which excludes 35 stars comprising 0.9% of the test set. These excluded stars are fast rotators; the prediction quality for stars with Prot ⪅ 5 days is the most biased. The large reported fractional metrics are inflated by the short-rotation-period stars that the model greatly overpredicts. Examining the sample of stars with the highest fractional differences, we find that 49 stars have fractional differences larger than 2.5, all of which have true rotation periods <6.2 days. These 49 stars comprise ∼8% of the stars in the test set with Prot < 6.2 days. If we remove these stars from the fractional bias and rms calculations, these metrics become 0.016 and 0.30, respectively. We suspect that most of the short-period stars that the model overpredicts could be binary systems whose rotation periods, as measured in their light curves, do not reflect the true rotation periods of the stars.

In Figure 5 we also see that the predictions of Prot values greater than ∼35 days become increasingly biased. As with the predicted stellar properties for the Yu et al. (2018) and Pande et al. (2018) catalogs, a potential reason for this behavior is that there are simply fewer stars in the McQuillan et al. (2014) catalog with these longer rotation periods, and therefore fewer examples from which the model can learn this region of the parameter space. As deriving stellar rotation periods is of special interest in light of upcoming photometric surveys, in Section 6 we investigate the ability to recover rotation periods from both shorter-baseline and longer-cadence time-series data.

The other property we predict for the McQuillan et al. (2014) sample is M*. As seen in Figure 5, the model predicts stellar mass well only at the upper mass range, M* > 0.8 M, resulting in an r2 of 0.7. While the bias of the model is only Δ = 0.01 M, which is ∼1% of the mean of the true test set values, the rms of 0.16 M is large compared to the range of masses covered, from 0.26 to 1.28 M. The fractional difference plot excludes 18 of the low-mass stars, comprising 0.4% of the test set, where the fractional bias of the predictions is large. We note that where the model does poorly, at M* < 0.7 M, this property is underrepresented in the training objects, as seen in Figure 9 of the Appendix. Of the properties we present, our recovery of M* is the least successful. One reason for this could be the high fractional uncertainties associated with the M* values (∼12%), which were derived without the use of Gaia parallaxes. Another factor could be that the light-curve data alone are not sufficient to predict this property, and perhaps adding additional information to the model, like Gaia distances, may improve the recovery. To quantify the effects of the training set relative to the physical information contained in the data, additional synthetic training data may be useful. This would potentially enable disentangling limitations of the training set size from the information contained in the time-domain data, and how uniquely that information is expressed in the light curves across evolutionary states. Note, however, that one disadvantage of synthetic data is that it may be built with an incomplete physical prescription and so may not capture the full physics-driven stellar variability.

In the Appendix we examine alternative choices to working directly in the time domain in building our model. We highlight that no re-engineering of the data changes the information content within it, and our comparative tests with the data in other domains represent only a small set of possible choices. Nevertheless, this allows us to examine whether there are any substantial differences in engineering the data into different forms before using the network. Our tests validate that we are not penalized by working directly with the light-curve data and without feature engineering: we do not gain any advantage by transforming the data into alternative representations.

5.2. Short-baseline Predictions

We now explore the prospect of deriving stellar properties from shorter-baseline data using the 1D CNN model in the time domain. As discussed in Section 1, ongoing and upcoming photometric missions such as TESS and LSST will observe stars with different baselines, most of which will be shorter than 97 days. In particular, the TESS mission is delivering thousands of stellar light curves with a baseline of 27 days. With the goal of estimating stellar properties from such short-baseline data, in this section we train CNN models to predict stellar properties using baselines of 62, 27, and 14 days. We do this by truncating the Kepler light curves to these shorter baselines.

For the shorter-baseline models, we use the same sample of Yu et al. (2018), Pande et al. (2018), and McQuillan et al. (2014) stars, described in Section 2.3. We simply truncate the light curves to each baseline length, starting at the first observation of the Q9 Kepler data. For each of the baselines we test, we recompute the standard deviation of the light-curve fluxes based just on the observations that fall within the specified baseline. This prevents information about the light curves at later times from mistakenly inflating model performance. As the length of the input data ($n({\vec{X}}_{\mathrm{in}})$) varies with the baseline, we redetermine the padding and strides of the two convolutional layers of the model, while keeping the kernel widths tested in the hyperparameter search the same as described in Table 1. Other than these changes, the models for the baseline tests have the same architecture and training process as described in Section 3. The results we report are for the performance of the best model on the Yu et al. (2018), Pande et al. (2018), and McQuillan et al. (2014) test sets selected from the 144 models trained for each baseline.
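A minimal sketch of this truncation, assuming regularly sampled fluxes at the Kepler long cadence:

```python
# Sketch: truncate a 97 day Kepler Q9 light curve to a shorter baseline and
# recompute the flux standard deviation from the truncated segment only, so
# that no information from later observations leaks into the model input.
import numpy as np

CADENCE_DAYS = 29.4 / (60 * 24)     # 29.4-minute long cadence, in days

def truncate(flux: np.ndarray, baseline_days: float):
    n_keep = int(baseline_days / CADENCE_DAYS)
    segment = flux[:n_keep]
    return segment, np.std(segment)

# e.g., a TESS-like baseline:
# short_flux, short_std = truncate(flux, baseline_days=27)
```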

Figure 6 demonstrates the model performance as a function of baseline for all of the stellar properties examined in Section 5, except for the rotation period, which is considered separately as a property of special interest in Section 6. Table 2 in the Appendix reports the full list of performance metrics, r2, Δ, rms, fractional Δ, and fractional rms, for each baseline model. From Figure 6 and the full list of metrics in Table 2, we find that all stellar properties are recovered remarkably well using short-baseline time-series data.


Figure 6. Performance of the CNN model as a function of light-curve baseline for each of the stellar properties examined in Section 5 (except Prot, which is discussed in Section 6). In each panel the bar chart shows how the r2 score of the best model varies with baselines of 97, 62, 27, and 14 days. The bias, rms, fractional bias, and fractional rms of each of these models are reported in Table 2 in the Appendix.


Table 2. Short-baseline Model Performance

| Sample | Property | Metric | 97 days | 62 days | 27 days | 14 days |
| --- | --- | --- | --- | --- | --- | --- |
| Yu+18 | Δν (μHz) | r2 | 0.95 | 0.94 | 0.93 | 0.929 |
| | | Δ | −0.17 (−0.009) | −0.02 (0.036) | −0.3 (−0.042) | 0.12 (0.076) |
| | | rms | 0.89 (0.14) | 0.97 (0.17) | 1.06 (0.18) | 1.09 (0.23) |
| | ${\nu }_{\max }$ (μHz) | r2 | 0.96 | 0.96 | 0.91 | 0.94 |
| | | Δ | −3.51 (−0.002) | −1.14 (0.022) | −3.21 (0.01) | 0.22 (0.064) |
| | | rms | 11.85 (0.2) | 12.02 (0.22) | 17.08 (0.3) | 14.44 (0.26) |
| | log g (dex) | r2 | 0.97 | 0.97 | 0.96 | 0.94 |
| | | Δ | 0.008 (0.004) | 0.006 (0.004) | 0.006 (0.003) | −0.0 (0.002) |
| | | rms | 0.057 (0.02) | 0.057 (0.02) | 0.071 (0.03) | 0.086 (0.04) |
| Pande+18 | log g (dex) | r2 | 0.89 | 0.92 | 0.89 | 0.89 |
| | | Δ | 0.07 (0.026) | 0.01 (0.006) | −0.03 (−0.003) | −0.05 (−0.011) |
| | | rms | 0.22 (0.07) | 0.18 (0.06) | 0.21 (0.06) | 0.21 (0.06) |
| | Teff (K) | r2 | 0.79 | 0.80 | 0.79 | 0.77 |
| | | Δ | −66 (−0.008) | −55 (−0.007) | −30 (−0.002) | −35 (−0.003) |
| | | rms | 309 (0.05) | 297 (0.05) | 306 (0.05) | 320 (0.06) |
| McQuillan+14 | M* (M☉) | r2 | 0.40 | 0.38 | 0.37 | 0.36 |
| | | Δ | 0.009 (0.057) | 0.012 (0.062) | 0.008 (0.06) | 0.006 (0.058) |
| | | rms | 0.155 (0.28) | 0.158 (0.29) | 0.159 (0.3) | 0.16 (0.3) |

Note. Model performance as a function of observation baseline (as described in Section 5.2). The fractional bias and fractional rms are indicated in parenthesis.


We highlight that although we show the comparison of the models' performance using the r2 metric in Figure 6, there is no single optimal evaluation metric, and other choices are possible. We chose the r2 because it captures the variance between the reference label and the inferred label as well as the correlation between them (it is possible, for example, to have a small rms but no correlation when the label range is small). Given that other choices may be preferable, we include the r2, rms, and bias metrics in Figures 7, 8, and 10 and Table 2 in our model evaluation summaries.


Figure 7. Performance of the CNN model based on different light-curve baselines for a test set of McQuillan et al. (2014) stars. Top panel: summary of model performance showing the r2 score vs. the tested baselines of 97, 62, 27, and 14 days. Bottom panels: the predicted vs. true rotation period, and the fractional difference between the predicted and true values, for models based on each light-curve baseline for the test set of stars. The r2, Δ, and rms of the predictions are indicated in each panel, as well as the fractional bias and fractional rms in parentheses. As discussed in the text, the fractional metrics for the Prot predictions are greatly inflated by the overprediction of short-period stars. If we remove the stars with fractional differences >2.5, the fractional bias and fractional rms decrease significantly.


Figure 8. Performance of the CNN model based on different light-curve cadences for a test set of McQuillan et al. (2014) stars. Top panel: summary of model performance showing the r2 score vs. the tested cadences of 0.5, 2, 10, and 24 hr. Bottom panels: the predicted vs. true rotation period, and the fractional difference between the predicted and true values, for models based on each light-curve cadence for the test set of stars. The r2, Δ, and rms of the predictions are indicated in each panel, as well as the fractional bias and fractional rms in parentheses. As discussed in the text, the fractional metrics for the Prot predictions are greatly inflated by the overprediction of short-period stars. If we remove the stars with fractional differences >2.5, the fractional bias and fractional rms decrease significantly.


The recovery of stellar properties using short-baseline data suggests that these properties are still sufficiently encoded in the light curves at these shorter timescales. The results of Figure 6 are promising for the prospects of estimating these stellar properties from light curves delivered by surveys such as TESS and LSST. In the following section we explore a similar prospect for the recovery of the stellar rotation period, demonstrating how the recovery of Prot changes with the baseline, as well as with the cadence, of the observations.

We reiterate that the ground-truth values used to evaluate the performance of our stellar property recovery (as summarized in Figure 6) are determined from the full-baseline data. While shorter time series lead to less precise asteroseismic observables, the goal of our investigation is simply to establish whether there is information residing in shorter-baseline time-series data that can be used to infer the physical parameters traced by asteroseismic observables. However, we note that the ability to extract asteroseismic observables also depends on the length of the time-series data (Hekker et al. 2012). Therefore, we suspect that some of the correlations found in Figure 6 are actually due to sensitivity to correlated parameters. For example, Δν and ${\nu }_{\max }$ are well known to be correlated due to their dependence on density and surface gravity (Stello et al. 2009). Thus, for stars with time-series baselines that are too short to measure Δν, the recovered results may actually trace ${\nu }_{\max }$, which is easier to measure at low signal-to-noise ratios.

6. Rotation Period of Main-sequence Stars

6.1. Baseline

We now focus on stellar rotation as a key stellar property, particularly for gyrochronology studies, and examine the prospects of deriving Prot from light curves with baselines less than 97 days (Section 6.1), as well as cadences longer than 29.4 minutes (Section 6.2). For the full list of stellar properties examined in Section 5, similar cadence tests can also be performed. However, to limit the scope of this paper, we omit this examination.

First, we investigate how well rotation periods can be recovered from light curves as a function of the observation baseline. In addition to the 97 days rotation model trained in Section 5, we train three additional CNN models based on light curves with baselines of 62, 27, and 14 days. The 27 days model is of particular interest, as most stars observed by the TESS mission will have observations spanning 27 days. For these shorter-baseline models, we prepare the Kepler Q9 light curves and modify the CNN padding and stride values for the hyperparameter search in the same way as described in Section 5.2. Further demonstrating the necessity of the hyperparameter search, a number of the short-baseline Prot models result in r2 scores of ∼0.5, significantly worse than the performance of the best models presented here.

Figure 7 demonstrates how the recovery of stellar rotation period degrades as a function of the light-curve baseline, with the top panel of the figure summarizing the performance of the models by showing how the r2 decreases with decreasing observation lengths. We find that the r2 score changes by less than Δr2 = −0.1, from r2 = 0.77 at a baseline of 97 days, to r2 = 0.69 at a baseline of 14 days, with the 27 days "TESS"-baseline model resulting in an r2 of 0.74.

Visually inspecting the bias and rms of the models as a function of the true rotation period, we find that for each baseline, the rms increases as rotation period increases, while the fractional rms decreases marginally by 0.05. In general, shorter rotation periods are recovered with less bias, except for the fraction of fast rotators whose rotation periods are overpredicted, as discussed in Section 5.1. The worst performance is seen at the edges of the data ranges, as most clearly seen in the residuals panel.

As in Section 5.1, if we remove the short-rotation-period stars with the highest fractional differences, the fractional bias and fractional rms metrics decrease significantly.

Another feature of the model performance we notice is that as the baseline decreases, the bias of the predictions at rotation periods >35 days marginally increases. While all of the models exhibit this behavior to an extent, as evident in the bottom panels of Figure 7, the 27 and 14 days models in particular do not predict rotation periods greater than ∼35 days, with slowly rotating stars in the test set having their rotation periods underpredicted. The degradation of the predictions for stars with rotation periods longer than ∼35 days could be due to a number of factors. One such factor is that for these more slowly rotating stars, fewer cycles of the rotation period are imprinted in the light curves, and in some cases only a fraction of one full rotation period is present. However, from Figure 7, we see that even for rotation periods longer than the baseline, the model can still recover rotation, although at decreasing precision. Another factor that could be impacting the model's ability to precisely recover rotation periods longer than ∼35 days is the distribution of the training data. The mean rotation period of the McQuillan et al. (2014) sample is ∼18 days, with a standard deviation of ∼11 days. As seen in Figure 9, there are few stars with rotation periods longer than ∼35 days, predominantly because instrumental systematics in the Kepler data become more prominent on monthly timescales. The model could be less effective in predicting long rotation periods simply because there are few examples to learn from. Given more complete rotation period coverage in the training data, the model may be better able to learn the rotation periods of more slowly rotating stars, making unbiased predictions even beyond the baseline of the data.

Nevertheless, with this neural network approach we can predict rotation periods beyond the baseline of the data. This is because the model can leverage gradients and correlations across the time-series data itself to learn a mapping between the data and the rotation period, which can be used to predict rotation periods from new data in hand. The performance of the model breaks down given insufficient training examples, as the predictive power of the network is diminished; this is seen in the poorer performance of the model in predicting long rotation periods (⪆35 days) from short-baseline data. Considering short-period predictions, periods less than ≈2 days cannot be recovered, as prohibited by the sampling of the data. We see this in Figure 7: below ≈2 days, the data are uninformative for predicting rotation.

6.2. Cadence

In addition to the baseline, we also investigate how well we can recover rotation periods from light curves as a function of cadence. We train a CNN to predict the rotation periods from light curves with an observation every 2, 10, and 24 hr, all with a baseline of 97 days. Being able to measure rotation periods from less frequently sampled photometric time series is also of interest, as upcoming surveys will observe a substantial number of stars at much sparser cadences than the Kepler 29.4 minutes sampling. In particular, the cadence of LSST observations will be irregular, and the minimum separation between subsequent observations is tentatively ∼3 days. Building on the class of approach we present here, methodologies suited to irregularly spaced data will be required to analyze future data (e.g., Naul et al. 2018; Rasile et al. 1998).

To modify the light-curve data for the cadence models, we again use the same sample of McQuillan et al. (2014) stars described in Section 2.3, but instead of using the full light curve we select only every 4th, 21st, and 49th flux observation to achieve light curves with cadences of 2, 10, and 24 hr, respectively (see the sketch after this paragraph). As we did for the baseline models, to prevent information leakage, we compute the standard deviation of the light curves based just on the selected observations for each tested cadence, which is passed to the first fully connected layer of the model as shown in Figure 3. As the length of the input data also varies with cadence, we redetermine the padding and strides of the two convolutional layers. For the 2 and 10 hr models, we test the same kernel widths as listed in Table 1. However, for the 24 hr model, as there are only 98 flux observations per light curve, instead of testing kernel widths of KW,1 = 12 and KW,1 = 20, we test two additional smaller kernel widths of KW,1 = 2 and KW,1 = 4, with corresponding second-layer widths of KW,2 = 3 and KW,2 = 7, respectively. For the 10 and 24 hr models, we also reduce the size of the fully connected part of the model. For the 10 hr cadence data, with an input size of 228 observations, we set the three fully connected layers to N1 = 512, N2 = 256, and N3 = 128 hidden neurons. For the 24 hr cadence data, with an input size of 98 observations, we set them to N1 = 256, N2 = 128, and N3 = 64 hidden neurons. Other than modifying the kernel widths explored and reducing the capacity of the fully connected part of the model, the cadence models have the same architecture and training process as described in Section 3. The results we report here are the performance of the best model on the McQuillan et al. (2014) test set, selected from the 144 models trained for each cadence.
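A sketch of the downsampling step (the stride values follow from the 29.4-minute cadence):

```python
# Sketch: degrade the 29.4-minute light curve to 2, 10, and 24 hr cadences by
# keeping every 4th, 21st, and 49th observation, and recompute the standard
# deviation from the retained points only to prevent information leakage.
import numpy as np

STRIDE = {"2hr": 4, "10hr": 21, "24hr": 49}

def downsample(flux: np.ndarray, cadence: str):
    sparse = flux[::STRIDE[cadence]]
    return sparse, np.std(sparse)
```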

Figure 8 demonstrates how the recovery of the rotation period degrades as a function of the light-curve cadence, with the top panel of the figure summarizing the performance of the models by showing how the r2 decreases with sparser cadence. As seen in the figure, in terms of the r2 score, the 0.5, 2, and 10 hr cadence models have nearly identical model performance. However, for the 24 hr cadence light curves, the performance of the model decreases.

Inspecting the bias and rms of the models as a function of the rotation period, we see that the 0.5, 2, and 10 hr cadence models all perform similarly across the range of rotation periods, with the shortest and longest rotation periods recovered less precisely than rotation periods spanning ∼5–35 days, as with the baseline models discussed in Section 6.1. In the fractional difference plots shown in Figure 8, 35, 31, and 44 short-rotation-period stars are excluded by the plot limits of the 0.5, 2, and 10 hr cadence models, respectively, which comprise just ∼1% of the test set. Of the cadences we test, the 24 hr model is the only one that exhibits markedly different performance behavior across the range of rotation periods. We take note of two significant differences in the performance of this model compared to the models trained on the higher-cadence light curves. The first is that, similarly to the shorter-baseline models, the model trained on the 24 hr cadence light curves is biased at longer rotation periods, with rotation periods longer than 40 days being underpredicted. For this model we also find that the rotation periods of a larger fraction of more quickly rotating stars, with Prot < 5 days, are predicted incorrectly. As evident in the fractional difference plot for the 24 hr model in Figure 8, the rotation periods of a more significant number of fast-rotating stars are overpredicted, in some cases by up to ∼40 days. Additionally, 89 short-period stars (comprising ∼2% of the test set) are excluded by the plot limits due to their large overpredictions. The more significant degradation in model performance for the stars with the fastest rotation periods makes sense, given that one full rotation cycle for these stars is sampled only a few times when the cadence is as sparse as 24 hr.

7. Discussion

We have systematically explored the recovery of a set of stellar properties from photometric time-series data, examining the prediction performance across different baselines and cadences. Our approach, using a 1D CNN model, requires minimal data processing and no by-hand feature engineering. Using the Kepler 97 day, 29.4 minute Q9 data, we first construct a CNN classification model to predict stellar evolutionary state. We then use three catalogs to build training sets to predict the continuous stellar properties Δν, ${\nu }_{\max }$, log g, Teff, and Prot. We implement a CNN regression model, optimizing over a grid of possible hyperparameters, to successfully recover these properties to high fidelity across the parameter space of the training examples.

Our CNN modeling approach is demonstrative of the information content in the data and how this information is preserved across various baselines and cadences. We expect that our modeling choice is not the primary limitation on our prediction precision. Rather, we expect that the information contained in the data, the precision of the input properties, and the stellar property range of the training data are the primary drivers of our results. Nevertheless, there are several alternative types of models that can capture the structure of time-series data like light curves. Recurrent neural networks (RNNs), for example, are a well-suited class of neural networks that model temporal data structure, using a recurrence relationship between new outputs of the model and the previous states of the model. Similar to CNNs, RNNs include weight sharing from different parts of the time series throughout the model training process (Lipton et al. 2015).

Indeed, one downside of CNN models, compared to RNN models, is that the shape of the input data must be similar across the entire data set. CNNs do not lend themselves to working with unevenly sampled data, and as discussed in Section 2.2.1, to overcome this issue we take the simplest approach and replace missing flux time steps in the time series with zero values. This results in good model performance; however, better-motivated imputation approaches could be tested, including simple interpolation of the light-curve fluxes between points (as done for the ACF processing), model-based imputation, providing a missing-value mask as a second input channel to the CNN, or alternative modeling approaches like RNNs. However, RNN models are typically more difficult to train than CNN models. We leave exploring these options as a task for future work. This paper establishes a baseline of performance expectation, with the adopted model, hyperparameter optimization choices, and other assumptions like the zero-imputation of missing data.
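As a sketch, the zero-imputation adopted here alongside the mask-channel alternative mentioned above, assuming missing steps are encoded as NaNs:

```python
# Sketch: zero-imputation of missing flux steps (as adopted in this work),
# plus the alternative of supplying a missing-value mask as a second CNN
# input channel.
import numpy as np

def impute_zeros(flux: np.ndarray) -> np.ndarray:
    """Replace missing (NaN) flux steps with zeros."""
    return np.nan_to_num(flux, nan=0.0)

def flux_with_mask(flux: np.ndarray) -> np.ndarray:
    """Alternative: stack a validity mask as a second input channel."""
    mask = np.isfinite(flux).astype(flux.dtype)
    return np.stack([np.nan_to_num(flux, nan=0.0), mask])   # shape (2, n_steps)
```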

Another alternative choice is to take a generative, rather than a discriminative, approach to the modeling presented here. Generative models take a probabilistic approach to learning the joint distribution of the data (X) and label (Y), P(X, Y), which is then used to infer P(Y∣X). Generative models also include methods that learn a (typically lower dimensional) latent representation of the data itself from which the original data vector can be generated (e.g., variational autoencoders), with the latent space informing P(Y∣X). A generative approach lends itself better to understanding the data generation process, which is arguably more aligned with science goals than a discriminative approach. However, as discussed in Ismail Fawaz et al. (2018), generative models are typically less accurate than discriminative models when it comes to performance on a specific task, and generative models for time-series data are not trivial to implement in practice. Future work could certainly include taking a generative approach to the problem. This would perhaps promote understanding of the data generation process, as well as enable the derivation of well-motivated, data-point-by-data-point probability distributions for the inferred labels using fast inference methods like variational inference (Blei et al. 2016), delivering effective errors on the stellar properties inferred for each individual star.

Lastly, the discriminative, end-to-end nature of many deep-learning approaches (including our own) makes these models difficult to interpret, often hindering their usefulness for understanding the underlying physical theory. There are numerous definitions of what it means for deep learning to be interpretable, as well as numerous proposed methods for making some of these interpretations (e.g., Simonyan et al. 2013; Yosinski et al. 2015; Binder et al. 2016; Montavon et al. 2017). Specifically, for our case of predicting stellar properties from time-series data, it would be insightful to know which features of the light curves contributed most to the prediction of a particular stellar property. For instance, if we could identify the variation timescales most relevant for making a prediction, we could make connections between the quality of the stellar property recovery and our physical understanding of stellar physics. This would allow us to confirm existing theories of how the internal physical processes of stars are imprinted in light curves, as well as potentially reveal new connections between stellar properties and stellar variability in the time domain. Ultimately, understanding the astrophysical origin of our inference and the performance across parameter space is extremely valuable. Indeed, as the relations between time-domain variability and the parameters that describe it change across the parameter space, a very flexible and therefore powerful model such as a neural network is very appealing, despite difficulties in interpretability. A simple but flexible local linear approach, however, may mitigate the challenge of interpretability (Sayeed et al. 2021).

8. Conclusions

We have implemented a 1D CNN architecture to estimate the stellar properties from photometric time-series data. Constructing training sets based on the 29.4 minute cadence Kepler Q9 data and high-quality stellar property catalogs, we predict evolutionary states, stellar properties (Teff and log g), asteroseismic parameters (Δν and ${\nu }_{\max }$), and rotation periods (Prot) for main-sequence and red giant stars. We compare the quality of predictions based on learning directly from the time-series data to learning from transformations of the data, including the ACF and the frequency domain. We also examine how the prediction quality varies with the baseline of observations, training models based on 97, 62, 27, and 14 days of data. For the rotation period, which is of particular interest for gyrochronology, we further examine how the prediction quality varies with the cadence, training models based on time-series data with an observation every 0.5, 2, 10, and 24 hr. The main results of this work are summarized as follows:

  • 1.  
    Training a CNN model to classify the stellar evolutionary state, we are able to distinguish red giant stars from main-sequence and subgiant stars to an accuracy of ∼90%. However, the model is not as successful at distinguishing between main-sequence and subgiant stars, with each of these stellar types having a classification accuracy <60%. We suspect that this is due to the more subtle physical differences (and how these are manifested in the time domain) between main-sequence and subgiant stars, and the limited quality of the training labels, as the border between main-sequence and subgiant stars is sensitive to metallicity. Oscillation and granulation amplitudes are well known to scale strongly with stellar evolutionary state (e.g., Mathur et al. 2011; Huber et al. 2011), which physically is driven by the larger granule sizes that scale with the pressure scale height and thus surface gravity (Kjeldsen & Bedding 2011). Therefore, there is a strong detection bias toward more evolved stars in a given magnitude-limited sample, making detections of granulation in subgiants and main-sequence stars more challenging. Correspondingly, there are three clear differences between the light curves of subgiants and main-sequence stars. First, the granulation background has a greater amplitude and longer timescale in the light curves of subgiants, compared with main-sequence stars. Second, the rotation rates of subgiants are often more rapid than those of main-sequence stars (although they can also rotate more slowly; e.g., McQuillan et al. 2014). Third, the amplitude of rotational variability is often larger for subgiants than it is for main-sequence stars. Further qualitative differences in the light curves of these two types of stars may exist. For example, it is likely that the typical sizes, lifetimes, and latitudes of magnetically active regions on subgiants differ from those of main-sequence stars. These could lead to different light-curve morphologies (e.g., see Giles et al. 2017).
  • 2.  
    Based on one quarter of Kepler long-cadence data, our CNN regression model recovers ${\nu }_{\max }$ and Δν to an rms (fractional rms) precision of ∼12 μHz (0.2) and ∼0.9 μHz (0.14), respectively, and log g to an rms precision of ∼0.06 dex (0.02), for red giant stars trained with the Yu et al. (2018) catalog. Using the Pande et al. (2018) catalog as a training set, we predict Teff with relatively little bias across evolutionary states (with Teff = 4500–6500 K), to an rms precision of ∼300 K (0.05). We also predict log g across the range log g = 2–4.5 dex to an rms precision of ∼0.22 dex (0.07). This performance is in part limited by the precision of the training labels, with a mean reported uncertainty of ∼0.25 dex (compared to 0.01 dex for the red giant log g estimates).
  • 3.  
    For main-sequence stars, based on a single quarter of the Kepler long-cadence data, our CNN regression model predicts rotation periods unbiased from ≈5 to 40 days, with an rms precision of ∼5.2 days (4.06). Our model becomes biased in regions of parameter space with few training examples in the McQuillan et al. (2014) catalog (e.g., Prot ⪆ 40 days), and overpredicts ∼8% of short-period stars in the test set with Prot < 6.2 days. Removing these short-period stars, the fractional rms of the predictions is 0.3. We also predict stellar mass without bias for M* > 0.7 M⊙, although with limited precision, to an rms of ∼0.16 M⊙ (0.28). This performance is in part limited by the large uncertainties on the input mass values.
  • 4.  
    For the stellar properties listed above, we compare the performance of the CNN model based on the 97 day time-domain data to fully connected neural network models based on the ACF and frequency-domain representations of the same data. We find that the CNN model trained on the original time-series data outperforms the models based on the other two data representations. This suggests that more information can be gleaned by deep-learning models that work closer to the raw data, as transformations and feature engineering often discard information.
  • 5.  
    To inform expectations of what can be delivered from observations made by TESS, LSST, and future missions, we train our CNN model to recover stellar properties from light curves with shorter baselines (62, 27, and 14 days). We find that we can predict the stellar properties remarkably well for TESS-like data (27 day baseline), including, for red giant stars, log g to an rms precision of ∼0.07 dex (0.03), Δν to an rms precision of ∼1.1 μHz (0.18), and ${\nu }_{\max }$ to an rms precision of ∼17 μHz (0.3). Based on the Pande et al. (2018) training set, we predict log g to an rms precision of ∼0.21 dex (0.06) and Teff to an rms precision of ∼300 K (0.05).
  • 6.  
    We predict the rotation periods of main-sequence stars up to ≈35 days based on 27 day and even 14 day data, with an rms precision of ⪅6 days (3.5). Removing the <10% of short-period stars that are overpredicted, the fractional rms is <0.38. Investigating light curves with longer cadences (2, 10, and 24 hr), we find that even for observations spaced 1 day apart over 97 days, we can predict the stellar rotation period to an rms precision of ∼6.2 days (4.69), unbiased over the range Prot ≈ 5–35 days. Removing the <25% of overpredicted short-period stars with Prot < 5.2 days, the fractional rms is ≈0.40.
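To make the modeling setup summarized above concrete, the following is a minimal sketch of a 1D CNN regressor in PyTorch (Paszke et al. 2017). The layer sizes, kernel width, dropout fraction, and input length are illustrative placeholders, not the tuned hyperparameters used in this work.

import torch
import torch.nn as nn

class LightCurveCNN(nn.Module):
    """1D CNN regressor for a single stellar property (e.g., log g or Prot)."""

    def __init__(self, n_flux=4000, kernel_width=51):
        super().__init__()
        # Two convolution + pooling stages; padding preserves length,
        # and each pooling stage downsamples by a factor of 4.
        self.features = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=kernel_width, padding=kernel_width // 2),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(8, 16, kernel_size=kernel_width, padding=kernel_width // 2),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # Assumes n_flux is divisible by 16 (two pooling stages of 4).
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * (n_flux // 16), 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, 1),
        )

    def forward(self, x):  # x has shape (batch, 1, n_flux)
        return self.regressor(self.features(x))

model = LightCurveCNN()
flux = torch.randn(32, 1, 4000)  # a batch of normalized light-curve segments
prediction = model(flux)         # shape (32, 1)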

With the results described above, we have established a performance baseline for the stellar property information that can be extracted from these light-curve data alone. Our modeling approach is generalizable to other time-domain surveys, as well as to other stellar property catalogs. The method presented here is not proposed to replace asteroseismic measurements from high-quality data. Instead, we predict these properties to demonstrate the capability of our approach, as well as the prospect of transferring the relationships established with high-quality data to lower-quality data, where these measurements are more difficult to make.

Our ability to predict stellar properties is limited in part by the uncertainties on the input properties, and we expect that more precisely derived properties would improve our predictions in some cases. To better determine some stellar properties, we could also incorporate other information, such as photometry from Gaia, the Two Micron All Sky Survey (2MASS), and the Wide-field Infrared Survey Explorer (WISE), as well as stellar parallaxes from Gaia. For example, adding photometric information to the model would improve the precision with which we can recover Teff.

We make our model code publicly available at https://github.com/kblancato/theia-net; it could be used to produce a rotation period catalog, as well as other stellar property catalogs, for the TESS mission. More immediate improvements to our approach include expanding the size and improving the label precision of the stellar property training sets, and incorporating Gaia photometric and parallax information in the model. More substantial directions that could be investigated include adapting the model to permit unevenly sampled time-series data, taking a generative modeling approach, incorporating the measurement errors on both the light curves and the stellar properties, and interpreting the model in a physically meaningful way.

Looking forward, in the coming years ongoing and future missions will deliver time-domain data for millions of stars. Extracting stellar properties from these data will be a rich pursuit, enabling the exciting potential of Galactic archeology in the time domain.

The authors would like to thank Kathryn Johnston, David Blei, Gabriella Contardo, Maryum Sayeed, and Adam Wheeler for helpful feedback and discussions. We also thank Travis Berger for providing us with his revised Gaia–Kepler stellar property catalog. We are grateful to the members of the Flatiron Institute's Scientific Computing Core for their support of Flatiron's Rusty cluster, which was used to train all of the models for this work.

K.B. is supported by the NSF Graduate Research Fellowship under grant number DGE 16-44869. K.B. thanks the LSSTC Data Science Fellowship Program; her time as a Fellow has benefited this work. M.N. and D.H. are supported by the Alfred P. Sloan Foundation. R.A. acknowledges support from NASA award 80NSSC20K1006. This research was supported by the Research Corporation for Science Advancement through Scialog award No. 26080.

This research was partially conducted during the Exostar19 program at the Kavli Institute for Theoretical Physics at UC Santa Barbara, which was supported in part by the National Science Foundation under Grant No. NSF PHY-1748958.

Software: PyTorch (Paszke et al. 2017), scikit-learn (Pedregosa et al. 2011), Astropy (Price-Whelan et al. 2018).

Appendix

A.1. The Distribution of Stellar Properties for the Training Samples

Figure 9 shows the distributions of the stellar properties that we infer in this work, for each of the training samples that we use.

Figure 9. The distribution of stellar properties for the Yu et al. (2018) RGB sample (top row: Δν, ${\nu }_{\max }$, and log g), the Pande et al. (2018) RGB and subgiant sample (middle row: log g and Teff), and the McQuillan et al. (2014) main-sequence sample (bottom row: Prot and M*).

A.2. Comparison to Modeling in the ACF and Frequency Domains

As discussed in Section 3.1, when deriving stellar properties from photometric time-series data, the light curves are often first transformed to an alternate representation of the original data. Two common representations are the ACF, as described in Section 2.2.2, and the power spectrum, as described in Section 2.2.3. Each of these representations highlights different features of the data, which are known to correlate with particular stellar properties. For example, peaks in the ACF are informative about stellar rotation periods, and the asteroseismic parameters ${\nu }_{\max }$ and Δν are defined in the frequency domain. One aim of this paper is to investigate how well a deep-learning approach can learn various stellar properties from the time-domain data itself, since this approach requires minimal or no feature engineering and leverages the full information content of the data.
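To illustrate, both representations can be computed in a few lines; the following sketch uses NumPy and Astropy (Price-Whelan et al. 2018) on a synthetic light curve, and is not the exact preprocessing pipeline used in this work.

import numpy as np
from astropy.timeseries import LombScargle

cadence_days = 29.4 / 60.0 / 24.0          # Kepler long-cadence interval
time = np.arange(0.0, 97.0, cadence_days)  # one ~97 day quarter
flux = np.sin(2.0 * np.pi * time / 12.0) + 0.1 * np.random.randn(time.size)

# ACF: correlate the mean-subtracted flux with itself at positive lags.
resid = flux - flux.mean()
acf = np.correlate(resid, resid, mode="full")[resid.size - 1:]
acf /= acf[0]  # normalize so that ACF(lag = 0) = 1

# LS periodogram: power as a function of frequency (here in 1/day).
frequency, power = LombScargle(time, flux).autopower()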

To test how well we learn stellar properties in the time domain compared to the ACF and frequency domains, for each of the properties we predict in Section 5.1, we also train models to predict these properties based on the ACF and the LS periodogram. We do this for the three stellar samples of Yu et al. (2018), Pande et al. (2018), and McQuillan et al. (2014), using the ACF and periodogram representations described in Sections 2.2.2 and 2.2.3, respectively. As the ACF and frequency domain already capture the time dependence of the data, we build and train fully connected neural network models to predict stellar properties from these data representations. This means that, unlike in the CNN case, where weight sharing captures the time dependence of the input data, in the fully connected model a separate weight is learned for each element of the input. Accordingly, we scale each nth element of the ACF and periodogram relative to the range of values exhibited by the corresponding nth element across all of the stars in the training set. The last four layers represented in Figure 3 show the fully connected architecture we implement, where each element of the input representation is passed to its own hidden node in the first model layer, each with its own weight.
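The sketch below illustrates this setup, interpreting the scaling as element-wise min-max normalization over the training set; the layer sizes are placeholders, and the helper names are ours rather than from the released code.

import numpy as np
import torch.nn as nn

def scale_elementwise(train, test):
    """Scale each input element by its min/max range over the training set."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant elements
    return (train - lo) / span, (test - lo) / span

def fully_connected(n_input, hidden=(2048, 1024, 256), dropout=0.1):
    """Fully connected regressor: one weight per input element, no sharing."""
    layers, n_prev = [], n_input
    for n_hidden in hidden:
        layers += [nn.Linear(n_prev, n_hidden), nn.ReLU(), nn.Dropout(dropout)]
        n_prev = n_hidden
    layers.append(nn.Linear(n_prev, 1))  # single stellar property output
    return nn.Sequential(*layers)

model = fully_connected(n_input=4000)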

For these models, instead of searching over different kernel widths, as for the CNN, the architecture hyperparameters we search over are the number of hidden layers and the number of hidden units in each layer. We test four different model architectures: two with three hidden layers, consisting of [N1 = 2048, N2 = 1024, N3 = 256] and [N1 = 4096, N2 = 1024, N3 = 256] hidden units, and two with two hidden layers, consisting of [N2 = 1024, N3 = 256] and [N2 = 2048, N3 = 512] hidden units. The other hyperparameters we optimize over are the same as those in Section 5.1: the learning rate, weight decay, numerical stability term, and dropout fraction. With the architecture choices described above, for each stellar property we train 96 models in total and select the best model as outlined in Section 3.6.
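For illustration, the grid below reproduces the structure of this search; the four architectures are those listed above, while the learning-rate, weight-decay, numerical stability, and dropout values are placeholders chosen only so that the total matches the 96 models described here.

from itertools import product

architectures = [
    (2048, 1024, 256),  # three hidden layers
    (4096, 1024, 256),
    (1024, 256),        # two hidden layers
    (2048, 512),
]
learning_rates = [1e-4, 1e-3]      # placeholder values
weight_decays = [0.0, 1e-5, 1e-4]  # placeholder values
epsilons = [1e-8, 1e-7]            # numerical stability term
dropouts = [0.1, 0.3]              # placeholder values

grid = list(product(architectures, learning_rates, weight_decays,
                    epsilons, dropouts))
assert len(grid) == 96  # 4 architectures x 24 hyperparameter settings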

For the stellar properties we consider in Section 5.1, Figure 10 compares the test performance of the best model, in terms of r2, Δ, and rms, for the three models we train: a CNN based on the time-series data, a fully connected neural network based on the ACF, and a fully connected neural network based on the LS periodogram. Comparing across the three models for each property, better performance for each evaluation metric is indicated by darker shading of its entry in Figure 10, with the fractional values of the bias and rms given in parentheses. In summary, the CNN model achieves the best overall performance. As detailed in Section 5.1, with the CNN approach, ${\nu }_{\max }$, Δν, and log g are recovered the most successfully, with r2 > 0.9; rotation period and Teff are recovered to r2 ∼ 0.8; and stellar mass is recovered the least successfully, with r2 = 0.4. However, we emphasize that this is not a full exploration of every possible combination of data representation and model that could be tested.
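The evaluation metrics compared in Figure 10 can be computed as in the following sketch; the fractional quantities are assumed here to be residuals normalized by the true values, which may differ in detail from the definitions adopted in this work.

import numpy as np
from sklearn.metrics import r2_score  # scikit-learn (Pedregosa et al. 2011)

def evaluate(y_true, y_pred):
    """Return the r2 score, bias (Delta), rms, and fractional counterparts."""
    resid = y_pred - y_true
    return {
        "r2": r2_score(y_true, y_pred),
        "bias": resid.mean(),
        "rms": np.sqrt(np.mean(resid ** 2)),
        "frac_bias": np.mean(resid / y_true),                 # assumed definition
        "frac_rms": np.sqrt(np.mean((resid / y_true) ** 2)),  # assumed definition
    }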

Figure 10. Model performance comparison based on different transformations of the light-curve data, considering the time series, ACF, and LS periodogram from left to right. For each stellar sample, the predicted stellar properties are shown along the rows, while the performance metrics (r2, Δ, rms) are shown along the columns. The fractional bias and fractional rms are indicated in parentheses. Entries shaded in dark gray indicate the best model performance, according to the r2 score, fractional bias, and fractional rms.
