The Derivation, Properties, and Value of Kepler's Combined Differential Photometric Precision

Published 2012 November 15 © 2012. The Astronomical Society of the Pacific. All rights reserved. Printed in U.S.A.
Citation: Jessie L. Christiansen et al 2012 PASP 124 1279, DOI 10.1086/668847

1538-3873/124/922/1279

ABSTRACT

The Kepler Mission is searching for Earth-size planets orbiting solar-like stars by simultaneously observing >160,000 stars to detect sequences of transit events in the photometric light curves. The Combined Differential Photometric Precision (CDPP) is the metric that defines the ease with which these weak terrestrial transit signatures can be detected. An understanding of CDPP is invaluable for evaluating the completeness of the Kepler survey and inferring the underlying planet population. This paper describes how the Kepler CDPP is calculated, and introduces tables of rms CDPP on a per-target basis for 3-, 6-, and 12-hr transit durations, which are now available for all Kepler observations. Quarter 3 is the first typical set of observations at the nominal length and completeness for a quarter, from 2009 September 18 to 2009 December 16, and we examine the properties of the rms CDPP distribution for this data set. Finally, we describe how to employ CDPP to calculate target completeness, an important use case.

1. INTRODUCTION

The Kepler Mission is a NASA Discovery mission designed to detect transiting extrasolar planets, performing near-continuous photometric observations of >160,000 carefully selected target stars in Kepler's 115 deg$^2$ field of view, as reviewed in Borucki et al. (2010) and Koch et al. (2010). Scores of planets have been confirmed thus far, and three catalogues of planet candidates have been released: 705 candidates discovered in the first month of observations (Borucki et al. 2011a), 1235 candidates discovered in the first 15 months of observations (Borucki et al. 2011b), and 2321 candidates discovered in the first 18 months of observations (Batalha et al. 2012). Although individual planetary systems continue to surprise and intrigue (see, for example, the Kepler-36 system; Carter et al. 2012), we are now able to shift towards broader analysis of the underlying planetary populations (Borucki et al. 2011a; Howard et al. 2011; Youdin 2011; Catanzarite & Shao 2011) and trends, such as comparing single planet candidate systems to those with multiple planet candidates (Latham et al. 2011), and the environments of small planet candidates compared to large planet candidates (Buchhave et al. 2012).

The primary goal of the Kepler Mission is to ascertain the value of $\eta_\oplus$, the frequency of Earth-size planets orbiting in the habitable zones of solar-like stars. Inferring the value of $\eta_\oplus$ from the planet sample discovered by Kepler requires careful quantification of the detectability of each planetary candidate across the entire set of target stars. An essential aspect of measuring $\eta_\oplus$ is accounting for the observation noise specific to each target star, and the subsequent impact on the detectability of the transit signature of the candidate, which is the topic of this article.

Kepler's transiting planet search (TPS) pipeline module (Jenkins et al. 2010; Tenenbaum et al. 2012), which searches through the data for evidence of transit signatures, empirically determines the level of non-stationary noise for each light curve. This noise estimate is called the combined differential photometric precision (CDPP); it is a time series of the effective white noise as seen by a specific transit duration for each target star. To facilitate analysis and interpretation of the Kepler data, tables of the rms CDPP metrics on a per-target basis for transit durations of 3, 6, and 12 hr are provided online at the Mikulski Archive for Space Telescopes (MAST) website. Section 2 of this article describes how CDPP is calculated by the Kepler pipeline. Section 3 provides a guide to the format and content of the tables. Section 4 discusses the general characteristics of the Q3 rms CDPP values, and examines the distribution as a function of stellar type and position in the Kepler field. Section 5 describes some of the ways in which CDPP can be used for further analysis, and Section 6 contains the conclusions of the paper.

2. HOW CDPP IS CALCULATED

The details of the pipeline processing prior to the CDPP calculation for a single quarter of data have been described elsewhere; for an overview see Jenkins et al. (2010). Briefly, they involve pixel-level calibrations, including bias and dark current subtraction, flatfielding, and shutterless readout smear correction by the Calibration (CAL) module (Quintana et al. 2010); cosmic ray correction, background subtraction and simple aperture photometry by the Photometric Analysis (PA) module (Twicken et al. 2010); systematic error removal and crowding correction of the flux time series by the Presearch Data Conditioning (PDC) module (Smith et al. 2012, Stumpe et al. 2012 for Release 12 onwards); and finally, harmonic removal of strongly sinusoidal variations and flux time series extension (the latter for efficient fast Fourier transforms) by the Transiting Planet Search (TPS) module (Jenkins et al. 2010). When searching multiple quarters of data, TPS has several additional steps: median-normalize the flux level from quarter to quarter; fill the gaps between quarters (to enable the wavelet-based detection method described below); and detrend the discontinuities at the quarter boundaries (Jenkins et al. 2010). The signal detection is then performed, generating threshold crossing events (TCEs) for further analysis; for example, Tenenbaum et al. (2012) present the set of TCEs produced by TPS in the Q1–Q3 observations. We define a threshold crossing event as a signal which, when folded at a given period, gives rise to a signal of ≥7.1σ (Jenkins et al. 2002).

CDPP is calculated as a by-product of TPS, when determining the signal-to-noise ratio (S/N) of each transit pulse for which we search. Simply stated, the CDPP produced by TPS can be thought of as the effective white noise seen by a transit pulse of a given duration. A CDPP of 20 parts per million (ppm) for a 3-hr transit duration indicates that a 3-hr transit of depth 20 ppm would be expected to have an S/N of 1, and hence produce a signal of strength 1σ on average. Thus, CDPP is a characterization of the noise in the Kepler data.

Typically, the noise is non-white (that is, does not have a uniform power spectral density distribution) and non-stationary (that is, the power spectral density changes with time). The noise is typically dominated by 1/f-type noise processes (where f is frequency) due to stellar variability and instrumental effects; for a thorough breakdown and discussion of the noise sources contributing to CDPP see Gilliland et al. (2011) and Van Cleve & Caldwell (2009). Therefore, we need a way of characterising the noise in the data in a moving fashion in order to preserve the time-variability; we achieve this by decomposing the data in the time-frequency domain using the wavelet approach. The theoretical basis of the approach is described in Jenkins (2002). We measure the time-varying noise in a set of time-frequency bands, with equal spacing in log(f), and use this estimate to adjust the noise level in each band to produce a 'flat' power spectrum; that is, we filter the data to produce white noise. For consistency, the resulting whitening filter is also applied to the trial transit signal in order to reproduce and match any distortion created by the whitening. The detection of the whitened transit signal in the whitened data is therefore simplified to the well-understood problem of detecting a signal in the presence of white noise, and CDPP is a measure of the noise in the whitened data.

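Full details of the calculation of CDPP in the Kepler Science Operations Center (SOC) pipeline are given in Jenkins et al. (2010); we briefly describe it here. For a given time series, $x(n)$, which is composed of a zero-mean, Gaussian noise process $w(n)$, with a power spectrum $P(\omega)$ and a corresponding autocorrelation matrix $\mathbf{R}$, we want to detect a signal, $s(n)$ (a transit pulse in the context of Kepler). CDPP, the noise seen by the transit pulse, is quoted in ppm, and is calculated from the detection statistic, $l$, by:

$$\mathrm{CDPP} = \frac{1 \times 10^{6}}{\langle l \rangle} \tag{1}$$

where $\langle l \rangle$ is the expected value of the detection statistic for a trial transit pulse of unit depth.
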
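The detection statistic is defined as:

$$l = \frac{\mathbf{x}^{T} \mathbf{R}^{-1} \mathbf{s}}{\sqrt{\mathbf{s}^{T} \mathbf{R}^{-1} \mathbf{s}}} = \frac{\tilde{\mathbf{x}}^{T} \tilde{\mathbf{s}}}{\sqrt{\tilde{\mathbf{s}}^{T} \tilde{\mathbf{s}}}} \tag{2}$$

where $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{s}}$ are data and signal vectors distorted or "whitened" by the inverse square root of the autocorrelation matrix $\mathbf{R}$. The expected value of the detection statistic, $\langle l \rangle$, under the hypothesis (H1) that the signal $\mathbf{s}$ is present, is given by:

$$\langle l \rangle = \sqrt{\mathbf{s}^{T} \mathbf{R}^{-1} \mathbf{s}} = \sqrt{\tilde{\mathbf{s}}^{T} \tilde{\mathbf{s}}} \tag{3}$$
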
For white noise, the autocorrelation matrix $\mathbf{R}$ is diagonal, with $\mathbf{R} = \sigma^{2} \mathbf{I}$, where $\sigma^{2}$ is the variance of the white noise and $\mathbf{I}$ is the identity matrix. Therefore, the mean value of the detection statistic in the presence of signal $s(n)$ is the S/N of the whitened signal to the whitened noise, as expected. Stated another way, the detection statistic is equivalent to measuring the significance of the signal from the chi-squared fit over the null hypothesis between the whitened light curve and the whitened transit signal. Under the null hypothesis (H0), $\langle l \rangle_{H_0} = 0$. The two hypotheses have different mean values of $l$, but the same standard deviation of unity; this can be used to readily estimate the false alarm rate and detection rates, or their complements, using the error function.
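As an illustration of the white-noise case, the following minimal sketch evaluates a matched-filter detection statistic and the corresponding false alarm and detection probabilities. It is not the pipeline's implementation: the function names, the 12-cadence box pulse, and the noise and depth values are hypothetical, and the statistic is written in the standard matched-filter form for noise with $\mathbf{R} = \sigma^{2}\mathbf{I}$.

```python
import math


def detection_statistic(x, s, sigma):
    """Matched-filter statistic for white noise: (x . s) / (sigma * sqrt(s . s))."""
    xs = sum(xi * si for xi, si in zip(x, s))
    ss = sum(si * si for si in s)
    return xs / (sigma * math.sqrt(ss))


def expected_statistic(s, sigma):
    """Mean of the statistic when the signal s is present (H1): sqrt(s . s) / sigma."""
    return math.sqrt(sum(si * si for si in s)) / sigma


def false_alarm_probability(threshold):
    """P(l > threshold) under H0, where l is Gaussian with mean 0 and unit variance."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def detection_probability(mean_l, threshold):
    """P(l > threshold) under H1, where l is Gaussian with mean mean_l and unit variance."""
    return 0.5 * math.erfc((threshold - mean_l) / math.sqrt(2.0))


# Hypothetical example: a box-shaped transit spanning 12 cadences,
# depth 100 ppm, in white noise of 80 ppm per cadence.
depth = 100e-6
sigma = 80e-6
pulse = [-depth] * 12

mean_l = expected_statistic(pulse, sigma)     # S/N of this single pulse
p_fa = false_alarm_probability(7.1)           # chance that pure noise exceeds 7.1 sigma
p_det = detection_probability(mean_l, 7.1)    # chance that this pulse exceeds 7.1 sigma
```

Because both hypotheses give a unit-variance statistic, the only difference between the two probabilities is the shift of the mean, which is why a single call to the complementary error function is enough in each case.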

We calculate the time series of detection statistics l for a given target over a set of trial durations; that is, we measure the detectability of given transit signals for each observation of each target. The CDPP time series is a natural by-product of this procedure, from equation (1). We obtain CDPP on 14 time scales from 1.5 to 15 hr, which covers the transit durations of interest—for a central crossing event, an Earth-Sun analogue transit would take 13 hr, although for the average value of the impact parameter the duration is closer to 10 hr. The depth of an undiluted signal produced by such a system is 84 ppm: in a typical quarter of Kepler data, 12th magnitude dwarfs (the benchmark Kepler target) have a median rms CDPP value of 34 ppm for a transit duration of 6.5 hr. Although we produce a set of CDPP time series for each target, in practice we find that the amplitude of the variation in the CDPP over the time series for a given time scale is relatively small in a given quarter of data, and that to first-order, the detectability of a given transit signal is well described by the rms of the CDPP time series for that quarter. This motivated our production of the tables described in § 3.
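The Earth-Sun figures quoted above can be checked with a short calculation. This is a sketch under simple assumptions that are ours rather than the pipeline's: a circular orbit, a planet radius neglected in the duration, impact parameters distributed uniformly between 0 and 1, and nominal values for the solar radius, the Earth's mean radius, and the astronomical unit.

```python
import math

# Nominal physical constants (assumed values, SI units).
R_SUN_M = 6.957e8
R_EARTH_M = 6.371e6
AU_M = 1.495978707e11
YEAR_HR = 365.25 * 24.0


def central_transit_duration_hr(period_hr, a_m, r_star_m):
    """Duration of a central (b = 0) transit on a circular orbit,
    neglecting the planet's radius."""
    return (period_hr / math.pi) * math.asin(r_star_m / a_m)


def transit_depth_ppm(r_planet_m, r_star_m):
    """Undiluted transit depth in parts per million."""
    return 1e6 * (r_planet_m / r_star_m) ** 2


central_hr = central_transit_duration_hr(YEAR_HR, AU_M, R_SUN_M)

# The chord length scales as sqrt(1 - b^2); its mean over b uniform
# on [0, 1] is pi / 4.
mean_hr = central_hr * math.pi / 4.0

depth_ppm = transit_depth_ppm(R_EARTH_M, R_SUN_M)
```

Under these assumptions the central duration comes out close to 13 hr, the impact-parameter average close to 10 hr, and the depth close to 84 ppm, in line with the values in the text.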

2.1. Some Examples

To illuminate the process described above, we show the light curves and resulting CDPP time series for several Kepler targets. The data presented are from the Q3 Kepler observations.

Figure 1 shows the detrended flux time series for target KIC 9392416, with magnitude in the Kepler bandpass (Kp) of 11.7, and a 6-hr rms CDPP of 56 ppm. The power spectrum of this target is plotted as the red dashed line in Figure 2, and we see that it is relatively uniform over all timescales, especially less than one day (i.e., the dominant noise source is white noise). An increase in noise in the flux time series can be seen at approximately days 113–114. The CDPP time series that are derived from this flux time series for transit pulse durations of 3, 6 and 12 hr are shown in Figure 3. The aforementioned increase in noise at around days 113–114 manifests here as an increase in CDPP in the 3-hr time series at the same epoch. For the longer transit durations, the increased noise is averaged out and is not so apparent in the resulting 6- and 12-hr CDPP time series. This effect is also evident when considering the CDPP time series on the whole—note the decrease in the magnitude and flattening of the CDPP time series with increasing transit duration. The rms CDPP for this target decreases from 72 ppm for the 3-hr transit duration time series to 43 ppm for the 12-hr time series, as the signal is integrated over longer time spans.

Fig. 1.— Q3 detrended flux time series for KIC 9392416, covering 89 days of observation. This target has a relatively uniform noise power spectrum.

Fig. 2— Comparison of the noise power spectra of two Kepler targets with similar Kp magnitudes. The red dashed line is KIC 9392416 (see Fig. 1), which has a 6-hr rms CDPP of 56 ppm, Kp = 11.6, and a roughly uniform power spectrum. The blue dashed line is the variable star KIC 9328434 (see Fig. 4), which has a 6-hr rms CDPP of 41 ppm, Kp = 11.2, and a highly correlated power spectrum. Despite the significantly lower rms CDPP at approximately the same Kp magnitude, the variable target contains significantly more power at longer time scales (red noise). For comparison, we also show the energy over the same time scales for square pulse signals of 6- and 12-hr duration as blue and green solid lines respectively.

Fig. 3.— CDPP time series for KIC 9392416, shown for transit pulse durations of 3, 6, and 12 hr. The solid line shows the rms CDPP value for the 3-hr CDPP time series, the dotted line for the 6-hr CDPP time series, and the dashed line for the 12-hr CDPP time series.

Figure 4 shows the detrended flux time series for KIC 9328434, which is a variable star with magnitude Kp = 11.2. The blue dashed line in Figure 2 shows the power spectrum of this target, which is relatively quiet on short time scales (the 6-hr rms CDPP is 41 ppm), but contains significant correlated noise on longer time scales due to the stellar variability, which has a characteristic period of ∼2.5 days. Figure 5 shows the CDPP time series for this variable target. In this case, note that the 6-hr and 12-hr CDPP values are typically higher than the corresponding 3-hr CDPP. This arises, contrary to the previous, white-noise-dominated example, due to the fact that the multi-periodic flux time series has a characteristic period much longer than 3 hr. When integrating the flux time series on 6 and 12 hr time scales, we are integrating over larger changes in the stellar signal, which increases the scatter in the observations, and therefore the measured CDPP.

Fig. 4.— Q3 detrended flux time series for the variable star, KIC 9328434.

Fig. 5.— CDPP time series and rms CDPP for KIC 9328434, shown for transit pulse durations of 3, 6, and 12 hr.

Finally, Figures 6 and 7 show the detrended flux time series and CDPP time series respectively for target KIC 9390653, which is also identified as KOI 249 in Borucki et al. (2011a). This is a target with magnitude Kp = 14.5, and a transiting planet candidate with an orbital period of 9.549 days. The detrended flux time series in Figure 6 clearly shows the transits, which have an average depth of 1775 ppm. The transits have a duration of 1.84 hr, and although they have a high S/N, there is no evidence of their presence in the CDPP time series in Figure 7; that is, they do not perturb the locally measured noise. Their contribution to the noise estimate is suppressed because the longest window over which the wavelet transformation considers the noise variance is large (30 times the transit duration) compared to the transit duration. Note that we also calculate CDPP time series for 1.5- and 2-hr transit pulses and the transits are not evident in these time series either; due to the nature of the whitening, transits should not generally increase the measured CDPP unless they are very closely spaced in time.

Fig. 6.— Q3 detrended flux time series for planet candidate KIC 9390653/KOI 249.

Fig. 7.— CDPP time series and rms CDPP for planet candidate KIC 9390653/KOI 249, shown for transit pulse durations of 3, 6, and 12 hr.

3. DESCRIPTION OF ONLINE RMS CDPP TABLES

For users interested in the ensemble noise statistics, for instance those performing completeness calculations, tables of the rms CDPP values for the planetary targets are being made available from the MAST Kepler website, starting with all the publicly released data up to Quarter 10. In Table 1 we show an extract of the table for Quarter 3, described below, demonstrating the format and content. For users working with individual targets, the 3-, 6-, and 12-hr rms CDPP values are stored in the headers of the individual light curve FITS files, also available at the MAST website. As the data are reprocessed with improved versions of the SOC pipeline, the FITS files are replaced at MAST and will contain updated rms CDPP values. See the Kepler Archive Manual (KDMC-10008-004; Fraquelli & Thompson 2012) for details of the FITS headers.

4. QUARTER 3 RMS CDPP VALUES

The summary statistics for the CDPP values are included in the Data Release Notes (DRN) for each quarter, available from the MAST website. Here we discuss in more detail the characteristics of the Q3 rms CDPP values as a guide for analysis of subsequent quarters. Q3 was the first 'typical' data set obtained by Kepler; the field was observed near-continuously from 2009 September 18 to 2009 December 16, for a total of 89 days. Thus, it was the first opportunity to examine the distribution of rms CDPP values. Table 2 is an updated version of the Q3 values in Table 1 from the Kepler Data Release 14 Notes (KSCI-19054-001; Christiansen et al. 2012), listing rms CDPP values for 6-hr transit durations instead of median CDPP values for 6.5-hr transit durations. The trends therein are discussed below.

4.1. Distribution with Stellar Type

Figure 8 shows the distribution of the 6-hr rms CDPP values with Kp magnitude for all the planetary targets in Q3, 165,441 targets in total. All stellar parameters discussed in this section are drawn from the Kepler Input Catalog (KIC; Brown et al. 2011). There are three distinct features visible in Figure 8. The first is the discontinuity in the number of targets at Kp = 14; this is an artifact of the Kepler target selection, whereby targets with Kp > 14 and log g ≤ 4 are excluded to reduce the number of populous faint giants in the target list. The second is the increase in rms CDPP with increasing magnitude. The lower bound on this distribution is the minimum noise floor, with contributions from both shot noise, which increases with increasing magnitude, and the typical read noise. The third is that there is a faint population separated vertically from the main body of targets. The difference in the populations can be attributed directly to the stars (i.e., it is not an instrumental effect). The upper panel of Figure 9 shows the distribution of rms CDPP for dwarf stars with log g > 4, and the lower panel shows the distribution for giant stars with log g ≤ 4 (note the cut-off at Kp = 14, as stated previously). The bin sizes are 2 ppm for the rms CDPP and 0.05 mag for the Kepler magnitude. The observed noise levels for the giant stars are significantly higher on average, in particular the population between 160 and 240 ppm, which is virtually absent in the dwarf stars. This was first noted for the Kepler targets by Koch et al. (2010).

Fig. 8.— Distribution of the 6-hr rms CDPP values with Kp magnitude for all Quarter 3 planetary targets.

Fig. 9.— Distribution of the 6-hr rms CDPP values with Kp magnitude for Quarter 3 dwarf targets (upper panel) and giant targets (lower panel).

These populations are also evident if we plot the rms CDPP values as shown in Figure 10, with the surface gravity plotted against the effective temperature of the target. The giant stars, with low surface gravities, are distinct as the relatively high rms CDPP population in the top right of the figure. The coolest dwarf stars are also highly active, which increases the observed noise. These stars comprise another relatively high rms CDPP population in the bottom right. Solar-type stars, with Teff ∼ 5500 K and log g ∼ 4, are typically well-behaved on the timescales over which we are measuring the noise, and comprise the relatively low rms CDPP population in the center. Batalha et al. (2012) noted that there are systematic problems with the stellar parameters in the KIC, with combinations of surface gravities and effective temperatures that do not lie on any modelled isochrone. As a result, there are likely to be significant numbers of misclassified stars in this figure (see Mann et al. [2012] and Brown et al. [2011] for more discussion on these biases). However, the intent here is to demonstrate the behavior of CDPP as a broad function of stellar parameters and to allow users to investigate the parameter space for selecting targets for further analysis.

Fig. 10.— Distribution of 6-hr rms CDPP values with KIC stellar parameters.

We can also examine the change in CDPP across different time scales to identify properties of stellar populations, as noted in § 2.1. The left panel of Figure 11 shows the distribution of the ratio of the 3-hr rms CDPP to the 12-hr rms CDPP across all targets in Q3. For targets dominated by white noise, the noise when integrated over 3 hr should be twice that when integrated over 12 hr, since the dependence goes as $1/\sqrt{N}$ for N observations in a given integration. Indeed, in Figure 11 we see a large population of stars with a 3- to 12-hr noise ratio of ∼2. The right panel of Figure 11 shows the targets separated into the dwarf and giant populations as described earlier. It is evident that the giant population has an excess in the distribution from 1.05 to 1.2; due to the stellar variability, these stars are on average almost as noisy when integrated over 12 hr as when integrated over 3 hr. The majority of the stars, however, lie in a distribution peaked at ∼1.5, indicating that the typical target contains more power in the low-frequency range than would be expected from purely white noise. This is expected for a variety of types of stellar variability; Figure 2 of Jenkins (2002) shows that power in the noise spectrum of the Sun increases with the time scale of integration.
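The factor of two for white noise can be written out explicitly. In the sketch below, $\sigma_1$ is the per-observation noise and $N_3$ and $N_{12}$ are the numbers of observations falling in a 3-hr and a 12-hr integration; these symbols are introduced here for illustration, and the only assumption is that the observations are independent and evenly spaced, so that $N$ is proportional to the integration time.

```latex
\frac{\sigma_{3\,\mathrm{hr}}}{\sigma_{12\,\mathrm{hr}}}
  = \frac{\sigma_1 / \sqrt{N_3}}{\sigma_1 / \sqrt{N_{12}}}
  = \sqrt{\frac{N_{12}}{N_3}}
  = \sqrt{\frac{12\ \mathrm{hr}}{3\ \mathrm{hr}}}
  = 2
```

A measured ratio below 2 therefore indicates noise that does not average down as quickly as independent samples would, which is the signature of correlated (red) noise.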

Fig. 11.— Left: The distribution of the ratio of 3-hr rms CDPP to 12-hr rms CDPP for all Quarter 3 planetary targets. Right: The same distribution shown separately for dwarf targets (log g > 4) in green and giant targets (log g ≤ 4) in red.

Figure 12 shows the distribution of the rms CDPP values for the targets from the upper panel of Figure 9, i.e., dwarf stars with log g > 4, around which Kepler can expect to find transits of Earth-size planets; this is also the subset of targets considered in the 'dwarfs' columns of Table 2. Although bright targets are preferable for follow-up observations, the large increase in the number of Kepler targets with increasing magnitude is clear; in fact, most Kepler planets will be found around stars fainter than Kp = 14. The mode of the rms CDPP values increases with increasing magnitude largely due to the increased contribution from shot noise.

Fig. 12.— Distribution of rms CDPP values for 6-hr transit durations in magnitude bins listed in Table 2, for the dwarf targets (log g > 4) from the upper panel of Figure 9. These distributions are used for the rms CDPP statistics in Table 2. Targets with Kp magnitude from 8.75 to 9.25 are shown in cyan, from 9.75 to 10.25 in magenta, from 10.75 to 11.25 in red, from 11.75 to 12.25 in blue, and from 12.75 to 13.25 in green.

4.2. Distribution with Position in the Field of View

Figure 13 shows the distribution of the Quarter 3 rms CDPP values across the Kepler field of view, for the same targets as Figure 10. The 21 modules can be seen projected onto the sky coordinates, each with two CCDs. The black arrow points in the direction of the Galactic plane. Although it is a fairly uniform distribution overall, two slight trends are evident. The first is a correlation between rms CDPP and the quality of the focus. In order to maximize the number of targets with good focus across the field, Kepler's best focus is found in the modules surrounding the central module. The central module itself and the outer modules are slightly out of focus compared to these modules. The modules with the best focus have lower rms CDPP values on average due to the resultant lessening of aperture effects introduced by pointing jitter and differential velocity aberration (described respectively in § 3.7 and 3.9 of Van Cleve & Caldwell 2009). The second trend is a function of Galactic latitude—modules closest to the Galactic plane have a slightly higher rms CDPP on average. This is most notable in the southern-most module in Figure 13. This is a result of the increased stellar density toward the Galactic plane, which increases the amount of noise in a given pixel contributed by background stars relative to modules at higher Galactic latitudes.

Fig. 13.— Distribution of rms CDPP values on the Kepler field of view.

5. USING RMS CDPP TO ESTIMATE COMPLETENESS

As described in § 2, CDPP is a direct, empirical measurement of the detectability of a given transit signature in the Kepler data. Using the CDPP time series for a target, it is possible to calculate the probability that a sample transit signal of a given depth and duration, occurring at a given time, could be detected by the pipeline. The rms of that CDPP time series represents, to first-order, the average detectability of that sample transit signal for that target.

For the simplest derivation, we set $t_{\rm obs}$ to the total span of time encompassed by observations of the target, and $f_o$ to the fraction of the total time the target was observed. For a given period, this gives us the average number of transits, $N_{\rm tr} = (t_{\rm obs} \times f_o)/P$, observed for a signal with period $P$. The total S/N that would be measured for a planet at period $P$ over the whole time series, which at this point has been whitened using the wavelet filter described in § 2, is then $\sqrt{N_{\rm tr}} \times \langle l \rangle$, where $\langle l \rangle$ is the S/N of a single transit event. In an assumed white noise regime, the effective rms CDPP, $\mathrm{CDPP_{eff}}$, for a given transit duration, $t_{\rm dur}$, can be estimated by finding the closest provided duration, $t_{\rm CDPP}$ (out of 3, 6 and 12 hr), and scaling the rms CDPP of that duration, $\mathrm{CDPP}_N$, such that $\mathrm{CDPP_{eff}} = (t_{\rm CDPP}/t_{\rm dur})^{1/2} \times \mathrm{CDPP}_N$. See the discussion in § 2.1 for a caution regarding applying this estimation to variable targets. For a planet of radius $R_p$, the transit depth is $\delta = (R_p/R_S)^2$ for a target star of radius $R_S$. For a single transit, $\langle l \rangle$ is then simply $\delta/\mathrm{CDPP_{eff}}$, and the S/N of a putative planet with period $P$ and radius $R_p$ over the set of observations is:

$$\mathrm{S/N} = \sqrt{\frac{t_{\rm obs}\, f_o}{P}} \; \frac{(R_p/R_S)^2}{\mathrm{CDPP_{eff}}} \tag{4}$$

similar to equation (1) of Howard et al. (2011).
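The estimate described in this section (average number of transits, white-noise scaling of the nearest tabulated rms CDPP, and depth divided by effective CDPP) can be sketched in a few lines. The function and argument names are hypothetical, "closest" is taken to mean the smallest absolute difference in hours, and the planet in the example is invented; only the three rms CDPP values are taken from the text, where KIC 9392416 is quoted at 72, 56, and 43 ppm for 3, 6, and 12 hr.

```python
import math

PROVIDED_DURATIONS_HR = (3.0, 6.0, 12.0)


def effective_cdpp(rms_cdpp_ppm, t_dur_hr):
    """Scale the rms CDPP of the nearest tabulated duration to t_dur_hr,
    assuming white noise. rms_cdpp_ppm maps duration in hr to rms CDPP in ppm."""
    t_cdpp = min(PROVIDED_DURATIONS_HR, key=lambda t: abs(t - t_dur_hr))
    return math.sqrt(t_cdpp / t_dur_hr) * rms_cdpp_ppm[t_cdpp]


def transit_snr(radius_ratio, period_days, t_dur_hr, rms_cdpp_ppm,
                t_obs_days, f_obs):
    """Total S/N: sqrt(average number of transits) * depth / effective CDPP."""
    depth_ppm = 1e6 * radius_ratio ** 2
    n_transits = t_obs_days * f_obs / period_days
    return math.sqrt(n_transits) * depth_ppm / effective_cdpp(rms_cdpp_ppm, t_dur_hr)


# rms CDPP values quoted in the text for KIC 9392416 (Q3).
cdpp = {3.0: 72.0, 6.0: 56.0, 12.0: 43.0}

# Hypothetical planet: Rp/Rs = 0.02, 10-day period, 4-hr transits,
# observed for 89 days with an assumed duty cycle of 0.95.
eff = effective_cdpp(cdpp, 4.0)
snr = transit_snr(0.02, 10.0, 4.0, cdpp, 89.0, 0.95)
detectable = snr >= 7.1
```

The scaling step is the part most sensitive to the white-noise assumption; for a variable target whose CDPP rises with duration, it would be safer to compare against the tabulated values directly.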

In the Kepler pipeline, the S/N threshold for detection is 7.1σ. Using the above calculation, it is then possible to estimate whether the putative planet signal would have been detected by the pipeline. This estimation can be performed over a grid of planet parameters for a target star, for which the detection completeness can then be calculated. See § 7.1 of Batalha et al. (2011) for a worked example of a typical 12th magnitude target star. For informed analyses of the planet population produced by Kepler thus far, the tables of rms CDPP are a necessary resource.
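A grid calculation of this kind might look like the following sketch. It applies a hard 7.1σ cut to the S/N estimate and reports the fraction of grid cells that pass; the names, the grid values, the unit duty cycle, and the equal weighting of grid cells are all assumptions made for illustration, and the 34 ppm input simply reuses the benchmark figure quoted in § 2 as a stand-in effective CDPP.

```python
import math

SNR_THRESHOLD = 7.1


def snr(radius_ratio, period_days, cdpp_eff_ppm, t_obs_days, f_obs):
    """S/N of a planet with the given radius ratio and period."""
    depth_ppm = 1e6 * radius_ratio ** 2
    n_transits = t_obs_days * f_obs / period_days
    return math.sqrt(n_transits) * depth_ppm / cdpp_eff_ppm


def detectability_grid(radius_ratios, periods_days, cdpp_eff_ppm,
                       t_obs_days, f_obs):
    """Rows follow radius_ratios, columns follow periods_days;
    True where the S/N reaches the threshold."""
    return [
        [snr(k, p, cdpp_eff_ppm, t_obs_days, f_obs) >= SNR_THRESHOLD
         for p in periods_days]
        for k in radius_ratios
    ]


def grid_completeness(grid):
    """Fraction of grid cells that are detectable (equal weight per cell)."""
    cells = [cell for row in grid for cell in row]
    return sum(cells) / len(cells)


# Hypothetical grid for one target star over one 89-day quarter.
radius_ratios = [0.01, 0.02, 0.03]
periods_days = [5.0, 20.0, 80.0]
grid = detectability_grid(radius_ratios, periods_days, 34.0, 89.0, 1.0)
completeness = grid_completeness(grid)
```

A smoother alternative would replace the hard cut with a detection probability from the error function, which avoids treating a 7.0σ and a 7.2σ signal as categorically different.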

6. CONCLUSION

We have presented here an introduction to the rms CDPP values being made available on a per-target, per-quarter basis at the MAST website for the Kepler planetary targets. For each set of data, these values provide a measure of the observed noise, which for each individual target translates to a limit on the detectability of transiting planets. These values are extremely important for the characterisation of the underlying planet population, since to first order, they provide on a star-by-star basis the observational noise level that sets the limiting planet signal detectable by Kepler.

Funding for the Kepler Discovery Mission is provided by NASA's Science Mission Directorate. We thank the thousands of people whose efforts made Kepler's grand voyage of discovery possible. Some/all of the data presented in this article were obtained from the Mikulski Archive for Space Telescopes (MAST). STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS5-26555. Support for MAST for non-HST data is provided by the NASA Office of Space Science via grant NNX09AF08G and by other grants and contracts.
