
Kepler Presearch Data Conditioning II - A Bayesian Approach to Systematic Error Correction


Published 2012 September 18 © 2012. The Astronomical Society of the Pacific. All rights reserved. Printed in U.S.A.
Citation: Jeffrey C. Smith et al. 2012, PASP, 124, 1000. DOI: 10.1086/667697


ABSTRACT

With the unprecedented photometric precision of the Kepler spacecraft, significant systematic and stochastic errors on transit signal levels are observable in the Kepler photometric data. These errors, which include discontinuities, outliers, systematic trends, and other instrumental signatures, obscure astrophysical signals. The presearch data conditioning (PDC) module of the Kepler data analysis pipeline tries to remove these errors while preserving planet transits and other astrophysically interesting signals. The completely new noise and stellar variability regime observed in Kepler data poses a significant problem to standard cotrending methods. Variable stars are often of particular astrophysical interest, so the preservation of their signals is of significant importance to the astrophysical community. We present a Bayesian maximum a posteriori (MAP) approach, where a subset of highly correlated and quiet stars is used to generate a cotrending basis vector set, which is in turn used to establish a range of "reasonable" robust fit parameters. These robust fit parameters are then used to generate a Bayesian prior and a Bayesian posterior probability distribution function (PDF) which, when maximized, finds the best fit that simultaneously removes systematic effects while reducing the signal distortion and noise injection that commonly afflicts simple least-squares (LS) fitting. A numerical and empirical approach is taken where the Bayesian prior PDFs are generated from fits to the light-curve distributions themselves.


1. AN OVERVIEW OF THE KEPLER DATA PIPELINE

Kepler's primary science objective is to determine the frequency of Earth-size planets transiting their Sun-like host stars in the habitable zone.6 This daunting task demands an instrument capable of measuring the light output from each of over 100,000 stars simultaneously with an unprecedented photometric precision of 20 parts per million (ppm) at 6.5-hr intervals. The large number of stars is required because the probability of the geometrical alignment of planetary orbits that permit observation of transits is the ratio of the size of the star to the size of the planetary orbit. For Earth-like planets in 1 AU orbits about Sun-like stars, only ∼0.5% will exhibit transits. By observing such a large number of stars, Kepler is guaranteed to produce a robust result in the happy event that many Earth-size planets are detected in or near the habitable zone.

The Kepler data pipeline is divided into several components in order to allow for efficient management and parallel processing of data (Jenkins 2010a). Raw pixel data downlinked from the Kepler photometer are calibrated by the calibration module (CAL) to produce calibrated target and background pixels (Quintana 2010) and their associated uncertainties (Clarke 2010). The calibrated pixels are then processed by the photometric analysis module (PA) to fit and remove sky background and extract simple aperture photometry from the background-corrected, calibrated target pixels7 (Twicken 2010a). PA also measures the centroid locations of each star in each frame. The final step to produce light curves is performed in the pre-search data conditioning (PDC) module, where signatures in the light curves correlated with systematic error sources from the telescope and spacecraft, such as pointing drift, focus changes, and thermal transients, are removed. Additionally, PDC identifies and removes sudden pixel sensitivity dropouts (SPSDs) which result in abrupt drops in pixel flux with short recovery periods up to a few hours, but usually not to the same flux level as before. These step discontinuities are identified separately from those due to operational activities, such as safe modes and pointing tweaks, and are mended using a sophisticated method. PDC also identifies residual isolated outliers and fills data gaps (such as during intra-quarter downlinks) so that the data for each quarterly segment is contiguous when presented to later pipeline modules. In a final step, PDC adjusts the light curves to account for excess flux in the optimal apertures due to starfield crowding and the fraction of the target star flux in the aperture to make apparent transit depths uniform from quarter to quarter as the stars move from detector to detector with each roll maneuver. Output data products include raw and calibrated pixels, raw and systematic error-corrected flux time series, and centroids and associated uncertainties for each target star, which are archived to the data management center and made available to the public through the multimission archive at STScI8 (McCauliff 2010). A companion paper describes the details of overall PDC architecture (Stumpe 2012).

Data is then passed to the transiting planet search module (TPS) (Jenkins 2010b), where a wavelet-based adaptive matched filter is applied to identify transit-like features with durations in the range of 1 to 16 hr. Light curves with transit-like features whose combined signal-to-noise ratio (SNR) exceeds 7.1σ for a specified trial period and epoch are designated as threshold-crossing events (TCEs) and subjected to further scrutiny by the data validation module (DV). The 7.1σ threshold was set so that there is no more than one expected false alarm for Earth-like planets for the entire campaign, assuming Gaussian statistics (Jenkins 2002). DV performs a suite of statistical tests to evaluate the confidence in the transit detection, to reject false positives by background eclipsing binaries, and to extract physical parameters of each system (along with associated uncertainties and covariance matrices) for each planet candidate (Wu 2010; Tenenbaum 2010). After the planetary signatures are fitted, DV removes them from the light curves and searches over the residual time series for additional transiting planets. This process repeats until no further TCEs are identified. The DV results and diagnostics are then furnished to the science team to facilitate disposition by the follow-up observing program (FOP) (Gautier 2010).

2. A BAYESIAN APPROACH TO CORRECTING SYSTEMATIC ERRORS

Kepler is opening up a new vista in astronomy and astrophysics and is operating in a new regime where the instrumental signatures compete with the minuscule signatures of terrestrial planets transiting their host stars. The dynamic range of the intrinsic stellar variability observed in the Kepler light curves is breathtaking: RR Lyrae stars explosively oscillate with periods of approximately 0.5 days, doubling their brightness over a few hours. Some flare stars double their brightness on much shorter time scales at unpredictable intervals. At the same time, some stars exhibit quasi-coherent oscillations with amplitudes of 50 ppm that can be seen by eye in the raw flux time series (Jenkins 2010c). The richness of Kepler's data lies in the huge dynamic range of the variations in intensity, spanning 4 orders of magnitude, and the range of time scales probed by the data, from a few minutes for short cadence (SC) data to weeks, months, and ultimately, to years. Given that Kepler was designed to be capable of resolving small 100-ppm changes in brightness over several hours, it is remarkably rewarding that it is revealing so much more. The challenge is that an instrument so sensitive to the amount of light from a star striking a small collection of pixels is also very sensitive to small changes in its environment.

The systematic errors observed in Kepler light curves exhibit a range of different time scales, from a few hours, to several days, to many days and weeks. Such phenomena include, for example, temperature variations of the reaction wheel housing over the 3-day momentum management cycles and the resultant focus changes of ∼2.2 μm °C⁻¹. Large thermal effects can be observed in the science data for ∼5 days after recovering from intermittent safe modes, and for ∼3 days after the attitude changes required to downlink the data each month, because different sides of the spacecraft are heated during downlinks and subsequent thermal recoveries. Another prominent systematic is differential velocity aberration (DVA), which, over the orbital period, results in gradual trends in the data over each quarter. The principal objective of PDC is to remove these systematic effects by "cotrending."9 The fact that most systematics such as these affect all the science data simultaneously, though to differing degrees, provides significant leverage in dealing with these effects.

2.1. The Basic Problem and the Principle Behind the Solution

It is standard practice when removing systematic errors in stellar data to use robust least-squares (LS) on a set of basis vectors, as is used in methods such as SYSREM (Tamuz 2005) and TFA (Kovacs 2005). A robust LS approach, as outlined below in § 2.2, can find a chance linear combination of the systematic error model components that reduces the bulk rms at the expense of distorting the intrinsic stellar variations and introducing additional noise on short timescales. The fundamental problem with this approach is the fact that the implicit model fitted to the data for each star is incomplete. LS cotrending projects the data vector onto the selected basis vectors and removes the components that are parallel to any linear combination of the basis vectors. This process is guaranteed to reduce the bulk rms residuals, but may do so at the cost of injecting additional noise or distortion into the flux time series. Indeed, this occurs frequently for stars with high intrinsic variability, such as RR Lyrae stars, eclipsing binaries, and classical pulsators. For example, if one of the model terms is strongly related to focus variations, and the long-term trend is for the width of the stellar point spread function (PSF) to broaden over the observation interval, then the flux for all stars should decrease over time. A LS fit, however, may invert the focus-related model term for a star whose flux increases over the observation interval, thereby removing the signature of intrinsic stellar variability from this light curve because there is a coincidental correlation between the observed change in flux and the observed change in focus. Given that the star would be expected to dim slightly over time, if anything, due to the focus change, PDC should be correcting the star so that it brightens slightly more than the original flux time series would indicate.

The situation is analogous to opening a jigsaw puzzle box and finding only 30% of the pieces present. LS gamely tries to put the jigsaw pieces together in order to match the picture on the box cover by stretching, rotating, and translating the pieces that were present in the box. The result is a set of pieces that roughly overlap the picture on the box cover, but one where the details do not necessarily match up well, even though individual pieces may obviously fit. In order to improve the performance of robust LS, we need to provide the fitter with constraints on the magnitudes and signs of the fit coefficients. These constraints can be obtained by using the ensemble behavior of the stars to develop an empirical model of the underlying physics. For example, the photometric change that can be induced by a pointing change of 0.1" must be bounded, and this bound can be estimated by looking at how the collection of stars behaves for a pointing change of this magnitude.

As an example of this analysis, and to demonstrate systematic trends in the Kepler data, take channel 2.1, which is the most thermally sensitive CCD channel in Kepler's focal plane. Nearly all stars on this channel exhibit obvious focus- and pointing-related instrumental signatures in their pixel time series and flux time series. Figure 1 shows several characteristic light curves for typical targets10 on channel 2.1 during the Kepler Quarter 7 data season, normalized by the median flux value. Note the long-term increase for all flux curves over the 90-day interval. This is due to seasonal changes in the shape of the telescope, and therefore its focus, as the Sun rotates about the barrel of the telescope while the spacecraft orbits the Sun and maintains its attitude fixed on the field of view (FOV). All light curves exhibit these long-term trends, but to differing degrees. Also present are some short-term oscillations, evident but mainly obscured in variable targets, which are due to focus changes driven by a heater cycling on and off to condition the temperature of the box containing reaction wheels 3 and 4 on the spacecraft bus. This component was receiving more and more shade throughout this time interval, and the thermostat actuated more frequently over time. A target that is varying on levels and periods similar to these systematic effects can obscure the systematics, making identification difficult. Looking at a single quiet target shown in Figure 2 (one of the targets shown in solid blue in Fig. 1) allows us to more clearly see the systematic trends which are exhibited in all the targets in Figure 1, but obscured by variability. Each Earth point, one at the beginning of the quarter (cadence index 0) and one after each monthly downlink (cadences 1500 and 2800), results in heating of different sides of the telescope as the spacecraft reorients the antenna to downlink data. The Earth points themselves are gaps in the data. They result in periods of local heating and cooling, distorting the telescope. A characteristic recovery time is also evident. The other trends described above are also clearly evident. A final short data gap is also evident at cadence 3950, but this was not due to a reorientation of the spacecraft, so no thermal recovery is present.

Fig. 1.— Selection of typical light curves on channel 2.1. Notice the long-term trend exhibited in all light curves. However, for highly variable targets the trend is not entirely clear. This illustrates the need to separate intrinsic stellar variability from systematic trends. Differing degrees of noise on each light curve are also evident, but the normalization obscures the fact that the variability and brightness of these light curves all have considerably different magnitudes.

Fig. 2.— A particularly quiet target on channel 2.1 showing almost purely systematic trends. The long-term trend is due to the seasonal changes of the shape of the telescope as the Sun rotates around the barrel. Other spacecraft systematics are also visible, such as monthly Earth-point downlinks and heater cycling. Data gaps and their thermal recoveries during the monthly downlinks are evident at cadences 1500 and 2800.

As a counter-example, Figure 3 shows the same highly variable target shown in solid black in Figure 1. Note how this highly variable star almost completely obscures the long-term trend. The targets shown in Figures 2 and 3 will hereafter be referred to as the "quiet target" and the "variable target" and used as canonical example targets in § 3.

Fig. 3.— Highly variable target where the variability completely obscures the systematic trend.

How can we separate intrinsic stellar variability from instrumental signatures? We do not expect intrinsic stellar variability to be correlated from target to target, except for rare coincidences, and even then one would not expect a high degree of correlation for all time scales. However, we do expect instrumental signatures to be highly correlated from target to target and can exploit this observation to provide constraints on the cotrending that PDC performs.

Figure 4 shows a histogram of the absolute value of the correlation coefficient for 1864 targets on channel 2.1. The targets' light curves are highly correlated, as evidenced by the near complete pile-up near an absolute correlation coefficient of 1. Examination of individual light curves indicates that these light curves are contaminated to a large degree by instrumental signatures, as evidenced in Figures 1 and 2. However, not all the targets are dominated by systematic errors. The trick is to come up with a method that can distinguish between intrinsic stellar variability and chance correlations with linear combinations of the diagnostic time series used to cotrend out systematic errors.

Fig. 4.— Median absolute correlation for all targets on channel 2.1 quarter 7.

2.2. The MAP Approach: An Analytical Solution

A Bayesian approach called the maximum a posteriori (MAP) method allows us to provide PDC with constraints on the fitted coefficients to help prevent over-fitting and distortion of intrinsic stellar variability. The application of a Bayesian MAP approach to Kepler data was first proposed by Jenkins (2012). In this exposition we follow the notation of Kay (1993).

The PDC-MAP technique examines the behavior of the robust LS fit coefficients across an ensemble of targets on each CCD readout channel in order to develop a description for the "typical" value for each model term. This description is a probability density function (PDF) that can be used to constrain the coefficients fitted in a second pass. To develop this approach, we build on a maximum likelihood approach.

The maximum likelihood approach models each light curve, y, as a linear combination of instrumental systematic vectors, referred to as cotrending basis vectors or CBVs, arranged as the columns of a design matrix, H, plus zero-mean, Gaussian observation noise, w:

y = H\theta + w.     (1)

The maximum likelihood estimator (MLE) seeks to find the solution, \hat{\theta}, that maximizes the likelihood function, p(y; θ), given by

p(y; \theta) = \frac{1}{(2\pi)^{N/2} |C_w|^{1/2}} \exp\left[-\frac{1}{2}(y - H\theta)^T C_w^{-1}(y - H\theta)\right],     (2)

where Cw is the covariance matrix of w and N is the number of data points. Taking the gradient of the log of equation (2), setting it equal to zero, and solving for θ yields the familiar LS solution,

\hat{\theta} = \left(H^T C_w^{-1} H\right)^{-1} H^T C_w^{-1} y.     (3)

This solution assumes that H is a complete model of the data. We will show that the Bayesian approach accounts for an incomplete model, which is the common case when removing systematics from stellar signals.
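
To make the linear model and the LS solution of equation (3) concrete, the following minimal sketch (a Python illustration, not the pipeline implementation) fits a single normalized light curve against a toy set of basis vectors under the assumption of white observation noise; the basis vectors and numbers here are invented for demonstration only:

    import numpy as np

    def ls_cotrend(y, H):
        """Maximum likelihood / least-squares cotrending fit (eq. [3] with C_w = sigma^2 I).

        y : (N,) median-normalized flux time series
        H : (N, M) design matrix whose columns are cotrending basis vectors
        Returns the fitted coefficients and the cotrended residual y - H @ theta_hat.
        """
        theta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
        return theta_hat, y - H @ theta_hat

    # Toy usage: two hypothetical "systematic" vectors (a linear drift and a slow
    # sinusoid) plus white noise; real basis vectors come from the SVD of section 3.1.
    rng = np.random.default_rng(0)
    N = 1000
    t = np.linspace(0.0, 1.0, N)
    H = np.column_stack([t - t.mean(), np.sin(2.0 * np.pi * 3.0 * t)])
    y = H @ np.array([0.01, 0.002]) + 1e-4 * rng.standard_normal(N)
    theta_hat, cotrended = ls_cotrend(y, H)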

Adopting the Bayesian approach allows us to incorporate side information, such as knowledge of prior constraints on the model, in a natural way. Bayesianists view the underlying model as being drawn from a distribution and the data as being one realization of this process. In this case, we wish to find the MAP estimator of the model coefficients, given the observations (data):

\hat{\theta}_{MAP} = \arg\max_{\theta}\, p(\theta|y) = \arg\max_{\theta}\, p(y|\theta)\, p(\theta),     (4)

where we have applied Bayes' rule (D'Agostini 2003) to simplify the expression. In this equation, p(θ) is the prior PDF of the model coefficients. The mathematical form for p(y|θ) is the same as for the non-Bayesian likelihood function p(y; θ) in equation (2).

For illustration purposes, if we adopt a Gaussian form for the coefficient distribution, θ, then p(θ) takes a closed form,

p(\theta) = \frac{1}{\sqrt{|2\pi C_\theta|}} \exp\left[-\frac{1}{2}(\theta - \mu_\theta)^T C_\theta^{-1}(\theta - \mu_\theta)\right],     (5)

where Cθ and μθ are the covariance and mean of θ, respectively, and we assume that the coefficients are uncorrelated (which will hold true for orthogonal basis functions). We can then maximize equation (4), using equation (5), by maximizing its log likelihood,

\ln p(\theta|y) \propto -\frac{1}{2}(y - H\theta)^T C_w^{-1}(y - H\theta) - \frac{1}{2}(\theta - \mu_\theta)^T C_\theta^{-1}(\theta - \mu_\theta).     (6)

Taking the gradient of equation (6) with respect to θ, setting it to zero, and solving for θ yields

\hat{\theta}_{MAP} = \left(H^T C_w^{-1} H + C_\theta^{-1}\right)^{-1}\left(H^T C_w^{-1} y + C_\theta^{-1}\mu_\theta\right).     (7)

If the observation noise, w, is zero-mean, white Gaussian noise with variance σ2, then equation (7) can be rewritten as

\hat{\theta}_{MAP} = \left(H^T H + \sigma^2 C_\theta^{-1}\right)^{-1}\left(H^T y + \sigma^2 C_\theta^{-1}\mu_\theta\right).     (8)

The key to this Bayesian technique is to determine when to preference, or weight, the prior PDF over the conditional PDF. If the variance in the data is large compared to the "spread" allowed by the prior PDF for the model, then the MAP estimator gives more weight to the prior, so that \hat{\theta}_{MAP} \to \mu_\theta as σ2 → ∞. This case would correspond, for example, to targets with large stellar variability, such as the target given in Figure 3. In this case, the MAP weighting constrains the fitter from distorting the light curve and introducing noise on a short time scale. Conversely, if the variance in the data is small compared to the degree to which the prior PDF confines the model, the MAP estimator "trusts" the data over the prior knowledge, and \hat{\theta}_{MAP} approaches the LS solution of equation (3) as σ2 → 0. This case would correspond to targets with small stellar variability, such as the target in Figure 2, where there is little risk of over-fitting and distortion of the light curves, and it is a "safe" bet to use the conditional, LS fit.
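
As a numerical illustration of equation (8) and of the two limits above, the sketch below evaluates the closed-form Gaussian-prior MAP estimate and shows it moving between the LS solution and the prior mean as σ2 varies; all values (prior mean, covariance, noise levels) are illustrative assumptions, not pipeline quantities:

    import numpy as np

    def map_gaussian_prior(y, H, sigma2, mu_theta, C_theta):
        """Closed-form MAP estimate of equation (8) for white noise of variance
        sigma2 and a Gaussian prior N(mu_theta, C_theta) on the coefficients.

        As sigma2 grows the estimate approaches mu_theta (trust the prior);
        as sigma2 shrinks it approaches the LS solution (trust the data).
        """
        C_theta_inv = np.linalg.inv(C_theta)
        A = H.T @ H + sigma2 * C_theta_inv
        b = H.T @ y + sigma2 * (C_theta_inv @ mu_theta)
        return np.linalg.solve(A, b)

    # Toy demonstration of the two limits (all values are illustrative).
    rng = np.random.default_rng(1)
    N = 500
    t = np.linspace(0.0, 1.0, N)
    H = np.column_stack([t - t.mean(), np.cos(2.0 * np.pi * 5.0 * t)])
    y = H @ np.array([0.02, 0.001]) + 1e-3 * rng.standard_normal(N)
    mu_theta = np.array([0.015, 0.0])     # ensemble-based prior mean (assumed)
    C_theta = np.diag([1e-4, 1e-4])       # prior covariance, the "spread"
    theta_data = map_gaussian_prior(y, H, 1e-8, mu_theta, C_theta)   # ~ LS solution
    theta_prior = map_gaussian_prior(y, H, 1e+2, mu_theta, C_theta)  # ~ mu_theta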

3. THE EMPIRICAL BAYESIAN MAP APPROACH AND IMPLEMENTATION

The above analytical solution to the Bayesian posterior PDF restricts the prior PDF to a Gaussian form. There is no a priori reason to make this assumption, and in general, since we are developing an empirical prior PDF, the fewer analytical constraints placed on its form, the more complete the empirical model will be.

If a Gaussian form for the prior is no longer assumed, then the prior formalism in equation (5) can no longer be used. We can, however, still take the log form of equation (4) to obtain

\ln p(\theta|y) \propto \ln p(y|\theta) + \ln p(\theta).     (9)

Using the MLE in equation (2) for p(y|θ), removing the constant terms, inserting a weighting parameter, and using normalized light curves, \tilde{y}, we obtain

\hat{\theta}_{MAP} = \arg\max_{\theta}\left[-\frac{1}{2\sigma^2}(\tilde{y} - H\theta)^T(\tilde{y} - H\theta) + W_{pr}\ln p(\theta)\right],     (10)

where we assume the observation noise, w, is zero-mean, white Gaussian noise and has variance σ2. Since p(θ) is no longer in closed form, the "spread" in the prior PDF (i.e., the covariance of θ, Cθ in eq. [8]) can no longer be expressed succinctly. In its stead, a "generalized weighting parameter", Wpr, is used to characterize the "spread" in the prior PDF. Equation (10) must now be evaluated numerically.
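
Since p(θ) no longer has a closed form, equation (10) is evaluated pointwise. A minimal sketch of that evaluation is shown below; it assumes the prior density is available as a callable returning log p(θ) (e.g., a kernel density estimate built from the ensemble of robust fit coefficients, as described in § 3.2), and the function name and arguments are illustrative rather than those of the pipeline:

    import numpy as np

    def log_posterior(theta, y_norm, H, sigma2, w_pr, log_prior_pdf):
        """Log-posterior of equation (10) evaluated at a candidate coefficient vector.

        theta         : (M,) candidate cotrending coefficients
        y_norm        : (N,) median-normalized flux time series
        H             : (N, M) cotrending basis vectors arranged as columns
        sigma2        : assumed white observation-noise variance
        w_pr          : empirical prior weighting parameter W_pr
        log_prior_pdf : callable returning the scalar log p(theta)
        """
        resid = y_norm - H @ theta
        log_conditional = -0.5 * float(resid @ resid) / sigma2
        return log_conditional + w_pr * float(log_prior_pdf(theta))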

The overall flow of the algorithm is shown in Figure 5. We start by normalizing the flux light curves and calculating a relative stellar variability. We then find basis vectors using SVD, based on a reduced set of flux light curves where cuts are made on target-to-target correlation and stellar variability. A robust LS fit is then performed on each target using the basis vectors just found. This ensemble of "robust fit coefficients" is used to generate the prior PDF. The conditional PDF is also found based on the same basis vectors. Once both prior and conditional PDFs are found, they are combined to generate the posterior PDF where a weighting parameter, based on the stellar variability and the "goodness" of the prior fit, is used to weigh the prior relative to the conditional PDF. Details are elucidated in the following subsections.

Fig. 5.— Flowchart of the PDC-MAP cotrending algorithm.

3.1. Finding the Cotrending Basis Vectors

The cotrending basis vectors are obtained using singular value decomposition. In order to have equal representation for all light curves independent of their absolute magnitude, we first normalize the targets by their median flux values [Δflux/median(flux)]. We then select the 50% most highly correlated targets based on the median absolute Pearson correlation. This cut generates a set that exhibits the strongest trends in the data. It mostly removes targets with large variability, but not completely. A variable star exhibiting a strong trend can still remain in the reduced list. We therefore first make a cut on the estimated variability of each target.
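
The sketch below illustrates the median normalization and the correlation ranking just described, using the median absolute Pearson correlation of each target against all others; the array layout and the 50% cut follow the text, while everything else is an illustrative assumption:

    import numpy as np

    def median_normalize(flux):
        """Normalize each light curve as delta-flux over its median flux.

        flux : (n_targets, n_cadences) array of raw flux light curves
        """
        med = np.median(flux, axis=1, keepdims=True)
        return (flux - med) / med

    def median_abs_correlation(norm_flux):
        """Median absolute Pearson correlation of each target against all others."""
        corr = np.abs(np.corrcoef(norm_flux))   # (n_targets, n_targets), rows = targets
        np.fill_diagonal(corr, np.nan)          # ignore each target's self-correlation
        return np.nanmedian(corr, axis=1)

    def most_correlated_half(norm_flux):
        """Indices of the 50% most highly correlated targets."""
        med_corr = median_abs_correlation(norm_flux)
        order = np.argsort(med_corr)[::-1]
        return order[: len(order) // 2]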

An estimate of the intrinsic stellar variability of each target must be found. Herein lies the fundamental chicken-and-egg problem of the cotrending method. We need to know the stellar variability of each target in order to know how much to rely on the prior. But if we already knew the stellar variability, then we would have no need for the prior; the cotrending solution would simply be the intrinsic stellar variability subtracted from the light curve, minus a Gaussian noise estimate. This issue is not specific to this particular cotrending method, either. Whenever a system is characterized with an incomplete model, there exists the problem of identifying the components in the system not represented in the model. We fortunately do not need to know the variability absolutely; we only need an estimated metric in order to weight the prior. This estimate can be obtained by comparing a third-order polynomial to the light curve. The polynomial fit removes any long-term trends, leaving behind a roughly detrended curve. The standard deviation of this polynomial-removed light curve provides a rough estimate of the variability of the target. Removing a low-order polynomial fit is essentially high-pass filtering; we are therefore assuming any long-term trends are systematic and short-term trends are stellar. There are numerous counter-examples of short-term trends that are actually systematic; reaction wheel zero crossings are a good example. However, short-term systematic trends tend to be small in magnitude, whereas long-term systematics tend to result in large excursions in the flux amplitude. Likewise, there are examples of intrinsic long-term trends, but they are generally smaller than the systematic trends. Since we are only concerned with the relative amplitude of stellar versus systematic variability, we are using this filtering to distinguish two characteristic realms of influence: long-term trends dominated by systematics and short-term trends dominated by intrinsic stellar variation. An example is shown in Figure 6. Here, a highly variable target is compounded with a long-term DVA and thermal trend. For periods less than 400 cadences, the variance in the flux is dominated by stellar features. The long-term variance, and the general trend to higher flux values, is due to systematics. The variance of the residual after removing the polynomial fit, labeled as "coarsely detrended light curve" in the figure, gives a rough estimate of the stellar variability of this target. Note that there are still systematic features in the detrended light curve. They are, however, small in magnitude compared with the stellar variability.11

Fig. 6.— By removing a third-order polynomial fit to the raw light curve, an estimate of the intrinsic variability of the target can be calculated.

The variability, V, is measured using

V = \frac{\sigma_{\tilde{y}}}{\Delta y\, \tilde{V}},     (11)

where \sigma_{\tilde{y}} is the standard deviation of the third-order-polynomial-detrended light curve, Δy is the uncertainty of the flux data as determined by the PA pipeline component (Clarke 2010), and \tilde{V} is the median variability (the median of \sigma_{\tilde{y}}/\Delta y) over all light curves in the sample. The normalization by the uncertainty ensures the noise in the data is not included in the stellar variability. The normalization by the median variability is so that a variability of 1 is considered typical, thereby simplifying the analysis parameterization. Figure 7 shows a histogram of the measured variability for all targets on channel 2.1. The median of all of these values is evidently 1, and the distribution is typical for all channels: most targets are close to typical variability, but with a long tail to high variability (note the log scale for the x-axis). There are two cutoff thresholds plotted as well. The upper (dashed red) is the threshold to determine if a target is "highly variable." The lower (solid green) is to determine if a target is "very quiet." The very quiet targets have such a low amount of variability that using the prior PDF when generating the fit has been found to be problematic. Any targets above the high variability threshold are removed from the reduced list. The remaining targets are sorted with respect to median absolute correlation, and the 50% most highly correlated are used for SVD.
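
A sketch of the variability cut of equation (11) and the subsequent selection is given below, under the assumptions stated in the text: a third-order polynomial is removed as a coarse detrending, the residual scatter is normalized by a per-target flux uncertainty (here a single representative value per target, in the same normalized units), and the result is scaled by the ensemble median; the high-variability threshold value is a placeholder:

    import numpy as np

    def variability(norm_flux, flux_unc):
        """Relative stellar variability of each target (eq. [11]).

        norm_flux : (n_targets, n_cadences) median-normalized light curves
        flux_unc  : (n_targets,) representative flux uncertainty per target,
                    in the same normalized units (from the PA uncertainties)
        """
        n_targets, n_cadences = norm_flux.shape
        x = np.arange(n_cadences)
        sigma = np.empty(n_targets)
        for k in range(n_targets):
            coeffs = np.polyfit(x, norm_flux[k], 3)              # coarse detrending
            sigma[k] = np.std(norm_flux[k] - np.polyval(coeffs, x))
        v = sigma / flux_unc
        return v / np.median(v)                                  # a variability of 1 is "typical"

    def select_for_svd(norm_flux, flux_unc, high_var_cut=3.0):   # threshold is a placeholder
        """Reject highly variable targets, then keep the most correlated half."""
        v = variability(norm_flux, flux_unc)
        quiet = np.where(v < high_var_cut)[0]
        corr = np.abs(np.corrcoef(norm_flux[quiet]))
        np.fill_diagonal(corr, np.nan)
        med_corr = np.nanmedian(corr, axis=1)
        keep = quiet[np.argsort(med_corr)[::-1][: len(quiet) // 2]]
        return keep, v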

Fig. 7.— Histogram of estimated variability for all targets on channel 2.1. This distribution is typical for all Kepler channels. The quiet targets below the "SVD and Prior Generation" threshold are used to generate the cotrending basis vectors.

Due to all targets being normalized by their median, most targets pass through zero amplitude at the midpoint, which can be seen as a "node" in the light curves at cadence 2200 in Figure 1. If SVD were performed on this set "as is", then all the strong cotrending basis vectors would have zero amplitude at the midpoint. The basis vectors would therefore be unable to remove systematics in the minority of targets that do not pass through zero at the midpoint. The light curves are therefore "dithered" slightly with zero-mean Gaussian offsets in order to slightly "spread" the light curves about the zero-flux value. Since the dithering is zero-mean, it has no effect on the resultant basis vectors other than to remove the artificial zero-crossing node at the midpoint. Note that the dithering is only used to generate the basis vectors. The cotrending is performed on the non-dithered, but still median-normalized, light curves.

Figure 8 shows the singular values from the singular value decomposition. This figure is characteristic of all channels, showing two or three strong singular values, then a slowly tapering region for about another dozen values, until finally asymptotically approaching zero (as is expected with SVD). The first several left-singular vectors are selected (typically the first eight) to become the cotrending basis vectors. These first singular vectors exhibit the principal trends in the data due to DVA, pointing errors, impulses due to Argabrightenings (Witteborn 2011), focus errors, and reaction wheel zero crossings, among other trends. The number of basis vectors used is generally eight; however, an S/N test is performed where the S/N is determined by

\mathrm{S/N} = 10\log_{10}\!\left(\frac{A_{signal}^2}{A_{noise}^2}\right),     (12)

with A_{signal} and A_{noise} being the rms of the light curve and noise floor, respectively. The noise floor is approximated by the first differences between adjacent flux values. Any of the eight basis vectors with an S/N below a threshold of 5 dB are removed, but only a small number of basis vectors over the entire FOV are removed by the S/N test. Most have high S/N. There are other sophisticated methods to find the dimensionality of an eigensystem, such as Bayesian model selection (Minka 2000). These are not used because they tend to pick too high a dimensionality in this particular situation. We wish to find only the singular vectors with systematics; the lesser singular vectors do contain light-curve signal information, but not necessarily systematics, and we have found including them in the MAP fit adds no value yet slows down the algorithm.
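
The basis-vector extraction can be sketched as follows: the selected light curves are dithered with small zero-mean Gaussian offsets (used only for generating the vectors), SVD is applied, and the leading singular vectors are kept subject to an S/N test of the form of equation (12). With targets arranged as rows here, the time-domain trends are the right-singular vectors; the dither scale and the other details are illustrative assumptions, not the pipeline's exact choices:

    import numpy as np

    def cotrending_basis_vectors(norm_flux, n_vectors=8, dither_scale=1e-3,
                                 snr_threshold_db=5.0, seed=0):
        """Derive cotrending basis vectors from the cleaned, highly correlated set.

        norm_flux : (n_targets, n_cadences) median-normalized light curves with
                    gaps already filled.  With targets as rows, the time-domain
                    trends are the right-singular vectors (rows of vt), and U
                    holds each target's contribution to each vector.
        """
        rng = np.random.default_rng(seed)
        # Dither each curve with a zero-mean Gaussian offset so that not every
        # curve crosses zero at the same cadence; used only to build the vectors.
        dithered = norm_flux + dither_scale * rng.standard_normal((norm_flux.shape[0], 1))
        u, s, vt = np.linalg.svd(dithered, full_matrices=False)
        basis = vt[:n_vectors]

        kept = []
        for bv in basis:
            noise_rms = np.std(np.diff(bv)) / np.sqrt(2.0)   # first-difference noise floor
            signal_rms = np.std(bv)
            snr_db = 10.0 * np.log10(signal_rms**2 / noise_rms**2)   # eq. (12)
            if snr_db >= snr_threshold_db:
                kept.append(bv)
        return np.array(kept)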

Fig. 8.— Singular values from SVD on the reduced set of "cleaned" targets that are highly correlated and quiet.

For a minority of basis vectors, a few target light curves can dominate the signal. The normalization process attempts to "equalize" the strength of all targets, but a small number of light curves can be over-represented in the singular vectors from SVD. To eliminate this, we calculate an entropy metric for each basis vector using the differential entropy

H = -\int p(x)\,\ln p(x)\, dx,     (13)

where p(x) is a PDF created from the right-singular vectors from SVD (referred to as the "V-matrix"). The V-matrix contains the contribution of the signal in the basis vector from each target light curve. We must first normalize the entropy calculation to a Gaussian distribution, which has the highest entropy of any continuous distribution with the same second central moment. The entropy of a Gaussian is

H_G = \frac{1}{2}\ln\!\left(2\pi e \sigma^2\right),     (15)

σ2 being the second central moment of V_{ki} for fixed i. The resultant relative entropy is therefore

H_{rel} = H - H_G.     (16)

If one (or a few) targets are dominant, then they will have much larger values in the V-matrix than all the other targets. A large negative value of the relative entropy will identify this condition. The threshold for bad entropy is somewhat arbitrary, but we have found that a value below -0.7 is poor. For any basis vectors with identified poor entropy, the V-matrix column for that basis vector is examined for stand-out targets. The offending targets are removed, and SVD is re-computed on the remaining targets. The process is iterated until the relative entropy of all basis vectors is above -0.7. Typically, no more than a couple of iterations are necessary, and fewer than 20 targets are removed (out of 2500 total targets).
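
The sketch below computes a relative entropy of the kind described above for one basis vector's column of target contributions: the differential entropy of a kernel density estimate of the contributions (eq. [13]) is compared against that of a Gaussian with the same second central moment (eqs. [15] and [16]). The Gaussian KDE, the resubstitution entropy estimate, and the stand-out threshold are illustrative choices, not necessarily those of the pipeline:

    import numpy as np
    from scipy.stats import gaussian_kde

    def relative_entropy(contributions):
        """Entropy of one basis vector's target-contribution distribution relative
        to a Gaussian of equal variance (eqs. [13], [15], and [16]).

        contributions : (n_targets,) column of the "V-matrix" for one basis vector.
        Values near 0 indicate a roughly Gaussian spread; strongly negative values
        flag a distribution dominated by a few stand-out targets.
        """
        x = np.asarray(contributions, dtype=float)
        kde = gaussian_kde(x)                                    # p(x) from the column
        log_p = np.log(np.clip(kde(x), 1e-300, None))
        h = -log_p.mean()                                        # eq. (13), resubstitution estimate
        h_gauss = 0.5 * np.log(2.0 * np.pi * np.e * x.var())     # eq. (15)
        return h - h_gauss                                       # eq. (16)

    def stand_out_targets(contributions, n_sigma=5.0):
        """Flag targets whose contribution lies far from the ensemble (robust sigma)."""
        x = np.asarray(contributions, dtype=float)
        med = np.median(x)
        mad = np.median(np.abs(x - med))
        return np.where(np.abs(x - med) > n_sigma * 1.4826 * mad)[0]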

Figure 9 shows the first eight cotrending basis vectors generated for channel 2.1, and Figure 10 shows just the first basis vector. Trends can be found in all the vectors, but it is useful to concentrate on the first and strongest. Here the most characteristic trends and systematics in the data can be found. The general trend to higher flux is due to the seasonal change and solar orientation. The short recovery periods at cadence indices 0, 1500 and 2800 are due to monthly downlinks. The short spikes at 700 and 1450 are due to artifacts from correcting cosmic rays near reaction wheel zero crossing periods.12 The periodic oscillation is due to heater cycling. Notice how the basis vector in Figure 10 closely follows the flux light curve in Figure 2. This signifies that virtually all the features in this flux light curve are due to systematic effects and not intrinsic stellar variability. In theory, any features in the light curves in Figure 1 that are not represented in the basis vectors in Figure 9 are intrinsic to the target. However, a simple LS projection of the light curves on the basis vectors will not produce desirable results for all targets, as will be shown below.

Fig. 9.— First eight cotrending basis vectors for channel 2.1. The x-axis is the cadence index. The y-axis is arbitrary for basis vectors. The gaps in the data have been linearly filled so these curves are continuous.

Fig. 10.— First cotrending basis vector for channel 2.1. The amplitude of the basis vector is arbitrary.

Once the cotrending basis vectors are found, a robust LS fit is performed on each target. This creates the empirical data used to generate the prior PDF.

3.2. Numerically Generating p(θ)

The prior PDF is based on the distribution of robust fit coefficients of the basis vectors for all light curves, using the method described by Bowman (1987). This method computes a probability density estimate of the sample data based on a normal kernel function using a window that is a function of the number of points in the data sample. The form of the prior PDF will depend on the parameterization of the robust fit coefficients. We must thus decide how to parameterize the coefficients to best extract the correlations. Some systematic effects are caused by focal plane irregularities and instrumental vibrations, which are stronger near the edges of the CCD frame (nearer to the spacecraft housing). There are also other issues that are dependent on the physical position of each pixel on the CCD. Therefore, the targets' locations in the sky, as characterized by right ascension and declination, are reasonable parameters for describing a target's position with respect to the sources of systematic effects. The degree to which targets are influenced by systematic effects is also directly related to stellar magnitude, since different magnitude targets result in different saturation levels of the CCD pixels. For example, the readout electronics for the CCDs are sensitive to temperature drift, but the sensitivity is non-linear with respect to CCD flux levels, so brighter targets are affected by instrument temperature differently than dim targets. We therefore parameterize the prior PDF with three independent variables: (1) stellar magnitude (Kp), (2) right ascension (R.A.), and (3) declination (decl.).

Figures 11–13 show the robust fit coefficients for a basis vector plotted against Kp, R.A., and declination. The blue star data are for all targets, whereas the red circle data are just for those targets remaining for SVD after the cuts discussed in § 3.1. The solid blue and dashed red curves in Figures 12 and 13 are the traveling window means of the blue star and red circle data, respectively. The cuts clearly produce a bimodal distribution in Kp for this basis vector. A simple Gaussian fit would not reproduce this distribution, which demonstrates that the systematic trends are correlated with Kp but that variable targets mask the true correlation when a simple robust fit is performed. The correlations in R.A. and decl. are also evident, but to a lesser extent. Notice also that the means (solid blue curves) are biased compared to the dashed red curves. This is again because the variable targets are masking the true trends in the data.

Fig. 11.— Robust fit coefficients for basis vector 1 for all targets (stars) and only those targets used for SVD (circles) plotted against Kepler magnitude. Taking cuts on stellar variability, target-to-target correlation, and entropy results in a bimodal distribution that would not be evident without the cuts.

Fig. 12.— Robust fit coefficients for basis vector 1 for all targets (stars) and only those targets used for SVD (circles) plotted against right ascension. The solid and dashed curves are the traveling window means of the star and circle data, respectively.

Fig. 13.— Robust fit coefficients for basis vector 1 for all targets (stars) and only those targets used for SVD (circles) plotted against declination. The solid and dashed curves are the traveling window means of the star and circle data, respectively.

Some basis vectors exhibit stronger trends in Kp, R.A., or decl., but not necessarily all three simultaneously, as is expected if the different systematics represented by the basis vectors have different instrumental sources. Plotting different basis vectors and/or channels reveals different trends and correlations.

We want to rely mainly on targets in the neighborhood around the target we are fitting, referred to as the target under study (TUS), in R.A.-, decl.-, and Kp-space when generating the prior PDF. If we simply formed an evenly weighted PDF, then a large cluster of targets with a certain coefficient value, even if nonlocal to the TUS, would always dominate the peak of the prior PDF. We therefore use a weighted probability density estimate based on the standardized Euclidean distance between targets x and y,

d(x, y) = \sqrt{(x - y)^T \Lambda^{-1} (x - y)},     (17)

where Λ is a diagonal matrix whose diagonal elements give the relative weighting for each dimension. A straight normalization in each dimension by its standard deviation would result in equal weighting of all three dimensions, but we wish to overemphasize the prior PDF in dimensions that exhibit greater correlations, and in our case the robust fit coefficients exhibit a stronger correlation in Kp than in R.A. or declination. The Λ matrix diagonals are therefore

\Lambda_{ii} = \left[\frac{\mathrm{mad}(\theta_i)}{S_i}\right]^2,     (18)

where mad(θi) is the median absolute deviation of the distribution along dimension i and Si is the scaling factor for dimension i (Si = 2 if i is the Kp dimension and 1 otherwise). The above weighting results in the Kp dimension being weighted twice as much as R.A. and decl. when generating the prior PDF. That is, targets further away in the Kp dimension are weighted proportionately less than in R.A. and declination. This effectively results in taking a tighter cut in Kp-space to emphasize the greater correlation in that dimension. Since the PDF is weighted by this distance metric, the PDF will emphasize the correlation in Kp and yet still be sensitive to the trends in R.A. and declination. The median absolute deviation is used instead of the standard deviation in order to ignore outliers. The effect of the scaling factor S is not great compared with the overall distance weighting; however, it does improve results slightly. As development continues, this functionality may reveal itself to be more useful when the number of independent variables increases.
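
A minimal sketch of the distance-weighted prior density for a single basis-vector coefficient follows, under the assumptions above: target-to-target distances are computed in (Kp, R.A., decl.) space with the Kp axis emphasized via the scaling factors, the distances are converted to weights, and a weighted kernel density estimate over the ensemble of robust-fit coefficients plays the role of p(θ). The distance-to-weight mapping and the KDE are illustrative choices rather than the pipeline's exact ones:

    import numpy as np
    from scipy.stats import gaussian_kde

    def weighted_prior_pdf(tus_params, all_params, robust_coeffs, scale=(2.0, 1.0, 1.0)):
        """Distance-weighted prior density p(theta) for one basis-vector coefficient.

        tus_params    : (3,) [Kp, RA, Dec] of the target under study (TUS)
        all_params    : (n_targets, 3) [Kp, RA, Dec] for every target on the channel
        robust_coeffs : (n_targets,) robust LS fit coefficients for one basis vector
        scale         : per-dimension emphasis; Kp is weighted twice as much here
        Returns a callable that evaluates the weighted prior density.
        """
        all_params = np.asarray(all_params, dtype=float)
        diff = all_params - np.asarray(tus_params, dtype=float)
        # Robust per-dimension scale (mad), tightened by the scaling factors.
        mad = np.median(np.abs(all_params - np.median(all_params, axis=0)), axis=0)
        lam = mad / np.asarray(scale)
        d = np.sqrt(np.sum((diff / lam) ** 2, axis=1))     # standardized distance, eq. (17)
        weights = 1.0 / (1.0 + d**2)                       # nearby targets dominate (assumed form)
        return gaussian_kde(robust_coeffs, weights=weights / weights.sum())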

The weighting by equation (17) and how it affects the prior PDF is illustrated in Figures 14–16, the latter two being the prior PDFs for the same two targets in Figures 2 and 3. The blue histogram in all three figures is exactly the same, since they are generated from the same distribution of coefficients. However, the prior PDF (solid red curve) is dramatically different. In Figure 14 a bimodal PDF is evident because the targets nearby to the TUS contain two clusters, around -1.3 and -0.85, suggesting that the coefficient value for the TUS should be one of these two values. Which value is actually chosen will depend on the form of the conditional PDF and the weighting of the prior PDF, as discussed in § 3.3. In Figure 16 the targets near the TUS have coefficients clustered around -0.34, which is far from the peak in the unweighted PDF. Using the unweighted PDF would have completely missed the actual systematic trend in the data near the TUS. The log of the prior PDF is plotted in these figures for direct comparison with equation (10), which results in a compression of the PDF near the top.

Fig. 14.— Histogram of basis vector 2 robust fit coefficients for all 1864 targets on channel 2.1 and the weighted probability density for a particular target. The weighting by distance in Kp, right ascension, and declination clearly affects the PDF.

Fig. 15.— Histogram of basis vector 2 robust fit coefficients for all 1864 targets on channel 2.1 and the weighted probability density for the quiet target shown in Fig. 2.

Fig. 16.— Histogram of basis vector 2 robust fit coefficients for all 1864 targets on channel 2.1 and the weighted probability density for the variable target light curve shown in Fig. 3.

In summary, the prior PDF is developed by generating a three-dimensional weighted distribution of robust LS fit coefficients in R.A.-, decl.-, and Kp-space. This methodology makes no assumptions on the form of the PDF, Gaussian or otherwise, and allows PDC-MAP to identify and characterize the form of the systematic trends across the full distribution of targets.

3.3. Finding the Weighting Parameter Wpr

For each light curve, Wpr in equation (10) is an empirical weighting parameter that is principally based on the variability of the target. The greater the variability, the more we need to constrain the LS fit. However, there is another complication: a fit to the prior PDF is not always a good fit to the trend in the target. The reason for this disagreement is currently being investigated. One factor influencing the "goodness" of the prior fit is the sparseness of the targets in certain regions of R.A.-, decl.-, and Kp-space. A sparse distribution will result in poor prior statistics. There are also other, unknown causes resulting in poor priors for some targets, and so an additional parameter in the prior weighting is an evaluation of the "goodness" of the prior fit. The goodness is evaluated using a method similar to the variability calculation above. The prior fit is compared to a third-order polynomial fit to the light curve with a soft-wall cutoff using the following equation:

where Graw is the "raw" goodness given by

and F_pr and F_poly are the prior PDF fit and the third-order polynomial fit to the data, respectively. Normalization by the median absolute deviation (mad) of the polynomial-fit-removed light curve allows for a comparison of the difference between the polynomial fit and the prior fit with respect to the variance of the target. The soft cutoff ensures that small changes in the light curve will not produce dramatic changes in the weighting. The scaling parameter αG sets the point at which the deviation of the prior fit from the polynomial fit becomes too large for the prior to be useful in constraining the posterior fit, and it is tuned empirically by maximizing the PDC goodness metric described in a companion paper (Stumpe 2012). An example of a poor prior fit is given in Figure 17. Note how both the long-term trend and the Earth-point recoveries are much larger in the prior fit than in the actual data. Examples such as this are in the minority, but frequent enough to require the additional test for prior goodness. It could be proposed that this target is trending downward, cancelling out the upward trend of the prior fit. This is unfortunately not the case. Examination of Kepler quarters 6 and 8 reveals that this target is not experiencing a general trend in quarter 7. The prior fit is indeed poor and should not be used to any large degree. The unfortunate side effect of the prior goodness test is that PDC-MAP is less sensitive to very long-term trends in the data. A true long-term trend in the data that cancels out the systematic trend can confuse the prior goodness metric, which would interpret the fit as having a bad prior. The only way to know the actual long-term trend with certainty is to examine multiquarter data. Future versions of the PDC module may indeed provide this functionality.

Fig. 17.— Example of a poor prior PDF fit to the trend in the target.

The resultant full form of the prior weighting is

The parameters βV and βG are scaling factors for the variability and prior goodness, respectively, and are tuned to maximize the PDC goodness metric, like αG above. Future work includes fully characterizing the conditions where the prior fit is poor, and thereby removing the unfortunate need for an empirical prior "goodness" metric.

In cases where the prior goodness is near zero, the fit reverts to a reduced robust fit where the number of basis vectors is limited to just the first several (the default is four). A pleasant feature of a MAP fit is that a large number of basis vectors can be used: the prior PDF restricts the fit from drifting drastically in the function space, searching the large set of basis vectors for a combination that reduces the bulk rms at the expense of distorting stellar features. If the prior cannot be used, then there is no such restriction, and the posterior PDF becomes a LS fit, so a more limited number of basis vectors must be used in order to constrain the fit. The first several basis vectors have very strong trends in most of the data, and also have low noise components, so they are generally safe to use, even with an unrestricted LS fit. It is also generally true that a target has a bad prior because the target is quiet, so any small deviation of the prior from a true trend is very noticeable, and the prior is neither necessary nor desirable to use.

If the target is below the variability threshold shown in Figure 7, then the target is very quiet, and in many cases the use of the prior fit only worsens the fit relative to a LS fit, so the prior weighting is zeroed. This is because the prior fit is never an exact match to the target trend, and even small deviations can "pull" the posterior fit away from a good fit. In such cases there is little risk of a quiet target biasing a LS fit away from a proper cotrending fit. The majority of targets do not fall into either of the above two cases; the prior is used to the degree dictated by the prior weight, and a Bayesian MAP fit is performed.

3.4. Maximization of the Posterior PDF

Once the prior weighting is determined, the posterior PDF can be assembled using equation (10). Due to the empirical (and therefore non-analytical) form of the prior PDF, the posterior must be maximized numerically. In general, a multidimensional maximization is difficult and time consuming due to the risk of only finding local maxima. Fortunately, due to the use of SVD, the basis vectors are all orthogonal, so the various coefficients can be maximized sequentially. The process is therefore straightforward. The strongest singular vector is maximized first and then all subsequent singular vectors are maximized in turn.
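
Because the basis vectors from SVD are orthogonal, the maximization can be done one coefficient at a time. A sketch of such a sequential, bounded 1-D search is given below, reusing a prior-density callable per basis vector as in the earlier sketches; the bracketing of the search around the LS estimate and the single pass over the coefficients are illustrative choices:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def map_fit(y_norm, H, sigma2, w_pr, log_prior_pdfs, half_width=5.0):
        """Sequentially maximize the posterior of equation (10), one coefficient per
        (orthogonal) basis vector, strongest singular vector first.

        log_prior_pdfs : list of callables, one per basis vector, each returning
                         the scalar log p(theta_i) for a trial coefficient value.
        Returns the MAP coefficients and the cotrended light curve.
        """
        n_vec = H.shape[1]
        theta = np.zeros(n_vec)
        for i in range(n_vec):                       # basis vectors ordered by strength
            c_ls = float(H[:, i] @ y_norm) / float(H[:, i] @ H[:, i])

            def neg_log_post(c, i=i):
                trial = theta.copy()
                trial[i] = c
                resid = y_norm - H @ trial
                return 0.5 * float(resid @ resid) / sigma2 - w_pr * float(log_prior_pdfs[i](c))

            res = minimize_scalar(neg_log_post,
                                  bounds=(c_ls - half_width, c_ls + half_width),
                                  method="bounded")
            theta[i] = res.x
        return theta, y_norm - H @ theta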

Following along with the same two examples of a quiet and a variable target, Figures 18 and 19 show the final posterior PDF along with the prior and conditional PDFs. The black dots and magenta stars are the maxima of the prior and conditional PDFs, and the blue circle is the maximum of the posterior PDF. Due to the varying scales of the three curves, the prior and conditional curves have been renormalized to the same scale as the posterior for illustration purposes. The conditional fit has the smooth quadratic form characteristic of a LS fit. The prior appears Gaussian on this scale but in general is not. The title of each plot gives the prior weight. For the quiet target, the variability is low so the prior weight is only 4.65, resulting in a minor correction to the conditional curve, and so the conditional and posterior curves are virtually identical. Also, in cases where the width of the prior PDF is large, the maximum is low and it makes little contribution to the posterior. A wide prior is equivalent to saying that the prior information adds little to a good fit, the extreme case being a flat prior PDF, which provides no additional information. This is in contrast to the variable target, where the posterior maximum is near the prior maximum, meaning the fit relies almost completely on the prior data due to the high weight on the prior PDF of 976, with only a slight influence from the conditional fit. In the case of the variable target in particular, if the prior PDF were not determined using a weighted distribution (in Kp-, R.A.-, and decl.-space), then the maximum of the prior would have been at about 0.6, as shown by the unweighted histogram in Figure 16. This would have resulted in a poor fit to the systematics. The actual prior fit takes into account the location of the TUS and the systematic trends in targets nearby to the TUS.

Fig. 18.— Posterior, prior, and conditional PDFs for the quiet target. The prior and conditional curves have been renormalized to the same scale as the posterior for legibility. This target is quiet, so the prior PDF weighting is low and does not influence the posterior by much. The maximum of the posterior is therefore very close to the conditional maximum. The width of the prior PDF can also influence its height and hence its amount of influence on the posterior.

Fig. 19.— Posterior, prior, and conditional PDFs for the variable target. The prior and conditional curves have been renormalized to the same scale as the posterior for legibility. This target is highly variable, so the prior PDF is highly weighted at 976, and it influences the posterior considerably. The maximum of the posterior is therefore close to the prior maximum.

Once the maximum of the posterior PDF is found for each basis vector, the MAP fit is a linear combination of the basis vectors. The resultant fits are shown in Figures 20 and 21. For the quiet target, all three fits roughly overlap the actual trend in the data. The prior fit is not an exact match, and the slight disagreement is to be expected since the prior is formulated purely from targets other than the TUS. For a quiet target such as this one, highly weighting the prior PDF would result in a degradation of the fit, and so instead the posterior relies mainly on the conditional PDF (i.e., the red dashed and green solid curves overlap). The resultant light curve after the trend removal is shown in the bottom panel for both the MAP fit and the conditional fit, the latter being a LS fit. For the quiet target, note how the resultant curve is nearly featureless above the noise floor. Some slight artifacts are not fully removed, and methods to correct these are discussed in § 4. In the case of the variable target, the conditional PDF results in a fit that attempts to remove all features in the light curve, whereas the prior PDF correctly identifies just the systematic trends in the data. In this case, it is beneficial to rely principally on the prior PDF. The prior cannot be well discerned in the figure because it lies under the MAP fit. The conditional fit also introduces a considerable amount of noise into the corrected light curve, because it is constructed more from the lesser basis vectors (shown in Fig. 9), which contain larger noise components.

Fig. 20.— Resultant fits to the prior, conditional, and posterior (MAP) PDFs for the quiet target. Target variability is low at 1.17, and Wpr is also low at 4.65. Quiet targets rely principally on the conditional PDF.

Fig. 21.— Resultant fits to the prior, conditional and posterior (MAP) PDFs for the variable target. Target variability is high at 30.25, and Wpr is also high at 976.2. Variable targets rely principally on the prior PDF. It is evident that the posterior (MAP) fit finds the systematic trend yet preserves the variability. Note that the errors on the raw flux curve are small compared to the line thickness, so the noisy conditional fit would introduce a considerable amount of noise in this target.

3.5. Propagation of Uncertainties

Propagation of uncertainties is not necessarily straightforward because a covariance matrix is difficult to formulate for an empirical prior PDF. As a first approximation, the propagation can be assumed to be through a LS solution, which is close to the solution for most targets. If Craw and Ccot denote the covariance matrices for the temporal samples of the raw and cotrended flux time series for a given target, then the uncertainties may be propagated (disregarding the uncertainty in the mean level, which can be considered negligible) by

C_{cot} = T_{cot}\, C_{raw}\, T_{cot}^T,     (22)

where the transformation Tcot is defined by

T_{cot} = I - H\left(H^T H\right)^{-1} H^T,     (23)

H being the same design matrix as in equation (10). This is overly conservative since the posterior PDF is more constrained than a simple LS fit. A more accurate propagation of uncertainties would take into account the attenuation of the uncertainties due to the prior PDF.
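
A sketch of this first-order propagation through the LS projection of equations (22) and (23) follows; forming the full N × N matrices explicitly is for illustration only, not a recommendation for production use:

    import numpy as np

    def propagate_covariance(C_raw, H):
        """Propagate the raw flux covariance through LS cotrending (eqs. [22]-[23])."""
        N = H.shape[0]
        # T_cot = I - H (H^T H)^{-1} H^T projects out the fitted systematic components.
        T_cot = np.eye(N) - H @ np.linalg.solve(H.T @ H, H.T)
        return T_cot @ C_raw @ T_cot.T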

4. FUTURE IMPROVEMENTS

The algorithm as presented works phenomenally well for the majority of light curves in the Kepler FOV. Overall PDC performance is discussed in a companion paper (Stumpe 2012). However, problems do arise. Remediations to these problems are discussed in this section.

One of the main issues with the current method is that different types of systematic effects are represented in each basis vector shown in Figure 9. For example, the long-term trends associated with DVA and seasonal changes should not be represented by the same basis vectors as heater cycles and Earth point thermal recoveries. Given that these different systematics behave on different time-scales, a reasonable solution is to band-split the light curves and generate separate basis vectors for each band. This method is currently in development.

The current method relies solely on singular value decomposition to generate the basis vectors after cuts on quiet and correlated targets. SVD is a reliable and often-used tool to generate basis vectors that describe highly correlated trends in data. However, it does have its drawbacks. For non-Gaussian systematic trends, SVD is not ideal. Methods such as independent component analysis, as described in Waldmann (2011), can potentially better deconvolve independent systematic sources.

Occasionally, a single target can dominate one or more basis vectors. The resulting basis vectors essentially contain the stellar variability and noise of the offending target, making them ineffective at removing systematics. The entropy cleaning step removes the offending targets, but the source of the problem is partially a result of the normalization by the median value of each light curve. For the small number of dim targets with appreciable noise, the noise is over-represented in the basis vectors. Normalization by the standard deviation or noise floor of the light curves would limit the problem of over-represented targets, at the expense, however, of not equally normalizing the light curves by flux intensity, which can introduce other problems.

We have also discovered that, over the channel FOV, groups of targets exhibit similar systematics that are distinct from other targets, so specific clusters of targets can be identified with similar trends. Using the same basis vectors for all targets is not ideal in this situation. We are investigating the use of a hierarchical clustering method, such as that described in Jain (1999), to isolate the clusters and develop basis vectors separately for each cluster.

The prior PDF generation is based on correlations in stellar magnitude, R.A., and decl., but in some targets the prior is poor. It is to be expected that systematics are also correlated with other stellar and instrumental parameters. A full parametric study finding these correlations is to be performed to identify hidden variables that further characterize the systematics. One strongly suspected hidden variable is sub-pixel centroid motion.

There are also other stellar properties related to pixel response that could be beneficial for use in target selection and prior PDF generation. However, only sub-pixel centroid motion is available to PDC. We are investigating porting parts of the PDC-MAP algorithm to the PA Kepler pipeline component (Twicken 2010a) for application on individual pixels to improve both the basis vectors and prior PDF. Given the large number of targets analyzed, performing all of PDC-MAP at the pixel level would be prohibitively slow.

Kepler collects both long cadence data at ∼30 minute intervals and short cadence data at ∼60 s intervals. No more than 512 short cadence targets are collected at any time, and they are spread over the entire FOV, so the number of short cadence targets per channel is small, at most about a dozen. A dozen is too small a sample for the prior PDF to be properly formulated. We are investigating ways to extend PDC-MAP to short cadence data. Options include using the prior PDF from long cadence data and using a single reference ensemble drawn from all 512 targets.

5. CONCLUSIONS

PDC-MAP dramatically improves Kepler's ability to understand the properties of parent stars. It preserves stellar signals and minimizes the noise while removing the systematic errors that can mask transit signals. PDC-MAP therefore ultimately aids Kepler's primary mission of detecting Earth-like planets. But PDC-MAP also improves the Kepler data's utility to the broader astrophysical community. Non-planet-finding studies such as asteroseismology benefit greatly from PDC-MAP's ability to preserve stellar signals in the light curves.

Funding for this Discovery Mission is provided by NASA's Science Mission Directorate. We thank the thousands of people whose efforts made Kepler's grand voyage of discovery possible. The Kepler Science Office and Science Working Group work diligently in analyzing data products and have provided great insight into data reduction methods and data processing. We especially want to thank the Kepler SOC staff who helped design, build, and operate the Kepler Science Pipeline for putting their hearts into this endeavor. The authors would also like to thank the very productive conversations we have had with the Suzanne Aigrain group at Oxford University, especially Stephen Reece, Stephen Roberts, and Amy McQuillan.

Footnotes

  • 6. The habitable zone is defined as the range of orbital distances for which liquid water would pool on the surface of a terrestrial planet such as Earth, Mars, or Venus without greenhouse gas adjustments to the atmosphere.

  • 7. In simple aperture photometry, the brightness of a star in a given frame is measured by summing up the pixel values containing the image of the star.

  • 9. Detrending is the removal of low-frequency signal content regardless of origin (intrinsic or systematic). In contrast, cotrending is the removal of signal content common to multiple targets, which can better preserve intrinsic low-frequency signals while removing wide-band systematic signals.

  • 10. The light curves are referred to as "targets" and not stars since not all objects in the Kepler FOV are stellar. Galactic studies are also performed with Kepler data.

  • 11. This does not lessen the ability of PDC-MAP to remove short-term systematics. Such short-term systematics are still present in the basis vectors, and so when the PDF fit is performed the short-term trends are removed.

  • 12. These artifacts have been resolved in a recent version of the PA pipeline component, but Argabrightenings still persist.
