Robust Detrending of Spatially Correlated Systematics in Kepler Light Curves Using Low-rank Methods

Light curves produced by wide-field exoplanet transit surveys such as CoRoT, Kepler, and the Transiting Exoplanet Survey Satellite are affected by sensor-wide systematic noise, which is correlated both spatiotemporally and with other instrumental parameters such as the photometric magnitude. Robust and effective systematics mitigation is necessary to achieve the level of photometric accuracy required to detect exoplanet transits and to faithfully recover other forms of intrinsic astrophysical variability. We demonstrate the feasibility of a new exploratory algorithm to remove spatially correlated systematic noise and detrend light curves obtained from wide-field transit surveys. This spatial systematics algorithm is data-driven and fits a low-rank linear model for the systematics conditioned on a total-variation spatial constraint. The total-variation constraint models spatial systematic structure across the sensor on a foundational level. The fit is performed using gradient descent applied to, a variable reduced least-squares penalty and a modified form of total-variation prior; both the systematics basis vectors and their weighting coefficients are iteratively varied. The algorithm was numerically evaluated against a reference principal component analysis, using both signal injection on a selected Kepler dataset, as well as full simulations within the same Kepler coordinate framework. We develop our algorithm to reduce the overfitting of astrophysical variability over longer signal timescales (days) while performing comparably relative to the reference method for exoplanet transit timescales. The algorithm performance and application are assessed, and future development is outlined.


INTRODUCTION
Transit surveys have contributed significantly to exoplanet science over the past two decades, uncovering exoplanet population statistics at current survey completeness limits (Batalha 2014;Bryson et al. 2020).Space-based surveys, including CoRoT (Auvergne et al. 2009), Kepler (Borucki et al. 2010), and TESS (Ricker et al. 2014) have led to the discovery of over 54001 exoplanets to date.These wide-field surveys include a large number of candidate systems and require a photometric precision after standard calibration and processing (residual photometric precision) adequate to detect exoplanet transits (Deeg & Alonso 2018).Achieving low residual photometric noise requires the optimization of both the instrument design and the associated algorithms for instrumental noise removal.In this paper, we describe a method for addressing spatially-correlated systematic noise across a wide-field imaging sensor used for exoplanet transit detection.
If an exoplanet transit induces a fractional flux density variation △F in observations, with a differential photometric precision σ p (after processing) then the signal-to-noise ratio is S/N ∼ △F σp (Deeg & Alonso 2018).For example, an Earth-Sun transit requires σ p ∼ 80 parts-per-million (ppm) over the transit duration T ∼ hours (Caldwell et al. 2010).The Kepler telescope achieved σ p ∼ 30 ppm over T = 6.5h for stars with Kepler magnitude K p ∼ 12 (Koch et al. 2010;Gilliland et al. 2011;Christiansen et al. 2012).TESS achieved σ p ∼ 230 ppm over T ∼ 1 hr for a star with TESS magnitude T p ∼ 10 which is sufficient to detect super-Earths around bright stars (Fausnaugh 2018).For reference, the Hubble Space Telescopes Space Telescope Imaging Spectrograph (HST/STIS) can attain σ p ∼ 120 ppm over T = 45 min (Demory et al. 2015) and non-wide-field ground-based telescopes can reach a precision sufficient to detect large Jovian exoplanets (Tregloan-Reed & Southworth 2013;Stefansson et al. 2017).
The instrumental response will vary over a range of timescales and across various spatial scales producing systematic noise effects, as expected from general principles (McLean 2008).The primary data products for the Kepler and TESS telescopes include light curves derived using optimized simple aperture photometry (SAP) or image data providing pixel-level light curves (Jenkins 2017;Tenenbaum & Jenkins 2018).Algorithmic innovations to suppress the residual differential photometric precision σ p , arising from unmodeled systematic noise effects, are therefore critical to detect weak exoplanet signals.These systematic effects have diverse physical origins (Jenkins 2017;Tenenbaum & Jenkins 2018).A single pixel exhibits a non-uniform sensitivity across its surface (Toyozumi & Ashley 2005;Hedges et al. 2021).In addition, there are variations in pixel-to-pixel sensitivity and instrumental pixel response (Van Cleve & Caldwell 2016) including CCD pattern, read, photon, and quantization noise (McLean 2008;Gilliland et al. 2011;Caldwell et al. 2010); instrumental CCD terms are expected to vary by CCD output channel or module.A net pixel response function (PRF) (Bryson et al. 2010) captures the spatially-variant response across the detector including the spatial variation of the optical PSF and instrumental detector sensitivity, amongst related factors (Jenkins 2017).The spatial response of the detector may, however, include unmodeled error due to pointing jitter, focus changes, uncorrected differential velocity aberration terms, and their interaction (Van Cleve & Caldwell 2016).Thermal effects as well as discrete spacecraft guidance and downlink control events, including momentum dumps, may produce temporal and spatial variation in the systematics, some abrupt (Van Cleve & Caldwell 2016;Vanderspek 2018).Further, bright astrophysical sources may produce pixel saturation or broadening of the PRF (Van Cleve & Caldwell 2016;Vanderspek 2018).
The first correction for noise effects is typically applied in early robust calibration pipelines (Jenkins 2017;Tenenbaum & Jenkins 2018).However residual spatially-and time-variant systematics unavoidably remain in the calibrated data products given the complex instrumental response at the current level of photometric accuracy.We denote these residual systematics contributions to each light curve i at discrete time sample n as l i [n] (Section 2.2) where the data may be either pixel or SAP light curves.The systematics l i [n] has the same discrete time sampling as the measured light curve.The residual systematics vary spatially over the sensor, reflected in equivalent notation l x,y [n], where light curve i maps to sensor position (x, y).Petigura & Marcy (2012) show an example of spatially-varying residual systematic noise in Kepler data by characterizing correlations across the sensor.Moreno et al. (2021) similarly characterize spatial systematics correlation in Kepler/K2 data.An a priori analytic instrumental model for l x,y [n] is intractable in practice and data-driven approaches, informed by physically-realistic instrumental assumptions, provide the most effective approach to mitigate the residual systematic noise in the target light curves.There is a rich history in the literature on this subject which we review briefly only to place our method in context.
A class of methods (hereinafter external parameter decorrelation methods (Bakos et al. 2007)) model the functional form of the residual systematics l x,y [n] as correlates of other explanatory variables such as pointing error estimates or ancillary engineering data (PDC-LS, Twicken et al. 2010).Charbonneau et al. (2005) demonstrated Spitzer2 aperture photometry correction using decorrelation with pointing errors estimated using target centroiding.There has been substantial development in this latter area, especially inspired by the pointing challenges of the Kepler K2 mission and pixel-level detrending, including the work described by Vanderburg & Johnson (2014a); Huang et al. (2015); Aigrain et al. (2015); Lund et al. (2015) and Crossfield et al. (2015).External parameter decorrelation methods implicitly include spatiotemporal variability in the residual systematics via the encoding of their functional form in the proxy variables.
A second class of methods (hereinafter cotrending methods) makes the foundational assumption that a set of measured light curves, or their computed vector basis, comprise an efficient basis of regressors over which to expand the unknown functional form of l x,y [n] as a low-rank linear model.This implicitly assumes that the residual systematics are temporally and spatially correlated across the sensor; this is similarly a physically reasonable assumption.The Trend-Filtering-Algorithm (TFA, Kovács et al. 2005a) directly uses a uniform selection of external light curves as regressors in a least-squares fit to detrend an individual target light curve.The cotrending approach is analogous to the joint coupled least-squares solution for the product of stellar extinction coefficients and airmasses as posed in a global solution for wide-field multi-object photometry (Sysrem, Tamuz et al. 2005); this solution is equivalent to a Principal Component Analysis (PCA) of target observations.A refinement including additive external parameters is provided by SARS (Ofir et al. 2010).
Cotrending methods have seen significant algorithmic refinement.The PDC-MAP algorithm (Stumpe et al. 2012;Smith et al. 2012) constructs a set of cotrending basis vectors (CBV) using singular value decomposition (SVD) on a template set of light curves selected for their high degree of correlation (therefore capturing foundational instrumental trends) and quiescence.Each CBV is fit against the target light curve using Bayesian maximum a posteriori (MAP) methods to obtain relative coefficient weightings for each basis term, the sum of which is then subtracted from the target time series to remove systematic effects.An empirical Bayesian prior is used to constrain overfitting, constructed on the variation of coefficient value over stellar magnitude and spatial position on the sensor.The CBV are orthogonal by mathematical construction and, therefore, do not map directly to constituent instrumental effects; this dilutes the variable dependence in the empirical coefficient prior.Instrumental effects are often separated by characteristic timescale and this mapping can be improved (MS-MAP, Stumpe et al. 2014) by deriving separate CBV for different timescale wavelet sub-bands; this provides better temporal separation in the spatiotemporal dependence of l x,y [n].A net composite systematic correction is applied as the multiscale combination across each sub-band.The Astrophysically Robust Correction algorithm (ARC; Roberts et al. 2013, Aigrain et al. 2017) is related conceptually to PDC-MAP but uses automatic relevance coefficient priors to maximize model evidence for instrumental trends.
Additional cotrending methods include the Causal Pixel Model (CPM, Wang et al. 2016) that uses a template (or training) set of regressor pixel-data light curves that are separated in time from the target light curve sample being corrected for systematics.The training set is spatially separated from the target but the target light curve itself is included to include autoregressive information.Both measures suppress residual variability except on short transit timescales.A successor method is described by Hattori et al. (2022).The method described by Foreman-Mackey et al. (2015) uses a large training set to derive an expanded CBV but uses joint estimation of systematics and the transit signal in order to better constrain the problem.A related Bayesian formulation is described by Taaki et al. (2020).The Pixel Level Decorrelation algorithm (PLD, Deming et al. 2015) was developed to implicitly correct photometric decorrelation with pointing error in Spitzer data.PLD uses pixel light curve regressors within a summed aperture to model SAP light curves.The regressors encode pointing errors without the need for explicit centroiding as in external parameter decorrelation approaches.The PLD method was extended by Luger et al. (2016) for application to Kepler K2 data by including higher-order terms in the SAP flux dependence on pointing error, by modeling astrophysical variability using a Gaussian process, and by constructing a PCA basis from the regressors to better constrain the model.The method was further improved (Luger et al. 2018a) by including L2 coefficient regularization, which was also used in CPM, and by incorporating PLD vectors from nearby stars (nPLD) to improve sensitivity to pointing error for faint stars.
The cotrending methods described above address challenges in estimating intrinsic astrophysical variability.Foundationally it is difficult to perfectly separate systematics and astrophysical variability a priori in a light curve regressor or derived basis set.Therefore the linear systematics model may include and absorb true astrophysical variability (overfitting) or inject spurious variability into detrended light curves.There may also be incidental correlation between a relatively clean systematics basis and astrophysical variability (Smith et al. 2018).Overfitting can be reduced by judicious use of prior constraints that represent a physically realistic instrumental response; these constraints may also significantly improve the condition of the problem.
In the cotrending methods described above, spatial structure is incorporated in the systematic noise model using several different approaches.In the simplest form, the spatial variation constraint is imposed implicitly by cotrending targets across a discrete sensor region as a whole (e.g. a CCD output channel) (Smith et al. 2012;Roberts et al. 2013;Stumpe et al. 2014;Luger et al. 2018a;Lund et al. 2021).Spatial variation has also been incorporated by restricting regressors by proximity to the target light curve (Luger et al. 2018a;Lund et al. 2021), or by directly introducing a parametric dependence of fitted systematics on sensor position in an empirical prior (Smith et al. 2012;Stumpe et al. 2014).
In this paper, we present an exploratory algorithm to solve for a cotrending low-rank linear systematics model while incorporating a spatial constraint across the sensor at a foundational level.We refer to this as the spatial systematics method in what follows.The spatial constraint is of generalized total variation (TV) form (Rudin et al. 1992).This TV constraint on basis vector coefficients promotes the correlation of adjacent neighbors on the sensor and weighted spatially across the sensor, while permitting discontinuities as found in wide-field imaging sensors at module or channel edges.As such, the spatial constraint has a realistic physical motivation and therefore a strong potential to reduce overfitting.The fit over all SAP light curves is performed by minimizing the sum of the least-squares residual between the light curves and the linear systematics model, and, the total variation spatial constraint.We use variable elimination (Golub & Pereyra 1973) to reformulate the optimization problem as a function of the weighted coefficients only, introducing stability to the minimization (Golub & Pereyra 2003;Shearer & Gilbert 2013).An approximate closed-form gradient of the objective function is derived which we minimize via gradient descent.In this iterative solution both the weighted coefficients, and the basis vectors, which functionally are defined by the coefficients, are varied.The algorithm as implemented is available as a public Python package on the Github repository3 or via PyPi under package name spatial-detrend4 .
We numerically evalaute the performance of our method in the context of the Kepler sensor and against PCA as a reference method.We use both injected signals in real long-cadence (LC) Kepler data and fully-simulated data in this evaluation.As noted above (Smith et al. 2012, PDC-MAP) Kepler systematics depend on stellar magnitude.This is also found for CoRoT light curves (Mazeh et al. 2009;Ofir et al. 2010).This effect likely arises due to pixel saturation effects.Kepler operated in the magnitude band Kp ∈ 9 to 15 (Koch et al. 2010).In this work for simplicity we do not include a prior on stellar magnitude dependence but instead select light curves from a narrow fixed magnitude band Kp ∈ 12 to 13 within which magnitude-dependent systematics such as saturation are not expected to substantially vary (Smith et al. 2012); this therefore allows the spatial variability systematics to be isolated.Our numerical evaluation demonstrates that the spatial systematics algorithm has reduced overfitting of astrophysical variability, particularly on day-long timescales, compared to the reference PCA method.
The paper is organized as follows.Section 2 introduces the light curve signal model, our method for spatial systematics inference, an analysis of spatial correlation across the Kepler sensor, and describes our experimental tests to numerically evaluate the algorithm.In Section 3 we report the results of our numerical evaluation.These results are discussed in Section 4 and conclusions are presented in Section 5. A table of commonly-used symbols is provided in Appendix A for reference.

METHODS
As described above, the data products for wide-field exoplanet transit surveys may comprise: i) pixel-level image data either within target apertures or for full frames; and, ii) SAP target light curves (Jenkins 2017;Tenenbaum & Jenkins 2018).Here we use Kepler SAP target light curves with a sampling cadence of 30 minutes produced by the Kepler science data processing pipeline but not including post-processing by the Kepler Pre-Search Data Conditioning (PDC) module (Twicken et al. 2010;Smith et al. 2012;Stumpe et al. 2012); the latter light curves with full Kepler post-processing are distinguished as PDC-SAP light curves.We present the details of the spatial systematics method that is the subject of this paper.

Matrix Notation
We define common notation here that is used throughout the paper.The i-th row of matrix M is denoted as [M] i and the j-th column as [M] •,j .The matrix element corresponding to the i-th row and the j-th column is denoted as [M] i,j or M i,j .The transpose of a matrix M is denoted as M T .An identity matrix of size N × N is denoted as 1 N .
The L p norm of a vector x of length I is defined as: to denote the L p norm applied to each column [M] •,j , followed by the L q norm applied to this vector, such that . The Frobenius norm is defined as

Light Curve Decompositon
A collection of target light curves {y i : i ∈ I} are obtained on a sensor, each light curve y i (over target index set I) is a length-N time-series.Each light curve is represented as the sum of a systematics term l i and a statistical noise term n i : In matrix form the data model takes the form Y = L+N, where y i , l i , and n i form the columns of matrices Y ∈ R N ×I , L ∈ R N ×I , and N ∈ R N ×I representing the light curves, systematic noise, and statistical noise respectively.The goal of this work is to form an accurate estimate of the systematic noise l i .Sources of systematic errors are described in Section 1; see also Auvergne et al. (2009), Van Cleve &Caldwell (2016), andVanderspek (2018).Statistical noise originates due to a mixture of instrumental error and astrophysical variability.We approximate statistical noise n i as white Gaussian noise n i ∼ N (0, σ 2 i ) since additive statistical noise sources may reasonably approach N (0, σ 2 i ) under the central limit theorem (Grinstead & Snell 2012).This is a common assumption in this domain and implicit in least-squares minimization approaches (Smith et al. 2012;Stumpe et al. 2012Stumpe et al. , 2014;;Aigrain et al. 2017).The noise level σ i is unknown a priori, but may reasonably be estimated from filtered and coarsely-detrended light curves.In what follows, light curves are normalized by an estimate of σ i so that n i ∼ N (0, 1) for mathematical convenience.
Light curves may include intermittent sources of systematic noise that produce outliers and are difficult to model.Their influence may be minimized by prior constraints (a further motivation for our approach here), but data-driven outlier filtering is generally necessary in this domain (Section 2.5).

Systematics Model
Low-rank model: -We adopt a cotrending basis model: where (K ≪ N ) describes the rank of the systematic noise model, {v k : k ∈ K} are a set of basis vectors shared by light curves, and each c k i is a coefficient weighting of v k for light curve i.The coefficient vector for light curve i is In matrix form: where L ∈ R N ×I is defined above, the columns of V ∈ R N ×K are the basis vectors {v k : k ∈ K}, and the columns of We define a column-normalized coefficient matrix C of the form ci = ci ∥ci∥2 : The rank of L may be less than the number of independent systematic noise sources.Since rank(VC) ≤ min{rank(V), rank(C)}, it is possible to have a more expansive basis set V representing many individual noise effects, but for the relative weightings C between light curves to have a degenerate and therefore low-rank structure and consequently for L to be low rank.
where p(Y|L) is the likelihood of Y given L (Srebro 2004).
For the data model in Equation 1, without any further constraints, a rank-K optimal least-squares solution of L can be found by rank-thresholding the SVD or PCA decomposition of Y per the Eckart-Young-Mirsky theorem (Appendix B) and in this case takes the form: In what follows we use the SVD/PCA method primarily as a comparative method (Section 2.5).
Spatial systematics model: -As described in Section 1, the residual systematics are spatially correlated across the sensor.In addition, the white noise model for N is an idealization and the systematics cannot then be perfectly separated by rank thresholding alone.In our spatial systematics algorithm, we retain a low-rank systematics data model L while recognizing the incomplete model for N. We introduce a spatial side constraint of total variation measure (Appendix C) on the normalized coefficient matrix C to facilitate more robust separation of the unknown astrophysical signals.The total variation constraint admits a small number of bounded discontinuities within generally smooth functions (Rudin et al. 1992;Vogel 2002) and promotes correlation between neighboring light curve systematics.This choice of spatial constraint is experimentally motivated in Section 2.5.In this formulation, the estimate for L = VC is obtained by minimizing an objective function comprising the least-square residual f (V, C) = ||Y − VC|| 2 F and the total variation penalty constraint g Here D W = WD where D is a difference operator, as defined in Appendix C, and W is a diagonal matrix of weights w x,y between [0, 1] to model non-uniform spatial correlation across the sensor.We assume each light curve i has a pixel position i → (x, y) ∈ (X, Y ), where X, Y ∈ Z + .Further, we assume uniform spatial cells of size (∆x, ∆y) across the sensor and that there is a one-to-one mapping between light curve i and spatial cell.
The weighted difference operator D W applied to C computes for each k ∈ K and cell (x, y), the weighted difference of neighbouring coefficients ,y) .This is depicted graphically in Figure 1.In expanded form . When p = 2 minimizing the total variation spatial constraint is equivalent to maximizing the correlation between neighboring coefficient vectors; this is shown in Appendix C.1.
Placing the spatial prior on the normalized coefficients C instead of the full systematics model L = VC has mathematical advantages as described in Section 2.4 and is shown in Appendix E to promote correlation between neighboring systematics estimates by proxy.The prior is calculated using normalized coefficients C to avoid the objective being minimized for a trivial minima.If instead the prior was of the form ∥D W C∥ p 2,p any solution C can be replaced with a solution C ′ = αC for α < 1 that produces a lower value of the spatial prior ∥D W C ′ ∥ p 2,p = α∥D W C∥ p 2,p .Taking V ′ = V α , there is no change to the least-squares penalty ∥Y − VC∥ = ∥Y − V ′ C ′ ∥, therefore a trivial minima can be found as α → 0.
Basis vectors obtained with SVD and PCA are constrained to be orthogonal, however, orthogonality is not a necessary constraint as a property of a systematics model, nor to achieve a minimal optimization cost.Our model does not restrict V or C to be orthogonal.

Systematics Inference
The systematics L = VC are inferred by minimizing the objective function defined in Equation 7.This function is defined over variables V ∈ R N ×K , C ∈ R K×I , and comprises a least squares residual f (V, C) = ∥Y − VC∥ 2 F , which depends on both variables V, C, and the total variation penalty constraint g(C) = ∥D W C∥ p 2,p , which only depends on C. The least-squares residual f (V, C) is not separable over V, C.
Coupled least-squares problems of this type can be solved by iterative alternating solutions for V and C (Wold 1973;Tamuz et al. 2005) or by using variable elimination to remove dependence on one of the variables (Golub & Pereyra 2003).There is no evidence that either is preferable and we adopt the latter variable projection method here.
For fixed V or fixed C the least-squares residual f (V, C) has an analytic solution for the complementary free variable.Therefore we can eliminate variable dependence on V in the least-squares residual so that the objective can be minimized over C alone.For a value of C the minimizing value V of the least-squares residual is denoted as a function h(C): This function h(C) can be inserted as V into the least-squares residual so that the original objective depends only on C: Using the theory described in Golub & Pereyra (2003), as elaborated in Appendix F.1, the variable-reduced objective takes the form: arg min The minimum of the derived objective does not have an analytic solution but can be obtained numerically using gradient descent optimization methods.However, the L p norm in the total variation constraint g(C) is not directly differentiable since it is discontinuous for zero-valued entries of D W C when p = 1.Proximal methods are standard for obtaining minima of objective functions with non-differentiable penalties such as L 1 norms, as described in Parikh & Boyd (2013).In particular, the Split-Bregman method (Goldstein & Osher 2009) is well suited to total-variation regularized problems.A preferred approach is to replace the non-differentiable L p norm with a continuously-differentiable Huber loss function as an approximation (Vogel 2002).Our general gradient-descent algorithm to minimize the variable-reduced objective is described in Algorithm 1.We refer to the variable reduced objective function in Equation 10 as w(•), the gradient of w(•) with respect to C is denoted ∇w(•).The gradient of the variable-reduced least-squares term F in the objective is denoted ∇f ′ (•) and the gradient of the total variation constraint term g(C) = ||D W C|| p 2,p as ∇g(•).These gradients are used used to iteratively update C t at each timestep t.We note that this variable-projection algorithm also updates V implicitly, denoted V |C .The details of the Huber loss function and the derived gradients are described in Appendix F.2.
Algorithm 1: Variable Projection Gradient Descent for the Spatial Systematics Algorithm Initialize C for rank K and weight matrix W. The step-size α t may be fixed or vary at each step t.

Algorithm Implementation and Evaluation
In this section, we describe further details concerning the implementation and evaluation of the spatial systematics algorithm presented in this paper.As noted above, this method has been developed to explore new algorithmic approaches to exoplanet transit detection.Our prime focus is to demonstrate the feasibility of the algorithm and to provide an initial assessment of its performance.Accordingly, this work is not designed to support or claim optimality of this method over existing specialized detrending approaches.
The spatial systematics algorithm was evaluated numerically on a selected subset of Kepler test data using both injection tests with time-variable signals (Section 2.5.4) and fully-simulated light curves in the same Kepler coordinates (Section 2.5.5).Further, a high-level comparison between the spatial systematics algorithm and several standard detrending methods was performed over the same Kepler test data (Section 2.5.6).The Kepler test data are described in Section 2.5.1.
The data analysis framework used for the numerical evaluation tests is depicted in Figure 2. The free parameters to be chosen, shown in our core Algorithm 1 include the initial value of C 0 , the model rank K, the total variation norm parameter p, the gradient step size α, and the spatial weighting matrix W. The analysis framework and initial values are discussed in further detail below.
The data analysis was performed in Python and using a single CPU core.The computational time complexity of the spatial algorithm acting on a collection of light curves I = X × Y , is quadratic in |I| and linear in the number of gradient step iterations, and the model rank K.The leading order time-complexity of a gradient step in Algorithm 1 The difference matrices D x and D y are sparse in form and for either matrix approximately 2 |I| fraction of elements are non-zero.In our implementation, matrices D x and D y are generated and stored in compressed-sparse-row (CSR) format with Scipy sparse.

Kepler Test Data
The test data were selected from the Kepler long-cadence SAP light curves; these have an integration time of 29.4 min (Jenkins et al. 2010).The mapping of a stellar target to approximate sensor position repeats every four Kepler quarters (Aigrain et al. 2017) and accordingly to investigate the consistency of spatial trends over several quarters, we selected quarters (Q hereinafter) Q6, Q10, and Q14 separated at this cadence.Quarter Q6 was chosen to match Petigura & Marcy (2012) and allow inter-comparison with their prior work on spatial correlation; the decision to include successor quarters (Q10, Q14) rather than preceding quarters (Q2) was also made due to calibration issues during early Kepler operations (Van Cleve 2010).Stellar targets were selected within a range of Kepler magnitude Kp ∈ [12, 13] and with a combined differential photometric precision over 12 hours CDPP 12h ≤ 40 (Christiansen et al. 2012) 5 , resulting in 6179, 6286 and 6049 light curves for Q6, Q10 and Q14 respectively.These target selection criteria were informed by Petigura & Marcy (2012), however we do not believe however that our numerical algorithm evaluation is highly sensitive to these exact parameter choices.The narrow range in photometric magnitude was used to isolate the known dependence of systematics on magnitude (Stumpe et al. 2014) and to allow a focus on spatial systematics in our algorithm.
Pre-processing: -As noted in Section 2.2 the assumption of Gaussian statistical noise N is an idealization and preprocessing is necessary to address several data conditioning issues, as described here.First, any missing data values, and four visually-identified outlier cadences, were substituted using linear interpolation between the preceding and following data samples spanning each gap.Median normalization was then applied to each light curve as ŷ = y/med(y) − 1, where med(y) is the median of y, and the subscript on light curve index i (Equation 1) is dropped for clarity.Here, each pre-processing step is sequential with generic input y and output ŷ.We then removed a first-order linear trend from each light curve as systematics are known to depend on timescale (Stumpe et al. 2014).This is evident in Figure 3, where there is no clear correlation between short-and long-timescale effects.Removing the first-order trend allows a focus on spatial systematics in our algorithm.While it is possible to apply our algorithm to each timescale bandwidth separately in a decoupled manner that generalization is beyond the scope of the current paper.After linear detrending the first-order difference variance σ 2 z = var({z : y n − y n−1 ∀ n}), variance σ 2 = var(y) were computed for each light curve, where n ∈ N here denotes time sample index in an individual light curve.The light curves were then filtered to retain only those that are in the lowest 90% in both σ 2 z and σ 2 , ranked separately.In a second filtering step, light curves are removed if they do not satisfy a minimum correlation requirement.Each light curve y i must have a correlation coefficient ρ greater than 0.6 with at least 10 other light curves y j : j ̸ = i, where the correlation coefficient is calculated as ρ(y i , y j ) = y T i yj ∥yi∥∥yj ∥ .To identify outlier samples the light curves were then coarsely detrended using exploratory PCA6 (Appendix B) and all samples exceeding a 3 − σ point-to-point scatter were then flagged.This exploratory PCA detrending was not retained however; it was only used to flag and linearly interpolate outlier samples in the input data at this point in the pre-preprocessing.These initial pre-processing and filtering steps resulted in selecting 4749/6179 (76 %), 4850/6286 (77 %), and 4741/6049 (78 %) of light curves for Q6, Q10, and Q14 respectively.
The final step in pre-processing concerned mapping the data to a regular spatial grid on the sensor.The Kepler sensor comprises 25 modules, of which 21 contain two CCDs across four output channels (Van Cleve & Caldwell 2016).Each CCD is 2200 x 1024 pixels in size and each module contains a total of 2200 x 2048 CCD pixels (Van Cleve & Caldwell 2016).The pixel position of each target light curve on a CCD was mapped to a rescaled square global pixel coordinate grid across the sensor of size 11000×11000 for ease of analysis.In this process each CCD row coordinate was rescaled by a factor 1100/1024 and gaps between CCDs were removed; this resulted in convenient module coordinates of size 2200 × 2200.The total variation constraint depends only on light curve adjacency, so this rescaling has no impact on our algorithm.
The positions of the pre-processed test data light curves on the Kepler sensor are shown in blue in Figure 4.This Figure uses global pixel coordinates and is further labeled by Kepler module number.The difference operators D x and D y are easier to implement on a rectangular data grid and light curves on modules (2,3,4,22,23,24) were discarded, with light curves on a total of 15 modules retained.A rectangular grid is not an algorithmic requirement, however.As described in Section 2.3 our implementation of the total variation constraint requires a one-to-one mapping between light curve i and individual cells in a regular spatial grid.We adopted a spatial cell size of 220 × 220 pixels for this gridding, comprising (30 × 50) cells across the remaining modules with (10 × 10) cells per module; this grid is shown in Figure 4.The light curve mapped to each cell was selected randomly from the pre-processed light curves falling within each cell.Where a cell contained no light curves, the light curve closest to the center of the cell was selected; this occurred for 12% of 30 × 50 targets.The 1500 gridded light curves are shown in red in Figure 4.
Spatial structure on Kepler sensor: -The pairwise neighbor correlation of the gridded light curve sample across the sensor was measured to inform our choice of algorithm parameter W (Equation 7).The correlation between vectors p and q is denoted ρ(p, q) = p T q ∥p∥∥q∥ .We denote the set of gridded cell positions within each module as x ∈ X M and y ∈ Y M respectively.The set of gridded light curves within a module is then the Cartesian set product M = X M × Y M .The mean pairwise correlation ρM between all gridded light curves (y m , y n ) in a module was computed as: The mean pairwise neighboring correlation within a module ρM XY was calculated as: The spatial correlation structure across the gridded (30 × 50) region of the Kepler sensor is shown in Figure 5.The mean pairwise correlation ρM for each module is tabulated in black in this Figure , and the difference ρM XY − ρM is shown in red.The mean pairwise neighbor correlation ρM XY is generally higher than the mean pairwise correlation ρM across the module as a whole.The mean pairwise correlation for all gridded light curves over all modules was computed to be ρ = 0.26 and is notably lower than the per-module correlations.Moreno et al. (2021) describe how time-dependent systematics, lagged with radius, may plausibly give rise to spatial correlations as observed in Figure 5.We further explored the spatial structure on the Kepler sensor by performing an exploratory PCA decomposition of the gridded light curves for Q6, Q10, and Q14.The resulting leading PCA basis vector v 1 for each quarter is shown in Figure 6 along with a color map of the leading coefficient value c 1 i in each cell.We note the implicit mapping of light curve index i to cell position i → (x, y) ∈ (X, Y ) defined in Section 2.3.The leading systematics basis vector is informative of general systematics as it is the strongest term.As shown in Figure 6 the leading coefficient values are highly spatially correlated, with a blocked structure per module, with anti-correlations present between modules.This effect is explained by Petigura & Marcy (2012) as due to PSF variation from the momentum cycle.The overall spatial structure is also persistent across these quarters.
As discussed in Section 1 and in Moreno et al. (2021), Petigura &Marcy (2012), andLund et al. (2021), the systematics exhibit spatial dependence that are key to our choice of a total variation constraint.

Algorithm Initial Parameter Values
As described above, there are several free parameters to be chosen in our method, including the choice of initialization C 0 , the model rank K, the choice of prior norm p, the gradient step size α and the prior weighting matrix W.
The weighting matrix W was formed as neighboring pairwise correlations of gridded light curves y i scaled by a factor 0.2 as w x,y = 0.2 2 [ρ(y x,y , y x+1,y )+ρ(y x,y , y x,y+1 )].A comprehensive parameter optimization was not performed, however heuristic evaluation for these data (Q6, Q10 and Q14) showed slightly improved convergence for this choice of W compared to uniform weighting.
The exact model rank is unknown and for these Kepler test data, a model rank of K = 20 was chosen based on the singular value distribution of the light curve sample.This singular value distribution was consistent with that shown in Smith et al. (2012) where a Kepler CBV rank of K = 16 was adopted.
The initial C 0 was obtained from PCA applied to Y.The choice of p in the total variation constraint determines the degree of smoothness in the spatial correlation and the tolerance of discontinuities (Appendix C).In Appendix D the choice of p is interpreted in a Bayesian framework, where for p = 1 the spatial constraint is equivalent to a Laplacian prior on the difference of coefficients, and for p = 2 the spatial constraint is equivalent to a Gaussian on the difference of coefficients.Referring to Figure 6, PCA fitted coefficients representative of the underlying systematic structure show a mostly spatially uniform structure with discontinuities at module edges.We adopted a value p = 1.1, chosen empirically as the observed coefficients are closer to Laplacian in form.
Gradient step size α t at iteration t was set using a backtracing line search (Armijo 1966) starting from α t = 0.1.This was found to allow the minimization to progress adequately during initial iterations but with fine-tuning in later iterations, thereby saving computation time.Our stopping condition was based on the difference in our cost function in Equation 10 between successive iterates falling to a negligible level ∥w(C t+1 ) − w(C t )∥ ≤ 10 −5 .

Simulated Astrophysical Signals
We describe here the functional form of the simulated transient and variable signals used in the injection tests and full simulations described below.We denote an individual simulated astrophysical signal in light curve form as vector a defined on n ∈ {1, .., N } and relative to normalized light curve y : ∥y∥ = 1.The functional form of the simulated signals was chosen to be either a sinusoid (a s ), a periodic exoplanet transit (a t ) or a transient flare (a f ). Figure 7 shows a simulated example signal of each type.These cover a representative range of timescales.The simulated signals were randomly generated relative to light curve y as follows: • Sine wave: a s [n] = A sin(2πβn).
The amplitude A was drawn from the uniform distribution U (0.005∥y∥, 0.015∥y∥).The angular frequency β was drawn from U 4 N , 8 N .• Simulated exoplanet transit signal: a t [n].A transit signal is simulated as a periodic repeating transit profile b[n], with randomized parameters: orbital period P , epoch n 0 , duration d, and transit depth δ.The simulated signals vary in strength but are generally not weaker than the sample variance of the first-order difference across each light curve (Section 2.5.1.0),and not greater than the magnitude of systematics.Each injected signal has on average 30% of the energy of the systematics term ∥as∥ ∥y∥ = 0.3.

Experiment A: Injected Transient and Variable Signals
In this experiment (A) we numerically evaluated the performance of the spatial systematics algorithm in the recovery of simulated signals (Section 2.5.3)injected into the gridded Kepler test data (Figure 2), here selected for quarter Q10.We denote the pre-processed gridded Kepler test data here as matrix Y (Section 2.3).We simulated nine astrophysical signals, with random parameters drawn as described in Section 2.5.3, divided equally into three each of the types: sinusoid, flare, and exoplanet transit.Each of the nine individual signals was randomly assigned to a separate randomly-selected light curve in the test data in a one-to-one mapping.Apriori the Kepler test data contain unknown astrophysical variability.Therefore choosing a small number of injected signals matches this realistic incidence and avoids significantly biasing the sample.We represent the injected signals as matrix A, with the same shape as Y, and where A is null except for nine randomly-selected columns containing the simulated injection signals.
The post-injection data are denoted: Y → Y + A.
A least-squares PCA solution was obtained using the data model Y = V P CA C P CA + N (Section 2.3), yielding detrended light curves using PCA: Y ′ P CA = Y − V P CA C P CA .The post-injection Kepler test data Y were then detrended using the spatial systematics algorithm (Algorithm 1) using C P CA to initialize C and yielding detrended light curves Y ′ .For both methods, we adopt a model rank K = 20.This model rank is empirically sufficient to represent the systematics and is smaller than both the number of light curves 1500 and the length N ∼ 4000 of each light curve.
This experiment was performed over an ensemble of ten instances of the random parameters defining the injected signals (Section 2.5.3) and random light curve assignment from Y.The performance of both the spatial systematics and PCA algorithm was evaluated in terms of the estimated level of residual systematics after detrending and in terms of their recovery of injected signals, as described in further detail below.

Experiment B: Fully-Simulated Light Curves
In this experiment (B) we fully simulated the light curves using only the coordinate framework of the Kepler test data.No simple generative model exists to fully simulate light curves due to the complex origin of residual systematics across the sensor, as discussed in Section 1. Accordingly, we adopted a simple low-rank linear model for the systematics L = VC with rank K = 4.The basis vectors V were selected at approximately equal spacing as index set (1,5,11,16) from the 16 Kepler cotrending basis vectors (CBV) published for Q10 and channel 30 (Stumpe et al. 2012).The coefficients C were simulated with a random mix of discontinuities and smooth features, broadly comparable to the spatial structure found for the Kepler sensor test data but not constrained to be an exact match, including module gridding (Figure 6).We chose this spatial structure to be representative but distinct from the Kepler test data.The basis vectors V and coefficients C used in the full simulations are shown in Figure 8. Median normalization was applied to each simulated systematic term as li = l i /med(l i ) − 1, where med(l i ) is the median of l i for i ∈ I.We note that a PCA decomposition of the simulated systematics will not be identical.The simulated basis vectors and coefficients are rank K = 4, however, the simulated systematics are rank K s = 5.An extra basis term is introduced in a small number of light curves due to the median normalization of the simulated systematics.The corresponding basis vector is a constant offset.We simulated 1500 light curves in the 30 × 50 cell region, chosen to mirror the Kepler sensor region and discretization used in Section 2.5.1.0.A total of 300 astrophysical signals were simulated and are represented as matrix A as introduced in Section 2.5.4.Equal proportions of sinusoid, flare, and exoplanet transit signal were simulated with random signal parameters drawn as described in Section 2.5.3.Each simulated signal was injected into a single randomly-selected light curve in a one-to-one mapping.The light curves Y (Section 2.3) were simulated as Y = A + VC + N, where N is iid Gaussian noise [N] n,i ∼ N (0, σ 2 ) where σ 2 = 1 4|I| i∈I var(l i ).The simulations were performed over one run as no ensemble was necessary given the sufficient sample size of simulated and injected astrophysical signals.
As in Experiment A (Section 2.5.4), both the PCA method and the spatial systematics algorithm were used to detrend the data Y using model rank K = 8, yielding Y ′ P CA and Y ′ respectively; as before the PCA coefficients were used to initialize the spatial systematics algorithm (Algorithm 1).This experiment deliberately simulates a high level of overfitting as the fitted model rank exceeds the rank of the simulated systematics.The performance of the algorithms was evaluated in terms of the accuracy with which the known systematics and astrophysical signals were recovered and the level of residual noise in the detrended light curves, as described in further detail below.

Experiment C: Comparison with Standard Detrending Methods
We performed a high-level comparison against several standard detrending methods including the Kepler cotrending basis vector method (CBV, Stumpe et al. 2012;Smith et al. 2012), self-flat-fielding (SFF, Vanderburg & Johnson 2014b), and pixel-level-decorrelation (PLD, Deming et al. 2015;Luger et al. 2016;Luger et al. 2018b), all as implemented in the Lightkurve package (Lightkurve Collaboration et al. 2018).We caution that this is a high-level comparison only and not intended to assess the optimality of any method.For this comparison, the gridded Kepler targets (Figure 2) selected for Q10 were used.The spatial systematics method was applied to these data with rank K = 20 after pre-processing (Section 2.5.1.0;Figure 2) and with no injection of astrophysical signals.However for the Lightkurve methods, the data for this target list were downloaded and processed inside that package as is customary.In general, for all Lightkurve methods we used the recommended parameters in the Lightkurve tutorial7 .Specifically for the CBV and SFF detrending methods SAP pre-processing was performed using the remove nans and remove outliers functions.The PLD detrending method uses pixel-level data and no additional pre-processing was applied.The Lightkurve module masks bad quality-flagged cadences and removed ∼ 100 cadences per light curve for these test data.We did not remove these cadences in our processing using the spatial systematics method (Figure 2).However, in our calculation of performance metrics we use the intersection of non-masked cadences among Lightkurve light curves.This excludes a further ∼ 100 cadences per light curve.For the Lightkurve CBV detrending we used all single-scale CBVs available K = 16.For Lightkurve SFF detrending we specified a window of width 401 cadences for pre-processing; this removes long-term variability and flattens each light curve.The Lightkurve SFF detrending was performed using 20 such windows.No other Lightkurve input parameters were specified.The broad relative performance of the spatial systematics algorithm against the comparison Lightkurve methods was evaluated in terms of the estimated level of residual systematics after detrending (Section 3.1).

RESULTS
In this section, we describe the results of experiments A, B and C.

Performance Metrics
The metrics used to numerically evaluate algorithm performance fall into the following two broad categories.
Residual Noise in Detrended Light Curves: -This section concerns metrics used to assess the level of residual systematic noise in detrended light curves.The combined differential photometric precision (CDPP) over 6 hr is used in the Kepler pipeline and serves as an estimate of the level of white noise remaining after detrending on a transit timescale (Christiansen et al. 2012).
Following Aigrain et al. (2016Aigrain et al. ( , 2017) we used the procedure described in Gilliland et al. (2011) to compute an approximate measure of the CDPP in our detrended light curves.A 2D quadratic Savitsky-Golay filter was applied to the detrended light curves to remove low-frequency variability and the data were then averaged over 6-h bins.The approximate CDPP was then computed as the standard deviation of the binned time series and scaled by a value 1.168.The median CDPP across the set of detrended light curves is reported.
The second metric for residual systematic noise was adopted from the goodness metric defined by Stumpe et al. (2012) and is denoted here as G y ′ .This metric is formed as the average absolute cubed pairwise correlation between detrended light curves as Lower values of G y ′ suggest a lower residual systematic noise in detrended light curves.The cube down weights correlations which may be spuriously low if all residual astrophysical variability is removed and higher correlations make a more significant contribution to this metric.
Cross-Correlation against Known Signals and Systematics: -In this section, we consider metrics used to measure the degree to which known astrophysical signals or systematics are recovered in the injection tests or full simulations.
The injection matrix A defined above is a null matrix with a subset of columns containing simulated vector light curves that are either sinusoids (a s ), flares (a f ), or exoplanet transit signals (a t ), as defined in Section 2.5.3.Each simulated light curve is mapped to a random light curve in a one-to-one mapping.We denote the mean correlation of all simulated astrophysical signals of type a [m] , m ∈ {s, t, f } against their associated detrended light curves y ′ [m] and averaged over all simulation runs as ρ(a [m] , y ′ [m] ).This is a measure of the degree to which the simulated or injected astrophysical signal has been recovered from the detrended light curve.When averaged across all signal types this is denoted ρ(a, y ′ a ).We denote the mean absolute correlation of all simulated astrophysical signals of type a [m] against the set of all detrended light curves where no astrophysical signal was injected y ′ a c and averaged over all simulation runs, as |ρ(a [m] , y ′ a c )|, m ∈ {s, t, f }.The absolute value is used as the simulated signal may be anti-correlated with the estimated systematics.This correlation measures whether a detrending algorithm is corrupting light curves that do not contain an injected astrophysical signal (y ′ a c ).The unprocessed light curves before detrending (y a c ) may however be incidentally correlated with an injected astrophysical signal and therefore we normalize by the mean absolute correlation of the non-detrended light curves Values below unity indicate that the detrending algorithm is not introducing spurious correlation with the injected signals beyond that present incidentally in the non-detrended light curves.
In Experiment B, the constituent systematic light curve vectors l i , (i ∈ I) comprising matrix L are also known, in addition to the injected astrophysical signals a [m] , m ∈ {s, t, f }.We denote the estimated systematics vector for light curve i as li .The mean correlation of the simulated l i and estimated systematics vectors li averaged over all light curves is denoted ρ(l, l).If this metric is restricted to light curves where an astrophysical signal a [m] , m ∈ {s, f, t} was injected, it is denoted ρ(l a , la ).

Experiment A
Experiment A, as described in Section 2.5.4,was performed to numerically evaluate the spatial systematics algorithm in terms of its ability to recover transient and variable signals injected into Kepler test data.A standard PCA decomposition method was used as a comparison.The convergence of the spatial systematics algorithm for five representative runs of this experiment is shown in Figure 9.This figure plots the mean correlation ρ(a [m] , y ′ [m] ) (over light curve) of the injected astrophysical signal types m ∈ {s, t, f } as a function of spatial systematics iteration number.This mean correlation is a proxy for the degree of recovery of injected astrophysical signals, as described in Section 3.1.
Figure 9 shows general convergence of the spatial systematics algorithm within ∼ 30 iterations.The figure shows signal recovery performance comparable to or exceeding the reference PCA method in the majority of cases but not without exception for individual flare and transit runs.[m] ) for Experiment A for sine, flare, and transit signal types m ∈ {s, t, f } (across columns) for each of five separate simulation runs (across rows).This mean correlation, defined in the main text, is plotted for the spatial systematics algorithm (blue) and the PCA decomposition method (red).The mean correlation is scaled by the maximum value achieved by either method; this maximum is shown in the legend of each subplot.The PCA algorithm is non-iterative and is depicted as a straight line accordingly.The x-axis label is the iteration number of the spatial systematics algorithm.The spatial systematics algorithm is generally convergent and typically has comparable or improved performance relative to PCA but this is not universal for all runs.
The performance of the spatial systematics algorithm is shown in more detail for a single representative run in Figures 10-12.Figure 10 shows the estimated basis vectors v k obtained with the spatial systematics and reference PCA algorithms.The estimated basis vectors v k are generally similar but differences increase at higher-order k.
Figure 11 shows the fitted coefficients c k i , k ∈ {1, .., 5} (K = 20) obtained by the spatial systematics and PCA algorithms across the gridded light curves i → (x, y) ∈ (X, Y ).This figure shows that the total variation constraint in the spatial systematics algorithm enforces smoothness in the coefficients while preserving discontinuities.In Figure 12 example detrended light curves are shown over injected astrophysical signal type.In this example, the spatial systematics algorithm has preserved the injected astrophysical variability to a greater degree than the reference PCA method.However, the spatial systematics algorithm has a higher residual scatter in the detrended light curves than PCA, as is visible near cadence 3000 for the light curve injected with a sine signal.
The summary noise metrics (Section 3.1) for Experiment A for the final iteration of the spatial systematics algorithm and averaged over all runs in the ensemble are listed in Table 1.This table shows that for these simulations the spatial systematics algorithm generally outperforms the reference PCA method with caveats discussed in further detail in Section 4.

Experiment B
Experiment B (Section 2.5.5) was designed to numerically evaluate the spatial systematics algorithm using fullysimulated data generated using only the Kepler test data coordinates.The algorithm was evaluated in terms of its ability to recover the known astrophysical signals and systematics used in simulating the light curves.
The simulated systematics are rank K s = 5 but were estimated with rank K = 8 as described above.The estimated and simulated basis vectors v k , k ∈ {1, .., 8} for Experiment B are shown in Figure 13.The first four estimated basis vectors are identical between the spatial systematics and PCA algorithms.Higher-order basis vectors deviate between the two methods and are also poorly constrained.However, fitted coefficients for these basis terms (K > 4) are small.For the spatial systematics algorithm these fitted coefficients contribute ∼ 4% in total magnitude ( K k=5 ∥c k ∥/ K k=1 ∥c k ∥), while for PCA the contribution is ∼ 11%. Figure 14 shows the true simulated coefficients and those estimated using the PCA and spatial systematics algorithms for the first 5/8 basis terms.The coefficients estimated using PCA have a more speckled spatial sub-structure than the smoother distribution obtained by the spatial systematics algorithm.
Example individual detrended light curves for sine, transit, and flare simulated signal types are shown in Figure 15.Summary results for Experiment B, in terms of the metrics described in Section 3.1, are provided in Table 2.

Experiment C
Experiment C was designed to provide a high-level comparison of the spatial systematics algorithm against the standard CBV, SFF, and PLD methods (Section 2.5.6).The performance metrics G y ′ and CDPP 6h for the residual     noise in the detrended light curves for this experiment are shown in Table 3.A composite plot of CDPP 6h versus G y ′ across all methods in Experiment C is shown in Figure 16.These results show that the CDPP 6h metric is broadly comparable across all methods while the G y ′ metric has a greater dependence on detrending method.As noted in Section 2.5.6, the data used in this comparison have undergone different pre-processing, and furthermore detrending methods have not been optimized in each specified case.The purpose of this comparison is to provide a high-level reference comparison and not to show the general optimality of any single method.The CDPP 6h metric is broadly comparable across method while the G y ′ metric is more strongly dependent on detrending method.

DISCUSSION
The numerical evaluation of the spatial systematics algorithm presented here, using injected signals in Kepler data (Experiment A) and full simulations (Experiment B), shows that in these cases the algorithm matches or outperforms the reference PCA method for astrophysical signal recovery and achieves comparable performance for systematics removal.This is indicated by higher values of ρ(a, y ′ a ) and ρ(l a , la ) respectively in Tables 1 and 2. The improved recovery of injected astrophysical signals by the spatial systematics algorithm is direct evidence that the algorithm is less prone to erroneously absorbing true astrophysical variability in the systematics (overfitting) than the reference PCA method.Astrophysical variability and systematics are not separable a priori and overfitting is thus a foundational concern in the detrending of exoplanet transit light curves (Stumpe et al. 2012;Smith et al. 2018).Overfitting may remove true astrophysical variability from detrended light curves and also introduce spurious variability into other detrended light curves in the sample as these corrupted systematics are applied.A number of detrending approaches infer a basis representative of systematic effects from a correlated set of light curves Petigura & Marcy (2012); Foreman-Mackey et al. (2015) and Kepler PDC-MAP Smith et al. (2012); Stumpe et al. (2012).The set of light curves from which a set of basis vectors is constructed can be robustly filtered to exclude outliers and those with clear astrophysical variability as a means to mitigate overfitting, as in PDC-MAP (Stumpe et al. 2012).A basis consisting only of systematics (which is difficult to realize) can however still be overfitted to astrophysical features (Smith et al. 2018), as it is unlikely that a basis is completely orthogonal to all possible astrophysical signals.Spatial dependence among light curves has been identified by Petigura & Marcy (2012) and Moreno et al. (2021), whereby the latter work explains spatial correlations as time-delayed systematic effects traversing the sensor.The modified total variation prior applies a spatial correlation constraint across the sensor which is well-suited for inference of local effects of this form.Functionally, the spatial systematics model depends on systematic coefficients alone, this model is differentiable, and derived gradients are computationally tractable.The algorithm iteratively refines both the fitted coefficients and the overall systematics model.These algorithm properties help to separate the systematics and astrophysical variability and thereby mitigate overfitting.A position-based prior is used in PDC-MAP to reduce overfitting of cotrending basis vectors to a light curve.The spatial systematics algorithm is novel in that it uses a spatial constraint computed directly from systematic estimates, of total variation form and further, iteratively refines the overall systematics solution for a collection of light curves.We emphasize the comparative performance to standard PCA and while we perform a high-level comparison to specialized detrending methods (Experiment C), we do not claim optimality for this exploratory algorithm development.The reduced overfitting is most evident in Tables 1 and 2 for sine ρ(a s , y ′ s ) and flare ρ(a f , y ′ f ) signals and comparable for transit signals ρ(a t , y ′ t ).This is also visible in the sample detrended light curves shown in Figures 12 and 15.In summary, the longer-duration astrophysical signals (sine, flare) have reduced overfitting relative to the transit signals, which are of shorter duration.The shorter the duration of an astrophysical signal relative to the timescale of systematics present in Kepler light curves (Figure 10), the less linearly dependent on an inferred systematics basis and therefore the smaller the expected improvement over the reference PCA method.Notably, the spatial systematics algorithm performs well over all astrophysical signal characteristic timescales considered here.The spatial systematics algorithm contains no explicit model for transient or variable astrophysical signals, however, it performs well for the signal types considered here (Section 2.5.3).The consistent relative performance between signal types in Experiments A and B further supports the preceding proposed causal link between the signal and characteristic residual systematic timescales, rather than the exact functional form of the astrophysical signal.
Further evidence for the improved mitigation of overfitting by the spatial systematics algorithm over the reference PCA method is provided by Experiment B. As a full simulation, this experiment includes no unknown astrophysical variability beyond the simulated astrophysical signals.In Figure 14 the true simulated coefficients shown in the top row are spatially smooth but with realistic module discontinuities.In contrast, the coefficients in the second row estimated using the PCA method show speckling in smooth regions.These discrepant coefficient values correspond to cells containing light curves with simulated astrophysical signals that are overfitted by the PCA method.The coefficients estimated by the spatial systematics method shown in the bottom row of this figure generally have reduced speckling and reflect the true coefficients with greater fidelity.Figure 17 shows the cells where an astrophysical signal was injected into a simulated light curve, alongside fitted coefficients from PCA and the spatial systematics method.This figure supports the argument that PCA has overfit light curves containing astrophysical variability.However, we note that not all individual cells containing an injected astrophysical signals are overfit by PCA.A light curve and the associated detrending results corresponding to an overfit cell in Figure 17 are shown in Figure 18.The spatial systematics algorithm varies both the coefficients and basis vectors (Figure 13) in the iterative fit.Empirically, in Experiment B the spatial systematics algorithm estimates systematics l with a correlation ρ(l, l), against true simulated systematics l, that is comparable to the reference PCA method.The correlation ρ(l a , la ) against systematics for light curves containing a simulated astrophysical signal, and therefore a higher risk of overfitting, is slightly higher for the spatial systematics algorithm relative to the reference PCA method.This implies the spatial systematics algorithm shows reduced overfitting relative to the reference PCA method.We believe this is due to the spatial total variation constraint appropriately estimating spatially correlated systematics.i for light curve i → (x, y) ∈ (X, Y ) at each spatial cell (from Figure 14) for the reference PCA method and the spatial systematics method respectively.For these columns the magnitude of the coefficient is shown as color intensity (color wedge at right).A black outline around a cell indicates that the corresponding lightcurve contained an injected astrophysical signal.The rank 5 coefficient was chosen for clarity of interpretation as this coefficient should be closer to zero.Coefficients fitted with PCA exhibit some overfitting of injected astrophysical signals.
As a definitional consequence of the reduction in overfitting, the detrended light curves obtained using the spatial systematics algorithm will retain a higher degree of true astrophysical variability relative to those obtained using the reference PCA method in addition to a component for uncorrected systematics, in keeping with all detrending algorithms.We note that PCA is guaranteed to maximally flatten a collection of light curves as the PCA solution is equivalent to a minimal least-squares residual between fitted systematics and light curves (Equation 6).PCA implicitly assumes a white noise model, and is susceptible to overfitting of astrophysical signals (Candès et al. 2011) because astrophysical signals are not appropriately modeled as white noise Pont et al. (2006).In both experiments, the two algorithms obtained low goodness metric values G y ′ < 2% indicating largely successful systematics removal in the detrended light curves.The residual systematics metrics G y ′ and CDPP 6h are plotted for run #0 of Experiment A per light curve and over detrending algorithm in Figure 19 and in Figure 20. Figure 19 shows that the distribution of the goodness metric G y ′ is broadly larger than that for PCA.This metric (Section 3.1) is nonlinear and pair-wise, and therefore sensitive to those light curves with poorly-modeled systematics (Figure 20).For example, individual light curves may be poorly modeled by the spatial systematics algorithm if their systematics exhibit strong temporal variation insufficiently captured by the fixed spatial model.It is also possible that the spatial systematics algorithm retains a greater degree of astrophysical variability that biases G y ′ through incidental correlation.We lack sufficient evidence to state either claim strongly here.Future generalizations of the constraint terms are described below as potential future work.
In Experiment B, the goodness metric value G y ′ was lower for the spatial method than the reference PCA method.For both experiments (Table 1 and 2) the two algorithms obtained very similar CDPP 6h metrics and a reduction from the non-detrended preprocessed light curves, indicating that both methods were equivalently successful at suppressing transit time-scale systematics.
In Experiment A the measures of residual systematics G y ′ and CDPP 6h are higher for the spatial systematics method compared to the PCA method.As noted above, these values for the detrended light curves may be elevated primarily due to improved preservation of true astrophysical variability or alternatively due to unmodeled residual systematics.In Figure 12 the example detrended light curve from Experiment A obtained by the spatial systematics algorithm in the first column shows an example of large residual systematic scatter not present in the PCA detrended light curve; however the overall variability is dominated in this case by the injected astrophysical signal.We argue, that although the spatial method may retain a greater level of residual systematics, comparably the major component of retained light curve variability is astrophysical.As discussed above, Experiment B demonstrated a more accurate estimation both of systematics l and of astrophysical signals a in light curves containing astrophysical signals.Experiment B, although simulated with a simplified data model, therefore aids the interpretation of metric performance.We note also that the improved preservation of true astrophysical variability in the light curves detrended by the spatial systematics algorithm may reduce the correlations ρ(a [m] , y ′ [m] ), m ∈ {s, f, t} in Tables 1 and 2 discussed above and that these may be underestimated as a result.
In Experiment A, both the spatial systematics algorithm and the reference PCA method produce values |ρ(a Therefore, as discussed in Section 3.1, this suggests neither algorithm has increased the average incidental correlation between the simulated astrophysical signals and the light curves in which no astrophysical signals were injected (the non-injected subsample).Equivalently, neither method has introduced spurious astrophysical content into this non-injected subsample of light curves.However, the ratio , y a c )| is higher for the spatial systematics algorithm compared to the PCA method (Tables 1  and 2) particularly for sine and flare m ∈ {s, f } astrophysical signals, which have longer duration.In Figure 21 we plot the light curves in the non-injected subsample for which the spatial systematics algorithm produces the highest incidental correlation max y a c |ρ(a [m] , y a c )|, m ∈ {s, t, f } between the non-injected spatial detrended light curve and any injected astrophysical signal used in the simulation run.This figure shows the worst cases of incidental correlation; in general the incidental correlation average is low.As described above, the spatial systematics algorithm is more successful at preserving astrophysical variability in detrended light curves and will therefore retain rare incidentally correlated signals, particularly if their timescale matches the simulated astrophysical signal.We believe this explains the higher values of this ratio the case of sine and flare signal types.
Section 3.4 describes the results of Experiment C, a high-level comparison between the spatial systematics algorithm and the standard detrending methods CBV, SFF, and PLD (Lightkurve Collaboration et al. 2018).As noted above, the default recommended parameters were used for each method as opposed to a customized optimization.In addition, individual methods apply different default pre-processing steps.Accordingly, this high-level comparison cannot address the optimality of different detrending methods.
In this experiment the CDPP 6h residual systematics metric, which is computed per light curve, shows broadly comparable (≤ 8 ppm between mean values) detrending performance between the spatial systematics algorithm and the reference detrending methods CBV, SFF, and PLD (Table 3; Figure 16).The spatial systematics method has a slightly increased mean CDPP 6h compared to other methods possibly due to instances of residual scatter where the spatial model imperfectly corrects strong temporal systematic variation.The G y ′ residual systematics metric differs in distribution across detrending method (Figure 16).By definition (Section 3.1) and as discussed earlier, this metric is nonlinear and pair-wise; accordingly, light curves with unmodeled systematics are more heavily weighted.The PLD method has the highest mean value of G y ′ (Figure 16) but this method retains long-term variability, which is fit by a spline term in the data model, and therefore is expected to have an inherently higher G y ′ distribution.In contrast, in pre-processing, SFF light curves are flattened using a Savitsky-Golay filter (Lightkurve Collaboration et al. 2018), and SFF is therefore broadly comparable to the CBV method in this metric.The spatial systematics algorithm has a higher mean value of G y ′ and broader distribution than the CBV and SFF methods (Figure 16).As discussed earlier for Experiments A and B, this may be due to the spatial systematics method retaining a higher level of astrophysical variability or due to the nonlinear contribution to G y ′ of light curves that do not fit the spatial systematics data model assumptions.Generalizations to make the constraint terms more complete are discussed below as potential future work.
Our current work has several limitations that would benefit from exploration in future work.As described in Section 2.5.2, we chose tunable algorithm parameters, including model rank, convergence criteria, optimization step size, weighting matrix, and initial coefficient values, amongst other parameters, based on feasibility and empirical evaluation Both plots are per target light curve for a single run #0 of Experiment A. Empirical distribution functions for each light curve metric are plotted over the spatial systematics method (horizontal axis) and the PCA method (vertical axis).Although correlated, there is significant scatter in the left plot due to the intrinsic nonlinear and pair-wise nature of the metric G y ′ .It is accordingly more sensitive to light curves with poorly-modeled systematics or greater retained astrophysical variability.
alone.It would be beneficial to sample this algorithm parameter space over a broader range, for completeness.In conjunction, our numerical evaluations relied on an associated set of injected astrophysical signal types (Section 2.5.3), which could be expanded.Generalizing the sensor spatial discretization to non-uniform layouts is left for future work.It has been noted that light curves at small spatial separations may suffer from blending of astrophysical signals (Kovács et al. 2005b;Hattori et al. 2022) requiring future consideration when using the spatial systematics algorithm on densely-populated fields.Although some light curves in our data sample have small spatial separation, this is not a major concern in the current work, as the cell separation on the spatial grid used significantly exceeds the PSF width.The sparse population of the field is likely due to the narrow (and bright) magnitude range selected.There is scope to generalize the constraint term to incorporate non-neighbouring pixel spatial correlations or weightings of the spatial prior to exclude targets within a pixel separation where astrophysical blending could occur.In addition, the constraint terms may be generalized to include the known magnitude correlation (Stumpe et al. 2012;Smith et al. 2012) and other parametric dependencies (Moreno et al. 2021).As noted earlier, the Kepler test data were selected in a narrow magnitude range to isolate the effect of magnitude dependence in the current work.

CONCLUSIONS
In this paper we present an exploratory algorithm for detrending light curves obtained in wide-field exoplanet transit surveys.This spatial systematics algorithm fits a low-rank linear systematics model to a collection of light curves while also including a total variation spatial constraint across the sensor at a foundational level.The resulting objective function is reduced using variable elimination (Golub & Pereyra 1973) which also stabilizes the solution (Golub & Pereyra 2003;Shearer & Gilbert 2013).An approximate closed-form gradient was developed for minimization by gradient descent; this particular formulation is a modification of total variation optimization approaches (Vogel 2002).The spatial systematics algorithm was numerically evaluated relative to a reference PCA method using both injection tests with Kepler data (Experiment A) and full simulations including simulated systematics, astrophysical signals, and statistical noise within the same Kepler coordinate framework (Experiment B).
The principal conclusions of the paper are: In these worst cases it can be seen that the high level of correlation of the spatial detrended lightcurve y ′ with an astrophysical signal a is incidental, as the original lightcurve y itself is incidentally correlated with a.This suggests that high values for the metric of corruption |ρ(a, y ′ a c )|/|ρ(a, yac )| may result incidentally for the spatial systematics algorithm because it retains a higher level of astrophysical variability.
1.The spatial systematics algorithm showed reduced overfitting of instrinsic astrophysical variability relative to the PCA method in the two numerical evaluation studies.The reduced overfitting was demonstrated by comparable or significantly higher correlation between the known astrophysical signals and the detrended light curves in both experiments.In Experiment B the known simulated systematics were comparably or more accurately estimated relative to the PCA method, particularly for light curves containing injected signals.We argue that the reduced overfitting likely arises from the physically-realistic total variation constraint.In addition, the algorithm simulataneously varies both the basis vectors and their coefficient weights.Both factors likely contribute to the improved separation of systematics and astrophysical signals.
2. A marked reduction in overfitting relative to PCA was found for slowly-varying astrophysical signals while comparable performance was achieved for shorter exoplanet transit signals.We argue that the longer-duration astrophysical signals overlap more significantly with the timescale of residual systematics in the Kepler data and therefore allow the algorithm to achieve a clearer separation of systematics for these signals.
3. In terms of estimates of residual systematics in the detrended light curves, both methods achieved comparable CDPP 6h values in Experiment A suggesting largely equivalent success in suppressing transit timescale systematic noise.However, the goodness metric G y ′ was comparatively larger in this experiment for the spatial systematics algorithm relative to PCA indicative of residual systematics.
4. Neither method increased the incidental correlation between the simulated astrophysical signals and non-injected subsample of light curves in the numerical evaluations.Equivalently, neither method added spurious astrophysical variability into the detrended light curves.
The PCA solution can be obtained from the compact SVD solution, using V K and setting An overview of total variation is provided in the monograph by Vogel (2002) and in Karl (2005, Chapter 3.6).In the continuous limit, a Lebesque-integrable (Edwards 1994) function f (x, y) ∈ L 1 with vector gradient ∇f = ∂f ∂x , ∂f ∂y has total variation: where ∥.∥ 2 denotes an L 2 vector norm.Functions of bounded total variation have T V (f ) ≤ ∞.The total variation measure T V (f ) can be used as a regularization or penalty constraint on f to enforce spatial uniformity while preserving limited function discontinuities (Vogel 2002).
A discretized approximation of the total variation measure T V d (f ) for f sampled at points X × Y is given by: where (D x f ) i,j = f i,j − f i−1,j and (D y f ) i,j = f i,j − f i,j−1 are unweighted function differences.If matrix Df is constructed with vec(D x f ) T and vec(D y f ) T as the first and second rows respectively (where vec(.)denotes vectorization (Golub & Van Loan 2013)), then: This may be generalized to: ||Df || p 2,p : p ∈ [1, 2], where p controls the degree of spatial uniformity.C.1.Total Variation Measure for p = 2: We show here that minimizing the generalized total variation measure T V d (f ) = ||Df || p 2,p for the case p = 2 is equivalent to maximizing the correlation ρ ij between neighboring normalized light curve coefficients ci and cj (i ∈ I, j ∈ I, i, j ∈ n(I)), where n(I) is the set of neighboring light curve tuples.The maximum neighboring coefficient correlation ρ n takes the form: , an equivalent formulation is given by: The summation can be replaced by a linear difference operator D (here unweighted for simplicity) acting on columns of C.This difference operator was introduced above in the definition of the generalized total variation measure.Therefore: We note that even if the minimum is not achieved either objective should produce an equivalent result.

D. PROBABILISTIC VIEW OF THE OBJECTIVE
Although the spatial systematics algorithm was not derived within a Bayesian framework, a probabilistic interpretation of the objective function can be shown.An overview of Bayesian estimation and low rank models is provided in the monograph by Murphy (2020).We adopt the nomenclature of Section 2.3 here.In Bayesian MAP estimation the posterior probability is maximized: The maximum deviation of lT i lj from cT i cj can be bounded and understood in terms of V ∈ R N ×K .In the simplest case, if V is orthonormal then ∥Vc∥ = ∥c∥ (Gentle 2010), and: However V is not restricted to be orthonormal, as discussed in Section 2.3.In the general case, the worst case deviation between lT i lj and cT i cj can be understood in terms of the singular values of V; we follow the approach of Huber (2016).Denoting the rank K singular value decomposition as V = AΣB T (Appendix B), Σ is comprised of singular values of descending magnitude σ 1 ≥ σ 2 ... ≥ σ K ≥ 0 and where, without loss of generality, we adopt σ 1 = 1.Matrix A contains the left singular vectors {a k : k ∈ K} and matrix B the right singular vectors {b k : k ∈ K}.The systematics correlation lT i lj can be expanded as:  σmαi .Therefore the maximum differential rotation |ϕ − ψ| occurs when the ratio between singular values σm σn is greatest, i.e. when m = 1 and n = K.In this case, the transformed vector component along b 1 is unchanged (as σ 1 = 1) while the transformed component along orthogonal vector b K is scaled by the smallest relative value σ K ; this maximizes rotation.The rotation of a single coefficient vector c i under the linear transformation ΣB T is illustrated in Figure 22.
The maximum differential rotation |ϕ − ψ| depends on the relative position of the coefficient vectors c i and c j before linear transformation.In this 2D case we denote their bisector angle as θ, as depicted in Figure 23.By geometric inspection and by considering the algebra above for the 2D case, the maximal differential rotation |ϕ − ψ| occurs when θ = 0 and the coefficient vectors are bisected by b 1 .The minimal differential rotation occurs when θ = π 2 and the coefficient vectors are bisected by b K .For these values of θ and following the trigonometric approach by Huber (2016), the upper and lower limits for the correlation lT i lj of the normalized systematics vectors can be expressed the form: .This illustration shows an example coefficient vector ci = 3b1 + 2bK (blue), linearly transformed to vector ΣB T ci = 3b1 + σK 2bK (red), plotted on the 2D space spanned by b1 and bK (brown).The normalized transformed vector is also shown (gray).A value σK = 0.6 was adopted for this example (recall σ1 = 1).As the correlation cT i cj → 1 implies ϕ → 0, the lower and upper bounds smoothly converge to unity and lT i lj → 1 in this limit.This implies that the deviation between lT i lj and cT i cj is lower when the absolute value of the coefficient correlation is high.Furthermore the worst case deviation depends on the ratio of the smallest and largest singular values of V; when σ K = σ 1 , V is orthonormal and there is no deviation, also shown above.In Figure 24 the deviation bounds are shown as a function of the coefficient correlation for a hypothetical value σ K = 0.6 (σ 1 = 1).

F. SPATIAL SYSTEMATICS OPTIMIZATION METHOD
In this section we derive gradient descent steps for minimizing the objective in Equation 7, repeated here for convenience: As above, the least-squares penalty is denoted as f (V, C) = ||Y − VC|| 2 F and the total variation spatial constraint as g(C) = ||D C|| p 2,p .For clarity of presentation we use an unweighted difference operator D instead of D W (Equation 7).

F.1. Variable Projection
In this section we use variable projection to eliminate the dependence of the least-squares penalty on V.An overview of variable projection applied to separable non-linear least-squares problems is provided by Golub & Pereyra (2003).For completeness we note that alternating iterative minimization over V and C could also be used, however this would not necessarily be superior to variable projection used here.For simplicity, we work with a transposed version of the least-squares penalty: We denote as h(C) = arg min V ∥Y T − C T V T ∥ 2 F the value V that minimizes this penalty for a particular C. Here, and afterwards h(•) T denotes the transpose of h(•).The minimum V T is the minimum-norm least-squares solution, with respect to C T ∈ R I×K .The gradient ∇w(C T ) ∈ R I×K is the matrix of partial derivatives.We denote the gradient of the variable projection least-squares term as ∇f ′ (C T ) and the gradient of the total-variation constraint as ∇g(C T ).The total gradient is: where ∇w(C) T = ∇w(C T ).

F.2.1. Variable Projection Least-Squares Gradient
For clarity in computing ∇f ′ (C T ), we expand the variable-projection least-squares penalty as F where [Y] n is the n th row of Y ∈ R N ×I .Noting the Frobenius matrix norm may be expressed as a Frobenius matrix inner product, as ∥A∥ 2 F = ⟨A, A⟩, where the Frobenius matrix inner product is defined as ⟨A, B⟩ = tr(A T B) and tr denotes a trace (Petersen & Pedersen 2012).By application of the chain rule: The derivative of the projection matrix is shown due to Harville (2008) (Theorem 15.11.1): Substituting this result for dP ⊥ R(C T ) into F23: Noting that an orthogonal projection matrix P is symmetric P T = P and furthermore, idempotent PP = P (Golub & Van Loan 2013), the first term in Equation F26 may be further simplified.Consider It can be seen that the range space of (C † ) T is the range space of C T , therefore P ⊥ R(C T ) (C † ) T = 0 and C † P ⊥ R(C T ) = 0, so that the second term in Equation F26 is dropped: Here each index n maps to some (j, k), but the ordering does not affect the computation.
The L p norm is non-differentiable at zero for p ∈ [1, 2) and a differentiable approximation of the L p norm is used instead in the form of the Huber functional (Vogel 2002).The Huber functional avoids zero values by adding a very small positive value δ to every element.
We apply the Huber functional to arrive at the modified spatial constraint: where The derivative is decomposable over k and for simplicity we find the derivative with respect to each column of [ CT ] •,k . Using The final term in Equation F30 is the derivative due to the row-wise normalization of CT .For the case p = 2 an analytic solution can be computed without the use of the Huber functional approximation. [∇g

Figure 1 .
Figure 1.A visual representation of the coefficient matrix C. Each image layer ck represents the coefficient weights for a basis term k, ([ C] k ) organized by 2D sensor cell position.The mapping of light curve i to cell position i → (x, y) is implicit in this notation, as described in the main text.

Figure 2 .
Figure 2. Data analysis framework for numerical evaluation of the spatial systematics algorithm.The data flow includes input data (purple box), pre-processing (grey box; Section 2.5.1), and application of the spatial systematics algorithm (blue box; Sections 2.4 and 2.5).

Figure 3 .
Figure 3.A representative sample of 15 light curves from Q10 shown before (a) and after (b) linear detrending and outlier removal.The x-axis index is in the unit of long-cadence time samples (∆t = 29.4min).The dominant systematics visible are due to the quarterly roll and earth point recoveries (Stumpe et al. 2012).

Figure 4 .
Figure 4. Positions of all (4850/6286) pre-processed Kepler test data light curves for Q10 plotted on the Kepler sensor in global pixel coordinates (11000 × 11000).Kepler modules are demarcated by bold dashed lines and labeled by module number.The (30 × 50) spatial grid is drawn in dashed lines with individual cells of size (220 × 220) pixels.All light curves with coordinates included in the rectangular gridded region are shown in blue.The subset of light curves that were gridded, and therefore processed by the spatial systematics algorithm, are shown in red at their original positions on the CCD.These light curves were either gridded to the (30 × 50) spatial cell in which they fall or to a nearby empty spatial cell, as described in the main text.After gridding, no spatial cell contained multiple light curves and there were no empty spatial cells.Light curves on modules excluded from the data grid are shown in grey.There are no targets in module 3 due to module failure.

Figure 5 .
Figure 5. Spatial correlation structure within modules for Q6, Q10 and Q14 across the (30 × 50) gridded region of the Kepler sensor.The module number is shown in the top left corner of each module.The mean pairwise correlation ρM within each module is tabulated in black.The difference ρM XY − ρM is tabulated in red, where ρM XY is the mean pairwise correlation of neighboring light curves within each module.The shaded color intensity is proportional to ρM XY .In most modules the correlation between neighbouring light curves is greater than the mean pairwise correlation between all pairs of light curves on the module.

Figure 6 .
Figure 6.The leading PCA basis vector v1 for quarters Q6, Q10 and Q14) (top) and an associated color map of the leading PCA coefficient c 1 i←(x,y) at each spatial cell position (bottom).The spatial structure of the leading coefficient term is persistent between quarters.
The transit profile b[n] is an empirical tapered symmetric profile, defined on the first half of the transit n ∈ [1, d 2 ] as: b[n] = 0.4 exp(−0.3n)− 1 and on the second half n ∈ [ d 2 + 1, d] as b[n] = 0.4 exp(−0.3(d− n)) − 1.The depth of this profile generally ranges from −0.6 at transit ingress and egress to −1 at the midpoint.Transits were simulated to allow at minimum three transits to occur in the light curve.The period P was drawn from the uniform distribution U (96, 360), which corresponds to a range of 4 to 15 days.The transit epoch n 0 is drawn from the uniform distribution U (0, P ).The duration d of the transit profile was drawn from the uniform distribution U (8, 24), which corresponds to a range of 4 to 12 hours.The depth δ is selected between U (0.002∥y∥, 0.016∥y∥).• Simulated flare: a f [n].The time sample index n p of the flare peak was drawn from the uniform distribution U (1, N ).The flare profile is simulated as a rising exponential up to n p followed by a slower decaying exponential.The overall signal is multiplied by a random walk stochastic process x[n], generated as x[n + 1] = x[n] + w[n] : w[n] = N (0, 1), x[0] = 0.The simulated flare is defined on the interval n ∈ [1, n p ] as a f [n] = x[n] exp − 20•(np−n) N and on the interval n ∈ [n p , N ] as a f [n] = x[n] exp − 10•(n−np) N.

Figure 7 .
Figure 7. Example simulated sine as, transit at, and flare a f signals as defined in Section 2.5.3.The inset panel (blue outline) shows the tapered transit profile at higher temporal resolution.

Figure 8 .
Figure8.Basis vectors V (upper) and coefficients C (lower) used for the full simulation of light curves in experiment B using a rank K = 4 model.The basis vectors were selected at approximately equal spacing as index set(1, 5, 11, 16) from the 16 Kepler CBV published for Q10 and channel 30(Stumpe et al. 2012).The coefficient spatial dependence was simulated as a mixture of random smooth features and sharp discontinuities.Both components are arbitrarily chosen and the discontinuities are not constrained to match the Kepler module gridding.

Figure 9 .
Figure 9.The scaled mean correlation of injected signals and associated detrended light curves ρ(a [m] , y ′[m] ) for Experiment A for sine, flare, and transit signal types m ∈ {s, t, f } (across columns) for each of five separate simulation runs (across rows).This mean correlation, defined in the main text, is plotted for the spatial systematics algorithm (blue) and the PCA decomposition method (red).The mean correlation is scaled by the maximum value achieved by either method; this maximum is shown in the legend of each subplot.The PCA algorithm is non-iterative and is depicted as a straight line accordingly.The x-axis label is the iteration number of the spatial systematics algorithm.The spatial systematics algorithm is generally convergent and typically has comparable or improved performance relative to PCA but this is not universal for all runs.

Figure 10 .
Figure 10.The estimated basis vectors v k , k ∈ {1, , 20}(K = 20) obtained for run #0, a single representative run of Experiment A, shown here for the spatial systematics algorithm (blue) and the reference PCA method (red).The spatial basis vectors and PCA basis vectors closely overlap for the first two terms.

Figure 11 .
Figure 11.Basis vector coefficients c k i , k ∈ {1, .., 5}(K = 20) for light curve i → (x, y) ∈ (X, Y ) obtained using the spatial systematics (top row) and PCA algorithms (bottom row) for run #0, a representative run of Experiment A. The coefficient index k is shown in the top left of each sub-figure.The x-and y-axes are in units of gridded spatial cells.

Figure 12 .
Figure 12.Three sample light curves across injected signal type a [m] m ∈ {s, t, f } (sine, transit, and flare) from run #0, a representative run of Experiment A. The figure shows the pre-processed light curve before detrending y (upper row), the detrended light curve using PCA y ′ P CA (middle row), and the detrended light curve using the spatial systematics algorithm y ′ (lower row).Simulated injected signals are overlaid in black.The x-axis is Kepler long-cadence time sample index.This sample shows an instance where PCA detrending removed the injected astrophysical sine and flare signals, while spatial detrending preserved those signals.

Figure 13 .
Figure 13.The estimated (left) and known simulated (right) basis vectors v k , k ∈ {1, .., 8} for Experiment B. The estimated basis vectors (left) are shown for the spatial systematics algorithm (blue) and the reference PCA method (red).The x-axis is in units of Kepler long-cadence time sample.The estimated spatial and PCA basis vectors (left) for the first four terms overlap one another in these plots.

Figure 14 .
Figure 14.Coefficients c k i , k ∈ {1, .., 5}(K = 8) for light curve i → (x, y) ∈ (X, Y ) for Experiment B. This figure shows the true simulated coefficients (top row), the coefficients obtained using the PCA method (middle row), and the coefficients obtained using the spatial systematics algorithm (bottom row).The coefficient index k is shown in the top left of each sub-figure.The x-and y-axes are in units of gridded spatial cells.Coefficients fitted with PCA show speckling, indicative of the overfitting of injected astrophysical signals.

Figure 15 .
Figure 15.Three sample light curves across injected signal type a [m] m ∈ {s, t, f } (sine, transit, and flare) for Experiment B. The figure shows the simulated light curve before detrending y (upper row), the detrended light curve using PCA y ′ P CA (middle row), and the detrended light curve using the spatial systematics algorithm y ′ (lower row).Simulated injected signals are overlaid in black.The x-axis is Kepler long-cadence time sample index.This sample shows an instance where PCA detrending moderately removed the injected astrophysical sine and flare signals, while spatial detrending more clearly preserved those signals.

Figure 16 .
Figure16.A plot of CDPP 6h (ppm) versus goodness metric G y ′ per target light curve shown for the spatial systematics method and comparison standard detrending methods CBV, SFF, and PLD.Empirical distribution functions for each light curve type are plotted over G y ′ (top) and CDPP 6h (right).The CDPP 6h metric is broadly comparable across method while the G y ′ metric is more strongly dependent on detrending method.

Figure 17 .
Figure 17.A plot of gridded light curve cells i → (x, y) ∈ (X, Y ) where a simulated astrophysical signal a [m] was injected for Experiment B. The x-and y-axes are in units of gridded spatial cells.The column at left depicts the magnitude of the injected astrophysical signal ∥a [m] ∥ as the color intensity of each cell on a log scale (blue color wedge at left).The middle and right columns depict coefficients c 5i for light curve i → (x, y) ∈ (X, Y ) at each spatial cell (from Figure14) for the reference PCA method and the spatial systematics method respectively.For these columns the magnitude of the coefficient is shown as color intensity (color wedge at right).A black outline around a cell indicates that the corresponding lightcurve contained an injected astrophysical signal.The rank 5 coefficient was chosen for clarity of interpretation as this coefficient should be closer to zero.Coefficients fitted with PCA exhibit some overfitting of injected astrophysical signals.

Figure 18 .
Figure 18.The simulated light curve y before detrending (left), the detrended light curve using PCA y ′ P CA (middle), and the detrended light curve using the spatial systematics algorithm y ′ (right), all at the position of the the maximum value of [CP CA]5 in Figure 17.Simulated injected signals are overlaid in black.The x-axis is Kepler long-cadence time sample index.This shows an instance where PCA detrending overfit the injected astrophysical sine signal relative to the spatial systematics algorithm.

Figure 19 .
Figure 19.A plot of CDPP 6h versus goodness metric G y ′ per target light curve for a single run #0 of Experiment A. Three light curve types are shown, including non-detrended preprocessed light curves (black), light curves detrended using PCA (red), and light curves detrended using the spatial systematics algorithm (blue).Empirical distribution functions for each light curve type are plotted over G y ′ (top) and CDPP 6h (right).

Figure 20 .
Figure20.The metrics G y ′ (left) and CDPP 6h (right) for the reference PCA method versus the spatial systematics method.Both plots are per target light curve for a single run #0 of Experiment A. Empirical distribution functions for each light curve metric are plotted over the spatial systematics method (horizontal axis) and the PCA method (vertical axis).Although correlated, there is significant scatter in the left plot due to the intrinsic nonlinear and pair-wise nature of the metric G y ′ .It is accordingly more sensitive to light curves with poorly-modeled systematics or greater retained astrophysical variability.

Figure 21 .
Figure 21.The non-injected light curves yac for which the spatial systematics algorithm produces the highest incidental correlation maxy a c ρ(a [m] , yac ), m ∈ {s, t, f } between the non-detrended light curve yac and any injected astrophysical signal a [m] used in the simulation run.The top row shows the non-detrended light curve and the astrophysical signal (injected into another light curve) with which the incidental correlation is highest.The detrended light curves y ′ a c are shown as obtained by the reference PCA method (middle row) and the spatial systematics algorithm (bottom row).Simulated injected signals are overlaid in black.In these worst cases it can be seen that the high level of correlation of the spatial detrended lightcurve y ′ with an astrophysical signal a is incidental, as the original lightcurve y itself is incidentally correlated with a.This suggests that high values for the metric of corruption |ρ(a, y ′ a c )|/|ρ(a, yac )| may result incidentally for the spatial systematics algorithm because it retains a higher level of astrophysical variability.

i
: m ∈ {s, t, f } Simulated astrophysical signal of type: s sine, t transit, f flare A Matrix of simulated astrophysical signalsK Rank of systematic noise model v k Basis vector k ∈ K for systematic noise model L = VC Matrix of low-rank systematic noise (V basis vectors, C coefficient matrix) C Column-normalized coefficient matrix c k Coefficient weighting of v k for light curve i ∈ I ci = [c 1 i , . . ., c K i ] T Coefficientvector for light curve i W Weight matrix D Difference operator (D W difference operator with weights W) α t Step size at iteration t f (V, C) Least-square residual between Y and VC h(C) For fixed C, minimizing value V of f (V, C) g(C) Total variation penalty constraint w(.) Variable-reduced objective function ∇f ′ (.) Gradient of f (h(C), C) ∇g(.) Gradient of f (C) P R(C) T Projection onto the range space of C T ρ(yi, yj) Correlation between vectors yi and yj ρ(a, y ′ a ) Averaged correlation of astrophysical signals a and detrended lightcurves with an injected signal y ′ a {yac } Set of lightcurves excluding those with a simulated astrophysical signal G y ′ Residual detrended systematics: goodness metric CDPP 6h Residual detrended systematics: 6 hr combined differential photometric precisionwith Σ = diag(σ 1 , ..., σ K ), V K and U K denote the associated leading K left and right singular vectors respectively.PCA is also the solution to the objective function minimization in Equation B1 with form L

2 :
arg max C p(C|Y) ∝ arg max C p(Y|C)p(C) (D8) (D9) The signal model in Equation 1 is Y = VC + N, where N is white Gaussian noise, and therefore p(Y|C, V) ∼ N (VC, 1).The conditional basis vectors are V |C = arg min V p(Y, V|C) and p(Y|C) ∼ N (V |C C, 1).Therefore ln(p(Y|C)) = ∥Y − V |C C∥ 2 F .The total variation prior implies that ln(p(C)) ∝ ∥D W C∥ p 2,p , whereby p(C) ∝ exp ∥D W C∥ p 2,p .For p = 2 this is equivalent to a Gaussian prior on the product D W C. E. RELATION BETWEEN COEFFICIENT AND SYSTEMATICS CORRELATIONS We consider here the relation between the correlation cT i cj of the normalized coefficients ci ∈ R K and the correlation lT i lj of the normalized systematics li = Vci ||Vci|| 2 li ∈ R N .
E13) since A is orthonormal A T A = 1 K and therefore ∥AΣB T c∥ = ∥ΣB T c∥.By the cosine similarity (Phillips 2021), cT i cj = cos(ϕ) where ϕ is the angle between the normalized coefficient vectors in K-dimensional space.The correlation lT i lj is similarly determined by the angle ψ between unit length vectors ΣB T ci ∥ΣB T ci∥ and ΣB T cj ∥ΣB T cj ∥ .The deviation of lT i lj from cT i cj is defined as |ϕ − ψ|.The transform ΣB T c i projects c i along each b k and scales by σ k : ΣB in the span of orthonormal vectors {b k : k ∈ K} as the b k are a basis for R K .For simplicity, assume that c i and c j lie in the 2D span of two of the basis vectors b m and b n in the form c i = α i b m + β i b n and c j = α j b m + β j b n .Then ΣB T c i = σ m α i b m + σ n β i b n and ΣB T c j = σ m α j b m + σ n β j b n .Then ϕ = arctan β j /α j − arctan β i /α i and ψ = arctan σnβj σmαj − arctan σnβi Figure22.This illustration shows an example coefficient vector ci = 3b1 + 2bK (blue), linearly transformed to vector ΣB T ci = 3b1 + σK 2bK (red), plotted on the 2D space spanned by b1 and bK (brown).The normalized transformed vector is also shown (gray).A value σK = 0.6 was adopted for this example (recall σ1 = 1).

Figure 23 .
Figure23.A bisector at angle θ between coefficient vectors ci and cj in the 2D space spanned by basis vectors b1 and bK .

Figure 24 .F=F
Figure 24.Deviation of the systematics correlation lT i lj, upper bound (green) and lower bound (blue), from the coefficient correlation cT i cj (gray) for σ1 : σK = 0.6.

R
F27)Re-arranging and summing together the derivatives asN d dC T ||P ⊥ R(C T ) [Y] n || 2 F , the gradient ∇f ′ (C T ) ∈ R I×K is: ∇f ′ (C T ) = −2P ⊥ compactly expressed as: ∇f ′ (C T ) = −2P ⊥ R(C T ) Y T Y(C † ) T (F29) F.2.2.Total-Variation GradientThe derivative of ||D CT || p 2,p can be computed by application of the chain rule:∇g(C T ) = d dC T ||D CT || p 2matrix D CT ∈ R 2×K•X•Y has columns for each coefficient k ∈ K and each pixel j → (x, y) ∈ X × Y , formed as [[D x CT ] j,k , [D y CT ] j,k ] = [c k x,y − ck x+1,y , ck x,y − ck x,y+1 ], where D x ∈ R (X−1)Y ×XY and D y ∈ R X(Y −1)×XY are difference matrices in x and y respectively.Note, for the edge pixels in top row and right-most column, difference values are only included along X or Y respectively.The total variation penalty computes the L p p norm over the L 2 norm of each column of D CT , given by∥D CT ∥ p 2,p = n ||[D CT ] •,n || p 2 = k j ∥[[D x CT ] j,k , [D y CT ] j,k ]∥ p 2 .
x CT ] j,k [D x ] T j , and by applying the chain rule we derive:d||D CT || p x CT ] j,k [D x ] T j + [D y CT ] j,k [D y ] T j (F33)We denoteL k = diag p([D x CT ] 2 j,k + [D y CT ] 2 j,k + δ) p 2 −1 ∀ j .The sum over j allows the expression to be written compactly as:d||D CT || p 2,p d[ CT ] •,k = (D T x L k D x + D T y L k D y )[ CT ] ,k F35)Each row i of the final derivative is the product ofd||D CT ||p d[ CT ] i and d[ CT ]i d[C T ]i such that [∇g(C T )] i = [B 1 [ CT ] •,1 , . . .B K [ CT ] •,K ] i (1 K − [ CT ] i [ CT ] T i ) 1 ||[C T ] i || (F36)whereB k = D T x L k D x + D T y L k D y .

Table 1 .
Experiment A ResultsNote-Performance metrics for detrended light curves, as defined in Section 3.1.As described there, recovery indicates the degree to which injected signals are preserved correctly in the detrended light curves.Corruption indicates the degree to which injected signals incorrectly appear in light curves that contain no injected signals.The uncertainties listed are the standard deviation of the value over the ensemble of ten runs.For reference, the correlation of non-detrended, pre-processed and non-injected light curves yac with simulated astrophysical signals a[m]have values |ρ(as, yac )| = 0.07, |ρ(at, yac )| = 0.04, |ρ(a f , yac )| = 0.18 for sine, transit, and flare signals respectively.For non-detrended pre-processed light curves, the goodness metric is Gy is 0.26 ± 0.01 and the median CDPP 6h is 51.21 ±0.14 ppm.

Table 2 .
Experiment B resultsNote-Performance metrics for detrended light curves, as defined in Section 3.1 and used in Table1.As described there, recovery indicates the degree to which injected signals are preserved correctly in the detrended light curves.The uncertainties listed are the standard deviation of the value over the ensemble of simulated light curves.No uncertainty is listed for G y ′ as it is computed over the ensemble of light curves.The goodness metric value computed over the non-detrended simulated light curves Gy is 0.23.The goodness metric value computed between all simulated astrophysical signals Ga is 2.2 × 10 −2 .

Table 3 .
Experiment C results Note-Residual systematics metrics for detrended light curves produced in Experiment C. The CDPP 6h mean over the ensemble of simulated light curves is denoted µ with sample standard deviation σ.No uncertainty is listed for G y ′ as it is computed over the ensemble of light curves.

Table 4 .
List of symbols[M]i,j or Mi,j Element at i-th row and j-th column of matrix M 1N Identity matrix of size N × N M † Pseudo-inverse matrix operator M T Matrix transpose operator ∥x∥p Lp norm of vector x ∥M∥p,q Combined Lp,q norm for matrix M ∥M∥F Frobenius norm of matrixM i ∈ I Target light curve index set i → (x, y) ∈ X, Y ; X, Y ∈ Z + Griddedtarget position on sensor yi Vector light curve for target index i ni Vector statistical noise term for light curve target index i li Vector systematics term for light curve target index i Y Matrix of light curves; for vectors y Y ′ Matrix of detrended lightcurves; for vectors y ′ N Matrix of statistical noise a [m]