Direct Exoplanet Detection using Convolutional Image Reconstruction (ConStruct): A New Algorithm for Post-processing High-contrast Images

We present a novel machine-learning approach for detecting faint point sources in high-contrast adaptive optics (AO) imaging data sets. The most widely used algorithms for primary subtraction aim to decouple bright stellar speckle noise from planetary signatures by subtracting an approximation of the temporally evolving stellar noise from each frame in an imaging sequence. Our approach aims to improve the stellar noise approximation and increase the planet detection sensitivity by leveraging deep learning in a novel direct imaging post-processing algorithm. We show that a convolutional autoencoder neural network, trained on an extensive reference library of real imaging sequences, accurately reconstructs the stellar speckle noise at the location of a potential planet signal. This tool is used in a post-processing algorithm we call Direct Exoplanet Detection with Convolutional Image Reconstruction, or ConStruct. The reliability and sensitivity of ConStruct are assessed using real Keck/NIRC2 angular differential imaging data sets. Of the 30 unique point sources we examine, ConStruct yields a higher signal-to-noise ratio than traditional principal component analysis-based processing for 67% of the cases and improves the relative contrast by up to a factor of 2.6. This work demonstrates the value and potential of deep learning to take advantage of a diverse reference library of point-spread function realizations to improve direct imaging post-processing. ConStruct and its future improvements may be particularly useful as tools for post-processing high-contrast images from JWST and extreme AO instruments, both for the current generation and those being designed for the upcoming 30 m class telescopes.


Introduction
A complete statistical census of exoplanet demographics is needed to test and guide planet formation and evolutionary models (e.g., Burrows et al. 2001; Alibert et al. 2005; Gaudi et al. 2021). Planets detected with indirect methods, particularly using radial velocities or transits (Seager 2008; Lovis et al. 2010), comprise the bulk of known discoveries. These methods are effective for finding companions at close separations to their host stars, but are less sensitive at wider orbital distances. Over the past two decades, high-contrast imaging (HCI) has emerged as an effective tool to study long-period planets by probing the architectures of planetary systems from the outside in, while also enabling spectroscopic characterization of their atmospheres (Bowler 2016; Baron et al. 2019; Nielsen et al. 2019; Vigan et al. 2021).
Dedicated hardware, including high-order adaptive optics (AO; Guyon 2005) and coronagraphy (Guyon et al. 2006; Oppenheimer & Hinkley 2009), is necessary to reach the planet/star contrasts required to detect faint substellar companions with direct imaging. Advanced post-processing algorithms play a crucial role in pushing the sensitivity of imaging surveys to smaller inner-working angles (IWAs) to maximize the scientific yield of these instruments. Central to this is accurately modeling and removing correlated quasi-static speckle noise in the imaging data. For ground-based instruments, residual atmospheric wave-front errors left uncorrected by AO, together with instrumental aberrations, produce speckle noise with correlation lengths that range between seconds and hours (Hinkley et al. 2007; Martinez et al. 2012). Efficient post-processing is especially necessary at small IWAs, where speckles are often brighter and exhibit similar spatial characteristics to planetary signatures (e.g., Fitzgerald & Graham 2005).
Post-processing strategies are tied to the observation approach. With angular differential imaging (ADI), instruments are configured to observe in pupil-tracking mode so that on-sky sources rotate deterministically around the optical axis of the instrument, while slowly evolving speckle noise realizations remain fixed in the image plane (Liu 2004; Marois et al. 2005). Post-processing algorithms such as locally optimized combination of images (LOCI; Lafrenière et al. 2007) and Karhunen–Loève image projection (KLIP; Soummer et al. 2012) leverage the spatial decoupling between speckle noise and planetary signals with ADI to estimate a best-fit reconstruction of the speckle noise in each frame, conditioned on other frames in the sequence. The reconstructed images are subsequently subtracted from each respective frame, and the residual images are derotated and averaged to recover real sources.
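The subtract–derotate–average sequence described above can be sketched with a toy numpy example. Two simplifications are assumed for illustration only: the speckle reconstruction is taken to be the known static pattern itself, and the field rotation is restricted to multiples of 90° so that derotation reduces to `np.rot90` (real pipelines interpolate arbitrary parallactic angles).

```python
import numpy as np

rng = np.random.default_rng(0)

# Static speckle pattern (fixed in the image plane during pupil tracking)
speckle = rng.normal(0.0, 1.0, (21, 21))

# A faint point source that rotates 90 degrees per frame on the detector
planet = np.zeros((21, 21))
planet[3, 12] = 1.0
frames = [speckle + np.rot90(planet, k) for k in range(4)]

# ADI post-processing: subtract the speckle reconstruction from each frame,
# derotate the residuals, and average them so the planet adds coherently
# while residual noise averages down.
residuals = [f - speckle for f in frames]
derotated = [np.rot90(r, -k) for k, r in enumerate(residuals)]
stacked = np.mean(derotated, axis=0)
```

With a perfect speckle reconstruction, the stacked residual is exactly the planet image; without derotation, the source would smear into four quarter-intensity copies.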
Several variations and extensions of LOCI and KLIP have been developed to improve these algorithms in their original forms. For instance, at very close separations, the sky rotation is sometimes insufficient to decouple planetary signals from speckle noise, causing the reconstructed frame to partially fit any real source that might be present. To prevent excessive oversubtraction, Pueyo et al. (2012) introduced a damped version of LOCI which regularizes the least-squares reconstruction. Reference differential imaging (RDI) is another strategy to address oversubtraction by making use of a library of reference images (Lafrenière et al. 2009; Sanghi et al. 2022; Xie et al. 2022), or images of a reference star sampled concurrently with the target (Wahhaj et al. 2021), to create the linear reconstruction of each science frame. However, the RDI image basis is not necessarily linearly independent of the expected signal profile of an on-sky object, so some self-subtraction can still occur.
To improve on linear speckle reconstruction methods like those used in both ADI and RDI, we present a machine-learning-based method for speckle noise reconstruction we call Direct Exoplanet Detection with Convolutional Image Reconstruction, or ConStruct. Our approach explicitly partitions image patches sampled in ADI frames into hypothesized planet-absent/planet-present sectors for post-processing. It uses the spatial speckle correlations across neighboring pixels in the planet-absent sector to predict a speckle noise reconstruction, independent of potential signals, in the planet-present sector. For this, an autoencoder neural network is trained in a self-supervised learning architecture with thousands of real examples, and can then be applied to science targets without further training.
By leveraging a library of archival HCI data, ConStruct is similar to RDI. However, it can encode information from thousands of training examples without manually selecting a reference library of images. Our approach is motivated by algorithms developed in the machine-learning community for filling corrupted or missing regions in images with deep learning, commonly known as image inpainting. Elharrouss et al. (2019) provide a review of the literature related to image inpainting.
Machine-learning approaches for post-processing direct imaging data have received growing interest in recent years. A supervised learning framework for detecting faint point sources is introduced in Gonzalez et al. (2018), which uses both random forests and neural networks to classify likely candidate point sources. Similarly, in Flasseur et al. (2024) the authors use convolutional neural networks for both detection and characterization of post-processed images, generated with the PACO algorithm (Flasseur et al. 2018, 2020). Yip et al. (2019) apply generative adversarial networks to create synthetic coronagraphic image realizations. The synthetic images then train a convolutional neural network to classify regions that contain potential bright planets in single science frames. Gebhard et al. (2022) built a regularized linear model of speckle noise based on causal predictors to fill in speckle noise in a region of interest. Our approach shares some similarities; however, we utilize a highly nonlinear model to complete the speckle prediction using solely local speckle correlations.
This paper is organized as follows. In Section 2, we explain ConStruct's principle and how it is used for detecting planetary companions. We also compare the operating mechanisms of ConStruct to those used in linear speckle reconstruction approaches like KLIP. Section 3 details the autoencoder neural network used in ConStruct, as well as an additional linear correction to leverage the temporal correlations between frames in ADI sequences. In Section 4, we discuss how ConStruct is tuned to maximize the signal-to-noise ratio (S/N) of substellar sources in these data. We apply ConStruct to 30 unique point sources in data sets from the W. M. Keck Observatory's NIRC2 imaging camera in Section 5, and compare the results with a principal component analysis (PCA) reduction approach.

Speckle Noise Reconstruction
In this section, we first introduce the framework behind ConStruct and place it in the context of existing approaches for processing high-contrast ADI sequences. A general region $\mathcal{R}$ in an ADI frame can be partitioned into subregions $\mathcal{A}$ and $\mathcal{B}$, as illustrated in Figure 1. The subregion $\mathcal{A}$ is assumed to include only pixels from speckle noise, and $\mathcal{B}$ contains speckle noise spatially correlated with $\mathcal{A}$ and a potential signal from an on-sky object. The pixel-wise intensities for the $t$th frame contained in regions $\mathcal{A}$, $\mathcal{B}$, and their union, $\mathcal{R} = \mathcal{A} \cup \mathcal{B}$, are as follows:

$$I_t^{\mathcal{A}} = s_t^{\mathcal{A}} + \eta, \qquad I_t^{\mathcal{B}} = s_t^{\mathcal{B}} + \gamma\,h^{\mathcal{B}} + \eta. \tag{1}$$

Here, $s_t^{\mathcal{A}}$ is the intensity of the speckle noise contained in region $\mathcal{A}$ for frame $t$, and $s_t^{\mathcal{B}}$ is the intensity of the speckle noise contained in region $\mathcal{B}$ for frame $t$. The pixel intensities contained in $\mathcal{B}$ are modeled as an additive combination of the speckle noise, $s_t^{\mathcal{B}}$, and a potential point source, $h^{\mathcal{B}}$. The term $\eta$ includes any irreducible uncertainty in the data, and we denote $\gamma$ to be the sample statistic, which is a function of the excess intensity in the region $\mathcal{B}$. In this formulation, if the intensity from speckle noise is known perfectly, then detecting an on-sky point source is limited only by $\eta$. To that end, post-processing algorithms seek an unbiased minimum-error estimate, $\hat{s}_t$. PCA-based reconstruction approaches like KLIP estimate the weighting coefficients $\{\hat{w}_t\}$ of an optimal low-rank image reconstruction basis matrix $\hat{V}_q$, which together generate this estimate,

$$\hat{s}_t^{\mathcal{R}} = \hat{V}_q \hat{w}_t, \qquad \mathcal{L} = \sum_t \big\| I_t^{\mathcal{R}} - \hat{s}_t^{\mathcal{R}} \big\|_2^2. \tag{2}$$

Without explicitly partitioning the regions $\mathcal{A}$ and $\mathcal{B}$, minimizing $\mathcal{L}$ produces a minimum mean squared error reconstruction of the region $\mathcal{R}$ for each of the $t$th frames. Hastie et al. (2009) show that Equation (2) is equivalent to the matrix factorization problem

$$\min_{V_q} \big\| X - X V_q V_q^{\top} \big\|_F^2, \tag{4}$$

where the rows of the data matrix $X$ are the intensity vectors $I_t^{\mathcal{R}}$. The solution to Equation (4) is found via singular value decomposition of the data matrix $X$, where

$$X = U D \hat{V}^{\top}. \tag{5}$$

The matrix $U$ is an orthonormal matrix whose columns contain the left-singular vectors of the decomposition. The matrix $D$ is a diagonal matrix containing the singular values of the matrix $X$. The columns of $\hat{V}$ are the right-singular vectors, or principal component vectors, which are ordered in the directions of maximum variance. $\hat{V}_q$ is obtained by truncating the columns of $\hat{V}$ up to a desired integer $q$. In PCA-based reduction approaches, the regions $\mathcal{R}$ are sometimes partitioned into concentric annuli, sectors, or full ADI frames. Generating the speckle estimate for each region $\mathcal{R}$ consists of two symmetric operations:

1. The intensity vector is mapped onto a low-rank PCA feature space with $\hat{w}_t = \hat{V}_q^{\top} I_t^{\mathcal{R}}$.
2. The PCA features are mapped back into the original pixel space to produce an estimate of the speckle noise in region $\mathcal{R}$ with $\hat{s}_t^{\mathcal{R}} = \hat{V}_q \hat{w}_t$.

[Figure 2 caption: In this example, the autoencoder is constructed such that it is compatible with a 32 × 32 pixel image patch. The encoder and decoder portions of the network act sequentially to first transform the masked image into a low-dimensional feature space, then back into the original input space. By training the network with many examples, the network learns to accurately predict the masked central region. The sizes of the output from each encoder and decoder block are included.]
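The two-step PCA speckle estimate can be sketched in a few lines of numpy. This is a generic illustration of truncated-SVD reconstruction, not the authors' implementation; the frame count, patch size, and retained rank are arbitrary.

```python
import numpy as np

def pca_speckle_estimate(X, q):
    """Low-rank speckle estimate for each row (frame) of X.

    X : (T, P) matrix whose rows are flattened frame intensities.
    q : number of principal components retained.
    """
    # SVD of the data matrix: X = U D V^T
    U, D, Vt = np.linalg.svd(X, full_matrices=False)
    Vq = Vt[:q]          # (q, P) truncated principal components
    W = X @ Vq.T         # step 1: project onto the PCA feature space
    return W @ Vq        # step 2: map back into pixel space

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 256))        # 40 frames of a flattened 16x16 patch
S_hat = pca_speckle_estimate(X, q=5)  # low-rank speckle reconstruction
residual = X - S_hat                  # residual frames after subtraction
```

Increasing `q` monotonically reduces the reconstruction error; with `q` equal to the number of frames, the reconstruction reproduces the data exactly, which is why the rank must be truncated to avoid fitting a real source.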
In this work, we generalize $\hat{V}_q$ and $\hat{V}_q^{\top}$ to be two nonlinear functions, $f(\cdot)$ and $g(\cdot)$, parameterized by a neural network. To alleviate partial fitting to the potential source, we explicitly exclude the region $\mathcal{B}$ in building the reconstruction. Minimizing the cost function

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \big\| I_i^{\mathcal{B}} - g(f(I_i^{\mathcal{A}})) \big\|_p \tag{6}$$

results in the optimal nonlinear reconstruction functions $\tilde{f}(\cdot)$ and $\tilde{g}(\cdot)$, which together produce the speckle prediction

$$\tilde{s}_t^{\mathcal{B}} = \tilde{g}(\tilde{f}(I_t^{\mathcal{A}})). \tag{7}$$

Here, $p$ is a positive scalar, and $N$ is the number of training samples in a reference library, each indexed by an integer $i$. We detail the architecture and implementation of this algorithm in Section 3. Note that in this formulation the speckle noise prediction relies solely on spatial correlations exhibited between the predictor $\mathcal{A}$ and the region of interest $\mathcal{B}$.

Recovering Off-axis Point Sources
During ADI observations, a telescope is placed in pupil-tracking mode, causing on-sky point sources to rotate deterministically around the optical axis of the instrument (Marois et al. 2005). Accordingly, pixel intensities in a region, $\mathcal{B}_*$, label a set of locations, $\{\mathcal{B}_t\}$, across the ADI sequence which will all contain a point source if it is contained in $\mathcal{B}_*$. Note that we use "$*$" to denote the first frame of the ADI sequence. Rearranging Equation (1), the maximum-likelihood estimate of a potential point source across the set of time-dependent locations associated to $\mathcal{B}_*$ is

$$\gamma\,\hat{h}^{\mathcal{B}} = \mathbb{E}_t\!\left[ I_t^{\mathcal{B}_t} - \tilde{s}_t^{\mathcal{B}_t} \right]. \tag{8}$$

Here, $\tilde{s}_t^{\mathcal{B}_t}$ is the predicted speckle noise intensity for $\mathcal{B}_t$ in the $t$th frame, which is generated through our prediction function in Equation (7). The operator $\mathbb{E}_t(\cdot)$ denotes the statistical expectation taken with respect to time.
Constraining a hypothesized point source requires systematically comparing the estimated signal to its surrounding resolution elements. This is formalized in a hypothesis test on the statistic

$$\hat{g} = \big\| \gamma\,\hat{h}^{\mathcal{B}} \big\|_{\infty}, \tag{9}$$

where the operator $\|\cdot\|_{\infty}$ takes the maximum value of the expected source intensity vector. This is calculated in practice by placing a circular aperture around the potential source and retrieving the maximum value within the aperture's extent. We assume the test statistic $\hat{g}$ is a Gaussian-distributed random variable, so that $\hat{g} \sim \mathcal{N}(\gamma, \bar{\sigma}^2)$, where $\bar{\sigma}$ is the empirical root mean square (RMS) of the residual intensity in the surrounding elements. Note that we assume that residual pixel-wise intensities are independent and identically distributed (i.i.d.). The optimal decision in the sense of minimizing the false negatives, subject to a fixed probability of false positives, is given by the log-likelihood ratio test:

$$\ln \frac{p(\hat{g} \mid \mathcal{H}_1)}{p(\hat{g} \mid \mathcal{H}_0)} \;\underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\; \tau. \tag{10}$$

Because we are assuming Gaussian statistics, Equation (10) reduces to

$$\hat{g} \;\underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\; \tilde{\lambda}\,\bar{\sigma}. \tag{11}$$

The scaling variable $\tilde{\lambda}$ is used to gate the hypothesis test as a multiplicative factor of the residual speckle RMS. Due to spatial correlations in the residual speckle intensity, the samples are not i.i.d.; a more statistically sound treatment of the residual speckle noise statistics is considered in Mawet et al. (2014).
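The aperture-maximum statistic and its comparison against a multiple of the surrounding RMS can be illustrated as follows. This is a schematic of the test under the stated i.i.d. Gaussian assumption; the aperture radius and threshold are illustrative values, not ConStruct's defaults.

```python
import numpy as np

def aperture_test(flux_map, center, r_ap=3.0, lam=5.0):
    """Return (detected, snr) for a candidate source at `center`.

    g_hat : maximum value inside a circular aperture (the test statistic).
    sigma : empirical RMS of the residual intensity outside the aperture.
    The source is flagged when g_hat exceeds lam * sigma.
    """
    yy, xx = np.indices(flux_map.shape)
    dist = np.hypot(yy - center[0], xx - center[1])
    g_hat = flux_map[dist <= r_ap].max()
    sigma = np.sqrt(np.mean(flux_map[dist > r_ap] ** 2))
    return g_hat > lam * sigma, g_hat / sigma

rng = np.random.default_rng(2)
residual = rng.normal(0.0, 0.1, (64, 64))   # post-subtraction residual map
residual[40, 22] += 2.0                     # injected point source
detected, snr = aperture_test(residual, (40, 22))
```

A real treatment would estimate the noise from matched resolution elements at the same separation (and correct for small-sample statistics), rather than from the whole frame as done here for brevity.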

Reconstruction Function Architecture
In this section, we detail how an autoencoder neural network serves as a surrogate for the functions $f(\cdot)$ and $g(\cdot)$ given in Equation (7). An autoencoder is a feed-forward neural network that reconstructs a corrupted input as its output. An encoder and decoder network portion act symmetrically to first transform a high-dimensional input into a low-dimensional feature space, and subsequently back into the original input space. ConStruct uses a fully convolutional version of an autoencoder, similar to the U-net network (Ronneberger et al. 2015). Convolutional neural networks excel in image-based tasks, so this architecture choice is attractive for our application. We provide a graphical illustration of our network architecture in Figure 2 and show the output sizes from each block in the network.
The encoder portion of the network consists of a sequence of four convolutional blocks. Each block contains two convolutional layers, followed by a max-pooling layer. The convolutional layers create a set of feature maps by translating a 3 × 3 convolutional filter over the output of the previous layer and applying a rectified linear activation function (or ReLU). Unique filters correspond to each feature map. Through training, the network learns a set of convolutional filters that enable it to accurately predict a corrupted image portion, or in our case, a masked region of interest. The max-pooling layer applies the maximum operator over a 2 × 2 filter translated across the feature map. This operation downsizes each dimension of the feature map by one-half.
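The 2 × 2 max-pooling operation is simple to express directly in numpy; the snippet below is a generic stride-2 pooling sketch (not ConStruct's Keras implementation) showing how each spatial dimension is halved.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Apply a 2x2 max filter with stride 2, halving each dimension."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Group pixels into non-overlapping 2x2 blocks and take each block's max
    blocks = feature_map[: 2 * h2, : 2 * w2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)
pooled = max_pool_2x2(fmap)   # shape (2, 2): [[5, 7], [13, 15]]
```

Applied four times, as in the encoder, a 32 × 32 patch is reduced to 2 × 2, which bounds the spatial extent of the low-dimensional feature space.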
The decoder portion uses a set of low-dimensional convolutional features produced by the encoder to reconstruct a prediction of the speckle noise in the masked region of interest. The decoder contains four up-convolutional blocks. Similar to the encoder, each block contains two sequential convolutional layers, but these are followed by a deconvolutional layer that transforms many lower-dimensional features into one of higher dimension. The encoder features congruent to each decoder block are concatenated to the deconvolved layer, and the new set is passed into the next block. This technique, sometimes referred to as residual connections or skip connections, helps to alleviate the issue of vanishing gradients common in deep neural network architectures. The final layer in our architecture consists of a single convolutional layer with a sigmoid activation function, which reconstructs a scaled speckle prediction in the masked region, along with a reconstruction of the original image region fed into the network. In Appendix A, we include the layer-wise parameters for the autoencoder used by ConStruct.
Figure 3 shows an example of using the trained autoencoder to predict speckle noise in an ADI image patch. In this example, an image patch is extracted from a frame contained in an ADI sequence of the HR 8799 system, corresponding to Sequence 9 in Table 1. A synthetic 2D Gaussian source is injected into the center of the image patch. The central region is then masked before being fed into the network. ConStruct accurately predicts the speckle in the missing masked region, and this prediction is subtracted from the original image patch to reveal the synthetic source. In Appendix B, we include two additional prediction examples extracted from spatially disparate realizations of the speckle field.

Data Selection
A collection of representative data is needed to train our autoencoder. We select a library of archival ADI sequences observed with the Keck NIRC2 near-infrared camera for this task. The data are downloaded from the Keck Observatory Archive (KOA), along with accompanying calibration frames. In total, 92 unique ADI sequences are used to train our algorithm.
KOA contains a large number of ADI sequences that we can potentially include for training. We downselect our sample to only include K_s-, K_p-, CH_4S-, Y-, and H-band filters. This choice has repercussions as to which data sets we can deploy our trained algorithm upon: because the spatial frequency of speckle noise is wavelength dependent (Sparks & Ford 2002), we expect our algorithm to perform best when used on ADI sequences taken with the same filters. Additionally, only data using the 400, 600, and 800 mas diameter Lyot coronagraphs are used. The sequences were searched on the archive primarily using a list of stars with directly imaged planets and low-mass brown dwarf companions contained in Bowler (2016). Most training sequences have known point sources, but we assume that the number density of the point sources is sufficiently small across these data sets so as to not significantly bias the speckle realizations. Each sequence is visually inspected to determine its quality for training. Sequences containing saturated pixels or degraded Strehl ratios were excluded. The data used for training are detailed in Appendix C.

In generating training data, we draw an ADI sequence from our library by sampling a distribution whose probability of selection is proportional to the number of frames in the respective sequence. Patch coordinates in the sequence are then uniformly sampled in azimuth, radial separation, and frame number. These define the central coordinate of a sector which is projected into a square array for training. Figure 4 illustrates the projection operation. An angular and radial width defines the dimension of each sector. Depending on the radial coordinate, we choose the angular width such that the square projection approximately maintains the number of pixels in the sector. The projection uses cubic spline interpolation. We find the information degradation resulting from this transformation is sufficiently small to not affect the performance of ConStruct.
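The sampling scheme above can be sketched as follows. The sequence lengths, radial bounds, and returned coordinate convention are illustrative assumptions; only the frame-count weighting and the uniform draws mirror the text.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_patch_coordinate(frames_per_sequence, r_min, r_max):
    """Draw one training-patch coordinate.

    Sequence choice is weighted by frame count; azimuth, radial
    separation, and frame index are sampled uniformly.
    """
    n = np.asarray(frames_per_sequence, dtype=float)
    seq = rng.choice(len(n), p=n / n.sum())   # P(sequence) proportional to frames
    theta = rng.uniform(0.0, 2.0 * np.pi)     # azimuth [rad]
    r = rng.uniform(r_min, r_max)             # radial separation [pix]
    frame = rng.integers(0, int(n[seq]))      # frame index within the sequence
    return seq, theta, r, frame

seq, theta, r, frame = sample_patch_coordinate([50, 120, 80], 10.0, 100.0)
```

Each draw defines the central coordinate of a sector, which would then be projected into a square array for the network.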

Processing the Training Data Set
The pixel values in each projected square array are linearly scaled between 0 and 1. A circular mask of zeros is applied to each of these scaled samples, corresponding to the region $\mathcal{B}$. The original array, y, and the masked sample, x, together form a training sample (x, y). We use 30,000 samples to train the network. The complete set is partitioned into a 90/10% train/validation split.
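A minimal sketch of this preprocessing step, assuming a square patch with the mask centered on it (the patch size and mask radius here are arbitrary):

```python
import numpy as np

def make_training_pair(patch, mask_radius):
    """Scale a patch to [0, 1] and zero out a central circular region.

    Returns (x, y): the masked input and the unmasked target.
    """
    lo, hi = patch.min(), patch.max()
    y = (patch - lo) / (hi - lo)                  # linear scaling to [0, 1]
    yy, xx = np.indices(patch.shape)
    center = ((patch.shape[0] - 1) / 2.0, (patch.shape[1] - 1) / 2.0)
    mask = np.hypot(yy - center[0], xx - center[1]) <= mask_radius
    x = np.where(mask, 0.0, y)                    # circular mask of zeros
    return x, y

rng = np.random.default_rng(4)
x, y = make_training_pair(rng.normal(size=(32, 32)), mask_radius=5)
```

The network then learns to predict the hidden central values of `y` from the surrounding unmasked pixels of `x`.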

Autoencoder Implementation and Training
We implement our autoencoder network in Keras (Chollet 2015), which is an application program interface for the TensorFlow machine-learning library (Abadi et al. 2015). In training, the component x of each sample is first propagated forward through the network, and the network's prediction is subtracted from the ground truth, y, to form a residual array. Repeatedly backpropagating the residual errors adjusts the autoencoder parameters such that the prediction accuracy in the masked region increases. We use a 32-sample mini-batch gradient descent with the Adam optimizer (Kingma & Ba 2015), and a Huber loss function given as

$$L_{\epsilon}(a) = \begin{cases} \tfrac{1}{2} a^2, & |a| \leq \epsilon \\ \epsilon \left( |a| - \tfrac{1}{2}\epsilon \right), & |a| > \epsilon. \end{cases}$$

The Huber loss uses the minimum squared error when deviations are small (i.e., within some defined deviation $\epsilon$) and the minimum absolute error when deviations are large. This reduces the importance placed on outlier samples in the training process. Thirty training epochs are used, and we observe that the errors in the validation set consistently decrease with all epochs. For our application, more training epochs could likely be used, but with only marginal improvement in the prediction accuracy.

[Figure caption (full-frame HR 8799 reduction, Marois et al. 2008, 2010, for Sequence 11 in Table 1): The top two panels correspond to (left) the flux map and (right) the S/N map produced by ConStruct; the bottom panels show those produced with PCA. The side panels show a cut-out view of each planetary point source and the source's S/N produced by each algorithm.]
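The piecewise Huber loss can be checked with a few lines of numpy; this standalone version (not the Keras built-in used in training) makes the quadratic-to-linear transition explicit.

```python
import numpy as np

def huber(residual, eps=1.0):
    """Elementwise Huber loss: quadratic within eps, linear beyond it."""
    a = np.abs(residual)
    return np.where(a <= eps, 0.5 * a ** 2, eps * (a - 0.5 * eps))

# Quadratic regime: 0.5 * 0.5^2 = 0.125; linear regime: 1.0 * (3 - 0.5) = 2.5
small, large = huber(0.5), huber(3.0)
```

Because the loss grows only linearly for large residuals, a handful of bright outlier pixels cannot dominate the gradient the way they would under a pure squared-error loss.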

Correcting the Autoencoder Prediction Residuals
We leverage the temporal correlation of speckle noise across ADI frames to further correct the residual errors of the autoencoder predictions with an L2-regularized linear regression model, otherwise known as ridge regression. In our formulation, we use the output from the autoencoder as the predictor variables for the model. For a fixed spatial region $\mathcal{R}$ in the $t$th frame partitioned into subregions $\mathcal{A}$ and $\mathcal{B}$, the speckle prediction from the autoencoder is

$$\tilde{s}_t^{\mathcal{B}} = \tilde{g}(\tilde{f}(I_t^{\mathcal{A}})).$$

The prediction residual in the region $\mathcal{B}$ is an additive combination of the residual speckle noise and a potential point source:

$$z_t^{\mathcal{B}} = I_t^{\mathcal{B}} - \tilde{s}_t^{\mathcal{B}} = \left( s_t^{\mathcal{B}} - \tilde{s}_t^{\mathcal{B}} \right) + \gamma\,h^{\mathcal{B}} + \eta.$$

For each pixel in the region $\mathcal{B}$, we solve a regularized linear regression to correct the residuals. Let

$${}^{j}z = \left( {}^{j}z_1, \ldots, {}^{j}z_T \right)^{\top}$$

be the response variable in the regression model, where ${}^{j}z_t$ is the residual intensity for the pixel indexed by the presuperscript $j$ in the $t$th frame. Each regression model finds the coefficients $\beta_j$ that minimize the loss function

$$\big\| {}^{j}z - \tilde{X} \beta_j \big\|_2^2 + \alpha \big\| \beta_j \big\|_2^2,$$

which has the analytic solution

$$\hat{\beta}_j = \left( \tilde{X}^{\top} \tilde{X} + \alpha I_{T \times T} \right)^{-1} \tilde{X}^{\top}\, {}^{j}z.$$

Here, $I_{T \times T}$ is the $T$-dimensional identity matrix, and $\alpha$ is a positive scalar which acts as a tuning variable in ConStruct. Higher values of $\alpha$ increase the bias of the regression model, but help reduce overfitting to potential companion point sources. $\tilde{X}$ is the design matrix, which is comprised of the horizontally stacked predictions produced by the autoencoder. This is repeated for all pixels contained in region $\mathcal{B}$.

Figure 5 illustrates how the autoencoder prediction and least-squares regression step sequentially reduce the residual speckle noise. We also include two examples showing the speckle residuals in the region $\mathcal{B}$ from the autoencoder without and with the least-squares regression in Figure 6. The estimated intensity of the potential companion point source in the region $\mathcal{B}$ in each frame $t$ is, then,

$$\gamma\,\hat{h}_t^{\mathcal{B}} = \mathrm{col}_t\!\left( Z - \big( \tilde{X} \hat{B} \big)^{\top} \right),$$

where $Z$ stacks the per-pixel residual time series ${}^{j}z^{\top}$ as rows, $\hat{B} = [\hat{\beta}_1, \ldots, \hat{\beta}_J]$ horizontally stacks the regression coefficients, and $\mathrm{col}_t(\cdot)$ denotes the $t$th column of the matrix.
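The per-pixel ridge correction admits a closed-form solution; a minimal numpy sketch, with an arbitrary design matrix and regularization value standing in for the autoencoder predictions and the tuned α:

```python
import numpy as np

def ridge_correct(z, X, alpha):
    """Closed-form ridge fit of a residual time series z on predictors X.

    z : (T,) residual intensities of one pixel across T frames.
    X : (T, K) design matrix of stacked autoencoder predictions.
    Returns the corrected residual z - X @ beta and the coefficients.
    """
    K = X.shape[1]
    # beta = (X^T X + alpha I)^-1 X^T z
    beta = np.linalg.solve(X.T @ X + alpha * np.eye(K), X.T @ z)
    return z - X @ beta, beta

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 8))                      # 60 frames, 8 predictors
z = X @ rng.normal(size=8) + 0.01 * rng.normal(size=60)
corrected, beta = ridge_correct(z, X, alpha=0.1)
```

Larger values of `alpha` shrink the coefficients toward zero, trading fit quality for protection against absorbing a real companion signal into the correction.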

Using ConStruct with ADI Sequences
ConStruct uses a flexible architecture that provides subpixel processing capabilities with ADI sequences. Based on a user-defined resolution and lower and upper radial processing bounds, radial and azimuthal coordinates, (r_*, θ_*), are defined in the image plane that satisfy the resolution requirement. Here, we use the subscript "*" to denote that these coordinates are located in the first frame of the ADI sequence and define the plane of the final flux map, $\mathcal{F}$, and S/N map, $\mathcal{S}$. Each coordinate (r_*, θ_*) labels a set of spatial patches $\{\mathcal{B}(r_*, \theta_* + \phi_t)\}$, where $\phi_t$ is the field rotation observed up to and including frame $t$. Post-processing is performed serially over radial resolution bins, and for each bin the following steps are implemented:

1. Each azimuthal and time coordinate defines the center of a sector area in the radial bin, and the sector is extracted and mapped to a square array, as illustrated in Figure 4. This step is parallelized for all elements (θ_*, t) in the radial bin spanning the ADI sequence.
2. The extracted square arrays are each independently scaled (i.e., a unique scaling function is fit to each spatial patch) between 0 and 1, and the central pixel values corresponding to $\mathcal{B}$ are masked with a circular template of zeros.
3. The masked arrays are fed as a batch into the autoencoder network, which allows for GPU parallelization. The autoencoder reconstructs the missing region in each patch using solely the spatial correlations with the surrounding speckle noise.
4. The predictions produced by the autoencoder are rescaled by applying the inverse of the scaling operator fit to each patch in step 2.
5. The rescaled predictions are subtracted from the true image patches, creating a set of residual images. The residual images corresponding to a fixed azimuthal coordinate but spanning all frames are further corrected with the ridge regression described above. This is repeated for all azimuthal elements θ_* in the radial bin.
6. The corrected residuals are formed into sets, each of which may contain a point source, based on a labeled azimuthal coordinate and the parallactic rotation of the ADI sequence.
7. The images in each set are averaged, and the center pixel value of the averaged array is used as the intensity of the coordinate (r_*, θ_*) in the final flux map, $\mathcal{F}$. This is repeated for all azimuthal coordinates θ_* in the radial bin.

The intensity values of $\mathcal{F}$ contained within a radial window of ±3 pixels of r_* are used to calculate the empirical RMS of each radial bin. The S/N values of $\mathcal{S}$ are then

$$\mathcal{S}(r_*, \theta_*) = \frac{\mathcal{F}(r_*, \theta_*)}{\bar{\sigma}(r_*)},$$

where $\bar{\sigma}$ denotes the empirical RMS.
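The final S/N map construction, dividing each flux value by the empirical RMS within a ±3 pixel radial window, can be sketched as follows. The per-pixel loop is written for clarity rather than speed, and the image size is arbitrary; only the window width mirrors the text.

```python
import numpy as np

def snr_map(flux_map, radial_window=3.0):
    """Divide each pixel by the RMS of flux values whose radial separation
    from the image center is within +/- radial_window pixels of its own."""
    cy = (flux_map.shape[0] - 1) / 2.0
    cx = (flux_map.shape[1] - 1) / 2.0
    yy, xx = np.indices(flux_map.shape)
    r = np.hypot(yy - cy, xx - cx)
    out = np.zeros_like(flux_map)
    for idx in np.ndindex(flux_map.shape):
        ring = np.abs(r - r[idx]) <= radial_window      # annular window
        rms = np.sqrt(np.mean(flux_map[ring] ** 2))
        out[idx] = flux_map[idx] / rms
    return out

snr = snr_map(np.full((31, 31), 2.0))   # constant map -> S/N of 1 everywhere
```

A constant flux map yields unit S/N everywhere, which is a convenient sanity check for the normalization.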

Algorithm Performance and Tuning
In this section, we optimize ConStruct to detect substellar companions in ADI sequences from NIRC2. We tune ConStruct over three selected algorithm parameters. NIRC2 ADI sequences of the HR 8799 system are used to tune our algorithm on the four known planetary companions in these data sets. This procedure allows us to select a particular parameter set for applying ConStruct to other ADI sequences. The goal is not to determine a universally optimal set of parameters, but rather to establish a reasonable set of parameter values that can then be applied to other data sets.

Selecting Design Parameters for ConStruct
We tune three design parameters in ConStruct: (i) the radius of the masked region, $\mathcal{B}$, (ii) the size of the sector used for the autoencoder prediction, $\mathcal{R}$, and (iii) the regularization parameter, α, in the linear correction model. Three HR 8799 ADI sequences from the Keck NIRC2 instrument serve as a test bed for adjusting these design parameters. The objective is to find parameters that recover all four known planets in these sequences and maximize their S/N. These sequences were originally published in Konopacky et al. (2016), and the details of the observations are summarized in Table 1 as Sequences 9, 10, and 11.
We run ConStruct over a combinatorial grid of the design parameters at the locations of the four planets for each data set. The regularization is iterated uniformly in log-space between 10^−3 and 10^3, with 15 increments. The mask radius is tested at values of 4, 5, 6, and 7 pixels. The radial and azimuthal widths of the image sector patches are adjusted over 16, 24, 32, and 40 pixels. In total, this produces 240 unique parameter combinations. We determine the S/N of each companion for every combination of design parameters, which creates a 3D grid with each axis corresponding to a design variable. Figure 7 shows slices of the 3D S/N grid of each planet in Sequence 10 in Table 1.
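The parameter sweep is a plain Cartesian product, sketched below with a placeholder scoring function standing in for the full reduction. `snr_of` is hypothetical; in the real tuning procedure the objective is the recovered companion S/N from a complete ConStruct reduction.

```python
import numpy as np
from itertools import product

alphas = np.logspace(-3, 3, 15)          # regularization, 10^-3 .. 10^3
mask_radii = [4, 5, 6, 7]                # mask radius [pix]
sector_sizes = [16, 24, 32, 40]          # sector width [pix]

grid = list(product(alphas, mask_radii, sector_sizes))  # 240 combinations

def snr_of(alpha, mask_radius, sector_size):
    # Placeholder objective: the real pipeline would reduce the ADI
    # sequence with these parameters and measure the companion's S/N.
    return (-abs(np.log10(alpha) + 1.0)
            - abs(mask_radius - 5)
            - abs(sector_size - 24))

best = max(grid, key=lambda p: snr_of(*p))
```

The toy objective peaks near the paper's adopted values (α ≈ 0.1, mask radius 5, sector size 24), purely so the search returns something sensible.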
We perform the same analysis for all three HR 8799 data sets. These data have similar observation parameters (i.e., number of frames, field rotation, filter type), and we include the results for the other two sequences in Appendix D. From this study, we determine parameters that work well, though not necessarily optimally, for all planets across the three data sets in our sample. In deploying ConStruct on other data sets, we choose a sector size of 24 pixels, a mask radius of 5 pixels, and an L2 ridge regression regularization of 0.1.
A standard annular PCA reduction serves to benchmark ConStruct's performance on the HR 8799 ADI sequences in Table 1. For this, a similar parameter search is performed where the annular width and number of PCA components are iterated to find those that recover each of the planets at a maximal S/N. We iterate the annular width between 8 and 30 pixels in increments of 3 pixels, for a total of seven different annular widths. The number of PCA components is iterated between five and the maximum allowable number, which is equivalent to the number of frames in the ADI sequence, in increments of four components. In Appendix E we show the performance of our PCA-based processing approach over the grid of parameters for the three HR 8799 data sets contained in Table 1. Figure 8 compares these two approaches for all 12 sources in our tuning sample. ConStruct produces a higher maximum S/N for planets b, c, and d than PCA over all data sets, and comparable performance with the PCA-based reduction approach for planet e.

Sample S/N Comparison with PCA and ConStruct
We applied ConStruct to additional data sets obtained from KOA and compared the performance with a standard annular PCA reduction approach. The 18 ADI sequences chosen contain known faint companions and are included in Table 1. In total, the sample includes 30 unique point sources. For all data sets, ConStruct uses the design parameters identified in Section 4. To provide a fair comparison between ConStruct and annular PCA, we also fix a set of design variables for our PCA reduction approach over this testing data set. We identified suitable design variables to be 17 pixels for the annulus width while using the maximum allowable number of PCA components (i.e., the number of frames in the ADI sequence).
For all the sources in our sample, we determine the recovered S/N produced from ConStruct and PCA with these fixed design parameters. Figure 8 shows a one-to-one comparison of the two algorithms for all 30 sources in our sample. ConStruct recovers 20 out of the 30 point sources at a higher S/N than PCA. Additionally, we find that the number of frames in the ADI sequence is correlated with ConStruct's performance, where more frames give a higher S/N for ConStruct than PCA. The L2 regularization coefficient, α, chosen in tuning ConStruct is likely the primary driver for this trend.

Full-frame Reductions
Figures 9 and 10 highlight two representative full-frame reductions, for HR 8799 (Marois et al. 2008, 2010) and HD 114174 (Crepp et al. 2013a), using ConStruct and PCA. The top and bottom panels correspond to the reductions produced by ConStruct and PCA, respectively; for each, the flux map (left) and S/N map (right) of the reduced data set are shown. Each S/N map is produced by dividing the flux values in each pixel-width annulus by the empirical rms of the residuals taken over a buffered annular region that extends ±3 pixels radially. In Figure 9, both ConStruct and PCA recover all four known planets in the HR 8799 system; however, ConStruct produces a higher S/N. In Figure 10, both ConStruct and PCA recover the white dwarf companion HD 114174 B, with S/N of 37.7 and 24.6, respectively. Interestingly, individual speckle noise realizations produced by PCA appear more elongated radially, whereas those produced by ConStruct exhibit much less spatial covariance in the image plane. This effect may be attributed to the data set diversity included in training ConStruct over many ADI sequences.
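The S/N map construction described above can be sketched as a short routine. This is an illustrative implementation under stated assumptions (the function name, the treatment of ring edges, and the use of a simple rms over the buffered annulus are ours), not the authors' code.

```python
import numpy as np

def snr_map(flux, center, buffer_px=3):
    """Sketch of the S/N map described in the text: the flux in each
    one-pixel-wide annulus is divided by the empirical rms of the
    residuals in a buffered annular region extending +/- buffer_px
    radially around that annulus."""
    ny, nx = flux.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(y - center[0], x - center[1])
    r_int = r.astype(int)
    snr = np.zeros_like(flux, dtype=float)
    for ri in range(int(r.max()) + 1):
        ann = r_int == ri                       # one-pixel-wide annulus
        if not ann.any():
            continue
        # buffered region for the empirical noise estimate
        ring = (r >= ri - buffer_px) & (r < ri + 1 + buffer_px)
        rms = np.sqrt(np.mean(flux[ring] ** 2))
        snr[ann] = flux[ann] / rms if rms > 0 else 0.0
    return snr
```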

5σ Contrast Curves
We generate contrast curves to rigorously compare the performance of ConStruct with PCA for two HR 8799 ADI sequences contained in Table 1. Synthetic sources are injected into each frame across a range of radial separations. A source template is created by fitting a 2D Gaussian profile to the central stellar PSF, which is partially visible through the occulting disk of the coronagraph in NIRC2 images. Sources are injected over a grid of azimuthal coordinates, with contrasts commensurate with the rms of the residual speckle noise in the reduced image at the corresponding separation. For each frame, the grid is rotated in the negative parallactic direction so that on-sky sources in the ADI sequence do not add coherently during the derotation and coaddition step. We perform a least-squares fit between the intensity of the injected source and the recovered S/N at each radial separation. The resulting fit yields the source intensity recovered at a 5σ confidence level, which we divide by the flux of the central stellar PSF to obtain the 5σ contrast. The stellar flux is measured from frames of the unsaturated PSF taken immediately before or after the ADI sequence, with the flux values scaled linearly to match the exposure of the ADI frames. We also determine an associated 1σ confidence interval for each contrast value from the sample standard deviation of the contrast over the grid of azimuthally spaced source locations.
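The final fitting step at each separation can be sketched as follows, assuming an approximately linear relation between injected intensity and recovered S/N (the function name and a first-order polynomial fit are our assumptions for illustration).

```python
import numpy as np

def five_sigma_contrast(injected_intensities, recovered_snrs, stellar_flux):
    """Fit a least-squares line through (injected intensity, recovered S/N)
    pairs at one radial separation, solve for the intensity recovered at
    S/N = 5, and divide by the stellar flux to get the 5-sigma contrast."""
    slope, intercept = np.polyfit(injected_intensities, recovered_snrs, 1)
    intensity_at_5sigma = (5.0 - intercept) / slope
    return intensity_at_5sigma / stellar_flux
```

Repeating this over the grid of azimuthal injection positions gives the sample of contrast values from which the 1σ confidence interval is derived.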
In Figure 11, we show 5σ contrast curves for two HR 8799 data sets, corresponding to Sequences 11 and 12 in Table 1. For Sequence 11, ConStruct shows improvement over PCA at nearly all separations; at its best, ConStruct improves the contrast by a factor of 2.6 at 0.8''. For Sequence 12, on the other hand, ConStruct's performance is nearly identical to that of PCA. The contrast curves share similar behavior with the results of the S/N analysis comparing ConStruct and PCA. The properties of each data set are the likely driver of these results; however, it is difficult to attribute the differences to any single factor.

Conclusions
In this work, we introduced ConStruct, a deep-learning-based algorithm for identifying faint point sources in ADI sequences. The algorithm uses an autoencoder neural network in a self-supervised training architecture to embed information from thousands of ADI image examples. The trained network is employed in a source present/absent processing framework to predict speckle noise and recover faint signals in ADI sequences. We further augment the autoencoder with a regularized least-squares regression step to leverage temporal correlations in speckle noise across ADI frames. ConStruct is tuned using three data sets of the HR 8799 system taken with Keck/NIRC2 to identify suitable design parameters for a detailed assessment of its performance on other NIRC2 data sets. We demonstrated ConStruct on a sample of 30 point sources in 18 NIRC2 ADI sequences taken from KOA and compared its performance with a standard PCA-based reduction approach. For the examples analyzed, ConStruct improves the S/N of detections by up to ∼75% and the contrast by up to a factor of 2.6.
There are several potential directions for improving ConStruct and making it more generally applicable to other HCI data sets. These include testing alternative data types for training and deploying ConStruct, such as different filters and additional instruments besides NIRC2. A broader variety of learning architectures could also be assessed and may improve ConStruct's accuracy. ConStruct currently employs a regularized least-squares regression to exploit the temporal correlations in speckle noise across ADI frames; this step could potentially be implemented directly in the neural network by adding a long short-term memory block, a standard machine-learning method for predicting features with temporal correlations.
Additionally, there is growing interest in the HCI community in processing procedures that operate under an inverse problem framework (Cantalloube et al. 2015; Flasseur et al. 2018, 2020). ConStruct may be adapted for these methods by representing speckle noise predictions as a continuous probability distribution, which can be achieved with a variational autoencoder. This work considers only single-spectral-band ADI sequences, but ConStruct can in principle be extended, by using higher-dimensional convolutional layers, to multi-wavelength integral field units that take advantage of spectral differential imaging, such as the Gemini Planet Imager (Macintosh et al. 2006), the Spectro-Polarimetric High-contrast Exoplanet REsearch (SPHERE) instrument (Beuzit et al. 2019), the Subaru Coronagraphic Extreme Adaptive Optics (SCExAO) instrument (Jovanovic et al. 2015), and MagAO-X (Males et al. 2018, 2020). Finally, ConStruct and its future improvements present a promising approach for reducing data collected from JWST and future extreme AO ground-based facilities.

Figure 1. The partitioning of regions , , and  in an ADI frame. Each region can in principle be any shape and size. In forming the partitions, the region  is hypothesized to contain a point source, and  contains only speckle noise.

Figure 2. Graphical illustration of the autoencoder architecture used in ConStruct. From left to right, an image region of interest is fed into the network with the central region  masked. In this example, the autoencoder is constructed to be compatible with a 32 × 32 pixel image patch. The encoder and decoder portions of the network act sequentially to first transform the masked image into a low-dimensional feature space, then back into the original input space. By training the network with many examples, it learns to accurately predict the masked central region. Here, we include the sizes of the output from each encoder and decoder block.

Figure 3. Autoencoder prediction of the speckle noise intensity in a masked region of an ADI sequence. The image patch is extracted from a frame of Sequence 9 in Table 1. It is linearly scaled to values between 0 and 1 before being fed into the autoencoder. Panels (a) and (b) are the masked autoencoder input and the ground-truth region of interest, respectively. Panel (c) is the ground truth with an injected Gaussian signal with a peak scaled intensity of 0.2. Panel (d) is the prediction formed by passing the masked input into our network. Panels (e) and (f) are the residual images produced by subtracting the predicted image region from the ground truth without and with the injected source, respectively. ConStruct accurately reconstructs the ground truth over the mask and recovers the injected source.

Figure 4. Graphical illustration of how sector regions are extracted and processed by ConStruct for ADI sequences. Each plane represents a frame in the ADI sequence. Sectors are extracted from each frame and geometrically transformed into square arrays for processing. The center of each array is masked before being fed into the autoencoder.

Figure 5. Pixel-wise speckle prediction residuals in a fixed spatial patch  accumulated over all frames in an ADI sequence. Each sample is first linearly scaled between 0 and 1. The orange distributions are residuals after subtracting the time-averaged pixel-wise intensity from each frame; the purple distributions are the autoencoder prediction residuals; and the green distributions are the prediction residuals augmented with an additional linear regression. The vertical dashed lines indicate the ±1σ bounds on the residuals. The autoencoder prediction and regularized least-squares regression in ConStruct progressively improve the prediction accuracy, yielding the most accurate speckle prediction (green distribution).

Figure 6. Two examples of the residual speckle patches in the region of interest  produced (left) with the autoencoder only and (right) with the autoencoder augmented with the regularized least-squares regression step. Both samples are extracted from the same spatial location but from different frames in the ADI sequence. Each sample is first linearly scaled between 0 and 1. The autoencoder prediction removes most of the speckle noise in the prediction region, and the bias in the autoencoder prediction is removed by the linear regression step.

Figure 7. Marginalized representations of our parameter search for the 2011 July 21 epoch of HR 8799. Each contour plot shows the S/N values projected onto two of the variables; the projection takes the maximum S/N along the axis of the marginalized variable. The violin plots show the spread in S/N in each variable bin. Panels (a), (b), (c), and (d) correspond to HR 8799 b, c, d, and e, respectively.
Each of the ADI sequences used for training our network is pre-processed using standard procedures for HCI. Using dome flat calibration frames, we generate a bad-pixel mask for each sequence with the ccdproc Astropy module (Craig et al. 2017). The mask is applied to each calibration and science frame, and bad pixels are removed with nearest-neighbor interpolation. Dark subtraction and flat-fielding are carried out for each frame in the science sequence. Registration is performed by centering each frame at the pixel coordinate closest to the central stellar point-spread function (PSF), visible inside the occulting coronagraph.
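A minimal sketch of these reduction steps is given below, assuming in-memory NumPy arrays. The function name is hypothetical, the paper builds its bad-pixel mask with ccdproc rather than the simple neighbor-median fill shown here, and the interpolation is a stand-in for the nearest-neighbor scheme described in the text.

```python
import numpy as np

def preprocess_frame(sci, dark, flat, bad_pixel_mask):
    """Illustrative reduction of one science frame: dark subtraction,
    flat-fielding, and replacement of masked bad pixels with the median
    of their valid neighbors (a stand-in for nearest-neighbor
    interpolation)."""
    frame = (sci - dark) / np.where(flat > 0, flat, 1.0)
    for yy, xx in np.argwhere(bad_pixel_mask):
        y0, y1 = max(yy - 1, 0), min(yy + 2, frame.shape[0])
        x0, x1 = max(xx - 1, 0), min(xx + 2, frame.shape[1])
        # valid (unmasked) neighbors in the 3x3 window around the bad pixel
        nb = frame[y0:y1, x0:x1][~bad_pixel_mask[y0:y1, x0:x1]]
        if nb.size:
            frame[yy, xx] = np.median(nb)
    return frame
```

Registration (recentering on the stellar PSF) would follow as a separate step, typically via cross-correlation or centroid fitting.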

Figure 8. Comparative analysis between ConStruct and annular PCA. Each point source's S/N produced by ConStruct is plotted against that obtained with PCA. The left panel shows the maximum obtainable S/N from tuning both ConStruct and PCA on each point source. In the right panel, we fix the design parameters for both ConStruct and PCA to reduce 18 ADI sequences, which together contain 30 unique point sources. For this set of tuning parameters, we find the number of frames in the image sequence is an indicator of the relative performance of ConStruct, with more frames (typically above 100) yielding improved results compared to the standard PCA-based approach.

Figure 9. Full-frame reductions with ConStruct and PCA of HR 8799 (Marois et al. 2008, 2010) for Sequence 11 in Table 1. The top two panels correspond to (left) the flux map and (right) the S/N map produced by ConStruct; the bottom panels show those produced with PCA. The side panels show a cut-out view of each planetary point source and the source's S/N produced by each algorithm.

Figure 10. Full-frame reductions with ConStruct and PCA of HD 114174 (Crepp et al. 2013a) for Sequence 5 in Table 1. The top two panels correspond to (left) the flux map and (right) the S/N map produced by ConStruct; the bottom panels show those produced with PCA.
The output of the regression step in ConStruct is the set of corrected residual intensities

Figure 12. Two autoencoder prediction examples for the speckle noise contained in a masked patch extracted from an ADI frame. Each patch is linearly scaled between 0 and 1. In each example, panels (a) and (b) are the masked autoencoder input and the ground-truth region of interest, respectively. Panel (c) is the ground truth with an injected Gaussian signal with a peak scaled intensity of 0.2. Panel (d) is the prediction formed by passing the masked input into our network. Panels (e) and (f) are the residual images produced by subtracting the predicted image regions from the ground truth without and with the injected source, respectively.

Figure 14. Marginalized representations of our parameter search for the 2012 October 26 epoch of HR 8799. Each contour plot shows the S/N values projected onto two of the variables; the projection takes the maximum S/N along the axis of the marginalized variable. The violin plots show the spread in S/N in each variable bin. Panels (a), (b), (c), and (d) correspond to HR 8799 b, c, d, and e, respectively.

Figure 16. Parametric PCA post-processing search for the 2012 October 26 epoch data set of HR 8799. Each contour plot shows the recovered S/N over the grid of parameters tested. The violin plots show the spread in S/N in each variable bin. Panels (a), (b), (c), and (d) correspond to HR 8799 b, c, d, and e, respectively.

Table 1. Keck/NIRC2 ADI Sequences Used for the Performance Analysis of ConStruct.
References: (1, 16) Crepp et al.