The Large Array Survey Telescope—Pipeline. I. Basic Image Reduction and Visit Coaddition

The Large Array Survey Telescope (LAST) is a wide-field telescope designed to explore the variable and transient sky with a high cadence and to be a test-bed for cost-effective telescope design. A LAST node is composed of 48 (32 already deployed), 28 cm f/2.2 telescopes. A single telescope has a 7.4 deg2 field of view and reaches a 5σ limiting magnitude of 19.6 (21.0) in 20 (20 × 20) s (filter-less), while the entire system provides a 355 deg2 field of view. The basic strategy of LAST is to obtain multiple 20 s consecutive exposures of each field (a visit). Each telescope carries a 61 Mpix camera, and the system produces, on average, about 2.2 Gbit s−1. This high data rate is analyzed in near real-time at the observatory site, using limited computing resources (about 700 cores). Given this high data rate, we have developed a new, efficient data reduction and analysis pipeline. The LAST data pipeline includes two major parts: (i) Processing and calibration of single images, followed by a coaddition of the visit’s exposures. (ii) Building the reference images and performing image subtraction and transient detection. Here we describe in detail the first part of the pipeline. Among the products of this pipeline are photometrically and astrometrically calibrated single and coadded images, 32 bit mask images marking a wide variety of problems and states of each pixel, source catalogs built from individual and coadded images, Point-Spread Function photometry, merged source catalogs, proper motion and variability indicators, minor planet detection, calibrated light curves, and matching with external catalogs. The entire pipeline code is made public. Finally, we demonstrate the pipeline performance on real data taken by LAST.

1. INTRODUCTION

Sky surveys produce a large amount of data. Therefore, efficient and successful use of these datasets requires a diverse set of high-quality data products that enable achieving the science goals in a timely manner.
Here we describe the first-step pipeline (out of two) developed for the Large Array Survey Telescope (LAST) and the ULTRASAT space mission (Sagiv et al. 2014; Shvartzvald et al. 2023). LAST is a new, cost-effective sky survey (Ofek & Ben-Ami 2020). A LAST node contains 48, 28-cm f/2.2 telescopes. Each telescope carries a 61 Mpix (about 6400 by 9600 pixels) backside-illuminated CMOS detector with 3.76 µm pixels. This yields a pixel scale of 1.25″ pix−1 and a 7.4 deg² field of view (FOV) per telescope. The LAST system design and overview are discussed in Ofek et al. (2023), while the LAST science goals are reviewed in Ben-Ami et al. (2023). Among the main LAST science goals are the search for optical counterparts of gravitational-wave events (e.g., Abbott et al. 2017), exoplanets, supernovae, fast transients (e.g., Drout et al. 2014; Ho et al. 2021; Ofek et al. 2021), and the study of Solar System objects. Some LAST science results are presented in Ofek et al. (in prep.) and Ho et al. (submitted). The LAST default observing strategy is to observe each field for 5 to 20 successive 20 s exposures (a visit). This strategy has several unique advantages: (i) It allows us to screen for satellite glints (Corbett et al. 2020; Nir et al. 2020, 2021), which appear as point sources in a single image (due to their sub-second duration); (ii) It allows the detection of main-belt asteroids by their motion within a single visit (400 s); (iii) It enables us to explore stellar variability on time scales as short as 20 s, which is relevant for exoplanets transiting white dwarfs and for flare stars (see Ben-Ami et al. 2023).
With a nominal exposure time of 20 s and the dead time between telescope visits, LAST produces data at a rate of 2.2 Gbit s−1. This data rate is about 70% higher than the expected LSST data rate (assuming 30 s exposures; Ivezić et al. 2019). Such a high data rate requires either considerable computational resources or a highly efficient data reduction pipeline. The LAST data reduction pipeline consists of two parts. The first part is responsible for processing raw images and producing calibrated images, including coadded images of the successive snapshots of a single visit. The second part is responsible for performing image subtraction and transient detection. In this paper we describe the first part of the pipeline, while the second part will be described in a forthcoming publication.
In §2 we provide an overview of the main pipeline steps. The software package is discussed in §3, while in §4 we provide a detailed description of the steps outlined in §2. In §6 we discuss the main data products, and in §7 we describe the pipeline's on-sky performance. In §8 we analyze the pipeline speed performance, and §9 summarizes our conclusions.
2. PIPELINE OVERVIEW

Here we briefly describe the high-level data processing steps implemented in the LAST pipeline. The following steps are carried out for each science image taken during the night and are fully described in §4. Apart from the science image data, the processing requires bias and flat-field images, along with their corresponding bit mask images. The flat-field images are produced once per night.
1. A 32-bit mask image is created for the science image, and the saturated pixels are flagged therein. The mask image is further populated in the following steps (see Table 1).
2. A master dark image is subtracted from the raw science image (including the darkened/overscan region). The bit mask of the dark image is propagated to the science image bit mask using the bitwise OR operator.
3. The residual dark value of the global overscan/darkened pixels is subtracted from the image.
4. A non-linearity correction is applied.
5. The science image is divided by the most recent flat field image, and the bit mask of the flat field image is propagated to the science image bit mask.
6. The darkened/overscan pixel region is cropped.

7. The image is interpolated over saturated pixels, NaN pixels, and negative-value pixels. Each of the interpolated pixels is specifically flagged in the bit mask.
8. The pixel values are multiplied by the gain.
9. The image is partitioned into 24 sub-images of 1726 × 1726 pixels, including a 64-pixel overlap between the sub-images (to avoid losing sources that fall near the edges). After this step, all processing is done on the sub-images.

10. Background and variance images are estimated for each sub-image.
11. Matched filtering of the background-subtracted images is performed with a template bank of Gaussian filters with widths (σ) of 0.1, 1.0, 1.5, 2.6, and 5.0 pixels.
12. The matched-filtered images are searched for sources; for each filter and each source, the S/N is calculated and a convolution-based Point-Spread Function (PSF) photometry flux is measured. A source catalog for each sub-image is thus generated.
13. The first and second moments are calculated for each of the cataloged sources. In addition, aperture photometry is performed (in three apertures), with the background and standard deviation measured in an annulus around each source.
14. The PSF in each sub-image is estimated, and PSF photometry is performed.
15. The sources are classified into delta-function sources (e.g., cosmic rays) and stellar PSF-like sources. Cosmic rays are removed from the catalog, and their positions are marked in the bit mask image.
16. The bit mask information is propagated to the source catalog.

17. The astrometric solution is found for each sub-image.
18.The source catalog is updated with the astrometric information.
19.The sources in the (5 to) 20 successive images of a visit are matched, and a matched source catalog containing all detected sources is produced (for each sub-image).
20. Proper motion and variability of each of the snapshot-matched sources are measured.

21. A search for asteroids in the 20 successive images of a visit is performed.
22. The visit-matched sources are matched with external catalogs.
23. The 20 successive sub-images (with overlap) of the same detector position are transformed into the same reference frame and coadded.

24. A source catalog for each coadded sub-image is produced (similar to the catalogs of the individual images).

25. The astrometry of the coadded catalog is refined using the PSF-fitted source positions.

26. All the data products are saved.
The second part of the pipeline (Ofek et al., in prep) is responsible for the reference image coaddition, image subtraction, and detection of transient sources.

3. THE SOFTWARE PACKAGE
The LAST pipeline is part of the AstroPack/MAATv2 package (Ofek 2014), which is being developed for the image processing system of the ULTRASAT (Sagiv et al. 2014; Shvartzvald et al. 2023) space mission. AstroPack/MAATv2 is mainly written in MATLAB, with some tools in C, C++, and Python. The code consists of over 10⁵ lines, and we put emphasis on documentation and on the built-in testing suite (unittest). Containing a wide variety of tools, the code is designed to be efficient, modular, and flexible, with several goals in mind, including fast adaptation to a wide range of pipelines and complex analysis of telescope data in a few steps.
By design, data (e.g., images, catalogs) are stored in several classes that contain the basic functionality, while the high-level functionality is available in other specialized functions or classes. The object data containers are arrays, and therefore most of the operations can be performed simultaneously on multiple instances of the objects.
Code efficiency is a very important requirement, as the LAST data have to be processed with limited computational resources. Therefore, employing an off-the-shelf code was not an option (see §8).
AstroPack will be described in detail in a future publication (see also Ofek 2014). However, a substantial amount of documentation is already included in the freely distributed version available on GitHub. In addition, online wiki help pages are available.

4. DETAILED DESCRIPTION OF THE PIPELINE
4.1. Dark and Flat calibration

A sequence of 20 dark images, each of 20 s exposure, is obtained with each camera every few weeks. The dark images are combined using a sigma-clipped mean with 5-σ rejection. In the same step, we also calculate the variance image of the dark measurement and create a bit mask for the dark image. The dark-image bit mask includes the following bits, described in Table 1: LowRN, HighRN, DarkHighVal, DarkLowVal, BiasFlaring.
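The pipeline itself is written in MATLAB (§3); as an illustration of the combination step, here is a minimal numpy sketch of a sigma-clipped mean stack. The MAD-based clipping scale is our assumption (the text specifies only the 5-σ rejection):

```python
import numpy as np

def sigma_clip_mean(stack, nsigma=5.0):
    """Combine an (N, H, W) image stack with a sigma-clipped mean.

    A robust (MAD-based) per-pixel scale is used for the clipping, so a
    single deviant exposure cannot inflate the StD and hide itself.
    Returns the clipped mean and the variance of the surviving pixels.
    """
    stack = np.asarray(stack, dtype=float)
    med = np.median(stack, axis=0)
    std = 1.4826 * np.median(np.abs(stack - med), axis=0)  # robust StD
    good = np.abs(stack - med) <= nsigma * std
    n = good.sum(axis=0)
    mean = np.where(good, stack, 0.0).sum(axis=0) / np.maximum(n, 1)
    var = np.where(good, (stack - mean) ** 2, 0.0).sum(axis=0) / np.maximum(n - 1, 1)
    return mean, var
```

A robust scale matters here: with a plain StD, a single outlier among 20 frames inflates the StD by roughly the outlier amplitude divided by √20, so a 5-σ cut would never reject it.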
The flat images are generated on a nightly basis. Currently, the flats are generated using twilight images (see the overview paper). Each flat image is normalized by its own image median. Next, the normalized flat images are combined using a sigma-clipped median with 5-σ rejection. The combined flat image is divided by its own median. We also generate a variance image for the flat and a flat bit mask image. The bit mask image includes the bits FlatHighStd, FlatLowVal, and NaN (see Table 1).

4.2. Bit masks
For each calibrated science image, a 32-bit mask image is generated. The bit mask is designed to indicate a wide range of states and conditions in the image, and it is also propagated to the source catalog and to the coadded images. In Table 1 we list the LAST bit dictionary, with the bit definitions. Some of the bits are measured from the raw image itself (e.g., Saturated), while others are propagated from calibration images (e.g., LowRN). Population of some of the bits (e.g., Ghost) is not yet implemented in the pipeline, and some are required for other missions (e.g., ULTRASAT; Sagiv et al. 2014, Shvartzvald et al., in prep.).
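As an illustration of this bookkeeping, a small numpy sketch of setting named bits and of the OR-propagation from a calibration mask. The bit indices here are invented for the example, not the actual LAST assignments:

```python
import numpy as np

# Hypothetical subset of the bit dictionary (indices are illustrative only).
BITS = {"Saturated": 0, "LowRN": 1, "HighRN": 2, "NaN": 3, "Interpolated": 4}

def flag(mask, condition, name):
    """Set the named bit for every pixel where `condition` is True."""
    mask[condition] |= np.uint32(1 << BITS[name])
    return mask

def propagate(science_mask, calib_mask):
    """Propagate a calibration-image bit mask into the science bit mask
    with a bitwise OR (the scheme described in the text)."""
    return science_mask | calib_mask

img = np.array([[100.0, 65535.0], [200.0, np.nan]])
mask = np.zeros(img.shape, dtype=np.uint32)
mask = flag(mask, img >= 65535, "Saturated")
mask = flag(mask, np.isnan(img), "NaN")
```

The same 32-bit words can then be carried unchanged into the source-catalog FLAGS column.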

4.3. Basic calibration
The basic calibration step includes populating the Saturated and NonLinear bits, subtracting a dark image, subtracting a global median of the darkened pixels, dividing by the flat image, cropping the darkened-pixel region, and multiplying the image by its gain, such that the effective gain is 1. The original detector gain is stored in the ORIGGAIN header keyword. The non-linearity correction is applied before the division by the flat image. The non-linearity of each camera is measured in the lab and stored in a configuration file. Measurements of the non-linearity are presented in Ofek et al. (2023).
Next, the image is partitioned into 24 sub-images, sized 1726 × 1726 pixels. The sub-images contain a ≳64-pixel overlap region. The overlap region is marked in the bit mask image using the Overlap bit. In addition, pixels within 10 pixels of the image edge are marked in the bit mask using the NearEdge bit.
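The partitioning amounts to computing overlapping tile boundaries along each axis. A sketch, assuming only what the text states (24 sub-images of 1726 × 1726 pixels with ≳64-pixel overlaps); the even spacing of tile starts is an illustrative choice:

```python
def tile_edges(n_pix, n_tiles, tile_size):
    """Start/stop pixel indices of n_tiles overlapping tiles of tile_size
    covering an axis of n_pix pixels (requires n_tiles >= 2 and
    tile_size * n_tiles >= n_pix so the tiles can overlap)."""
    # Space the tile starts evenly; consecutive tiles then share
    # roughly tile_size - step pixels of overlap.
    step = (n_pix - tile_size) / (n_tiles - 1)
    return [(round(i * step), round(i * step) + tile_size) for i in range(n_tiles)]
```

For a 9600-pixel axis split into 6 tiles of 1726 pixels this yields overlaps of about 150 pixels, comfortably above the 64-pixel minimum.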
There are several reasons behind the image partitioning: (i) Any image calibration process (e.g., astrometry and photometry), as well as steps related to image subtraction (e.g., registration and PSF estimation), becomes more accurate when the field of view decreases; (ii) Data transfer of small files is more practical for a wide range of applications; (iii) The full image is too large to fit in the cache memory of most current computers. Indeed, tests show that using small sub-images improves the speed performance compared to full-size images; (iv) With our astrometric solution approach (§4.7), the astrometry speed performance is improved by about two orders of magnitude.

4.4. Background mean and variance estimation
Reliable background-mean and background-variance image estimations are crucial for source detection, photometry, and image subtraction. However, it is typically not trivial to estimate them accurately, and even their definition depends on the purpose.
Several methods of image background and variance estimation are implemented in the AstroPack software.
The default strategy for LAST images is to estimate the background and variance in 128 × 128 pixel blocks. In each block we estimate the mode and the variance. The mode is estimated by calculating the histogram of all the pixels, with logarithmic bins that on average contain about 100 pixels, and taking the highest value in the histogram. The variance is estimated by calculating the width of the histogram region that contains 50% of the data and converting it to a variance, assuming Gaussian statistics. Next, the sparse (i.e., down-sampled) background and variance maps are interpolated and extrapolated linearly to the full image size. We are currently working on improvements to this approach.

Note (Table 1). This bit-mask dictionary is used in all bit mask images and FLAGS columns in source tables. The Used column indicates whether the bit is used in the current version of the pipeline. Some bits are not used because they are not relevant for LAST (e.g., GainHigh), or because we have not yet developed the required algorithm (e.g., Ghost). The SrcNoiseDominated bit is populated only in the coadd image. An order-of-magnitude estimate, based on several examples, of the fraction of pixels with the LowRN, HighRN, DarkHighVal, DarkLowVal, BiasFlaring, and CR_DeltaHT bits on is about 1.6 × 10−5, 0.0009, 0.0008, 0.0002, 0.0009, and 2 × 10−5, respectively.
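A simplified numpy sketch of the per-block estimator described above. The 50%-width is taken here as the interquartile range and converted to σ via the Gaussian quantile 0.6745σ; the pipeline's exact implementation may differ:

```python
import numpy as np

def block_background(block, pix_per_bin=100):
    """Mode and variance of one background block: a histogram with
    logarithmic bins holding ~pix_per_bin pixels each on average gives
    the mode (densest bin); the central 50% width gives the variance,
    assuming Gaussian statistics. A sketch, not the pipeline code."""
    vals = block[np.isfinite(block) & (block > 0)].ravel()
    n_edges = max(int(vals.size / pix_per_bin), 8)
    edges = np.logspace(np.log10(vals.min()), np.log10(vals.max() + 1), n_edges)
    counts, edges = np.histogram(vals, bins=edges)
    i = np.argmax(counts)
    mode = 0.5 * (edges[i] + edges[i + 1])
    # Half-width containing 50% of the data -> sigma, using
    # P(|x - mu| < 0.6745 sigma) = 0.5 for a Gaussian.
    q25, q75 = np.percentile(vals, [25, 75])
    sigma = 0.5 * (q75 - q25) / 0.6745
    return mode, sigma**2
```

The mode (rather than the mean or median) is less biased by the faint-source population contaminating the block.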

4.5. Source detection and measurements
The source detection algorithm is based on matched filtering with a template bank, after subtracting the local background. The default template bank includes five Gaussian filters with σ-widths of 0.1, 1, 1.5, 2.6, and 5 pixels. The 0.1 pixel filter mimics a delta function and is used for cosmic-ray detection via hypothesis testing (e.g., Zackay et al. 2016). The 1.0 and 1.5 pixel filters are for (stellar) point-like sources, while the wider filters cover seeing variations and the detection of extended sources and optical ghosts. Figure 1 presents the theoretical information overlap of all the filters we are using as a function of the source σ-width. The information loss is less than 10% for sources with FWHM in the range of 1.7 to 16 pixels (2.1″ to 20.6″).
The matched-filter map is normalized by the square root of the filtered-image variance (for uncorrelated noise, the StD of the filtered image is the StD of the unfiltered image multiplied by √(Σq Pq²), where Pq is the value of the unit-normalized PSF at position q). Next, the pixel-by-pixel maximum over all matched-filtered images is obtained, and we search for local maxima in the combined matched-filter image. Any peak, in one of the five matched-filter images, is declared a source candidate if its highest value is above a threshold level of 5σ. We note that we normalize the matched-filter image by the StD of the filtered image, such that the resulting image is in units of standard deviation (e.g., Zackay & Ofek 2017a).

Figure 1. The theoretical information overlap of all the Gaussian filters we are using for source detection, as a function of the source actual σ-width. The dashed lines are for individual filters (color-coded; i.e., 0.1, 1, 1.5, 2.6, and 5 pixels), while the solid line is for the maximum over the range. The delta-function filter (of 0.1 pix width) is shown by the peak on the left. The information content is calculated using the formula in Zackay & Ofek (2017a); the information-content loss measures how well the filtering process can detect a source whose PSF is perturbed (i.e., has a different FWHM) compared with the filter PSF. The information loss is proportional to the variance and hence to the (S/N)².
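A numpy-only sketch of this detection scheme: Gaussian templates, normalization of each filtered image to StD units assuming white noise, pixel-by-pixel maximum over templates, and 5σ local maxima. The kernel radii and the 3 × 3 peak test are illustrative choices:

```python
import numpy as np

def gaussian_1d(s):
    """Unit-sum 1-D Gaussian kernel (the 2-D template is separable)."""
    rad = max(int(4 * s) + 1, 1)
    x = np.arange(-rad, rad + 1)
    g = np.exp(-x**2 / (2.0 * s**2))
    return g / g.sum()

def smooth(img, g):
    # Separable 'same'-size convolution along both axes.
    out = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, out)

def detect(image, bg, var, sigmas=(0.1, 1.0, 1.5, 2.6, 5.0), thresh=5.0):
    """Matched-filter detection sketch returning (y, x) peak positions."""
    sub = image - bg
    snr = np.full(image.shape, -np.inf)
    for s in sigmas:
        g = gaussian_1d(s)
        # For white noise, StD of the filtered image = sqrt(var * sum(K^2));
        # for the separable kernel K = g (x) g, sum(K^2) = (sum(g^2))^2.
        norm = np.sqrt(var) * (g**2).sum()
        snr = np.maximum(snr, smooth(sub, g) / norm)
    peak = snr > thresh
    for dy in (-1, 0, 1):                  # local maximum in a 3x3 neighborhood
        for dx in (-1, 0, 1):
            if dy or dx:
                peak &= snr >= np.roll(np.roll(snr, dy, 0), dx, 1)
    ys, xs = np.nonzero(peak)
    return list(zip(ys.tolist(), xs.tolist()))
```

Taking the maximum over templates before thresholding is what makes the detection nearly optimal across the seeing range covered by the bank.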
Given a list of integer-pixel positions of possible sources and their S/N using each of the filters, we calculate the Gaussian-weighted first and second moments for each source. Next, convolution-based PSF photometry of the source candidates is performed. This is done by normalizing the filtered image to flux units (see the formulae in Zackay & Ofek 2017a) and reading the flux value at the whole-pixel position of the source. Aperture photometry is done in three apertures of 2, 4, and 6-pixel radius. The aperture photometry is calculated by FFT-shifting the sources to the pixel origin and summing all the pixels within some radius. Although the apertures have a pixelated shape, thanks to the FFT shift the aperture is consistent (i.e., has the same shape relative to the star position) for all the sources. In addition, for each sub-image, we also estimate the PSF and use it to perform PSF-fit photometry (see §4.6). One disadvantage of this approach is that in crowded regions it overestimates the background and variance, and therefore misses real sources. To correct this, in the future, we will replace this routine with an iterative PSF measurement and subtraction, similar to the approach implemented in DAOphot (Stetson 1987).
The first and second source moments are calculated using an iterative algorithm. In the first iteration, we assume a flat weight function, and we then narrow down the weight function (a Gaussian of some finite width) from iteration to iteration, until the σ-width of the Gaussian weight function reaches 1.5 pixels in the final iteration. Our experimentation suggests that this algorithm converges faster than a non-variable weight function and results in more accurate estimates. A full list of the columns in the image catalogs is presented in Table 2.

4.6. PSF-fit photometry
For each sub-image we also estimate the PSF and fit it to all the sources. The PSF is estimated by selecting isolated sources with 500 > S/N > 30. We cut a small stamp around each source; all the source stamps are aligned using a sub-pixel FFT shift and coadded using a sigma-clipped mean with 5-σ rejection. The wings of the pixelated PSF are smoothed by forcing the PSF wings beyond a radius of 5 pixels to decay exponentially.
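The stamp alignment relies on the Fourier shift theorem. A minimal sketch of the alignment and stacking, with a plain mean in place of the sigma-clipped mean and no wing smoothing:

```python
import numpy as np

def fft_shift(stamp, dy, dx):
    """Shift a 2-D stamp by a sub-pixel offset using the Fourier shift
    theorem: multiplying the FFT by exp(-2*pi*i*(fy*dy + fx*dx)) moves
    the image content by (+dy, +dx) pixels."""
    ny, nx = stamp.shape
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.fft.ifft2(np.fft.fft2(stamp) * phase).real

def build_psf(stamps, offsets):
    """Align stamps on a common center and average them; `offsets` are
    the (dy, dx) of each source relative to the stamp center.  A
    simplified sketch of the PSF construction (unit normalization only)."""
    aligned = [fft_shift(s, -dy, -dx) for s, (dy, dx) in zip(stamps, offsets)]
    psf = np.mean(aligned, axis=0)
    return psf / psf.sum()
```

Because the shift is applied in the Fourier domain, no interpolation kernel is chosen explicitly, and the alignment is exact for band-limited stamps.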
Next, we perform PSF-fit photometry using the estimated PSF. This is done with the following algorithm: the first-moment position is used as the initial guess for the PSF position. In each iteration, we calculate the two-dimensional gradient and second derivative of the χ² around the current position, and use them to extrapolate the position of the local χ² minimum. However, to avoid large jumps induced by noise, we limit the maximum step size to about 0.1 pixels. When we calculate the χ² for each source position, the PSF flux is fitted as an independent parameter. In the future, we plan to replace this routine with a multi-iteration PSF-fitting function that has better performance in crowded regions.

4.7. Astrometry
The astrometry code is based on some general ideas from Ofek (2019). However, the code was rewritten for modularity and speed (see also Corbett et al. 2022). Specifically, we required improvements in speed performance of over an order of magnitude, compared to other tools. This is achieved by using two kinds of astrometry routines: one for solving the astrometry from scratch, including pattern matching (imProc.astrometry.astrometryCore), and a second for refining an existing astrometric solution (imProc.astrometry.astrometryRefine). Since pattern matching is relatively expensive, we run it for one or two sub-images of the first epoch in the visit, while for all other sub-images and epochs in the visit we refine the astrometry based on the solution extrapolated from the nearby sub-images.
In each sub-image we first fit an affine transformation, and the residuals are then fitted with a 3rd-order polynomial in X and Y. Although second-order polynomials can fit the smooth atmospheric refraction to better than about 1 mas over our sub-image field for altitudes > 30 deg above the horizon, we use higher (3rd) order polynomials in order to capture some of the distortions induced by atmospheric scintillation (Lindegren 1980; Shao & Colavita 1992; Ofek 2019).
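The transformation fit is linear least squares on polynomial terms. A sketch in numpy; for clarity it fits the full 3rd-order polynomial in one step, which for linear least squares is equivalent to the affine-then-residuals sequence (the residual fit's basis contains the affine terms, so the combined projection is the same). Normalized coordinates are assumed for conditioning:

```python
import numpy as np

def poly_terms(x, y, order=3):
    """Design-matrix columns 1, x, y, x^2, xy, y^2, ... up to `order`."""
    cols = [np.ones_like(x)]
    for o in range(1, order + 1):
        for j in range(o + 1):
            cols.append(x ** (o - j) * y ** j)
    return np.column_stack(cols)

def fit_distortion(x, y, xi, eta, order=3):
    """Least-squares mapping from detector (x, y) to tangent-plane
    (xi, eta): a sketch of the affine + 3rd-order polynomial step
    (no weighting or outlier rejection)."""
    A = poly_terms(x, y, order)
    cx, *_ = np.linalg.lstsq(A, xi, rcond=None)
    cy, *_ = np.linalg.lstsq(A, eta, rcond=None)
    return cx, cy

def apply_fit(cx, cy, x, y, order=3):
    A = poly_terms(x, y, order)
    return A @ cx, A @ cy
```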
An additional feature of our code is the ability to query the reference catalog once and to use the same reference catalog for the 19 remaining epochs of the same sub-image in the visit. Currently, the astrometry is done against the GAIA-DR3 catalog (Gaia Collaboration et al. 2016, 2021; Lindegren et al. 2021); it includes the sources' proper motions and rejects sources with large astrometric residuals.

4.8. Photometric calibration
Each sub-image is photometrically calibrated against the GAIA-DR3 catalog (Gaia Collaboration et al. 2021). Currently, the calibration procedure is as follows. We match the sources in each sub-image to the GAIA-DR3 catalog. Sources with a measured photometric error better than 0.01 mag and S/N < 1000 are selected. Next, we fit the following equation:

G_s = m_s + α_1 + α_2 (B_p,s − R_p,s − C),

where m_s is the instrumental magnitude for source s; G_s, B_p,s, and R_p,s are the GAIA G, B_p, and R_p AB magnitudes (the GAIA-DR3 B_p Vega−AB offset is −0.0155 mag), respectively, for source s; C is the median color of the sources in the field; α_1 is the zero point; and α_2 is the color coefficient. The fitted values of these free parameters, as well as C, are stored in the header (see Table 3). The photometric calibration is calculated using the MAG_PSF magnitude (if not available, the MAG_APER_3 magnitude is used instead), and it is then applied to all the magnitudes (all the apertures, convolutions, and PSF photometry). The magnitude type used in the calibration process is indicated in the header keyword PH_MAGT (see Table 3). This means that, excluding MAG_PSF, our photometry may suffer from aperture-correction errors. Currently, this is not corrected and is the responsibility of the user. In future versions, we may correct for this effect.
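The calibration fit described above reduces to a two-parameter linear least-squares problem. A numpy sketch, without the outlier rejection, error weighting, or S/N cuts the pipeline applies:

```python
import numpy as np

def fit_zeropoint(m_inst, G, Bp, Rp):
    """Fit G = m_inst + a1 + a2 * (Bp - Rp - C), where C is the median
    color of the field; returns (a1, a2, C).  A sketch of the
    calibration fit, not the pipeline implementation."""
    C = np.median(Bp - Rp)
    A = np.column_stack([np.ones_like(m_inst), Bp - Rp - C])
    coef, *_ = np.linalg.lstsq(A, G - m_inst, rcond=None)
    return coef[0], coef[1], C
```

Subtracting the median color C before the fit decorrelates the zero point from the color term, so α_1 is the zero point of a typical-color star in the field.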
When reporting the calibrated photometry in the catalog we apply the image zero point (α 1 ), but we set α 2 to zero.The reason for this is that the color is not necessarily known for all sources.Therefore, the reported magnitudes are in the LAST native AB system.
Based on the photometric calibration, we estimate the image limiting magnitude for source detection. This is done by fitting log10(S/N) vs. the star magnitude and color, and reading the best-fit magnitude value at S/N = 5 and B_p − R_p = 1 mag. The limiting magnitude, as well as other parameters, is added to the image headers (see Table 3). We note that the fitted color terms are correlated with the radial distance from the field center.
A new method for performing the photometric calibration is being developed and will be included in future versions of the code.

4.9. Interpolation over masked pixels

In order to make sure that the coadd images are clean, and to make the photometry more reliable, we interpolate over some pixels that have problematic bit-mask values. All the interpolated pixels are marked as such in the bit mask (using the Interpolated bit). The main rationale behind this step is that, first, any convolution operation will be less affected by sharp features (e.g., cosmic rays) that can cause long-range ringing artifacts, and, second, photometric measurements become more reliable.
The interpolation is done over pixels with one of the following bits on: Saturated, DarkHighVal, NaN, and Negative. In the caption of Table 1 we quote average fractions of pixels with one of the bad-pixel indicators. The interpolation algorithm is described in §5.2.

4.10. Merged catalogs: 20-image light and position curves
The sources in each group of 20 catalogs of each sub-image in a single visit are matched using the following scheme: the sources in the i-th image (i ∈ 2..20) are matched against the first-image sources with a 3-arcsec search radius. Sources in the i-th image that have no counterparts are added to the first-image source list. The result is a master list of sources that appear in at least one image.
Next, each source in this list is matched against all the sources in all the epochs of the visit, again with a 3-arcsec search radius. For each property of interest (e.g., Right Ascension, Magnitude; see Table 5), we populate a matrix of all the measurements in all the epochs and for all the sources. Properties that are not available for some source/epoch are set to NaN. These matrices are described in Table 5 and are saved in HDF5 files.
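A flat-sky sketch of this epoch-matching scheme. It ignores the cos δ factor in the RA separation and uses a brute-force distance loop instead of a spatial index, both simplifications of what a real implementation would do:

```python
import numpy as np

def match_epochs(epochs, radius=3.0):
    """Build a master source list and an (n_epoch, n_source) magnitude
    matrix.  `epochs` is a list of per-epoch source lists, each source a
    (ra_deg, dec_deg, mag) tuple; sources within `radius` arcsec are
    considered the same object, and missing measurements are NaN."""
    master = [(ra, dec) for ra, dec, _ in epochs[0]]
    for srcs in epochs[1:]:
        for ra, dec, _ in srcs:
            d = [np.hypot((ra - r) * 3600, (dec - d0) * 3600) for r, d0 in master]
            if not d or min(d) > radius:
                master.append((ra, dec))            # new (orphan) source
    mat = np.full((len(epochs), len(master)), np.nan)
    for i, srcs in enumerate(epochs):
        for ra, dec, mag in srcs:
            d = [np.hypot((ra - r) * 3600, (dec - d0) * 3600) for r, d0 in master]
            j = int(np.argmin(d))
            if d[j] <= radius:
                mat[i, j] = mag
    return master, mat
```

The resulting matrix is exactly the epoch-by-source layout stored in the HDF5 datasets (one matrix per measured property).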
In addition, a matched-sources catalog is saved (in a FITS binary table) with a row per matched source. Also calculated and added to this catalog are proper motion fits (§4.11 and §4.12) and a variability estimator for all the sources (see §4.14). The matched-sources table columns are described in Table 4.

Note (Table 3). Special image header keywords added during the image processing. The four blocks of the table refer to photometric, image background, astrometry, and image quality keywords (top to bottom). AST_ERRM is calculated from the asymptotic rms divided by the square root of the number of sources (AST_NSRC). It is not the positional error, but rather the uncertainty in the knowledge of the coordinate-system reference point compared to the GAIA/ICRS coordinate system. The global motion (GM_) keywords are added only for the coadd images. FWHM, MED_A, MED_B, and MED_TH are populated only in the coadd images.

4.11. Asteroids search

Before we describe the specific channels for detecting asteroids, we briefly review our overall strategy. We search for asteroids in the LAST images using several methods; each method is designed for asteroids with a different range of angular speeds: (i) Asteroids moving faster than about 1″ s−1 will be streaked, and their S/N will be diluted by a factor of ≳5. These asteroids are best identified using streak-detection methods. Even with the fast and optimal algorithm presented in Nir et al. (2018), streak detection is resource-expensive, and we are currently developing a more efficient approach.
(ii) Asteroids with angular speeds below 1″ s−1 and above ≈ 0.06″ s−1 will be identified as matched sources (§4.10) with fewer than four appearances. Such asteroids will lack proper motion measurements in the matched-source tables (Table 4). We detect these asteroids by fitting linear motion to objects in the matched-sources catalog that have no proper motion fits. Such objects are named orphans (i.e., detected and matched in 1 to 3 exposures, within 3″). The search algorithm fits 2-D linear motion between all possible pairs of orphans. To do this efficiently, we populate a k-d tree (e.g., Press et al. 2002) with the four parameters of the fitted linear motion. Finally, we search for orphan sources that have common linear-motion parameters. This algorithm (implemented in AstroPack imUtil.asteroids.pairsMotionMatchKDTree; similar to Denneau et al. 2013) is efficient, and its run time scales like N², where N is the number of orphans. On a single processor, it takes about 1 s to match about 1000 orphan sources. This functionality is being tested and will be incorporated into the pipeline in the future.
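A sketch of this pair-motion linking idea. The pipeline groups the four fitted motion parameters with a k-d tree; here a coarse hash on the rounded parameters stands in for it, and the tolerances and units are illustrative:

```python
from collections import defaultdict
from itertools import combinations

def link_orphans(orphans, t_ref, pos_tol=1.0, mu_tol=0.05):
    """Fit 2-D linear motion to every pair of orphan detections and group
    pairs whose (position at t_ref, rate) parameters agree within
    tolerance.  Each orphan is (t [s], x [arcsec], y [arcsec]); returns
    the parameter cells hit by more than one pair (linked candidates)."""
    buckets = defaultdict(list)
    for a, b in combinations(orphans, 2):
        (t1, x1, y1), (t2, x2, y2) = a, b
        if t1 == t2:
            continue                      # simultaneous detections: no rate
        mux = (x2 - x1) / (t2 - t1)
        muy = (y2 - y1) / (t2 - t1)
        x0 = x1 + mux * (t_ref - t1)      # extrapolated position at t_ref
        y0 = y1 + muy * (t_ref - t1)
        key = (round(x0 / pos_tol), round(y0 / pos_tol),
               round(mux / mu_tol), round(muy / mu_tol))
        buckets[key].append((a, b))
    return {k: v for k, v in buckets.items() if len(v) > 1}
```

Pairs belonging to the same real mover all fall into the same 4-parameter cell, while chance pairings of unrelated orphans scatter over distinct cells.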
(iii) With our typical astrometric precision (see §7.1), asteroids with angular speeds between about 0.06″ s−1 and 3 × 10−4 ″ s−1 will be detected by the proper motion fit to the visit exposures. This method covers a large fraction of the main-belt asteroids and is described in §4.12.
(iv) Asteroids with angular speeds slower than about 0.005″ s−1 will appear as point sources in the coadd images. A search for asteroids in the coaddition of 20 images can detect asteroids fainter by about 1.4 mag, compared to asteroids detected in individual images. These asteroids are searched for after image subtraction, using the same approach as method (ii).
Figure 2 shows the apparent magnitude vs. the angular speed of all the currently known numbered and unnumbered minor planets. Also shown, as horizontal lines, are the angular-speed regions discussed previously. The two vertical lines show the single-exposure and visit limiting magnitudes of LAST.

Note (Table 4). There is one merged catalog table per sub-image per visit. This table contains a row per source, with information gathered from all the epochs in a visit. In column names starting with Epoch, the epoch index j is saved with 3 digits (zero padded). Asteroid index is a number indicating whether a source was identified as an asteroid using the proper motion fit: NaN indicates not an asteroid; a positive index indicates an asteroid that was linked to another source in the merged catalog, and all the linked sources have the same index; negative numbers indicate a possible asteroid that is not linked (i.e., appears as a single proper motion fit). The magnitudes in the merged catalog are the calibrated magnitudes (i.e., no relative photometry is applied).

4.12. Moving sources search using proper motion in the visit exposures

The positions of each source that appears more than three times are fitted with two models: a stationary-position hypothesis (H0,pm) and a linear proper-motion hypothesis (H1,pm). The positional errors are normalized (by a multiplicative factor) such that the χ² per degree of freedom of H0,pm is one. Next, the Student's-t statistic (assuming unknown StD) is calculated as the proper motion value multiplied by the square root of the number of epochs and divided by the standard deviation of the residuals of the proper motion fit. The Student's-t statistic is converted to the probability of rejecting the null hypothesis (no motion) using the Student's-t distribution function. Although the probability of rejecting the null hypothesis should not depend on the number of observations, we use an adaptive threshold for finding asteroids. The reason for this is that the number of trials (the look-elsewhere effect) decreases when the number of epochs increases. In addition, an adaptive threshold is important in order to deal with outliers. Specifically, if the number of observations is above three, we require a false-alarm probability of 10−4, while if the number of observations is larger than five, we require a false-alarm probability of 0.005.
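A one-dimensional sketch of this motion test. The pipeline fits the two positional coordinates and converts the statistic to a false-alarm probability via the Student's-t distribution; here we stop at the t statistic itself:

```python
import numpy as np

def motion_test(t, x):
    """Fit x = x0 + mu * t by least squares and return (mu, t-statistic)
    for the hypothesis mu != 0.  A 1-D sketch of the stationary vs.
    linear-motion comparison described in the text."""
    t = np.asarray(t, float) - np.mean(t)      # center times: x0, mu decouple
    x = np.asarray(x, float)
    n = t.size
    mu = np.sum(t * x) / np.sum(t**2)          # least-squares rate
    resid = x - (np.mean(x) + mu * t)
    s = np.sqrt(np.sum(resid**2) / (n - 2))    # residual StD (2 fitted params)
    se_mu = s / np.sqrt(np.sum(t**2))          # standard error of the rate
    return mu, mu / se_mu
```

A large |t| rejects the stationary hypothesis; the threshold on the corresponding false-alarm probability is then adapted to the number of epochs, as described above.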
Other criteria include selecting sources where the standard deviation of all the S/N measurements is larger than 0.4. The reason for this is that the expectation value of the StD of the S/N is one, and sources with an abnormally low StD of the S/N usually indicate the presence of bad pixels. We also remove sources having one of the following bit-mask values: Saturated, Spike, CR_DeltaHT, CR_Laplacian, CR_Streak, Streak, Ghost, Persistent, NearEdge; for sources with S/N < 8, we also require that the DarkHighVal, BiasFlaring, FlatHighStd, and HighRN bits are off.

Note (Table 5). List of datasets in the HDF5 files containing the merged catalog matrices. Each dataset contains one merged matrix of epochs vs. sources for one measured property (e.g., Right Ascension, Magnitude). There is one merged-matrices HDF5 file per sub-image per visit. Unlike all the other data products, the magnitudes in this data product are relative-photometry calibrated (§4.13), after the photometric zero point calibration. The relative photometry is done using the MAG_APER_3 magnitudes, and the relative zero points are applied to all other magnitudes (ignoring aperture corrections).
Next, moving sources are linked by requiring their origin positions (i.e., the fitted position at the visit's central epoch) to be consistent within 3 arcsec.

Relative photometry visit light curves
An important feature of LAST is its visit observing strategy, in which 20 × 20 s exposures of each field are obtained within a single 400 s time window. This, for example, allows searching for exoplanets around about 10^5 white dwarfs and for flare-star studies (Ben-Ami et al. 2023). To achieve these goals, the 20-epoch light curves are searched for variability.
We apply a relative-photometry zero point. In contrast to the analysis of light curves generated over a larger number of epochs, which requires more sophisticated algorithms (e.g., Ofek et al. 2011), here we use a simple and fast algorithm. Specifically, we select all the sources that appear in all the epochs and have median photometric errors smaller than 0.02 mag. For these stars, we calculate the median magnitude difference between each pair of successive images. These successive median differences are chained and applied as zero-point corrections to the images (skipping the first image). The result is a magnitude system that is anchored to the photometric calibration of the first image. Next, we apply the zero points to all the calibrated magnitudes, to obtain the relative-photometry magnitudes for all the sources. Note that the relative-photometry zero points are calculated using only the 6-pixel-radius aperture, and are applied to all the photometry. Therefore, aperture-correction biases may exist, and the relative photometry may be further improved. These relative-photometry light curves are reported in the merged-catalog matrices (Table 5), while the photometry listed in the merged-catalog table (Table 4) is prior to the relative-photometry step (i.e., only the photometric calibration is applied).
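The chaining of the median magnitude differences can be sketched as follows. This is a simplified illustration; the function name is ours, not the pipeline's.

```python
import numpy as np

def relative_zeropoints(mags):
    """Per-epoch zero-point offsets, anchored to the first epoch.

    mags : (n_epoch, n_star) calibrated magnitudes of stars that
           appear in all epochs with small (<0.02 mag) errors.
    """
    # median magnitude difference between each pair of successive epochs
    dmed = np.median(np.diff(mags, axis=0), axis=1)
    # chain the successive differences; the first epoch is the anchor
    return np.concatenate(([0.0], np.cumsum(dmed)))
```

Subtracting the returned offsets (one per epoch) from the magnitude matrix puts every epoch on the photometric system of the first image.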

Variability search
Based on the relative-photometry visit light curves, we search for variable sources using several variability indicators: (i) For each source we calculate the photometric χ2 relative to a constant-flux model, and normalize the errors such that the χ2 per degree of freedom is unity. Next, we fit a 5th-degree polynomial and calculate the χ2 of this fit. We report in the merged catalog (Table 4) the ∆χ2 between the 0th-order (constant) polynomial fit χ2 and the 5th-order polynomial fit χ2 (i.e., the PolyDeltaChi2 column). (ii) For each source we calculate the StD and a robust StD. In the future, we may consider adding a matched-filtering search based on a predefined template bank.
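The PolyDeltaChi2 indicator can be sketched as follows; this is a minimal illustration, and the pipeline's exact normalization may differ.

```python
import numpy as np

def poly_delta_chi2(t, flux, err, deg=5):
    """Delta chi2 between a constant-flux model and a degree-`deg`
    polynomial fit, after rescaling the errors so that the
    constant-model chi2 per degree of freedom is unity."""
    mu = np.average(flux, weights=1.0 / err**2)
    chi2_const = np.sum(((flux - mu) / err) ** 2)
    scale = chi2_const / (len(t) - 1)       # error-rescaling factor
    coef = np.polyfit(t, flux, deg, w=1.0 / err)
    chi2_poly = np.sum(((flux - np.polyval(coef, t)) / err) ** 2)
    return (chi2_const - chi2_poly) / scale
```

A genuinely variable source whose variation is captured by the polynomial yields a large ∆χ2, while a constant source yields a value of order the number of extra fit parameters.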

Cross matching with external catalogs
The matched catalogs and the coadd-image catalogs are cross-matched with external catalogs. To that end, we have generated a catsHTM catalog (Soumagnac & Ofek 2018) of the catalogs listed in Table 6. The table also lists the matching radius used for each catalog, and the bit-mask integer indicating to which external catalog a match belongs. For each source in the merged catalog (Table 4), and in the coadd catalog, we provide a bit mask that is the or of the bit-mask integers of all the external sources that were matched to the source (i.e., the MergedCatMask column in Table 4).
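As a minimal illustration of how such a MergedCatMask can be constructed (the bit assignments below are hypothetical and are not the ones used in Table 6):

```python
# one bit per external catalog (hypothetical assignment)
CAT_BIT = {
    "GAIA-DR3": 1 << 0,
    "PS1":      1 << 1,
    "GLADE+":   1 << 2,
    "NED-z":    1 << 3,
}

def merged_cat_mask(matched_catalogs):
    """Bitwise-or of the bit-mask integers of all matched catalogs."""
    mask = 0
    for name in matched_catalogs:
        mask |= CAT_BIT[name]
    return mask
```

A single integer per source then encodes, one bit per catalog, all the external catalogs with a possible counterpart.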

20 images coadd and analysis
The successive exposures in a visit are registered using the images' World Coordinate System (WCS) and coadded. The coaddition is done using 5-sigma clipping, with no weights and no background subtraction. The rationale is that, in the majority of cases, these 20 images are affected by similar sky variance and have similar PSFs; therefore, the information loss is negligible (see Zackay & Ofek 2017a, Zackay & Ofek 2017b). Next, the coadd-image background and variance are estimated, sources are identified, and the astrometry is refined. All these steps are similar to those described in §4.4-§4.9 and §4.15.
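A per-pixel sketch of the unweighted, sigma-clipped coaddition follows. The use of a robust MAD-based scatter estimate here is our choice for the illustration, not necessarily the pipeline's.

```python
import numpy as np

def sigma_clip_coadd(cube, nsig=5.0):
    """Unweighted mean of an (n_image, ny, nx) cube, clipping pixels
    that deviate by more than nsig robust-sigma from the median."""
    med = np.median(cube, axis=0)
    # robust per-pixel scatter from the median absolute deviation
    sig = 1.4826 * np.median(np.abs(cube - med), axis=0)
    good = np.abs(cube - med) <= nsig * sig
    n = np.maximum(good.sum(axis=0), 1)
    return np.where(good, cube, 0.0).sum(axis=0) / n
```

Because the 20 visit images share similar backgrounds and PSFs, a plain clipped mean like this loses little information relative to an optimally weighted coadd.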

5. ADDITIONAL DETAILS
In this section, we discuss several topics that, as far as we know, are not standard procedures, and thus may be of special interest to the community.

The flux-annulus-background bias
The common procedure for estimating the local background and variance around sources is to measure these properties in an annulus of constant radius around each source. In this case, the amount of light that leaks from a source into the annulus depends on the source flux. In Figure 3 we present the measured annulus background vs. aperture photometry (in a 6-pixel radius) in a random LAST image. A clear bias in the background level is detected. This bias tends to overestimate the background level around bright sources and therefore to decrease their measured (background-subtracted) flux. The observed amplitude of this effect is larger than expected from a pure Gaussian PSF and may depend on the quality of the optical system (aberrations and scattered light). Figure 4 shows the estimated relative bias in photometry, as a function of flux, for two apertures.
Currently, the photometry is not corrected for this effect, but offline tools are available to estimate the correction as a function of flux and to apply it to the data. The estimator is generated by binning the flux vs. annulus-background distribution in 0.3 dex bins and interpolating.
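The offline estimator can be sketched as follows; the function name and synthetic example are illustrative only.

```python
import numpy as np

def annulus_bias_curve(flux, ann_back, dex=0.3):
    """Median annulus background in equal 0.3 dex flux bins.

    Returns (log-flux bin centers, median backgrounds); the pair can
    then be interpolated to estimate the bias at arbitrary flux."""
    logf = np.log10(flux)
    edges = np.arange(logf.min(), logf.max() + dex, dex)
    idx = np.digitize(logf, edges) - 1
    centers, med = [], []
    for i in range(len(edges) - 1):
        sel = idx == i
        if sel.any():
            centers.append(edges[i] + dex / 2.0)
            med.append(np.median(ann_back[sel]))
    return np.array(centers), np.array(med)
```

The correction at a given flux f is then obtained by interpolation, e.g., np.interp(np.log10(f), centers, med).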

Interpolation Algorithm
As described in §4.9, some pixels are interpolated over. The interpolation algorithm is efficient and uses the following scheme. We select a stamp around the interpolated pixel and prepare a Gaussian kernel in which the central pixel is set to 0. The modified Gaussian kernel is normalized to unity, multiplied by the image stamp, and integrated over the stamp (i.e., a local convolution). The result of the local convolution replaces the value of the interpolated pixel.
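A sketch of this scheme for a single bad pixel follows; the stamp size and kernel width are illustrative.

```python
import numpy as np

def interp_pixel(image, y, x, sigma=1.5, half=3):
    """Replace pixel (y, x) by a Gaussian-weighted average of its
    neighbors, with the kernel's central pixel set to zero so that
    the bad value itself does not contribute."""
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    ker = np.exp(-(yy**2 + xx**2) / (2.0 * sigma**2))
    ker[half, half] = 0.0          # exclude the interpolated pixel
    ker /= ker.sum()               # normalize to unity
    stamp = image[y - half:y + half + 1, x - half:x + half + 1]
    return float(np.sum(ker * stamp))   # local convolution at (y, x)
```

Because only one stamp-sized multiply-and-sum is needed per bad pixel, the cost is independent of the image size.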
6. DATA PRODUCTS

In this section, the primary data products produced by the pipeline are described. Additional data products that are produced by the second (image-subtraction) pipeline, and by additional post-processing (e.g., light curves), are not discussed here. In §6.1 we describe the naming conventions we use for all the data products. Next, we describe the data products according to the following categories: single exposures (§6.2); matched catalogs (§6.3); asteroids (§6.4); and coadded images (§6.5). The pipeline I data products are summarized in Table 7.

Note. - List of external catalogs matched to sources in the merged catalogs and coadd-image catalogs (Table 6). For each source, a 32 bit integer mask indicates (one bit per external catalog) in which external catalogs there is a possible counterpart of the source. For each external catalog, a different search radius is adopted. The GLADE+ version we use was last updated on 2022-Jul-10. The NED redshift catalog was last updated on 2018-May-02.

The naming convention
All data products follow the same naming convention, with the same file-name format. The file names are unique, and they are concatenated from several strings separated by underscores ("_"). The rationale for this convention is that all file names have the same structure and can be parsed by a single routine using simple regular-expression commands. If some sub-string is not relevant to a file, it appears as an empty string; in this case, two (or more) successive underscores (e.g., "__") will appear in the file name. The strings, in their order of appearance in the file names, are: • Project Name - Project/telescope name. For LAST we use LAST.Node.Mount.Camera, where Node is the node index (1 for the first node in Neot-Smadar, Israel), Mount is the telescope-mount index (e.g., 1 to 12), and Camera is the camera-on-mount index (e.g., 1 to 4).
• Time -UTC date and time in format YYYYMMDD.HHMMSS.FFF.
• Field -The field ID.For LAST the field ID is a string of the format ddd±dd, indicating the RA and Dec in decimal degrees.
• Counter -Image counter.For LAST the image counter is usually 1 to 20, indicating the index of the image in the sequence of 20 exposures in a visit.
• CCDID -Detector ID.For LAST this is always 1.
• CropID -Index of the sub-image (1 to 24).0 or empty string is reserved for the full image.
• Level -One of the following strings describing the level of processing: raw -A raw image.
proc -A single processed image.
coadd -The coadd image of the visit or any other sequence of images.
merged -Catalogs based on data from multiple epochs.
ref -A reference image.
calib -A calibration image.
• Product -One of the following keywords describing the file content: -Image -Image data.
-Mask -A bit mask image.
-PSF -A PSF image, or cube.
-Cat - A catalog of sources detected in the image.
Note. - Summary of all data products (Table 7). #/visit/tel is the number of files produced per visit (assuming 20 epochs in a visit) per one telescope. Availability indicates the currently estimated lifetime of the product before deletion. Individual images can be saved from deletion on demand. Note that the sub-images and their corresponding catalogs include overlapping regions.
-TransientsCat -A Catalog of transient candidates.
-MergedMat - Merged matrices of sources detected in multiple epochs of the same field.
-Asteroid -A file containing information on asteroids detected in the image.
-Evt -A photon event file.
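The single-routine parsing mentioned above can be sketched as follows. The example name and the reduced field list are illustrative only; the real convention contains additional sub-strings (e.g., filter, Type, and Version) not modeled here.

```python
FIELDS = ["project", "time", "field", "counter",
          "ccdid", "cropid", "level", "product"]

def parse_name(fname):
    """Split an underscore-separated LAST-style file name into its
    sub-strings (empty sub-strings are preserved by str.split)."""
    stem = fname.rsplit(".", 1)[0]          # drop the extension
    return dict(zip(FIELDS, stem.split("_")))

info = parse_name("LAST.1.12.4_20230901.211500.123_123+45_005_1_7_proc_Image.fits")
```

Because every product shares this structure, the same parser serves images, masks, PSFs, and catalogs alike.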

Individual images
The individual image data products include the image itself, the corresponding bit mask image, the PSF, and a catalog (Product: 'Image', 'Mask', 'PSF', and 'Cat', respectively). The images are saved in FITS format, while the catalogs are in FITS binary-table format. The image headers contain additional information, like the limiting magnitude, the sky brightness, the fitted photometric zero points, the astrometric World Coordinate System (WCS), and parameters regarding the weather, enclosure, and mount (see Table 3). A single visit of a single telescope yields 1920 files (20 instances of 24 sub-images, each with 4 data products: 'Image', 'Mask', 'PSF', and 'Cat').

Visit matched sources
The sources in all the catalogs of a single sub-image in a visit are matched, and two data products are generated: (i) A FITS binary table containing the matched sources and some properties, including the proper-motion fits and variability indicators (see Table 4 for a column description); (ii) A matched-source matrices file, including an epoch vs. source matrix for selected properties like magnitudes and positions (see Table 5 for a matrices description). This data product is saved in an HDF5 file with multiple datasets. Each dataset corresponds to a different property (e.g., 'RA', 'MAG_APER_3'), and each such dataset contains a matrix (epoch vs. source). The magnitudes in these matrices are the calibrated magnitudes after the additional relative-photometry step. A single visit of a single telescope yields 48 files (24 sub-images, each with 2 data products).

Asteroids
Data on asteroid candidates are saved along with their metadata: the linear-motion fits, source matching, and image cutouts are stored in MATLAB (mat) files. For each field (all sub-images), a single mat file is saved, with the file-name sub-strings set to: Type='science', Level='proc', Product='Asteroids'.

Coadd images
The images and the bit mask of each visit are coadded. The main products of this step are the coadded image, the coadded bit mask, the catalog of the coadded image, and a PSF of the coadded image. These data products are similar to those generated for individual images. A minor difference is that the source-dominated-noise bit is populated only for the coadded images. This step produces 96 files per visit per telescope (24 sub-images, each with four data products).

7. ON-SKY PERFORMANCE
Here we present some initial results obtained by running the LAST pipeline on data collected by the LAST system and the LAST prototype telescopes. These results mainly demonstrate the pipeline performance, while the camera properties and site-related parameters are discussed in Ofek et al. (2023).

Astrometric precision
In Figure 5 we show the two-axis astrometric residuals, compared to GAIA-DR3, as a function of star magnitude for a single LAST image, while in Figure 6 we show the same for a coadd image. Following Ofek (2019), we define the asymptotic rms by fitting a 3rd-order polynomial to the rms vs. magnitude distribution, and choosing the minimum of the fitted function between the saturation limit and the faintest sources.
The improvement in astrometric accuracy between a 20 s image and a coaddition of 20 such images is only a factor of about two, instead of √20. This likely indicates that our image-registration process adds astrometric noise, and we are investigating this for future improvement. However, more precise astrometry is possible by averaging astrometric measurements obtained in different images. Figure 7 shows the relative-astrometry rms in declination as a function of Bp magnitude, calculated by averaging the astrometry of individual images. Dots of different colors represent various binnings of successive images: dark blue (Bin1) is for no binning, i.e., the precision obtained on a 20 s time scale; blue (Bin4) is for binning of four points (80 s time scale); and green (Bin16) is for binning of 16 points (320 s time scale). At least up to a binning of 16 epochs, the relative astrometry seems to improve roughly like the square root of the number of epochs.
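The Bin1/Bin4/Bin16 curves are produced by a simple successive-epoch binning, which can be sketched as:

```python
import numpy as np

def binned_rms(x, nbin):
    """rms of a measurement series after averaging successive,
    non-overlapping groups of `nbin` epochs."""
    n = (len(x) // nbin) * nbin            # drop the incomplete tail
    return float(np.std(x[:n].reshape(-1, nbin).mean(axis=1)))
```

For uncorrelated noise the rms should shrink like 1/√nbin; deviations from this scaling would point to correlated noise (e.g., registration errors or scintillation).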

Photometric calibration
The LAST data are calibrated against the GAIA catalog (§4.8), and all the magnitudes are in the native LAST AB system (Oke & Gunn 1983; see §4.8). The current LAST pipeline uses a simple calibration method (§4.8), while a more sophisticated algorithm is under development. In Figure 8 we present the absolute value of the residuals between the fitted and measured magnitudes, for a single LAST exposure, as a function of magnitude. The typical calibration is good to about 1.5% at the bright end.

Relative photometry and variability
The successive exposures of a visit are calibrated using a relative-photometry technique (§4.10). In Figure 9 we present an example of the rms vs. GAIA Bp magnitude derived from the relative-photometry calibration and measured over 200 successive images. This is done using aperture photometry with a 6-pixel radius, and hence the performance on faint objects is not optimal. Dots of different colors represent various binnings of successive images: blue (Bin1) is for no binning, i.e., the precision obtained on a 20 s time scale; orange (Bin4) is for binning of four points (80 s time scale); and yellow (Bin16) is for binning of 16 points (320 s time scale). On time scales of 60 min, we consistently achieve a precision of 3-4 millimag per 20 s exposure. Averaging over longer time scales (i.e., 80 s, 320 s) improves the precision; at least up to a binning of 16 epochs, the relative photometry seems to improve roughly like the square root of the number of epochs. The lines agree well with the theoretical Poisson noise plus about ∼1 millimag of extra noise (in 20 s exposures). The extra noise is likely dominated by scintillation noise (e.g., Young 1967; Osborn et al. 2015) and flat-field errors.

Coaddition image depth
The limiting magnitude of the coadd images is stored in the coadd-image header. From several nights of data we verified that, as expected (Ofek et al. 2023), the 20-exposure coadd image depth is about 1.4 magnitude deeper than that of the individual exposures. The difference from the expected improvement of 2.5 log10 √20 ≈ 1.6 mag is due to readout noise.
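The expected improvement quoted above follows directly from √N averaging of the noise in a background-limited coadd; a one-line check:

```python
import math

n = 20
# background-limited depth gain of an N-image coadd: 2.5 log10(sqrt(N))
dm_expected = 2.5 * math.log10(math.sqrt(n))
# the measured gain (~1.4 mag) falls short of this (~1.63 mag)
# because readout noise also contributes to the single-image noise
```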
8. TIMING

An important requirement for the LAST pipeline is its efficiency. The main reason for this is that, in order to simplify the observatory design, we would like to use limited computational resources. Specifically, a LAST node uses 24 high-spec desktop computers, each with 30 cores and 256 GB RAM. Therefore, we put effort into optimizing the code and minimizing its run time. Performance improvements, relative to, e.g., the original code presented in Ofek (2014), have been achieved in many aspects of the code, including astrometry, PSF fitting, source matching, and background estimation. For example, the source finding and measuring routine is about 30 times faster than SExtractor (Bertin & Arnouts 1996) when running on the same images with the same computational resources. Another example is that the FITS writer function's run time on test images is about three times shorter than that of an implementation using CFITSIO (Pence 2010).
The resources required for running the second-step pipeline are considerably less than those required for the first step pipeline, described in this paper.Therefore, from a design point of view, they do not pose a big challenge in terms of computation time.
Figure 10 shows a profiling flame graph for processing a single visit from a single telescope. The computational load is dominated by the saving of the data products (not shown in the figure) and by arithmetic calculations on large matrices, while the overhead from the object-oriented design and data management is small.
A few examples: processing the 480 individual sub-images in a visit of a high-Galactic-latitude field takes 50% of the run time, working on the merged catalogs and coadd images takes about 30% of the run time, and 8% of the run time is spent on writing the data products. For a high-Galactic-latitude, low-density field, the run time of the pipeline on a single visit is less than 5 min. However, for very high-density fields the run time can be significantly longer. Therefore, we are putting effort into making the code even more efficient.

9. CONCLUSION

We presented the Large Array Survey Telescope pipeline I, which is responsible for processing a single LAST visit, i.e., N exposures of a single field.
The LAST observational strategy provides an opportunity to study the short-term variability of celestial objects, and to screen a large fraction of asteroids and satellite glints in a single visit.Our pipeline takes advantage of this strategy to yield data products that will allow us to explore short-time scale variability and motion of the objects of interest.
A challenge related to any pipeline that handles a high data rate is that the number and size of the data products are overwhelming. For example, a single LAST telescope produces about 2000 data-product files for each visit (400 s). To avoid excessive read times and storage costs, we avoid ASCII tables, and our data are kept exclusively in binary files. Furthermore, given that opening, reading, and closing data files are expensive operations, it is desirable to store the information in a diversified manner, with smaller data files dedicated to specific products.
We plan to release the LAST data products periodically. These data releases will include the processed images and catalogs described in this paper and in the forthcoming Paper II. Since the LAST pipeline is constantly updated, we keep a live version of this document, as well as online wiki pages describing the tools used by this pipeline.

E.O.O. is grateful for the support of grants from the Willner Family Leadership Institute, André Deloro Institute, Paul and Tina Gardner, the Norman E. Alexander Family M Foundation ULTRASAT Data Center Fund, Israel Science Foundation, Israeli Ministry of Science, Minerva, BSF, BSF-transformative, NSF-BSF, Israel Council for Higher Education (VATAT), Sagol Weizmann-MIT, Yeda-Sela, and the Rosa and Emilio Segre Research Award. V.F.R. acknowledges support from the German Science Foundation DFG, via the Collaborative Research Center SFB1491: Cosmic Interacting Matters - From Source to Signal.
Fig. 1. - The theoretical information overlap of all the Gaussian filters we are using for source detection, as a function of the source's actual σ-width. The dashed lines are for individual filters (color-coded; i.e., 0.1, 1, 1.5, 2.6, and 5 pixels), while the solid line is for the maximum over the range. The delta-function filter (of 0.1 pix width) is shown by the peak on the left. The information content is calculated using the formula in Zackay & Ofek (2017a). The meaning of the information-content loss is how well the filtering process can detect a source whose PSF is perturbed (i.e., has a different FWHM) compared with the filter PSF. The information loss is proportional to the variance and hence to (S/N)^2.

Fig. 2. - The apparent magnitude vs. angular speed of all known numbered and unnumbered minor planets at some arbitrary epoch. Also shown, as horizontal lines, are the angular-speed regions discussed in §4.11. The two vertical lines show the single-exposure (19.4) and visit (21.0) limiting magnitudes of LAST.
Fig. 3. - The measured annulus background vs. aperture photometry (in a 6-pixel radius) in a random LAST image. Saturated sources are marked as blue points.

Fig. 6. - Same as Fig. 5, but for a coadd image.
Fig. 5. - Astrometric two-axis rms vs. Bp magnitude for a random single 20 s exposure LAST image.

Fig. 7. - The one-axis declination rms of the relative-astrometry solution, measured over 100 epochs of 20 s exposures, vs. the GAIA Bp magnitude. The colors represent single epochs (dark blue; Bin1), an average over 4 epochs (blue; Bin4), and an average over 16 epochs (green; Bin16).
Fig. 9. - The rms of the relative-photometry solution, measured over 200 epochs, vs. the GAIA Bp magnitude. The colors represent single epochs (blue; Bin1), an average over 4 epochs (orange; Bin4), and an average over 16 epochs (yellow; Bin16). The photometry is based on the 6-pixel-radius aperture photometry; therefore, its performance is sub-optimal at the faint end.