TRANSLIENT: Detecting Transients Resulting from Point-source Motion or Astrometric Errors

Detection of moving sources over a complicated background is important for several reasons. First is measuring the astrophysical motion of the source. Second is that such motion resulting from atmospheric scintillation, color refraction, or astrophysical reasons is a major source of false alarms for image-subtraction methods. We extend the Zackay, Ofek, and Gal-Yam image-subtraction formalism to deal with moving sources. The new method, named the translient (translational transient) detector, applies hypothesis testing between the hypothesis that the source is stationary and that the source is moving. It can be used to detect source motion or to distinguish between stellar variability and motion. For moving source detection, we show the superiority of translient over the proper image subtraction, using the improvement in the receiver-operating characteristic curve. We show that in the small translation limit, translient is an optimal detector of point-source motion in any direction. Furthermore, it is numerically stable, fast to calculate, and presented in a closed form. Efficient transient detection requires both the proper image-subtraction statistics and the translient statistics: When the translient statistic is higher, then the subtraction residual is likely due to motion. We test our algorithm both on simulated data and on real images obtained by the Large Array Survey Telescope. We demonstrate the ability of translient to distinguish between motion and variability, which has the potential to reduce the number of false alarms in transients detection. We provide the translient implementation in Python and MATLAB.

1. INTRODUCTION Detection and measurement of sub-pixel motions of sources in astronomical images is required for two major science cases: (i) Detecting motion due to any kind of astrometric noise, or astrophysical motion, that leads to subtraction artifacts seen in image differencing methods, and in turn, leads to false alarms in transient detection; (ii) Measuring astrophysical motion, including proper motion and parallax measurement, detecting astrometric binaries (e.g., Malkov et al. 2012), astrometric microlensing (e.g., Paczyński 1998;Gould 2000;Lu et al. 2016;Sahu et al. 2017;Ofek 2018), lensed quasars (Springer andOfek 2021a, Springer andOfek 2021b), binary quasars (e.g., Liu 2015, Shen et al. 2019), and studying the black hole in the center of our Galaxy (e.g., Ghez et al. 2005;Gillessen et al. 2009).
Although point-spread function (PSF) fitting (e.g., Stetson 1987;Schechter et al. 1993) is an excellent approach for measuring point source positions, and hence, their motion, it is biased in the presence of a complicated background (e.g., in galaxies).Due to their ability to work in complicated background regions, image subtraction algorithms are the method of choice for transient detection (e.g., Phillips and Davis 1995;Alard and Lupton 1998;Bramich 2008;Zackay et al. 2016).Historically, image differencing resulted in subtraction artifacts, non-Gaussian noise, and excess noise compared to the expectation.Some of these problems were solved by the Zackay et al. (2016) (ZOGY) algorithm, which provides an optimal, anti-symmetric, fast, with correct noise propagation, and numerically stable solution to the transient detection problem.Indeed, the application of ZOGY results in a reduced amount of false alarms, sometimes by two orders of magnitude, compared to older methods.
However, some of the assumptions in the derivation of all the existing methods are still not accurate (see discussions in Zackay et al. 2016).One such assumption, which is the subject of this paper, is that the reference image and the new image are perfectly registered.The breakdown of this assumption is yet another leading reason for the presence of false alarms in transient detec-tion.In practice, the breakdown of this assumption is unavoidable for several reasons: (i) For ground-based observations atmospheric scintillation shifts the position of stars by an amount typically larger than the measurement error due to the Poisson noise.Furthermore, these shifts are correlated within the < 1 arcmin-size isoplanatic patch, meaning that in practice the stars' positions are shifted, (almost) randomly, with respect to their mean position.(e.g., Lindegren 1980;Shao and Colavita 1992;Ofek 2019).(ii) Color refraction in the atmosphere or in a telescope, along with the variance of stars' colors, introduces variations in the relative positions of stars, as a function of the airmass or the focal plane position, respectively (see e.g., Zackay et al. 2016).(iii) Poisson errors are limiting the accuracy of image registration; and (iv) Proper motion of astrophysical objects.Zackay et al. (2016) suggested two methods to deal with these astrometric errors.The first method is to propagate the measured (global or regional) astrometric noise of the image into the image subtraction formula.This approach does not attempt to measure motion, but only to ignore flux residuals which may be due to shifts in source positions between the new and the reference images.Zackay et al. (2016) demonstrated the ability of this approach to deal with astrometric errors.A disadvantage of this method is that it is sub-optimal, and requires knowledge about the local astrometric errors8 .If for example, the displacement of a source between the new and reference images is larger than the assumed astrometric noise, the displaced source will likely generate a detection in the subtraction image.The second suggestion made in Zackay et al. (2016), was to model the flux residual due to a specific astrometric shift in the position of a source by modifying the PSF of one of the images to account for the astrometric misalignment.This suggestion has not been derived and implemented as of yet and is the subject of this paper.
In the current work, we derive a formalism for the detection and measurement of a moving point source on a general static background.Additionally, we prove that in the small translation limit, our detector is an optimal detector of point source motion in any (non-specific) direction.This property is what essentially enables the employment of the discussed method as a blind detection scheme and not as a post-fitting scheme.We also discuss the extent to which small translations can be distinguished from local flux changes and find a set of nondegenerate observational parameters for this task.We refer to the translational transient detection method presented below by the name translient and to transient events arising from point source motion as translients (translational transients).This paper deals with deriving the formalism and testing the new method on simulated and real data.
The structure of the paper is as follows: In §2 we develop a statistical image formation model, derive an optimal statistic to perform hypothesis testing in the motion-only scenario, and derive the likelihood of observing a particular difference image when both motion and a change of flux are taking place.The statistical sufficiency of this difference image is discussed in Appendix A. In §3 we use simulated images to measure the performance of a translient detector as compared to the proper subtraction detector at various levels of background noise and translation size.We additionally use these simulations to assess the ability to differentiate between motion and flux variation when translations are smaller than the PSF size.In §4 we present some tests of translient on real astronomical images, in §5 we describe our software implementations, and we conclude in §6.
2. METHODS DERIVATION Extending Zackay et al. (2016), we derive the formalism needed to detect and measure point sources moving on a general static background.In §2.1 we outline the employed statistical model of formation of astronomical images.We start by discussing the case of a particular motion in §2.2, and continue with the case of a general motion (direction and amplitude) in §2.3.Next, in §2.4 we discuss how one can distinguish between flux variation and motion, and in §2.5 we present a method to fit the motion and flux variation simultaneously.

Image formation model
Our model for the reference image (R) and the new image (N ) is given by: R = (T + α r δ q ) ⊗ P r + ǫ r , (1) (2) Here ⊗ denotes convolution, T is a general unknown true background image, r and n subscripts indicate the reference image and the new image, respectively, α r and α n are the observed point source fluxes, P r and P n are the point-spread-functions (PSF), and δ q and δ p denote Dirac delta functions centered at image positions q and p, in the reference and new image, respectively.The additive background noise of the images are ǫ r and ǫ n and these are assumed to contain independent and identically distributed (i.i.d.) per-pixel Gaussian noise having zero mean and known variances, σ 2 r and σ 2 n , respectively.This assumption is typically valid since in most cases astronomical images are in the background-noise-dominated regime, therefore, σ r and σ n can be approximated as scalars.Furthermore, the background is high enough such that the Gaussian noise approximation holds (see Ofek and Zackay 2018 for matched filtering in the Poisson noise case).We further assume that the images are Nyquist sampled9 , and are flux matched10 .

Detecting pure motion for particular translation
We now specialize our model to the case where we assume that the point source has a constant flux (α r = α n ≡ α).At each reference image position q we would like our detector to test between the following two hypotheses: where the null hypothesis means that the point source is static in both images R and N .Note that a static point source can equivalently be absorbed into the background image T .The alternative hypothesis H 1 states that a point source, existing in image R at position q, has translated to position p in the image N .
Motivated by the Neyman-Pearson lemma11 (Neyman and Pearson 1933), and following ZOGY, we express the likelihood-ratio of the alternative and the null hypotheses12 LR(q; p, α) = P (N, R|H 1 ) where we have used the law of conditional probability in the second equality; and in the third equality we used the fact that the probability distribution of R is insensitive to which hypothesis is valid.We use the following definition of the discrete Fourier transform (DFT) of an m×m pixel image f [x, y] (6) Here the hat sign marks the Fourier transform operator, the dot sign indicates the dot product, m is the image size, and k ≡ (k x , k y ) are the frequency coordinates.Taking the Fourier transform of R and N (applying the convolution theorem) Eliminating T between Equations 7 and 8 we obtain We now use the fact that the DFT of an m × m zero mean Gaussian noise image with i.i.d.pixels each with variance σ 2 , is a complex valued m × m random variable with i.i.d.real and imaginary frequency coefficients each with variance m 2 σ 2 /2.We can therefore express the loglikelihood (up to a constant term which does not depend on the data) of the new frequency-image conditioned on the reference image under H 0 as log where the second line can be obtained from the first line by multiplying the numerator and denominator by | P r | 2 and using the definition of the variances of the noise terms ǫ r and ǫ n .In a similar fashion, for the alternative H 1 and setting α ≡ α r = α n , we obtain Writing a ≡ P r N − P n R and b ≡ α P n P r ( δ q − δ p ) we can express the log-likelihood-ratio (up-to a constant) as follows where the |b| 2 term can be dropped because it does not depend on the observations.Here, ℜ represents the realpart function, and the bar sign above a variable indicates the complex-conjugate operator.If the bar comes above the hat sign it means that the complex conjugate operator follows the Fourier transform.Defining the translation vector ∆ ≡ p− q, and the center position x c between point source positions p and q to be x c ≡ (x c , y c ) ≡ p+ q 2 , we obtain where the approximation between the second and third lines holds for sufficiently small translations where we may keep only terms first order in k • ∆.
Using the fact that where the ℑ function is the imaginary-part function that replaces the ℜ function due to multiplication by the imaginary unit i. Substituting the above into the ex-pression for the LLR (Equation 12) we find that Defining the following two component frequency image (one component for each spatial frequency direction) and identifying the inverse DFT, with respect to the image coordinate x c , in the last expression for the LLR, we define and find (to first order in k • ∆) that where we use the symbol ∆ u ≡ ∆ || ∆|| to denote the unit vector in the direction of the translation of ∆.
Then, according to the Neyman-Pearson lemma, rejecting H 0 when ∆ u • ℑ [ z] > η, for some threshold η, is the most powerful test at significance level P ( ∆ u • ℑ [ z] > η|H 0 ).This assumes that the direction of translation, ∆ u is known.We generalize this in the following section.

Detecting pure motion for any translation
To find a statistic that is most powerful at a certain significance level for any direction of translation, we modify the alternative hypothesis of the previous section to the following H1 : This leads to the following likelihood ratio: Defining θ to be the angle between ∆ u and the positive x-axis in the image plane, then which, for any positive value of α∆, is an increasing function of ||ℑ [ z] || (this can be shown using the fact that for every r > 0, d dr [exp(r) + exp(−r)] = exp(r) − exp(−r) > 0).
We thus conclude that a threshold test of the statistic , is a most powerful test, at a specific significance level, for detecting a point source, having a center pixel coordinates (x, y) ≡ x c ≡ ( p + q)/2 undergoing a translation in any direction.We call this last expression the translient statistic (Z 2 ).An important property of Equation 22is that it is anti-symmetric13 and numerically stable14 (i.e., no division by zero).Given the definition of Z 2 , its values are distributed like the χ 2 distribution with two degrees of freedom.In the case that we are interested only in knowing which model (flux variation or motion) better explains the residuals, we can compare the translient statistic Z 2 (Equation 22) against the proper subtraction statistic S, where S is defined in Equations ( A26) and (A27) of Zackay et al. (2016).As the two statistics are strictly not produced by nested models, the comparison has to occur indirectly.We use two equivalent methods: (i) to convert the Z 2 to Gaussian significance (assuming χ 2 distribution with two degrees of freedom) -we call this significance Z σ , and to compare it with |S|; (ii) Since S follows a normal distribution, S 2 is distributed like a χ 2 distribution with one degree of freedom.Therefore, we can compare S 2 + 1 to Z 2 .
We note that before using S 2 and Z 2 (or S and Z σ ), it is recommended to normalize them empirically, such that S will have a mean (or median) of zero and a standard deviation (or robust standard deviation) of 1.Similarly, S 2 and Z 2 can be normalized such that their mean, median, or variance, will be k, k(1 − 2/(9k)) 3 , or √ 2k, respectively, where k is the number of degrees of freedom.Finally, if Z 2 > S 2 + 1 (or Z σ > |S|), then the motion model is preferred, while if Z 2 < S 2 + 1, then the variability model is more likely.
An important implementation comment is that, as seen in Figure 2, the local maxima in |S| and Z σ are not at the same location.Therefore, our recommendation for implementation is as follows: (i) Look for a local maximum in |S| that is larger than the detection threshold (e.g., 5σ); (ii) For each local maximum in |S|, look for the maximum value of Z σ within a radius of X pixels from the local maximum position in |S|; (iii) If |S| > Z σ , then declare that the local maximum is a transient candidate.In our own implementation which is described below, we used a radius of X = 5 pix (about 2-2.5 times the FWHM).When we compare |S| and Z σ , we always refer to the maximum of Z σ found near the local maximum of |S|.

Fitting motion and flux variation
Another approach is to fit a model that mixes flux variations and motion, and therefore allows one to consider more complicated cases, in which the transient is both moving and variable.To do so, we use the new and reference image models defined in equations 1 and 2, where we now allow for observed flux variation, so that α r will generally differ from α n .Using the definitions of the Fourier transforms of the reference image R and the new image N (equations 7 and 8), we may eliminate T by writing This motivates the introduction of the following frequency domain difference image: Here, we get the second line by using equations 2 and 1.In Appendix A we show that this specific choice of a difference image D T captures all the available information (from the observed R and N ) for the purpose of finding the model parameters α r , α n , q, and p.In other words, that D T is a sufficient statistic15 with respect to the model parameters.
We now express the log-likelihood of observing D T for a given α r , α n , q and p = q + ∆ as where we have included the Gaussian normalization term We can specialize Equation 25 to the small translation limit by keeping only terms first order in k • ∆.This can be achieved by substituting: where we again use the image coordinate x c ≡ q+ p 2 .In this limit, a set of independent parameters for the likelihood are the flux difference (α r − α n ), the flux sum multiplied by the magnitude of translation (α r +α n )||∆|| and the angle θ between the direction of translation ∆ u and the positive direction of the x-axis.
We also write Equation 25 using the following two alternative parametrizations (which we will find useful in §3.2): and we denote the linearized version of Equation 25where we substitute the above approximation for α r δ q − α n δ p (the left hand side of Equation 28) as The importance of D T is that it can be used to fit α r , α n , ∆, x c , simultaneously (see §3.2).
A disadvantage of this approach over the test suggested in §2.4 is that if the source flux is constant, fitting the full model will result in lower sensitivity, compared to employment of Z 2 .Furthermore, fitting the likelihood in Equation 30 is computationally expensive.Therefore, we suggest using this approach only in the rare cases when both flux variations and motion are suspected.

SIMULATIONS
In this section, we demonstrate the operation and measure the performance of the translient statistic Z 2 (x, y) (Equation 22), as well as of the difference image likelihood function log P ( D T |α r , α n , q, p) (Equation 25).Specifically, we measure the performance of Z 2 (x, y) as a detector of pure motion in §3.1 and the ability of log P ( D T |α r , α n , q, p) to distinguish between motion and flux variation in §3.2.
The performance of the new algorithm is shown on simulated 64 × 64 pixel image pairs, R(x, y) and N (x, y), the synthetic reference, and new images, respectively.These images simulate the generative process of equations 1 and 2 assuming a static true background image (setting T = 0).Image pairs are generated using 2-D Gaussian profile point spread functions P r (x, y) and P n (x, y) having various aspect ratios and orientations (defined below).To generate R and N with sub-pixel point source positions q and p we first evaluate the (continuous) 2-D Gaussian profiles used to generate P r and P n at sub-pixel offsets and then add zero mean per pixel Gaussian noise with variances σ 2 r and σ 2 n respectively.In Figure 1 we show an example of one such image pair.

Performance evaluation of the translient detector
We now test the performance of the translient statistic and compare it to the proper image subtraction statistic.Proper image subtraction was designed to be an optimal detector of a general local change in flux and does not assume that the transient is due to a translating point source.We thus expect that the optimality of the translient statistic in this specific translational data generation process would lead to enhanced detector performance.For this purpose, we generate a set of six simulations having varying background noise levels (simulations 1-3) and translation sizes (simulations 4-6).The full set of parameters used in these simulations is summarized in Tables 1 and 2. In each simulation, two sets of 10 4 image pairs were generated -one set was of positive examples that did contain a translating point source (such as the pair shown in Figure 1) and the other set was of negative examples which contained only background noise.
For each image pair, we computed the translient statistic image Z 2 (x, y) as well as the (squared) proper statistic image S 2 (x, y).We show in Figure 2 an example of these translient and proper statistic images evaluated on the positive example of Figure 1.We typically see that the center position of the sources coincides with the position of the peak in Z 2 (x, y) while in S 2 (x, y) a double peak (on both sides of the source center position) is visible.In Figure 3 we show an example of the two image statistics when they are evaluated on a negative example pair.These patterns, resulting from the background noise, are eventually responsible for the false positive detection events of each of these detectors (note that the contrast in Figure 3 was enhanced by a factor of ∼5 compared to that of Figure 2).
To quantitatively compare the performance of each image statistic as a detector of pure translation we compute for each simulation and for each image statistic  2).The red and black crosses mark the positions of the original point sources in the reference and new images respectively.It can be seen that the translation is a onepixel translation (diagonally), which is considerably smaller than the width of the PSFs and that the Gaussian PSF profile of the new image differs from that of the reference image by a 90 • rotation.(translient and proper) a curve showing the relative rates of the true positive detection as a function of the false positive detection rates (known as a receiver operating characteristic [ROC] curve).We define the true positive detection rate of each detector (above some threshold) to be the number of positive examples that were detected as such, divided by the total number of positive examples in the simulation.Similarly, the false positive rate is the number of negative examples that the detector incorrectly classified as positive examples divided by the total number of negative examples.
For a given threshold level, we define each detector's response to be positive if at least one pixel in the statistic image was above this threshold.We show the resulting curves for simulations 1-3 in Figure 4 (where the noise level was varied), and for simulations 4-6 in Figure 5 (where the translation length was varied).One can see, that for all simulations and at all values of the false-positive rate, that the translient statistic detector achieves a higher true-positive detection rate compared to that of the proper statistic detector.One can also see that lower levels of noise and larger translations allow  for simulations 1-3 (varying levels of flux noise) showing the rates of true-positive events as a function of the false-positive event rate.These curves were generated by varying detector threshold levels η over an appropriately wide range.Each color represents a simulation index (noise level) while solid (dashed) curves represent the resulting performance when using the translient statistic (proper statistic).One can see that for all simulations and at all values of the false-positive rate, the translient statistic achieves a higher true-positive detection rate compared to the proper statistic.We also see that lower levels of noise allow the detectors to reach higher true-positive rates at set false-positive rates.
both detectors to reach higher true-positive rates for a given set of false-positive rates.

Performance evaluation of fitting for motion and flux variation simultaneously
In some rare cases, one may want to check the possibility that the source is both moving and variable.For this purpose in §2.5 we derived expressions for the likelihood of observing the difference image D T due to a point source centered at x c that has translated ∆ pixels and changed flux from α r in the reference image to α n in the new image (Equations 28,29 and 30).In this section, we use simulations 7 and 8, where both the flux and the position of the point source change, and evaluate the likelihood of these observations.The full set of parameters of simulations 7 and 8 is provided in Table 3. Simulation 7 differs from simulation 8 such that in simulation 7 the translation is larger than the size of the PSF, whereas in simulation 8 the translation is marginally smaller.To evaluate the likelihood functions in this section we perform Markov chain Monte Carlo (MCMC) sampling using the emcee package (Foreman-Mackey et al. 2013) by randomly initializing 100 chains near the parameter origin and running each chain through 100 warm-up iterations and an additional 650 sampling iterations.
We show in Figure 6 the resulting MCMC sample distribution of the likelihood function P ( D T |α r , α n , ∆ x , ∆ y ) (Equation 28, after setting the point source center position x c to its true simulated value), when evaluated on simulation 7.In the case of simulation 7, where the point source is well separated between the reference and new images (compared to the PSF widths), one can see that the model parameters ∆ x , ∆ y , α r , α n are non-degenerate, and that the resulting sample distribution may be used to distinguish between a change in flux and a translation of the point source.
In Figure 7 we show the resulting MCMC sample distribution for simulation 8, where the convolved point sources of the reference and new images are not well separated.One can see that in this case when using model parameters ∆ x , ∆ y , α r , α n , there is a strong dependence (or approximate degeneracy) between these parameters.Such a dependence hampers the ability to distinguish between a change in flux and a translation of the point source.
To partially deal with this degeneracy, one can switch to a new set of (more orthogonal) parameters: ||∆||, θ, α r − α n and α r + α n .In Figure 8 we present the result of sampling the same distribution (for simulation 8), but with this new set of parameters.One can see that this change of variables largely decouples most (but not all) of the parameters, leaving a clear dependence between ||∆|| and α r + α n .
Finally, we show in Figure 9 the result of sampling the linearized likelihood (Equation 30), again for the "small translation" simulation 8.The linearized likelihood depends only on the three parameters α r − α n , (α r + α n )||∆|| and θ (after setting x c to its true simulated value).One can see that when using this reduced set of model parameters, the likelihood does not show a strong dependence between the parameters.To summarize, in some circumstances, inference using this set of model parameters can allow us to distinguish motion   1 and 2 while here H and W denote the equal horizontal and vertical Gaussian profile diameter of the reference and the PSFs of the new image.In both simulations all the parameters other than the translation length are equal.In these two simulations, both the flux and the point source position are changed between the reference and new images.In simulation 7 the motion is larger than the PSF resolving diameter, while in simulation 8 it is marginally smaller than the PSF. and flux variation when the translations are smaller than the PSF.

TESTS ON REAL ASTRONOMICAL IMAGES
To test the performance of the translient statistic Z 2 (Equation 22) on real astronomical data we have used images obtained with the Large Array Survey Telescope (LAST; Ofek et al. 2023a;Ben-Ami et al. 2023).These images have been taken over the course of one night on 2023 Apr 25 from the LAST site at Neot Smadar, Israel.The data was reduced using the LAST pipeline (Ofek 2014;Soumagnac and Ofek 2018;Ofek 2019;Ofek et al. 2023b).One of the most important applications of Z 2 is to distinguish between flux-variation and motion of a source.To test this capability, we derived Z σ and |S| on consecutive images of a stationary variable source, and of a moving non-variable source.As explained in §2.4,we first looked for local maxima in |S|, and then chose the maximum of Z σ within 5 pixels from the local maximum of |S|.
For the variable source, we analyzed images captur-  25in k • ∆.Here, evaluation is again for the small translation simulation 8 (as in Figures 7 and 8).One can see that this reduced set of model parameters (in the linearized likelihood) does not show a strong dependence and may therefore be preferable in the small translation limit.
ing LINEAR 3506385, an eclipsing binary with a period of ≈ 0.22 d and an average V-band magnitude of 15.96 (Drake et al. 2014).In the selected data, the observations cover a brightening phase of the binary.We thus use one of the first images as the reference image and perform subtraction of subsequent images, registered to the reference image.The standard deviation in the astrometric solution of the binary's Right Ascension and Declination coordinates within the data is about 80 mas.As the binary brightens over time, both statistics increase, whereas |S| leads to overall higher significance than Z σ (Figure 10).This result is consistent with the interpretation of a variable stationary source.As a moving source example, we analyzed images capturing the asteroid 3832 (Shapiro).The asteroid's motion between two successive images is about ≈ 0.3 arcsec, whereas the LAST pixel scale is 1.25 arcsec pix −1 .We used the first image as a reference image so that the time series of subsequent registered images represents a source moving further away from its initial position.Figure 11 shows |S| and Z σ as functions of the asteroid's motion relative to the reference image.The results clearly show that for small translations, when the shift is smaller than the PSF size, Z σ yields a larger significance than |S|.
Once the asteroid has moved far enough to clearly separate it in the new image from itself in the reference image, it effectively becomes a transient, and |S| overtakes Z σ .function returns z data product.This function, along with the likelihood functions and code to generate the plots in this paper, are available from GitHub16 .
The MATLAB code is available as part of the As-troPack/MAATv2 package available from GitHub17 .The MATLAB code includes a core function18 , a ZOGY plus Translient class19 , and many high-level functions that allow the construction of the PSF, registering the images, measure the zero points, apply proper subtraction, and more.The image subtraction utilities are also described, with examples, in the AstroPack live wiki page20 .

DISCUSSION
We have presented a novel extension to the work of Zackay et al. (2016) to detect and measure transients that result from a point source undergoing translation given a pair of temporally separated images of the event.
For the case of a pure translation (no flux change), and when the translation is smaller than the PSF size, we derive an optimal statistic that can detect such translients for any, a priori unknown, direction of motion.We show, using simulated examples, that our derived statistic, when applied to moving sources, reaches a higher true-positive detection rate at all false-positive detection rates, compared to the proper subtraction statistic of Zackay et al. (2016).This is the case at various levels of background noise and for various small, compared to the PSF, non-zero translations.
When dealing with transient detection, this method should be used together with the proper image subtraction (ZOGY) method.While ZOGY is optimal for flux variation, translient is optimal for motion.Therefore, the combination of the two methods allows one to discern between variability (or transient detection) and motion (e.g., due to astrometric scintillation noise).Specifically, calculating both statistics and comparing their significance allows us to tell which model better explains the flux residuals in the subtraction image -is it variability or motion?We have demonstrated this capability using real telescope images of an asteroid and a variable source.We argue that this is a powerful statistic to distinguish between transients/variables and source motion due to registration errors, scintillation noise, color-refraction, or astrophysical motion.
For the case where the transient detection results from both a change in position and a change in flux, we additionally derive the likelihood function for observing a particular difference image (shown to sufficiently capture all the necessary information) which is then used to disambiguate changes in flux and position.Using simulated examples, we have studied the behavior of this likelihood function, when the translation is either larger or smaller than the PSF diameter, and using three different param-eterizations of the likelihood.We show that for large translations, this likelihood function can distinguish between motion and flux variation in all cases.For translations smaller than the PSF, a degeneracy (or strong dependence) of the parameters occurs in the likelihood in some of the parameterizations.Using a linearized version of the likelihood, we have found a reduced set of non-degenerate parameters which are preferable for discrimination between translation and flux variation in the case of small translations.
Our method is unique in that it enables the detection of translating point sources in crowded fields and on top of a complex background for which no good models exist.It is derived from a well-defined generative process, is essentially non-parametric, and is optimal in the case of small pure translations.Particular cases where the newly developed detector method may find use are (but not limited to): (i) reducing the amount of false alarms, due to motion, in a transient detection algorithm implementation; and (ii) measuring the motion of astrophysical sources on top of a complicated background.

2. 4 .
Distinguishing motion from flux variation Given a transient detection, we would like to know what the likely cause of the transient detection is: a motion of a point source or a change in its flux.
Note. -Here αr (αn) is the flux level of the point source in the reference (new) image, Wr and Hr (Wn and Hn) are the Gaussian profile width and height parameters of the reference (new) image PSF in pixels and βr (βn) is the orientation angle of the PSF in the reference (new) image.Note.-Here for each simulation, σr (σn) is the standard deviation value of the per-pixel additive white Gaussian flux noise in the reference (new) image R (N ), measured in the αr (αn) arbitrary flux units.The components of the translation vector (∆x, ∆y) ≡ ∆ ≡ p− q between the position of the point source in the reference and new images, are in the units of pixels.Simulations 1-3 vary the flux noise at constant translation and simulations 4-6 vary the length of the translation vector at constant noise levels.

Fig. 1 .
Fig. 1.-Example of a pair of reference and new images generated in simulation 1 (see Table2).The red and black crosses mark the positions of the original point sources in the reference and new images respectively.It can be seen that the translation is a onepixel translation (diagonally), which is considerably smaller than the width of the PSFs and that the Gaussian PSF profile of the new image differs from that of the reference image by a 90 • rotation.
Fig. 2.-Resulting translient statistic (Z 2 ; left panel) and proper statistic (S 2 ; right panel) images computed for the image pair shown in Figure 1.The red crosses mark the location of xc ≡ ( p + q)/2, the center between the positions of the point sources in the reference and new images.In the squared translient statistic image one can see a single peak approximately centered at xc whereas in the squared proper statistic image two peaks on opposite sides of xc (diagonally) can be observed.This is the typical qualitative appearance of the two detector images when the data is generated by a point source undergoing pure translation (without any flux change).
Fig.3.-Resultingtranslient statistic (Z 2 ; left panel) and proper statistic (S 2 ; right panel) images computed for zero-mean noiseonly images (no moving point source).These patterns (arising from the action of the translient statistic and proper statistic on the observational noise) are responsible for false-positive detection events (at a certain detection threshold), i.e., cases of an event being classified as a detection of a translating point source when there was none.The highest peak, on the left panel, (Z 2 ∼ = 11) corresponds to a false alarm probability of 0.004 per pixel (i.e., χ 2 distribution with two degrees of freedom), and close to unity for 64 2 trials.

Fig. 4 .
Fig. 4.-Receiver operating characteristic curve (ROC curve)for simulations 1-3 (varying levels of flux noise) showing the rates of true-positive events as a function of the false-positive event rate.These curves were generated by varying detector threshold levels η over an appropriately wide range.Each color represents a simulation index (noise level) while solid (dashed) curves represent the resulting performance when using the translient statistic (proper statistic).One can see that for all simulations and at all values of the false-positive rate, the translient statistic achieves a higher true-positive detection rate compared to the proper statistic.We also see that lower levels of noise allow the detectors to reach higher true-positive rates at set false-positive rates.
Note. -Parameters of the additional simulations discussed in §3.2 to study the ability to distinguish between point source motion and point source flux variation.The columns are similar to those of Tables

Fig. 6 .
Fig. 6.-The likelihood function P ( D T |∆x, ∆y, αr, αn) evaluated for simulation 7 using the (true) point source center location xc = ( p + q)/2 as a function of model parameters ∆x, ∆y, αr, αn (see §2.4).The evaluation was performed using Markov chain Monte Carlo sampling.The figure shows the marginal distributions over single parameters (upper diagonal) as well as the marginal distributions for all parameter pairs (off-diagonal).The blue point indicates the true model parameters while the curves indicate the 68%, 95% and 99.7% confidence regions.The dashed vertical lines indicate the 1-sigma confidence intervals in the 1-D marginals.

Fig. 7 .
Fig. 7.-The likelihood function P ( D T |∆x, ∆y, αr, αn), similar to Figure 6 but evaluated for simulation 8 where the translation length is smaller than the resolving width of the PSFs.

Fig. 8 .
Fig. 8.-The likelihood function P ( D T | ||∆||, θ, αr − αn, αr + αn), similar to Figure 7 and similarly evaluated for simulation 8 (where the translation is small compared to the PSF) but using a different set of variables to display the same distribution.One can see that this change of variables largely decouples most (though not all) of the parameters, leaving a clear dependence between ||∆|| and αr + αn.

Fig. 9 .
Fig. 9.-The likelihood function P ( D T |αr − αn, (αr + αn)||∆||, θ), obtained by linearizing Equation25in k • ∆.Here, evaluation is again for the small translation simulation 8 (as in Figures7 and 8).One can see that this reduced set of model parameters (in the linearized likelihood) does not show a strong dependence and may therefore be preferable in the small translation limit.
Fig. 10.-A comparison of Zσ (blue empty X's) and |S| (orange empty circles) for a series of images capturing a variable star in the FoV.The measured LAST magnitude of the star (green solid circles) is also shown, with its value read on the right y-axis.For a stationary variable star, |S| is dominant.
Fig. 11.-Upper panel: A comparison of Zσ (blue empty X's) and |S| (orange empty circles) for a series of images capturing a moving source (asteroid 3832 Shapiro).The images are taken sequentially within one hour during the same night.The first image serves as the reference image.Each sequential new image thus represents a source moving away from its initial position.The distance, that the asteroid has crossed since the time of the reference image, is shown as a shift in arcseconds and in the FWHM of Pr.Three selected measurements, where the shift is closest to a multiple of FWHM PSF , are marked by gray dashed vertical lines and designated by the letters A, B, and C. Lower panel: For each of the three marked measurements (A, B, C), a set of cutouts of the new, reference, D, |S|, and Zσ images around the initial asteroid position are shown.For the |S| and Zσ cutouts, the minimum and maximum values of the colormap are fixed to −5σ and 13σ, respectively.For each cutout, the dashed circle shows the FWHM of the PSF of the respective image.For |S| and Zσ, the shown PSF is that of the D image.Within a shift of 2 FWHM PSF , Zσ > |S|.Once the asteroid positions in the reference and new images are clearly separated, |S| overtakes Zσ.
E.O.O. is grateful for the support of grants from the Willner Family Leadership Institute, André Deloro Institute, Paul and Tina Gardner, The Norman E Alexander Family M Foundation ULTRASAT Data Center Fund, Israel Science Foundation, Israeli Ministry of Science, Minerva, BSF, BSF-transformative, NSF-BSF, Israel Council for Higher Education (VATAT), Sagol Weizmann-MIT, Yeda-Sela, and the Rosa and Emilio Segre Research Award.This research is supported by the Israeli Council for Higher Education (CHE) via the Weizmann Data Science Research Center, and by a research grant from the Estate of Harry Schutzman.

TABLE 3
Parameters used in the motion vs. flux-variation simulations