This site uses cookies. By continuing to use this site you agree to our use of cookies. To find out more, see our Privacy and Cookies policy.

AN ACCURATE AND EFFICIENT ALGORITHM FOR DETECTION OF RADIO BURSTS WITH AN UNKNOWN DISPERSION MEASURE, FOR SINGLE-DISH TELESCOPES AND INTERFEROMETERS

and

Published 2017 January 16 © 2017. The American Astronomical Society. All rights reserved.
, , Citation Barak Zackay and Eran O. Ofek 2017 ApJ 835 11 DOI 10.3847/1538-4357/835/1/11

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

0004-637X/835/1/11

ABSTRACT

Astronomical radio signals are subjected to phase dispersion while traveling through the interstellar medium. To optimally detect a short-duration signal within a frequency band, we have to precisely compensate for the unknown pulse dispersion, which is a computationally demanding task. We present the "fast dispersion measure transform" algorithm for optimal detection of such signals. Our algorithm has a low theoretical complexity of $2{N}_{f}{N}_{t}+{N}_{t}{N}_{{\rm{\Delta }}}{\mathrm{log}}_{2}({N}_{f})$, where Nf, Nt, and NΔ are the numbers of frequency bins, time bins, and dispersion measure bins, respectively. Unlike previously suggested fast algorithms, our algorithm conserves the sensitivity of brute-force dedispersion. Our tests indicate that this algorithm, running on a standard desktop computer and implemented in a high-level programming language, is already faster than the state-of-the-art dedispersion codes running on graphical processing units (GPUs). We also present a variant of the algorithm that can be efficiently implemented on GPUs. The latter algorithm's computation and data-transport requirements are similar to those of a two-dimensional fast Fourier transform, indicating that incoherent dedispersion can now be considered a nonissue while planning future surveys. We further present a fast algorithm for sensitive detection of pulses shorter than the dispersive smearing limits of incoherent dedispersion. In typical cases, this algorithm is orders of magnitude faster than enumerating dispersion measures and coherently dedispersing by convolution. We analyze the computational complexity of pulsed signal searches by radio interferometers. We conclude that, using our suggested algorithms, maximally sensitive blind searches for dispersed pulses are feasible using existing facilities. We provide an implementation of these algorithms in Python and MATLAB.

Export citation and abstract BibTeX RIS

1. INTRODUCTION

When a radio pulse propagates through the interstellar and intergalactic plasma, different frequencies travel at different speeds. This phenomenon, known as dispersion, hinders the detection of radio pulses. This is because integrating over a wide bandwidth during a given time frame dilutes the signal with noise, as only a narrow bandwidth contains the signal's energy at any given interval within the integration frame. The solution to this problem is to dedisperse the signal (i.e., to apply frequency-dependent time delays to the signal prior to integration). Since in most cases the dispersion is a priori unknown, we need to test a large number of dispersion values. The best dispersion is the one that maximizes the signal-to-noise ratio (S/N) of the pulse. A different way to look at this problem is that we need to integrate the flux along many dispersion curves in the frequency–time domain.

The difference in pulse arrival time between two frequencies is given by

Equation (1)

where $\mathrm{DM}$ is called the dispersion measure of the signal, given by the column density of free electrons along the line of sight to the source (measured in units of $\mathrm{pc}\,{\mathrm{cm}}^{-3}$), fi are frequencies measured in GHz, and ti is the arrival time of the signal at frequency fi. For brevity, throughout the paper, we will use d to denote the dispersion measure in which all the dimensional constants are absorbed, and the relation is given by

Equation (2)

The raw input from a radio receiver is a time series of voltage measurements sampled typically at high frequency (e.g., ∼100 MHz). We denote the sampling interval by τ. In order to generate a spectrum ($I[t,f]$) as a function of time (t) and frequency (f), the time series is divided into blocks of size Nf samples, and each block is then Fourier transformed (a process known as short-time Fourier transform, or STFT), and the absolute value squared at each frequency is saved after summing the power from all polarizations ${N}_{\mathrm{pol}}$.1

There are two distinct processes that we can apply to dedisperse a signal: incoherent dedispersion and coherent dedispersion. The term incoherent dedispersion refers to applying frequency-dependent time delays to the $I(t,f)$ matrix, while coherent dedispersion involves applying a frequency-dependent phase shift to the Fourier transform of the raw voltage signal. This subtle difference is important—incoherent dedispersion is only an approximation that is valid under certain conditions that we are going to review shortly (Section 2). A typical input matrix ($I[t,f]$) to incoherent dedispersion is presented in the top panel of Figure 1, while on the bottom we show a zoom-in on the output of the transform.

Figure 1.

Figure 1. A dispersed signal and its dispersion transform (FDMT), based on simulated data. The top panel shows a $0.1\,\mathrm{ms}$ wide dispersed pulse with $D=40\,\mathrm{pc}\,{\mathrm{cm}}^{-3}$. The bottom panel shows a zoom-in on the significant part of the dedispersion transform. Notice that the Y axis has units of time because the dispersion measure is parameterized by the total time delay between the entrance and exit of the pulse into and out of the observed band. Note that because the pulse is so thin, any slight error on the dispersion path will immediately result in a substantial loss of the pulse power. Note also that the characteristic shape visible near the correct solution is due to the fact that curves with close $t0,{\rm{\Delta }}t$ can substantially intersect the pulse's curve. This may cause significant detections in these $t0,{\rm{\Delta }}t$ combinations. This is not to suggest that match-filtering this shape on the FDMT output is required, as the most informative detection statistic is the value returned in the FDMT output itself.

Standard image High-resolution image

The exact mathematical description of signal dispersion is the multiplication of the Fourier transform of the raw signal with the phase-only filter:

Equation (3)

(Lorimer & Kramer 2012), where f0 is the baseband frequency. In order to coherently dedisperse the signal, we can apply the inverse of this shift.

The computational requirements of incoherent dedispersion are more tractable than those of coherent dedispersion, and therefore whenever a blind search for astrophysical pulses is done, incoherent dedispersion is usually the method of choice (e.g., van Leeuwen & Stappers 2010).

The main practical motivation for improving dedispersion algorithms is to allow efficient analysis of data from modern radio interferometers. When a blind search for new sources is conducted using a multicomponent radio interferometer, coherent dedispersion of a large number of synthesized beams is the most sensitive detection method. But, in reality, coherent dedispersion, even on one beam, was considered (until this work) unfeasible. In addition, applying incoherent dedispersion to all independent sky positions is also generally unfeasible. The standard solutions for this problem were one of the following:

  • 1.  
    Dedisperse incoherently only a small fraction of the field of view.
  • 2.  
    Combine the power from all antennas incoherently, and incoherently dedisperse.
  • 3.  
    Use only a small compact core of the interferometer for blind searches, and incoherently dedisperse.

The above solutions result, each with a different trade-off, in orders-of-magnitude losses in time resolution, sensitivity, field of view, and angular resolution. These losses can degrade the scientific yield of transient surveys by orders of magnitude.

It is therefore important to improve upon these methods, especially when searching for nonrepeating radio transients such as fast radio bursts (FRBs; Lorimer et al. 2007; Thornton et al. 2013). Efficient detection of such objects requires both high sensitivity and a good spatial localization that is calculated in real time. This is crucial for the multiwavelength follow-up of these elusive transients (e.g., Petroff et al. 2014).

In this paper, we present the fast discrete dispersion measure transform algorithm (henceforth FDMT) to solve the problem of incoherent dedispersion. FDMT is a transform algorithm, having (generally) equal sizes for both input and output and a complexity that is only logarithmically larger than the input itself.

In addition, we present a hybrid algorithm that achieves both the sensitivity of coherent dedispersion and the computational efficiency2 of incoherent dedispersion. Finally, we show that using this algorithm, it is feasible to perform blind searches with modern radio interferometers, and consequently to open new frontiers in the search for pulsars and radio transients.

The structure of the paper is as follows. In Section 2 we analyze the sensitivity of incoherent dedispersion. In Section 3 we review the existing approaches for incoherent dedispersion. In Section 4 we describe the proposed algorithm, along with its complexity analysis. In Section 5 we present a variant of the algorithm that utilizes the fast Fourier transform (FFT) to make the algorithm more efficient on multiprocessor technologies. In Section 6 we compare the runtime of the implementation we provide with existing implementations of brute-force dedispersion. In Section 7 we propose a new hybrid algorithm for detection of pulses shorter than the dispersive smearing limits of incoherent dedispersion. In Section 8 we discuss the application of the proposed algorithms for interferometers, and we show that sensitive detection of short pulses, with maximal resolution, using all the elements of the interferometer, is feasible with current facilities. We conclude in Section 9.

2. SENSITIVITY ANALYSIS OF INCOHERENT DEDISPERSION

In this section, we develop the conditions on the pulse length, the sampling interval, and the dispersion delay that allow sensitive detection with incoherent dedispersion. This will be of importance in Section 7.

We denote by x the raw voltage signal and by Ns the total number of samples. We further denote the pulse duty time (length) by ${t}_{p}={N}_{p}\tau $, where Np is the number of samples within the pulse. The optimal score for pulse detection is the sum of the squared voltage measurements within the pulse start time (t0) and end time:

Equation (4)

We define dmax to be the largest dispersion measure up to which we want to dedisperse. We further define the maximal dispersion time td to be the total delay of the pulse within the band, and we define Nd to be the dispersion time in units of samples:

Equation (5)

An important property of the dispersion kernel H(f) is that it is power-preserving ($| \hat{H}(f){| }^{2}=1$). Another important property is that the majority of pulse power will lie within the dispersion curve in $I(t,f)$. However, by summing over the dispersion curve in I, we also sum the power of the noise outside the pulse, the total variance of which is proportional to the number of added I bins, that is, to the area of the dispersion curve. Assuming the dispersion curve is close to linear, the total area of the dispersion curve can be approximated by

Equation (6)

Therefore, the ratio of the noise power summed by incoherent dedispersion and the noise power within the pulse is

Equation (7)

Optimizing the sensitivity with respect to Nf we get the following:

  • 1.  
    If ${N}_{p}\lt {N}_{f}$, the choice of ${N}_{f}$ that maximizes sensitivity is
    Equation (8)
    regardless of pulse length ${N}_{p}$.
  • 2.  
    If ${N}_{f}\leqslant {N}_{p}$, the choice of Nf that maximizes sensitivity is
    Equation (9)

In order for the sensitivity loss to be less than a factor of $\sqrt{2}$, we equate Equations (7)–(2) and get the following:

  • 1.  
    If ${N}_{f}\gt {N}_{p}$, then ${N}_{d}\lt {N}_{f}(2{N}_{p}-{N}_{f})$. Substituting Equation (8) we get the relation
    Equation (10)
  • 2.  
    If ${N}_{f}\lt {N}_{p}$, then ${N}_{d}\lt {N}_{f}{N}_{p}$, which together with the assumption yields again the relation
    Equation (11)

Equation (10) transforms into an important relation between the minimal pulse duration tp, the maximal dispersion time td, and the sampling time τ:

Equation (12)

or, simplified,

Equation (13)

This means that incoherent dedispersion cannot avoid a minimal dispersion smearing of ${t}_{\mathrm{inc}}$. The sensitivity loss factor from incoherent dedispersion is therefore (substituting Equations (8) and (13) into Equation (7) and simplifying)

Equation (14)

Therefore, incoherent dedispersion is insensitive3 when ${t}_{p}\ll {t}_{\mathrm{inc}}$ and approaches maximal sensitivity when ${t}_{p}\gg {t}_{\mathrm{inc}}$. This pulse duration depends (weakly) on frequency and dispersion measure and is given by

Equation (15)

See Cordes & McLaughlin (2003) for a similar derivation of ${t}_{\mathrm{inc}}$.

Among the astrophysical sources with short pulse durations reside pulses from millisecond pulsars, with durations of $\sim {10}^{-3}\,{\rm{s}}\mbox{--}{10}^{-4.5}\,{\rm{s}}$ (Kramer et al. 1998), FRBs with durations of $\sim {10}^{-3}\,{\rm{s}}$ (Thornton et al. 2013; Champion et al. 2015), and pulsar giant pulses, for which the duration can be shorter than $2\,\mathrm{ns}$ (Hankins et al. 2003).

As a result of the insensitivity of incoherent dedispersion to very short pulses, the phase space in which ${t}_{p}\ll {t}_{\mathrm{inc}}$ is effectively unexplored. For example, past search campaigns for giant pulse-emitting pulsars could not probe this phase space because of the limitations of incoherent dedispersion (see McLaughlin & Cordes 2003; Bhat et al. 2011).

One phenomenon that prevents astrophysical pulses from being infinitesimally narrow is the phenomenon of scattering. Pulse scattering is a widely documented phenomenon in which astrophysical radio pulses broaden as a result of multipath propagation of the pulse through the interstellar medium (ISM). In general, we expect scattering to increase with the dispersion measure, as both depend on the amount of interstellar medium between the source and the observer. An empirical correlation between the observed dispersion measure of pulsars and the observed scattering can be found in Bhat et al. (2004). However, this correlation has a large scatter, which spans orders of magnitude of scattering times per constant dispersion measure. Scattering measures depend on the $\tfrac{{D}_{l}{D}_{s}}{{D}_{{ls}}}$ ratio, where Dl is the distance to the screen, Ds is the distance to the source, and Dls is the distance between the screen and the source. This ratio may differ substantially for different sources. Possible examples for sources in which this ratio is substantially different than that of pulsars is the speculated population of radio transients from cosmological sources (Macquart & Koay 2013). Another example is the recently found pulsar in the vicinity of the galactic center (Spitler et al. 2014). In both examples, the observed pulse broadening is orders of magnitude shorter than the predicted scattering based on the observed relation between dispersion measure and pulse broadening. In addition, the scaling of scattering with frequency ($\sim {\nu }^{-4}$) is much steeper than the scaling of ${t}_{\mathrm{inc}}$ with frequency (${\nu }^{-3/2}$). Therefore, the need for coherent treatment of dedispersion also depends on the observing band.

As a result of all these complications, we find it more robust to keep the discussion in the rest of the paper to using observed pulse durations, rather than scattering measures, distances, or spectral indices, which are often found in the literature.

3. EXISTING ALGORITHMS FOR INCOHERENT DEDISPERSION

Algorithmically, there are two leading approaches for incoherent dedispersion of single-dish data streams. These are the tree dedispersion (Taylor 1974) and brute-force dedispersion (e.g., Magro et al. 2011; Barsdell et al. 2012; Clarke et al. 2013).

The tree dedispersion algorithm efficiently calculates the integrals of all the straight-line paths with slopes between 45° and 90° through the input time versus frequency matrix (this is similar to the discrete Radon transform, Gotz & Druckmuller 1996; Brady 1998). The computational complexity of this algorithm is ${N}_{t}{N}_{f}{\mathrm{log}}_{2}\,{N}_{f}$, where we use the notation ${N}_{t}={N}_{s}/{N}_{f}$ (note that Nt and Nf are the dimensions of $I(t,f)$). However, since the dispersion curve is not linear, the use of this method results in a substantial loss of sensitivity. This can be somewhat mitigated by applying the algorithm to many small subbands4 of the data, and then combining the results with a dedicated algorithm. This approach is not exact, and it increases the computational complexity of the algorithm. More details on the sensitivity analysis and computational complexity of this algorithm are given in Barsdell et al. (2012).

The brute-force dedispersion algorithm simply scans all of the trial dispersion measures one at a time, integrating its path on the input map and finding curves with excess power. This method is exact but has the high complexity of ${N}_{{\rm{\Delta }}}{N}_{f}{N}_{t}$, where ${N}_{{\rm{\Delta }}}$ is the number of trial dispersion measures scanned. In order to expedite the search speed, the algorithm was implemented on graphical processing units (GPUs), and this method is now capable of analyzing single-beam data in real time (Barsdell et al. 2012). Furthermore, as shown by Cordes & McLaughlin (2003), one can reduce the number of dispersion trials by a small factor by optimizing the distances between adjacent dispersion trials. The maximal sensitivity, along with the possibility of real-time analysis using GPUs, makes this method likely to be the most popular algorithm for dedispersion.

Here we present an algorithm that combines both maximal sensitivity and low computational complexity. A comparison of all these mentioned algorithms is summarized in Table 1.

Table 1.  Algorithm Comparison

  FDMT Brute Force Tree
Computational complexity $\max \{{N}_{t}{N}_{{\rm{\Delta }}}{\mathrm{log}}_{2}({N}_{f}),2{N}_{f}{N}_{t}\}$ ${N}_{f}{N}_{t}{N}_{{\rm{\Delta }}}$ ${N}_{f}{N}_{t}{\mathrm{log}}_{2}({N}_{f})$
Information efficient Yes Yes No
Memory-access friendly Yes Yes Yes
Parallelization friendly Yes Yes No

Note. Comparison of the FDMT algorithm with two other approaches to incoherent dedispersion, brute force (e.g., Barsdell et al. 2012) and tree dedispersion (Taylor 1974). The computational complexity of FDMT is at least two orders of magnitude smaller than that of brute-force dedispersion for typical ${N}_{f}\sim {2}^{10}$, while retaining maximal sensitivity.

Download table as:  ASCIITypeset image

4. THE FDMT ALGORITHM

The input to the FDMT algorithm is a two-dimensional array of intensities, denoted by $I(t,f)$. Denote the width of a frequency bin by ${\delta }_{f}$ and the width of a time bin by ${\delta }_{t}$, satisfying

Equation (16)

The FDMT algorithm calculates the integral over all curves defined by Equation (2). A dispersion curve can be uniquely defined by the arrival time of the signal at the lowest frequency (t0) and the time delay between the arrival times of the lowest and highest frequencies (${\rm{\Delta }}t$).

Therefore, the FDMT result can be expressed as a two-dimensional array that contains the integrals along dispersion curves as a function of t0 and ${\rm{\Delta }}t$:

Equation (17)

where ${f}_{\min }$ and ${f}_{\max }$ are the minimum and maximum frequencies in the observed band.

To compute the FDMT transform of the input, the algorithm works in ${\mathrm{log}}_{2}({N}_{f})$ iterations. The inputs of the ith iteration are the FDMT transforms of a partition of the original band into ${N}_{f}/{2}^{(i-1)}$ subbands of size ${2}^{(i-1)}$ frequencies. The outputs of the ith iteration are the FDMT transforms of a partition of the original input into ${N}_{f}/{2}^{i}$ subbands of size 2i frequencies. Every two successive subbands are combined using the addition rule described below. After ${\mathrm{log}}_{2}({N}_{f})$ iterations, we have the FDMT over the whole band.

The FDMT process of combining two successive subbands into ${A}_{{f}_{0}}^{{f}_{2}}({t}_{0},{\rm{\Delta }}t)$ is given by the following addition rule:

Equation (18)

Here, ${A}_{{f}_{0}}^{{f}_{1}}$ and ${A}_{{f}_{1}}^{{f}_{2}}$ are part of the output of the previous iteration, and t1 is the intersection time of the dispersion curve at the central frequency ${f}_{1}=\tfrac{{f}_{2}+{f}_{0}}{2}$. And t1 is uniquely determined by the formula

Equation (19)

where

Equation (20)

By definition, ${A}_{{f}_{0}}^{{f}_{1}}({t}_{0},{t}_{0}-{t}_{1})$ is the calculated sum over the unique dispersion curve between the coordinates $({t}_{0},{f}_{0})$ and $({t}_{1},{f}_{1})$, and ${A}_{{f}_{1}}^{{f}_{2}}({t}_{1},{t}_{1}-{t}_{2})$ is the same for $({t}_{1},{f}_{1})$ and $({t}_{2},{f}_{2})$. After an FDMT iteration, the only dispersion curve passing through $({t}_{0},{f}_{0}),({t}_{2},{f}_{2})$ will be given by ${A}_{{f}_{0}}^{{f}_{2}}({t}_{0},{t}_{0}-{t}_{2})$.

For sufficiently early t0, the time t1 will be smaller than zero. In that case we just copy, that is, use the alternative addition rule:

Equation (21)

The operation of one iteration of the algorithm is graphically illustrated in Figure 2.

Figure 2.

Figure 2. Illustration of a single iteration of the FDMT algorithm. On the left side, the input frequency vs. time input table is drawn. An example dispersion curve is highlighted in purple. The input table is methodically divided into two subbands. The right-hand table shows the final dedispersion transform of the input (left), where the integral over the purple dispersion curve (left) is marked by the purple cell on the right. In the middle, the dispersion measure transform of the two subbands (which are assumed to be calculated in the previous iteration) are drawn, where each of the two subbands contains in each cell the sum of the unique dispersion trail with exit point t and total delay ${\rm{\Delta }}t$ through the corresponding subbands. The dispersion measure dimension is parameterized using ${\rm{\Delta }}{t}_{0}={t}_{0}-{t}_{1}$ and ${\rm{\Delta }}{t}_{1}={t}_{1}-{t}_{2}$. The cells that contain the partial sums of the two halves of the purple dispersion curve on the left are highlighted in purple. Highlighted in red is a row in the dispersion tables that contributes to the calculation of the red cells on the right. Notice that we can add the lines highlighted in red as vectors, in order to implement the algorithm in a vectorized form. Highlighted in orange are the cells that use the alternative addition rule, in the case when the dispersion trail exits the boundaries of the input table.

Standard image High-resolution image

The only thing left to deal with is the data initialization.

This is done prior to the first iteration, generating ${A}_{{f}_{0}}^{{f}_{1}}$ for every two consecutive frequencies. If the maximum dispersion delay between two consecutive frequencies is smaller than the width of a time bin, then we can use the simple initialization:

Equation (22)

Otherwise, the energy of the signal at some frequencies is not located at a single bin. Note that this implies ${N}_{{\rm{\Delta }}}\gg {N}_{f}$, which we show in Section 2 to be suboptimal in terms of sensitivity. This can be compensated for in two ways:

  • 1.  
    By computing partial averages over the time axis:
    Equation (23)
    Note that if the maximal dispersion measure is large, multiple ${\rm{\Delta }}t$'s are computed, and this step, unlike the FDMT iterations, inflates the data.
  • 2.  
    If for a certain d the time delay within each single frequency bin is larger than one time bin, we can simply reduce the time resolution (i.e., bin). This process is analogous to rebinning before scanning dispersion measures above the "diagonal DM" using tree dedispersion.

Note that option 1 is more sensitive than option 2 in general as there might be a mismatch between the pulse phase and the boxcar phase. For further discussion, see Keane & Petroff (2015). In the MATLAB and Python codes we provide, we use option 1. The maximal time delay within each frequency bin is uniquely determined by dmax, the maximal d we want to scan, and is given by

Equation (24)

Therefore, this decision can be made prior to the computation.

A pseudocode of FDMT is given in Algorithm 1. In addition, we provide implementation for the algorithm in Python and MATLAB.5 Note that so far we have not treated rounding and binning issues; these are discussed in Section 4.2 and are implemented in the codes we provide.

Algorithm 1. The FDMT Algorithm

Input: $I(f,t)$ input matrix (possibly packed); t axis is continuous in memory.
Output: Time series of integrated flux density as a function of trial dispersion measures. Output is arranged in a (possibly packed) two-dimensional table ${A}_{{f}_{\min }}^{{f}_{\max }}(t,{\rm{\Delta }}t)$ where ${\rm{\Delta }}t$ represents the dispersion measure axis.
1: Initiate the table by ${A}_{f}^{f+{\delta }_{f}}(t,{\rm{\Delta }}t)=\tfrac{{\delta }_{t}}{{\rm{\Delta }}t}{\sum }_{j=0}^{{\rm{\Delta }}t/{\delta }_{t}}I(f,t+j{\delta }_{t})$
2: for iteration j = 1 to $j={\mathrm{log}}_{2}\,{N}_{f}$ do
3: for f0 in the range $[{f}_{\min },\,{f}_{\max }]$ with steps ${2}^{j}{\delta }_{f}$ do
4: ${f}_{2}={f}_{0}+{2}^{j}{\delta }_{f}$
5: ${f}_{1}=\tfrac{{f}_{2}+{f}_{0}}{2}$
6: ${C}_{{f}_{2},{f}_{0}}=\tfrac{{f}_{1}^{-2}-{f}_{0}^{-2}}{{f}_{2}^{-2}-{f}_{0}^{-2}}$
7: ${\rm{\Delta }}{t}_{\max }(j,{f}_{0})={d}_{\max }({f}_{0}^{-2}-{f}_{2}^{-2})$
8: for ${\rm{\Delta }}t$ in the range $[0,{\rm{\Delta }}{t}_{\max }(j,{f}_{0})]$ with steps ${\delta }_{t}$ do
9: for t0 in range $[{C}_{{f}_{2},{f}_{0}}{\rm{\Delta }}t,{N}_{t}]$ with steps ${\delta }_{t}$ for
10: ${t}_{1}={t}_{0}-{C}_{{f}_{2},{f}_{0}}{\rm{\Delta }}t$
11: ${A}_{{f}_{0}}^{{f}_{2}}({t}_{0},{\rm{\Delta }}t)={A}_{{f}_{0}}^{{f}_{1}}({t}_{0},{t}_{0}-{t}_{1})+{A}_{{f}_{1}}^{{f}_{2}}({t}_{1},{t}_{1}-{t}_{2})$
12: end for
13: for t0 in the range $[0,{C}_{{f}_{2},{f}_{0}}{\rm{\Delta }}t]$ with steps ${\delta }_{t}$ do
14: ${A}_{{f}_{0}}^{{f}_{2}}({t}_{0},{\rm{\Delta }}t)={A}_{{f}_{0}}^{{f}_{1}}({t}_{0},{t}_{0}-{t}_{1})$
15: end for
16: end for
17: end for
18: end for

Download table as:  ASCIITypeset image

4.1. Computational Complexity

To calculate the computational complexity, we need to trace the number of operations done throughout the algorithm. The amount of additions in iteration j is bounded from above by ${N}_{b}{N}_{t}{N}_{{\rm{\Delta }}}(j)$, where Nb is the number of subbands processed at the current iteration, and ${N}_{{\rm{\Delta }}}(j)$ is the number of distinct dispersion measures needed to calculate for a subband at iteration j:

Equation (25)

Equation (26)

As a first-order approximation, one can assume that the dispersion curve is almost linear, meaning that the number of unique dispersion trials needed in iteration $j+1$ is roughly twice the number needed in iteration j. In the last iteration, ${N}_{{\rm{\Delta }}}({j}_{\max })={N}_{{\rm{\Delta }}}$, and therefore, in iteration j,

Equation (27)

The number of bands (Nb) in each iteration is ${N}_{b}=\tfrac{{N}_{f}}{{2}^{j}}$. Therefore, under the approximation of almost linear dispersion (or narrow band), the following approximation is correct:

Equation (28)

Summing this for all iterations, and assuming NΔ is dominant in all iterations, and taking into account the number of entries in each added row (Nt), we get the complexity

Equation (29)

If we assume Nb is dominant, we get

Equation (30)

Therefore, the total complexity of the algorithm is bounded from above by

Equation (31)

Casting the complexity analysis of the FDMT algorithm with the more naturally defined ${N}_{s},{N}_{d}$, using ${N}_{s}={N}_{f}{N}_{t}$ and ${N}_{{\rm{\Delta }}}=\tfrac{{t}_{d}}{{\delta }_{t}}\leqslant \tfrac{{N}_{d}}{{N}_{f}}$ we get

Equation (32)

Adding to the above the complexity of data preparation by STFT, ${N}_{s}{N}_{\mathrm{pol}}{\mathrm{log}}_{2}({N}_{f})$, and the fact that if we chose ${N}_{f}\lt {N}_{p}$ we can effectively bin the input matrix $I(f,t)$ in both time and frequency domains by a factor $\tfrac{{N}_{p}}{{N}_{f}}$ (or low pass, see Section 5.1) to a final size of ${N}_{p}$, we get

Equation (33)

Here we can see that the data preparation complexity dominates the operation count of the algorithm whenever incoherent dedispersion is maximally sensitive (i.e., ${N}_{d}\lt {{N}_{p}}^{2}$).

4.2. Implementation Details

Here, we consider implementation issues, such as rounding and binning, pulse profile convolution, and dividing the band into an arbitrary number of frequency channels. In addition, it is important to implement the tricks of the trade, in order to transform the theoretical complexity reduction into a real speedup.

Rounding and Binning: The exact formulas written above need to take into account the discreteness of both frequency and time axes. To keep the formulas readable, we did not include these considerations in the algorithm description and pseudocode. However, we include them in the implementation we provide, and we advise readers who want to implement the FDMT to pay attention to the discretization process. That is because an incorrect choice might lead to a significant reduction in accuracy.

An example of the most important discretization issue is that, when combining two subbands, the point t1 where the dispersion curve travels from one subband to the next might not be well defined. This can happen because the dispersion curve might travel one bin between the end frequency of the first band ${f}_{0}+({2}^{i}-1){\delta }_{f}$ and the start frequency of the second band ${f}_{0}+{2}^{i}{\delta }_{f}$. The implemented solution for this problem is to calculate two versions of ${C}_{{f}_{0},{f}_{2}}$, one with the end frequency of the lower subband, and the other with the start frequency of the upper subband. Using the different versions of ${C}_{{f}_{0},{f}_{2}}$ in the two different uses of t1, we can account for a time shift between the added bands, approximating the dispersion curve better.

Machine Word Utilization: One can utilize the machine word width (or the Advanced Vector Extensions (AVX)) to pack a few instances of the dedispersion procedure into one computation (since modern computers operate on machine words of 64 bits, this will result in a speedup factor of 4–8 depending on the number of bits per frequency and the pulse maximum allowed strength).

Memory Access: An important issue in runtime reduction is the continuity of memory access. The FDMT algorithm never performs any reordering action along the time axis. Therefore, it is recommended to store the time axis continuously in order to speed up the memory-access operations. We note that this choice is not standard, and that most telescope data streaming systems store the data in time-major ordering. Therefore, a data transpose operation may be required in order to apply FDMT efficiently.

Applying Window Functions and Filter Banks: The process of transforming from the time domain to the time–frequency matrix was described in this paper as a simple short-time Fourier transform, though in practice it is generally recommended to use a polyphase filter bank instead. While neither the computation complexity nor the sensitivity of the transient search is affected by this, the resilience to radio frequency interference (RFI) and spectral leakage is greatly improved. As a result, this is not a main concern for this paper.

Mitigating the Sensitivity Loss or Squeezing Additional Speedup: It is possible to increase the sensitivity of a pulse search (or alternatively, viewing it as an additional speedup) by choosing a slightly lower time-dm resolution for the FDMT, marking all of the pulse candidates above some (low) threshold. Then, for each candidate we can scan the finer resolution $({t}_{0},d)$ in order to make the final decision using a score with maximal sensitivity.

An example scheme for saving computation time is as follows. In the first step, aim at half the maximal sensitivity of a full-blown search by using four times the optimal bin size for both the frequency and time dimensions. If a significant pulse is to be detected in the final stage, it should have significance larger than $\sim 8\sigma $. This means that at half the sensitivity, this pulse will have a mean significance of $4\sigma $, and thus, assuming the dedispersed scores have a Gaussian distribution, with high probability ($\gt 85 \% $) the correct $({t}_{0},d)$ combination will pass the $3\sigma $ threshold. Each random $({t}_{0},d)$ combination has a probability of roughly 1/750 to pass this threshold. Assuming the number of dispersion measures (in the coarse resolution) enumerated was ${N}_{{\rm{\Delta }}}\lt {10}^{3.5}$, then the computational work for dedispersing all of the candidates with maximal sensitivity is smaller than or equal to the complexity of the FDMT algorithm itself. Using this scheme, it is possible to save roughly an additional order of magnitude in the computation time of the dedispersion step without sensitivity losses or a significant false-negative probability. Using a similar scheme (with $4\sigma $ as threshold), it is possible to eliminate most of the built-in $\sqrt{2}$ sensitivity loss of the finite $({t}_{0},d)$ grid without significant additional work.

Different Range of Dispersion Measures: Sometimes we have a prior knowledge of the range of dispersion measures we need to scan. In that case, one can still employ the FDMT algorithm after an additional preparation of applying either a frequency-dependent shift to the input (according to some dmin) or a coherent dedispersion of the signal (using dmin).

Pulse Profile: Sometimes we have prior knowledge on the pulse width or profile (there might be a different profile for each frequency, like in pulse scattering). By applying a matched filter approach, one can convolve each frequency time series with the predicted profile for that frequency and employ the FDMT at the end. For wide-enough pulse shapes, one might consider binning the time resolution.

We note that convolution of the time axis with a uniform pulse profile (for all frequencies) commutes with the entire FDMT operation. Therefore, we can test a few pulse profiles per FDMT without repeating the dedispersion process. An important note is that such an operation changes the effective variance of the detection score in a nontrivial way (because the variance change in each DM trial will be different due to different path lengths in the time–frequency matrix). We therefore recommend either keeping careful accounting of the variance through the algorithm, or simply measuring it empirically for each combination of DM trial and pulse profile separately.

Dealing with the Case of ${N}_{f}\ne {2}^{k}$: The algorithm presented above assumes that the number of frequency channels is strictly a power of two. This assumption can be abandoned by slightly adjusting the addition rule to allow a merger of nonequal-size subbands. The only change needed is to switch f1 in Equation (20) from being the middle frequency between f0 and f2 to being the border frequency between the subbands.

Applying FDMT for Other Functional Forms: The dispersion equation (Equation (2)) is used only in the preparation of ${C}_{{f}_{2},{f}_{0}}$. One can easily extend the FDMT algorithm to search for other functional forms, for example,

Equation (34)

The only required modification is to change the power of the frequency in Equation (20) from −2 to γ. Furthermore, it could be extended to any family of curves that satisfies the condition that there is only one curve passing between any two points in the input data. Using this, one can calculate the required ${C}_{{f}_{2},{f}_{0}}$ by finding the only curve passing through both $({t}_{0},{f}_{0})$ and $({t}_{2},{f}_{2})$, and defining t1 to be the intersection time of the curve with the frequency f1. While the complexity of the algorithm may change with the functional form, for a sufficiently regular functional form, the complexity will be close to ${N}_{d}{N}_{t}\mathrm{log}\,{N}_{f}.$

5. ELIMINATING SHIFTS BY FAST FOURIER TRANSFORMING THE TIME AXIS

In modern computers and GPUs, memory access is frequently the bottleneck of many algorithms, especially when programming transforms, where the computational complexity is only slightly larger than the data size. Efficient implementation of transform algorithms is nontrivial and requires architecture-dependent changes in order to avoid cache misses6 (in a general CPU setup) or to avoid communication when using distributed computing. While it is probably possible to control the behavior of the algorithm as presented above, it is nontrivial to distribute the data between different processing units while avoiding duplication and communication issues.

Here, we present a variant of the algorithm that is easily parallelized on all architectures and where the memory-access pattern is as parallelization-friendly as possible.

The algorithm, as it is described in Section 4, has only one core operation: adding a complete shifted "time" row. It is the shift operation that makes the data transfer and memory management of the algorithm challenging, and therefore we wish to eliminate shifts from the algorithm. In order to do that, we can Fourier transform the time axis. This makes the shift operation become a multiplication with a "shift vector" that is the Fourier transform of a shifted delta function. In this version, all additions are of numbers from the same (Fourier transformed) time coordinate.7 Therefore, we can assign different parts of the (Fourier transformed) time axis to different processing units and consequently reduce the need for shared memory or data transport. At the end, we need to Fourier transform back the time axis. We call this algorithm FFT–FDMT, and it is summarized in Algorithm 2. Tracking the data in this algorithm, we can see that there are only two "global" steps and that they are both transpose operations of the data. To perform all other steps of the algorithm, we need only to access memory that is not larger than one row or one column of the input. Since present L1 cache architectures can contain more than a typical row or column of data, the algorithm can be computing-power limited. The runtime of this algorithm on any machine is comparable to the runtime of two-dimensional convolution, because of the similar number of operations and data-access patterns. We note that in the basic preparation of radio data, one often applies Fourier transforms (for example, when applying filters or screening for radio frequency interference). Therefore, if we have the computational ability to prepare the input table from the raw data, FFT–FDMT is also feasible.

Algorithm 2. The FFT–FDMT Algorithm

Input: $I(f,t)$ input matrix (possibly packed); t axis is continuous in memory.
Output: Time series of integrated flux density as a function of trial dispersion measures. Output is arranged in a (possibly packed) two-dimensional table ${A}_{{f}_{\min }}^{{f}_{\max }}(t,{\rm{\Delta }}t)$ where ${\rm{\Delta }}t$ represents the dispersion measure axis.
1: Initiate the table by
2: Initiate the "shift vector" $V(\tilde{{t}_{0}},{\rm{\Delta }}t)={ \mathcal F }(\delta ({\rm{\Delta }}t))(\tilde{{t}_{0}})$ where $\delta (x)$ is a vector containing 1 at position x and zeros everywhere else, ${ \mathcal F }$ is the FFT operator, and $\tilde{{t}_{0}}$ is the index of the Fourier transformed time axis.
3: Fourier transform the time axis
4: Transpose the data. After this action, the frequency and ${\rm{\Delta }}t$ axes should be continuous in memory, and the time axis should be distributed across all computing units.
5: for $\tilde{{t}_{0}}$ in the range $[0,{N}_{t}]$ do
6: for j in the range $[1,{\mathrm{log}}_{2}\,{N}_{f}]$ do
7: for f0 in the range $[{f}_{\min },\,{f}_{\max }]$ with steps ${2}^{j}{\delta }_{f}$ do
8: ${f}_{2}={f}_{0}+{2}^{j}{\delta }_{f}$
9: ${f}_{1}=\tfrac{{f}_{2}+{f}_{0}}{2}$
10 ${C}_{{f}_{2},{f}_{0}}=\tfrac{{f}_{1}^{-2}-{f}_{0}^{-2}}{{f}_{2}^{-2}-{f}_{0}^{-2}}$
11: ${\rm{\Delta }}{t}_{\max }(j,{f}_{0})={N}_{{\rm{\Delta }}}\tfrac{{f}_{0}^{-2}-{({f}_{0}+{2}^{j}{\delta }_{f})}^{-2}}{{f}_{\min }^{-2}-{f}_{\max }^{-2}}$
12: for ${\rm{\Delta }}t$ in the range $[0,{\rm{\Delta }}{t}_{\max }(j,{f}_{0})]$ do
13: ${\rm{\Delta }}{t}_{1}={C}_{{f}_{2},{f}_{0}}{\rm{\Delta }}t$
14:
15: end for
16: end for
17: end for
18: end for
19: Transpose the data back. Now, time is again continuous in memory.
20: Perform inverse Fourier transform on the time axis

Download table as:  ASCIITypeset image

5.1. Comments on the Implementation of the FFT–FDMT Algorithm

The FFT–FDMT algorithm is designed to increase the amount of computation per cache replacement. To completely optimize the algorithm for this property, we have to consider special implementation details like cache size and processing unit communication geometry. Though important to an efficient implementation of the algorithm, these details are outside the scope of this paper as they are dependent on architecture. We note that all the details discussed in Section 4.2 are valid also for the FFT–FDMT version, except for the changes listed below.

Machine Word Utilization: The long integer data type is the optimal choice for the regular FDMT algorithm in order to fully utilize the machine word capability. In the Fourier-transformed version of the algorithm, we have to use the complex floating-point data type. Using the floating-point data type, we have to leave unused the bits of the exponent field and leave some more bits unused to retain the floating-point precision needed to perform the Fourier transform operations. Furthermore, some architectures such as GPUs have a clear optimization preference for the 32 bit floating-point data type. However, it is possible to pack another algorithm instance into the complex field of the input vector. Since the result of the FDMT algorithm is real (as a sum of real numbers), packing another input into the imaginary part of the input is possible. The imaginary part of the result will be the second algorithm instance.

Pulse Profile: In addition to the ability to test several pulse profiles per FDMT operation, as explained in Section 4.2, we can further exploit the use of the Fourier-transformed time domain. If the pulse width is slightly larger than one bin, reducing the computational load by binning loses information. Instead, we can effectively apply a low-pass filter on the time axis by either keeping fewer (time-domain) frequencies or multiplying with a filter. This can be both more sensitive than binning the time axis and more efficient than having a high sampling rate.

Handling Large Dispersion Measures: If the maximum dispersion broadens the pulse to more than one time bin per frequency bin, the initialization phase of the algorithm inflates the data from size ${N}_{t}{N}_{f}$ to size ${N}_{{\rm{\Delta }}}{N}_{f}$ (note that the use of ${N}_{{\rm{\Delta }}}\gg {N}_{f}$ is losing sensitivity, and therefore this part is not considered a crucial part of the algorithm). The partial sum operation of the initialization phase is equivalent to an application of a low-pass filter on the time axis. This allows a natural reduction of computation and memory by saving a differential amount of Fouriered time bins for different ${\rm{\Delta }}t$'s. This can be used in the case of large dispersion measures to reduce the algorithm's complexity from ${N}_{{\rm{\Delta }}}{N}_{t}{\mathrm{log}}_{2}({N}_{f})$ to $2{N}_{t}{N}_{f}{\mathrm{log}}_{2}\left(\tfrac{{N}_{{\rm{\Delta }}}}{{N}_{f}}\right)$.

Zero Padding: Since convolution is a cyclical operation, all the shifts done in this algorithm are cyclical shifts. Therefore, we have to pad the time axis with NΔ zeros prior to the Fourier transform. This operation can increase by a factor of two the complexity of the algorithm if ${N}_{{\rm{\Delta }}}={N}_{t}$. To avoid this we can choose ${N}_{t}\gg {N}_{{\rm{\Delta }}}$. This is usually possible if the size of the input table is not too close to the maximum memory (or cache) capacity of the machine used.

6. RUNTIME AND BENCHMARKING

Accurate benchmarking of algorithms should use a mature code and contain architecture-dependent adaptations. However, it is important to demonstrate that the code we present, running on a single standard CPU, is competitive with the brute-force dedispersion implementations on GPUs. Therefore, we provide a simple benchmark for the provided code.

The benchmark we use is the runtime of performing FDMT on data with the following properties: ${N}_{f}={2}^{10}$, ${N}_{t}=5\times {2}^{16}$, and ${N}_{{\rm{\Delta }}}={2}^{10}$. This volume of input is similar to the one used in the "toy observation" defined in Barsdell et al. (2012) and Magro et al. (2011), ${N}_{f}={2}^{8}$, ${N}_{t}={2}^{19}$, and ${N}_{{\rm{\Delta }}}=500$. However, we modified the partition between ${N}_{f},{N}_{t}$ and increased Nd by a factor of two.8 The runtime we achieve on this data is 3.5 s on a standard Intel Core i5 4690 processor. For example, these numbers can represent a real-time dedispersion of 8 s of input data with 40 MHz bandwidth and 1024 dispersion trials. To get this benchmark, we pack five instances of the algorithm into the 64 bit machine word, allocating 12 bits to each instance. The resulting packed data set has dimensions ${N}_{t}={2}^{16},{N}_{f}={2}^{10}$ and serves as input to the FDMT implementation. Using this scheme, we find that our runtime is already shorter than that of the state-of-the-art brute-force implementations on GPUs reported in Barsdell et al. (2012) and Magro et al. (2011). A comparison between the run times is shown in Table 2.

Table 2.  Runtime Comparison

  This Work Magro et al. (2011) Barsdell et al. (2012)
Machine used Intel Core i5 4690 Tesla C1060 GPU Tesla C1060 GPU
Programming language Python (anaconda + accelerate) C C
Number of instances packed 5 1 1
Runtime 3.5 s 4.8 s 2.1 s
${N}_{f},{N}_{t}(\mathrm{total}),{N}_{{\rm{\Delta }}}$ ${2}^{10},5\times {2}^{16},{2}^{10}$ ${2}^{8},{2}^{19},500$ ${2}^{8},{2}^{19},500$
${N}_{f}{N}_{t}{N}_{{\rm{\Delta }}}$ $5\times {2}^{36}$ 236 236
Algorithm used FDMT (non-FFT version) Brute force Brute force
Algorithm theoretical complexity ${N}_{t}{N}_{f}+{N}_{t}{N}_{{\rm{\Delta }}}{\mathrm{log}}_{2}({N}_{f})$ ${N}_{f}{N}_{t}{N}_{{\rm{\Delta }}}$ ${N}_{f}{N}_{t}{N}_{{\rm{\Delta }}}$

Note. The FDMT algorithm has a different computational complexity scaling than the brute-force dedispersion it is compared to. The implementation we provide was intended to serve as a high-level reference implementation for future highly efficient, platform-specialized implementations. But, even with a standard desktop computer, our high-level FDMT implementation is faster than the existing GPU implementations of brute-force dedispersion. Our MATLAB code has similar performance.

Download table as:  ASCIITypeset image

7. BRIDGING THE GAP BETWEEN COHERENT AND INCOHERENT DEDISPERSION

Since some interesting transient sources such as pulsar giant pulses are in the regime $1\ll {N}_{p}\lt \sqrt{{N}_{d}}$, it is important to find a feasible and sensitive algorithm for their exact dedispersion. Coherent dedispersion was, until now, the only sensitive alternative. The noise power summed when searching for a pulse that is dispersed with a dispersion measure d0 is

Equation (35)

The noise power summed when searching for a nondispersed pulse is Np, and therefore the largest dispersion tolerable (denoted by ${\delta }_{d}$) for sensitive pulse detection satisfies

Equation (36)

Therefore, for sensitive detection, the number of dispersion measure trials we need to process is

Equation (37)

The convolution operation performed for coherent dedispersion can be efficiently calculated with Fourier transforms of size ${N}_{p}$, and therefore the complexity of coherent dedispersion is

Equation (38)

Noting that the computational complexity of coherent dedispersion scales with ${N}_{d}/{N}_{p}$ and that of incoherent dedispersion scales with ${N}_{d}/{N}_{p}^{2}$, we see that using coherent dedispersion is not computationally efficient for resolved pulses (i.e., ${N}_{p}\gt 1$).

7.1. Hybrid Algorithm for Dedispersion

In order to have both the detection sensitivity of coherent dedispersion and the computational complexity of FDMT, we propose the following solution: coherently dedisperse the raw signal with coarse-trial dispersion values (with steps $\delta d$), and then apply STFT and absolute value squared, followed by FDMT with the maximal dispersion being the next coarse-trial coherent dedispersion. This process ensures that the FDMT will not lose sensitivity, relative to coherent dedispersion.

We denote by ${N}_{\delta d}$ the number of bins of length τ that a delta function pulse will spread over when dispersed by $\delta d$:

Equation (39)

As shown in Section 2, in order to retain sensitivity, the maximal dispersion residual to be processed by the following FDMT must be bounded from above by

Equation (40)

(The factor of 2 in the above equation is due to the fact that FDMT can be applied to also find negative dispersions). Therefore, the number of dispersion trials we need to coherently dedisperse is

Equation (41)

This process is approaching maximum sensitivity, and its complexity is

Equation (42)

where the right-hand part was computed simply by observing that the complexity of FDMT is exactly twice the input dimensions plus $\mathrm{log}({N}_{f})$ times the output dimensions. Simplifying, we get the computational complexity for detection of a pulse of length ${N}_{p}$:

Equation (43)

7.2. Using Convolving Filter Banks

The coarse enumeration using coherent dedispersion may contain very large FFTs, which may be prohibitive because of the need to access large volumes of data simultaneously. This could be mitigated in two ways. The first is by using the minimal dedispersion kernel ${H}_{{d}_{0}}$ to dedisperse the signal from the last iteration, thereby linearly shifting the dispersion measure phase space scanned in each iteration by the correct amount with FFTs of size Np2 instead of Nd2. We can save even further by performing small convolutions on the already channelized data, ${I}_{c}(t,f)=\mathrm{STFT}(x(t))$, and dedisperse each channel separately using coherent dedispersion. This process is known as "convolving filter bank" and is detailed in van Straten & Bailes (2011). By combining both methods, we can reduce the complexity of the coarse coherent dedispersion stage to linear time. Essentially, using this process we can use time-domain complex convolutions on the already STFTed data with convolution kernels of very small sizes (which can be as little as convolving with a ∼4 tap filter (in addition to a large shift)). This reduces further the computational complexity of hybrid dedispersion to be

Equation (44)

where we have aggregated the complexity of all the short $O\left(\tfrac{{N}_{d}{N}_{s}{N}_{\mathrm{pol}}}{{{N}_{p}}^{2}}\right)$ operations (convolving filter bank, absolute value squared, and FDMT initialization into $c\sim 7$).

This complexity is near optimal because the number of uncorrelated scores is $\tfrac{{N}_{d}{N}_{s}}{{{N}_{p}}^{2}}$, which is only a logarithmic factor smaller than the computational complexity. Therefore, there is not much room for further reduction of computational complexity. The algorithm is summarized in Algorithm 3.

Algorithm 3. Coherent Hybrid FDMT Dedispersion Algorithm

Input: Antenna voltage series x(t).
Output: Time series of integrated flux density as a function of trial dispersion measures. $d\lt {d}_{\max }$ with steps $\tfrac{{N}_{p}}{{N}_{d}}{d}_{\max }$ and all exit times ${t}_{0}\lt {N}_{s}\tau $ with steps ${N}_{p}\tau $.
1: Apply STFT with block size ${N}_{p}$ on x(t) to obtain ${I}_{c}(t,f)$.
2: for dispersion d0 in the range $[0,{d}_{\max }]$ in steps of $2\tfrac{{{N}_{p}}^{2}}{{N}_{d}}{d}_{\max }$ do
3: Apply the filter ${\hat{H}}_{{d}_{0}}(f)=\exp \left(\tfrac{2\pi {{id}}_{0}}{f+{f}_{0}}\right)$ to ${I}_{c}(t,f)$ via convolving filter bank (i.e., apply in separate channels), perform absolute value squared, and sum over all polarizations to get $I(t,f)$.
4: Apply FDMT to $I(t,f)$, with $-{N}_{p}^{2}\tau \leqslant {\rm{\Delta }}t\leqslant {N}_{p}^{2}\tau $, and output the partial result ${A}_{{f}_{\min }}^{{f}_{\max }}({d}_{0}+d,{t}_{0})$.
5: end for

Download table as:  ASCIITypeset image

7.3. Implications

Using this algorithm, it is possible to perform blind searches for pulses with duration in the 1 μs to 0.1 ms regime (which implies ${N}_{p}={10}^{2}\mbox{--}{10}^{4}$ for standard searches). Searching for pulses with this short duration opens a parameter space that had never been scanned before in a blind survey, and allows us to feasibly search for pulsars via their giant pulse phenomenon.

8. BLIND TRANSIENT SEARCHES WITH RADIO INTERFEROMETERS

There are two existing approaches to blind search interferometry. The first is to add antennas incoherently and then dedisperse. This is considered to be computationally feasible, and it is sensitive to the entire field of view of the interferometer. However, this process loses the angular resolution and reduces the sensitivity by a factor of $\sqrt{{N}_{{\rm{a}}}}$, where Na is the number of antennas.

The second approach is to beam-form and dedisperse: for every searched location $({p}_{x},{p}_{y})$, shift all the signals from all antennas with the correct shift for position $({p}_{x},{p}_{y})$, add them up, and perform dedispersion. To mitigate the computational load of this process, it is customary to use only a small subset of all sky locations at a time, considerably reducing the overall survey speed of the instrument.

Another possibility is to use a combination of both approaches by dividing the interferometers into closely packed stations, beam-forming all stations to a subset of all possible directions, and then incoherently adding the stations. All methods trade the computational unfeasibility with a significant sensitivity reduction. These approaches are further discussed in van Leeuwen & Stappers (2010), and a review on the existing algorithms for blind transient searches using interferometers can also be found in Bannister & Cornwell (2011).

8.1. Incorporating the FDMT Algorithm into Transient Searches of Radio Interferometers

In this section, we analyze the complexity of blind searches of short astrophysical signals with radio interferometers using FDMT. We first calculate the computational and communication complexity of applying the FDMT algorithm after the imaging operation. Moreover, we offer a way to reduce the communication complexity by applying the FDMT algorithm after the correlator operation and before the imaging operation. We show that in principle, using our scheme, it is possible to use modern radio interferometers to detect and locate short astrophysical pulses in real time without knowledge of the dispersion measure.

We start by introducing some additional notation. In the scenario of a blind dispersed pulse search with a radio interferometer, we have signals from several telescopes. We denote the raw voltage signal from the jth telescope by xj. We further denote by Na the number of antennas, and by Nl the number of distinct locations on the sky, or pixels, in the optimal image resolution of the interferometer. The desired statistic that we need to calculate for efficient detection is given by

Equation (45)

where ${u}_{i}({p}_{x},{p}_{y})$ represents the time delay of the signal at antenna i, H is the dedispersion filter needed to be convolved with to correct for dispersion, and ∗ represents convolution. We wish to calculate this score for all combinations of sky positions, dispersion trials $\tfrac{{N}_{d}}{{N}_{p}}$, and start times $\tfrac{{N}_{s}}{{N}_{p}}$. Therefore, the number of calculated scores is $\tfrac{{N}_{l}{N}_{s}{N}_{d}}{{{N}_{p}}^{2}}$.

To illustrate that the search for transients using interferometers is feasible, we estimate the number of computations required for searching FRBs with the core of MeerKAT: ${f}_{\min }=\,1\,\mathrm{GHz}$, ${f}_{\max }=\,1.75\,\mathrm{GHz}$, $\nu =750\,\mathrm{MHz}$ is the baseband sampling frequency, ${t}_{d}=3.6\,{\rm{s}}$ (corresponds to a target DM of 1000), ${t}_{p}=1\,\mathrm{ms}$. Using Na = 48 antennas of diameter ${D}_{a}=13.5\,{\rm{m}}$, spread out to a maximal baseline of ${D}_{b}=1\,\mathrm{km}$${N}_{l}=\tfrac{\pi }{4}{\left(2\tfrac{{D}_{b}}{{D}_{a}}\right)}^{2}\approx 17,300$ (covering twice the primary beam size in imaging), ${N}_{s}=0.75\times {10}^{9}$, ${N}_{p}=0.75\times {10}^{6}$, and ${N}_{d}=2.7\times {10}^{9}$, we get $6\times {10}^{10}$ scores per second, which means that the computing power required for performing FDMT (estimating the logarithmic factor in Equation (32) to be 16) is $\sim 1\,\mathrm{TFlop}/{\rm{s}}$ to process. For comparison, this computational complexity is an order of magnitude less than the required complexity for correlating the signals (∼9 TFlop/s). To show the importance of performing dedispersion via FDMT instead of by brute force in the case of interferometers, we calculate the computational complexity of the dedispersion step in the same setup using brute-force dedispersion to be 220 TFlop/s.

8.2. The Proposed Solution

First, we quickly review the standard imaging process of interferometry, using the approximations of a flat sky and short observation. Assuming there is no dispersion, the desired score is

Equation (46)

denoting by $\hat{x}$ the Fourier transform of x. To efficiently calculate this score for every pixel ${p}_{x},{p}_{y}$, we can use the relation

Equation (47)

denoting by Lj the two-dimensional location of antenna j on the plane (under the approximation of having all antennas on the same plane). Now we can calculate the score at all positions $({p}_{x},{p}_{y})$ at the same time by a two-dimensional Fourier transform of the array:

Equation (48)

Equation (49)

where ${\delta }^{2}((a,b))$ is equal to one if $(a,b)=(0,0)$ (to the desired approximation), and zero otherwise.

The summation in Equation (48) is a sum of squares. This means that coherent dedispersion operations must be performed before correlating9  because the imaging process calculates the sum of the squared absolute values of the voltages.

Incorporating dedispersion into this, we can see that the block size ${N}_{f}$ we used earlier is transformed in this framework to the size of the Fourier transform done by the correlator. Because of the unknown dedispersion, at each image location we need to get a time–frequency matrix. This means that we need to perform a different two-dimensional Fourier transform operation for each narrow frequency band. After that, we can apply FDMT for every pixel's time–frequency matrix. Denoting the complexity of the ith step of the algorithm by Ci, the complexity of the coherent dedispersion + STFT of all individual antennas is

Equation (50)

The complexity of correlating all pairs of antennas is

Equation (51)

The complexity of the imaging process is

Equation (52)

The complexity of the FDMT algorithm (without the STFT part, which was already done in this context) is

Equation (53)

So, the total complexity of this process is

Equation (54)

While the complexity of this process is indeed "optimal," in the sense that it is only a logarithmic factor larger than the number of independent results, implementing this will result in a reduced computational efficiency. This is due to data transport between the imaging stage and the dedispersion stage. Between these stages, $\tfrac{{N}_{l}{N}_{s}{N}_{d}}{{{N}_{p}}^{2}}$ complex numbers are being transported.

This could be mitigated by the fact that dedispersion can be done before the imaging operation.

In order to move the FDMT operation before the imaging operation, we need to take some care for chromatic aberrations. If the frequency band is wide enough, then different frequencies might represent different locations in the (u, v) plane. The location on the (u, v) plane where we need to put a correlator output of frequency f and a pair of antennas $j,k$ is

Equation (55)

Therefore, requiring that the visibilities of all frequencies in the frequency band will be on the same bin in the (u, v) plane yields

Equation (56)

where $\delta {p}_{{uv}}$ is the angular frequency difference between adjacent grid points in the (u, v) plane. If the band is wide or the field of view is large, this condition will not hold for pairs of far-apart antennas. In this case, it is necessary to split the frequencies into subbands that are narrow enough to maintain Equation (56), and after that, perform FDMT on each subband separately. Then, for each final dispersion measure (at the final dispersion resolution, not in the subband's resolution), we can shift the frequency bands in time accordingly, grid the visibilities in the optimal way, and perform the imaging operation via a two-dimensional FFT to get the optimal detection scores in the image plane.

Another issue is that if tp is short and the required field of view is large enough, it may happen that the time delay for the pulse's arrival between different antennas will be larger than tp. In such a case, it is important to facet the field of view to areas that are small enough that the time delay of the pulse's arrival time (with respect to the central direction) to different antennas across the narrowed field of view is smaller than tp. Then, perform this search algorithm to detect bursts within each facet.

Since the FDMT's input and output dimensions have the same size, the communication complexity of the proposed solution is $\tfrac{{N}_{a}({N}_{a}-1){N}_{s}{N}_{d}}{2{{N}_{p}}^{2}}$, which should (if ${N}_{a}^{2}\ll {N}_{l}$) make the algorithm's runtime be computation limited and thus feasible.

This summation-before-imaging scheme was already used by Law et al. (2015) without the preceding step of performing FDMT to reduce the computation time of the dedispersion step. It is important to note that, using our scheme, the computation required to perform the imaging part is always larger than that of the dedispersion part. This is not true when using brute-force dedispersion, which is the most computationally demanding stage when using a combination of a dense antenna placement, large bandwidth, and fine time resolution and when searching for pulses with high dispersion measures. For example, see the computational complexity stated above for searching FRBs with MeerKAT. This process is summarized in Algorithm 4.

Algorithm 4. Finding Short Pulses with Interferometers

Input: Antenna voltage series.
Output: $S(d,{t}_{0},{p}_{x},{p}_{y})$ for every time, dispersion, and sky location.
1: for dispersion d0 in the range $[0,{d}_{\max }]$ in steps of
do
2: Initialize $\hat{S}({t}_{0},f,{p}_{u},{p}_{v})=0$
3: for antenna index j. do
4: Create the signal xj by convolving the jth antenna signal with the dedispersion filter with index d0.
5: Apply STFT with block size of ${N}_{f}$ on the signal xj to obtain ${\hat{x}}_{j}$.
6: end for
7: for every pair of antennas $j,k$ do
8: for each $t,f$ do
9: $({p}_{u},{p}_{v})=\tfrac{({L}_{j}-{L}_{k})f}{c}$
10: $\hat{S}(t,f,{p}_{u},{p}_{v})=\hat{S}(t,f,{p}_{u},{p}_{v})+{\hat{x}}_{j}(t,f){\hat{x}}_{k}(t,f)$
11: end for
12: end for
13: for Each populated $({p}_{u},{p}_{v})$ do
14: ${I}_{{p}_{u},{p}_{v}}(t,f)=\hat{S}(t,f,{p}_{u},{p}_{v})$
15: Apply FDMT with ${d}_{\max }={d}_{\mathrm{step}}$.
16: end for
17: Data transpose operation. Each processing unit now holds tables of the form ${\hat{S}}_{t,d}({p}_{u},{p}_{v})=\hat{S}(t,d,{p}_{u},{p}_{v})$
18: for each dispersion d and time t do
19: Perform two-dimensional inverse Fourier transform to calculate $S(t,d+{d}_{0},{p}_{x},{p}_{y})={{ \mathcal F }}^{-1}({\hat{S}}_{t,d}({p}_{u},{p}_{v}))$.
20: end for
21: end for

Download table as:  ASCIITypeset image

9. CONCLUSION

We present the FDMT algorithm, which performs an exact incoherent dedispersion transform with a complexity of $2{N}_{t}{N}_{f}+{N}_{f}{N}_{{\rm{\Delta }}}{\mathrm{log}}_{2}({N}_{f})$. We show that regular implementation tricks of the trade can be combined with the FDMT algorithm to achieve significant computation speedup. We also present a variant of the FDMT algorithm that is slightly more computationally intensive but concentrates all memory-access operations in two global transpose operations, and might present further speedup on massively parallel architectures such as GPUs. We show that the FDMT algorithm dominates all other known algorithms for incoherent dedispersion and has complexity comparable to the signal-processing operations required to generate its input data. Therefore, we conclude that incoherent dedispersion can now be considered a nonissue for future surveys. We provide implementations of the FDMT algorithm in high-level programming languages, which when run on a desktop computer achieve a faster runtime than the state-of-the-art implementations of brute-force dedispersion on cutting-edge GPUs.

We further present an algorithm that bridges the gap between coherent and incoherent dedispersion, and we show that the computational complexity of this algorithm is orders of magnitude lower than that of coherent dedispersion for pulses of resolvable duration. Using this algorithm, it will be possible to perform blind searches for giant pulse emitting pulsars with the sensitivity of coherent dedispersion searches.

Last, we compute the operation count for a blind search of short astrophysical bursts, such as FRBs, with modern radio interferometers and arrive at the conclusion that it is computationally feasible using existing facilities.

We thank the anonymous referee for suggesting the use of convolving filter banks in the hybrid-coherent dedispersion algorithm. In addition, we thank the referee for his useful comments and corrections that greatly improved the readability and quality of the paper. B.Z. would like to express his deep thanks to Gregg Halinnan for introducing him to the problem of dedispersion. The authors would like to express their thanks to Avishay Gal-Yam, Shrinivas Kulkarni, Matthew Bailes, David Kaplan, Ora Zackay, and Gil Cohen for their useful comments and advice regarding the paper. E.O.O. is the incumbent of the Arye Dissentshik career development chair and is grateful to support by grants from the Willner Family Leadership Institute Ilan Gluzman (Secaucus NJ), Israeli Ministry of Science, Israel Science Foundation, Minerva, and the I-CORE Program of the Planning and Budgeting Committee and The Israel Science Foundation.

Footnotes

  • Sometimes, an additional stage of binning is then applied to reduce the time resolution.

  • We define efficiency as the ratio between computational operations performed and the amount of calculated scores, each score being the optimal S/N statistic for detecting a pulse with the specific combination of starting time and dispersion measure.

  • We call a statistical score or algorithm insensitive if it retains substantially less than $\sim 70 \% $ of the optimal S/N.

  • A subband of the data is a part of the data that is both limited and continuous in frequency.

  • The Python code is available from https://sites.google.com/site/barakzackayhomepage/home, and the MATLAB code is available from http://webhome.weizmann.ac.il/home/eofek/matlab/ (Ofek 2014).

  • In modern computers, the fastest memory buffer is the L1 cache. An access to a value that is not stored in the L1 cache causes a memory read from slower storage media such as the L2 cache or the RAM memory, and this is sometimes called a "cache miss."

  • The Fourier transform of the time axis yields what is known in the literature as the fluctuation power spectrum, α.

  • This is a more realistic choice, since using large ${N}_{{\rm{\Delta }}}\gt {N}_{f}$ usually loses sensitivity (see Section 2), and the number of frequencies is usually larger than 210.

  • The process of calculating ${\hat{x}}_{j}({t}_{0},f){\hat{x}}_{k}^{* }({t}_{0},f)$ is referred to as "correlating" in the literature and is calculated by a computing infrastructure usually called "the correlator."

Please wait… references are loading.
10.3847/1538-4357/835/1/11