A Real-time, Pipelined Incoherent Dedispersion Method and Implementation in FPGA

In pulsar observation, dispersion caused by the interstellar medium significantly affects the detection of pulsar signals. Incoherent dedispersion methods are often applied to overcome this effect. Traditional incoherent dedispersion methods are computationally expensive and cumbersome to implement. To deal with this problem, we developed a Real-Time, Pipelined Incoherent Dedispersion Method (RT-PIDM). RT-PIDM caches only the summed-up time series instead of all the frequency spectra, so its memory consumption is determined by the number of DM trials, whereas the memory consumption of the traditional method is determined by the number of frequency channels. In most situations, the number of frequency channels is several times larger than the number of DM trials, so traditional methods consume more memory than RT-PIDM. With RT-PIDM, we designed a prototype digital backend with 1.2 GHz bandwidth and carried out pulsar observations with the 40 m radio telescope at Yunnan Observatory. The results demonstrate that RT-PIDM can be implemented inside a single FPGA chip using less Block RAM, and that it dedisperses the pulsar signal in real time with the same result as traditional incoherent dedispersion.


Introduction
Pulsars are highly magnetized, rapidly rotating neutron stars (Hewish et al. 1968; Bonazzola & Gourgoulhon 1996; Abbott et al. 2010; Caraveo et al. 2003; Stinebring et al. 2000). Pulsars produce radio beams that sweep the sky like a lighthouse (Zhu-Xing et al. 2014). If the beam points toward the Earth, it produces periodic pulses, which can be measured with radio telescopes and dedicated backends. While traveling through the interstellar medium (ISM), these pulses are attenuated and spread out in time (higher frequencies arrive earlier than lower frequencies). This phenomenon, called dispersion, makes the pulses hard to detect without further processing. Meanwhile, improvements in observing bandwidth (Deneva 2007; Siemion et al. 2010; Baffa 2014; Vertatschitsch et al. 2015; Liu et al. 2018) and multi-beam receivers (Manchester et al. 2001; Lorimer et al. 2015) produce large amounts of data, so real-time dedispersion is vital in pulsar observation. Traditionally, two techniques are used to dedisperse pulsar signals: coherent dedispersion and incoherent dedispersion. Coherent dedispersion deconvolves the transfer function of dispersion from the complex voltage signal, whereas incoherent dedispersion aligns narrow frequency channels in time.
Many astronomical instruments with real-time dedispersion have been developed. In the mid-1980s, Hankins et al. developed the first real-time coherent dedispersion system at the Arecibo Observatory using a hardware-based implementation (Hankins & Rajkowski 1987). Their system used a quad-chirped transversal filter with 2 MHz bandwidth. Similarly, D. C. Backer et al. developed the Berkeley Pulsar Processor (BPP), which used a CMOS 1024-tap transversal filter. BPP performs real-time coherent dedispersion with a maximum total bandwidth of 100-200 MHz and 50-100 channels (Backer et al. 1996).
Conventional radio astronomy instruments are highly specialized: they are custom, complex, and dedicated to individual applications, so designing and debugging such a system takes 3-5 yr (Parsons et al. 2006). New technologies are therefore needed for radio astronomy. Since the pioneering work of Weinreb (Weinreb 1961), digital processing hardware has been widely used in radio astronomy. In the past decades, Field-Programmable Gate Arrays (FPGAs), Central Processing Units (CPUs), and Graphics Processing Units (GPUs) have served as the cores of radio astronomy instruments. These digital devices are reprogrammable and flexible to develop and debug, which helps reduce the design time of new instruments.
With such digital processing hardware, new digital backends have been developed in recent years for new telescopes with wideband receivers. The PuMa-II backend at the Westerbork Synthesis Radio Telescope (WSRT) demonstrated that a CPU cluster can implement a near real-time coherent dedispersion system for a bandwidth of 160 MHz by dividing the bandwidth into 8 sub-bands, dedispersing each, and combining the signal (Karuppusamy et al. 2008). The Green Bank Ultimate Pulsar Processing Instrument (GUPPI), built on FPGAs, a CPU cluster and a cluster of 8 GPUs, implements a wide-bandwidth coherent dedispersion system supporting a total bandwidth of 800 MHz (Ford et al. 2010). A fully real-time coherent dedispersion system based on GPUs has also been developed for the pulsar backend at the Giant Metrewave Radio Telescope (GMRT) (Kishalay & Yashwant 2016).
In the past few years, CPUs and GPUs have become important platforms for coherent dedispersion, which is computationally expensive. Compared with coherent dedispersion, incoherent dedispersion does not achieve as high a time resolution (Hankins 2017), but it requires fewer computational resources. Therefore, some researchers have implemented real-time incoherent dedispersion in FPGAs. Clarke et al. implemented a new incoherent dedispersion method optimized for FPGA-based architectures, intended for deployment on the Australian SKA Pathfinder and other Square Kilometre Array precursors for fast transient surveys (Clarke et al. 2014). Even though incoherent dedispersion is simpler to implement than coherent dedispersion, it still requires memory for caching the spectral data, and this memory requirement is mainly determined by the number of frequency channels.
Better performance requires more frequency channels in incoherent dedispersion, and the Block RAM inside an FPGA is then not large enough to cache the spectra. In this paper, we present a Real-Time, Pipelined Incoherent Dedispersion Method (RT-PIDM). The memory requirement of RT-PIDM is determined by the number of DM trials instead of the number of frequency channels, so the real-time incoherent dedispersion module can be implemented in a single FPGA without external memory. With this feature, we designed a 1.2 GHz bandwidth digital backend with an FPGA-based, real-time incoherent dedispersion module. It is implemented in a single Xilinx Virtex-6 LX240T FPGA chip, which makes it compact and power-saving.
In the following sections, we present a digital backend with the real-time, pipelined incoherent dedispersion module. In Section 2.1, we present the signal processing architecture implemented in a single FPGA. A detailed comparison between coherent and incoherent dedispersion is given in Section 2.2.1. The real-time, pipelined incoherent dedispersion method and its implementation are described in Sections 2.2.2 and 2.2.3. The experimental results at Yunnan Observatory are shown in Section 3. The paper closes with conclusions and acknowledgments.

… for dividing the high-speed data stream into eight separate data streams. The Parallel Fast Fourier Transform (FFT) Unit performs parallel channelization. The Power Calculation Unit calculates the power of the input signals, and the Accumulation Unit accumulates the power. The Add or Independent Unit either sums the power of the left and right polarized signals or keeps them separate. The digital backend has two output data units: the Ethernet Unit and the Real-Time Dedispersion (RTD) Unit. The Ethernet Unit outputs the raw channelized data, referred to as searching data in this paper, while the RTD unit in the FPGA performs real-time dedispersion. Each RTD unit handles one DM trial, so multiple DM trials require multiple RTD instances, which produce multiple RTD output data streams. Because the dedispersed data from an RTD unit are much smaller than the searching data, they can be used for real-time searches for pulsars and fast radio transients. As the parallel FFT unit has been introduced in another paper (Liu et al. 2017), this paper mainly discusses the newly developed real-time dedispersion module in the FPGA.

Coherent Dedispersion and Incoherent Dedispersion
Pulsar observations are distorted by the deleterious effects of dispersion. When an electromagnetic wave travels through the ionized interstellar medium, the signal at lower frequencies travels more slowly than that at higher frequencies, so a sharp broadband pulse from the source is smeared out in time when detected with a receiver of finite bandwidth. Hence, proper correction techniques are vital for pulsar observations. Because a wide observing bandwidth produces a large amount of data, offline processing is difficult, and real-time dedispersion becomes necessary.
Two techniques are widely used to eliminate the effect of dispersion: coherent dedispersion and incoherent dedispersion (Lorimer & Kramer 2012). In incoherent dedispersion, the observed bandwidth is divided into channels, and the dispersion delay of each channel is

$$\Delta t = k_{\mathrm{DM}}\,\mathrm{DM}\left[v_0^{-2} - (v_0 + \Delta v)^{-2}\right], \qquad k_{\mathrm{DM}} \approx 4.15\times10^{3}\ \mathrm{s\,MHz^{2}\,pc^{-1}\,cm^{3}}, \tag{1}$$

where v_0 is the lowest frequency (in MHz), Δv is the bandwidth of each frequency channel, and DM is the dispersion measure (in pc cm^−3). The delays are applied to align the pulses in the separate frequency channels, which are then summed to produce a dedispersed time series. Coherent dedispersion removes the dispersive effect of the interstellar medium by convolving the digitally sampled telescope output voltages with an inverse filter function derived from the tenuous, cold plasma dispersion law (Hankins & Rickett 1975),

$$H(v_0 + v) = \exp\left[\frac{i\,2\pi D\,v^{2}}{v_0^{2}\,(v_0 + v)}\right], \tag{2}$$

where v_0 is the center frequency of the observation, v is the frequency offset from v_0 within the bandwidth of the pulsar signal, and D is proportional to DM. Coherent dedispersion completely eliminates the dispersion effect, but it is computationally more expensive. Incoherent dedispersion, on the other hand, is relatively inexpensive, but it does not remove the intra-channel dispersion. Comparing the two, if the intra-channel dispersion smearing time is smaller than the time resolution, coherent and incoherent dedispersion have the same performance. From Equation (1), the intra-channel smearing time is mainly determined by Δv: a smaller Δv reduces the smearing. However, this requires a larger number of frequency channels across the bandwidth, which degrades the time resolution. Therefore, a trade-off between intra-channel smearing time and time resolution is needed. Following the Nyquist sampling theorem, if the sampling frequency is f_s, the observing bandwidth is f_s/2.
If the number of frequency channels is N_v, the bandwidth of each frequency channel is

$$\Delta v = \frac{f_s}{2N_v},$$

and the highest time resolution of the system is

$$\Delta t = \frac{1}{\Delta v} = \frac{2N_v}{f_s}.$$

To search for millisecond pulsars with microsecond pulse widths (Backer et al. 1982; Wolszczan & Frail 1992), the time resolution should be on the microsecond scale. For example, if the time resolution Δt is chosen to be 64 μs, the frequency resolution Δv is 1/(64 μs) = 15.625 kHz. To match the performance of coherent dedispersion, the intra-channel smearing time in incoherent dedispersion should be smaller than the time resolution. For a given Δv, the intra-channel smearing time can be calculated with Equation (1) and depends on v_0 and DM; the relationship between v_0 and DM is shown in Figure 2. For example, with Δt = 64 μs and Δv = 15.625 kHz, if the observation starts at 2000 MHz, coherent and incoherent dedispersion have the same performance when DM is smaller than 3938.51 pc cm^−3.
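These relations can be checked numerically. The Python sketch below is illustrative only: it assumes the standard dispersion constant k_DM ≈ 4.15 × 10³ s MHz² pc⁻¹ cm³ in Equation (1), treats the smearing of one channel as the delay between its two edges, and uses hypothetical function names.

```python
# Sketch: channel width, time resolution and the DM limit for equal
# coherent/incoherent performance. Frequencies in MHz, times in us.
K_DM_MS = 4.15e6  # assumed dispersion constant, ms MHz^2 pc^-1 cm^3


def channel_width_mhz(fs_mhz: float, n_chan: int) -> float:
    """Delta v = (fs / 2) / N_v: the Nyquist band split into N_v channels."""
    return fs_mhz / 2.0 / n_chan


def time_resolution_us(dv_mhz: float) -> float:
    """Highest time resolution Delta t = 1 / Delta v (1 / MHz = us)."""
    return 1.0 / dv_mhz


def smearing_us(dm: float, f0_mhz: float, dv_mhz: float) -> float:
    """Intra-channel smearing from Eq. (1) for a channel from f0 to f0 + dv."""
    delay_ms = K_DM_MS * dm * (f0_mhz ** -2 - (f0_mhz + dv_mhz) ** -2)
    return delay_ms * 1e3


def dm_limit(dt_us: float, f0_mhz: float, dv_mhz: float) -> float:
    """Largest DM whose smearing stays below dt (smearing is linear in DM)."""
    return dt_us / smearing_us(1.0, f0_mhz, dv_mhz)


if __name__ == "__main__":
    print(channel_width_mhz(2400.0, 2048))  # 0.5859375 (prototype, MHz)
    dv = 1.0 / 64.0                         # dt = 64 us -> dv = 15.625 kHz
    print(time_resolution_us(dv))           # 64.0
    print(dm_limit(64.0, 2000.0, dv))
```

With this constant the sketch gives a limit of about 3.95 × 10³ pc cm⁻³, close to the 3938.51 pc cm⁻³ quoted above; the small difference presumably comes from the exact constant used.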

Real-Time, Pipelined Incoherent Dedispersion Method
Before describing RT-PIDM, we first review the widely used direct dedispersion algorithm (Barsdell et al. 2012). The direct dedispersion algorithm works by directly summing frequency spectra along a quadratic dispersion trail. For each time sample and DM trial, the algorithm computes an array of dedispersed time series D from an input spectral data set A:

$$D_{d,t} = \sum_{v=0}^{N_v-1} A_{v,\,t+\Delta t(d,v)},$$

where d denotes the dispersion measure (DM trial), t the time at which the time series sample is produced, and v the frequency channel. N_v is the total number of frequency channels, and A_{v,t} is the dynamic spectrum of channel v at time t. The discretized delay is

$$\Delta t(d,v) = \mathrm{round}\!\left(\frac{\mathrm{delay}_{d,v}}{\Delta t}\right), \tag{7}$$
where delay_{d,v} is calculated with Equation (1), Δt is the sampling time of the system, and round(·) rounds to the nearest integer. Incoherent dedispersion based on the direct algorithm accesses the frequency spectra many times to compute the dedispersed time series for different DM trials, so a large amount of memory is needed to cache the spectral data. The memory size grows as N_v and d increase, but it stays the same no matter how many DM trials are computed. The memory consumption of the direct dedispersion algorithm is

$$\mathrm{MemSize}_{\mathrm{Direct}} = N_v\,\Delta t(d_{\max}, v_{\max})\,\mathrm{res},$$

where Δt(d_max, v_max) is the maximum discretized delay and res is the bit width of the spectral data. The required memory is much larger than the total Block RAM inside an inexpensive FPGA chip, which is why an external memory chip is almost always necessary for incoherent dedispersion (Clarke et al. 2014).
A simple example of direct incoherent dedispersion, with N_v = 4, is shown in Figure 3. Because the dispersion delay is not linear in frequency, the discretized delays in this example are set to

$$\Delta t(d,0) = 0,\quad \Delta t(d,1) = 1,\quad \Delta t(d,2) = \Delta t(d,3) = 2$$

(in units of the sampling time). The blocks with different colors are the spectral data at different times. The buffer for caching the spectral data therefore holds 3 × 4 = 12 values (three time samples for each of the four channels). If each spectral value is 8 bits wide, the total buffer size for direct dedispersion is 12 bytes. As the dedispersed result is a time series, the aligned spectral data at the same time are summed. The final dedispersed result is shown in Figure 4.
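A minimal Python sketch of the direct algorithm, for illustration only: spectra[v][t] holds A_{v,t} and delays[v] holds Δt(d, v) in samples. The buffer-size helper assumes that every channel is cached for Δt_max + 1 samples, which reproduces the 3 × 4 cells of the Figure 3 example; the function names are hypothetical.

```python
from typing import List, Sequence


def direct_dedisperse(spectra: Sequence[Sequence[int]],
                      delays: Sequence[int]) -> List[int]:
    """D[t] = sum over v of A[v][t + delays[v]] for one DM trial.

    Only times whose delayed samples all lie inside the input are computed.
    """
    n_time = len(spectra[0])
    max_delay = max(delays)
    return [sum(chan[t + d] for chan, d in zip(spectra, delays))
            for t in range(n_time - max_delay)]


def direct_buffer_bits(n_chan: int, max_delay: int, res_bits: int) -> int:
    """Spectra cache: every channel is held for max_delay + 1 samples."""
    return n_chan * (max_delay + 1) * res_bits


# Toy example in the spirit of Figure 3: four channels with delays 0, 1, 2, 2,
# and one pulse emitted at t = 1 that reaches channel v at 1 + delays[v].
delays = [0, 1, 2, 2]
spectra = [[1 if t == 1 + d else 0 for t in range(6)] for d in delays]
print(direct_dedisperse(spectra, delays))           # [0, 4, 0, 0]
print(direct_buffer_bits(4, max(delays), 8) // 8)   # 12 (bytes)
```

Every DM trial re-reads the same cached spectra with a different delays list, which is why the cache size is independent of the number of trials.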
The proposed RT-PIDM performs incoherent dedispersion more efficiently. Since the final dedispersed result is a time series, RT-PIDM caches only this time series instead of all the spectral data, which saves a lot of memory. The block diagram of the RTD module based on RT-PIDM is shown in Figure 5. The RTD module consists of a calculation unit, a RAM_Δt and a RAM_D. The calculation unit applies the delays to the spectral data and sums all the frequency spectra into the dedispersed data. RAM_D caches the dedispersed data, and RAM_Δt stores the discretized delays for the different frequency channels.
The calculation unit is the core of the RTD module. The spectral data are transferred to the RTD module from low-frequency channels to high-frequency channels, and the calculation unit contains two important registers: BaseAddr and OffsetAddr. The BaseAddr register holds the address of a memory cell in RAM_D and increases by one whenever a new set of A_{v,t} arrives. Different frequency channels need different delays, so OffsetAddr sets the offset from BaseAddr, which is the dispersion delay in RT-PIDM. The value in OffsetAddr is read from RAM_Δt. The two registers are illustrated in Figure 6.
Below, we show how the RTD module works:
(a) At the beginning, BaseAddr and OffsetAddr are set to 0.
(b) When a new set of A_{v,t} arrives, BaseAddr increases by one.
(c) The discretized dispersion delay for v_0 is read from RAM_Δt into OffsetAddr.
(d) The data in RAM_D at BaseAddr + OffsetAddr are read out and added to A_{v_0,t}, and the result is written back to the same address.
(f) The data in RAM_D at BaseAddr are read out; this is the dedispersed data at t.
(g) Steps (b) to (f) are repeated, so that the RTD module outputs dedispersed data continuously.

With the same dispersed signal as in Figure 3, Figure 7 shows an example of how the RTD module based on RT-PIDM works. The buffer for RT-PIDM, shown as the dashed box in Figure 7, has three memory cells. Each cell holds a sum of four 8-bit spectral values, which needs 8 + log2 4 = 10 bits, so the buffer size is 3 × 10 bits = 3.75 bytes. The cells are addressed from 0 to 2 and initially cache the dedispersed data for t0-t2. The white blocks in Figure 7 were produced before t0, so they can be ignored.
Next, we show step by step how RT-PIDM works:
(a) At t0, A_{v,t0} arrives. A_{0,t0} (red block) is added to Addr0.
(b) At t1, A_{0,t1} (yellow block) is added to Addr1. A_{1,t1} (red block) was produced at t0 but delayed by Δt due to dispersion, so it is added to Addr0.
(c) At t2, A_{0,t2} (blue block) is added to Addr2. A_{1,t2} (yellow block) is delayed by Δt, so it is added to Addr1, and A_{2,t2} and A_{3,t2} (red blocks) are delayed by 2Δt, so they are added to Addr0. Now all the spectral data produced at t0 have been added to Addr0, so the data cached at Addr0 is the dedispersed data for t0 and is output by RT-PIDM. The data stored at Addr0 is then reset to 0.
(d) At t3, new spectral data arrive. The buffer works as a circular buffer, so A_{0,t3} (green block) is added to Addr0. As before, A_{1,t3} is added to Addr2, and A_{2,t3} and A_{3,t3} are added to Addr1. The dedispersed data for t1 is now complete at Addr1 and is output by the RTD module.
(e) At t4, in the same way, the dedispersed data for t2 is complete at Addr2 and is output by RT-PIDM.
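The walkthrough above can be condensed into a short streaming model. The Python sketch below illustrates the idea rather than the FPGA logic: it indexes RAM_D cells by (t − Δt(d, v)) mod depth, which matches the cell assignments in Figure 7, and it does not model how the hardware encodes this address as BaseAddr + OffsetAddr. The buffer depth Δt_max + 1 and the function name are assumptions.

```python
from typing import List, Sequence


def rt_pidm(stream: Sequence[Sequence[int]], delays: Sequence[int]) -> List[int]:
    """Streaming RT-PIDM sketch for one DM trial.

    stream[t][v] is A_{v,t}; spectra arrive one time step at a time and, within
    a step, one channel at a time. Only partial sums are cached: A_{v,t} is
    added to the cell holding D_{t - delays[v]} in a circular buffer (RAM_D).
    """
    max_delay = max(delays)
    depth = max_delay + 1            # assumption: 3 cells for the Figure 7 example
    ram_d = [0] * depth
    out: List[int] = []
    for t, spectrum in enumerate(stream):
        for a, d in zip(spectrum, delays):
            target = t - d
            if target >= 0:          # contributions to times before t0 are ignored
                ram_d[target % depth] += a
        done = t - max_delay         # D_done has now received every channel
        if done >= 0:
            cell = done % depth
            out.append(ram_d[cell])  # emit the dedispersed sample
            ram_d[cell] = 0          # reset the cell for reuse at t + 1
    return out


# Toy signal of Figure 3: delays 0, 1, 2, 2 and one pulse emitted at t = 1,
# stored time-major as stream[t][v].
delays = [0, 1, 2, 2]
stream = [[1 if t == 1 + d else 0 for d in delays] for t in range(6)]
print(rt_pidm(stream, delays))  # [0, 4, 0, 0]
```

Only depth partial sums are kept, and each cell is emitted and cleared exactly when its last contribution (the channel with the largest delay) has arrived.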
As discussed above, the memory consumption of RT-PIDM consists only of RAM_D and RAM_Δt in the RTD module. RAM_D caches the dedispersed data, so its depth is Δt(d_max, v_max). Because RAM_D caches sums of frequency spectra, its bit width should be (res + log2 N_v) bits. Hence, the memory consumption of RAM_D is

$$\mathrm{MemSize}_{\mathrm{RAM\_D}} = \Delta t(d_{\max}, v_{\max})\,(\mathrm{res} + \log_2 N_v).$$

RAM_Δt stores the discretized delays for all frequency channels, so its depth is N_v and its bit width is log2 Δt(d_max, v_max). Therefore, the memory consumption of RAM_Δt is

$$\mathrm{MemSize}_{\mathrm{RAM\_\Delta t}} = N_v \log_2 \Delta t(d_{\max}, v_{\max}),$$

and the total memory consumption of RT-PIDM is

$$\mathrm{MemSize}_{\mathrm{RT\text{-}PIDM}} = \Delta t(d_{\max}, v_{\max})\,(\mathrm{res} + \log_2 N_v) + N_v \log_2 \Delta t(d_{\max}, v_{\max}).$$

Comparing MemSize_Direct and MemSize_RT-PIDM, the difference is that MemSize_Direct is proportional to N_v, whereas MemSize_RT-PIDM is mainly proportional to log2 N_v.

… has been deployed at SKA and is used for real-time, multi-beam transient searches (Clarke et al. 2014). The number of DM trials is 448, and the FTA (Frequency-Time Array) of this SKA digital backend, which is based on the direct method, occupies 85.5 MBytes. With RT-PIDM, the memory consumption for 448 DM trials is 25.68 MBytes, which is 30.03% of the memory consumption of the digital backend at SKA.
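The RT-PIDM memory formulas can be evaluated directly. In this Python sketch (hypothetical function name), bit widths are rounded up to whole bits, and the RAM_D depth is passed in explicitly so that it can be the power of two used in the prototype (1024 for a maximum delay of 988 samples) or the 3 cells of the Figure 7 example.

```python
import math
from typing import Tuple


def rt_pidm_bits(n_chan: int, depth: int, res_bits: int) -> Tuple[int, int]:
    """Return (RAM_D bits, RAM_dt bits) for one RTD module, i.e. one DM trial.

    depth is the number of cached dedispersed samples in RAM_D.
    """
    sum_width = res_bits + math.ceil(math.log2(n_chan))  # width of a channel sum
    delay_width = math.ceil(math.log2(depth))            # width of one stored delay
    return depth * sum_width, n_chan * delay_width


# Prototype: 2048 channels, 8-bit spectra, RAM_D depth 1024.
ram_d, ram_dt = rt_pidm_bits(2048, 1024, 8)
print(ram_d, ram_dt)                # 19456 20480
print((ram_d + ram_dt) / 8 / 1024)  # 4.875 (KB)

# Figure 7 example: 4 channels, 3 cells of 10 bits each.
print(rt_pidm_bits(4, 3, 8)[0] / 8)  # 3.75 (bytes)
```

For several DM trials the per-trial total is simply multiplied by the number of RTD instances.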

Implementation
We implemented one RTD module on our Cascaded Reconfigurable Architecture Board (CRABoard), shown in Figures 8 and 9 (Liu et al. 2021), to demonstrate that RT-PIDM works.
The implementation of the RTD module described in this paper is shown in Figure 10; it consists of a software part (ARM) and a hardware part (FPGA). The discretized delays Δt(d, v) are required for incoherent dedispersion and are determined by the specific DM and the lowest observing frequency. Because Δt(d, v) must be calculated and loaded into the hardware before the RTD module starts, and this step has no real-time requirement, Δt(d, v) can easily be computed on an ARM processor or a host computer with Equations (1) and (7). We performed the computation on the ARM module and then wrote Δt(d, v) to a Block RAM in the FPGA through the data bus. Alternatively, the delays could be computed on a host computer and written to the FPGA via the Ethernet port. Computing Δt(d, v) is fast, so the computation delay is negligible. Once the RTD module starts, all computation is performed in real time.
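Because the delays are computed off-line, a few lines of host code suffice. The Python sketch below illustrates the idea under stated assumptions rather than reproducing the ARM code of the prototype: channel v is taken to start at f_low + vΔv, each delay is measured relative to the top of the band (so lower-frequency channels, which arrive later, get larger delays), Python's round() stands in for the rounding in Equation (7), and k_DM ≈ 4.15 × 10³ s MHz² pc⁻¹ cm³. The function names and the DM value in the usage line are hypothetical.

```python
from typing import List

K_DM_MS = 4.15e6  # assumed dispersion constant, ms MHz^2 pc^-1 cm^3


def delay_table(dm: float, f_low_mhz: float, dv_mhz: float,
                n_chan: int, dt_ms: float) -> List[int]:
    """Discretized delays round(delay_{d,v} / dt) for all channels.

    Assumptions: channel v starts at f_low + v * dv, and each delay is measured
    relative to the top of the band, f_low + n_chan * dv.
    """
    f_top = f_low_mhz + n_chan * dv_mhz
    table = []
    for v in range(n_chan):
        f_v = f_low_mhz + v * dv_mhz
        delay_ms = K_DM_MS * dm * (f_v ** -2 - f_top ** -2)
        table.append(round(delay_ms / dt_ms))
    return table


def ram_dt_width(table: List[int]) -> int:
    """Bits needed per RAM_dt entry to store the largest delay."""
    return max(table).bit_length()


# Prototype-like band: 2190 MHz start, 2048 channels of 1200/2048 MHz,
# dt = 0.243 ms; DM = 100 pc cm^-3 is an arbitrary illustration.
table = delay_table(100.0, 2190.0, 1200.0 / 2048, 2048, 0.243)
print(table[0], table[-1], ram_dt_width(table))
```

For the prototype's maximum delay of 988 samples, ram_dt_width would give 10 bits, the RAM_Δt width used in the prototype.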
In our prototype design, the sampling frequency is 2.4 GSps and the number of frequency channels is 2048, so Δv of the system is 585.9 kHz. The time resolution is adjustable from 34.1 to 243 μs. We carried out the experiments at Yunnan Observatory with a receiver whose band starts at 2190 MHz, which determines the maximum DM. The resulting maximum delay across the whole bandwidth is 240.1 ms, so Δt(d_max, v_max) here is 988 (240.1 ms divided by the 243 μs time resolution), which requires a RAM_D with a depth of 1024. The bit width of the spectral data is 8 bits, so the memory consumption of the RTD module in our prototype design is

$$(2048 \times 10 + 1024 \times 19)\ \mathrm{bits} = 4.875\ \mathrm{KB}, \tag{13}$$

where RAM_Δt has 2048 entries of log2 1024 = 10 bits and RAM_D has 1024 entries of 8 + log2 2048 = 19 bits.

A_{v,t} is transferred to the RTD module together with two other signals, en_sync and cnt_sync. en_sync indicates that the spectral data are valid, so its rising edge indicates that a new set of A_{v,t} is arriving. cnt_sync indicates the order of the spectral data: for example, if cnt_sync equals n, the spectral data is A_{n,t}. The relationship between A_{v,t}, en_sync and cnt_sync is shown in Figure 11.

The calculation unit performs the core dedispersion, and the implementation of its core part is shown in Figure 12. BaseAddr is the base address, which increases by one in every calculation round. OffsetAddr is read from RAM_Δt at address cnt_sync, which is synchronized to A(v, t): for example, when A(v_n, t) arrives, cnt_sync equals n, so Δt(d, v_n) is read into the calculation unit as OffsetAddr. The sum of BaseAddr and OffsetAddr is raddr, which drives the AddrB port of RAM_D for reading. Because the read operation takes several clock cycles, A(v, t) is delayed by the same number of cycles for synchronization and then added to the data read from RAM_D. The sum, D_sum, is written back to the same address, so waddr is a delayed copy of raddr.

Table 3. Definitions of the inputs and outputs of the calculation unit.
en_sync: Indicates whether the spectral data are valid.
Data_In: The spectral data.
SOF: Indicates that valid data are arriving; it is a delayed copy of en_sync.
cnt_sync: Indicates the order of the spectral data.
BaseAddr: Register holding the address of the memory cell for the dedispersed data.
OffsetAddr: Register setting different delays for different frequency channels.
RAM_D_ena: Enable signal for reading data from RAM_D.
RAM_D_raddr: Read address of RAM_D; the sum of BaseAddr and OffsetAddr.
RAM_D_wea: Enable signal for writing data to RAM_D.
RAM_D_waddr: Write address of RAM_D; a four-clock delay of RAM_D_raddr.
F_Addr_Equal: Indicates whether RAM_D_raddr and RAM_D_waddr are the same.
RAM_D_din: Input of RAM_D.
RAM_D_dout: Output of RAM_D.
DATA_In_D6: Six-clock delay of Data_In, used for synchronization.
Dvalid: Indicates that the output data of the calculation unit are valid.
DD_Data: Dedispersed data from the calculation unit, synchronized to Dvalid.
For lower DMs, consecutive channels can have the same value of Δt(d, v) in RAM_Δt, so the same RAM_D address is accessed in consecutive cycles. Because reading and writing RAM_D take several clock cycles, the cell may then be read again before its latest value has been written back. To solve this issue, raddr and waddr are compared continuously. If raddr equals waddr, a signal named F_Addr_Equal goes high; it serves as the selection signal for D_sum. More details about the definitions of the inputs and outputs are given in Table 3, and the timing diagram is shown in Figure 13.
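The read-after-write hazard and its bypass can be illustrated with a small cycle-level model. This Python sketch uses a generic write-forwarding scheme for illustration; it is not the exact logic of Figure 12 (which compares raddr with waddr), and the function name, the write-before-read ordering and the latency parameter are assumptions.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def accumulate(ops: Sequence[Tuple[int, int]], depth: int,
               latency: int, forward: bool) -> List[int]:
    """Cycle-level sketch of RAM_D read-modify-write accumulation.

    One op (addr, value) is issued per cycle: RAM_D is read at addr in that
    cycle and read + value is written back `latency` cycles later. With
    forward=True, a read whose address matches a pending write uses the newest
    pending sum instead of the stale RAM contents.
    """
    mem = [0] * depth
    pending: Deque[Tuple[int, int, int]] = deque()  # (commit cycle, addr, sum)
    for cycle, (addr, value) in enumerate(ops):
        # Writes due in this cycle land before the read (write-first RAM).
        while pending and pending[0][0] <= cycle:
            _, waddr, total = pending.popleft()
            mem[waddr] = total
        operand = mem[addr]
        if forward:
            for _, waddr, total in pending:  # oldest to newest; last match wins
                if waddr == addr:
                    operand = total
        pending.append((cycle + latency, addr, operand + value))
    for _, waddr, total in pending:          # drain the pipeline
        mem[waddr] = total
    return mem


# Three consecutive channels with the same delay hit the same address.
ops = [(1, 5), (1, 7), (1, 2)]
print(accumulate(ops, depth=2, latency=4, forward=False))  # [0, 2]
print(accumulate(ops, depth=2, latency=4, forward=True))   # [0, 14]
```

Without the bypass, the stale reads keep only the last value; with it, the full sum 5 + 7 + 2 is recovered, which is the purpose of the F_Addr_Equal selection described above.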

Observation Results
On 2021 June 17, we carried out pulsar observations with the 40 m radio telescope at Yunnan Observatory. During the experiments, we stored the RTD data from the RTD unit and the searching data from the Parallel FFT unit. A Pulsar Digital Filter Bank (PDFB), developed by the Australia Telescope National Facility (ATNF) in 2008, is also deployed at Yunnan Observatory. It can process up to 1 GHz of bandwidth in four primary modes of operation: folding, search, spectrometer and baseband output (Hampson et al. 2008). The PDFB observed the same pulsar at the same time, so that we could compare the signal-to-noise ratio (S/N) of the profiles from our CRABoard and from the PDFB. Information on the observed pulsar is listed in Table 4, taken from the ATNF pulsar database⁵ (Manchester et al. 2005). In the experiment, we set the time resolution to 34.1 μs.

Figure 13. Timing diagram of how the calculation unit works.
The folded result of the RTD data is shown in Figure 14, where a peak and a shoulder are visible. To verify that the RTD module works, we also performed offline dedispersion on the searching data, which contains all the frequency spectra. Because we used a customized data file format, we wrote our own folding and dedispersion code in Matlab for data processing, which is available on GitHub.⁶ The results are shown in Figure 15.

⁵ https://www.atnf.csiro.au/research/pulsar/psrcat/
To compare the RTD result with the searching-data result in more detail, we zoomed in on Figure 14 and Figure 15(b); the zoomed views are shown in Figure 16, and the comparison is summarized in Table 5. The S/N, pulse width and offset between the peak and shoulder are almost the same. For a further comparison, we plotted the normalized searching result and RTD result in one figure and calculated the error between the two; this comparison is shown in Figure 17. The comparison and the error between the two normalized results show that RT-PIDM produces the same result as the searching data.
Because the RTD data is a single time series while the searching data contains the full frequency spectra, the RTD data is about 862 times smaller than the searching data, which makes it much easier to process. We stored the RTD data and searching data for 30 s: the RTD data is approximately 2 MB, while the full searching data would be approximately 1.7 GB. Because the bandwidth of the CRABoard is 1200 MHz while the bandwidth of the receiver at Yunnan Observatory is only 110 MHz, much of the data is invalid. We therefore stored only the valid spectral channels in the data file, which takes up around 200 MB for the 30 s of data. Processing in Matlab takes about 0.9 s for the RTD data and about 6.5 s for the recorded searching data; the processing time would be much longer if full-bandwidth searching data were recorded. The RTD data, the searching data, and the Matlab processing code are all available on GitHub.⁷ The PDFB results, processed with PSRCHIVE (Hotan et al. 2004), are shown in Figure 18. The profile from the PDFB looks almost the same as that from RT-PIDM, but we could not find documentation of the S/N algorithm implemented in PSRCHIVE, so we read the PDFB data into Matlab⁸ and plotted the profile and computed the S/N there, as we did for the RTD and searching data. The S/N algorithm used in Matlab is given in the Appendix. The processed result and a detailed view are shown in Figure 19.
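The quoted sizes can be reproduced with simple arithmetic if one assumes (our assumption, not stated in the text) that each RTD sample is stored with the 19-bit RAM_D width and each searching-data spectrum holds 2048 8-bit channels at the 34.1 μs time resolution. Under these assumptions the ratio is 2048 × 8 / 19 ≈ 862. The Python sketch below uses a hypothetical function name.

```python
from typing import Tuple


def data_sizes(duration_s: float, dt_us: float, n_chan: int,
               res_bits: int, sum_bits: int) -> Tuple[int, float, float]:
    """Return (samples, RTD bytes, searching-data bytes) for one recording."""
    samples = round(duration_s / (dt_us * 1e-6))
    rtd_bytes = samples * sum_bits / 8               # one summed value per sample
    search_bytes = samples * n_chan * res_bits / 8   # a full spectrum per sample
    return samples, rtd_bytes, search_bytes


samples, rtd, search = data_sizes(30.0, 34.1, 2048, 8, 19)
print(samples, rtd / 1e6, search / 1e9)  # sample count, MB, GB
print(round(search / rtd, 1))            # 862.3
```

Under these assumptions the sketch gives about 2.1 MB of RTD data and about 1.8 × 10⁹ bytes (1.68 GiB) of full-band searching data for 30 s, consistent with the approximate sizes quoted above.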
The comparison between Figure 16(a) and Figure 19(b) is shown in Table 6. The S/N, pulse width and offset between the peak and shoulder of the profiles are again almost the same, which demonstrates that RT-PIDM implemented on the CRABoard produces the same result as the PDFB. The experimental results on J0835-4510 at Yunnan Observatory show that the results from RT-PIDM, offline dedispersion and the PDFB are consistent. With the proposed RT-PIDM, the same dedispersion result can be obtained on a compact and power-saving platform.

Figure 17. The normalized RTD result, searching result and the error between the two results.

Conclusion
In this paper, we proposed a Real-Time, Pipelined Incoherent Dedispersion Method (RT-PIDM). With its pipelined computing architecture, only the summed-up time series is stored in memory, so RT-PIDM consumes less memory than traditional methods, which makes it possible to implement it inside a single FPGA. Using RT-PIDM, we presented a prototype digital backend with 1200 MHz observing bandwidth. The results obtained at Yunnan Observatory demonstrate that RT-PIDM produces the same result as offline incoherent dedispersion and as the PDFB deployed at Yunnan Observatory. In the future, RT-PIDM will be useful for pulsar and fast transient searches on compact hardware platforms.
This work was supported by The National Natural Science Foundation of China under grant U1731120. The authors gratefully thank Yuxiang Huang, Longfei Hao et al. from Yunnan Astronomical Observatory for their help on the observation, and also acknowledge the helpful comments and suggestions from the reviewers.

Appendix Signal-to-Noise Algorithm used in this Paper
The S/N algorithm used in this paper is defined in terms of the following quantities: F_i is the power of each bin, σ is the variance of the bins without pulse signal, and N_{F>3σ} is the number of bins containing pulse signal.