Using Giant Pulses to Measure the Impulse Response of the Interstellar Medium

Giant pulses emitted by PSR B1937+21 are bright, intrinsically impulsive bursts. Thus, the observed signal from a giant pulse is a noisy but direct measurement of the impulse response from the ionized interstellar medium. We use this fact to detect 13,025 giant pulses directly in the baseband data of two observations of PSR B1937+21. Using the giant pulse signals, we model the time-varying impulse response with a sparse approximation method, in which the time dependence at each delay is decomposed in Fourier components, thus constructing a wavefield as a function of delay and differential Doppler shift. We find that the resulting wavefield has the expected parabolic shape, with several diffuse structures within it, suggesting the presence of multiple scattering locations along the line of sight. We also detect an echo at a delay of about 2.4 ms, over 1.5 times the rotation period of the pulsar, which between the two observations moves along the trajectory expected from geometry. The structures in the wavefield are insufficiently sparse to produce a complete model of the system, and hence the model is not predictive across gaps larger than about the scintillation time. Nevertheless, within its range, it reproduces about 75% of the power of the impulse response, a fraction limited mostly by the signal-to-noise ratio of the observations. Furthermore, we show that by deconvolution, using the model impulse response, we can successfully recover the intrinsic pulsar emission from the observed signal.


INTRODUCTION
Radio signals emitted by pulsars propagate through the ionized interstellar medium (ISM) and are distorted by a variety of frequency-dependent effects such as dispersion, birefringence, and scintillation due to multi-path scattering (Rickett 1990).While these distortions make it difficult to observe the intrinsic pulsar emission at low radio frequencies, they also contain a wealth of information about the structure of the ISM along the line of sight to the pulsar.
If the pulsar emission can be coherently separated from propagation effects in the ISM, both the intrinsic radio emissions of pulsars and the structure of the ISM can be better studied.For example, coherent dedispersion (Hankins & Rickett 1975) is used to remove the ν −2 dispersive effects of the ionized ISM, by applying an inverse filter to raw voltage (baseband) data, for a known dispersion measure (DM).Circular birefringence caused by Faraday rotation can also be corrected in a similar manner using a rotation measure (RM).
Corresponding author: Nikhil Mahajan mahajan@astro.utoronto.caThere is, however, no similar general technique for inverting the multi-path scattering effects responsible for scintillation.
Traditionally, pulsar scintillation is studied using dynamic spectra.Recovering the impulse response of the ISM from a dynamic spectrum requires solving an ill-posed phase retrieval problem, since generally only the amplitude information is kept in the process of creating a dynamic spectrum.Thus, approaches to this problem so far (Walker et al. 2008;Baker et al. 2022) usually impose a sparsity constraint in order to tackle it.Osłowski & Walker (2022) show that these techniques fail when the true impulse response is dense.
Alternatively, cyclic spectroscopy (Demorest 2011;Walker et al. 2013) is a technique that exploits the cyclic nature of pulsar emission, to generate "cyclic spectra", which preserve some of the phase information that would traditionally be lost in dynamic spectra.These cyclic spectra can then be used to decouple the impulse response of the ISM from the intrinsic pulse profile, as demonstrated by Walker et al. (2013).However, this technique assumes the observed signal is cyclostationary (i.e., the pulse profile is stable over the integration time).Transient emission phenomena such as nulling, mode changing, or giant pulse emission would violate this assumption, and could affect the validity of a cyclic spectrum.
In this work, we present a technique for using raw baseband data of bright and impulsive giant pulses as direct measurements of the ISM's impulse response, and apply it to observations of PSR B1937+21.While this technique does not depend on a priori knowledge about the source or assumptions about the structure of the impulse response, it does require the source to produce bright impulsive bursts at a relatively high rate.Using the fact that nearby impulses are imprinted with the same response, we find thousands of giant pulses, which we then use to model the time-varying impulse response of the ISM.We discuss how this modeled impulse can help us to study the structures in the ISM which cause scintillation, or recover the intrinsic pulsar emission from the observed signal to better understand the radio emission mechanism in pulsars.

BACKGROUND
For a point-like source such as a pulsar, propagation effects through the ionized ISM can be treated as linear and time-invariant over short timescales.When the pulsar emits a signal x(t), we observe where * denotes a convolution, h is the impulse response function (IRF), and is the noise term.This IRF includes dispersion, scattering, and scintillation effects due to multipath propagation, as well as any instrumental effects.Due to the relative motion of the pulsar, the Earth, and the structures in the ISM, the IRF evolves with a characteristic timescale usually called the scintillation timescale, t scint .So, the effects of the ISM can be properly characterized by h(τ, t), the instantaneous impulse response at time t, where τ is the relative delay (also known as the lag).For most sources, the timescales are well separated: typical scintillation timescales are of order a few minutes, while maximum lags at which the IRF still has power are of order a millisecond.If a source emits an impulse at time t 0 , such that x(t) = a 0 δ(t − t 0 ), with a 0 a complex-valued amplitude, then we observe y(t) = a 0 h(t−t 0 , t 0 )+ .We can shift the observed signal in time to acquire a noisy measurement of the instantaneous IRF at time t 0 scaled by a complex amplitude which is a property of the emitted impulse.The impulse response must naturally be causal such that h(τ, t) = 0 ∀τ < 0.
Pulsars that emit giant pulses can be used for such measurements of the IRF, since giant pulses are so short to be unresolved (in relatively narrow frequency bands).In order to properly measure and model h(τ, t), the pulsar must emit sufficiently bright pulses at a sufficiently high rate such that multiple good measurements of the IRF can be made within the scintillation time.If impulses are emitted too sparsely, then it may not be possible to recover information about the more rapid variations in h(τ, t).The bright giant-pulse emitter PSR B1937+21, which produces thousands of impulsive giant pulses per hour, is suitable for fully modeling h(τ, t).

Conjugate Wavefield
Pulsars emit giant pulses irregularly, which means that any model of the IRF, h(τ, t), requires an interpolation scheme.A convenient one is to write h(τ, t) as a Fourier series with terms regularly spaced in frequency.The Fourier conjugate of t is −f D , where f D is the differential Doppler shift relative to the line of sight (following astronomical convention of a Doppler shift being positive as the source is moving away from the observer).In the usual physical picture where the delay τ reflects the extra path length introduced by scattering off a structure some distance away from the line of sight, one has f D = τ ν.
We define a "conjugate wavefield", where F t −1 is the inverse Fourier transform along t.With this, the IRF at any time t can be determined by (4)

Relation to Dynamic and Secondary Spectra
In pulsar scintillation studies, a dynamic spectrum, I(ν, t), is usually measured, and often the so-called "secondary spectrum", S(τ, f D ) = |F ν,t [I(ν, t)]| 2 (where F denotes a Fourier transform, here along frequency ν and time t), is calculated, because in τ, f D space the underlying structure of the scattering screen is much more apparent.
The dynamic spectrum is related to the time-varying IRF via (where we assume that the possible (slow) variation of the pulsar emission strength with frequency has been removed).Hence, the secondary spectrum is related to the auto-correlation of conjugate wavefield via S(τ, f D ) = |W W | 2 (where denotes cross-correlation).

OBSERVATIONS
We observed PSR B1937+21, a 1.56 ms pulsar, with the 327 MHz Gregorian receiver at the Arecibo Observatory for 2 hours on 2021 May 7 (MJD 58245), and 30 minutes on 2021 May 29 (MJD 58298).Using the Puerto Rico Ultimate Pulsar Processing Instrument (PUPPI) in raw baseband mode, we recorded 32 contiguous 3.125 MHz bands of dualpolarization baseband data (i.e., a sample spacing of 320 ns).
A polyphase filter is applied in the PUPPI backend which reduces spectral leakage between bands.We use the BASE-BAND (van Kerkwijk et al. 2021) and PULSARBAT (Mahajan & Lin 2023) software packages to read and process the raw baseband data.For our analysis, we only use bands in the range of 297.3125 to 356.6875 MHz as the remaining bands fall outside the analog bandpass filter and have diminished signal strength.This leaves us with 19 usable frequency bands, with a total bandwidth of 59.375 MHz centered on 327 MHz.
We pre-process the data by normalizing the passband to correct for the effects of the polyphase filter and any bright narrow-band channels that may occur due to radio frequency interference (RFI).For both observations, we experience insignificant amounts of RFI and so we apply no further RFI mitigation.The data are then coherently dedispersed using a DM of 71.0201 pc cm −3 on MJD 58245, and 71.0169 pc cm −3 on MJD 58298, with the DM values inferred from giant pulses found in the dataset.For the technique presented in this paper, it is not important for the data to be perfectly de-dispersed, since any excess dispersion will be captured as part of the ISM's impulse response.
We correct for relative time delays between the left and right circular polarizations of 7.85 ns (MJD 58245) and 37.85 ns (MJD 58298)1 .These delays are dominated by instrumental effects, but also include a contribution from circular birefringence in the ISM due to magnetic fields, also known as Faraday rotation, characterized by a rotation measure (RM) of ∼ 7 to 8 rad m −2 (Yan et al. 2011;Dai et al. 2015;Wahl et al. 2022).

GIANT PULSE SEARCH
Giant pulses, being bright narrow bursts, are usually easy to detect.However, to be useful as measurements of the ISM's impulse response, we need to align their observed signals to each other in τ (see Equation 2) to well within a sample.We achieve this by precisely measuring the differences in time of emission between neighbouring giant pulses.Note that we use the term "giant pulse" to refer to any sufficiently bright impulse detected by our technique, without a specific notion of what causes giant pulse emission.
We split up our observation into blocks much shorter than t scint such that the IRF can be taken to be essentially timeinvariant across a few adjacent blocks.We then use an iterative procedure where giant pulses found in one block are used to find giant pulses in adjacent blocks and so on until all blocks have been searched.In one cycle of this iterative loop, the goal is to find impulsive giant pulses in a given block centered on some time t .We have y(t), the coherently dedispersed observed signal (processed as described in Section 3), and a set of previously detected giant pulses (in neighbouring blocks) of the form, such that |t j − t | < ∆t max , and a j are the a priori unknown complex amplitudes of the pulses.The chosen ∆t max must be much smaller than the scintillation timescale, t scint , such that we can safely treat these giant pulses as approximations of h(τ, t ).In our data, we find that t scint ≈ 70 s, and we choose ∆t max = 20 s.Since we expect the signal-to-noise ratio of the IRF to drop exponentially with delay, and since the parts at high delay vary faster (since generically these deviate more from the direct line of sight), we truncate the giant pulses.We chose τ max = 0.4 ms, which captures approximately 90% of the power in the IRF.
Since the observed signals are discretely sampled, we can form a matrix G with elements G j,k = g j (τ k ) such that where a is the vector of amplitudes a i , and h is the discretely-sampled impulse response h(τ, t ).Since ah is a rank-1 approximation of G, we can use singular value decomposition (SVD) to estimate h (to within a complexvalued amplitude) as the first right singular vector of G (by the Eckart-Young-Mirsky theorem).
If we cross-correlate this estimate of h(τ, t ) with y(t), giant pulses which are intrinsically impulsive will show up as bright impulses, where is the noise term which includes all other contributions to the cross-correlation, t j is the location of the j-th impulse in time, and |c j | 2 / | | 2 represents the effective signalto-noise ratio of the giant pulse detection.We describe our giant pulse detection criteria in Section 4.1.
For detections considered significant enough, we precisely measure t j using an impulse estimation technique described in Appendix A. With that, we can extract time-shifted snippets of the observed signal, g j (τ ) = y(τ + t j ) to get noisy measurements of the IRF at t j with unknown amplitudes a j (Equation 2).These are then added to our growing set of detected giant pulses and used to find more giant pulses in other cycles of the iterative loop.To start the process, we begin by manually finding a very bright and preferably broadband giant pulse which serves as a reference for the giant pulse search (specifically, a reference for τ = 0).The iterative loop is started with the block containing this first giant pulse.We illustrate this in Figure 1.

Giant Pulse Detection Criteria
To ensure we get no false positives, we set strict detection criteria when finding the impulses in z(t) from Equation 8.The noise term is well described by complex-valued additive Gaussian white noise, and thus | | 2 ∼ χ 2 2 (the chisquared distribution with 2 degrees of freedom).For detecting peaks, we use a minimum signal-to-noise threshold of We can measure the location of an impulse with a standard deviation of σ tj 30 ns or ∼ 0.09 samples (see Appendix A).
This signal-to-noise threshold gives us a false positive rate of 10 −7.3 .However, an impulse is only accepted if it is detected in a minimum of 3 signal streams out of 38 (from 19 frequency bands and 2 polarizations).The expected number of false positives across all our data is thus much less than one.By requiring that an impulse must be detected in at least two frequency bands, we avoid accidentally detecting any bright narrowband RFI bursts.
We also require that all detections of the same impulse are no more than 200 ns apart from each other in time across the various signal streams.We use the median time of all detections to estimate t j , which lowers the error to σ tj 20 ns for the weakest pulses while being robust to outliers.Finally, we filter out detected pulses which are closer than 0.6 ms to each other since these signals would essentially contain a mixture of two differently-delayed copies of the impulse response.
With these criteria, we detect 9627 and 3398 giant pulses on MJDs 58245 and 58298, respectively.This corresponds to detection rates of 1.5 and 2.1 s −1 , respectively.The baseband snippets for these 13,025 detected giant pulses are made available as a dataset on Zenodo at doi:10.5281/zenodo.7901384.

MODELLING THE WAVEFIELD
From the giant pulse search, we have thousands of noisy measurements of the IRF, g i (τ ).Using Equations 2 and 4, we can form a system of equations that needs to be solved for the wavefield, For every τ , we have a linear system of the form y = Aw + where y and w are the data and wavefield vectors, respectively, at a particular τ , and A is the coefficient matrix with elements given by The linear systems for different τ are linked only through the amplitudes a j , which are not known a priori.We describe our method for estimating a j in Section 5.2, after discussing how we solve for W for known a j .

Orthogonal Matching Pursuit
Due to the low signal-to-noise ratio in the dataset, ordinary least squares without regularization gives poor results.Since we expect the wavefield to be at least globally sparse (i.e., the observed scattered signal occupies a small portion of the (τ, f D ) space), a sparse approximation technique should be appropriate.We use Orthogonal Matching Pursuit (OMP), an iterative greedy algorithm for approximating sparse signals (Tropp 2004;Cai & Wang 2011), with our choice determined mainly by it being computationally fast and easy to implement relative to most other sparse approximation or signal recovery techniques.
In each step of OMP, we select the f D column of A which is most correlated with the current residual vector and add it to a set of selected columns.Hence, we iteratively approximate the signal vector as ŵs = (A * s A s ) −1 A * s y, where A s is the submatrix of A that only contains the selected columns, then calculate a new residual vector r = y−A s ŵs , and make a new approximation.Here, * denotes a conjugate transpose.This is repeated until a stopping criterion is reached.The algorithm returns the best approximation of the signal vector, ŵ, which is non-zero only for elements corresponding to ŵs .We use a stopping criterion of A * r 2 ∞ > γ a 2 , where a is the vector of amplitudes.This essentially stops the algorithm when our residuals no longer strongly correlate with any f D column in A. According to Cai & Wang (2011), this stopping criterion is effective when the noise is Gaussian.If all the signal components in the actual wavefield were very bright, then we would be able to perfectly recover the wavefield using OMP under this stopping criterion.However, in natural signals, the magnitudes of sorted components decay as a power law and there is usually no clear boundary between signal and noise.Thus, the choice of γ presents a trade-off between denoising and accuracy.The variance of noise that ends up in the modeled wavefield decreases exponentially with increasing γ.

Warm start
Initially, both the amplitudes of the detected giant pulses a j and the wavefield W (τ, f D ) are unknown to us.However, from Equation 9, if the wavefield is known, we can solve for the amplitudes, and vice-versa.We use this idea to "warm start" our optimization routine.
Given a giant pulse, g j = a j h j + , we can estimate |a j | via where N is the length of the data vector and σ 2 is the variance of the noise term, , which is assumed to be additive Gaussian white noise.The giant pulse needs to have a high signal-tonoise ratio to ensure that g j 2 > N σ 2 .Without flux calibration, we cannot determine the total integrated intensity in the impulse response (sometimes called the magnification).So, we assume that h 2 = 1 (which is reasonable for our case, since the density in f D and τ of our solutions implies that the radiation arrives to us via many different paths).
Given two giant pulses, g j and g k , close to each other in time such that they approximate the same impulse response h, we can also estimate the relative phase difference between their amplitudes via, This can be iteratively applied to determine the phases of all giant pulses relative to some reference pulse.The absolute phase is arbitrary and cannot be determined.
Due to the low signal-to-noise ratio in our giant pulse dataset, the amplitudes cannot be reliably estimated directly.Instead, we use overlapping 20 s intervals with a step size of 10 s and construct higher signal-to-noise approximations h SVD,i of the IRF at the center of each interval i using the SVD-based technique from Equation 7. The amplitudes a SVD,i for these approximations are then easily estimated.We use overlapping intervals to get more reliable estimates for the relative phases between adjacent approximations.
With the amplitudes a SVD,i , we then solve for an approximation of the wavefield using Orthogonal Matching Pursuit as described in Section 5.1.This wavefield approximation differs from the actual wavefield since combining information from giant pulses in an interval essentially applies a sinc filter on the wavefield.Furthermore, as the times at which giant pulses occur are essentially random, the times of the approximated IRF have some jitter around the mid-points of the intervals, based on how the giant pulses were weighted in the SVD.For instance, a very bright giant pulse at the edge of an interval will dominate the approximation, leading to a jitter of almost half the interval width.
We can, however, use this approximated wavefield to estimate the amplitudes of the individual giant pulses using Equation 9. Then we use these amplitudes again to solve for a better model of the wavefield using the giant pulse data directly.

Wavefield Solutions
For our solutions of the wavefield, we restrict f D to the range (−25, +25) mHz, with N fD = 1125 equally-spaced points.Since the scintillation timescale is around 70 s, we do not expect to see signals at higher differential Doppler shifts.The number of components is larger than required by lengths of our data sets (especially for the shorter observation), but given our choice of OMP for sparse approximation, this makes no difference to the result.
For the warm start described in Section 5.2, we use a stopping criterion of γ = 12, to ensure that the approximate wavefield solution is virtually guaranteed to not inclue any noise components.This avoids biases when estimating the amplitudes of individual giant pulses.Furthermore, we restrict the warm start solution to a delay range of 0 ≤ τ < 1.44 ms, i.e., we exclude the high-τ portion which contains very little power and noise dominates.
With the amplitudes in hand, we then construct the final wavefield, extending out to a delay range of −0.4 ≤ τ < 3.6 ms and using a more relaxed stopping criterion of γ = 5.This value of γ is chosen to ensure that we can see even faint features at large τ , but comes at the cost of allowing a larger fraction of noise into the solution.
After the removal of relative time offsets between polarizations (as described in Section 3), we can expect both po- larizations to have almost the same impulse response (this assumption is tested in Section 7.2).We ignore the higherorder effects of Faraday rotation which are insignificant for our observation given the source's low RM.We can, therefore, make wavefields which are polarization independent by treating each polarization as a separate measurement of the same IRF.In Figure 2, we show these wavefield solutions for both observations across multiple bands.In Figure 3, we compare the predicted IRF from the modeled wavefield with the observed signal from a bright giant pulse.In the zoomed inset of the top panel, we can see the strong correspondence between the predicted IRF and the observed giant pulse signal.However, this correspondence seems stronger than expected given the amount of noise in the observed signal.We further investigate this overfitting in Section 7.
We can see that the primary structure in the wavefield has a characteristic parabolic envelope which is often seen in other observations of pulsar scintillation (Walker et al. 2008;Stinebring et al. 2022;Main et al. 2023).We also see no signal at negative delays, which is expected since the IRF must be causal.The structures in the wavefields appear to get wider in f D with increasing frequency.This is due to the frequency scaling of f D .We can, instead, transform the wavefields to a more natural (τ, τ ) space via the transformation f D = τ ν.In this space, common features across frequency bands are aligned in (τ, τ ).As a result, we can get a higher signal-to-noise wideband representation of the total intensity in the wavefield via ν |W (τ, τ )| 2 .Figure 4 shows this combined (τ, τ ) wavefield for both observations.An interactive version of the figure also shows |W (τ, τ )| 2 across the frequency bands, where the correspondence of features across frequency is visually evident.

2.4 MS FEATURE
In the wavefield for MJD 58245, a relatively bright feature can be seen at high delay, of around 2.4 ms (which we refer to as the "2.4 ms feature" from here on), approximately 1.5 times the rotational period of the pulsar.A corresponding, but fainter, feature at a slightly higher delay can be seen on MJD 58298.We fit the features with a bivariate (2D) Gaussian in the (τ, τ ) space to measure the location of the fea- Comparison between the predicted impulse response and an observed giant pulse at 327 MHz.As can be seen from the zoomin as well as from the flat residuals, there is good correspondence between the data and the model.ture's center, which is denoted by a cross in Figure 4.The τ and τ values at the center of the Gaussians that we fit are provided in Table 1.
Assuming that τ changes linearly between the two observations, one expects ∆τ = τ ∆t = 0.235 ms, in agreement with our measured change in delay, ∆τ = 0.228 ± 0.016 ms.This implies that the features in both observations most likely correspond to the same scattering structure.On the sky, this structure is moving away from the line of sight to the pulsar, leading to a higher delay in the later observation.
We also investigate the frequency dependence of the delay for the 2.4 ms feature by fitting a 2D Gaussian to the feature on a rolling average of 3 frequency bands.In Figure 5, we show the 1σ contours of these 2D Gaussian fits for different frequencies on MJD 58245.Following Brisken et al. (2010), who fit the angular offsets θ from the line of sight to a power law of the form θ ∝ λ γ , we try fitting to τ ∝ ν −2γ (using that for a single, thin screen, τ ∝ θ 2 ).Using the centers of the 2D Gaussian fits for different frequency bands, we measure γ = 0.061 ± 0.005.This value of γ is similar to the value measured by Brisken et al. (2010) for their "1 ms feature".According to Simard & Pen (2018), this value for γ is plausible for features caused by lensing from an under-dense refractive plasma sheet.It should be noted that the feature does not merely shift downwards in τ with increasing frequency, but also get smaller in size.Thus the apparent shift in τ is could simply be a consequence of high-delay paths becoming less favorable at higher frequencies.We do not find a significant frequency dependence along τ .

VALIDATION
In order to measure the performance of the modeled wavefields, we compare how well the predicted impulse responses correlate with the giant pulse data.Given two signal vectors, x and y, we can compute a normalized correlation which takes on values in the range [0, 1].However, if we have two noisy vectors, x and ŷ, which include additional Gaussian noise terms, then we compute a noise-corrected normalized correlation where N is the length of the vectors, and σ x 2 and σ y 2 are the respective variances of the noise terms in the vectors.If the noise terms are uncorrelated, then r ≈ r.However, if the noise terms are correlated, the numerator in Equation 14will be inflated by an additive term, N σ xy where σ xy is the covariance between the two respective noise terms.

Cross-Validation
We employ cross-validation with 13 "folds", i.e., we divide the giant pulses into 13 sets, each distributed similarly in time and polarization.We take out one of these sets as a "validation set", and solve for an unpolarized wavefield using the remaining giant pulses, called the "training set".We do this 13 times such that every giant pulse is in a validation set once.
In Figure 6, we show the noise-corrected correlations between bright giant pulses and the predicted IRFs (over 0 ≤ τ ≤ 0.8 ms).One sees a clear difference between the validation and training sets, with the noise-corrected correlation for the validation set at r 0.86, while that for the training set is at r 1.05.The inflation in r for the training set is due to correlated noise, induced by overfitting: with our choice of a relatively low value of γ, some of the noise in the data has been included in the model.That this is indeed the cause of the difference can be seen in the bottom panel of Figure 6, where we show the normalized correlation using only the pure-noise regions.As expected, the noise of pulses in the validation set is uncorrelated with that of the predicted IRFs, while the noise of pulses in the training set is slightly correlated.
It should be noted that the mean r for the validation set is slightly higher for MJD 58298 compared to MJD 58245, despite wavefield solutions using the identical parameters.This is likely due to the higher overall quality of the solved wavefields from MJD 58298 where we find a higher detection rate of giant pulses.The quantity of r2 0.75 has a convenient interpretation as being the fraction of total integrated intensity of the true impulse response that is captured by the modeled wavefields.

Comparing polarizations
In making polarization-independent wavefields, we make the assumption that the impulse response is the same between polarizations.Since Faraday rotation effects cause circular birefringence as light propagates through magnetized ISM, we know this assumption can be false in general.In practice, one could measure the average RM from bright giant pulses and correct for it properly with an inverse filter.In our case, however, the RM is small enough that simply correcting for a constant time delay (as described in Section 3) between polarizations is sufficient, as the higher-order effects are negligible in our narrow, 3.125 MHz bands.
In order to test this assumption, we can measure the correlation between the two polarizations.For individual giant pulses, we know that the noise terms are correlated between polarizations due to the underlying regular pulsar emission and possibly due to imperfections in the receiver (for example, the polarizations may not be perfectly orthogonal).Instead, we estimate the IRF for giant pulses within 20s blocks (using a rank-1 approximation as described in Equation 7).This estimate should have uncorrelated noise across the two polarizations, but the IRF itself should stay relatively constant over a 20s period.We measure a mean noise-corrected correlation between the the left-and right-circular polarizations of r 0.98.Thus, we find that the two polarizations seem to largely contain the same information about the IRF.

Predictions in Gaps
We also experiment with how our technique performs in regions where there are no data.We do this by solving the wavefield on a subset of giant pulses where we exclude data from a small span of time in the middle of the observation.We find that for any span of time larger than the scintillation timescale, the predicted IRF fails to correlate with the data in the gap, with the noise-corrected correlation dropping to zero for the entirety of the gap.Even with a short 1 minute gap, the noise-corrected correlation at the center of the gap drops to r 0.5.We see the same effect when solving the wavefield on only the central 1 hour of data on MJD 58245.The correlation drops to zero almost immediately outside the bounds of where data were available.
This implies that, for our data set at least, every "scintillation timescale"-sized period of time contains unique and non-redundant information about the time-varying impulse response, and that the predicted impulse response cannot be trivially interpolated into regions where there are no data, if the gaps are of order t scint or longer.

INTRINSIC PULSAR EMISSION
The intrinsic pulsar emission can be recovered via deconvolution.We observe a signal y = (h * x) + , and our goal is to compute a filter, g, such that x = g * y approximates x well.We define a regularized inverse filter, where H and G are the Fourier transforms of h and g respectively, and µ is a regularization parameter.When µ = 0, then g is simply the inverse filter of the IRF h.However, since direct inverse filtering can be unstable due to division by values close to zero, a small positive µ can be used as a form of regularization.Since we do not know the true IRF but only a noisy approximation of h, this deconvolution scheme

Inter-pulse
Figure 8.A closer look at the main pulse (left) and inter-pulse components (right) in the recovered "intrinsic" pulse profile across 19 frequency bands.Red arrows point at a bump caused by giant pulse emission.The vertical axis for each profile is in intensity with arbitrary units.Scaling is different between the left and right panels to help see details; the inter-pulse component is approximately half as bright at its peak as the main pulse component.
is not optimal.We use a constant value of µ = 0.01 which is determined by trial-and-error -higher values of µ result in undesirable artifacts in the recovered signal, while lower values result in a lower signal-to-noise ratio in the recovered signal.
We apply the regularized inverse-filter, g(τ ), to the observed signal, y(t), to recover the "intrinsic" emission signal, x(t).Using pulsar timing models from NANOGrav (Alam et al. 2021), we fold x(t) to get an "intrinsic" pulse profile in the same way that the observed signal is usually folded.Figure 7 shows a comparison between the observed and recovered intrinsic pulse profiles.For both observations, the intrinsic profiles look almost identical despite the observed profiles being noticeably different due to changes in scattering between observations.There is also a strong correspondence in the intrinsic profile across the 19 frequency bands (also shown in Figure 8).In our work so far, all frequency bands have been processed entirely independently of each other.Thus, this correspondence provides additional evidence for the validity of our modeled wavefields and the predicted impulse response.In Figure 7, one also sees that the intrinsic profile is stable across our 2 hour observation, even though the observed profile varies significantly due to scintillation effects.Since we assumed earlier that the impulse response has a constant total power over time, the total intensity of the recovered pulse profile is likely inaccurate over short timescales.In principle, the observed pulse profile could be used to calibrate the variations of the total IRF power over time, which could be used to adjust the amplitudes determined in Equation 11.
In Figure 8, we find a narrow bump on the trailing shoulder of both pulse components which we identify as the giant pulse emission component.Almost all of the detected "impulses" from our giant pulse search come from these two regions in pulse phase.We defer the in-depth analysis of the pulse phase distribution of detected pulses and the intrinsic pulsar emission to a future publication, which is in preparation.9. DISCUSSION

Solved Wavefields
In Figure 4, we can see faint but periodic bands in the background noise in delay.This is a manifestation of the regular pulse emission underlying the giant pulse data.In our sparse approximation technique, we use the same stopping criterion for all delays.Thus, at delays where the regular pulse emission contributes, we see slightly brighter noise bands.
The wavefield we recover is relatively dense and filled in within its envelope.We see at least three main "arms" localized in τ extending up into higher delays, in both observations.The wavefield recovered via cyclic spectroscopy by Walker et al. (2013) for observations of PSR B1937+21 at 428 MHz also show a relatively filled-in primary wavefield structure, although it extends only out to about ∼ 0.5 ms, as expected given the higher observing frequency.
In constrast, the wavefields recovered by Walker et al. (2008) and Baker et al. (2022) for observations of PSR B0834+06 around 320 MHz show thin parabolic structures which indicates highly localized, anisotropic scattering.It is possible that the wavefield structure in PSR B1937+21 is caused by the signal propagating through several scattering structures or screens.Since PSR B1937+21 lies in the Galactic plane (b = −0. • 3) at a distance of 2.9 kpc (Ding et al. 2023), while PSR B0834+06 is at higher Galactic latitude (b = 26.• 3) and is at only 0.65 kpc, it is not implausible that there are more scattering structures along the line of sight to PSR B1937+21.
Another piece of evidence pointing to multiple scattering structures is that the bright 2.4 ms feature is quite extended in both τ and τ .If this was an independent scattering structure in the ISM, it would have to be quite large, with many scattering points.It seems more plausible that instead the situation is similar to what is inferred for the "1 ms" feature in PSR B0834+06 (Liu et al. 2016;Zhu et al. 2022), viz., that the large τ and τ reflect scattering off a few strong scattering points quite far from the line of sight, and that the extent represents further scattering also by other structures closer to the line of sight.

Comparison with other techniques
All other methods of recovering wavefields so far operate on dynamic spectra.A conventional dynamic spectrum is created by folding channelized intensity data into a pulse profile across many sub-integrations.Since this removes phase information, it makes reconstructing the impulse response of the ISM much more difficult.Furthermore, since usually several phase bins are necessary to sufficiently localize the signal in pulse phase, the spectral resolution is inherently limited by the pulse width, and generally it is impossible to determine the impulse response beyond the pulse period.Hence, a dynamic spectrum would not contain information on features like that at 2.4 ms we detect, since that occurs at a delay of 1.5 times the rotation period of PSR B1937+21.
The situation is better with cyclic spectroscopy, as it keeps most of the phase information (Walker et al. 2013).On the other hand, it relies on the assumption that the pulsar signal is cyclostationary, and thus suppresses information in the observed signal which does not conform to the cyclic frequency provided.For this reason, the recovered intrinsic pulse profiles of PSR B1937+21 of Walker et al. (2013) do not exhibit a giant pulse emission bump.

Wavefield Performance
We find that our solved wavefields fail to interpolate across even relatively short gaps of a minute or so in the data with-out significant loss in performance, implying that the timevarying impulse response contains unique non-redundant information across every scintillation time.This reflects the fact that the wavefield is dense, with an extent in f D of roughly the inverse of the scintillation time.This is easiest to understand if there are multiple scattering structures along the line of sight to the source.
This does not exclude that a sparser wavefield would have greater interpolation performance, since there would be less overall information to encode, and every short period of time would contain largely redundant information.Such a wavefield might be found for a different pulsar with a less complicated scattering geometry along the line of sight, or perhaps for PSR B1937+21 at higher frequencies, where scattering is less efficient.
In principle, it may also be that even at our frequency it is possible to resolve the scattering points by including larger frequency chunks, if the scattering geometry does not vary strongly with frequency and is close to being resolved in our present bands.Modest evolution with frequency is suggested by our measurements of the 2.4 ms feature, as well as from observations of PSR B0834+06 which show similar structures in neighboring frequency bands (Brisken et al. 2010).We have not pursued this with our data, both since we worry about complications arising from the gaps in frequency coverage caused by the polyphase filter and because it would seem better to start at somewhat higher frequency where the situation should be less complex.We note, though, that with wider frequency bands it may be required to account for the evolution of structures with frequency, perhaps using more physically motivated models such as those of Simard & Pen (2018).
Finally, we note that features in the wavefield are expected to move in delay (given non-zero τ ) even across observations a few hours long.Thus, we suspect that a solving the wavefield over a wavelet basis, localizing features in time, rather than a Fourier basis may improve performance by promoting sparsity.

CONCLUSIONS AND RAMIFICATIONS
In this paper, we present a novel technique for giant pulse search by pattern matching the raw voltage signals to find pulses at a higher rate than conventional methods.We have made available the baseband snippets for all 13,025 giant pulses found using this technique as a dataset on Zenodo at doi:10.5281/zenodo.7901384.
This technique can be applied to study giant pulse emission in other pulsars, such as PSR B1957+20 (Main et al. 2017) and PSR J1823-3021A (Abbate et al. 2020), both of which are known to emit giant pulses at a high rate.It might also be used in combination with cyclic spectroscopy, to give a better starting point for inferring the impulse response from the cyclic spectra, and/or to verify that that the resulting responses properly de-scatter giant pulses.Another application would be to attempt recovering weaker bursts in repeating fast radio bursts such as FRB20201124A which show similar scintillation patterns for bursts nearby in time (Main et al. 2022).
A basic assumption of our method is that giant pulses share the same impulse response.Hence, by using it to test whether this is the case, one can determine if a giant pulse emission region is being spatially resolved by scattering, as has been inferred for the Crab Pulsar (Lin et al. 2023).The solved wavefields themselves can be a useful tool to study the structure of the ISM.For instance, comparing wavefields of different circular polarizations, one can study magnetic fields along the line of sight to the source.
As we demonstrated, the intrinsic emission of a pulsar can be recovered via deconvolution using the predicted IRFs.We will present an analysis of the intrinsic emission for PSR B1937+21 in an upcoming paper.This technique could be even more useful at lower frequencies where the emission profile of pulsars is often so dominated by stronger scattering effects that the pulse profile is completely washed out (Kondratiev et al. 2016).The ability to recover the intrinsic emission profile at frequencies as low as 100 MHz would aid our understanding of the radio emission mechanism in pulsars.
We thank the Signal Processing community on Stack Exchange for many helpful insights, and the Toronto scintillometry group, in particular Ue-Li Pen and Rebecca Lin, for discussions.This research has made use of NASA's Astrophysics Data System Bibliographic Services.Computations were performed on the Niagara supercomputer at the SciNet HPC Consortium (Loken et al. 2010;Ponce et al. 2019).SciNet is funded by: the Canada Foundation for Innovation; the Government of Ontario; Ontario Research Fund -Research Excellence; and the University of Toronto.MHvK is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) via discovery and accelerator grants, and by a Killam Fellowship.
Facility: Arecibo (327 MHz Gregorian).The data used in this publication was obtained as part of observing project P3229.

Figure 1 .
Figure 1.Giant pulse search.Top: Squared-modulus of the observed signal.The reference pulse is marked with an arrow.The inset shows the real part of the complex-valued voltages of the reference pulse, with the 0.4 ms span used in the cross-correlation to find further pulses highlighted in pink.Bottom: Signal crosscorrelated with the reference pulse, revealing many other impulses.The dashed horizontal line denotes the detection threshold we use.The inset is a zoom-in around the impulse marked with the arrow, with the dashed vertical line indicating the impulse's computed location.

Figure 2 .
Figure2.Wavefield solutions for three specific bands (of width 3.125 MHz), for both our observations.The color scale corresponds to log 10 |W (τ, fD)| 2 in arbitrary units.One sees that most power is contained within τ 0.8 ms, but that the IRF extends to larger delays, with a faint structure at ∼ 2.5 ms.
Figure3.Comparison between the predicted impulse response and an observed giant pulse at 327 MHz.As can be seen from the zoomin as well as from the flat residuals, there is good correspondence between the data and the model.

Figure 4 .Figure 5 .
Figure 4.The total intensity of the wavefields in the two observations (with the wavefield intensities of the 19 subbands aligned using (τ, τ )coordinates before summing).The color scale corresponds to log 10 |W (τ, τ )| 2 in arbitrary units.A cross denotes the measured center of the 2.4 ms feature in both panels.The excess noise around τ = 0, 1.6, and 3.2 ms (as well as, more faintly, at 0.8 and 2.4 ms), is due to the regular pulse emission.An interactive version of this figure, which shows the (τ, τ ) wavefields for each frequency band, is available at https://www.astro.utoronto.ca/∼ mahajan/interactive/interactive wavefield.html.

Figure 6 .
Figure6.Top: Noise-corrected normalized correlations between giant pulse data and predicted IRFs for 0 ≤ τ ≤ 0.8 ms, where most of the signal lies.The correlations are corrected for contributions by noise in both data and model, but not for correlation in the noise between the two.Bottom: Normalized correlations for the pure-noise regions of τ < 0 and τ > 3 ms between data and model.The correlation between the model and the training set confirms that they share correlated noise, while the lack of correlation with the validation set shows that there are no systematic effects shared between all pulses.

Figure 7 .
Figure 7.A comparison between the observed pulse profile (left panels) and the recovered "intrinsic" pulse profile (right panels) for PSR B1937+21.Top: The average pulse profiles at 327 MHz (bandwidth of 3.125 MHz) for our two observations, at MJD 58245 and MJD 58298.On the right, the profile for MJD 58298 is plotted with a dashed line to help visually identify both profiles.Middle: The average pulse profiles for the observation on MJD 58245 across the 19 frequency bands.Bottom: The pulse profiles at 327 MHz (bandwidth of 3.125 MHz) across time, for the observation on MJD 58245.