A guide towards optimal detection of transient oscillatory bursts with unknown parameters

Objectives. Recent event-based analyses of transient neural activity have characterized oscillatory bursts as a neural signature that links dynamic neural states to cognition and behavior. Following this insight, our study aimed to (1) compare the efficacy of common burst detection algorithms under varying signal-to-noise ratios and event durations using synthetic signals and (2) establish a strategic guideline for selecting the optimal algorithm for real datasets with undefined properties. Approach. We tested the robustness of burst detection algorithms using a simulation dataset comprising bursts of multiple frequencies. To assess their performance systematically, we used a metric called 'detection confidence', which quantifies classification accuracy and temporal precision in a balanced manner. Given that burst properties in empirical data are often unknown in advance, we then proposed a selection rule to identify an optimal algorithm for a given dataset and validated its application on local field potentials of the basolateral amygdala recorded from male mice (n = 8) exposed to a natural threat. Main Results. Our simulation-based evaluation demonstrated that successful burst detection is contingent upon event duration, whereas accurately pinpointing burst onsets is more susceptible to noise level. For real data, the algorithm chosen by the selection rule exhibited superior detection and temporal accuracy, although its statistical significance differed across frequency bands. Notably, the algorithm chosen by human visual screening differed from the one recommended by the rule, implying a potential misalignment between human priors and the mathematical assumptions of the algorithms. Significance. Our findings underscore that the precise detection of transient bursts is fundamentally influenced by the chosen algorithm.
The proposed algorithm-selection rule offers a potentially viable solution, while also highlighting the inherent limitations originating from algorithmic design and variable performance across datasets. Consequently, this study cautions against relying solely on heuristic-based approaches and advocates careful algorithm selection in burst detection studies.


Introduction
Neural oscillations reflect the synchronization of neurons that dynamically interconnect with one another to form a neuronal ensemble, which rhythmically shifts its overall excitability over time (Buzsáki and Draguhn 2004). These oscillatory activities have been credited with transferring locally generated information between distant neuronal populations, but because they have frequently been analyzed in the spectral domain, they have long been characterized as periods of sustained increases in power at specific frequency bands (Fries 2015, Tal et al 2020). However, a growing number of studies suggest that sustained oscillations can in fact stem from an accumulation of temporally transient and spectrally specific burst-like events observed at the single-trial level (Jones 2016, Lundqvist et al 2016, Zich et al 2020). A sustained neural activity may therefore be a mere repercussion that arises when bursts of different amplitudes, durations, and interburst intervals are averaged over multiple trials (Quinn et al 2019).
Averaging across time and trials, which can obscure the temporal dynamics of rhythmic activity, has been widely used to identify behavioral correlates of neural rhythms (Shin et al 2017). Yet, several recent studies have departed from this interpretation of neural oscillations as a rhythmic recurrence of frequency-specific patterns and have begun to recognize transient neural bursts as a critical temporal signature that guides behavior. Oscillatory bursts occurring at the single-trial level have been repeatedly reported and proposed to underlie sensory, motor, and cognitive systems of different species, including sensorimotor abilities of non-human primates and humans (Bartolo and Merchant 2015, Feingold et al 2015), motor impairments in human Parkinson's disease (Tinkhauser et al 2017), volitional control of beta activity in the rat motor cortex (Karvat et al 2020), the influence of anxiety on reward-dependent motor learning in humans (Sporn et al 2020), working memory of non-human primates (Lundqvist et al 2016, 2018), and perception and attention in mice and humans (Shin et al 2017).
For studies that required a temporally precise correspondence between neural activities and animal behaviors, specificity in time and frequency was essential, and the accurate detection of bursts was necessary to parameterize their metrics for further analyses. Most of these studies adopted variations of a spectral analysis method first proposed by Caplan et al (2001) (later called the 'better oscillation detection' (BOSC) method in Whitten et al 2011), thresholding the duration and amplitude of the bursts on time-frequency representations of neural signals (Sherman et al 2016, Shin et al 2017, Lundqvist et al 2018, Wessel 2020). To maximize temporal localization and avoid the trade-offs between time and frequency resolutions, alternative methods that detect bursts directly in the time domain without computing the spectral power of neural activities have also been used (Neymotin et al 2008, Nair et al 2018, Sporn et al 2020). Because bursts are extracted from single-trial responses, these methods are subject to shortcomings inherent to the low signal-to-noise ratio (SNR dB) of an individual signal (Tal et al 2020). Strong stochastic noise processes (e.g. white and pink noise) may obscure time-frequency profiles by contaminating frequency decompositions, introducing pseudo-burst artifacts even when a clearly sustained oscillation is present (van Ede et al 2018, Donoghue et al 2020).
Although several burst detection algorithms have been reviewed recently, and new methodologies have attempted to overcome the aforementioned problem, few formal comparisons of commonly used methods have been made thus far. In this study, we compared five classic burst detection algorithms: bandpass filtering (BP), the envelope-based method (ENV), the single-tapered short-time Fourier transform (S-STFT), the multitaper method (MTP), and the continuous wavelet transform (CWT). First, we generated synthetic time series containing oscillatory bursts to simulate local field potentials (LFPs) and tested the performance of each algorithm under varying signal-to-noise levels and event durations. We assessed algorithmic performance by measuring the proportion of correct predictions and the temporal accuracy of these detections. Next, we constructed a decision matrix by defining an index of confidence based on the performance metrics. This synthesized-signal-based decision matrix was then applied to LFPs obtained from freely moving mice to compare the efficacy of the algorithms.

Simulation datasets
Each trial of a neural signal simulation was generated by superposing theta (9 ± 1 Hz), beta (25 ± 2 Hz), and low gamma (41 ± 4 Hz) oscillatory bursts onto a noisy time series (figure 1(A)). These transient bursts were modeled by applying Tukey windows (cosine fraction = 0.5). To maintain the scale-free nature of the neurophysiological signals, the absolute length of the simulation was fixed across trials. For each simulated burst, its duration was randomly sampled from a uniform distribution of 3-12 cycles. A total of 75 bursts (i.e. 25 bursts for each frequency band) were embedded in a 300 s simulation (sampled at 512 Hz), of which the first and last 0.5 s were burst-free. These bursts were randomly situated in the synthesized signal with a boundary condition of a minimum inter-burst interval of 1.9531 ms (i.e. one sample). The amplitude of each burst was scaled by 1/f_burst^χ, where f_burst is the frequency of a burst oscillatory event and χ defines the extent to which power decays with frequency, to reflect the power-law dynamics of neurophysiological signals (Quinn et al 2019). Hence, bursts defined by the central frequencies 9, 25, and 41 Hz of each band were scaled by the constants 0.7192, 0.6170, and 0.5729, respectively.
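The burst construction described above can be sketched as follows. This is an illustrative Python translation (the study's implementation is in MATLAB); the function name is hypothetical, and the exponent χ is not stated in this section, though χ = 0.15 reproduces the quoted scaling constants (e.g. 9^−0.15 ≈ 0.7192):

```python
import numpy as np
from scipy.signal.windows import tukey

def make_burst(f_burst, n_cycles, fs=512, chi=0.15, rng=None):
    """Generate one Tukey-windowed oscillatory burst scaled by 1/f_burst^chi.

    fs = 512 Hz and the 3-12 cycle duration range follow the paper; chi = 0.15
    is inferred from the quoted constants and should be treated as an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(n_cycles * fs / f_burst))     # burst length in samples
    t = np.arange(n) / fs
    phase = rng.uniform(0, 2 * np.pi)           # random carrier phase
    carrier = np.sin(2 * np.pi * f_burst * t + phase)
    window = tukey(n, alpha=0.5)                # cosine fraction = 0.5
    amplitude = 1.0 / f_burst ** chi            # power-law amplitude scaling
    return amplitude * window * carrier
```

A 9 Hz burst of 6 cycles, for example, spans round(6 × 512 / 9) = 341 samples with peak amplitude at most 9^−0.15 ≈ 0.7192.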
The noisy time series consisted of two components: aperiodic noise and Gaussian noise. The aperiodic stochastic noise process was designed as a first-order autoregressive model equivalent to an all-pole infinite impulse response (IIR) filter using a direct pole-placement approach (Quinn et al 2019) to simulate spontaneous physiological signals with 1/f^χ noise. The additive white Gaussian noise was sampled from a normal distribution z ∼ N(µ, σ²), where µ = 0 and σ = 0.7192/10^(SNR dB/20). In this study, a total dataset of 11 000 signals was generated with SNR dB ranging from −10 to 10 dB in 2 dB increments (i.e. 1000 signals per SNR dB value). The effect of using different SNR dB values is summarized with Welch's power spectral density (PSD) estimates, computed with a 10 s window and 50% overlap (figure 1(B)). The signal characteristics of the entire simulation dataset are outlined in figure 1(C), for which PSDs were computed using a 30 s window with 50% overlap.

Figure 1. (A) Five signal components are synthesized to simulate a neural oscillatory activity. 9 ± 1 Hz theta bursts (light blue), 25 ± 2 Hz beta bursts (violet), and 41 ± 4 Hz gamma bursts (dark blue) are superposed onto a Gaussian white noise (red) and an aperiodic noise process (orange) that reproduces the 1/f scale-free dynamics of neural oscillations to generate a single simulated signal (black). (B) Welch's PSD estimates of simulated signals, each composed of different Gaussian white noise intensities indicated by the SNR dB levels. (C) The averaged Welch's PSD estimates of simulated signals (n = 11 000; black) and their isolated noise processes (n = 11 000; orange) present the landscape of our theoretical simulation, depicting spectral components of the burst-embedded signals and the power-law dynamics of the incorporated noises.
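A minimal sketch of the two noise components follows. Function names are hypothetical, and the AR(1) pole value is purely illustrative; in the paper the pole is placed to match the desired 1/f^χ spectral decay (Quinn et al 2019):

```python
import numpy as np

def gaussian_sigma(snr_db, a_burst=0.7192):
    """White-noise SD such that the strongest (theta) burst amplitude sits at
    the requested SNR dB: sigma = 0.7192 / 10^(SNR_dB / 20)."""
    return a_burst / 10 ** (snr_db / 20)

def ar1_noise(n, phi=0.95, rng=None):
    """First-order autoregressive (single-pole IIR) noise:
    x[t] = phi * x[t-1] + e[t], with e white Gaussian.
    phi = 0.95 is an illustrative pole value, not the paper's."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x
```

At SNR dB = 0 the white-noise SD equals the theta burst amplitude (0.7192), and each 20 dB increase divides it by ten.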

Burst detection algorithms
Five signal processing algorithms were used to detect bursts, and all were written and implemented in MATLAB (R2020a, Mathworks, MA, USA). The schematics and mathematical properties of these algorithms are detailed in figure 2. Two of the algorithms, BP and ENV, extract bursts in the time domain. For BP, the signal is first bandpass filtered and then subjected to dual amplitude and duration thresholds (Feingold et al 2015). Bursts are defined as the periods during which the amplitude values exceed σ standard deviations (SDs) above the mean of the filtered signal and persist for at least three oscillatory cycles. The start and end of each burst were marked as the times at which the amplitude falls below the amplitude sub-threshold (figure 2(A)). For ENV, the amplitude envelope of the bandpass-filtered signal is obtained by applying the Hilbert transform (Sporn et al 2020). Similar to BP, the amplitude envelope is evaluated against amplitude and duration thresholds. However, a single amplitude threshold, defined as the xth percentile of the envelope, is used here. The time interval of an ENV-detected burst is the period during which the amplitude values exceed this threshold (figure 2(B)).
In the last three algorithms (i.e. S-STFT, MTP, CWT), a signal is first represented as a power spectrogram in the time-frequency domain, and each frequency band of the spectrogram is binarized according to a set power threshold, defined as σ SDs above the mean of the power values in each band (figures 2(C)-(E)). The time periods of burst detections are those at which the binary values above the power threshold persist for at least three oscillatory cycles. The only difference between these algorithms lies in the type of spectrogram they use and the limitations that follow from this choice. For S-STFT, the spectrogram is computed using a short-time Fourier transform (STFT) with frequency-dependent window lengths (i.e. a constant-Q transform (CQT)) (figure 2(C)). For MTP, discrete prolate spheroidal sequences (DPSSs) are used as multitapers, and the power spectrogram is computed by Fourier transformation with frequency-dependent taper lengths (figure 2(D)). Lastly, the CWT method transforms the signal using Morlet wavelets of different scales (figure 2(E)). The type of spectrogram determines the trade-off between time and frequency resolution.
The amplitude and power thresholds of the algorithms were determined based on previous literature and grid search-based hyperparameter optimization (Yu and Zhu 2020). First, we set each algorithm's threshold to a value used in previous studies. Then, taking this value as our initial point, we tested multiple hyperparameters across a fixed range and selected the one that detects bursts most accurately from a synthetic signal with the least noise (10 dB) and varying durations (3-12 cycles). The exact parameter values used to set the amplitude and power thresholds can be found in analytical details.

Figure 2. Mathematical formalisms of the BP, ENV, S-STFT, MTP, and CWT algorithms from left to right. The first row portrays the schematics of each algorithm as a two-step procedure. In the first step, the algorithm represents a neural activity on a specific signal domain. The second step shows a burst detection process on a given representation, followed by an expanded image of a detected burst. The second row summarizes the signal domains from which bursts are extracted, and the third row describes the mathematical transformations that individual algorithms use to transfer signals into the corresponding domain. The last row recapitulates the trade-offs between time and frequency resolutions inherent to the different types of power spectrograms under the Heisenberg-Gabor time-frequency uncertainty principle.

Test of burst detection accuracy
The algorithms were tested on our simulated dataset to measure their performance in detecting beta bursts under varying SNR dB and burst durations. First, we applied each burst detection algorithm to the simulated signals and identified the time periods during which bursts occurred. Secondly, if the overlap between a detected and a real (i.e. simulated) burst was greater than or equal to 30%, we considered the detected event a true positive (TP). Otherwise, the detection was registered as a false positive (FP). Non-detected real bursts were regarded as false negatives (FN). In cases where one detected burst matched multiple real bursts, or multiple detected bursts matched one real burst, we counted such detections as TPs. Lastly, for each SNR dB and burst duration, precision (P), recall (R), and F1-score were defined as TP/(TP + FP), TP/(TP + FN), and (2 × P × R)/(P + R), respectively. Here, we selected the F1-score as the harmonic mean that balances precision and recall. For FPs, the burst duration was parameterized as the number of oscillatory cycles closest to the detected period of the burst. Together, these three metrics quantify the classification accuracy of the algorithms.
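The matching rule above can be sketched as follows. This is a hypothetical helper, and the 30% overlap criterion is measured here relative to the real burst's duration, which is one plausible reading since the denominator is not specified in the text:

```python
def interval_overlap(a, b):
    """Length of temporal overlap between intervals a = (start, stop) and b."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def classify_detections(detected, real, min_overlap=0.30):
    """Label detections as TP/FP, count missed real bursts as FN, and return
    precision, recall, F1. A detection overlapping several real bursts, or
    several detections on one real burst, each count as TPs, as in the text."""
    matched_real = set()
    tp = fp = 0
    for d in detected:
        hits = [i for i, r in enumerate(real)
                if interval_overlap(d, r) >= min_overlap * (r[1] - r[0])]
        if hits:
            tp += 1
            matched_real.update(hits)
        else:
            fp += 1
    fn = len(real) - len(matched_real)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, two correct detections, one spurious detection, and one missed burst give P = R = F1 = 2/3.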

Test of temporal concurrence
For each TP burst, we determined its temporal concurrence by calculating how much a detected burst overlaps in time with its corresponding real burst. We denote the timestamp vectors of a real burst S and a detected burst D as ⃗S_n = [s_n,start, s_n,stop] and ⃗D_n = [d_n,start, d_n,stop] for the nth TP burst, where the subscripts start and stop indicate the onset and cessation of a burst, respectively. The temporal concurrence S_c is then defined as

S_c = (1/N_TP) Σ_{n=1}^{N_TP} |⃗S_n ∩ ⃗D_n| / |⃗S_n ∪ ⃗D_n|,

where N_TP is the number of TP bursts. Temporal concurrence can also be interpreted as the Jaccard index (or Intersection over Union; IoU), whereby the intersecting period of a real burst and its detection is divided by the total time interval during which either S or D occurs.
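The per-burst overlap described here is the standard Jaccard index; a short sketch with illustrative function names:

```python
def jaccard(s, d):
    """Jaccard index (intersection over union) of a real burst s and a
    detected burst d, each given as (start, stop) timestamps."""
    inter = max(0.0, min(s[1], d[1]) - max(s[0], d[0]))
    union = (s[1] - s[0]) + (d[1] - d[0]) - inter
    return inter / union

def temporal_concurrence(tp_pairs):
    """Mean Jaccard index over all matched (real, detected) TP burst pairs."""
    return sum(jaccard(s, d) for s, d in tp_pairs) / len(tp_pairs)
```

A detection shifted by half its length against a unit-length burst, for example, yields a Jaccard index of 1/3 (0.5 s overlap over a 1.5 s union).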

Test of detection confidence
The final confidence score of a burst detection was formulated as a combination of detection accuracy and temporal concurrence. The detection confidence S_Det was obtained using

S_Det(i, j) = F1(i, j) · e^(S_c(i, j)) / e,

where F1 is the F1-score and S_c ∈ [0, 1] is the temporal concurrence, with i and j indicating a particular burst duration and SNR dB condition (Xiong et al 2017). The normalization constant e was adopted so that the measure is bound within the closed unit interval from 0 to 1. The cut-off value for the detection confidence was calculated from the cut-off values used for the F1-score and temporal concurrence, both of which were 0.75, resulting in 0.584. This cut-off served as the criterion by which good and poor algorithmic performances were distinguished.
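A minimal sketch of the detection confidence, assuming the form S_Det = F1 · e^(S_c) / e, which is consistent with the stated normalization constant e and reproduces the quoted cut-off of 0.584 at F1 = S_c = 0.75:

```python
import math

def detection_confidence(f1, sc):
    """Detection confidence combining F1-score and temporal concurrence sc.
    Assumed form: S_Det = F1 * exp(sc) / e = F1 * exp(sc - 1), which keeps
    the score in [0, 1] and gives ~0.584 at F1 = sc = 0.75."""
    return f1 * math.exp(sc - 1.0)
```

Because F1 ≤ 1 and exp(sc − 1) ≤ 1 for sc ∈ [0, 1], the score never exceeds 1, and it reaches 1 only for a perfect detection (F1 = S_c = 1).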

Experimental design
The dataset and experimental model used in this publication are from our previous study. For full details of the experimental protocol, readers are referred to Kim et al (2020); only key details are summarized here. All experiments were approved by the Korea Institute of Science and Technology (KIST) Animal Care and Use Committee (permit number: 2019-095) and complied with the National Institutes of Health Guidelines to minimize the pain and discomfort of the animals. Eight healthy mice (male, 10-12 weeks old, >25 g) were obtained from the Jackson Laboratory [B6.Cg-Tg(Thy1-COP4/EYFP)18Gfng/J, stock number 007612]. All mice underwent chronic electrode-implantation surgery, and experimental procedures were performed 2-3 weeks after the surgery. Mice were housed individually at the Animal Facility in the KIST on a 12-hour light/dark cycle with ad libitum feeding. The experimental paradigm, adopted from our previous work, was designed to observe the anxiogenic and defensive behaviors of mice in a naturalistic setting, in which a spider robot (Model 18143A, Academy Plastic Model Co. Ltd, Republic of Korea) chases and attacks the subject as a threatening agent. In this four-stage experimental protocol, each stage lasted for a minute, and a single session of the protocol was repeated eight times for each individual mouse (2 consecutive days, 4 sessions per day). In this paper, only the first session of each day was used, making up a total of 16 trials (8 mice × 2 days × 1 session). For all subsequent analyses, the recordings were segmented to Stage 2, during which the spider robot was introduced into the threat zone to attack the animals.

LFP data acquisition
The LFP signals were recorded using a wireless CBRAIN (Collective Brain Research platform Aided by Illuminating Neural activity) headstage (2.6 g, INTAN RHD 2216 amplifier board embedded), band-pass filtered with cut-off frequencies of 1 Hz and 4 kHz. LFP recordings were collected using custom signal acquisition software built on MATLAB (R2019a, Mathworks, MA, USA) and sampled at 1024 Hz with electrode impedance <300 kΩ. This dataset was technically validated with spectrograms. All LFP signals were notch filtered at 60 Hz with a second-order IIR filter before spectral analyses took place. For the acquisition of behavioral data, a high-speed, complementary metal-oxide-semiconductor (CMOS)-based camera (Lt225, Teledyne Lumenera, Ottawa, ON, Canada) was installed 1.6 m above the arena (floor area 60 × 60 cm), aligned perpendicular to its floor. Videos of the experiments were recorded with this camera using the StreamPix 7 software (Norpix Inc., Montreal, Quebec, Canada) at 30 FPS with a resolution of 1000 × 1000 pixels.

LFP data analyses
To evaluate the efficacy of each detection algorithm, we applied all five methods to the LFP signals to detect bursts in three frequency bands: theta (8-10 Hz), beta (20-30 Hz), and low gamma (35-45 Hz). Gamma bursts in the basolateral amygdala (BLA) served as our starting point, as increases in BLA gamma burst rates were previously observed in mice at the low gamma range (24-56 Hz) during robot attacks (Kim et al 2020). As neural oscillatory events often carry widely varying information across frequency bands, the scope of the burst detection task was expanded to the theta and beta bands. For beta and gamma burst detection, the hyperparameters of the algorithms were held identical to those used in the simulation. For theta burst detection, the hyperparameters were adjusted and optimized on the empirical data as described in burst detection algorithms, as neural events were rarely detected with the default hyperparameters. The exact values of these hyperparameters are outlined in analytical details.
The real data we used were restricted to Stage 2 (60-120 s) of the dataset, but the amplitude thresholds for the algorithms were defined on the full signal to account for the dynamics of neural activities throughout the entire experiment. The LFP burst detections of each algorithm were evaluated against ground truth burst events annotated by human experts, and these evaluations were summarized with different metrics (i.e. P, R, F1-score, temporal concurrence, and detection confidence). The sum of variances and the sum of distances were calculated to investigate how consistent each algorithm is across the different metrics. The former was measured by summing the variance of each metric value across trials (n = 16). The latter was computed as the L1 distance between the best possible score and the trial-wise algorithmic performance, Σ_n |1 − metric_n|, where n indicates the trial index.
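The two consistency measures can be sketched in a few lines; function names and the dict-based input layout are illustrative assumptions:

```python
import numpy as np

def sum_of_variances(metrics):
    """metrics: dict mapping metric name -> per-trial values (n = 16 trials).
    Sums each metric's across-trial variance."""
    return sum(np.var(v) for v in metrics.values())

def sum_of_distances(values):
    """L1 distance between the best possible score (1) and the trial-wise
    performance for one metric: sum over trials of |1 - metric_n|."""
    return float(np.sum(np.abs(1.0 - np.asarray(values))))
```

Lower values of either measure indicate an algorithm that performs both consistently (low variance) and close to the ideal score (low distance).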

Human burst annotations
The ground truth events in the LFP data were identified by four human researchers who annotated the onset and offset of the bursts under mutually agreed criteria. The researchers were graduate and undergraduate students with experience in neural signal processing, and we reconfirmed the annotations to ensure no errors were made. The steps and criteria for burst annotation are as follows:
1. Filter the signal to a specified frequency range.
2. Assign a threshold as 2 SDs above the signal mean.
3. Select the oscillatory cycles as a burst event if they persist over three cycles while crossing the threshold.
4. Identify two cycles, one each in front of and behind a selected burst event, as the starting and ending cycles.
5. Mark the onset and offset from these two cycles.
   a. If the peaks of the starting and ending cycles are close to the threshold, define the onset and offset as the time points right before and after the peak occurs, respectively.
   b. If these peaks are far below the threshold, mark the time points located midway between the starting or ending cycle and the burst event.
The bursts annotated by the human researchers did not show much inter-rater variability. Still, some degree of subjectivity was involved in step 5, leading to slight deviations between the marked onsets and offsets. However, these differences were small, and the time point values did not vary much. Since the human annotation process was most akin to BP, we implemented two exceptional cases to reduce potential biases:
• If one cycle of the burst event touches the threshold, we consider it to have crossed the threshold.
• For burst events prolonging more than four cycles, if there is one cycle within the burst event that falls below the threshold, we still consider it part of the burst.
These additional criteria allowed us to capture transient bursts without having them strictly constrained by the predefined threshold and to consider the algorithms with smoother temporal representations (i.e. ENV, S-STFT, MTP) on a more level footing. The annotated burst events were assigned as ground truths if they were marked by all four researchers. Using the onsets and offsets of these ground truths, we compared the bursts detected by the algorithms against them and measured algorithmic performance in terms of classification and temporal accuracy.

Algorithm selection rule
In real experiments, the duration and SNR dB of a burst are not given; therefore, comparing and choosing an appropriate burst detection algorithm for a dataset is a difficult problem. Even for general-purpose event detection algorithms, there are no commonly accepted selection criteria, often requiring an ad hoc selection rule. Here, we developed a selection criterion by mapping the characteristic performance of simulated bursts directly onto real bursts detected from our experimental data. For the mathematical formulation of this algorithm selection problem, refer to analytical details. The basic idea behind the algorithm selection rule is to estimate how well each algorithm is likely to perform on experimental data based on its performance on the simulated dataset. Our selection rule is as follows. First, we apply all the algorithms to a sample trial and detect bursts using each individual method. Secondly, we calculate the duration and SNR dB of each burst. The duration of a burst is converted from sample points to cycle lengths based on our frequency of interest, following τ_cycle = L_Burst · f · P_s, with L_Burst as the length of a burst (in samples), f as the frequency of interest, and P_s as the sampling period. Because our frequencies of interest lie in the beta and gamma ranges, the duration of each burst was estimated using its mid-frequency (i.e. f = 25 and 40 Hz). The SNR dB of a burst is defined by

SNR dB = 20 log10(A_Signal / A_Noise),

where A_Signal and A_Noise are the root mean squares of the amplitude measured from the burst and non-burst periods, respectively. For the rare cases in which the estimated duration of a burst was longer than the maximum value of 12 cycles, the duration was clipped to this maximum. Likewise, if the estimated SNR dB level of a burst was larger than 10 dB, it was clipped back to the maximum. Next, we choose a cost metric of interest (e.g. F1-score, detection confidence) and map the burst detections to their metric values.
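The burst-wise duration and SNR dB estimates can be sketched as follows; the function names are illustrative, and the clipping bounds follow the stated simulation maxima (12 cycles, 10 dB):

```python
import numpy as np

def burst_cycles(n_samples, f, fs, max_cycles=12):
    """Convert a burst length in samples to oscillatory cycles at frequency f:
    tau_cycle = L_Burst * f * P_s, with P_s = 1/fs, clipped at 12 cycles."""
    return min(n_samples * f / fs, max_cycles)

def burst_snr_db(burst, background, max_db=10):
    """SNR_dB = 20 * log10(A_Signal / A_Noise), where the amplitudes are the
    RMS of the burst and non-burst periods, clipped at 10 dB."""
    a_sig = np.sqrt(np.mean(np.square(burst)))
    a_noise = np.sqrt(np.mean(np.square(background)))
    return min(20 * np.log10(a_sig / a_noise), max_db)
```

For example, a burst twice the RMS amplitude of the background sits at 20·log10(2) ≈ 6.02 dB.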
For each simulated burst of a specific duration and noise level, we already have a performance metric indicating how confident each algorithm is in detecting a burst with these characteristics. Therefore, for each algorithm, we can map its burst detections to the corresponding cost metric values obtained from the simulation based on their duration and SNR dB. Since we simulated only even SNR dB values, if the estimated noise level was odd, we averaged the metric values mapped to the two adjacent SNR dB values. From this set of mappings, we can derive a distribution of performance metrics for each algorithm. Specifically, we obtain and plot the empirical cumulative distribution function (eCDF) of the metric values for each algorithm. Lastly, the area under the curve (AUC) of the eCDF is calculated. Because the best method will return an eCDF curve with a sharp increase near 1, we choose the method with the minimal AUC. This rule serves as a potential solution to the burst detection algorithm selection problem and was validated to be sufficiently robust using our simulation data.
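For metric values bounded in [0, 1], the area under the eCDF has a closed form, 1 − mean(values), which makes the minimal-AUC rule easy to sketch (function names are illustrative):

```python
import numpy as np

def ecdf_auc(metric_values):
    """Area under the eCDF of metric values on [0, 1]. For values v in [0, 1],
    the integral of P(V <= x) over [0, 1] equals E[1 - V] = 1 - mean(v), so an
    algorithm whose detections mostly map to high-confidence metric values
    (eCDF rising sharply near 1) yields a small AUC."""
    v = np.clip(np.asarray(metric_values, dtype=float), 0.0, 1.0)
    return 1.0 - float(v.mean())

def select_algorithm(mapped):
    """mapped: dict algorithm name -> metric values of its burst detections.
    The selection rule picks the algorithm with minimal eCDF AUC."""
    return min(mapped, key=lambda k: ecdf_auc(mapped[k]))
```

An algorithm whose detections all map to metric value 1 has AUC 0 (best possible), and one whose detections all map to 0 has AUC 1.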

Detection confidence metrics by rank
To further assess the effectiveness of our selection rule, we evaluated the confidence scores associated with bursts detected by the algorithm deemed most optimal in each trial. In this process, the algorithms were ranked based on their inverse AUC values for each trial, and their burst detections were recorded. For all 16 trials, the confidence scores corresponding to the respective burst detections were organized into sets based on their ranks. We then conducted a comparative analysis to determine whether the top-ranked algorithms exhibited superior performance compared to the lower-ranked methods. This analysis reinforces the evaluation of the algorithm selection rule by examining its application at the individual trial level prior to aggregating the results across all trials. By scrutinizing the selection rule on a trial-by-trial basis, we aimed to gain insight into its efficacy and ensure a comprehensive assessment of how the rule determines the overall best algorithm.

Statistical analysis
Statistical analyses of the performance metric values concentrated on comparing the optimal algorithm, as determined by our selection rule, with the remaining four algorithms. Differences in inverse AUC values among the five algorithms were assessed using a one-way analysis of variance (ANOVA). To address the issue of multiple comparisons, the post hoc Tukey-Kramer test was subsequently employed to compare the best algorithm against the other methods. Furthermore, to investigate disparities in burst property distributions and confidence scores across rankings, the Kruskal-Wallis test and the post hoc Dunn's test with Bonferroni correction were used. The significance level for all statistical tests was set at 0.05.

Analytical details
The mathematical background behind the algorithms and the selection rule described above is outlined below. The actual implementations of these details are shared in the code repository. Based on the literature review conducted for our study, BP and ENV appear to be used equally often as time-domain methods, whereas CWT has been chosen more frequently as a time-frequency domain method than S-STFT and MTP.

Algorithm 1: bandpass filtering (BP)
The BP algorithm detects bursts from a filtered neural signal by marking oscillatory events that surpass a set threshold (Neymotin et al 2008, Feingold et al 2015, Nair et al 2018). This algorithm first bandpass filters the input signal using a two-way least-squares finite impulse response (FIR) filter (eegfilt.m function in the EEGLAB toolbox). The filter was chosen for its stability and to prevent possible phase distortions (Voytek et al 2015). For the theoretical simulation, the passband cut-off frequencies were defined to be 2 Hz away from the center frequency, i.e. f_0 ± 2. The amplitude threshold and subthreshold were set as 2.5 and 1.5 SDs above the mean of the filtered signal, respectively. If two points of a filtered signal above the amplitude threshold were separated by at least one oscillatory cycle length, these points were considered as belonging to separate bursts. After burst amplitudes had been prescreened, the duration threshold of three cycle lengths was applied, and all bursts shorter than this value were eliminated. Three cycles were chosen as our duration threshold because any length measurement below this value can overestimate the number of cycles and thus yield unreliable analyses. Once every burst event was finalized, the beginning and end of each event were registered as the times at which the amplitude fell below the subthreshold before and after the detected points (Nair et al 2018). For the theta burst detection task, the amplitude threshold was set as 1.8 SDs above the mean, with all other hyperparameters held equal.
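A simplified sketch of this dual-threshold procedure follows. It is not the paper's implementation: the eegfilt.m filtering step and the one-cycle separation rule are omitted, and the input is assumed to be a nonnegative amplitude trace (e.g. the rectified filtered signal):

```python
import numpy as np

def bp_detect(amp, fs, f0, thr_sd=2.5, sub_sd=1.5, min_cycles=3):
    """Dual-threshold burst detection (BP-style sketch) on an amplitude trace.
    Events must exceed mean + thr_sd*SD; onsets/offsets are expanded outward
    to where the trace falls below mean + sub_sd*SD; events shorter than
    min_cycles cycles of f0 are discarded. Returns (start, stop) sample pairs."""
    mu, sd = amp.mean(), amp.std()
    thr = mu + thr_sd * sd            # main amplitude threshold
    sub = mu + sub_sd * sd            # sub-threshold for boundary marking
    min_len = int(min_cycles * fs / f0)
    bursts, i, n = [], 0, len(amp)
    while i < n:
        if amp[i] > thr:
            start = i
            while start > 0 and amp[start - 1] > sub:   # expand to onset
                start -= 1
            stop = i
            while stop < n - 1 and amp[stop + 1] > sub:  # expand to offset
                stop += 1
            if stop - start + 1 >= min_len:              # duration threshold
                bursts.append((start, stop))
            i = stop + 1
        else:
            i += 1
    return bursts
```

A 100-sample supra-threshold plateau at fs = 1000 Hz and f0 = 30 Hz (three cycles = 100 samples) survives the duration threshold, while a 50-sample plateau does not.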

Algorithm 2: amplitude envelope (ENV)
The ENV algorithm filters a signal with a two-way least-squares FIR filter and computes an analytic complex-valued time series via the Hilbert transform. The amplitude envelope of a filtered signal is then extracted by taking absolute values of this time series. Next, the amplitude and duration thresholds are applied to the enveloped output as done in the BP method. Contrary to BP, however, the ENV method uses a single amplitude threshold defined as the 95th percentile of an envelope to screen amplitude crossings and mark the beginning and end of each burst (Sporn et al 2020). For the theta burst detection task, the threshold was lowered to the 90th percentile. The use of amplitude envelope in burst detection was first discussed in Poil et al (2008) and Tinkhauser et al (2017).
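The envelope-percentile procedure can be sketched as follows; the FIR filtering step is assumed to have been applied already, and the function name is illustrative:

```python
import numpy as np
from scipy.signal import hilbert

def env_detect(filtered, fs, f0, pct=95, min_cycles=3):
    """ENV-style sketch: threshold the Hilbert envelope of a bandpass-filtered
    signal at its pct-th percentile and keep supra-threshold runs lasting at
    least min_cycles cycles of f0. Returns (start, stop) sample pairs."""
    envelope = np.abs(hilbert(filtered))       # analytic amplitude envelope
    thr = np.percentile(envelope, pct)         # single percentile threshold
    min_len = int(min_cycles * fs / f0)
    bursts, start = [], None
    for i, above in enumerate(envelope > thr):
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_len:
                bursts.append((start, i - 1))
            start = None
    if start is not None and len(envelope) - start >= min_len:
        bursts.append((start, len(envelope) - 1))
    return bursts
```

On a 25 Hz test tone whose amplitude jumps tenfold over a short segment, the detected interval closely tracks the amplified segment.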

Algorithm 3: single-tapered short-time Fourier transform (S-STFT)
To identify transient spectral events from neural signals, the S-STFT algorithm was applied as a two-step process. First, a signal is represented in a time-frequency domain as a power spectrogram. From this representation, bursts are detected according to the predefined amplitude and duration thresholds.
In this method, a spectrogram is calculated by convolving a signal with a window function using the short-time Fourier transform (STFT). Mathematically, the short-time frequency spectrum X(f, t) is given by

X(f, t) = ∫ x(τ) w(τ − t) e^(−i2πfτ) dτ,

where τ is the time variable, w(τ − t) is the shifted window, and x(t) is the input signal (Allen 1977). Here, x(t) is a segment of the original input signal that is partitioned into a length equal to that of the window. The Fourier analysis is applied to each segment separately. Throughout the study, the Hanning window was used as our single taper (i.e. window) function.
The STFT was first used for detecting transient stimulus-evoked oscillations to study human auditory responses (Makeig 1993). Since neural bursts are time-varying and isolated within a very short time interval, we employed windows with frequency-dependent lengths to compute the Fourier transform. Such a modified version of the STFT (S-STFT) is also known as the constant-Q transform (CQT) and ensures that the frequency bandwidths are proportional to the center frequencies of each band (Harris 1976, Brown 1991). Hence, the width of a Hanning window was defined as L = c · fs / f, where c is the number of oscillatory cycles, fs is the sampling frequency, and f is the frequency of each band. For computing power spectrograms, c was fixed to six cycles. The frequencies f analyzed by the S-STFT were sparsely distributed for our study. While 24 voices per octave are usually used for the CQT to match human auditory perception (Brown and Puckette 1992), because our focus was on a broader frequency range, we performed spectral analysis from 15 to 60 Hz, with the frequency axis separated by 5 Hz increments from 15 to 30 Hz and 10 Hz increments from 30 to 60 Hz.
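The constant-Q windowing can be sketched as follows; this is an illustrative Python implementation (the function name, the frame step, and extracting power only at each center frequency are our simplifications):

```python
import numpy as np

def sstft_power(x, fs, freqs, c=6, step=10):
    """Single-taper STFT with frequency-dependent (constant-Q) windows:
    L = c * fs / f, so bandwidth scales with the center frequency."""
    powers = {}
    for f in freqs:
        L = int(c * fs / f)                    # six-cycle, frequency-dependent window
        w = np.hanning(L)
        phasor = np.exp(-2j * np.pi * f * np.arange(L) / fs)
        centers = np.arange(L // 2, len(x) - L // 2, step)
        seg = np.stack([x[t - L // 2: t - L // 2 + L] for t in centers])
        # Single-frequency Fourier coefficient of each windowed segment.
        powers[f] = (np.abs((seg * w) @ phasor) ** 2, centers)
    return powers
```

On a test tone, the power trace at the matched frequency rises sharply inside the burst and stays near the noise floor elsewhere and at neighboring analysis frequencies.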
Given the time-frequency representation of the CQT, we first binarized the spectrogram by converting the points above the power threshold to one and those below to zero. Sets of points that did not persist beyond the duration threshold were eliminated. Neural oscillatory bursts are thus defined as time intervals during which the instantaneous spectral power exceeds the power threshold, bounded at 1.8 SDs above the trial mean value, and lasts for at least three cycles. Bursts must be separated by at least one cycle length in time to be considered separate; otherwise, they were recorded as a single burst. Bursts were detected within the particular frequency bin corresponding to the frequency range of interest for each individual trial, and this process was repeated until burst detection had taken place in all specified frequency bins. The power threshold was lowered to 1.2 SDs above the mean during the theta burst detection task, with all other hyperparameters held equal.
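The binarization, one-cycle merging, and three-cycle duration rule for a single frequency bin can be sketched as follows (hypothetical names; `step` is the spectrogram frame step in samples):

```python
import numpy as np

def detect_from_power(power, fs, f, thr_sd=1.8, min_cycles=3, step=10):
    """Binarize one frequency bin of a spectrogram, merge runs separated by
    less than one cycle, and keep runs lasting at least `min_cycles` cycles."""
    thr = power.mean() + thr_sd * power.std()
    above = power > thr
    d = np.diff(np.r_[0, above.astype(int), 0])
    starts = np.flatnonzero(d == 1)
    ends = np.flatnonzero(d == -1) - 1
    gap = (fs / f) / step                      # one cycle, in spectrogram frames
    merged = []
    for s, e in zip(starts, ends):
        if merged and s - merged[-1][1] < gap:
            merged[-1][1] = e                  # closer than one cycle: same burst
        else:
            merged.append([s, e])
    min_len = (min_cycles * fs / f) / step     # three cycles, in frames
    return [(s, e) for s, e in merged if e - s + 1 >= min_len]
```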

Algorithm 4: multitaper method (MTP)
The MTP method is primarily adapted from Lundqvist et al (2016) and follows the same logic as delineated in the S-STFT section. While Lundqvist et al (2016) appears to have used a multitaper spectrogram specifically for the burst detection task, its application to neural data analysis was introduced in earlier works (Mitra and Pesaran 1999, Jarvis and Mitra 2001, Hoogenboom et al 2006, van Vugt et al 2007). To compute a power spectrogram, we used discrete prolate spheroidal sequences (DPSS) as our tapers in order to exploit their orthogonality. Since DPSS tapers are orthogonal to one another, each taper captures unique information about the signal when projected onto it (resulting in mutually independent spectral estimates), and the variance of the spectral estimates decreases when they are averaged (van Vugt et al 2007, Chandran et al 2016). Given a set of K tapers, the multitaper power estimate is

S(f, t) = (1/K) Σ_{m=1}^{K} |X_m(f, t)|²,

where K is the number of windows and X_m(f, t) is the short-time spectrum obtained with the mth DPSS taper function h_m (Chandran et al 2016). Similar to the STFT, short time segments of a signal are multiplied by each of the DPSS windows and Fourier transformed separately. The result of each transformation is then converted to power values and averaged over the number of tapers. Averaging these spectral estimates is known to reduce the variability (standard deviation) of the spectrum by a factor of √K.
For a successful construction of the DPSS tapers, two parameters are imperative: the length of the time period over which the data are thought to be stationary and the desired spectral resolution (Prerau et al 2017). The data segment size N, defined as the maximum time period (seconds) over which the data are thought to be stationary, can be conceptualized as our window size, or the temporal resolution ∆t of a spectrogram. Therefore, N = ∆t = L/fs, where the window length L (samples) is defined as in the S-STFT algorithm. It is important to note that the MTP algorithm here utilizes tapers of different lengths at each frequency band, in contrast to how this method is conventionally practiced. The second parameter, the spectral resolution ∆f, is the bandwidth (Hz) of the main lobe of the spectral estimate and governs the minimum spectral range we can resolve on a spectrogram (Prerau et al 2017).
Once these two parameters are determined, we can compute the time-half-bandwidth product α = N∆f/2 (Babadi and Brown 2014). Given this product, the number of tapers necessary for the spectral estimation is calculated as K = 2α − 1 (Prerau et al 2017).
With α and K, we finally construct the DPSS functions at a frequency f as the solution to an optimization problem: finding the set of K tapers that maximizes the spectral concentration within the desired bandwidth, where the quantity being concentrated is the Fourier transform of the DPSS function (Babadi and Brown 2014). For our multitaper spectrograms, we used α = 2 and K = 3, and the set of frequencies analyzed was identical to that used for the S-STFT method.
Finally, bursts were detected by binarizing the multitaper spectrogram such that points crossing the power threshold of 1.8 SDs above the trial mean were set to one and the rest to zero. These crossings were registered as a burst if they persisted over a duration of three oscillatory cycles at the frequency of interest (Lundqvist et al 2016). Bursts were detected from each band for all specified frequencies.
For the theta burst detection task, the power threshold was set to 1.3 SDs above the trial mean, and only two tapers were used. The window was slid with a time step of 0.005 s, in contrast to the 0.01 s used for S-STFT and CWT.
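A sketch of the frequency-dependent multitaper estimate using SciPy's DPSS windows (the function name and frame step are our own choices; α = 2, K = 3, and L = c · fs / f follow the text):

```python
import numpy as np
from scipy.signal.windows import dpss

def mtp_power(x, fs, f, c=6, alpha=2.0, K=3, step=10):
    """Multitaper power at frequency f with a frequency-dependent window:
    project each segment onto K orthogonal DPSS tapers and average the K
    independent single-frequency spectral estimates."""
    L = int(c * fs / f)
    tapers = dpss(L, alpha, K)                 # K orthogonal Slepian tapers, shape (K, L)
    phasor = np.exp(-2j * np.pi * f * np.arange(L) / fs)
    centers = np.arange(L // 2, len(x) - L // 2, step)
    seg = np.stack([x[t - L // 2: t - L // 2 + L] for t in centers])
    # |taper-windowed Fourier coefficient|^2, averaged over tapers.
    est = np.abs((seg[:, None, :] * tapers[None]) @ phasor) ** 2
    return est.mean(axis=1), centers
```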

Algorithm 5: continuous wavelet transform (CWT)
The procedure taken by the CWT algorithm is identical to that of the S-STFT and MTP methods, except that the spectrogram is computed by the wavelet transform. The wavelet spectrogram was first used to detect oscillatory burst events in Tallon-Baudry et al (1996) and subsequent papers (Tallon-Baudry and Bertrand 1999, Lakatos et al 2004). The wavelet-based spectrogram can be computed using the CWT, by which the input signal is convolved with wavelets of different heights and widths. The CWT of a signal x(t) is defined as

W(u, s) = (1/√s) ∫ x(t) ψ*((t − u)/s) dt,

in which u is the translation factor, s is the scale factor, ψ(t) is the mother wavelet, and ψ*((t − u)/s) is the (conjugated) child wavelet translated and scaled from ψ(t) (Vrhel et al 1995). In this equation, 1/√s is the energy normalization constant, and u was fixed to 1. A set of all child wavelets forms a wavelet family, and the CWT is a convolution of the signal with each member of this family. The power values are calculated by taking the absolute value of the coefficient magnitudes of the complex wavelet-convolved data.
To obtain a wavelet spectrogram, the cwt.m function in the MATLAB Signal Processing Toolbox (R2020a, MathWorks, MA, USA) was used. The Morlet wavelet, which is essentially a modulated Gaussian function, was selected for the CWT since it is commonly used for the analysis of human EEG due to its sinusoidal oscillatory shape (Schiff et al 1994). The analytic Morlet wavelet given by the software is defined in the frequency domain as

ψ̂(ω) = π^(−1/4) Û(ω) e^(−(ω − ω₀)²/2),

where ω₀ is the center frequency of the mother wavelet (set to 6) and Û(ω) is a unit step function in the frequency domain. The unit step function makes the wavelet purely analytic, such that the function becomes zero for all negative frequencies (Mallat 2009). It should additionally be remarked that complex Morlet wavelets of different mathematical forms have also been widely used for wavelet spectral analyses and burst detection (Sherman et al 2016, Shin et al 2017, Wessel 2020). The entire frequency axis of the spectrogram was covered by adjusting the scale of the mother wavelet. The scales were discretized at ten voices per octave, and the minimum and maximum scales were determined automatically by MATLAB based on the energy spread of the analytic Morlet wavelet. While the initial output of the CWT is a scalogram represented on a time-scale plane, scales were converted into frequencies to construct a time-frequency power spectrogram (scal2frq.m in the MATLAB Wavelet Toolbox). That this conversion depends on the mother wavelet is a disadvantage of the CWT, because it implies some inflexibility in determining the frequencies of interest (Sejdić et al 2009). The suitability of a mother wavelet should thus be judged by whether the shape of the wavelet sufficiently reflects the signal components of interest and whether the scale dilation of the wavelet outputs the desired frequencies.
The binary quantization of the power spectrogram and the subsequent burst detection followed the same procedure as described for the S-STFT and MTP algorithms. The power and duration thresholds were identical to those defined for the S-STFT and MTP, except for theta burst detection, in which a power threshold of 1.0 SD above the mean was used. On a side note, the wavelet time-frequency representation has also been implemented in a BP-algorithm-like fashion, in which the authors computed the average of the wavelet power spectrogram over the given frequencies (scales), and scale-averaged wavelet power values over 2 SDs above the mean were selected as gamma bursts (Lakatos et al 2004).
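A time-domain sketch of an analytic Morlet CWT (illustrative only: cwt.m works in the frequency domain and selects its scales automatically; for ω₀ = 6 the complex Morlet has negligible negative-frequency content, so no explicit unit step is applied here). The scale-frequency relation f = ω₀ fs / (2πs) is the standard Morlet conversion performed by scal2frq.m.

```python
import numpy as np

def cwt_power(x, fs, freqs, omega0=6.0):
    """Wavelet power via direct convolution with analytic Morlet wavelets."""
    out = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        s = omega0 * fs / (2 * np.pi * f)      # scale whose center frequency is f
        M = int(10 * s) | 1                    # support of ~10 scales, odd length
        t = (np.arange(M) - M // 2) / s
        # 1/sqrt(s) energy normalization, as in the CWT definition above.
        psi = np.pi ** -0.25 * np.exp(1j * omega0 * t - t ** 2 / 2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(x, psi, mode="same")) ** 2
    return out
```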

Time-frequency uncertainty principle
The spectral analysis methods in our study (i.e. S-STFT, MTP, CWT) represent the spectral information of a signal by isolating its segments in time windows whose lengths depend on the particular frequency band. This windowing scheme is intrinsic to the CWT, but it was also adopted here for the S-STFT and MTP so that every algorithm can detect transient power events at higher frequencies. Representation of a signal in the time-frequency domain introduces a trade-off between temporal and spectral resolution, whereby an increase in one leads to a decrease in the other (Sinkkonen et al 1995). This trade-off can be mathematically described by the Heisenberg-Gabor uncertainty principle,

A_ψ = σ_t σ_ω ⩾ 1/2,

in which the time-bandwidth product is denoted as the Heisenberg area A_ψ, and σ_t and σ_ω are the standard deviations of the duration and bandwidth of a signal (Cohen 1989). While the resolutions may vary, the Heisenberg area is constant (for the wavelet transform, this implies that each member of the wavelet family has an identical Q-factor, the ratio of center frequency to bandwidth) (Ladd and Wilson 1993). Our choice of window function defines this area, as different window types assume different resolutions and spectral leakage constraints (Oppenheim and Schafer 2013). However, it is worth noting that, because most commonly used windows have decent spectral concentration, the window length is a more critical factor for resolution than the type of window used (Chandran et al 2016).
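The lower bound can be checked numerically: a Gaussian window attains the minimum, so computing σ_t and σ_ω from the energy densities |g(t)|² and |G(ω)|² should give a product close to 1/2.

```python
import numpy as np

# Numerical check of the Heisenberg-Gabor bound: for a Gaussian window the
# product sigma_t * sigma_omega attains the minimum value of 1/2.
dt = 1e-3
t = np.arange(-10.0, 10.0, dt)
g = np.exp(-t ** 2 / 2)                          # Gaussian window, sigma = 1

# Duration: std of t under the energy density |g|^2.
p_t = g ** 2 / np.sum(g ** 2)
sigma_t = np.sqrt(np.sum(p_t * t ** 2))

# Bandwidth: std of omega under the spectral energy density |G(omega)|^2.
G = np.fft.fft(g)
omega = 2 * np.pi * np.fft.fftfreq(len(t), dt)
p_w = np.abs(G) ** 2 / np.sum(np.abs(G) ** 2)
sigma_w = np.sqrt(np.sum(p_w * omega ** 2))

print(sigma_t * sigma_w)                         # close to the bound of 0.5
```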

Algorithm selection problem
One of the primary purposes of our research is to solve the problem of selecting an optimal burst detection algorithm for a specific dataset. Here, we mathematically formulate our selection problem using the basic abstract model first suggested by Rice (1976). The algorithm selection problem states: given a trial x ∈ P and an algorithm A ∈ A, where P is the problem space (i.e. the collection of trials) and A is the algorithm space (i.e. the portfolio of all algorithms), determine the selection mapping S : P → A that yields the maximum performance score for each trial.
The performance scores are mapped from the algorithm space to a performance measure space Rⁿ via the performance mapping p : A × P → Rⁿ. In our case, p(A, x) outputs an eCDF of our chosen cost metric (e.g. detection confidence), which is in turn mapped by a norm to an AUC value. Then, for each trial x, we find arg min_{A∈A} ||p(A, x)||. We define an algorithm to be optimal when it is most frequently identified with the minimum AUC across all trials (cf MTP in figure 7(A)).
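The trial-wise eCDF/AUC selection can be sketched as follows (the function name and input layout are hypothetical; the intuition is that a better algorithm concentrates its cost-metric mass at high values, so its eCDF rises late and integrates to a small AUC):

```python
import numpy as np

def select_algorithm(scores):
    """scores[a] holds, per trial, the cost-metric values (e.g. detection
    confidences in [0, 1]) of algorithm a's detections; the algorithm with
    the lowest mean eCDF AUC across trials is selected."""
    def ecdf_auc(v):
        v = np.sort(np.asarray(v, dtype=float))
        grid = np.linspace(0.0, 1.0, 101)
        cdf = np.searchsorted(v, grid, side="right") / len(v)
        # Trapezoidal integration of the eCDF over [0, 1].
        dx = grid[1] - grid[0]
        return float(np.sum((cdf[:-1] + cdf[1:]) / 2) * dx)
    mean_auc = {a: float(np.mean([ecdf_auc(tr) for tr in trials]))
                for a, trials in scores.items()}
    return min(mean_auc, key=mean_auc.get), mean_auc
```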

Comparison of algorithmic performances on simulation dataset
Precision, recall, and F1-score
To determine the efficiency of the algorithms, we first quantified the detection accuracy of each method based on the classification statistics of burst detections extracted from the simulated signals. We applied the five algorithms (i.e. BP, ENV, S-STFT, MTP, and CWT) to detect beta (23-27 Hz) oscillatory bursts and evaluated precision (P), recall (R), and F1-score for different burst properties, each defined by a specific pairing of burst duration and SNR_dB. For each of these signal conditions, we then identified the best- and worst-performing methods to compare algorithmic performances (figure 3). Differences in classification accuracy were most evident when detecting transient and noisy bursts. While salient bursts longer than five cycles were minimally affected by the choice of algorithm, detection performances on weaker and shorter bursts showed high variance across the employed methods. The heatmaps of P, R, and F1-score for the individual algorithms are shown in figure 3-1. As expected, classification accuracy dropped rapidly as the duration and SNR_dB level decreased.
In general, all five algorithms produced high precision scores for bursts longer than 200 ms. Precision for these bursts was largely noise-invariant, except where sufficiently strong noise (≤−6 dB) was incorporated into the signals (figure 3-1, first column). For bursts of shorter durations (≤200 ms), low precision scores were observed, and the scores decreased as more noise was introduced. While MTP performed best in detecting the shortest bursts (120 ms), BP also performed well on shorter bursts over a wide range of SNR_dB levels (−10 to 6 dB), with S-STFT surpassing BP only when the noise was sufficiently weak (figure 3(A)). On the other hand, MTP resulted in the lowest precision across most of the burst durations (⩾160 ms) and all noise levels (figure 3(B)). To contrast the best- and worst-performing algorithms, we further inspected conditions in which the differences between the highest and lowest precision scores were maximized. Precision was highly dependent on the selected algorithm when burst events were shorter and heavily affected by noise (figure 3(C)). Similarly, every algorithm reported high recall values for the longer bursts under sufficiently low noise (⩾200 ms, ⩾−2 dB) (figure 3-1, second column). Each heatmap showed a clear boundary demarcating low and high recall values in the shape of an exponentially decaying curve. Hence, the algorithms were relatively more invariant to burst duration for recall than for precision. The recall metric also revealed a pattern of algorithmic performance opposite to that shown for precision. First, MTP and ENV produced the highest recall for the shorter (≤160 ms) and longer bursts (⩾240 ms), respectively, across higher SNR_dB levels (⩾−2 dB), although MTP dominated the detection of noisy bursts across the entire range of burst durations (figure 3(D)). Secondly, BP and S-STFT were identified as the worst-performing algorithms for recall.
BP performed poorly for bursts under stronger noise across different durations, whereas S-STFT performed worse on the highly salient bursts (figure 3(E)). As in the case of precision, however, the differences between best- and worst-performing algorithms (figure 3(F)) illustrated a large algorithmic dependency for shorter and somewhat salient bursts (≤160 ms, ⩾−4 dB), as well as longer and noisy bursts (⩾160 ms, ≤−4 dB).
To measure algorithmic efficacy in a comprehensive manner, the F1-score was computed from precision and recall. On the heatmaps of the individual algorithms, the concave-shaped boundary previously shown for recall was retained (figure 3-1, third column). MTP and CWT scored the highest F1-scores on detecting shorter bursts, whereas BP scored the lowest (figures 3(G) and (H)). However, for MTP, such good performance was restricted to the shortest bursts (120 ms). Although ENV performed better than MTP as the event duration increased (⩾240 ms), this difference was marginal at the corresponding durations (figure 3(I)). Of note, for all three metrics, differences between best and worst performances diminished when bursts were both short and strongly noisy (≤160 ms, ≤−4 dB), as every algorithm performed poorly in such extreme cases.

Temporal concurrence
A key question in neural burst studies is whether there exists a causal relationship between a burst and a specific behavior. Due to the complex nature of oscillatory bursts, however, the exact time points of a burst are hard to articulate in nonstationary signals, making this causality difficult to infer. Therefore, to investigate temporal errors of burst detections, we measured temporal concurrence for each detection algorithm (figure 4(A); see figure 4-1 for measurements in real data) and recapitulated it as a function of burst duration and SNR_dB (figure 4-2). Consistent with the previous results on classification accuracy, every algorithm provided high concurrence scores for longer and more salient burst events. Yet, for longer bursts with a low SNR_dB, these scores decreased. With a cut-off criterion of 75%, an SNR_dB level of 0-2 dB or 2-4 dB served as the floor condition for good temporal concurrence in all of the tested methods, indicating that noise is a more influential factor than burst duration in determining temporal concurrence.
To observe the algorithmic dependency, the best- and worst-performing algorithms were then visualized with respect to temporal concurrence. Here, CWT captured event periods most precisely for bursts shorter than 160 ms. ENV and MTP reported high temporal accuracy for the longer bursts, with ENV performing better for salient bursts and MTP better for noisy bursts (figure 4(B)). It is also important to note that CWT and MTP revealed an opposing pattern: for transient bursts (<200 ms), CWT generally performed best and MTP worst; for weak but longer bursts (<−2 dB, >280 ms), MTP often yielded the highest temporal concurrence, while CWT received the lowest scores (figures 4(B) and (C)). Nonetheless, there was no pronounced difference in concurrence between the algorithms, although a slight increase in difference values was observed for the shortest bursts embedded in weak noise (figure 4(D)).
Next, to clarify the effect of signal properties on temporal concurrence, we examined how concurrence changed across one signal property with the other held fixed. First, temporal accuracy across durations was probed while the noise level was fixed. In a low-noise regime (SNR_dB = 2-6 dB), mean temporal concurrence, averaged over algorithms and a range of noise conditions, gradually increased with burst duration, whereas in a high-noise regime (SNR_dB = −6 to −2 dB), it monotonically decreased (figure 4(E)). Under stronger noise, longer bursts suffered from lower temporal accuracy because the detection errors accumulated over time points outweighed the increased probability of overlap between a true burst and its detection. Next, temporal accuracy was examined across noise levels with the burst duration fixed. In both short (duration = 120-200 ms) and long (duration = 400-480 ms) duration regimes, concurrence scores increased rapidly as noise decreased, although the rate of change was higher in the long-duration regime (figure 4(F)). Together, these results suggest that temporal concurrence depends more on the variability of SNR_dB than on that of burst duration, contrary to what was shown for classification accuracy.
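For illustration, one plausible interval-overlap formulation of temporal concurrence is sketched below; the paper's exact formula is not restated in this section, so we assume a temporal intersection-over-union between a true burst interval and its matched detection.

```python
def temporal_concurrence(true_iv, det_iv):
    """Assumed definition: temporal intersection-over-union between a true
    burst interval and its matched detection; intervals are (start, end)
    times in seconds."""
    inter = max(0.0, min(true_iv[1], det_iv[1]) - max(true_iv[0], det_iv[0]))
    union = (true_iv[1] - true_iv[0]) + (det_iv[1] - det_iv[0]) - inter
    return inter / union if union > 0 else 0.0
```

Under this definition, a detection shifted by a quarter of the burst length already loses a substantial fraction of concurrence, which matches the metric's sensitivity to onset errors described above.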

Detection confidence
Detection performance (represented by the F1-score) and temporal concurrence were then considered simultaneously by adapting the detection confidence measure originally implemented for the temporal action detection task (figure 5(A); Xiong et al 2017). In figure 5-1, we depict detection confidence heatmaps for each individual algorithm. Because temporal concurrences were relatively uniform across algorithms, the overall topography of these heatmaps resembled that observed for the F1-score, with a clear boundary of exponentially decaying shape demarcating high and low confidence measures (cf figure 3-1). Every algorithm reported high confidence scores for longer and more salient bursts but lower confidence for shorter or weaker bursts.
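As an illustration only (the exact formulation follows Xiong et al (2017) and is not reproduced here), a balanced combination of classification accuracy and temporal precision could take the form of a harmonic mean, which is high only when both components are high:

```python
def detection_confidence(f1, concurrence):
    """Illustrative combination only, not the paper's exact formula:
    harmonic mean of F1 (classification accuracy) and temporal concurrence
    (temporal precision)."""
    if f1 + concurrence == 0:
        return 0.0
    return 2.0 * f1 * concurrence / (f1 + concurrence)
```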
The best- and worst-performing algorithms in detection confidence both showed balanced topographies, evenly reflecting the results previously seen for the F1-score and temporal concurrence (figures 5(B) and (C)). For the former, MTP and CWT performed best on the shorter bursts (≤200 ms), while for the longer bursts, ENV and MTP yielded the highest confidence scores when noise was weak and strong, respectively (figure 5(B)). For the latter, BP and S-STFT performed worst at detecting the shortest (120 ms) and longer (≤280 ms) bursts across most SNR_dB levels (figure 5(C)). Additionally, MTP returned the lowest confidence scores for relatively longer bursts (160-240 ms). Still, all algorithms produced similar confidence scores across many combinations of burst duration and SNR_dB. The differences between the highest and lowest confidence scores did not reveal large variance across algorithms, although the contrast was relatively more prominent for shorter and more salient bursts (⩾−4 dB, ≤240 ms) (figure 5(D)). Based on the best-performing algorithms, we summarized the optimal algorithms for each specific pair of burst properties as a detection confidence-based decision matrix (figure 5(E)).
To additionally inspect how differences between algorithms translate to other frequency bands, we computed decision matrices for theta and low-gamma bursts. By comparing these matrices, we could observe which patterns of algorithmic differences generalize across frequency bands. Gamma burst detection showed a pattern of detection confidence similar to that of beta burst detection (figure 5-2). MTP performed best for transient bursts, as well as for bursts that were long and noisy. For bursts that were relatively longer but more salient, ENV, S-STFT, and CWT were better. However, for theta burst detection, the predominance of MTP broadened across most burst characteristics, with S-STFT and CWT retaining slight influence (figure 5-2). Thus, we concluded that as the overall duration and amplitude of bursts increase from the gamma band to the theta band (cf figures 1(A) and (C)), the performance of MTP also improves. This enhancement is likely due to its ability to smooth and denoise windowed signals, reducing the spectral leakage of a frequency bin when a sufficient window length is guaranteed.

(Figure 5 caption, panels (B)-(E): (B) The best-performing algorithms and their highest detection confidence scores are presented as a heatmap for each burst duration and SNR_dB level on a percentage scale. Detection confidence values were rounded to one decimal place before numerical comparison. Each algorithm is specified by its color; where multiple algorithms yielded identical detection confidence scores, the cell is marked in gray. (C) Identical to (B), but for the lowest detection confidences yielded by the worst-performing algorithms; confidence scores of each individual algorithm can be found in figure 5-1. (D) The difference between best and worst performances, calculated by subtracting the lowest detection confidence from the highest for each pair of burst properties, shown as a heatmap with a cut-off at 20%. (E) Based on the heatmap of best-performing algorithms in (B), a decision matrix (left) demonstrating the best algorithms for each duration and SNR_dB; each cell includes a number identifier referring to the single method or multiple algorithms that performed most reliably under the specific signal properties (right).)

Algorithm selection rule and its application to burst detection tasks
The performance of a burst detection algorithm depends on the distribution of burst events across signal properties in each dataset. To address this challenge, we developed a criterion to select the optimal burst detection algorithm for a particular dataset (figure 6(A)). We tested this selection rule on mouse BLA LFPs (16 trials, 1 min) obtained from a previous study in which a mouse attempted to escape from a spider robot attack (figure 6(B); Kim et al 2020). Since neural signals often encode distinct information at different frequencies, the selection rule may not operate uniformly for bursts in different frequency bands. Hence, we applied the test to LFP signals filtered into theta, beta, and gamma bands as three separate burst detection tasks. Following our rule, in each task we initially applied the five algorithms to individual trials to detect bursts and calculated the event duration and SNR_dB level of each detection (figure 6(E)). We then mapped each pair of burst properties to a cost metric of our choice, namely detection confidence. These mappings were used to compute, for each trial, an eCDF and its AUC (figure 6(C)). Finally, the algorithm with the lowest mean AUC across trials was selected as the optimal algorithm (figure 7).
Before assessing the effectiveness of the algorithm selection process, we examined the bursts detected by each algorithm to ensure that the distributions of burst durations and noise levels exhibited distinct characteristics across frequency bands. We hypothesized that if these distributions were statistically different, the burst detection tasks could serve as a testbed for evaluating the generalizability of algorithmic performance. Our statistical analysis revealed significant differences in both burst properties, suggesting that these tasks were suitable for assessing how the same algorithm behaves when data exhibit varying distributions of burst properties (figure 6(D)). Furthermore, the number of bursts captured from each frequency band was reasonable, given previous findings supporting increased theta and gamma activities when dealing with fear in adverse environments (Likhtik et al 2014, Kim et al 2020). The application of the algorithm selection rule in burst detection tasks demonstrated that our method could identify an algorithm that is statistically superior to the others based on empirical data. In the theta burst detection task, MTP was identified as the best-performing algorithm, exhibiting significantly higher performance than all other approaches (figure 7(A)). However, in the beta and gamma burst detection tasks, while CWT reported the highest inverse AUC values, it did not reach statistical significance over the other algorithms (figures 7(B) and (C)). This observation suggests that multiple algorithms can yield similar performances for a given input. By applying the algorithm selection rule, we could identify a putatively best-performing algorithm based on the unique information within a dataset; however, this inference did not always yield a single algorithm that statistically outperformed the other methods.
The selection rule thus highlights that the behavior of an algorithm can deviate largely depending on the configurations of burst properties within a dataset. For selections of the optimal algorithm using cost metrics other than detection confidence in the theta, beta, and gamma burst detection, readers should refer to figures 7-1-7-3, respectively.

(Figure 7 caption, panels (D)-(F): (D) For each trial, algorithms were ranked based on their inverse AUC values, and the detection confidence scores of their burst detections were extracted. The average and standard deviation of these scores, sorted by rank, are presented as bar plots and error bars, respectively. The top left inset provides an overview of the statistical difference between bursts from Rank #1 algorithms and the other ranks. In the theta burst detection task, bursts from the top rank demonstrated significantly higher confidence scores compared to lower ranks (1 vs 2: p = 3.000 × 10^−4; vs 3: p = 7.436 × 10^−8; vs 4: p = 4.405 × 10^−11; vs 5: p = 5.282 × 10^−16), as indicated by the Kruskal-Wallis test (p = 2.991 × 10^−17, H = 83.61). The top right inset shows the distribution of algorithms within each rank. (E) The same analysis as in (D) for the beta burst detection task. Confidence scores of the top-ranked algorithms outperformed those of the lower ranks (p = 5.444 × 10^−7, H = 34.66). The top three ranks showed no statistical difference in confidence scores, but the first rank was significantly higher than the bottom two ranks (1 vs 2: p = 0.6560; vs 3: p = 0.1935; vs 4: p = 3.000 × 10^−4; vs 5: p = 6.654 × 10^−7). (F) The same analysis as in (D) for the gamma burst detection task. Following the Kruskal-Wallis test (p = 8.084 × 10^−11, H = 53.11), the post hoc test revealed that while the detection confidence scores of the top two ranks showed no difference, the other three ranks displayed significantly lower scores compared to the top rank (1 vs 2: p = 0.9752; vs 3: p = 0.04770; vs 4: p = 1.000 × 10^−4; vs 5: p = 3.087 × 10^−10). The Tukey-Kramer test was employed as the post hoc multiple comparisons test for (A)-(C), while Dunn's test with Bonferroni correction was used for (D)-(F). Statistical significance levels are denoted as *p ⩽ 0.05, **p ⩽ 0.01, ***p ⩽ 0.001; n.s. indicates non-significant results.)
We also examined the algorithm selection rule at a single-trial level to explain why the algorithms exhibited non-significant performance differences in certain detection tasks. When detecting theta bursts, MTP consistently performed the best in most trials, with the detection confidence scores of the topranked algorithms being consistently higher than those of lower ranks ( figure 7(D)). On the other hand, figure 7(E) shows that CWT, S-STFT, and ENV occupied approximately the same number of positions among the top three ranks in the beta burst detection task. Consequently, the confidence scores of the bursts in these ranks did not exhibit significant differences. These results illustrate why these three algorithms reported similar inverse AUC values Table 1. Evaluation of algorithmic performances against manual human annotations. For each algorithm applied to the real data (n = 16 trials; see figure 5), the efficacy of theta burst detections estimated by the moments, distance, and variance of five different metrics is evaluated against the manual human annotations. Distance and variance of the algorithms are further summarized by the sums across all metrics. For algorithmic performances on beta and gamma burst detection, refer to tables 1-1 and 1-2.
Algorithmic performances against human annotated ground truth labels: theta burst detection a The sum of Manhattan distances (i.e. L1 norm) between the best score and trial-wise algorithmic performance score, ∑ n |1 − metricn| where n indicates the trial number. b The variance of metric values across trials (n = 16) for each algorithm.
Note: Bold indicates the algorithm with minimum sums of distances and variances.
in the algorithm selection process, leading to the lack of significant differences between algorithms in figure 7(B). A similar observation can be made for the gamma burst detection task, where CWT, S-STFT, and BP were consistently among the top two-ranked algorithms across trials, with no significant differences in confidence scores between these ranks (figure 7(F)). This explains the comparable performance between these algorithms when the selection rule was applied to the entire dataset across all trials.
Finally, to quantitatively compare the selected algorithms with those chosen based on human visual screenings, we evaluated their performance using various metrics. By comparing the burst detections of each algorithm against the ground truth labels annotated by human researchers, we computed the classification accuracy, temporal concurrence, and detection confidence for each algorithm. These performance metrics were summarized as the mean and standard deviation across trials. In all three tasks, most methods exhibited small variances regardless of the metric type (tables 1, 1-1, and 1-2). These variances were then aggregated across metrics to assess the consistency of individual algorithms across different evaluation criteria. S-STFT and MTP displayed relatively larger sums of variances compared to the other algorithms, primarily due to larger variances in their classification accuracy. This disparity can be interpreted as a consequence of sliding a long window at a stride (50 ms for θ, 100 ms for β and γ) longer than the sliding step (sampling period) of BP, ENV, and CWT in representing temporal and spectral information of neural activities, resulting in a higher likelihood of including unnecessary information. Temporal concurrence was not affected by this choice, as its measurement was limited to TP bursts only.
To assess the discrepancy between ideal and actual performances, we also computed L1 norm distances between the best available score and trial-wise algorithmic performances. Again, the distances were summed across metrics, and the algorithm with the shortest summed distance was considered the best-performing algorithm based on the human annotations. BP, CWT, and ENV were selected as the optimal methods for the theta, beta, and gamma burst detection, respectively (tables 1, 1-1, and 1-2). The algorithms chosen by visual screenings in the theta and gamma burst detection tasks did not match those identified by the algorithm selection procedure. Even for the beta burst detection, although CWT was selected by both humans and the selection rule, the order of algorithms by performance did not align. This observation suggests a mismatch between human priors and the mathematical assumptions of the algorithms, particularly considering that the human-based detection relied on thresholding of the filtered time-domain signal and focused on Gaussian-like wax-and-wane waveforms.
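The distance and variance summaries described above can be sketched in a few lines of Python. The dictionary layout, metric names, and scores below are hypothetical stand-ins for the five metrics and n = 16 trials used in the paper:

```python
import numpy as np

def summarize_performance(scores):
    """Summarize trial-wise metric scores for one algorithm.

    scores: dict mapping metric name -> per-trial values in [0, 1]
    (an illustrative layout, not the repository's data structure).
    Returns the summed L1 distance from the ideal score of 1 and the
    summed across-trial variance, as reported in table 1.
    """
    dist = sum(np.abs(1.0 - np.asarray(v)).sum() for v in scores.values())
    var = sum(np.var(np.asarray(v)) for v in scores.values())
    return dist, var

# Made-up scores for two algorithms over four trials:
bp = {"accuracy": [0.9, 0.8, 0.85, 0.9], "concurrence": [0.7, 0.75, 0.7, 0.8]}
env = {"accuracy": [0.6, 0.9, 0.5, 0.95], "concurrence": [0.6, 0.9, 0.5, 0.8]}
d_bp, v_bp = summarize_performance(bp)
d_env, v_env = summarize_performance(env)
# The algorithm with the smaller summed distance (and variance) is preferred.
```

Under this toy input, the first algorithm would be selected: it sits both closer to the ideal score and more consistently so across trials.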

Discussion
In this paper, we presented a formal comparison of five classic burst detection algorithms and proposed an algorithm selection rule that identifies a method which performs best for a given dataset. We found that when detecting transient and noisy bursts, the accuracy of detection and its temporal information vary meaningfully across the algorithms. Our results thus confirm that each algorithm has a different degree of sensitivity to specific burst signal properties. While the classification accuracy of an algorithm largely depended on the burst duration and was relatively invariant to the noise level, the temporal accuracy was more affected by the noise level than the burst duration. We further validated that these two measures can be combined as the detection confidence score without losing their details. Using this score as our evaluation criterion, we established the algorithm selection rule to identify the best algorithms and showed that the selection of an optimal algorithm depends largely on the characteristics of the dataset.
It is important to note, however, that the algorithms discussed in this paper are conventional and basic methods, which utilize classic signal processing techniques and rely on thresholding of a signal representation in the time or time-frequency domain. The algorithms introduced here are only a subset of available algorithms, and there are advanced algorithms that try to improve on their shortcomings. The purpose of this paper is to (1) highlight the limitations of five burst detection algorithms that researchers habitually resort to, (2) observe how these limitations are translated to experimental data, and (3) offer a potential selection rule that can discern an optimal algorithm for a specific dataset. The application of the burst detection algorithms and their selection rule to the mouse LFP dataset showed that multiple algorithms can perform at a similar level and that the difference between algorithmic performances is not always statistically significant. This observation was more conspicuous for gamma bursts, likely because gamma bursts had a lower SNR overall than theta and beta bursts: the same aperiodic noise was applied to all bands, while gamma had the lowest amplitude following the 1/f power law dynamics. Similarities in the performance level thus argue that there is sufficient room for the algorithms to improve (figures 7(B) and (C); tables 1-1 and 1-2), and the comparison should be extended to newly developed algorithms in future work. In the subsections below, we discuss where the differences between the algorithms arise, the weaknesses of the selection rule, advanced burst detection algorithms, and the limitations of our study.

Temporal accuracy is central to the burst detection problem
The significance of our study lies in (1) that there has been no extensive comparison of burst detection algorithms prior to this study and (2) that our proposed algorithm selection rule enables a formal comparison of candidate algorithms, followed by the quantitative identification of a putatively optimal algorithm without referring to any research heuristics. Such quantification in the latter was made available by employing the performance metrics that represent the evaluation criteria fundamental to burst detection: classification accuracy (how many detections match ground truth events) and temporal accuracy (how precise these detections are in time).
Although there have been several studies on neural oscillation detection, most of them lack a discussion of temporal accuracy as a relevant criterion. In a previous study by van Vugt et al (2007), where the authors evaluated spectral analysis algorithms designed to detect sustained neural oscillations, the comparison was restricted to the classification accuracy and burst characteristics (e.g. frequency specificity, peak amplitude). Algorithms developed for the sleep spindle detection problem were also evaluated based on their detection performances (i.e. F1 score) and burst characteristics (Ujma et al 2015, Lacourse et al 2019). It is likely that for most of the neural oscillation detection tasks thus far, the classification accuracy was sufficient to determine the algorithmic efficiency. However, for burst detection, information about temporal errors at submillisecond scales is necessary, as accurate matching between a burst and a behavior must be fulfilled to guarantee reliable inference about the temporal dynamics of neural burst events. A recent paper introducing a new burst detection algorithm called oscillation events (OEvents) compared its algorithm to the BOSC method, but it restricted its scope to only two algorithms and focused on burst characteristics (i.e. burst duration, peak frequency) as well.
Here, we provide not only the codes for implementing the algorithms but also the simulation results of various performance metrics (figures 3-1, 4-2, and 5-1). Therefore, in applying the algorithm selection rule, evaluation criteria can be flexibly altered to focus only on either the classification accuracy or the temporal accuracy, in contrast to our use of detection confidence in figure 7. It is also noteworthy that all our metrics remain consistent with standard machine learning conventions. Specifically, detection performances are based on the precision-recall relationship, and temporal concurrence can be interpreted as an intersection over union (IoU) score. The combination of the two measures, detection confidence, was also motivated by computer vision tasks. Given the mathematical formulation of each metric, our study can be easily extended to include newly emerging algorithms, including those that utilize deep neural networks.
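To make the IoU reading of temporal concurrence concrete, a minimal sketch follows; the function name and interface are illustrative, not the paper's implementation:

```python
def interval_iou(pred, true):
    """Intersection over union (IoU) of two time intervals.

    pred, true: (onset, offset) tuples in seconds. An IoU of 1 means the
    detected burst and the ground-truth burst coincide exactly; 0 means
    they do not overlap at all.
    """
    inter = max(0.0, min(pred[1], true[1]) - max(pred[0], true[0]))
    union = (pred[1] - pred[0]) + (true[1] - true[0]) - inter
    return inter / union if union > 0 else 0.0

# A detection overlapping a ground-truth burst by half its extent:
score = interval_iou((0.10, 0.30), (0.20, 0.40))  # 0.1 s overlap / 0.3 s union
```

Because IoU penalizes both late onsets and overlong detections symmetrically, it captures temporal precision in a way that classification accuracy alone cannot.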

Domain of signal representation underlies the difference between the algorithms
The algorithms in this paper share an analogous mechanism, utilizing amplitude and duration thresholds to detect bursts. However, they are characterized by distinct constraints that arise from the domain in which they represent a signal. In the case of the time domain-based methods, the signal representation requires less computational cost, as a transform from one domain to another is not necessary. This computational efficiency allows these algorithms to be easily implemented in on-line burst detection tasks (Kanta et al 2019, Karvat et al 2020). Yet, the burst detection process itself is strictly limited by the filter types and parameters, since only the temporal dimension is being resolved and the spectral bounds of a raw signal must be manually specified. In the case of the time-frequency domain, a signal is represented via a transformation that projects information into the additional spectral domain. This extra dimension eliminates the need for explicitly specifying the frequencies and, in theory, allows algorithms to detect bursts from multiple frequency bands simultaneously. However, such a domain transform entails a trade-off between the temporal and spectral resolution (see analytical details). To mitigate the computational complexity and adjust the trade-off, the use of a sliding window is inevitable. Hence, the algorithms are dependent on the window length and subject to temporal smoothing, which indicates that the temporal resolution must be larger than the sampling period for some algorithms. One consequence of such coarse resolution is the overall low temporal concurrence of MTP and S-STFT across varying burst durations when strong noise is absent and does not act as a confounding factor (figure 4). Nonetheless, temporal smoothing can still improve the algorithms by denoising the signal locally and making them more robust to instantaneous noise (i.e. false positives).
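As an illustration of the time-domain mechanism, below is a minimal envelope-threshold detector in the spirit of ENV. The filter order, median-based threshold, and minimum-duration criterion are illustrative assumptions, not the tuned settings used in the paper:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def detect_bursts_env(x, fs, band, amp_factor=2.0, min_cycles=3):
    """Minimal time-domain burst detector (ENV-style sketch).

    Band-pass filters x, takes the Hilbert amplitude envelope, and keeps
    supra-threshold segments lasting at least `min_cycles` cycles of the
    band's center frequency. Returns a list of (onset, offset) in seconds.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    env = np.abs(hilbert(filtfilt(b, a, x)))          # amplitude envelope
    thr = amp_factor * np.median(env)                  # amplitude threshold
    min_len = int(min_cycles * fs / np.mean(band))     # duration threshold
    above = np.r_[0, (env > thr).astype(int), 0]
    starts = np.flatnonzero(np.diff(above) == 1)
    stops = np.flatnonzero(np.diff(above) == -1)
    return [(s / fs, e / fs) for s, e in zip(starts, stops) if e - s >= min_len]
```

Note how the spectral bounds (`band`) must be specified manually, which is exactly the constraint of time-domain methods discussed above; a time-frequency method would resolve the frequency axis itself.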

Utility and weakness of the algorithm selection rule
One weakness of our algorithm selection rule lies in its dependency on simulation results. Since the selection rule estimates algorithmic performances on experimental data using performances on simulation data, it is inevitably affected by how synthetic signals and their hyperparameters are configured. Consequently, the selection rule will work best if the synthetic signals perfectly represent a specific dataset at hand. However, it is currently impossible to model a simulation without losing any information from the data. For instance, if the noise components in the simulation do not reflect those in experiments well enough, simulation results will not translate to experimental results and will produce an inaccurate algorithm selection. Moreover, it is difficult to generate a synthetic dataset every time new data arrive, as doing so consumes substantial computational resources. Still, it would be a good option to construct a simulated dataset if a frequency range of interest is drastically different from ours (see the provided tutorials) or a more rigorous algorithm selection is sought.
One potential way to further advance the selection rule is to model a synthetic noise component based on the aperiodic noise level measured from the spectra of real data. Another candidate is to extract the burst characteristics (e.g. peak frequency, peak amplitude) from bursts detected by the algorithms and compute inverse AUCs from their respective eCDFs. These values can then be combined with the inverse AUC values derived from detection confidence scores (figure 7), so that an algorithm can be selected based on both burst detections and their characteristics simultaneously.
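A sketch of how an inverse AUC could be computed from an eCDF of per-burst scores is given below; the grid resolution and normalization are assumptions, and the exact cost used in the paper may differ:

```python
import numpy as np

def inverse_auc(scores, grid=None):
    """Inverse AUC of the empirical CDF of per-burst scores in [0, 1].

    A better algorithm concentrates its scores near 1, pushing the eCDF
    to the right, shrinking the area under it, and thus enlarging the
    inverse AUC. Larger values therefore indicate better performance.
    """
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    scores = np.sort(np.asarray(scores, dtype=float))
    ecdf = np.searchsorted(scores, grid, side="right") / scores.size
    auc = float(np.sum((ecdf[1:] + ecdf[:-1]) * np.diff(grid)) / 2.0)
    return 1.0 / auc

# Right-shifted score distributions yield larger inverse AUCs:
good = inverse_auc([0.8, 0.9, 0.85])
poor = inverse_auc([0.1, 0.2, 0.15])
```

The same machinery would apply unchanged to eCDFs of burst characteristics, allowing detection-based and characteristic-based costs to be combined as suggested above.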
Despite the aforesaid weaknesses, we anticipate that the algorithm selection rule will demonstrate reliable performance across datasets with diverse burst properties. Our simulation data represents a canonical LFP signal, and the results indicate that the selection process adequately identifies the most effective algorithms (figure 7). However, it is important to note that the selection rule does not always choose a unique, statistically significant approach when multiple algorithms perform at a similar level (figures 7(B) and (C)). Thus, the proposed algorithm selection rule does not always guarantee a definitive choice of the optimal algorithm, as the chosen method may not exhibit meaningful differences compared to other algorithms.
Moreover, the burst detection algorithms may suffer from inaccuracies in precisely measuring bursts, as evidenced by the generally suboptimal performances of the algorithms under study (mean detection confidence scores <0.7, see figures 7(D)-(F)). As the selection rule currently determines the best algorithms from a limited subset of available options, it is important to acknowledge that the true best algorithm may exist outside this subset. As depicted in figures 7(D)-(F), the trial-based algorithm selection offers additional insight by illustrating that each algorithm appears to perform differently in each trial within a dataset. This observation aligns with previous literature and the motivation of our study, which suggests that burst properties can vary markedly at the single-trial level, thereby obscuring the spectral profiles of neural signals when averaged over time. Consequently, our findings suggest the need to seek algorithms that are robust across data distributions and trial variability in the future. Although our selection procedure does not always identify the best algorithm, it sheds light on the similarities in performance between multiple algorithms, as well as the limitations of the algorithms investigated in our study.

Interpretation of bursts in event-based analysis
The current movement in research from an epoch-based analysis to an event-based analysis calls for the evaluation of burst detection algorithms. A number of recent studies reported experimental evidence that neural burst events at a single-trial level correlate meaningfully with perception (Feingold et al 2015, Shin et al 2017), cognition (Lundqvist et al 2016, Nair et al 2018), and behavior (Tinkhauser et al 2017, Yu et al 2021) in multiple species. These studies focused on the beta and gamma bursts, whose characteristics (e.g. amplitude, rate, duration) changed significantly in response to various task and disease conditions, relationships that were obscure in averaged epochs. As the computation of these characteristics relies primarily on the burst time interval, measuring the accuracy of the algorithm is deemed imperative.
The emerging relevance of burst events to cognitive behaviors reinforces the view that neural bursts mediate the communications between distinct brain regions. This perspective is supported by a theoretical study on gamma oscillations which suggests that transient phase-locking between temporally co-occurring bursts gates information flow across local neuronal networks (Palmigiano et al 2017). When multiple E-I random networks of Wang-Buzsáki neurons with large inhibitory synaptic conductance were coupled by long-range excitatory connections, large-scale dynamics generated simultaneous bursts in distinct brain areas. Together, this model explains the circuit-level role of the burst under the communication-through-coherence hypothesis, which argues that the synchronization of brain oscillations is pivotal for inter-regional information transmission (Fries 2015). It thus seems possible that stochastic and transient activities of oscillations enable flexible information routing, through which different cognitive processes can be organized and executed.

Limitations and future directions
One potential drawback of the algorithm selection rule comes from the simplified design of synthetic signals. According to Buzsáki et al (2013), the frequency of brain rhythms can range across more than four orders of magnitude, even though a neural system generally favors a specific frequency band. These rhythms often exhibit a hierarchical organization, where oscillations of multiple frequency bands co-exist in a mutually interacting way (Canolty and Knight 2010). In this aspect, detecting bursts from signals simulated as fixed frequency band rhythms is likely to miss the multi-scale orchestration of neural activities. Hence, it is recommended to explore more complex configurations of simulation in testing the algorithmic efficiency.
Recently, different modeling tools have been developed specifically to model commonly observed signals such as LFP, EEG, and current source density (CSD) signals based on their underlying cellular and network mechanisms. Human neocortical neurosolver (HNN) uses a realistic neocortical circuit model that considers the biophysical origins of electrical currents observed in MEG/EEG. The mathematical model adapted in HNN has already been applied to interpret the alpha (7-14 Hz), beta (15-29 Hz), and gamma (30-80 Hz) oscillations. Using this type of tool allows more detailed modeling of human brain activities that arise from noninvasive neuroimaging modalities. Another software tool is Networks using Python and NEURON (NetPyNE), which can be used to build data-driven multiscale network models. Like HNN, NetPyNE allows users to simulate LFP, CSD, and EEG signals that reflect signal properties of experimental data (Dura-Bernal et al 2019). Their multiscale simulation, which ranges from cell-to-cell connections to field potentials, enables predictions of which cell and network mechanisms underlie experimental measurements. The usage of such modeling tools to construct realistic and data-specific synthetic signals can potentially improve the accuracy of our algorithm selection procedure and aid our understanding of the meaning of oscillatory burst events at the cellular and circuit level. An example of applying NEURON and NetPyNE can be found in Dura-Bernal et al (2022), in which the macaque auditory thalamocortical circuits were modeled and successfully reproduced real LFP and CSD signals acquired during spontaneous activity and auditory tasks.
Secondly, the algorithms presented in our study are basic, conventional methods that employ classic signal processing techniques. There are many other algorithms that were recently developed to tackle the limitations of the algorithms in our paper. From the simulation results, we observed that the performances of the five algorithms are similar across different performance metrics (figures 3-5). Moreover, the application of these algorithms to empirical data suggested that they leave much room for improvement (tables 1, 1-1, and 1-2). While these algorithms detect bursts based on their power and duration, recent studies argue that bursts should also be considered in terms of their spatial domain (Zich et al 2020) and waveforms (Jones 2016, Donoghue et al 2022). As multiple oscillations can co-exist across the brain and may overlap across space, two signals occurring at distinct regions can interfere destructively, resulting in a failure of burst detection. Furthermore, Jones (2016) demonstrated that a waveform can create spurious oscillatory activities and that different waveforms can produce transient bursts with approximately the same peak frequency and duration. If neural signals involve non-sinusoidal oscillations, many analysis methods that utilize wavelets or FFT can return spurious results, as they often assume the sinusoidal structure of neural oscillations (Donoghue et al 2022). Thus, it is important for readers to be aware of the mathematical assumptions and limitations of the discussed algorithms and to consider advanced algorithms that attempt to resolve the downsides and constraints imposed on our algorithms.
Because the algorithms introduced in our paper are based on simple neural signal processing conventions and most of them were devised about a decade ago, it is easy to resort to them habitually. Although such common usage motivated us to formally compare them, here we briefly introduce a few algorithms that are good candidates for replacing the commonly used methods. The first algorithm is the extended BOSC (eBOSC) method (Kosciessa et al 2020). It fits the aperiodic activity and detects frequency-specific burst events by employing rhythmicity information of signals characterized from rhythmic temporal episodes, which were in turn separated from arrhythmic episodes. Like eBOSC, the multiple oscillation detection algorithm (MODAL) also uses a fitting of the aperiodic activity to detect bursts (Watrous et al 2018). However, it does not require a user to prespecify a specific frequency range to detect events. MODAL adaptively identifies frequency bands that are customized for each recording based on its spectral properties and characterizes narrow-band neural oscillations that exceed the aperiodic 1/f spectrum in these bands. The next algorithm is called OEvents, which extracts oscillations from a wavelet spectrogram computed using Morlet wavelets and linearly spaced frequencies. An advantage of OEvents is its ability to detect bursts of non-constant frequency and duration by putting a bounding box around each oscillatory event using a specified threshold and subthresholds. eBOSC, MODAL, and OEvents can be thought of as an extension of our algorithms in that they are all threshold-based algorithms. In contrast, there are other algorithms that do not rely on user-defined thresholds.
The matching pursuit algorithm detects bursts by first computing different types of STFTs with diverse time and frequency resolutions and selecting the tile (each tile denotes a time-frequency bin in the spectrogram) that has the maximum energy of the signal (Chandran et al 2016, 2018). This algorithm can offer more precise estimates of burst onset and duration, like OEvents, but at the cost of the computational time required for calculating multiple spectrograms for a single dataset. Another method is the cycle-by-cycle (bycycle) analysis introduced by Cole and Voytek (2019), which works directly in the time domain. This approach first segments signals into single cycles and outputs whether an oscillatory event is present in the signal within each cycle period. Such presence of oscillations is determined by four different features: duration threshold, amplitude consistency, period consistency, and monotonicity (Cole and Voytek 2019). As characteristics of burst events such as amplitude and symmetry can be directly measured from individual cycles, this method can be a useful option if one is interested in analyzing the waveforms or identifying specific types of oscillations that can occur at smaller amplitudes. The hidden Markov model (HMM) is another algorithm that can be used for burst detection. It employs variational Bayesian inference to characterize how the model transitions between different oscillatory states (Quinn et al 2019). It is data-driven and unsupervised in that it does not necessitate any information about state timings or signal-related hyperparameters (e.g. threshold, window size).
This approach was recently enhanced with neural network models such as dynamic network modes (which combines long short-term memory model (LSTM) with variational Bayesian inference; Gohil et al 2022) and multi-dynamic adversarial generator encoder (which implements LSTM in the context of an adversarial generator-encoder network; Pervaiz et al 2022), both of which aim to resolve the problem of Markovian assumption and mutual exclusivity of the HMM states. However, readers should note that all the foregoing algorithms do have their own hyperparameters under their own parameterization.
There are also a few other options we can attempt with the classic algorithms before referring to the advanced methods. The first option is to use an ensemble method, in which only the burst events detected by all five algorithms are used for subsequent analyses. The second option is to apply a subject-specific threshold. Because adequate threshold values depend on the SNR of the recordings, we can select a threshold separately for each subject (Tinkhauser et al 2017, Sporn et al 2020). Whichever algorithm or approach one takes, one should keep in mind that the algorithms examined in our study are not likely to be the best methods. Yet, the selection rule as currently proposed is complete and can easily be extended in future works, as discussed above (see the section 'temporal accuracy is central to the burst detection problem').
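A strict ensemble of this kind reduces to a per-sample vote across algorithms. The sketch below, with illustrative names and interface, keeps only samples flagged by all detectors (or by a chosen minimum number of votes):

```python
import numpy as np

def ensemble_bursts(detections, min_votes=None):
    """Keep only time points flagged as burst by enough algorithms.

    detections: list of equal-length boolean arrays, one per algorithm,
    True where that algorithm flags a burst sample. By default a sample
    must be flagged by all algorithms, matching the strict ensemble
    described in the text; pass min_votes for a majority-style variant.
    """
    det = np.asarray(detections, dtype=bool)
    votes = det.sum(axis=0)                      # per-sample vote count
    need = det.shape[0] if min_votes is None else min_votes
    return votes >= need

# Three algorithms voting over six samples:
a = [True, True, False, False, True, False]
b = [True, True, True, False, True, False]
c = [False, True, True, False, True, False]
strict = ensemble_bursts([a, b, c])              # unanimous agreement
majority = ensemble_bursts([a, b, c], min_votes=2)
```

The unanimous variant trades recall for precision: it discards the ambiguous, weak bursts on which the algorithms disagree, which may or may not be desirable depending on the analysis goal.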
The rationale behind the ensemble method partly arises from the discrepancy between algorithm selection based on human visual screening and the selection rule we have proposed. As mentioned earlier, this mismatch appears to stem from disagreements between human priors and mathematical assumptions of the algorithms. Human annotation strictly adheres to a specific set of rules that emphasizes the waveform and assigns relatively greater importance to ambiguous bursts that are transient and weak, which may not be perfectly captured by the threshold imposed by the algorithms. Therefore, it is crucial to consider which assumptions in the algorithm should be included or given more weight when detecting bursts. Even if the best algorithm is chosen, it can still be ineffective, biased, or insufficient to establish connections between neural data and cognitive or behavioral measures, in accordance with the objectives defined by human researchers.
Thirdly, the hyperparameters of the algorithms (e.g. the number of tapers, window size) introduce large degrees of freedom, but we simply assumed these values by adapting them from the previous literature without conducting detailed validation (see analytical details). One exception was the amplitude and power thresholds, which were initially taken from previous studies but were tuned afterward using a grid-search hyperparameter optimization. Although the literature has reached a rough consensus on these hyperparameters and they did not result in significant performance loss in our study, future works may compare a few candidate values to clarify their impacts. If one wants to reduce the number of hyperparameters or eliminate specific hyperparameter types, there are several alternative algorithms to turn to. For instance, the time-delay embedded or autoregressive model-based HMM (Vidaurre et al 2016, 2018) does not require threshold-related parameters for the inference of burst event timings. Despite these limitations, the evaluation criteria proposed in this paper offer an effective means of examining performance differences between the algorithms. We expect that our findings will contribute to the study of the causal relationship between burst characteristics and cognitive processes by enhancing our understanding of accurate burst detection. However, this relationship has mostly been studied in averaged neural activities, with less attention to the vastly different characteristics (e.g. waveform) of individual bursts. In addition, recent advances which examined bursts and behaviors in naturalistic conditions (Karvat et al 2020, Kim et al 2020) express the need for a rigorous categorization of burst characteristics (e.g. duration, inter-burst interval) and an understanding of their physiological role. Therefore, future works probing how different algorithms resolve these characteristics also seem necessary.
For more information about how various characteristics of neural oscillatory events relate to burst detection algorithms, see Donoghue et al (2022).
To summarize, we have evaluated the detection performances of different burst detection algorithms and presented their dependency on the signal properties. The algorithm selection rule in this paper was proposed to detect transient oscillatory bursts in a more robust and precise way. The decision matrix and selection rule are expected to minimize errors arising from the suboptimal choice of an algorithm and ensure a credible detection accuracy. Our burst detection procedure will thus help align rapid large-scale neural dynamics onto the behavioral time axis in event-based analyses of experimental data.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/jeelabKIST/Cho2022_BurstDetection. All the analyses and data visualizations were performed using custom codes written in MATLAB, R, and Python. (GitHub repository: https://github.com/jeelabKIST/Cho2022_BurstDetection) This repository also includes the data that support the figures in this paper. For the original data of the entire robot-based escape experiment, refer to Kim et al (2020). Guidelines for the synthetic data generation, use of burst detection algorithms, and implementation of the burst selection rule can be found in the same repository as MATLAB Live Scripts (.mlx) tutorials. Further requests for devices and resources should be directed to and will be fulfilled by the corresponding author, Dr Jee Hyun Choi (jeechoi@kist.re.kr).