The influence of cardiac arrhythmias on the detection of heartbeats in the photoplethysmogram: benchmarking open-source algorithms

Objective. Cardiac arrhythmias are a leading cause of mortality worldwide. Wearable devices based on photoplethysmography make it possible to screen large populations, allowing earlier detection of pathological rhythms that might reduce the risk of complications and medical costs. While most beat detection algorithms have been evaluated on normal sinus rhythm or atrial fibrillation recordings, the performance of these algorithms in patients with other cardiac arrhythmias, such as ventricular tachycardia or bigeminy, remains unknown to date. Approach. The PPG-beats open-source framework, developed by Charlton and colleagues, evaluates the performance of beat detectors including QPPG, MSPTD and ABD, among others. We applied the PPG-beats framework to two newly acquired datasets, one containing seven different types of cardiac arrhythmia in hospital settings, and another including two cardiac arrhythmias in ambulatory settings. Main Results. In a clinical setting, the QPPG beat detector performed best on atrial fibrillation (with a median F1 score of 94.4%), atrial flutter (95.2%), atrial tachycardia (87.0%), sinus rhythm (97.7%) and ventricular tachycardia (83.9%), and was ranked 2nd for bigeminy (75.7%) behind the ABD detector (76.1%). In an ambulatory setting, the MSPTD beat detector performed best on normal sinus rhythm (94.6%), and the QPPG detector on atrial fibrillation (91.6%) and bigeminy (80.0%). Significance. Overall, the PPG beat detectors QPPG, MSPTD and ABD consistently achieved higher performance than other detectors. However, the detection of beats from wrist-PPG signals is compromised in the presence of bigeminy or ventricular tachycardia.


Introduction
Cardiac arrhythmias (CAs) have a prevalence of 3.2%-6.6% in the elderly European and US populations (aged 65-73 years) (Khurshid et al 2018) and are associated with high morbidity and mortality (Tsao et al 2023). Indeed, ventricular arrhythmias are a major cause of sudden cardiac death, which is estimated to account for 10%-20% of all deaths in Europe (Zeppenfeld et al 2022). Due to the asymptomatic and intermittent nature of certain CAs in their early stages (Rho and Page 2005, Gorenek et al 2017), they are often diagnosed late, at the time of hospitalization for stroke or heart failure.
Photoplethysmography (PPG) is a promising technology for long-term and continuous ambulatory monitoring of cardiovascular parameters such as blood pressure and heart rhythm. PPG measures changes in blood volume by optical means and is often integrated in wearable devices like smartwatches (Lemay et al 2020, Allen and Kyriacou 2021). Consequently, PPG-based devices have great potential for the early detection of CAs, leading to improved diagnosis, treatment and a reduction in complications.
Numerous studies have investigated the detection of atrial fibrillation (AF), the most common CA, affecting up to 34 million people worldwide (Chugh et al 2014, Hindricks et al 2021). Most of these studies relied on the analysis of irregularities in inter-beat intervals (IBIs). Besides IBIs, CAs also distort the morphology of individual PPG pulses. Such information can be extracted by pulse wave analysis (PWA) (Proença et al 2019) to improve the detection of CAs (Jeanningros et al 2022, Basza et al 2023). However, both IBIs and PWA rely on an accurate detection of heartbeats in the PPG signal. Suboptimal beat detection introduces IBIs that contain two pulses (false negative detections) and pulses split into two IBIs (false positive detections). This biases IBI-based measures of irregularity (Shannon entropy, RMSSD, pNN50, K) and compromises PWA computation.
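To illustrate how a single detection error can bias IBI-based irregularity measures, the following minimal sketch computes RMSSD and pNN50 on a synthetic, perfectly regular beat series and on two corrupted versions of it. The beat times, the 800 ms interval and the function names are hypothetical choices made for illustration only; they are not taken from the study data or from any published implementation, and Shannon entropy and K are omitted for brevity.

```python
import numpy as np

def rmssd(ibis_ms):
    """Root mean square of successive IBI differences, in ms."""
    d = np.diff(np.asarray(ibis_ms, dtype=float))
    return float(np.sqrt(np.mean(d ** 2)))

def pnn50(ibis_ms):
    """Percentage of successive IBI differences larger than 50 ms."""
    d = np.abs(np.diff(np.asarray(ibis_ms, dtype=float)))
    return float(100.0 * np.mean(d > 50.0))

# Hypothetical regular rhythm: one beat every 800 ms (11 beats).
beats_ms = np.arange(0, 8001, 800, dtype=float)
clean_ibis = np.diff(beats_ms)

# One missed beat (false negative): two pulses end up in a single IBI.
missed_ibis = np.diff(np.delete(beats_ms, 5))

# One spurious beat (false positive): one pulse is split into two IBIs.
extra_ibis = np.diff(np.sort(np.append(beats_ms, 2000.0)))

for name, ibis in [("clean", clean_ibis),
                   ("missed beat", missed_ibis),
                   ("extra beat", extra_ibis)]:
    print(f"{name}: RMSSD = {rmssd(ibis):.1f} ms, pNN50 = {pnn50(ibis):.1f} %")
```

In this toy case, both measures are zero for the clean series and become clearly non-zero after one missed or one spurious detection, even though the underlying rhythm is unchanged.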
Whereas beat detectors can be very accurate for healthy subjects (Charlton et al 2022), their performance has not been studied in the presence of different CAs. Only a few studies have focused on the evaluation of PPG beat detection performance during AF. Harju et al (2018) reported a mean absolute error (MAE) of 51 ms on IBI estimation from wrist-worn PPG in 21 subjects with AF. Their detection performance corresponds to an F1 score of 96.5%. Väliaho et al (2019) reported performance equivalent to a 94.5% F1 score for pulse detection on 106 patients with AF. Recently, Charlton et al (2022) compared fifteen open-source beat detectors on multiple datasets associated with various conditions. Among them, the eight detectors that performed best overall achieved F1 scores between 91.8% and 97.1% on 19 patients suffering from AF. Han et al (2022) developed a complex beat detector designed for HR estimation in the presence of CAs. Their SWEPD algorithm detected IBIs with an F1 score of 97.2% in 21 patients with AF and 97.8% when analyzing performance in the presence of frequent atrial and ventricular premature contractions.
To the best of our knowledge, no study has compared the performance of multiple beat detectors across multiple types of CAs. Considering CAs other than AF is important when screening large populations potentially displaying pathological rhythms, such as ventricular and atrial bigeminy or ventricular tachycardia. Hence, the choice of beat detector can be a determining factor for the performance of CA classifiers based on IBIs and PWA.
In this study, we used the open-source PPG-beats framework developed by Charlton et al (2022) to benchmark the performance of 15 open-source beat detectors. The framework was applied to two newly acquired datasets containing 8 different types of CAs. The goals of this work are (1) to evaluate which beat detectors are effective and reliable in the presence of various types of CAs, and (2) to identify CAs for which heartbeat detection from wrist-PPG signals is limited.

Datasets
This research was conducted in accordance with the principles embodied in the Declaration of Helsinki, as well as local statutory requirements. All participants gave written informed consent to participate in the study. Subjects were invited to take part in the study regardless of their sex. Hence, the proportion of males and females is expected to reflect the frequency of medical interventions for each sex.

Clinical dataset
The first dataset includes 58 patients referred for diagnostic or therapeutic electrophysiological procedures at the Lausanne University Hospital (CHUV). This study was approved by the local ethics committee of Lausanne (CER-VD, Project-ID 2021-00586) and registered on http://ClinicalTrials.gov (NCT04884100).
PPG signals were acquired at 100 Hz from a proprietary wrist-bracelet (CSEM, Neuchâtel, Switzerland). Concurrently, 12-lead ECG signals were recorded using the Axiom Sensis XP® System (Siemens®, Munich, Germany) at a 2 kHz sampling frequency with bandpass filter settings of 0.5-200 Hz. ECG signals were used for gold standard annotations of both R-peaks (beats) and CAs.

Ambulatory dataset
The second dataset includes 44 subjects referred for an ambulatory Holter ECG recording for either 24 h (40 subjects) or 7 days (4 subjects). The clinical study is being conducted at Inselspital in Bern and is still ongoing. It was approved by the local ethics committee KEK-BERN (Project-ID 2021-02117). PPG signals were recorded with the same proprietary wrist-bracelet from CSEM as for the clinical dataset, together with a 3-lead Holter ECG monitor Lifecard CF (Spacelabs Healthcare®, Issaquah, Washington, USA). R-peaks and CAs were annotated by the software Sentinel from Spacelabs Healthcare®. To exclude PPG signals corrupted by motion artifacts, only periods for which motion was continuously low were selected. To this end, a moving average filter with a 2000 s window was applied every 60 s to the absolute value of the differences in normed 3D accelerometer signals. Periods where the moving average was below 0.15 mG s⁻¹ were considered as low motion. Only periods lasting more than 10 min were kept for analysis. PPG signals with a moving average below and above 0.15 mG s⁻¹ are shown in figure 1.
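The low-motion selection described above can be sketched as follows. This is only an illustrative reading of the text, not the authors' implementation: the function name `low_motion_periods`, the use of a centered window, the scaling of sample-to-sample differences by the sampling frequency to obtain mG s⁻¹, and the way period boundaries are mapped to time are all assumptions.

```python
import numpy as np

def low_motion_periods(acc_mg, fs, win_s=2000.0, step_s=60.0,
                       thresh=0.15, min_dur_s=600.0):
    """Return a list of (start_s, end_s) low-motion periods.

    acc_mg : array of shape (n_samples, 3), accelerometer in mG (assumed unit)
    fs     : accelerometer sampling frequency in Hz
    """
    norm = np.linalg.norm(np.asarray(acc_mg, dtype=float), axis=1)
    # Absolute change of the norm between samples, scaled to mG per second.
    motion = np.abs(np.diff(norm)) * fs
    # Cumulative sum so that each window mean costs O(1).
    csum = np.concatenate(([0.0], np.cumsum(motion)))
    win, step = int(round(win_s * fs)), int(round(step_s * fs))
    half = win // 2
    # Evaluate a centered moving average every `step_s` seconds (assumption).
    centers = np.arange(0, len(motion), step)
    lo = np.clip(centers - half, 0, len(motion))
    hi = np.clip(centers + half, 0, len(motion))
    level = (csum[hi] - csum[lo]) / np.maximum(hi - lo, 1)
    low = level < thresh

    # Keep only runs of consecutive low-motion evaluations that last long enough.
    periods, start = [], None
    for i, flag in enumerate(low):
        if flag and start is None:
            start = i
        if start is not None and (not flag or i == len(low) - 1):
            end = i if not flag else i + 1
            t0, t1 = start * step_s, end * step_s
            if t1 - t0 > min_dur_s:
                periods.append((t0, t1))
            start = None
    return periods
```

With the default arguments, the function uses the 2000 s window, 60 s step, 0.15 mG s⁻¹ threshold and 10 min minimum duration stated in the text; shorter values are convenient for quick checks on synthetic data.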

Cardiac arrhythmia labelling
ECG signals of the clinical dataset were annotated by a medical expert who manually identified CAs. In contrast, ECG signals from the ambulatory dataset were automatically annotated by the software Sentinel from Spacelabs Healthcare® and corrected by a cardiologist. Independently of the dataset, both atrial and ventricular bigeminy, as well as trigeminy and quadrigeminy, or any combination of these rhythms, were labeled as bigeminy without distinction. The label AVRT includes both atrioventricular reentrant tachycardia and atrioventricular nodal reentrant tachycardia. Finally, single atrial and ventricular premature contractions were not considered as CAs and were therefore ignored in this study.

PPG beat detector evaluation
The PPG-beats framework provided by Charlton and colleagues (Charlton et al 2022) was applied. The methods used to evaluate PPG beat detectors are identical to those of the original paper (Charlton et al 2022). The essential steps are summarized in the following.
The PPG signals underwent bandpass filtering between 0.67 and 8.0 Hz to eliminate non-cardiac frequencies. Then, beats were detected using thirteen open-source detectors listed in table 1. The PPG-beats framework (Charlton et al 2022) provides two additional detectors (SPAR and PWD), which had to be removed from the analysis because of runtime errors for several signals. To apply PPG beat detection, the PPG signals were segmented into 20 s windows with a 5 s overlap. Duplicate beats within overlapping segments were removed. This method ensured that no beat detector was penalized for missing beats at the end or the start of a window (e.g. during initialization of the detector).

Depending on the detector, timings of detected beats could correspond to the pulse foot, the systolic peak, or the maximum of the first derivative. In order to perform an analysis that is comparable for all detectors, the middle-amplitude point of the systolic upslope, defined as the timing associated with the mean amplitude of the pulse foot and the systolic peak, was used for analysis. To do so, for each beat, the preceding minimum (pulse foot) and subsequent maximum (systolic peak) were extracted if not yet provided by the detector.

To synchronize PPG beats with reference ECG beats, ECG beats were considered correctly identified if at least one PPG beat was closer than 150 ms. The lag associated with the maximum number of correctly identified ECG beats was used to align the two beat time series. The synchronization step was directly applied to the full records of the clinical dataset and to low-motion periods (>10 min) of the ambulatory dataset.

The performance of the beat detectors was evaluated based on the number of reference ECG beats ($n_{\mathrm{ref}}$), estimated PPG beats ($n_{\mathrm{PPG}}$), and correctly identified beats ($n_{\mathrm{correct}}$) to calculate sensitivity (Se), positive predictive value (PPV) and F1 score ($F_1$) as follows:

$$\mathrm{Se} = \frac{n_{\mathrm{correct}}}{n_{\mathrm{ref}}} \times 100, \qquad \mathrm{PPV} = \frac{n_{\mathrm{correct}}}{n_{\mathrm{PPG}}} \times 100, \qquad F_1 = \frac{2 \times \mathrm{PPV} \times \mathrm{Se}}{\mathrm{PPV} + \mathrm{Se}}.$$

The performance metrics were calculated on a per-rhythm basis, both for the entire cohort and individually for each subject. To achieve this, reference ECG beats, estimated PPG beats, and correctly identified beats were aggregated by rhythm if they belonged to a homogeneous rhythmic event lasting at least 25 s.
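The matching rule and the three metrics described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the PPG-beats implementation: the function names, the explicit grid of candidate lags passed as `lags_s`, and the convention of adding the lag to the PPG beat times are hypothetical. Sensitivity is taken as the percentage of reference beats that are matched, PPV as the percentage of PPG beats relative to the matched count, and the F1 score as their harmonic mean.

```python
import numpy as np

def count_correct(ecg_s, ppg_s, tol_s=0.150):
    """Number of ECG beats with at least one PPG beat closer than tol_s."""
    ecg_s = np.asarray(ecg_s, dtype=float)
    ppg_s = np.sort(np.asarray(ppg_s, dtype=float))
    if ecg_s.size == 0 or ppg_s.size == 0:
        return 0
    # For each ECG beat, look at the nearest PPG beat on either side.
    idx = np.searchsorted(ppg_s, ecg_s)
    left = ppg_s[np.clip(idx - 1, 0, ppg_s.size - 1)]
    right = ppg_s[np.clip(idx, 0, ppg_s.size - 1)]
    nearest = np.minimum(np.abs(ecg_s - left), np.abs(ecg_s - right))
    return int(np.sum(nearest < tol_s))

def best_lag(ecg_s, ppg_s, lags_s, tol_s=0.150):
    """Candidate lag (added to PPG times) giving the most matched ECG beats."""
    lags_s = np.asarray(lags_s, dtype=float)
    ppg_s = np.asarray(ppg_s, dtype=float)
    counts = [count_correct(ecg_s, ppg_s + lag, tol_s) for lag in lags_s]
    return float(lags_s[int(np.argmax(counts))])

def beat_metrics(n_ref, n_ppg, n_correct):
    """Sensitivity, PPV and F1 score, all in percent."""
    se = 100.0 * n_correct / n_ref if n_ref else 0.0
    ppv = 100.0 * n_correct / n_ppg if n_ppg else 0.0
    f1 = 2.0 * ppv * se / (ppv + se) if (ppv + se) else 0.0
    return se, ppv, f1

# Hypothetical example: 10 ECG beats, PPG pulses arriving 0.3 s later,
# with one missed pulse and one spurious detection.
ecg = np.arange(10.0)
ppg = np.sort(np.append(np.delete(ecg + 0.3, 4), 6.8))
lag = best_lag(ecg, ppg, lags_s=[-0.6, -0.3, 0.0, 0.3])
n_correct = count_correct(ecg, ppg + lag)
se, ppv, f1 = beat_metrics(len(ecg), len(ppg), n_correct)
print(f"lag = {lag} s, Se = {se:.1f} %, PPV = {ppv:.1f} %, F1 = {f1:.1f} %")
```

When several candidate lags give the same number of matches, this sketch simply keeps the first one; how ties are resolved in the actual framework is not specified in the text.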

Datasets
Table 2 details the seven different types of CA that were recorded in the clinical dataset and the two types of CA present in the ambulatory dataset, together with the corresponding cumulative duration of arrhythmic events and the number of patients experiencing the specific CA. Among the 58 subjects involved in the clinical dataset, 40 were men and 18 were women, with a mean age of 56 ± 16 years. Skin color was categorized according to the Fitzpatrick scale as I (5 patients), II (26), III (9), IV (1), V (1) and VI (1), and 1 patient had missing data. The ambulatory dataset consisted of 24 men and 20 women, with a mean age of 56 ± 16 years. Their skin colors were I (24), II (18), III (11), IV (1) and VI (1), with 3 patients having missing data. The imbalance between the number of males and females is in accordance with the prevalence of CAs, which affect males more frequently than females (Khurshid et al 2018). However, the imbalance is very large in the clinical dataset, and no reason other than randomness could be identified to explain this difference.

Beat detector performance
Given the unequal proportions of subjects of the two sexes for the majority of CAs (see table 2), the restricted total number of subjects, and the large inter-subject variability in performance, the results are not detailed separately for each sex.

[...] variable across subjects with some very inaccurate detections. QPPG is again top ranked with an 83.9% median F1 score. Bigeminy beats often remain undetected as well, depending on the subject. Indeed, bigeminy shows the worst performance, the best detectors being ABD and QPPG with median F1 scores of 76.1% and 75.7%, respectively. Finally, top-ranked beat detectors achieve high performance for both atrioventricular blocks and atrioventricular reentrant tachycardias. QPPG, ABD and WFD achieve median F1 scores between 97.2% and 97.9% for AV blocks. MSPTD is the best detector for AVRT with a median F1 score of 93.5%, closely followed by PDA, QPPG, ABD, AMPD and PULSES (>92.1%).

Ambulatory dataset
To assess detector performance, only periods characterized by low motion were retained, leading to the exclusion of 695.7 h of signals, which accounted for 51.9% of the total duration. The subsequent assessment of performance was carried out on the remaining 684.5 h of motion-free PPG, as outlined in table 2. The evaluation of beat detector performance on the ambulatory dataset is shown in figure 3, with comprehensive metrics provided in table 4. On AF segments, QPPG is top ranked with a median F1 score of 91.6%, closely followed by ABD and MSPTD (>90.8%). Half of the beat detectors perform similarly well on normal sinus rhythm, with MSPTD top ranked at 94.6% and QPPG, AMPD, ABD, and WFD achieving median F1 scores above 94.0%. The beats of bigeminy are once again poorly detected. QPPG, PULSES and WFD stand out slightly from the other detectors, with median F1 scores between 78.8% and 80.0%.

Discussion
The aim of this study was to assess the performance of several open-source detectors for various types of CAs. Our findings help determine the type of detectors most suitable for the monitoring of CAs in everyday life, but also highlight potential limitations in the detection of heartbeats for given CAs.

Beat detector performance
ABD, MSPTD and QPPG detectors were consistently ranked among the best detectors for various CAs in both clinical and ambulatory conditions, without any failure on specific CAs. These results are in line with the study of Charlton and colleagues (Charlton et al 2022), which concluded that the MSPTD and QPPG detectors performed best across various conditions (hospital, daily life, emotions, atrial fibrillation, neonates and skin colors). Our analyses highlighted the superior performance of the QPPG beat detector in hospital conditions (clinical dataset). This is likely due to the excellent sensitivity of QPPG, which is well suited to detecting beats occurring early in the cardiac cycle. It provides a clear advantage for CAs such as atrial and ventricular tachycardias, atrial flutter and AF, without a significant loss in PPV, as is the case with bigeminy for other detectors. This hypothesis was supported by the performance results obtained from the ambulatory dataset. Indeed, QPPG was top ranked in an ambulatory setting for CAs showing premature contractions (AF and bigeminy) and was very good at detecting normal sinus beats. MSPTD was the best beat detector for sinus rhythm. It showed very good performance during AF as well but was less efficient at detecting bigeminy beats. Both QPPG and MSPTD require low computational effort and might be suited for embedding in a wearable device. This last point is crucial for the screening of large populations with small devices and low battery consumption.

Limitations of beat detection in cardiac arrhythmias
All beat detectors show lower sensitivity in the presence of ventricular tachycardia (VT), one of the fastest CAs. The onset of VT can be very abrupt, which results in PPG waves of decreased amplitude, as illustrated in the last row of figure 4. This likely induces strong differences between the outputs of detectors that use different adaptive scaling mechanisms. Slow adaptation to abrupt changes in amplitude, such as those due to the onset of ventricular tachycardia, results in numerous missed detections.
The detection of bigeminy beats in both datasets was particularly poor compared to that of other types of CAs. This is due to premature contractions that occur very early in the cardiac cycle, leading to heartbeats that do not necessarily generate a pressure wave. The resulting changes in the PPG signal, which reflects blood volume changes in the peripheral arteries, are minimal and comparable to those of a dicrotic notch. Examples of bigeminy in figures 4 and 5 show that it is very difficult to detect such premature beats. This is therefore more of an intrinsic physiological limitation of heartbeat detection from blood volume variations in the peripheral vascular system. This view is in line with the work of Han et al (2020), which identified patterns formed by successive IBIs in a Poincaré plot to detect premature contractions. Although this method was conclusive for the detection of isolated premature contractions, trigeminy and quadrigeminy, it was not for the detection of bigeminy with silent premature contractions. However, one possibility would be an in-depth analysis of the PPG waveform, to characterize it as typical bigeminy and deduce that it contains a hidden premature contraction.

Study limitations
Our work is limited by the inclusion of only five different types of CAs. The number of arrhythmic events of atrioventricular blocks (of any degree) and atrioventricular reentrant (nodal or not) tachycardias was too small to draw significant conclusions for these two groups of CA. In addition, for the ambulatory dataset, the present analysis was limited to motion-free periods, resulting in the rejection of 51.9% of the data. In a future study, the influence of motion on heartbeat detection performance should be investigated in more detail. Finally, ECG-based labels of CAs were annotated by a single expert (for the clinical dataset), or software annotations were corrected by a single cardiologist (for the ambulatory dataset). More reliable annotations could be obtained by systematically involving two cardiologists and keeping only periods of the data where both annotators agree.

Conclusion
In this work, we evaluated the performance of thirteen open-source PPG beat detectors in the presence of CAs. QPPG showed the highest performance in terms of F1 score. In addition, our evaluation revealed the reduced performance of beat detectors in the presence of bigeminy and ventricular tachycardia.
This study provides solid support for selecting a beat detector for continuous monitoring of cardiac arrhythmias in everyday life.

Figure 1. Example of motion influence on PPG signals from the ambulatory dataset. The top row shows a PPG signal with a motion level of 0.18 mG s⁻¹ (with respect to the moving average described in section 2.1.2). The bottom row shows a PPG signal from the same patient with a motion level of 0.11 mG s⁻¹. The threshold to reject periods corrupted by motion was set at 0.15 mG s⁻¹.

Figure 2. Beat detector performance (F1 score) comparison by cardiac arrhythmia on the clinical dataset. The number of reference beats (N) per CA is written next to each subtitle in thousands (k). Black dots represent outlier subjects; boxes show the median, 1st and 4th quartiles, and 10th and 90th percentiles of F1 scores obtained per subject, while the black cross indicates the F1 score calculated across all subjects. Detectors are ordered by decreasing median F1 score.

Figure 3. Beat detector performance (F1 score) by cardiac arrhythmia on the ambulatory dataset. The number of reference beats (N) is indicated in thousands (k) next to each subtitle. Black dots represent outlier subjects; boxes show the median, 1st and 4th quartiles, and 10th and 90th percentiles of F1 scores obtained per subject, while the black cross indicates the F1 score calculated across all subjects. Detectors are ordered by decreasing median F1 score.

Figure 4. Example signals of the 8 distinct cardiac arrhythmias from the clinical dataset. Each row shows the ECG signal (top curve) and the simultaneous PPG signal (bottom curve). Dotted vertical lines indicate the timing of detected ECG beats, and dots on the PPG show the timing of beats detected by the QPPG detector. Each row shows an example of one cardiac arrhythmia: sinus rhythm (SR), atrial fibrillation (AF), atrial flutter (AFL), atrial tachycardia (AT), atrioventricular block (AVB), atrioventricular (nodal or not) reentrant tachycardia (AVRT), bigeminy (Bi), and ventricular tachycardia (VT).

Figure 5. Example signals of the 3 distinct cardiac arrhythmias from the ambulatory dataset. Each row shows the ECG signal (top curve) and the simultaneous PPG signal (bottom curve). Dotted vertical lines indicate the timing of detected ECG beats, and dots on the PPG show the timing of beats detected by the QPPG detector. Each row shows an example of one cardiac arrhythmia: sinus rhythm (SR), atrial fibrillation (AF) and bigeminy (Bi).

Table 1. PPG beat detectors evaluated in the present study.

Table 2. List of cardiac arrhythmias with corresponding demographic and quantitative statistics. Demographic statistics are specified for males (M) and females (F). Durations include only motion-free periods.

Table 4. Beat detector performance on the ambulatory dataset. The medians across subjects of F1 score, sensitivity (Sens.), and positive predictive value (PPV) in percent (%) are detailed for each cardiac arrhythmia: atrial fibrillation (AF), atrial and ventricular bigeminy (Bi) and normal sinus rhythm (SR).