Sampling rate requirement for accurate calculation of heart rate and its variability based on the electrocardiogram

Objective. To develop analytical formulas that can serve as quantitative guidelines for selecting the sampling rate of the electrocardiogram (ECG) required to calculate heart rate (HR) and heart rate variability (HRV) with a desired level of accuracy. Approach. We developed analytical formulas that relate the ECG sampling rate to conservative bounds on HR and HRV errors: (i) one relating HR and sampling rate to an HR error bound and (ii) others relating sampling rate to HRV error bounds (in terms of the root-mean-square of successive differences (RMSSD) and the standard deviation of normal sinus beats (SDNN)). We validated the formulas using experimental data collected from 58 young healthy volunteers, spanning wide HR and HRV ranges induced by strenuous exercise. Main results. The results strongly supported the validity of the analytical formulas as well as their tightness. The formulas can be used (i) to predict an upper bound on the inaccuracy in HR and HRV for a given sampling rate in conjunction with HR and HRV and (ii) to determine the sampling rate needed to achieve a desired accuracy at a given HR or HRV (or range thereof). Significance. HR and HRV derived from the ECG have been widely used in research in physiology and psychophysiology. However, there is no established guideline for selecting the ECG sampling rate required to calculate HR and HRV with a desired level of accuracy. Hence, the analytical formulas may guide the selection of ECG sampling rates tailored to various applications of HR and HRV.


Introduction
Heart rate (HR) and heart rate variability (HRV) play an important role in understanding autonomic nervous system outflows in the physiological and psychological sciences. The existing body of literature has exploited HR and HRV in studies of cardiovascular health and disease (Stein et al 2007, Haensel et al 2008, Thayer et al 2010, Hillebrand et al 2013, Soares-Miranda et al 2014), physical fitness (Plews et al 2013, Mongin et al 2022), mental health (Kemp and Quintana 2013, Quintana and Heathers 2014, Beauchaine and Thayer 2015, Pham et al 2021), and cognitive impairment (Luft et al 2009, Quintana et al 2012, Forte et al 2019). The use of HR and HRV in research and practice is expected to expand rapidly given the remarkable advances in wearable sensing technology for conveniently measuring physiological signals that bear HR and HRV, including the electrocardiogram (ECG) (Sieciński et al 2020, Pham et al 2021, Parreira et al 2023), photoplethysmogram (PPG) (Lu et al 2008, Uçar et al 2018), seismocardiogram (SCG) (Hurnanen et al 2017, Sieciński et al 2020), and ballistocardiogram (BCG) (Shin et al 2011, Brüser et al 2013), to name a few.
Accurate calculation of HR and HRV requires measuring a physiological signal at a high sampling rate and with a small quantization level. In particular, a physiological signal sampled at a low rate can incur bias and uncertainty that degrade the calculated HR and HRV (García-González et al 2004). However, high-sampling-rate measurement increases power consumption and data storage requirements in wearable sensing devices. Hence, a trade-off must be struck between the accuracy of HR and HRV calculated from a physiological signal and the power consumption of wearable sensing devices, so that the sampling rate can be minimized while meeting the accuracy thresholds specified for HR and HRV. This paper explores this trade-off in the context of ECG-derived HR and HRV.
Previous work on determining the sampling rate appropriate for accurately calculating HR and HRV exists (Hejjel and Roth 2004, Ellis et al 2015, Choi and Shin 2017, Kwon et al 2018, Béres and Hejjel 2021, Burma et al 2021). However, most, if not all, of these previous attempts have non-trivial limitations. First, the sampling rate recommended in most existing work is empirical rather than analytical. In much of the prior work, a physiological signal was measured at a high sampling rate and subsequently down-sampled to a finite number of lower sampling rates. Then, an adequate sampling rate was recommended in a non-systematic manner after comparing the errors in HR and HRV at the lower sampling rates with respect to the 'ground truth' HR and HRV calculated at the original high sampling rate. For example, Kwon et al recommended 100 Hz and 250 Hz for HRV calculation in the time domain and in the frequency domain, respectively, after comparing the ground truth HRV calculated with the ECG sampled at 1 kHz versus HRV calculated with the ECG sampled at 500 Hz, 250 Hz, 100 Hz, and 50 Hz (Kwon et al 2018). Ellis et al recommended 125 Hz for HRV calculation after comparing 24 measures of HRV between the ground truth HRV calculated with the ECG sampled at 1 kHz versus HRV calculated with the ECG sampled at progressively lower sampling rates (Ellis et al 2015). Béres and Hejjel recommended 5 Hz and 50 Hz for HR and HRV calculation, respectively, after comparing the ground truth HR, RMSSD, and SDNN calculated with the PPG sampled at 1 kHz versus those calculated with the PPG sampled at 500 Hz, 200 Hz, 100 Hz, 50 Hz, 20 Hz, 10 Hz, 5 Hz, and 2 Hz (Béres and Hejjel 2021). Choi and Shin recommended 25 Hz for HRV (in terms of pulse rate variability) calculation in both the time and frequency domains after comparing the ground truth HRV calculated with the PPG sampled at 10 kHz versus HRV calculated with the PPG sampled at 5000 Hz, 2500 Hz, 1000 Hz, 500 Hz, 250 Hz, 100 Hz, 50 Hz, 25 Hz, 20 Hz, 10 Hz, and 5 Hz (Choi and Shin 2017). Burma et al compared the ground truth HR and HRV calculated with the ECG sampled at 1 kHz versus HR and HRV calculated with the ECG sampled at progressively lower sampling rates and found that HR and HRV may require 50 Hz and 90 Hz, respectively, for adequate agreement with their ground truth counterparts (Burma et al 2021). Sampling rates recommended in this way, from a small number of subjectively chosen investigational sampling rates, are not rigorous. Indeed, experimentally estimated error bounds on HR and HRV for a given sampling rate inevitably hinge on the nature of the experimental data used: (i) the error bounds may be under- or overestimated if worst-case scenarios do not occur in the data (although worst-case scenarios may rarely occur); and (ii) the error bounds may be valid only in the range of HR and HRV encompassed by the data. In this context, conservative worst-case error bounds can be determined only by strict analytical computation. Second, the metric used to recommend the sampling rate in most existing work is heterogeneous and often indirect, including statistical differences between ground truth and investigational HR and HRV (Choi and Shin 2017, Kwon et al 2018), true/false positive rates of R wave detection (Ellis et al 2015), and absolute and/or relative differences between ground truth and investigational HR and HRV (Hejjel and Roth 2004, Béres and Hejjel 2021). Given that the accuracy requirements for HR and HRV may vary depending on the target application, the knowledge gained from the existing work may not provide a comprehensive understanding of the relationship between the desired accuracy level and the required sampling rate. Third, the recommended sampling rate does not account for the values of HR and HRV. It is conceivable that the sampling rate must be increased as the underlying HR increases and as the underlying HRV decreases. However, most existing work has not adequately accounted for the influence of the underlying HR and HRV values in determining a recommended sampling rate. Notably, this limitation is partly attributable to the use of publicly available datasets such as the MIT-BIH dataset (Bui and Byun 2021) and the PhysioNet dataset (Ellis et al 2015), which often do not include data with large variations in HR and HRV. For these reasons, the exact, quantitative sampling rate requirement for a physiological signal to achieve a desired level of accuracy in the calculation of HR and HRV remains unknown.
This paper intends to address the above gaps, while complementing existing knowledge on the ECG sampling rate required to accurately calculate HR and HRV, by developing analytical formulas that can serve as quantitative guidelines for selecting the ECG sampling rate required to calculate HR and HRV with a desired level of accuracy: (i) one relating HR and sampling rate to a bound on the HR error and (ii) others relating sampling rate to bounds on HRV errors (in terms of the root-mean-square of successive differences (RMSSD) and the standard deviation of normal sinus beats (SDNN)). The formulas can be used (i) to predict an upper bound on the inaccuracy in HR and HRV for a given sampling rate in conjunction with HR and HRV and (ii) to determine the sampling rate needed to achieve a desired accuracy at a given HR or HRV (or range thereof). We validated the formulas using experimental data collected from 58 young healthy volunteers, spanning wide HR and HRV ranges induced by strenuous exercise on a treadmill.

Methods
In this section, we present analytical formulas for selecting a sampling rate that enables accurate calculation of HR and HRV from the ECG with respect to a desired error bound. For this purpose, consider a time window of length L [s] that includes N heartbeats, i.e. N ECG RR intervals (which may be consecutive or isolated; figure 1).

Sampling rate formula for accurate heart rate calculation
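Denote the timings of the R waves bounding the ith RR interval as $t_{i,1}$ and $t_{i,2}$ [s] (figure 1). If the sampling rate is $F_s$ [Hz], then the corresponding sampling interval is given by $\Delta = F_s^{-1}$ [s] (figure 1). Then, the errors $e_{i,1}$ and $e_{i,2}$ in detecting $t_{i,1}$ and $t_{i,2}$ are bounded by:

$$|e_{i,j}| = |\hat{t}_{i,j} - t_{i,j}| \le \frac{\Delta}{2}, \qquad j = 1, 2 \qquad (1)$$

where $\hat{t}_{i,j}$ is the timing associated with $t_{i,j}$ detected with the sampling rate $F_s$ (figure 1). If we denote the ith RR interval as $T_i = t_{i,2} - t_{i,1}$ (figure 1), the error $e_i$ in calculating $T_i$ is given by:

$$e_i = \hat{T}_i - T_i = (\hat{t}_{i,2} - \hat{t}_{i,1}) - (t_{i,2} - t_{i,1}) = e_{i,2} - e_{i,1} \qquad (2)$$

According to (1), $e_i$ is bounded by:

$$|e_i| \le |e_{i,2}| + |e_{i,1}| \le \Delta \qquad (3)$$

which implies that the error bound on the RR interval is constant regardless of the underlying HR. With $\mathrm{HR} = 60/T_i$ [bpm] and $\widehat{\mathrm{HR}} = 60/(T_i + e_i)$, the HR error is largest when $e_i = -\Delta$:

$$|\widehat{\mathrm{HR}} - \mathrm{HR}| = \frac{60\,|e_i|}{T_i\,(T_i + e_i)} \le \frac{60\,\Delta}{T_i\,(T_i - \Delta)} = \frac{\mathrm{HR}^2\,\Delta}{60 - \mathrm{HR}\,\Delta}$$

Given an upper bound on the HR error, $|\widehat{\mathrm{HR}} - \mathrm{HR}| \le \delta_{\mathrm{HR}}$, the lower bound of $F_s$ to satisfy the HR error bound is given by:

$$F_s \ge \frac{\mathrm{HR}\,(\mathrm{HR} + \delta_{\mathrm{HR}})}{60\,\delta_{\mathrm{HR}}} \qquad (4)$$

In sum, (4) can be used to select a sampling rate that achieves HR errors smaller than $\delta_{\mathrm{HR}}$ for the ground truth HR. Alternatively, if the upper bound on the HR error is given as a percentage of the ground truth HR, $|\widehat{\mathrm{HR}} - \mathrm{HR}| \le \eta_{\mathrm{HR}}\,\mathrm{HR}$ (with $\eta_{\mathrm{HR}}$ expressed as a fraction, e.g. 0.02 for 2%), the lower bound of $F_s$ to satisfy the HR error bound is given by:

$$F_s \ge \frac{\mathrm{HR}\,(1 + \eta_{\mathrm{HR}})}{60\,\eta_{\mathrm{HR}}} \qquad (5)$$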
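The HR formulas can be evaluated directly. The minimal Python sketch below encodes (4), (5), and the worst-case HR error expression preceding (4); the function names are hypothetical, HR is the ground truth value in bpm, and $\eta_{\mathrm{HR}}$ is passed as a fraction.

```python
def hr_error_bound(hr_bpm: float, fs_hz: float) -> float:
    """Worst-case absolute HR error [bpm]: HR^2 * Delta / (60 - HR * Delta)."""
    delta = 1.0 / fs_hz  # sampling interval Delta [s]
    return hr_bpm ** 2 * delta / (60.0 - hr_bpm * delta)


def min_fs_abs(hr_bpm: float, delta_hr_bpm: float) -> float:
    """Lower bound on F_s [Hz] from (4) for an absolute HR error bound [bpm]."""
    return hr_bpm * (hr_bpm + delta_hr_bpm) / (60.0 * delta_hr_bpm)


def min_fs_rel(hr_bpm: float, eta_hr: float) -> float:
    """Lower bound on F_s [Hz] from (5) for a relative HR error bound (fraction)."""
    return hr_bpm * (1.0 + eta_hr) / (60.0 * eta_hr)


if __name__ == "__main__":
    fs = min_fs_abs(150.0, 1.0)  # 150 bpm, |HR error| <= 1 bpm
    print(f"F_s >= {fs:.1f} Hz")
    # At exactly this F_s the worst-case error equals the requested 1 bpm.
    print(f"worst-case error at that F_s: {hr_error_bound(150.0, fs):.3f} bpm")
```

Because (4) follows from setting the worst-case error equal to $\delta_{\mathrm{HR}}$, evaluating the error bound at the returned $F_s$ recovers $\delta_{\mathrm{HR}}$, which is a convenient consistency check.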
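Sampling rate formulas for accurate heart rate variability calculation

RMSSD

The RMSSD measures the variability in the HR between consecutive heartbeats. Its expression is given by:

$$\sigma_{\mathrm{RMSSD}} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}(T_{i+1} - T_i)^2} \qquad (6)$$

Given an upper bound on the RMSSD error, $|\hat{\sigma}_{\mathrm{RMSSD}} - \sigma_{\mathrm{RMSSD}}| \le \delta_{\mathrm{RMSSD}}$, the goal is to derive the lower bound of $F_s$ to satisfy this RMSSD error bound. Writing $\hat{T}_i = T_i + e_i$ and $\varepsilon_{\mathrm{RMSSD}} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}(e_{i+1} - e_i)^2}$, the squared RMSSD of the detected RR intervals expands as:

$$\hat{\sigma}_{\mathrm{RMSSD}}^2 = \sigma_{\mathrm{RMSSD}}^2 + \frac{2}{N-1}\sum_{i=1}^{N-1}(T_{i+1} - T_i)(e_{i+1} - e_i) + \varepsilon_{\mathrm{RMSSD}}^2$$

so that $|\hat{\sigma}_{\mathrm{RMSSD}} - \sigma_{\mathrm{RMSSD}}|$ is bounded by:

$$(\sigma_{\mathrm{RMSSD}} - \varepsilon_{\mathrm{RMSSD}})^2 \le \hat{\sigma}_{\mathrm{RMSSD}}^2 \le (\sigma_{\mathrm{RMSSD}} + \varepsilon_{\mathrm{RMSSD}})^2 \;\Rightarrow\; |\hat{\sigma}_{\mathrm{RMSSD}} - \sigma_{\mathrm{RMSSD}}| \le \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}(e_{i+1} - e_i)^2} \qquad (7)$$

where the Cauchy-Schwarz inequality, $\left|\frac{1}{N-1}\sum_{i=1}^{N-1}(T_{i+1} - T_i)(e_{i+1} - e_i)\right| \le \sigma_{\mathrm{RMSSD}}\,\varepsilon_{\mathrm{RMSSD}}$, was leveraged to derive the last inequality. Per (1) and (3),

$$|e_{i+1} - e_i| \le |e_{i+1}| + |e_i| \le 2\Delta \qquad (8)$$

Then, it can be easily shown that (7) reduces to the following:

$$|\hat{\sigma}_{\mathrm{RMSSD}} - \sigma_{\mathrm{RMSSD}}| \le 2\Delta \qquad (9)$$

Hence, the lower bound of $F_s$ to satisfy the RMSSD error bound of $\delta_{\mathrm{RMSSD}}$ is given by:

$$F_s \ge \frac{2}{\delta_{\mathrm{RMSSD}}} \qquad (10)$$

Alternatively, if the upper bound on the RMSSD error is given as a percentage of the ground truth RMSSD, $|\hat{\sigma}_{\mathrm{RMSSD}} - \sigma_{\mathrm{RMSSD}}| \le \eta_{\mathrm{RMSSD}}\,\sigma_{\mathrm{RMSSD}}$, the lower bound of $F_s$ to satisfy the RMSSD error bound is given by:

$$F_s \ge \frac{2}{\eta_{\mathrm{RMSSD}}\,\sigma_{\mathrm{RMSSD}}} \qquad (11)$$

In sum, (10) and (11) can be used to select a sampling rate that achieves RMSSD errors smaller than $\delta_{\mathrm{RMSSD}}$ and $\eta_{\mathrm{RMSSD}}\,\sigma_{\mathrm{RMSSD}}$, respectively.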
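A quick numerical sanity check of (9) is sketched below. It assumes a detector that snaps each R wave to the nearest sample (so $|e_{i,j}| \le \Delta/2$ as in (1)) and draws hypothetical RR intervals uniformly between 0.3 s and 1.2 s; neither the detector model nor the RR distribution comes from the experimental data in this paper. Random errors rarely meet the worst-case conditions, so the observed maximum is expected to stay below $2\Delta$.

```python
import math
import random


def rmssd(rr):
    """RMSSD per (6): RMS of successive RR-interval differences [s]."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def snap(t, fs):
    """Assumed detector model: R-wave time rounded to the nearest sample."""
    return round(t * fs) / fs


def max_rmssd_error(fs, n_rr=10, trials=2000, seed=0):
    """Largest |RMSSD error| [s] over random windows of n_rr consecutive RR intervals."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        # Hypothetical true R-wave times; RR ~ U(0.3, 1.2) s (about 50-200 bpm).
        t = [rng.uniform(0.0, 1.0)]
        for _ in range(n_rr):
            t.append(t[-1] + rng.uniform(0.3, 1.2))
        rr_true = [b - a for a, b in zip(t, t[1:])]
        t_hat = [snap(x, fs) for x in t]
        rr_hat = [b - a for a, b in zip(t_hat, t_hat[1:])]
        worst = max(worst, abs(rmssd(rr_hat) - rmssd(rr_true)))
    return worst


if __name__ == "__main__":
    for fs in (50.0, 200.0, 500.0):
        print(f"{fs:5.0f} Hz: max error {max_rmssd_error(fs) * 1e3:.2f} ms "
              f"<= bound {2e3 / fs:.2f} ms")
```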

SDNN
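The SDNN measures the variability in the HR with respect to its mean value in a time window. Its expression is given by:

$$\sigma_{\mathrm{SDNN}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(T_i - \bar{T})^2}, \qquad \bar{T} = \frac{1}{N}\sum_{i=1}^{N}T_i \qquad (12)$$

Given an upper bound on the SDNN error, $|\hat{\sigma}_{\mathrm{SDNN}} - \sigma_{\mathrm{SDNN}}| \le \delta_{\mathrm{SDNN}}$, the goal is to derive the lower bound of $F_s$ to satisfy this SDNN error bound. Since $\hat{T}_i - \bar{\hat{T}} = (T_i - \bar{T}) + (e_i - \bar{e})$, following the same steps as for RMSSD, $|\hat{\sigma}_{\mathrm{SDNN}} - \sigma_{\mathrm{SDNN}}|$ is bounded by:

$$|\hat{\sigma}_{\mathrm{SDNN}} - \sigma_{\mathrm{SDNN}}| \le \sqrt{\frac{1}{N}\sum_{i=1}^{N}(e_i - \bar{e})^2} \qquad (13)$$

where the Cauchy-Schwarz inequality was again leveraged to derive the last inequality, and $\bar{e} = \frac{1}{N}\sum_{i=1}^{N}e_i$. Per (3),

$$|e_i - \bar{e}| \le |e_i| + |\bar{e}| \le 2\Delta \qquad (14)$$

Then, it can be easily shown that (13) reduces to the following:

$$|\hat{\sigma}_{\mathrm{SDNN}} - \sigma_{\mathrm{SDNN}}| \le 2\Delta \qquad (15)$$

Hence, the lower bound of $F_s$ to satisfy the SDNN error bound of $\delta_{\mathrm{SDNN}}$ is given by:

$$F_s \ge \frac{2}{\delta_{\mathrm{SDNN}}} \qquad (16)$$

Alternatively, if the upper bound on the SDNN error is given as a percentage of the ground truth SDNN, $|\hat{\sigma}_{\mathrm{SDNN}} - \sigma_{\mathrm{SDNN}}| \le \eta_{\mathrm{SDNN}}\,\sigma_{\mathrm{SDNN}}$, the lower bound of $F_s$ to satisfy the SDNN error bound is given by:

$$F_s \ge \frac{2}{\eta_{\mathrm{SDNN}}\,\sigma_{\mathrm{SDNN}}} \qquad (17)$$

In sum, (16) and (17) can be used to select a sampling rate that achieves SDNN errors smaller than $\delta_{\mathrm{SDNN}}$ and $\eta_{\mathrm{SDNN}}\,\sigma_{\mathrm{SDNN}}$, respectively.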
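Formulas (10), (11), (16), and (17) share the same form, so a single pair of helpers covers both RMSSD and SDNN. This is a minimal sketch with hypothetical function names; all times are in seconds and $\eta$ is a fraction.

```python
def min_fs_hrv_abs(delta_s: float) -> float:
    """(10)/(16): F_s >= 2 / delta, delta = allowed RMSSD or SDNN error [s]."""
    if delta_s <= 0:
        raise ValueError("error bound must be positive")
    return 2.0 / delta_s


def min_fs_hrv_rel(eta: float, sigma_s: float) -> float:
    """(11)/(17): F_s >= 2 / (eta * sigma), sigma = ground-truth RMSSD or SDNN [s]."""
    if eta <= 0 or sigma_s <= 0:
        raise ValueError("eta and sigma must be positive")
    return 2.0 / (eta * sigma_s)


def hrv_error_bound(fs_hz: float) -> float:
    """(9)/(15): worst-case RMSSD or SDNN error 2 * Delta [s] at sampling rate F_s."""
    return 2.0 / fs_hz


if __name__ == "__main__":
    print(min_fs_hrv_abs(0.005))        # 5 ms absolute bound
    print(min_fs_hrv_rel(0.10, 0.050))  # 10 % of a 50 ms RMSSD or SDNN
    print(hrv_error_bound(250.0) * 1e3, "ms")
```

Note that N appears in none of these expressions, reflecting the worst-case nature of (9) and (15).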

Validation of analytical formulas

Experimental protocol
Under the approval of the University of Maryland Institutional Review Board (IRB protocol ID 1863513, approved in February 2022) and with written informed consent, we collected the ECG from 61 young healthy volunteers (age 18-38 years; 22 females and 39 males; height 173 ± 8.9 cm; weight 70 ± 11 kg; body fat 18.4 ± 8.4%). The study was conducted in accordance with the principles embodied in the Declaration of Helsinki and with local statutory requirements. Prior to the experiment, participants were asked to refrain from strenuous exercise for 24 h and from caffeine, alcohol, and tobacco for 12 h. During the experiment, participants were first asked to stand still on a treadmill for 3 min while a baseline recording was conducted. Then, they performed a modified Bruce submaximal aerobic test in which they walked (i) at a constant speed of 3.3 mph with the treadmill incline increased by 3% every 3 min (males) or (ii) at a constant speed of 3.0 mph with the treadmill incline increased by 2.5% every 3 min (females), until they reached 80% of the age-predicted maximum HR calculated using the Fox equation (Shookster et al 2020). Subsequently, the treadmill was set to 2.5 mph with 0% incline for 3 min. Finally, the participants rested for another 3 min (figure 2(a)). Throughout the experiment, we recorded the ECG at a high sampling rate of 2 kHz using 3 gel electrodes in a modified Lead II configuration, interfaced to a wireless ECG amplifier (BN-RSPEC, Biopac Systems, Goleta, CA, USA) and a data acquisition unit (MP150, Biopac Systems, Goleta, CA, USA).

Data processing and analysis
We excluded data recorded from 3 participants from subsequent analysis due to errors associated with data collection (e.g., loss of wireless connection).Then, we analyzed the data as illustrated in figure 2(b).Details follow.
We used the ECG recorded at the 2 kHz sampling rate as ground truth. We down-sampled the same ECG to 500 Hz, 200 Hz, and 50 Hz and used these as investigational ECGs to validate the analytical formulas. From both the ground truth and the investigational ECGs, we detected the R waves using a built-in MATLAB function ('findpeaks', with user-configurable parameters set on a subject-by-subject basis). Then, we removed the R waves associated with abnormal HR values and low-quality ECG waveforms. First, we removed an R wave if it was associated with HR outside of 30-200 bpm. Second, we removed an R wave if it was associated with HR outside of the 95% confidence interval of HR values within a 20 s-long non-causal window (10 s into the past and 10 s into the future). Third, we removed an R wave if it was associated with a blunt peak (i.e. if the R wave did not exhibit a single maximal point).
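The first two exclusion rules can be sketched as follows. This is an illustrative Python interpretation, not the authors' MATLAB pipeline: each R wave is assigned the HR of the RR interval ending at it, and the 95% confidence interval is assumed to be the window mean ± 1.96 standard deviations. The third (blunt-peak) rule depends on the waveform itself and is omitted.

```python
import statistics


def filter_r_waves(r_times, hr_min=30.0, hr_max=200.0, half_window=10.0, z=1.96):
    """Return (time, HR) pairs that pass the range and local-CI rules (sketch).

    Assumptions (hypothetical): the HR of an R wave comes from the RR interval ending
    at it, and the 95% CI is mean +/- z * SD of HR in a +/-half_window non-causal window.
    """
    # HR [bpm] associated with each R wave after the first.
    beats = [(t1, 60.0 / (t1 - t0)) for t0, t1 in zip(r_times, r_times[1:])]
    # Rule 1: discard physiologically implausible HR values.
    beats = [(t, hr) for t, hr in beats if hr_min <= hr <= hr_max]
    kept = []
    for t, hr in beats:
        # Rule 2: 20 s non-causal window (10 s past, 10 s future) around this beat.
        neighbours = [h for s, h in beats if abs(s - t) <= half_window]
        if len(neighbours) < 3:
            kept.append((t, hr))  # too few beats to form a meaningful interval
            continue
        mean = statistics.fmean(neighbours)
        sd = statistics.stdev(neighbours)
        if abs(hr - mean) <= z * sd:
            kept.append((t, hr))
    return kept


if __name__ == "__main__":
    # Hypothetical 60 bpm beat train with one R wave displaced from 15.0 s to 15.5 s.
    times = [float(k) for k in range(31)]
    times[15] = 15.5
    kept = filter_r_waves(times)
    print(len(kept), 16.0 in {t for t, _ in kept})
```

In this toy example the beat at 16.0 s (HR 120 bpm) falls outside its local interval and is removed, whereas the beat at 15.5 s (HR 40 bpm) is retained; whether such a beat should survive depends on how the confidence interval is actually defined.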
We aligned the retained R waves in the ground truth ECG with those in each investigational ECG. Then, we calculated HR. Subsequently, we calculated HRV, in terms of both RMSSD and SDNN, using the HR thus calculated. In calculating HRV, we used a sliding, variable-length time window advanced by 1 RR interval so that each window includes a prespecified number of RR intervals. Then, we calculated the errors in HR and HRV between the ground truth ECG and the investigational ECGs: (i) HR on a beat-by-beat basis (i.e. N = 1) and (ii) HRV on a window basis. For illustration purposes, we chose the time window length for calculating HRV so that each window includes 10 RR intervals (i.e. N = 10). Although we regarded the 2 kHz ECG as ground truth, it is itself subject to the errors bounded by our analytical formulas for HR and HRV. Hence, we compensated for these errors in validating our analytical formulas by conservatively adding the error bounds pertaining to the 2 kHz sampling rate to the error bounds pertaining to the investigational ECGs.
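The windowing and the error-bound compensation described above can be sketched as follows. The compensation is conservative by the triangle inequality: if both the investigational value and the 2 kHz value deviate from the unknown true value by at most their respective bounds, their difference is at most the sum of the two bounds. Function names and the RR values in the usage example are hypothetical.

```python
def sliding_windows(rr, n=10):
    """Windows of n consecutive RR intervals, advanced by one RR interval."""
    return [rr[k:k + n] for k in range(len(rr) - n + 1)]


def hr_bound(hr_bpm, fs_hz):
    """Worst-case absolute HR error [bpm] at sampling rate fs_hz."""
    d = 1.0 / fs_hz
    return hr_bpm ** 2 * d / (60.0 - hr_bpm * d)


def compensated_hr_bound(hr_bpm, fs_hz, fs_ref_hz=2000.0):
    """Bound on |HR(fs) - HR(2 kHz)|: sum of both worst-case bounds."""
    return hr_bound(hr_bpm, fs_hz) + hr_bound(hr_bpm, fs_ref_hz)


def compensated_hrv_bound(fs_hz, fs_ref_hz=2000.0):
    """Bound [s] on the RMSSD or SDNN difference vs. the 2 kHz reference: 2/F_s + 2/F_ref."""
    return 2.0 / fs_hz + 2.0 / fs_ref_hz


if __name__ == "__main__":
    rr = [0.8, 0.82, 0.79, 0.81, 0.8, 0.78, 0.8, 0.83, 0.81, 0.8, 0.79, 0.8]  # hypothetical RR [s]
    print(len(sliding_windows(rr)))            # number of N = 10 windows
    print(compensated_hrv_bound(50.0) * 1e3)   # ms
    print(compensated_hr_bound(120.0, 200.0))  # bpm
```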

Results
Figure 3 shows a representative example of (a) the RR interval, (b) ECG recordings during baseline and at maximum exercise intensity, (c) HR, and (d) HRV, all associated with the 2 kHz sampling rate. As intended, HR gradually increased (and thus the RR interval gradually decreased) until approximately 1200 s and then decreased approximately back to its initial level. RMSSD and SDNN were large when HR was low and small when HR was high. Overall, the experimental protocol succeeded in covering a large range of HR and HRV. Figure 4 shows the HR calculation error bounds as a function of HR for the 500 Hz, 200 Hz, and 50 Hz sampling rates, together with the experimental HR calculation errors. Figure 5 shows the HRV calculation error bounds as a function of HRV for the 500 Hz, 200 Hz, and 50 Hz sampling rates, together with the experimental HRV calculation errors for N = 10. Figure 6 shows the sampling rate required to achieve a desired level of (a) absolute and (b) relative HR error as a function of HR. Figure 7 shows the sampling rate required to achieve a desired level of (a) absolute and (b) relative HRV error as a function of HRV for N = 10.

Discussion
HR and HRV have played an instrumental role across a broad spectrum of psychophysiological science as versatile surrogates of autonomic nervous system outflows in response to physiological and psychological changes. To maximize their efficacy, HR and HRV must be calculated accurately. A key requirement for achieving this goal is to measure the ECG at a sufficiently high sampling rate. However, there are no established guidelines to aid the selection of an appropriate sampling rate for a given application of HR and HRV. This paper bridges this gap by presenting a set of analytical formulas relating the ECG sampling rate to upper bounds on HR and HRV errors.

Validity and tightness
The analytical formulas appear to correctly predict the relationship between the minimum sampling rate and the desired error bounds associated with HR and HRV (figures 4 and 5). The analytical formulas also appeared to be valid independently of gender (not shown): the experimentally observed errors were strictly and consistently within the sampling rate-dependent error bounds predicted by the formulas for both females and males. For HR, the formulas provided tight error bounds at each sampling rate. For HRV, the formulas likewise provided tight worst-case error bounds (which appear to occur at small HRV values), while the error bounds pertaining to large HRV values did not appear as tight. The Cauchy-Schwarz inequality indicates that the RMSSD error becomes equal to its upper bound in (7) when $(T_{i+1} - T_i)$ and $(e_{i+1} - e_i)$ are linearly proportional to each other. Likewise, the SDNN error becomes equal to its upper bound in (13) when $(T_i - \bar{T})$ and $(e_i - \bar{e})$ are linearly proportional to each other. In addition, the root-mean-squared errors (i.e. $\sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}(e_{i+1} - e_i)^2}$ in (7) in case of RMSSD and $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(e_i - \bar{e})^2}$ in (13) in case of SDNN) attain their maximum when all the elements in the summation are maximal, which happens when $e_{i+1} - e_i = \pm 2\Delta$ in (8) in case of RMSSD and $e_i - \bar{e} = \pm 2\Delta$ in (14) in case of SDNN. Hence, the upper bounds of the HRV errors in (9) and (15) are achieved under the following conditions. In case of RMSSD (where $r_{\mathrm{RMSSD}}$ is a constant):

$$e_{i+1} - e_i = r_{\mathrm{RMSSD}}\,(T_{i+1} - T_i) \quad \text{and} \quad |e_{i+1} - e_i| = 2\Delta, \qquad i = 1, \ldots, N-1 \qquad (18)$$

In case of SDNN (where $r_{\mathrm{SDNN}}$ is a constant):

$$e_i - \bar{e} = r_{\mathrm{SDNN}}\,(T_i - \bar{T}) \quad \text{and} \quad |e_i - \bar{e}| = 2\Delta, \qquad i = 1, \ldots, N \qquad (19)$$

These conditions are unlikely to be satisfied in practice, because they require the R-wave detection errors to track the RR interval variations exactly while remaining at their extreme values. Furthermore, as HRV becomes larger, HR must change greatly on a beat-by-beat basis for these conditions to hold, which is also not likely. In other words, the HRV error bounds derived from our formulas may become increasingly conservative (i.e. less tight) as HRV (i.e. the values of RMSSD and SDNN) increases. This may explain why the experimentally observed HRV errors in figure 5 exhibit a generally decreasing trend as RMSSD and SDNN increase.
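The RMSSD condition (18) can be realized numerically. The sketch below uses hypothetical values (F_s = 200 Hz, N = 10, RR intervals alternating between 0.85 s and 0.75 s) and lets the R-wave detection errors alternate between +Δ/2 and -Δ/2, which stays within (1). Then $e_{i+1} - e_i = \pm 2\Delta$ is proportional to $T_{i+1} - T_i$, and the resulting RMSSD error equals the bound $2\Delta$ in (9).

```python
import math


def rmssd(rr):
    """RMSSD per (6) [s]."""
    d = [b - a for a, b in zip(rr, rr[1:])]
    return math.sqrt(sum(x * x for x in d) / len(d))


def rr_from_times(t):
    """RR intervals [s] from consecutive R-wave times."""
    return [b - a for a, b in zip(t, t[1:])]


fs = 200.0        # hypothetical sampling rate [Hz]
delta = 1.0 / fs  # sampling interval Delta [s]
n_rr = 10         # N = 10 RR intervals
amp = 0.05        # hypothetical alternation amplitude [s], chosen larger than Delta

# True R-wave times whose RR intervals alternate 0.85 s, 0.75 s, 0.85 s, ...
t_true = [0.0]
for i in range(n_rr):
    t_true.append(t_true[-1] + 0.8 + amp * (-1) ** i)

# Detection errors alternating +Delta/2, -Delta/2 make e_{i+1} - e_i = +/-2*Delta,
# proportional to T_{i+1} - T_i, which is the equality case (18).
t_hat = [t + (delta / 2) * (-1) ** k for k, t in enumerate(t_true)]

err = abs(rmssd(rr_from_times(t_hat)) - rmssd(rr_from_times(t_true)))
print(f"RMSSD error = {err * 1e3:.3f} ms, bound 2*Delta = {2 * delta * 1e3:.3f} ms")
```

Replacing the alternating errors with random ones of the same magnitude breaks the proportionality in (18), which is why such extreme cases are not expected in measured data.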
An additional observation (not shown) is that HRV errors tend to decrease as the number of RR intervals in the time window (i.e. N in (7) and (13)) increases. This observation may be explained as follows.
As N increases, it becomes less likely that (18) and (19) are satisfied. Hence, the actual HRV error may tend to decrease as N increases, and consequently, the upper bounds of HRV errors in (9) and (15), and hence the sampling rate requirements in (10)-(11) and (16)-(17), may become increasingly conservative as N increases. In sum, the analytical formulas presented in this paper to predict the minimum sampling rate required to achieve a desired accuracy in HR and HRV (or, alternatively, the inaccuracy in HR and HRV for a given ECG sampling rate) appear to be valid and to provide tight worst-case bounds.

Explainable insights
In addition to being valid, the analytical formulas appear to be explainable. First, the HR formulas in (4) and (5) elucidate that the requisite F_s increases as (i) HR increases and (ii) the desired error bound decreases (figure 6). Both insights make intuitive sense. (i) As the underlying HR increases, the RR interval decreases, so a fixed timing error in detecting the R waves constitutes a larger fraction of the RR interval and causes a larger error in the calculated HR. To calculate shorter RR intervals without compromising accuracy, higher temporal resolution (i.e. a higher sampling rate) is required to detect the ECG R waves more accurately. (ii) As the desired error bound decreases, higher temporal resolution in the measurement of RR intervals is likewise required to calculate HR more accurately. Hence, a higher F_s is required in both cases to increase the temporal resolution of the ECG measurement.
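Insight (i) can be made quantitative with a first-order sensitivity argument (a worked illustration, not part of the formal derivation). Since $\mathrm{HR} = 60/T_i$,

$$\frac{\partial\,\mathrm{HR}}{\partial T_i} = -\frac{60}{T_i^2} = -\frac{\mathrm{HR}^2}{60} \quad\Longrightarrow\quad |\widehat{\mathrm{HR}} - \mathrm{HR}| \approx \frac{\mathrm{HR}^2}{60}\,|e_i| \le \frac{\mathrm{HR}^2}{60\,F_s}$$

so the HR error caused by a fixed timing error grows quadratically with HR. For example, with $|e_i| = \Delta = 5$ ms ($F_s = 200$ Hz), the approximate HR error is $60 \times 0.005 = 0.3$ bpm at 60 bpm but $540 \times 0.005 = 2.7$ bpm at 180 bpm. For $\delta_{\mathrm{HR}} \ll \mathrm{HR}$, (4) shows the same scaling, $F_s \gtrsim \mathrm{HR}^2/(60\,\delta_{\mathrm{HR}})$.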
Second, the HRV formulas in (10), (11), (16), and (17) elucidate that the requisite F_s increases as the desired RMSSD and SDNN error bounds decrease, as well as when the ground truth RMSSD and SDNN values decrease (in the case of relative error bounds) (figure 7). Both insights likewise make intuitive sense. (i) As the desired error bounds decrease, higher temporal resolution is required to calculate RMSSD and SDNN (both expressed in ms) more accurately. (ii) As RMSSD and SDNN decrease, higher temporal resolution is likewise required to calculate these smaller values while maintaining the same relative accuracy. Hence, a higher F_s is required in both cases. Admittedly, our sampling rate requirement is based on a worst-case analysis. Thus, the minimum sampling rates associated with both RMSSD and SDNN do not depend on the number of RR intervals, whereas the actual RMSSD and SDNN errors are expected to generally decrease as the number of RR intervals increases (see section 4.1).
In sum, the analytical formulas presented in this paper relating the ECG sampling rate to upper bounds on HR and HRV errors may play a meaningful role in selecting the sampling rate required to achieve a specified accuracy in HR and HRV calculated from the ECG or, alternatively, in predicting the inaccuracy in HR and HRV for a given ECG sampling rate.

Adequacy of sampling rates of commercial wearable HR and HRV monitors
Our analytical formulas are useful in estimating the sampling rate requirements for wearable ECG-based HR and HRV monitors. Regarding HR in adults, HR in normal resting conditions ranges from 60 bpm to 100 bpm, and it can approach 200 bpm during strenuous physical activities. Considering the relative error, our analytical formulas predict that the ECG signal must be sampled at (i) >85 Hz, 35 Hz, and 18 Hz in order to measure HR within <2%, 5%, and 10% accuracy, respectively, in normal resting conditions; and at (ii) >170 Hz, 70 Hz, and 37 Hz in order to measure HR within <2%, 5%, and 10% accuracy, respectively, during strenuous physical activities. Regarding HRV in adults in normal resting conditions, RMSSD ranges from 7 to 103 ms, while SDNN ranges from 53 to 279 ms (Umetani et al 1998).
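The HR sampling rates quoted above follow from (5). The short sketch below reproduces them, assuming that the upper ends of the two HR ranges (100 bpm at rest and 200 bpm during strenuous activity) set the requirement; the function name is hypothetical.

```python
def min_fs_rel(hr_bpm: float, eta_hr: float) -> float:
    """Lower bound on F_s [Hz] from (5); eta_hr is a fraction (0.02 = 2 %)."""
    return hr_bpm * (1.0 + eta_hr) / (60.0 * eta_hr)


SCENARIOS = {"resting (100 bpm)": 100.0, "strenuous (200 bpm)": 200.0}
ACCURACIES = (0.02, 0.05, 0.10)

if __name__ == "__main__":
    for label, hr in SCENARIOS.items():
        rates = ", ".join(f"{min_fs_rel(hr, eta):.1f} Hz" for eta in ACCURACIES)
        print(f"{label}: {rates}")
```

It prints 85.0, 35.0, and 18.3 Hz for rest and 170.0, 70.0, and 36.7 Hz for strenuous activity, matching the rounded figures in the text.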
Our analytical formulas are also useful in estimating the accuracy of HR and HRV measurements provided by existing wearable ECG monitors. Our review showed that existing wearable ECG monitors use sampling rates ranging from 130 Hz to 1 kHz (Polar H10: 130 Hz; Equivital LifeMonitor: 256 Hz; Zephyr BioHarness: 1 kHz) (Lindsey et al 2023). Our analytical formulas predict the worst-case HR and HRV accuracy of these monitors summarized in table 1. As table 1 shows, a subset of existing ECG monitors may not be able to measure HR with an accuracy of <2 bpm or <2%, while only the Zephyr BioHarness may be able to measure HRV with acceptable accuracy. Hence, care may need to be taken when using HR and HRV provided by existing wearable ECG monitors.
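Worst-case bounds of the kind summarized in table 1 can be recomputed from the formulas. The sketch below evaluates the HR bound at two illustrative HR values (100 and 200 bpm, chosen here as assumptions) and the HRV bound $2\Delta$ from (9) and (15) at each monitor's sampling rate; it is not a reproduction of table 1, whose entries are not given in this text.

```python
MONITORS = {"Polar H10": 130.0, "Equivital LifeMonitor": 256.0, "Zephyr BioHarness": 1000.0}


def hr_abs_bound(hr_bpm, fs_hz):
    """Worst-case absolute HR error [bpm]."""
    d = 1.0 / fs_hz
    return hr_bpm ** 2 * d / (60.0 - hr_bpm * d)


def hr_rel_bound(hr_bpm, fs_hz):
    """Worst-case relative HR error (fraction)."""
    d = 1.0 / fs_hz
    return hr_bpm * d / (60.0 - hr_bpm * d)


def hrv_bound_ms(fs_hz):
    """Worst-case RMSSD/SDNN error 2 * Delta [ms]."""
    return 2000.0 / fs_hz


if __name__ == "__main__":
    for name, fs in MONITORS.items():
        hr_txt = "; ".join(
            f"{hr:.0f} bpm: {hr_abs_bound(hr, fs):.2f} bpm ({100 * hr_rel_bound(hr, fs):.2f} %)"
            for hr in (100.0, 200.0)
        )
        print(f"{name} ({fs:.0f} Hz): HR {hr_txt}; HRV {hrv_bound_ms(fs):.2f} ms")
```

For instance, at 200 bpm the HR bound is about 5.3 bpm at 130 Hz but about 0.67 bpm at 1 kHz, and the HRV bound ranges from about 15.4 ms (130 Hz) to 2 ms (1 kHz).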

Conclusion
Analytical formulas relating the sampling rate of the ECG signal to error bounds in calculating HR and HRV were presented, for the first time to the best of our knowledge. The formulas may have a meaningful impact on the selection of the sampling rate in the development of wearable ECG-based HR and HRV monitors. Future work is needed to develop analytical formulas broadly applicable to other physiological signals from which HR and HRV can be derived, including the PPG and the SCG/BCG.

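Figure 1. Problem formulation for determination of the sampling rate required to achieve a desired level of HR and HRV calculation accuracy. L: time window length. $T_i = t_{i,2} - t_{i,1}$: ith RR interval. $t_{i,1}$ and $t_{i,2}$: timings associated with the R waves pertaining to $T_i$. $\hat{t}_{i,j}$: timing associated with $t_{i,j}$ detected with the sampling rate $F_s$. $e_{i,j} = \hat{t}_{i,j} - t_{i,j}$: error in detecting $t_{i,j}$.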

Figure 3. Representative example of (a) ECG RR interval, (b) ECG recordings during baseline and maximum exercise intensity, (c) HR, and (d) HRV, all associated with the 2 kHz sampling rate.

Figure 4. Relationship between HR and HR calculation error bounds pertaining to 500 Hz, 200 Hz, and 50 Hz sampling rates (red dashed lines) overlaid with experimental HR calculation errors (black dots). (a) Absolute errors. (b) Relative errors.

Figure 6. Sampling rate required to achieve a desired level of (a) absolute and (b) relative HR errors with respect to HR.

Figure 7. Sampling rate required to achieve a desired level of (a) absolute and (b) relative HRV errors with respect to HRV (N = 10). Upper and lower panels correspond to RMSSD and SDNN, respectively.