Sample entropy applied to the analysis of synthetic time series and tachograms

Entropy is a method of non-linear analysis that provides an estimate of the irregularity of a system. However, there are several types of computational entropy, and these were considered and tested in order to obtain one that gives an index of signal complexity while taking into account the number of data points in the analysed time series, the computational resources demanded by the method, and the accuracy of the calculation. An algorithm for the generation of fractal time series with a given value of β was used to characterize the different entropy algorithms. For most of the algorithms we obtained a significant variation with series size, which could be counterproductive for the study of real signals of different lengths. The chosen method was sample entropy, which is largely independent of series size. With this method, time series of heart interbeat intervals, or tachograms, of healthy subjects and patients with congestive heart failure were analysed. Sample entropy was calculated for 24-hour tachograms and for 6-hour subseries corresponding to sleep and wakefulness. The comparison between the two populations shows a significant difference that is accentuated when the subjects are sleeping.


Introduction
Cardiovascular diseases are currently the leading cause of death worldwide [1,2]. Obesity, tobacco consumption, physical inactivity, high blood pressure, hyperlipidemia, diabetes, and inadequate diet are risk factors for cardiovascular disease, all the more so when they occur together [1]. Age also affects the heart and blood vessels, acting on the structures and systems of the body in general, and so favours the emergence of cardiovascular diseases.
The use of non-linear analysis for the study of complex systems with fractal properties allows more accurate descriptions of their behaviour. The human body consists of different complex systems that show fractal and multifractal behaviour in healthy conditions [3], and a decrease in these fractal trends has been reported when the systems are in a pathological condition [5].
Congestive heart failure (CHF) is a syndrome in which patients typically experience shortness of breath both at rest and during exercise, show signs of fluid retention, and present objective evidence of a structural or functional cardiac abnormality at rest [5]. The heartbeat interval time series obtained from ECG records of patients with CHF have been studied with various techniques of nonlinear dynamics, especially fractal and multifractal analysis [3,4], and significant differences between healthy and CHF subjects have been found. The complexity of these signals appears to differ, so we propose that computational sample entropy could help to differentiate the two types of series and thus support the diagnosis of patients with heart failure.

Methodology
In many cases, measuring devices provide temporal signals x(i), where i = 1,..., N. The temporal record of data obtained from measurements allows the behaviour of the systems to be predicted [6]. Time series can be characterized by four basic types of variation, which produce the changes observed in the series over a given time interval and give the series an erratic behaviour. These components are: secular trend, seasonal variation, cyclical variation, and irregular variation, the last being short-term variations that are unpredictable and non-recurrent.
A time series often appears to contain no useful information, but a more careful analysis can reveal correlations within the series, that is, a relation between the value at a moment i and the value at a later moment j > i. When there is no relation between the value at moment i and the value at moment j (i ≠ j), the time series is a succession of independent samples; this sort of time series is called white noise. For a white noise time series, a random reordering of its components is again white noise.
On the other hand, when the value at moment i influences the future values at j > i, there is dependence and therefore correlation. For example, a time series obtained by accumulating independent white noise events is called Brownian noise; this type of noise corresponds to short range correlations [7].
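The distinction between white and Brownian noise can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the seed, the series length, and the helper name `lag1_autocorrelation` are illustrative choices and are not taken from the original study.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability

n = 10_000
white = rng.standard_normal(n)      # independent samples: white noise
shuffled = rng.permutation(white)   # random reordering: still white noise
brown = np.cumsum(white)            # accumulated white noise: Brownian noise


def lag1_autocorrelation(x):
    """Sample correlation between x(i) and x(i + 1) (hypothetical helper)."""
    centred = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(centred[:-1], centred[1:]) / np.dot(centred, centred))


for name, series in [("white", white), ("shuffled", shuffled), ("brown", brown)]:
    print(f"{name:>8}: lag-1 autocorrelation = {lag1_autocorrelation(series):.3f}")
```

The white and shuffled series should give values close to zero, while the accumulated series should give a value close to one, reflecting the dependence of each value on the previous one.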
We used the database of physiological signals on the PhysioNet website [8] to obtain 24-hour time series of beat-to-beat intervals; this kind of time series is called a tachogram. The Normal Sinus Rhythm RR Interval Database was used, consisting of a population of 54 healthy individuals (30 men aged between 28 and 76 and 24 women aged between 58 and 73). Two populations of CHF patients were also used: the BIDMC Congestive Heart Failure RR Interval Database, which consists of 15 patients (11 men aged between 22 and 71 and 4 women aged between 54 and 63) classified as NYHA class 3-4, and the Congestive Heart Failure RR Interval Database, with 29 patients (including 8 men and two women aged 34 to 79 and 17 other individuals of unknown gender) of different NYHA classifications [9]. The NYHA index was proposed by the New York Heart Association as a method to classify CHF patients; class IV corresponds to the patients in the worst condition.
The 24-hour signals were segmented into 6-hour series for wakefulness and 6-hour series for sleep. This discrimination was possible thanks to the significant reduction in heart rate that individuals experience while asleep, in contrast with the beat-to-beat interval of the awake individual, whose body demands a greater flow of blood; this is reflected in the time between consecutive R peaks (the RR interval) [10] in the electrocardiogram signal.
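The text does not state how the 6-hour windows were located, so the following is only one possible sketch. It assumes that the sleep segment is the 6-hour window with the largest mean RR interval (lowest heart rate) and the wake segment the one with the smallest. The function name `sleep_wake_windows`, the 10-minute search step, and the synthetic tachogram are assumptions made for illustration.

```python
import numpy as np


def sleep_wake_windows(rr, window_s=6 * 3600.0, step_s=600.0):
    """Return (sleep, wake) slices of a tachogram `rr` given in seconds.

    Assumption: sleep = window with the largest mean RR interval,
    wake = window with the smallest mean RR interval.
    """
    rr = np.asarray(rr, dtype=float)
    beat_time = np.cumsum(rr)                       # time of each beat, seconds
    if beat_time[-1] < window_s:
        raise ValueError("record is shorter than one window")
    starts = np.arange(0.0, beat_time[-1] - window_s + 1e-9, step_s)
    first = np.searchsorted(beat_time, starts)              # first beat at or after start
    last = np.searchsorted(beat_time, starts + window_s)    # one past the last beat
    prefix = np.concatenate(([0.0], beat_time))     # prefix[k] = sum(rr[:k])
    mean_rr = (prefix[last] - prefix[first]) / (last - first)
    sleep, wake = int(np.argmax(mean_rr)), int(np.argmin(mean_rr))
    return (slice(int(first[sleep]), int(last[sleep])),
            slice(int(first[wake]), int(last[wake])))


# Synthetic tachogram (not real data): 16 h awake at 0.8 s, about 8 h asleep at 1.1 s.
rr = np.concatenate((np.full(72_000, 0.8), np.full(26_182, 1.1)))
sleep_part, wake_part = sleep_wake_windows(rr)
print(rr[sleep_part].mean(), rr[wake_part].mean())
```

Real tachograms contain ectopic beats and artefacts, so in practice a filtering step would normally precede a search of this kind.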
To apply computational entropy analysis, the chosen method must provide values that do not depend on the series size, which is a known problem with this type of methodology; the method must also be properly calibrated. For this purpose, time series with a prescribed spectral exponent β, that is, with a power spectrum $S(f) \propto 1/f^{\beta}$, were generated using the method proposed by Gálvez-Coyt et al. [7]. In this method a Gaussian white noise $w_n$ with N data is generated, and then the discrete Fourier transform is applied:
$$W_k = \sum_{n=0}^{N-1} w_n \, e^{-2\pi i k n / N}, \qquad k = 0, \dots, N-1.$$
The result is the flat spectrum corresponding to a spectral power with β = 0, where the amplitude $|W_k|$ is on average the same for the coefficients at every frequency $f_k$. Subsequently, new Fourier coefficients are built from those obtained from the Fourier transform of the Gaussian white noise:
$$W'_k = \frac{W_k}{f_k^{\beta/2}}.$$
Now the magnitude $|W'_k|$ of the filtered data is taken and a uniform random phase $\varphi_k$ is generated for each one in the interval $[0, 2\pi]$. The Fourier coefficients for the magnitudes and phases previously calculated are then obtained by using:
$$Y_k = |W'_k| \left( \cos\varphi_k + i \sin\varphi_k \right).$$
The inverse Fourier transform of $Y_k$ is obtained,
$$y_n = \frac{1}{N} \sum_{k=0}^{N-1} Y_k \, e^{2\pi i k n / N}.$$
With the purpose of obtaining a Gaussian noise with a certain value of β, the real part of the series $y_n$ [19] is considered. Figures 1, 2 and 3 show the self-affine time series with N = 10000 data for β values 0, 1, and 2.
The method was implemented to generate a group of signals with N = 10000 data and known values of β, ranging from β = 0 to β = 2 in increments of 0.1. Different types of entropy were then evaluated, but ultimately only the sample entropy was used because it gave the best results. The method proposed by Richman and Moorman was used for the calculation of the sample entropy (SampEn) [11]. Given the time series $x(1), x(2), \dots, x(N)$, vectors of length m are defined by:
$$X_m(i) = \{x(i), x(i+1), \dots, x(i+m-1)\}, \qquad 1 \le i \le N - m + 1.$$
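These vectors represent m consecutive values of x starting at point i. The distance between the vectors $X_m(i)$ and $X_m(j)$ is then defined as the maximum absolute difference between their scalar components:
$$d[X_m(i), X_m(j)] = \max_{0 \le k \le m-1} |x(i+k) - x(j+k)|.$$
Subsequently, for a given $X_m(i)$, the number of $j$ ($1 \le j \le N-m$, $j \ne i$), denoted $B_i$, is counted such that the distance between $X_m(i)$ and $X_m(j)$ is less than or equal to r. Taking into account that $1 \le i \le N-m$:
$$B_i^m(r) = \frac{B_i}{N-m-1},$$
and $B^m(r)$ is defined as:
$$B^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} B_i^m(r).$$
Then m is changed to m + 1, the corresponding count $A_i$ is obtained with $A_i^m(r) = A_i/(N-m-1)$, and $A^m(r)$ is calculated as:
$$A^m(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} A_i^m(r).$$
As a result, $B^m(r)$ is the probability that two sequences match in m points, and $A^m(r)$ is the probability that two sequences match in m + 1 points. The sample entropy can therefore be defined as:
$$\mathrm{SampEn}(m, r, N) = -\ln\left[\frac{A^m(r)}{B^m(r)}\right].$$
The values of r and m are decisive for the precision of the method. In this case, fixed values of r and m were chosen, which proved to be suitable for this type of time series analysis, allowing greater efficiency in the method and reducing the loss of information [10].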
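The sample entropy definition can be sketched directly as a pairwise count of matching templates. This is a minimal, unoptimized illustration and not the authors' implementation. The function name `sample_entropy` is hypothetical, the default m = 3 follows the value recommended in the Conclusions, and expressing r as a fraction (0.2) of the series' standard deviation is a common convention assumed here because the value of r used in the study is not given in the text.

```python
import numpy as np


def sample_entropy(x, m=3, r=0.2):
    """SampEn(m, r, N) by direct pair counting (illustrative sketch).

    Assumption: the tolerance is r times the standard deviation of x.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    tolerance = r * np.std(x)

    def matching_pairs(length):
        # The first N - m templates are used for both lengths m and m + 1.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        pairs = 0
        for i in range(len(templates) - 1):
            # Maximum absolute difference between template i and each later template.
            distance = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            pairs += int(np.count_nonzero(distance <= tolerance))
        return pairs

    b = matching_pairs(m)       # pairs that match on m points
    a = matching_pairs(m + 1)   # pairs that match on m + 1 points
    if a == 0 or b == 0:
        return float("nan")     # undefined when no matches are found
    return float(-np.log(a / b))


# Usage on two synthetic signals: a regular sine and Gaussian white noise.
rng = np.random.default_rng(1)
regular = np.sin(2 * np.pi * np.arange(1000) / 50)
irregular = rng.standard_normal(1000)
print(sample_entropy(regular), sample_entropy(irregular))
```

Because the normalizing factors of $A^m(r)$ and $B^m(r)$ are identical, the ratio reduces to the ratio of the raw pair counts, which is what the sketch computes. The regular signal should yield a lower value than the white noise.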

Results
To characterize the sample entropy method, it was applied to the synthetic time series previously generated, calculating the entropy for series of N = 10000 data with power spectral exponents ranging from β = 0 to β = 2 in increments of 0.1. Figure 4 shows the graph obtained for this calculation, with the β values on the abscissa and the entropy values on the ordinate. The method shows a linear trend extending from β = 0.7 to β = 2.0; human beat-to-beat interval time series belong to this interval. Once the method had been characterized with time series of known β, the time series of the CHF databases and of healthy people were analysed with the sample entropy method for the three types of series (complete 24 hours, 6 hours of sleep, and 6 hours of wakefulness). The following figures show the sample entropy value of each record, with the average value of each population drawn as a continuous line. In the scatter plots, blue corresponds to healthy people and red to heart failure patients.
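Calibration series with known β, like those used for this characterization, can be produced with the spectral-synthesis steps given in the Methodology (white noise, Fourier transform, rescaling by $f^{-\beta/2}$, random phases, inverse transform, real part). The following is a minimal sketch assuming NumPy. The names `power_law_noise` and `spectral_slope` are hypothetical, and dropping the zero-frequency coefficient is an assumption made here to avoid a division by zero; it is not taken from the source.

```python
import numpy as np


def power_law_noise(n, beta, rng=None):
    """Series of n points whose power spectrum decays roughly as 1/f**beta."""
    rng = np.random.default_rng() if rng is None else rng
    white = rng.standard_normal(n)             # Gaussian white noise
    coefficients = np.fft.fft(white)           # flat spectrum on average
    frequency = np.abs(np.fft.fftfreq(n))      # |f_k| in cycles per sample
    scale = np.zeros(n)                        # assumption: f = 0 term is dropped
    positive = frequency > 0
    scale[positive] = frequency[positive] ** (-beta / 2.0)
    magnitude = np.abs(coefficients) * scale   # filtered magnitudes
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n)
    shaped = magnitude * np.exp(1j * phase)    # magnitude with random phase
    return np.real(np.fft.ifft(shaped))        # real part of the inverse transform


def spectral_slope(series):
    """Least-squares slope of log power against log frequency (a check on beta)."""
    power = np.abs(np.fft.rfft(series)) ** 2
    frequency = np.fft.rfftfreq(len(series))
    slope, _ = np.polyfit(np.log(frequency[1:]), np.log(power[1:]), 1)
    return float(slope)


# One series per beta from 0.0 to 2.0 in steps of 0.1, each with N = 10000 points.
rng = np.random.default_rng(0)
betas = np.round(np.arange(0.0, 2.05, 0.1), 1)
calibration = {float(b): power_law_noise(10_000, b, rng) for b in betas}
print(spectral_slope(calibration[1.0]))
```

The fitted slope should be close to -β for each series, which gives a quick consistency check before any entropy is computed on the calibration set.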
A statistical test was also carried out for each type of study, i.e., complete series, series of 6 hours of sleep, and series of 6 hours of wakefulness, using Student's t-test with a significance value of 0.5 and the hypotheses used for the previous study: the null hypothesis was that the mean sample entropy (SE) of the two populations is equal, and the alternative hypothesis was that the two means differ.
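The two-sample Student statistic behind such a comparison can be sketched as follows. This assumes the pooled-variance (equal variance) form of the test, which the text does not specify. The function name `student_t` is hypothetical, and the two groups are synthetic numbers drawn at random for illustration only; they are not results of the study.

```python
import numpy as np


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    dof = na + nb - 2
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / dof
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return float(t), int(dof)


# Synthetic placeholder groups (NOT study data): group sizes follow the databases.
rng = np.random.default_rng(2)
se_healthy = rng.normal(loc=1.0, scale=0.2, size=54)
se_chf = rng.normal(loc=0.8, scale=0.2, size=44)
t_value, dof = student_t(se_healthy, se_chf)
print(t_value, dof)
```

The null hypothesis of equal means is rejected when the absolute value of t exceeds the critical value of the t distribution for the chosen significance level and degrees of freedom.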
In the comparison for the sleep period, a considerable difference was found between the two averages, and this difference is statistically significant. For the series in which the individuals are awake, in contrast, the gap between the two average values is very small, and the average entropy of healthy people is even lower than that of the patients with CHF, which does not correspond with the values obtained for the sleep period.

Conclusions
Unlike other methods for evaluating entropy, sample entropy does not show alterations in its results due to the length of the series; however, the processing time grows steeply as the series size increases (roughly quadratically for a direct pairwise implementation). The chosen value of r gave reliable results in the calculation of the sample entropy. We recommend the value m = 3, taken from the literature, which is recognized as allowing a satisfactory study of entropy and bringing precision to the method without losing information. Applying the method to the tachograms of healthy people and CHF patients made it possible to differentiate the two groups, but the best results were obtained when the subjects are sleeping; it is believed that this is because the condition of patients with CHF is aggravated at night.