Literature Review on Sleep Apnea Analysis by Machine Learning Algorithms Using ECG Signals

Obstructive sleep apnea is a respiratory disorder that impairs sleep quality by causing repeated pauses in breathing. An irregular delay in breathing or a decrease in airflow during sleep is the hallmark of the syndrome. According to the literature, approximately 2% of middle-aged women and 4% of middle-aged men are affected. The disease is conventionally diagnosed by a physician in two stages, beginning with a review of patient records obtained with a polysomnography (PSG) system. The drawbacks of this procedure motivate new diagnostic processes and equipment. In the work reviewed here, electrocardiography (ECG) recordings are obtained from patient and control groups and cleaned with a digital filter, after which the heart rate variability (HRV) signal is derived from the ECG. Features are extracted from both the ECG and HRV signals, reduced by a feature selection process, and classified with machine learning techniques: random forest, support vector machine (SVM), and k-nearest neighbours (kNN). To evaluate the classifiers, sensitivity, specificity, and per-class accuracy on the test set were computed, and a receiver operating characteristic (ROC) curve was constructed. Using 11 ECG and HRV features, random forest, SVM, and kNN achieved best accuracies of 82.5%, 97%, and 89%, respectively. With these success rates, a practical sleep/wake detection method can be implemented.
This means that, using machine learning and signal processing methods, the ECG signal can be used to diagnose obstructive sleep apnea.


INTRODUCTION:
Sleep apnea is a sleep-related breathing disorder that causes respiratory problems and arousals due to a reduction or pause in airflow during sleep. Obstructive sleep apnea (OSA) is the most common form of sleep apnea. This disease is commonly screened for in polysomnography (PSG) laboratories. The test takes an entire night and is expensive, making it unaffordable for many people. As a result, new methods for detecting sleep apnea have been developed, and many algorithms are now used to lower the cost and make screening accessible to a wider range of people. The aim of this project is to analyse and detect this disease using ECG signals obtained from PhysioNet and various machine learning algorithms. Untreated sleep apnea causes people to stop breathing repeatedly during the night, often hundreds of times.
Sleep apnea, if left untreated, may lead to a variety of health issues, including hypertension (high blood pressure), stroke, cardiomyopathy (enlargement of the heart muscle tissue), heart failure, diabetes, and heart attacks. It may also cause impaired job performance, work-related injuries, and automobile accidents, as well as academic underachievement.
Obstructive and central apnea are the two forms of sleep apnea:
• Obstructive sleep apnea is the more serious of the two. It is a condition in which the upper airway is repeatedly blocked, either completely or partially. During an apneic episode, the diaphragm and chest muscles work harder to keep the airway open as the pressure rises; breathing normally resumes with a loud gasp or body shake. These episodes can disrupt sleep, reduce oxygen delivery to vital organs, and cause irregular heart rhythms.
• In central sleep apnea, the airway is not blocked, but the brain fails to signal the muscles to breathe due to dysfunction in the respiratory control centre. Central apnea is therefore related to the function of the central nervous system.
The ECG electrode placement was identical to that of a sleep laboratory. One-minute samples were used to speed up the feature extraction process in this study, and the best 40 samples were selected for review.

Feature extraction:
The QRS complex (R peak) is detected from the sleep apnea ECG signals, and feature extraction is performed with the following methods in this project:
i. R peak detection
ii. Pan-Tompkins algorithm

R Peak detection method:
The frequency content of ECG signals is often non-stationary, which means that it varies over time. Wavelets are used to decompose such signals into time-varying frequency (scale) components.
Because signal features are often localized in time and frequency, working with sparser (reduced) representations makes measurement and prediction easier.
The QRS complex is made up of three deflections in the ECG waveform. The QRS complex, which reflects the depolarization of the right and left ventricles, is the most prominent feature of the human ECG.
Consider an ECG waveform in which the R peaks of the QRS complex have been annotated by two or more cardiologists. For applications such as R-R interval estimation, wavelets can be used to build an automated QRS detector.
Wavelets may be used as general feature detectors: they divide signal components into various frequency ranges, allowing a sparser representation of the signal. In practice, one chooses a wavelet that resembles the feature being sought.
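The wavelet-based detection idea above can be sketched as follows. This is a minimal illustration, not the study's actual detector: it correlates the signal with a Ricker (Mexican-hat) wavelet, whose shape roughly resembles a QRS complex, and picks peaks in the response. The synthetic ECG, sampling rate, and thresholds are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def ricker(points, a):
    # Mexican-hat (Ricker) wavelet, whose shape is similar to a QRS complex
    t = np.arange(points) - (points - 1) / 2.0
    return (1 - (t / a) ** 2) * np.exp(-t**2 / (2 * a**2))

def detect_r_peaks(ecg, fs):
    # Correlate the ECG with a wavelet whose width matches a QRS complex
    # (~100 ms); the zero-mean wavelet also suppresses baseline wander.
    width = int(0.1 * fs)
    w = ricker(2 * width, width / 4)
    resp = np.convolve(ecg, w, mode="same")
    # Peaks must be at least 300 ms apart and above half the max response.
    peaks, _ = find_peaks(resp, distance=int(0.3 * fs),
                          height=0.5 * resp.max())
    return peaks

# Synthetic demo: narrow Gaussian "R peaks" once per second at 250 Hz,
# plus slow baseline wander
fs = 250
t = np.arange(0, 10, 1 / fs)
ecg = sum(np.exp(-((t - c) ** 2) / (2 * 0.01 ** 2)) for c in np.arange(0.5, 10, 1.0))
ecg = np.asarray(ecg) + 0.05 * np.sin(2 * np.pi * 0.3 * t)
peaks = detect_r_peaks(ecg, fs)
print(len(peaks))  # 10 beats
```

Detected peak indices can then feed directly into R-R interval (and hence HRV) estimation.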

Pan Tompkins algorithm:
• A band pass filter is formed by cascading a low pass filter and a high pass filter. The low pass filter's aim is to suppress high-frequency noise.
• Integer arithmetic is used in the algorithm, allowing it to run in real time without requiring excessive computational power. Two ECG channels are available at the same time in the database.
• The filter architecture uses digital filters with integer coefficients, allowing real-time processing speeds; floating-point computation would require a higher processing speed.
• QRS energy is maximized by using a pass band of around 5 to 15 Hz. As seen in the diagram, the filter is an integer filter with poles that cancel its zeros.
• After band pass filtering, the signal is differentiated and then squared. Squaring is a non-linear processing step: it makes all values positive and emphasizes the higher frequencies of the ECG signal, where QRS complexes occur. The slope of the R wave alone, however, is not a fully reliable way to detect the QRS complex.
• Abnormal QRS waves with long durations and high amplitudes may be present in the ECG signal; such waves cannot be detected using just the slope of the R wave, so they are detected with the aid of a moving-window integrator.
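The stages above can be sketched in Python as follows. This is a simplified sketch: the fixed detection threshold stands in for the adaptive dual thresholds of the original Pan-Tompkins algorithm, and the input ECG, sampling rate, and window lengths are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def pan_tompkins(ecg, fs):
    """Simplified Pan-Tompkins QRS detector (fixed threshold rather than
    the adaptive dual-threshold logic of the original algorithm)."""
    # 1) Band pass 5-15 Hz to maximize QRS energy
    b, a = butter(2, [5 / (fs / 2), 15 / (fs / 2)], btype="band")
    x = filtfilt(b, a, ecg)
    # 2) Differentiate to emphasize the steep QRS slopes
    x = np.diff(x, prepend=x[0])
    # 3) Square: all values become positive, large slopes are amplified
    x = x ** 2
    # 4) Moving-window integration (~150 ms window)
    win = int(0.15 * fs)
    x = np.convolve(x, np.ones(win) / win, mode="same")
    # 5) Peak picking with a 300 ms refractory period
    peaks, _ = find_peaks(x, distance=int(0.3 * fs), height=0.4 * x.max())
    return peaks

# Synthetic demo: one narrow spike per second for 10 s at 250 Hz
fs = 250
t = np.arange(0, 10, 1 / fs)
ecg = sum(np.exp(-((t - c) ** 2) / (2 * 0.01 ** 2)) for c in np.arange(0.5, 10, 1.0))
beats = pan_tompkins(np.asarray(ecg), fs)
print(len(beats))  # number of detected beats
```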

Machine Learning Algorithm
The following are the algorithms used in this study:

Support vector machine (SVM):
Support Vector Machine (SVM) is a supervised machine learning algorithm for solving classification and regression problems, although it is mostly used for classification. In the SVM algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that best separates the two classes (see the snapshot below). Support vectors are the coordinates of the individual observations closest to this boundary, and the SVM classifier is a frontier (a hyperplane, or a line in two dimensions) that separates the two classes by as wide a margin as possible.
For linearly separable classes, constructing a linear hyperplane between them is straightforward. A pressing question is whether separating features must be added manually to obtain such a hyperplane. They need not be: the SVM algorithm uses a technique called the kernel trick, a function that transforms a low-dimensional input space into a higher-dimensional space, turning a non-separable problem into a separable one. It is most useful in non-linear separation problems.
Simply put, the algorithm performs a series of complex data transformations and then determines how to separate the data according to the labels or outputs you specify.
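A minimal sketch of the margin idea, assuming the linear case (a kernel SVM, as used for non-linear problems, would replace the dot products with a kernel function). The training routine below is plain sub-gradient descent on the hinge loss; the toy data and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM trained by sub-gradient descent on the regularized
    hinge loss; labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                      # points violating the margin
        grad_w = lam * w - (y[mask] @ X[mask]) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
# Two well-separated clusters, labels -1 and +1
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
print((pred == y).mean())  # training accuracy
```

The points with `margins < 1` are exactly the support-vector candidates: only they contribute to the gradient, which is why the final hyperplane depends on the boundary points rather than the whole cloud.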

k-Nearest Neighbour algorithm (k-NN):
The k-Nearest Neighbour algorithm is one of the most common machine learning algorithms, and it is built on the supervised learning approach. The algorithm assumes that the new case and existing cases are comparable, and it assigns the new case to the category most similar to the existing ones. It stores all available data and categorizes new data points based on how close they are to the stored data, which means that the k-NN algorithm can easily classify new data into a well-defined group.
The k-NN algorithm can be applied to both regression and classification problems, but it is most widely used for classification. It is a non-parametric algorithm, meaning it makes no assumptions about the underlying data. It is also known as a lazy learner algorithm because it does not learn from the training set straight away; instead, it stores the dataset and acts on it only at classification time. During training, the kNN algorithm simply stores the dataset, and when new data is received, it classifies it into the category most similar to that new data.
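The store-then-vote behaviour described above can be sketched in a few lines. This is an illustrative numpy implementation with toy data (the two-cluster labels 0/1 standing in for "normal" and "apnea" classes are an assumption, not the study's features):

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    """Classify each row of X_new by majority vote among its k nearest
    training points (Euclidean distance). No training step: the data
    itself is the model, which is why kNN is called a lazy learner."""
    preds = []
    for x in np.atleast_2d(X_new):
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dist)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Toy data: two clusters labelled 0 ("normal") and 1 ("apnea")
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, [[0.05, 0.1], [3.0, 3.2]]))  # [0 1]
```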

Random forest algorithm:
Random forest is a supervised learning algorithm. It creates a "forest" out of an ensemble of decision trees, which are normally trained using the "bagging" process. The bagging method's basic premise is that combining different learning models improves the overall outcome. Simply put, random forest combines several decision trees to produce a more accurate and stable prediction. Random forest has the benefit of being able to solve both classification and regression problems, which are common in today's machine learning systems. When splitting a node, the algorithm considers only a random subset of the features, which adds extra randomness; the resulting diversity among trees generally yields a more accurate model. Trees can be made even more random by using random thresholds for each feature instead of searching for the best possible thresholds (as a normal decision tree does).

In the reviewed study, a numerical filter was designed and applied in the first step to remove artefacts and noise that occurred during signal acquisition. The sample PPG signals of the sleep/wake states obtained after the filtering phase show a visible difference between the classes. After the signal had been cleaned, the HRV signal was derived in order to make better use of the ECG signal. Following that, the ECG and HRV signals were subjected to a feature extraction procedure, and the extracted features were classified. Classification performance was determined using standard evaluation criteria.
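The bagging-plus-random-feature-subset idea described in the random forest section can be sketched as follows. This is a deliberately simplified illustration: depth-1 "stumps" stand in for full decision trees, and the data and hyperparameters are assumptions for the demo.

```python
import numpy as np

def fit_stump(X, y, feat_idx):
    """Best single-feature threshold split, searched only over a random
    subset of features (a depth-1 stand-in for a full decision tree)."""
    best = (0.0, 0, 0.0, 1)  # (accuracy, feature, threshold, sign)
    for j in feat_idx:
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(sign * (X[:, j] - thr) > 0, 1, 0)
                acc = (pred == y).mean()
                if acc > best[0]:
                    best = (acc, j, thr, sign)
    return best[1:]

def random_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, n)                           # bootstrap sample
        feats = rng.choice(d, max(1, d // 2), replace=False)  # feature subset
        stumps.append(fit_stump(X[idx], y[idx], feats))
    def predict(Xq):
        # Majority vote over all stumps
        votes = np.array([np.where(s * (Xq[:, j] - t) > 0, 1, 0)
                          for j, t, s in stumps])
        return (votes.mean(axis=0) > 0.5).astype(int)
    return predict

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.2, (40, 2)), rng.normal(1, 0.2, (40, 2))])
y = np.concatenate([np.zeros(40, int), np.ones(40, int)])
predict = random_forest(X, y)
print((predict(X) == y).mean())  # training accuracy
```

Each stump sees a different bootstrap sample and a different feature subset, so individual errors tend to cancel in the vote; replacing stumps with deeper trees gives the standard random forest.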
A total of 11 features is extracted from the ECG signal. Since some of the features derived from the ECG and HRV signals were common to both, common expressions were used in places.
The signal is divided into intervals based on the detected local minimum and maximum points, with local minimum points defined at the signal's start and end. peakAmp and valleyAmp denote the local maximum and minimum points of the ECG signal, respectively. The PPI was calculated as the time difference between successive peakAmp points. If a peakAmp point was missed, the following valleyAmp point was discarded, so peakAmp and valleyAmp points were only ever observed in pairs. Finally, the PWA was calculated by subtracting valleyAmp from the preceding peakAmp; because unpaired points were discarded, no error was introduced into the PWA value.
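As a sketch, the PPI and PWA computations described above might look like the following, assuming peak and valley detections have already been paired (the function name, sampling rate, and toy values are illustrative assumptions):

```python
import numpy as np

def ppi_pwa(peak_idx, peak_amp, valley_amp, fs):
    """Compute PPI (seconds between successive peakAmp points) and
    PWA (peakAmp minus its paired valleyAmp) from paired detections.
    peak_idx: sample indices of peakAmp points
    peak_amp / valley_amp: amplitudes of paired peaks and valleys"""
    peak_idx = np.asarray(peak_idx, dtype=float)
    ppi = np.diff(peak_idx) / fs                         # inter-peak times
    pwa = np.asarray(peak_amp) - np.asarray(valley_amp)  # peak minus valley
    return ppi, pwa

# Toy example at fs = 100 Hz: three peaks exactly 1 s apart
ppi, pwa = ppi_pwa([100, 200, 300], [1.0, 1.1, 0.9], [0.2, 0.3, 0.1], fs=100)
print(ppi)  # [1. 1.]
print(pwa)  # [0.8 0.8 0.8]
```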

CONCLUSION:
The k-Nearest Neighbours classification algorithm is one of the supervised learning methods for solving classification problems. It computes the similarity between the data to be classified and the normal-behaviour data in the learning set, and classification is achieved by comparing the average of the k nearest data points against a threshold value. Importantly, the role of each class must be transparently described in advance. Classification performance depends on the number k of nearest neighbours, the threshold value, the similarity calculation, and the inclusion of sufficiently typical behaviours in the learning set.
SVMs have proven to be among the most successful learning algorithms for classification. They are useful not only in classification problems but also in regression analysis.
In essence, SVMs construct boundaries that can divide two sets linearly or nonlinearly. The algorithm's aim is to classify new data with low error rates by separating the data sets with a hyperplane. The learning data nearest to the hyperplane are called support vectors. With accuracies of 82.5%, 97%, and 89%, respectively, random forest, support vector machine, and kNN all performed well, with SVM the most accurate.