Epileptic seizure detection by using interpretable machine learning models

Objective. Accurate detection of epileptic seizures using electroencephalogram (EEG) data is essential for epilepsy diagnosis, but visual diagnosis by clinical experts is time-consuming. To improve efficiency, several seizure detection methods have been proposed. However, whether they are traditional or machine learning methods, their results only distinguish seizures from non-seizures. Our goal is not only to detect seizures but also to explain the basis for detection and provide reference information to clinical experts. Approach. In this study, we follow the visual diagnosis mechanism used by clinical experts by directly processing plotted EEG image data, and we apply the commonly used LeNet, VGG, deep residual network (ResNet), and vision transformer (ViT) models to the EEG image classification task. Before using these models, we propose a data augmentation method using random channel ordering (RCO), which adjusts the channel order to generate new images. The gradient-weighted class activation mapping (Grad-CAM) and attention layer methods are used to interpret the models. Main results. The RCO method can balance the seizure and non-seizure classes in the dataset. The models achieved good performance in the seizure detection task. Moreover, the Grad-CAM and attention layer methods explained the detection basis of the models well, and a value that measures the seizure degree was calculated. Significance. Processing EEG data in the form of images provides the flexibility to use a variety of machine learning models. The class imbalance problem that exists widely in clinical practice is effectively addressed by the RCO method. Since the method follows the visual diagnosis mechanism of clinical experts, the model interpretation results can be presented to clinical experts intuitively, and the quantitative information provided by the model is also a good diagnostic reference.


Introduction
According to the World Health Organization, epilepsy is a common neurological disorder affecting 50 million people worldwide [1]. For epilepsy patients, the disease is accompanied by various symptoms, such as myoclonus, rigidity, clonus, and atonic seizures [2][3][4]. In addition to these symptoms, epilepsy can affect the future neurodevelopment of infant patients [5]. Various examinations are performed in the diagnosis of epilepsy, of which long-term electroencephalogram (EEG) monitoring is the most important [6,7]. Clinical experts need to visually diagnose the recorded EEG data and make a comprehensive judgment based on the patient's clinical symptoms. However, epileptic seizures are irregular, and the duration of each seizure is often short [8]. In clinical practice, long-term EEG monitoring is performed on the patient, which requires clinical experts to identify seizure features in a large amount of recorded EEG data. Visual diagnosis of EEG is challenging and requires professional training. In practice, it is also a time-consuming process [9]. Thus, there is a need for an EEG seizure detection method that can reduce the workload of clinical experts.
Many methods have been proposed to detect epileptic seizures, and these are mainly divided into traditional and machine learning methods. In the traditional methods, EEG data is first processed using feature extraction, which is designed based on the various features in the seizure and non-seizure EEG data. One such method uses entropy to extract features because epileptic seizures are caused by abnormal brain cell discharge, causing brain signals to contain more energy than usual [10][11][12]. In non-seizure EEG data, the individual frequency bands are relatively stable, whereas the frequency features change dramatically in seizure EEG data. Some methods therefore utilize time-frequency features, for which the short-time Fourier transform and wavelet transform are commonly used [13][14][15]. In the next step, the extracted features are sent to common classifiers for the classification task to achieve seizure detection. Classifiers such as the support vector machine, random forest, naive Bayes, and artificial neural network are commonly used [16]. Although traditional methods based on feature extraction can achieve superior performance, the feature extraction requires manual design, and the methods often do not generalize across different datasets, which limits their use. With the development of machine learning in recent years, end-to-end models have been used for epileptic seizure detection. EEGNet [17] uses one-dimensional convolutional layers to extract EEG features and complete the classification task. Long short-term memory [18] is another commonly used EEG classification model that considers the relationship between earlier and later data in the time series.
Among the above methods, traditional methods require the manual design of feature extraction algorithms and lack data generality. Although end-to-end machine learning methods avoid tedious manual design of feature extraction, such models work as a black box that cannot provide the necessary explanation for each operation step [19]. In this study, we aim to provide a convenient and practical epileptic seizure detection method and to explain the model so that it provides more effective information as a reference to clinical experts. The goal is to improve the work efficiency of clinical experts and increase their confidence. In our method, we follow the visual diagnosis mechanism used by clinical experts and directly process the plotted EEG image data, unlike common methods, in which the recorded EEG is processed as vector or matrix data. First, recorded EEG data is plotted as images that include multiple channels, and the plotted images are similar to those that clinical experts observe on computer screens [20]. Before we apply the image data to the machine learning models, the problem of data class imbalance must be overcome. In clinical practice, during long-term EEG monitoring, epileptic seizures are irregular and short-lived [21,22]. The amounts of seizure and non-seizure data in the recorded EEG data tend to be heavily imbalanced, and imbalanced datasets cannot be directly applied to machine learning models. We therefore propose an image generation method. In computer vision, image rotation, inversion, and cropping are commonly used for data augmentation [23]. For the EEG image, we use random channel ordering (RCO) to generate a new image by simply changing the channel order while not altering the EEG data. The recorded data can be adjusted and balanced using the RCO method.
The usual seizure detection models only output seizure or non-seizure classification results, but in epilepsy, this is not a simple binary choice. In clinical practice, the diagnosis of seizure or non-seizure is based on the waveform features in the EEG data. In addition, the degree of seizure also needs to be analyzed. In this study, the convolutional neural network (CNN) models LeNet [24], VGG [25], and deep residual network (ResNet) [26], together with the vision transformer (ViT) [27] model, are used for the EEG data classification task. For the CNN models (LeNet, VGG, and ResNet), after obtaining a trained model, we use the gradient-weighted class activation mapping (Grad-CAM) method [28] to analyze the model result. The last convolutional layer is the analysis object. By using Grad-CAM, the class activation mapping can be visualized. The ViT model's attention layer is also visualized in the result analysis since it already exists in the model architecture. The visualization results (e.g. class activation mapping and attention mapping) can show the model's classification basis, which provides beneficial reference information for clinical experts. After obtaining the seizure and non-seizure classification results and the activation or attention mapping of the model, we try to provide more helpful information. Because the model's results only provide a binary choice, the seizure degree information, which is essential for diagnosis, is ignored. Therefore, along with the classification result, the classification probability is converted into a value that measures the seizure degree and is also output. The binary seizure and non-seizure results are thus converted into a zero-to-one value, with higher values indicating more severe seizures.
In the experiment, EEG data records from multiple patients are used to evaluate the method, and leave-one-patient-out evaluation is used to verify the performance on general data. Based on the experimental results, the ResNet model achieves excellent classification performance, and the model interpretation provides the model judgment basis and quantitative information on the seizure state of EEG data, which are helpful for clinical experts.
The rest of the paper is organized as follows. Section 2 describes the evaluation EEG dataset and the methods used, including the EEG data plot, RCO, the LeNet, VGG, ResNet, and ViT models, and the model interpretation methods. Section 3 explains the results of EEG data classification, model interpretation, and seizure detection. A discussion of the methods is presented in section 4, and the conclusions are presented in section 5.

Materials and methods
The evaluation dataset used in this study, which included seizure and non-seizure intracranial electroencephalography (iEEG) data, was collected from Juntendo University Hospital. The research was approved by the Ethics Committee of Juntendo University Hospital and the Ethics Committee of Tokyo University of Agriculture and Technology. We followed the visual diagnosis mechanism of clinical experts, and the recorded iEEG data were first plotted as images. Because the clinical data were heavily class imbalanced, the RCO preprocessing was performed to balance the data. Last, the balanced image data were fed to the machine learning models (i.e. LeNet, VGG, ResNet and ViT) for the seizure and non-seizure classification task. The overall structure of the method is shown in figure 1.

Dataset
The dataset used for evaluation in this study contains eight patients, all with mesial epileptic seizures. For each patient, electrodes were primarily located in the hippocampus, lateral temporal and lateral frontal regions (data from eight hippocampus electrodes were used in this study); the sampling rate is 2000 Hz, and the seizure and non-seizure labels were annotated by clinical experts. For each patient, one hour of non-seizure data and some seizure data were used for method evaluation. Seizure and non-seizure data were captured from the records of long-term iEEG monitoring; the half hour of data before and after each seizure was not used. Dataset statistics are listed in table 1.

EEG plot image
In the EEG image plot, following the visual diagnosis mechanism of clinical experts, continuous EEG data is captured with a sliding window and plotted as an image. The plot is based on the EEG wave amplitude, so the image is the same as what clinical experts see in the clinical environment. The time window is 10 s, which is the most common setting in clinical practice [29]. For finer temporal resolution, the images are plotted with a time step of 1 s. Considering the different parameter settings (i.e. volt, millivolt, microvolt) during the data collection process, the data of each patient is normalized first. After the normalization, each patient's data has a mean of 0 and a standard deviation of 1. Each image is 256 × 256 pixels in grayscale and includes eight channels; some image examples are shown in figure 2.
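The windowing and normalization steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the synthetic data, the reduced 200 Hz rate, and the choice to normalize over all channels jointly are assumptions, and the plotting of each segment as a 256 × 256 image is omitted.

```python
import numpy as np

def zscore_per_patient(eeg):
    """Scale a (channels, samples) recording to mean 0 and standard deviation 1.

    Assumption: statistics are computed over all channels of the patient jointly.
    """
    return (eeg - eeg.mean()) / eeg.std()

def sliding_windows(eeg, fs, window_s=10, step_s=1):
    """Yield (start_sample, segment) for every full window in the recording."""
    win, step = window_s * fs, step_s * fs
    for start in range(0, eeg.shape[1] - win + 1, step):
        yield start, eeg[:, start:start + win]

# Synthetic stand-in for a recording: 8 channels, 60 s, sampled at 200 Hz
rng = np.random.default_rng(0)
fs = 200
eeg = zscore_per_patient(rng.normal(5.0, 3.0, size=(8, 60 * fs)))
segments = [seg for _, seg in sliding_windows(eeg, fs)]
```

With a 10 s window and a 1 s step, a recording of T seconds yields T − 10 + 1 windows, so one hour of data gives 3600 − 9 = 3591 windows, which agrees with the raw non-seizure image count per patient reported in table 2.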

Random channel ordering (RCO)
During long-term EEG monitoring, the seizure state is often short-lived, leading to a heavy class imbalance in the recorded EEG data, and the imbalanced EEG data is difficult to use directly in a machine learning model. In the field of computer vision, there are some commonly used methods to generate more image data, such as rotation, inversion and cropping. In clinical practice, clinical experts diagnose the EEG data through visual judgment. In the process of displaying the EEG data, there is no fixed requirement for the display order of the channels, and it is often determined according to the personal habits of clinical experts or the default settings of the EEG display software. We propose the RCO method to generate new EEG images. For an EEG image, a new image is generated each time the channel order is randomly shuffled; the RCO method is shown in figure 3. In the RCO method, we do not change any EEG data value, so all generated images preserve all of the clinical record information.
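A minimal sketch of the channel shuffling behind RCO is given below. The function name and the toy segment are hypothetical, and the sketch operates on the segment array before plotting; whether the original implementation excludes repeated or identity permutations is not stated in the text, so none are filtered out here.

```python
import numpy as np

def random_channel_ordering(segment, n_new, seed=None):
    """Return n_new copies of a (channels, samples) segment with shuffled channel order."""
    rng = np.random.default_rng(seed)
    n_channels = segment.shape[0]
    augmented = []
    for _ in range(n_new):
        order = rng.permutation(n_channels)
        augmented.append(segment[order])  # rows are reordered, values are untouched
    return augmented

# Toy 8-channel segment with 5 samples per channel
segment = np.arange(8 * 5, dtype=float).reshape(8, 5)
new_segments = random_channel_ordering(segment, n_new=15, seed=0)
```

Eight channels allow 8! = 40,320 distinct orderings, so even the most imbalanced patient has far more possible orderings than the number of new images needed per seizure image.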

Convolutional neural network (CNN) models
CNN models have demonstrated strong performance in many fields, especially in computer vision [30]. In this study, we use three commonly used CNN models, LeNet, VGG and ResNet, to classify the EEG image data. LeNet is a simple CNN model trained with the backpropagation algorithm and only includes two convolutional layers and three fully connected layers. VGG is a deep CNN model that includes more convolutional layers; it secured the first and second places in the localization and classification tracks, respectively, of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014. To simplify model optimization, the ResNet model reformulates the layers as learning residual functions with reference to the layer inputs; it secured first place in the ILSVRC 2015 classification task. Considering the scale of the EEG data, the LeNet 5, VGG 11 and ResNet 18 models are used for the EEG image classification task. In the code implementation, the PyTorch framework is used to build the LeNet, VGG and ResNet models. For the VGG and ResNet models, the models pre-trained on ImageNet [31] and provided by PyTorch are used (torch.hub.load('pytorch/vision:v0.10.0', 'vgg11', pretrained=True), torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)).

Vision transformer (ViT) model
The transformer architecture has recently become the most commonly used model in the natural language processing field [32]. The transformer improves the time complexity of model training as the training data can be processed in parallel in the model. Due to the transformer's architecture, its applications in the computer vision field are limited. To apply the transformer model to this field, the study [27] proposes the ViT model, which is a variant of the transformer. Based on the experiments, the ViT model achieves excellent performance compared with state-of-the-art CNN models and is more computationally efficient in the model training process. In our study, the code implementation of the ViT model utilizes ViT-PyTorch (Github: https://github.com/lucidrains/vit-pytorch).
Figure 1. The overall structure of the classification method. EEG data recorded from long-term monitoring are first split by a 10 s window and 1 s step. EEG data is plotted as an image and preprocessed using the RCO method to generate samples in the minor class. Last, the dataset is fed into some models for the classification task.

Model interpretation
For classification tasks, the output of common machine learning models is simply a binary choice (positive or negative). In some fields, binary results are sufficient to complete the classification task; however, in epileptic seizure detection, the classification into seizure and non-seizure is only the most basic information. In clinical practice, the visual diagnosis of an epilepsy EEG is based on the waveform features, which provide the basis for clinical diagnosis. Therefore, our model not only outputs the classification results but also explains the classification basis, so that the model operation process is easier to understand and the confidence of clinical experts increases. There are several methods for interpreting CNN models in classification tasks. The methods in [33][34][35] aim to identify the pixels that have the most impact on the classification result using partial derivatives, guided backpropagation and deconvolution. In the CNN model, the convolutional layer includes spatial information that is destroyed in the fully connected layer. The Grad-CAM method [28] uses the gradient information to achieve a detailed spatial analysis. By using the Grad-CAM method, the ResNet model can be visualized by class activation mapping, which shows the contribution of different parts of the image to the classification result. In our study, the code implementation of Grad-CAM uses [36]. Since the attention layer already exists in the ViT model, we can visualize the attention layer to show the basis of the model classification. The code implementation of the attention layer visualization uses Explainability for Vision Transformers (Github: https://github.com/jacobgil/vit-explain).
After the visual interpretation of the model, we want to provide more effective information to clinical experts as a basis for diagnosis. Although the model result is only a binary outcome of seizure or non-seizure, during classification the model calculates the probability that the input image belongs to each of the seizure and non-seizure classes, and it takes the class with the higher probability as the classification output. The model classification probability reflects the degree of seizure and non-seizure for the input EEG image. Therefore, we use the model's classification probability as a quantitative measure of seizure degree for the input EEG image. For each input EEG image, a value from zero to one is output together with the classification result, and a higher value means a more severe seizure.
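The conversion from model output to a seizure degree can be illustrated with a short sketch. It assumes that the model emits two raw scores (logits), that a softmax turns them into class probabilities, and that index 1 denotes the seizure class; the function names and the example scores are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def seizure_degree(logits, seizure_index=1):
    """Return (predicted_class, degree), where degree is the seizure-class probability."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs[seizure_index]

# Hypothetical model output for one EEG image: [non-seizure score, seizure score]
label, degree = seizure_degree([-1.2, 2.3])
```

With two classes, taking the class with the higher probability is equivalent to thresholding the seizure probability at 0.5, so the degree value carries the class decision plus a graded score.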

Experimental results
To evaluate the performance of the model across patients, leave-one-patient-out evaluation is used. The training data includes the raw seizure data, raw non-seizure data, and RCO seizure data of multiple patients. The test data only includes the raw seizure data and raw non-seizure data of one patient. Because the EEG data is class imbalanced, the accuracy index cannot fully evaluate the performance of the model. In the experiments, the confusion matrix is used to evaluate the model performance. The confusion matrix includes four values: (a) true positives (TP), which is the amount of seizure data correctly classified as seizure, (b) true negatives (TN), which is the amount of non-seizure data correctly classified as non-seizure, (c) false positives (FP), which is the amount of non-seizure data misclassified as seizure, and (d) false negatives (FN), which is the amount of seizure data misclassified as non-seizure. To evaluate the model's performance on the imbalanced data more comprehensively, the following three indexes are used: Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 score = 2TP/(2TP + FP + FN). Precision measures how many of the detected seizure samples are real seizure samples. Recall measures how many seizure samples were successfully detected. The F1 score is the harmonic mean of precision and recall.
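As a small worked example of these indexes, the sketch below computes precision, recall, and the F1 score from confusion-matrix counts, using the standard form F1 = 2TP/(2TP + FP + FN), which equals the harmonic mean of precision and recall. The function name and the counts are hypothetical and are not results from the paper.

```python
def seizure_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1

# Hypothetical counts: 200 seizure images detected, 50 false alarms, 18 missed
precision, recall, f1 = seizure_metrics(tp=200, fp=50, fn=18)
```

None of the three indexes uses TN, which is why they remain informative when non-seizure samples greatly outnumber seizure samples.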

Results of EEG plot image
The EEG data recorded in the clinic is first plotted as EEG images with a 10 s time window, a 1 s step and a scale of 5. The numbers of seizure and non-seizure images for each patient are shown in table 2. The imbalance is the ratio of the number of raw non-seizure images to the number of raw seizure images. The imbalance between non-seizures and seizures varies across patients from 13 to 109. To use the data for model training, the RCO method is used to reduce the data imbalance. The number of RCO executions is determined based on the imbalance of the individual patient's data (imbalance − 1) so that the amounts of seizure and non-seizure data are roughly equal.
Table 2. Number of raw seizure, RCO seizure and raw non-seizure images for each patient.
Patient  Raw seizure  RCO seizure  Raw non-seizure  Imbalance
01       218          3,270        3,591            16
02       228          3,420        3,591            16
03       66           3,498        3,591            54
04       33           3,564        3,591            109
05       81           3,483        3,591            44
06       83           3,486        3,591            43
07       277          3,324        3,591            13
08       214          3,424        3,591            17
Total    1,200        27,469       28,728           39
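The relation between the imbalance and the number of RCO images can be checked with a short calculation. The sketch below assumes that the imbalance is the non-seizure to seizure ratio rounded to the nearest integer, which is an inference that reproduces the values in table 2 rather than a rule stated in the text; the function name is hypothetical.

```python
def rco_plan(n_seizure, n_non_seizure):
    """Return (imbalance, rco_images) for one patient.

    Assumption: the imbalance is rounded to the nearest integer, and each raw
    seizure image is shuffled (imbalance - 1) times.
    """
    imbalance = round(n_non_seizure / n_seizure)
    rco_images = n_seizure * (imbalance - 1)
    return imbalance, rco_images

# Raw seizure image counts per patient, taken from table 2
raw_seizure = {"01": 218, "02": 228, "03": 66, "04": 33,
               "05": 81, "06": 83, "07": 277, "08": 214}
plan = {pid: rco_plan(n, 3591) for pid, n in raw_seizure.items()}
total_rco = sum(rco for _, rco in plan.values())
```

For patient 01, for example, 3,591/218 rounds to 16, and 218 × 15 = 3,270 RCO images, so raw plus RCO seizure images total 3,488 against 3,591 non-seizure images.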

Results of CNN models
The LeNet 5, VGG 11 and ResNet 18 models are used for EEG image classification. In the model training, the cross entropy loss function and the Adam optimizer are used, the batch size is 128 and the number of epochs is 100. The learning rate is tuned over the values (5 × 10^−5, 1 × 10^−5, 5 × 10^−6, 1 × 10^−6, 5 × 10^−7, 1 × 10^−7). The learning rate is fixed for the first fifty epochs; after the fiftieth epoch, it is decayed by a factor of 0.995. The experiment is performed by using the leave-one-patient-out method and the results are shown in table 3. The experimental results are the average of the last ten epochs, and the last two rows are the mean and standard deviation of the results for the eight patients. From the experimental results, the ResNet model achieved the best classification results.
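The learning rate schedule can be sketched as a plain function. The text does not state whether the factor of 0.995 is applied once per epoch or once per optimization step; the sketch assumes once per epoch, and the function name and the chosen base learning rate are illustrative only.

```python
def learning_rate(epoch, base_lr, hold_epochs=50, decay=0.995):
    """Learning rate for a 0-indexed epoch.

    Assumption: constant for the first hold_epochs epochs, then multiplied by
    decay once per epoch.
    """
    if epoch < hold_epochs:
        return base_lr
    return base_lr * decay ** (epoch - hold_epochs + 1)

# Schedule over 100 epochs for one of the tuned base values
schedule = [learning_rate(e, base_lr=1e-5) for e in range(100)]
```

Under this per-epoch assumption, the learning rate in the final epoch is 0.995^50 times the base value, roughly 78% of it, so the decay is mild.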
In the previous experiments, we used RCO images and pre-trained models (VGG and ResNet) in model training. We therefore perform two comparative experiments, one on the RCO images and one on the pre-trained model. Since the ResNet model shows the best classification result in table 3, it is used for both the RCO and the pre-training validation. Because clinically collected data is often imbalanced, we need to balance the dataset before model training. To verify the effectiveness of the RCO method, we compare it with balancing the dataset by replicating images. From the results shown in table 4, RCO achieved better classification performance. The ResNet model is also used to verify the effectiveness of pre-training; the pre-trained model provided by PyTorch is trained on the ImageNet dataset.
Before the image plot process, EEG data is normalized with a mean of 0 and a standard deviation

Results of ViT model
The ViT model is used for EEG image classification and the model architecture is as follows: image_size = 256, patch_size = 32, num_classes = 2, dim = 1024, depth = 6, heads = 16, mlp_dim = 2048, channels = 1, dropout = 0.1, and emb_dropout = 0.1. In the model training, as in the LeNet, VGG and ResNet experiments, the cross entropy loss function and the Adam optimizer are used. The batch size is 256 and the number of epochs is 100. The learning rate is also tuned with a grid search over (5 × 10^−5, 1 × 10^−5, 5 × 10^−6, 1 × 10^−6, 5 × 10^−7, 1 × 10^−7). In the first fifty epochs, the learning rate is fixed; after the fiftieth epoch, the learning rate is decayed by a factor of 0.995. The experiment is performed by using the leave-one-patient-out method and the results are shown in table 6. The experimental results are the average of the last ten epochs, and the last two rows are the mean and standard deviation of the results for the eight patients.
All patient data are evaluated by the models of LeNet, VGG, ResNet and ViT respectively, and their violin plots are shown in figure 5.

Results of one patient training
In the previous experiments, all models are evaluated by the leave-one-patient-out method. From the experimental results, all the models have achieved good detection performance. To further test the model performance under harsh conditions, we use the opposite of the leave-one-patient-out evaluation method. In this experiment, only one patient's data (raw seizure, RCO seizure, and raw non-seizure samples) is used as training data to train the model, and all the other patients' data (raw seizure and raw non-seizure samples) is used as test data. All the parameter settings for model training are the same as in the previous experiments. The experimental results are shown in table 7. Compared with the leave-one-patient-out evaluation, the size of the training data is greatly reduced, that of the test data is increased, and the performance of the models has decreased.
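The two evaluation protocols differ only in which side of the split holds a single patient, which the following sketch makes explicit. The function names and the patient identifiers are illustrative; loading the images and adding RCO samples to the training side are omitted.

```python
def leave_one_patient_out(patient_ids):
    """Yield (train_ids, test_ids): train on all patients but one, test on that one."""
    for test_id in patient_ids:
        yield [p for p in patient_ids if p != test_id], [test_id]

def one_patient_training(patient_ids):
    """Yield (train_ids, test_ids): train on a single patient, test on all others."""
    for train_id in patient_ids:
        yield [train_id], [p for p in patient_ids if p != train_id]

patients = [f"{i:02d}" for i in range(1, 9)]  # eight patients, "01" to "08"
lopo_folds = list(leave_one_patient_out(patients))
single_folds = list(one_patient_training(patients))
```

In both protocols no patient appears on both sides of a split, so the reported performance reflects generalization to unseen patients.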

Results of model interpretation
After using the ResNet and ViT models, the EEG images can be well classified (seizure or non-seizure). To mine more useful information from the model and provide it to clinical experts, we interpret the model to obtain the class activation mapping and attention layer visualization. For the ResNet model, we use Grad-CAM to analyze the class activation mapping on the last convolution layer. Some class activation mapping results are shown in figure 6 (upper). Based on the results, the classification basis (blue color) is consistent with the EEG waveform features. For the ViT model, the attention layer exists in the model architecture and is extracted for visualization; some results are shown in figure 6 (lower). The attention layer also shows the blue portion as the classification basis and is consistent with the EEG features.
In some commercial EEG software, seizure detection is also possible through trend (time-frequency) analysis. Seizure EEG includes more high-frequency components, and it is a key indicator of clinical
After softmax, the model output $V$ is normalized to $\hat{V}$, whose elements sum to one, and $\hat{v}_i$ corresponds to the probability that the input EEG image is classified as class $i$. If $\hat{v}_i$ is more than 0.5, the input EEG image is classified as class $i$. For a typical seizure or non-seizure image, the value of $\hat{v}_i$ approaches one or zero, and it is used to quantify the degree of seizures in the EEG image. Some quantified results are shown in figure 8. The quantitative scores of the images correspond to the degree of EEG seizures. The class activation mapping, attention mapping and quantitative scores are provided to clinical specialists as reference information that is useful for EEG visual diagnosis.

Results of seizure detection
All model classification results are presented in the previous experiments. In this section, we present the seizure detection performance of the method on clinical EEG data. Differences exist between the test data used in the classification task and the clinical data.
The clinical data comes from long-term EEG monitoring, but in preparing the training and test datasets, the half hour of data before and after each seizure is not used. To demonstrate the effectiveness of the proposed method on clinical data, a segment of continuous data containing ten minutes of data before and after the seizure is used, and the results of the model on the continuous data are displayed graphically through the software shown in figure 9. This is brainwave software built by our team that can display, annotate and analyze EEG data and perform other functions. In figure 9, the continuous data of Patient 1 is used to show the seizure detection performance; the red lines are the start and end time labels annotated by clinical experts, and the results of the two models on the continuous data are shown in two time bars at the bottom of figure 9. In the blue and pink time bars, data from ten minutes before to ten minutes after the seizure (20 min 42 s) are plotted as images for classification; a few seconds after the end of the seizure are also classified as seizure data.
In this experiment, a 10 s window and a 1 s step are used for the image plot. As a result, some of the plotted images contain both seizure and non-seizure data. After the seizure, there are nine images containing both seizure and non-seizure data, and the waveform of each image is shown in figure 10. If the image contains more seizure data than non-seizure data, the model classifies it as a seizure. Otherwise, it is classified as non-seizure.
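The count of nine mixed images follows from the window geometry, as the sketch below shows. It computes the fraction of each window that overlaps the seizure and applies the majority rule described above as an idealized label; it does not model the classifier itself. The onset at 600 s and offset at 642 s are assumptions inferred from the 20 min 42 s segment with ten minutes on each side, and the function name is hypothetical.

```python
def seizure_fraction(window_start, window_s, seizure_start, seizure_end):
    """Fraction of the window [window_start, window_start + window_s) inside the seizure."""
    window_end = window_start + window_s
    overlap = max(0, min(window_end, seizure_end) - max(window_start, seizure_start))
    return overlap / window_s

window_s, step_s = 10, 1
seizure_start, seizure_end = 600, 642  # assumed onset and offset in seconds

# Windows around the seizure offset that contain both seizure and non-seizure data
mixed_after = []
for start in range(seizure_end - window_s, seizure_end + 1, step_s):
    frac = seizure_fraction(start, window_s, seizure_start, seizure_end)
    if 0 < frac < 1:
        mixed_after.append((start, frac, frac > 0.5))  # True means majority seizure
```

The nine mixed windows have seizure fractions from 0.9 down to 0.1; under the majority rule the four with fractions above 0.5 would be labeled seizure, which is consistent with a few seconds after the annotated end still being marked as seizure.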

Discussion
This study primarily aims to reduce the workload of clinical experts in the visual diagnosis of EEG data. Clinical experts need to find seizure data in long-term EEG monitoring data to diagnose the condition of epilepsy patients, but this step usually takes considerable time. Some methods have been proposed to detect epileptic seizures, and these methods are divided into traditional and end-to-end methods according to whether manual feature extraction is performed. In traditional methods, EEG data is first processed with feature extraction and then classified by using some classifiers [13][14][15][16]. In the end-to-end method [17,18], feature extraction and classification are fused into one step by using the machine learning model. Comparing the two, the traditional method can explain the corresponding medical significance, but its feature extraction requires manual design and lacks data generality. The end-to-end method operates like a black box; even when its performance is high, it is difficult for clinical experts to have high confidence in a black box model. Following the mechanism of clinical experts in visual diagnosis, we use EEG images as the object of data analysis. Recorded EEG data is first plotted as images. Due to the heavy imbalance of seizure and non-seizure data, it is hard to directly apply the machine learning model. As shown in figure 3, we propose an EEG image generation method (RCO) that simply randomizes the channel order. After the RCO process, the EEG image data is balanced. In the experiments, ResNet and ViT models are used for the classification task. Tables 3 and 6 reveal that the two models performed excellently in the leave-one-patient-out test.
Unlike in other fields, the application of computer methods in the medical field should be more cautious. We interpret and visualize the model operation to provide more effective information to clinical experts for diagnosis. Grad-CAM [36] is used for the ResNet model and attention layer visualization is used for the ViT model; the class activation mappings in figure 6 (upper) and the attention mappings in figure 6 (lower) are consistent with the EEG waveform features. Along with the classification results of the model, the visualization outcomes can clearly explain the operating mechanism of the models and increase the confidence of clinical experts in the models. In addition, the classification probability in the last layer is also output, and the classification probability can be considered as the degree of seizure information in the EEG image. In the quantitative score results (figure 8), EEG images of different seizure degrees obtained different quantitative scores. Because a 10 s time window is used to plot images, some images include seizure and non-seizure data at the same time; an image in which more than half of the data is seizure data is classified as seizure data.
Compared with the traditional methods, our method avoids manually designing feature extraction. For the black box problem in the end-to-end method, we provide more effective information with the classification results, including the model interpretation (class activation mapping and attention layer visualization) and quantification scores; these are helpful reference information for the clinical expert diagnosis.

Conclusion
To improve the efficiency of clinical experts in EEG visual diagnosis, we use EEG plot images as input data for machine learning models (LeNet, VGG, ResNet and ViT) and address the data imbalance problem, which is widespread in the clinic, by using the RCO method. Based on the model classification results, we use Grad-CAM and attention layer visualization to explain the models and analyze the classification basis. In addition, the model's classification probability is used as the seizure degree of the EEG image. The classification results, model visualization, and seizure quantification score from the experiments are effective reference information for clinical experts.

Data availability statement
The data generated and/or analysed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.