Prediction of mild cognitive impairment using EEG signal and BiLSTM network

Mild cognitive impairment (MCI) is a cognitive condition that primarily affects elderly persons. Patients with MCI have impairments in one or more cognitive areas, such as memory, attention, language, and problem-solving. The risk of developing Alzheimer’s disease is 10 times higher among individuals who meet the MCI diagnosis than among those who do not. Identifying the primary neurophysiological differences between people with cognitive impairment and those who are ageing normally may provide helpful techniques for assessing the effectiveness of therapies. Event-related potentials (ERPs) are used to investigate the processing of sensory, cognitive, and motor information in the brain. An ERP is a type of electroencephalography (EEG) signal that is time-locked to a specific event or behavior, and it offers excellent temporal resolution of the underlying brain activity. ERP data are complex because of their variation in the time domain. To remove artifacts from the data, this work uses independent component analysis, a finite impulse response filter, and the fast Fourier transform as preprocessing techniques. A bidirectional long short-term memory network is used to retain the spatial relationships in the ERP data while learning long-range changes in temporal information. This network performed well in both modeling and information extraction from the signals. To validate the model performance, the proposed framework is tested on two benchmark datasets. It achieved state-of-the-art accuracies of 96.03% on the SJTU Emotion EEG Dataset and 97.31% on the Chung–Ang University Hospital EEG dataset for the classification tasks.


Introduction
Mild cognitive impairment (MCI) is a condition that affects an individual's memory and thinking abilities. Early diagnosis and prediction of MCI can support timely intervention and treatment planning and can reduce the chance of progression to Alzheimer's disease (AD). The use of electroencephalography (EEG) signals to detect MCI and AD has increased in recent years. EEG signals are recordings of the electrical activity of the human brain and provide insight into the brain's functioning. EEG has been recognized as a useful tool for investigating time-sensitive neurodynamic biomarkers that can reveal cortical abnormalities associated with cognitive decline and dementia [1]. EEG is not only useful for recognizing cortical changes related to cognitive decline and dementia, but is also advantageous because of its availability and efficiency compared to other imaging devices [2]. The accuracy of classifying AD and MCI as a single group could be improved by adding EEG to a logistic model that also includes neuropsychological evaluation and cardiovascular history [3]. EEG is used extensively in clinical and research settings to assess and monitor cognitive functions, for example in the recognition and management of neurological and psychiatric disorders such as epilepsy, dementia, AD, and depression [4]. Its cost-effectiveness makes it an accessible option for clinicians and researchers, and its excellent temporal resolution makes it well suited to examining the timing of neurological events related to cognitive processes [5]. Improved diagnostic techniques are essential for the early diagnosis of AD because they can significantly improve patient care. When EEG signals from AD patients are compared with those from healthy aged people, two main differences appear. The first is the 'slowing effect', a rise in low-frequency power accompanied by a decline in high-frequency power. The second is the loss of synchronization between adjacent pairs of EEG signals [6].
The research in [7] suggests that machine learning (ML) algorithms can analyze quantitative EEG (QEEG) response data to forecast long-term neurological prognosis among individuals with hypoxic-ischemic brain damage. Both the random forest (RF) model and the logistic regression approach performed similarly to EEG response assessments made by a professional electroencephalographer. Before this, the diagnostic process relied on parameters such as blood tests, spinal fluid analysis, MRIs, and neurological examinations, which were error-prone and laborious because the results were drawn manually from so many parameters. Both ML and deep learning (DL) algorithms have played a vital role in this area and have attracted significant research effort [8]. These algorithms extract features from EEG signals, learn patterns from both normal and abnormal subjects during training, and then automatically classify unseen EEGs as normal, MCI, or AD during inference (testing). ML methods are well suited to analyzing EEG data because they can handle large, complex datasets and produce predictions by recognizing patterns. AD is one of the most common and serious diseases related to brain function. It was first described in 1906 by the German psychiatrist and pathologist Alois Alzheimer. Its symptoms include amnesia, identification issues, cognitive deficits, perceptual problems, speech problems, reading and writing issues, and behavioral changes. In the study [9], AD was identified using three electrodes and short data acquisition times by combining frequency and triple-correlation procedures. EEG signal frequencies range from 0.01 Hz to about 100 Hz and are classified into five frequency bands, shown in table 1 [10].
The finite impulse response (FIR) filtering method is used to remove noise from EEG data. We remove noise by adjusting filter parameters such as the cutoff frequency and the filter order. ICA is a blind source separation approach used to separate a mixed signal into substantially distinct components. In this work, the Bi-LSTM network is employed, which is a novel approach for MCI and AD prediction. It can process sequential information and capture both past and future dependencies in the EEG data. By combining EEG signals with a Bi-LSTM network, this work aims to create a tool for MCI and AD prediction that provides accurate and reliable results. Overall, the approach is intended to support the initial identification and management of MCI, thus enhancing the standard of living of affected individuals and reducing the chance of AD.

Related work
Human EEG signals are used to measure electrical activity in the brain and to gain insight into neurological processes. The study in [11] provides an in-depth discussion and analysis of EEG signals. The ERP technique is a method used in cognitive neuroscience that involves measuring the electrical activity of the brain that occurs when people are presented with stimuli and asked to perform tasks or make decisions [12]. ERPs show a correlation between verbal memory measures and the amplitude of congruous late positive components when words are repeated, which could be used as an indicator for predicting incipient AD [13]. A review by [14] highlights the potential of ERPs as biomarkers for AD-related neuropathology. ERPs are time-limited electrical brain responses triggered by specific sensory, cognitive, or motor events. Cognitive ERPs, which occur 150 ms after stimulation, are thought to reflect cognitive activities such as memory, attention, and decision-making. The review further suggests that cognitive ERPs can distinguish individuals with MCI, individuals with AD, and cognitively intact carriers of the apolipoprotein E ε4 allele (ε4+) from normal older adults. The apolipoprotein E (APOE) ε4 allele is a well-established genetic risk factor for AD.
Dementia and cognitive problems are more likely to appear in stroke survivors. In the study published in [15], working memory tests were conducted on 15 healthy volunteers, 15 individuals with stroke-related MCI, and five patients with vascular dementia. The study in [16] evaluated the level of novelty of EEG signals as a characteristic for identifying EEG recordings, whereas many other studies analyze the power spectral density (PSD) or frequency content of EEG signals. The use of statistical pattern recognition (SPR) techniques on QEEGs to classify the transition to dementia in people with subjective cognitive decline (SCD) and MCI has been discussed by [17]. To pinpoint MCI, [18] used specific features extracted from sleep slow waves and spindles in combination with complexity and spectral values. The study in [19] described a novel method of extracting features from EEG signals to distinguish between AD subjects, MCI subjects, and normal people. The use of DL methods such as long short-term memory networks (LSTM), Bi-LSTM, recurrent neural networks (RNN), convolutional neural networks (CNN), and gated recurrent units (GRU) in healthcare applications has been briefly discussed by [20]. An automated DL-based method for detecting sleep apnea from EEG signals was presented by [21]; it extracts features using a variational mode decomposition algorithm and models the temporal evolution of the EEG modes using fully convolutional and bidirectional long short-term memory (BiLSTM) layers. A hybrid method combining handcrafted and learned (auto-encoded) features to achieve high performance in EEG-based epileptic seizure detection has been introduced by [22]. RNN algorithms, specifically basic LSTM and BiLSTM models, were used by [23] and evaluated on four different classification problems. A two-stage LSTM framework has been introduced by [24]. The first stage is a multi-class classification task that identifies the condition of the patient, i.e. whether the patient is healthy or suffering from MCI, AD, or another condition. The second stage uses a regression task to predict conversion times for MCI patients. Two DL algorithms, a convolutional autoencoder neural network and a modified CNN, were used by [25] to classify subjects into MCI, AD, and normal groups. LSTM is a well-known algorithm for modeling sequential data and can capture long-term dependencies [26]. An LSTM network was used to categorize patient diagnoses from electronic health records gathered at Children's Hospital Los Angeles. According to the authors, it was the first experimental approach to use LSTM for multivariate disease classification from time-based patient data collected in the pediatric intensive care unit (PICU). The LSTM consisted of two layers of 128 neurons with a dropout probability of 0.5 between layers, and it predicted 128 diagnoses, with target replication, from 13 irregularly sampled time-based values. The authors compared the LSTM model with basic logistic regression and a multilayer perceptron with three hidden layers. The LSTM surpassed these baselines, with micro and macro AUCs of 0.8560 and 0.8075, respectively [27]. The researchers also suggested that the LSTM model could be used to predict future patient conditions and treatments.
The Bi-LSTM model has been used to classify intensive care unit (ICU) time-based data [28]. In that work, a bidirectional LSTM model uses temporal features to detect the presence of fungi or bacteria in human blood. The time series features were gathered from 2177 ICU subjects. Ten characteristics were observed and examined for each patient, and the total dataset contained 14 million values. The BiLSTM network extracted the chronological characteristics of the subjects and classified whether each subject's blood sample was positive or negative [27]. Unexpected resuscitation of patients in the ICU was classified using a BiLSTM method. The BiLSTM model was used with an extra LSTM layer, giving a total of 16 internal units, and it outperformed baseline logistic regression models [29]. According to [30], a bidirectional, attention- and time-based LSTM with a time adjustment factor predicted patients' subsequent therapeutic behavior from their previous medical insurance information. Real-world datasets from two health institutions were used for testing and analysis, including a cancer dataset, a coronary heart disease dataset, a diabetes dataset, and a pneumonia dataset, and the bidirectional LSTM network was used to model patients together with their hospital visit information. The accuracy of the algorithm was reported for the cancer, heart disease, diabetes, and pneumonia datasets, respectively. A method that combines extended independent component analysis (ICA), a multi-class joint spatial model-based moving window technique, and a Bi-LSTM model to determine mental stress levels from EEG signals has been presented by [31]. The study published in [32] created a DL framework employing BERT models that offers an efficient approach for predicting the transition from MCI to AD using clinical note processing. Time series data have been the subject of several studies on categorization and forecasting. However, inconsistent information leads to incorrect forecasts. Therefore, a comprehensive model is needed to mitigate the uncertainty and increase predictive accuracy [33].

Previous work limitations
Previous research on identifying MCI using EEG signals has contributed significant improvements to the discipline. However, there are various constraints that must be considered. MCI is a heterogeneous condition with a number of underlying causes, including AD. The EEG signal offers numerous advantages over other techniques in the early-stage investigation of AD, but it also has significant drawbacks, such as limited data sizes, which make clinical data acquisition challenging, and the presence of too many overlapping signals [34]. Previous research typically did not take this variability into account, which has implications for the efficiency of MCI prediction. Because EEG signals are noisy, fluctuating, and non-stationary, differentiating MCI and AD patients using their EEGs is a difficult task. EEG signals are typically contaminated with different kinds of noise and artifacts. This contamination has not been adequately addressed in previous research, which makes it difficult to obtain useful information [8]. The presence of noise, which reduced the learning efficiency of the classifier, was the primary drawback of the earlier study [35]. That study did not adequately preprocess the EEG data acquired from MCI and AD patients, which left the two groups visually indistinguishable [35]. Most conventional ML techniques use features that are manually engineered from unprocessed EEG data. This makes it possible to include irrelevant features, which degrade predictive performance. Existing approaches that depend on ML-based structures are unable to properly reveal meaningful indicators hidden deep in EEG data [4]. To overcome these restrictions, a DL model called BiLSTM is used to address some of the issues that conventional ML methods face in MCI prediction from EEG data.

Motivation and contribution
The research on detecting MCI with EEG data and a BiLSTM network was motivated by a pressing need for precise and non-invasive systems for the early identification and diagnosis of cognitive diseases. Early MCI identification is essential for prompt treatment and medical care that can reduce cognitive deterioration and improve patient outcomes. EEG signals are a promising early detection technique because they provide a non-invasive and economical way to measure brain activity. EEG signals record the temporal patterns of brain activity, enabling us to track changes across time. We leverage the temporal dynamics related to MCI development by using a BiLSTM network, which is designed to model sequential data and preserve the time-dependent features in EEG signals. By using BiLSTM networks to analyze EEG signals for MCI prediction, we can potentially enhance the accuracy and reliability of the estimates by exploiting their capacity to identify complicated patterns and long-term correlations in the data. The proposed work offers various advantages, including:
• EEG signals are sensitive to noise and artifacts caused by a variety of factors, including electrode movement, muscle activity, and outside interference. The proposed method uses ICA to separate the different components, FIR filtering to remove noise, and the fast Fourier transform (FFT) to examine the frequency composition of the data.
• BiLSTM networks are well suited to capturing complex variations and fluctuations in brain activity over time. They can model the long-term dependencies in EEG data, enabling a more thorough examination of the temporal dynamics associated with MCI.
• The proposed model processes data concurrently in the forward and reverse directions. Because of this bidirectional flow of data, the network considers both past and future values when generating estimates at each time step.
• The network captures and propagates information over lengthy intervals of time, which is especially important for analyzing EEG data that may span long durations. To prevent crucial temporal relationships from being lost during processing, it also retains pertinent data from earlier time steps and uses it to make predictions.

Objectives
The following are the key objectives of the proposed work:
1) To develop a non-invasive and cost-effective method for MCI and AD prediction.
2) To provide a timely diagnosis of MCI and facilitate early intervention and treatment planning.
3) To use EEG signals as a means of capturing information about brain activity and functioning and utilize the capabilities of ICA, FIR filtering, and FFT to eliminate noise and artifacts from the signals.
4) To improve the accuracy and reliability of MCI and AD prediction by processing clean EEG data using the BiLSTM network.

Dataset
To validate the model's performance, we employed two publicly available datasets with well-structured clinical descriptions: the Chung-Ang University Hospital EEG (CAUEEG) dataset [36] and the SJTU Emotion EEG Dataset (SEED) [37]. Both datasets include incident history, the age of patients, and relevant diagnostic labels.
The CAUEEG database includes 1379 EEG recordings collected from 1155 individuals. Each recording has 21 channels, of which the first 19 are EEG signals. The remaining two channels are an electrocardiogram (EKG) channel and a photic stimulation channel. The International 10-20 standard was followed for recording the EEG. The signals were captured with a digital electroencephalograph at a sampling frequency of 200 Hz after being passed through an analog band-pass filter of 0.5-70 Hz. The signals were then converted to common average referencing and stored on disc in the European Data Format. The recordings have a mean duration of 13.34 min and a standard deviation of 2.83 min.
The SEED dataset includes EEG and eye movement data from 12 participants, as well as EEG data from an additional three participants. The data were captured while the participants were watching film clips. The video segments were deliberately chosen to elicit different emotions, including happy, negative, and neutral. For every participant, three sessions were conducted on different days, with each session containing 24 trials. In a single session, the individual watched video clips while a 62-channel system collected EEG data and eye movements were recorded. As a result, each individual in the SEED dataset has a total of 24 × 3 = 72 trials, and with 15 subjects the total number of trials is 15 × 72 = 1080. Each session is divided into four-second non-overlapping chunks. For model training purposes, each segment is treated as a single data sample. The raw EEG signals are first downsampled to a 200 Hz sampling rate. The EEG data are then passed through a band-pass filter between 1 Hz and 75 Hz to eliminate artifacts and filter out noise. Table 2 lists the characteristics of both datasets used in this work.
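The segmentation and trial counts described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `segment`, the random stand-in recording, and its 30.5-second length are assumptions made for the example; only the 200 Hz rate, the four-second window, the 62 channels, and the trial counts come from the text.

```python
import numpy as np

FS = 200          # sampling rate after downsampling (Hz), from the text
WINDOW_SEC = 4    # chunk length in seconds, from the text
N_CHANNELS = 62   # number of SEED EEG channels, from the text


def segment(recording: np.ndarray, fs: int = FS, window_sec: int = WINDOW_SEC) -> np.ndarray:
    """Split a (channels, samples) array into non-overlapping windows.

    Returns an array of shape (n_windows, channels, fs * window_sec).
    Trailing samples that do not fill a whole window are dropped.
    """
    win = fs * window_sec
    n_windows = recording.shape[1] // win
    trimmed = recording[:, : n_windows * win]
    # (channels, n_windows, win) -> (n_windows, channels, win)
    return trimmed.reshape(recording.shape[0], n_windows, win).transpose(1, 0, 2)


# Hypothetical 30.5-second recording filled with random values.
rng = np.random.default_rng(0)
recording = rng.standard_normal((N_CHANNELS, int(30.5 * FS)))
chunks = segment(recording)          # each chunk is one training sample

trials_per_subject = 24 * 3              # 24 trials x 3 sessions
total_trials = 15 * trials_per_subject   # 15 subjects
```

Dropping the incomplete final window is one possible choice; zero-padding it would be another, and the text does not say which was used.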

EEG signal processing
ERP is a particular EEG signal type that is time-locked to a certain behavior. EEG is a non-invasive method that uses sensors on the scalp to assess electrical brain activity. The recorded signal, represented as $E(t)$, can be written as

$$E(t) = B(t) + N(t)$$

where $B(t)$ represents true brain activity and $N(t)$ represents the various noise sources with respect to time $t$. The EEG signal records and displays electrical activity from the brain in real time, providing important diagnostic information to healthcare professionals. In this work, it is used for investigating cognitive processes. The activity of the brain, as it responds to certain stimuli, is represented by EEG signals. These stimuli comprise cognitive tasks that cause measurable brain responses. EEG is characterized by its waveform structure, delay, and amplitude, which offer helpful insights into the fundamental brain processes governing cognitive and perceptual activities. EEG data frequently include noise from different sources, such as muscular activity or electrical interference. To acquire clear EEG data, it is essential to find and eliminate artifacts such as eye blinks or electrode movements.

Filtering and ICA
EEG recordings are continually affected by activity such as eye blinks, respiration, and body movements, as well as by variations in electrode channels and power lines. These components may add noise and variations during EEG examinations and may therefore interfere with obtaining a clear depiction of human brain function from the recorded data. It is consequently vital to remove noise and artifacts from EEG signals. Several preprocessing steps should be performed on EEG signals before feeding them into a DL model. These steps help to enhance the quality and accuracy of the data and ensure that the model can learn meaningful patterns from it. One common approach to EEG signal filtering is to use a band-pass filter, which selectively permits EEG components in certain frequency bands to pass through while blocking the others. A band-pass filter can be designed using a variety of methods, including FIR filters. We used an FIR filter because of its linear phase characteristics, which provide a more accurate phase response than other filters and make it a popular choice for processing EEG signals. We eliminate undesired noise and distortion from the original EEG data by carefully choosing the filter settings, such as the cutoff frequency and the filter order. ICA is used for separating mixed signals into their constituent parts [38]. The basic idea behind ICA is to assume that the observed signals are linear combinations of independent sources and to use statistical techniques to estimate these sources. In the case of EEG signals, the sources are assumed to correspond to different brain regions that are generating the activity. By estimating the sources using ICA, it is possible to isolate the activity of different brain regions and study their functional connectivity. ICA separates a multivariate signal into independent, non-Gaussian sources. Mathematically, the ICA method assumes that the source signals are statistically independent and non-Gaussian, and that the number of source signals is equal to or fewer than the number of observed signals.
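The FIR band-pass step discussed in this section can be sketched with SciPy as follows. The pass band (1 to 40 Hz) and the filter length (201 taps) are assumed values chosen for illustration and are not reported in the paper; only the 200 Hz sampling rate comes from the text. The sketch also applies the filter forward and backward with `filtfilt`, which is a design choice of this example that cancels the filter's constant delay.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 200.0                    # sampling rate in Hz (CAUEEG value from the text)
LOW_HZ, HIGH_HZ = 1.0, 40.0   # assumed pass band, not taken from the paper
NUM_TAPS = 201                # assumed filter length (filter order + 1)


def bandpass_fir(x, fs=FS, low=LOW_HZ, high=HIGH_HZ, num_taps=NUM_TAPS):
    """Band-pass filter the last axis of x with a windowed-sinc FIR filter."""
    # pass_zero=False with two cutoffs designs a band-pass filter.
    taps = firwin(num_taps, [low, high], pass_zero=False, fs=fs)
    # filtfilt runs the filter forward and backward, so the output has zero phase delay.
    return filtfilt(taps, [1.0], x, axis=-1)


# Synthetic test signal: a 10 Hz rhythm plus 60 Hz interference.
t = np.arange(0, 10, 1 / FS)
wanted = np.sin(2 * np.pi * 10 * t)
interference = 0.5 * np.sin(2 * np.pi * 60 * t)
filtered = bandpass_fir(wanted + interference)

# Compare away from the edges, where boundary effects are negligible.
middle = slice(400, 1600)
max_error = np.max(np.abs(filtered[middle] - wanted[middle]))
```

A longer filter gives a sharper transition between pass band and stop band at the cost of more computation and longer edge effects, which is the trade-off behind the choice of filter order mentioned above.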
ICA separates the data matrix $X$ blindly under the condition that the resulting source time courses $S$ are maximally independent. In particular, ICA finds an 'unmixing' matrix $W$ that, when multiplied with the original data $X$, produces the source matrix $S$:

$$S = WX \tag{2}$$

where $X$ and $S$ are $n \times t$ matrices and $W$ is an $n \times n$ matrix. Using basic matrix algebra, equation (2) implies that

$$X = W^{-1}S$$

where $W^{-1}$ is the mixing matrix, whose columns give the weights with which each component projects to each of the scalp channels. The contribution $X_i$ of the $i$th component is the product of the $i$th column of $W^{-1}$ and the $i$th row of $S$. Hence the original data $X$ is equal to the sum of the independent components $X_i$:

$$X = \sum_{i=1}^{n} X_i$$

Our objective in applying ICA to EEG data is to recognize and distinguish the independent sources (such as brain activity, artifacts, etc.) that contribute to the recorded EEG signals. The ICA technique enables us to enhance the clarity of the EEG data without affecting the actual brain activity by separating out artifact components such as eye blinks and muscle activation. The optimization problem is formulated over the unmixing matrix $W$, with $S = WX$, in terms of $g(S_k)$, a non-quadratic function that approximates the negentropy of $S_k$, and $h(X_k)$, a function that reduces redundancy. Iterative methods, such as the FastICA algorithm, are used to solve the optimization problem.
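The matrix relations above (S = WX, X = W⁻¹S, and X as a sum of per-component contributions) can be checked numerically on toy data. In this sketch the mixing matrix `A` is a made-up, known matrix and `W` is simply its inverse, which is an assumption for illustration: in practice W is unknown and has to be estimated, for example with FastICA. The three synthetic sources and the choice of which one counts as the artifact are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000
t = np.linspace(0, 5, n_samples)

# Toy sources S (n x t): two rhythmic signals and one sparse, blink-like artifact.
S_true = np.vstack([
    np.sin(2 * np.pi * 10 * t),
    np.sign(np.sin(2 * np.pi * 3 * t)),
    (rng.random(n_samples) > 0.99).astype(float) * 5.0,
])

# Hypothetical mixing matrix (plays the role of W^{-1}); columns are scalp weights.
A = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.2],
              [0.3, 0.6, 1.0]])
X = A @ S_true                 # simulated channel recordings
W = np.linalg.inv(A)           # unmixing matrix, assumed known in this sketch

S = W @ X                      # S = W X recovers the sources

# X equals the sum of the rank-one contributions X_i = (column i of W^{-1}) x (row i of S).
components = [np.outer(A[:, i], S[i, :]) for i in range(A.shape[1])]
X_rebuilt = sum(components)

# Artifact removal: back-project every component except the artifact (index 2).
keep = [0, 1]
X_clean = A[:, keep] @ S[keep, :]
```

Removing a component this way subtracts only that component's contribution from every channel, which is why the remaining brain activity is left unchanged.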
The EEG data are normalized to a fixed range using min-max scaling, which maps the highest value to 1 and the smallest value to 0. The mathematical formula for min-max scaling is as follows:

$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

where $X$ is the original input data, $X_{min}$ and $X_{max}$ are the minimum and maximum values throughout the dataset, and $X_{scaled}$ is the scaled value, which falls inside the normalized range 0-1. This helps to compare signals from many subjects and to spot patterns in the signal that are essential to subsequent investigation.
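A minimal sketch of min-max scaling is shown below. The function name `min_max_scale` and the sample values are illustrative, and returning zeros for a constant input is a choice made for this sketch to avoid division by zero; the paper does not discuss that case.

```python
import numpy as np


def min_max_scale(x) -> np.ndarray:
    """Scale x linearly so that its minimum maps to 0 and its maximum to 1."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant input: the formula is undefined, so return zeros (sketch-level choice).
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)


# Hypothetical amplitudes in microvolts.
signal = np.array([-40.0, -10.0, 0.0, 20.0, 60.0])
scaled = min_max_scale(signal)   # [0.0, 0.3, 0.4, 0.6, 1.0]
```

To keep the test data unseen, the minimum and maximum would normally be computed on the training data only and then reused for the test data.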
To extract important features from EEG signals, the FFT is employed. Because EEG signals are complex and contain a broad range of frequencies, the FFT is used to extract and analyze their frequency content. Each epoch is subjected to the FFT in order to identify the signal's frequency characteristics. The output is a power spectrum, which displays the amplitudes of the signal's frequency components. The power spectrum is used to extract characteristics such as the amount of power in particular frequency bands (such as alpha, beta, and gamma), the ratios between band powers, and the position of peaks and troughs in the spectrum. The PSD used for EEG feature extraction with the FFT is calculated as:

$$PSD(f) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi f n / N}\right|^{2}$$

where $x[n]$ represents the input signal, $N$ is the signal's length, and $|\cdot|$ stands for the magnitude of a complex number. $PSD(f)$ represents the power of the EEG signal at frequency $f$, that is, how much energy is contained within a given frequency range of the signal. The primary role of the PSD is to extract characteristics from EEG data for a variety of applications, including sleep analysis and the study of cognitive functioning. The power of each frequency band is determined by integrating the PSD over the appropriate frequency range. The classification process then uses these power levels as features.
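The PSD and band-power computation described above can be sketched with NumPy's FFT routines. The band edges in `BANDS` are conventional textbook ranges and are an assumption here, since table 1 is not reproduced in this section; the function names and the synthetic 10 Hz test signal are also illustrative. Integration over a band is approximated by summing the PSD bins that fall inside it.

```python
import numpy as np

# Conventional band edges in Hz; assumed values, not taken from the paper's table 1.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}


def periodogram(x, fs):
    """Return (freqs, psd) with psd[k] = |FFT(x)[k]|^2 / N for a real signal x."""
    n = len(x)
    spectrum = np.fft.rfft(x)                 # one-sided FFT of a real signal
    psd = np.abs(spectrum) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)    # frequency of each bin in Hz
    return freqs, psd


def band_power(freqs, psd, low, high):
    """Sum the PSD over low <= f < high (a discrete stand-in for integration)."""
    mask = (freqs >= low) & (freqs < high)
    return float(psd[mask].sum())


# Synthetic 4-second epoch at 200 Hz containing a pure 10 Hz (alpha) rhythm.
fs = 200
t = np.arange(0, 4, 1 / fs)
epoch = np.sin(2 * np.pi * 10 * t)

freqs, psd = periodogram(epoch, fs)
features = {name: band_power(freqs, psd, lo, hi) for name, (lo, hi) in BANDS.items()}
peak_hz = freqs[np.argmax(psd)]
```

For this test signal almost all of the power lands in the alpha band, which is the kind of band-level feature the classifier receives.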

Classification using BiLSTM
ML and DL algorithms have shown great potential in the classification of EEG signals for patients with MCI and AD. DL models such as RNNs learn to capture important features from the raw EEG data, which can then be used to classify new EEG signals as belonging to either a healthy individual or an individual with MCI. In the proposed study, an RNN-based model called BiLSTM was used for the classification of MCI and AD patients using EEG signals. BiLSTM works well for sequence classification tasks. EEG data can be considered a sequence of electrical activity recorded from different electrodes placed on the scalp. The BiLSTM network maintains long-term dependencies across the EEG sequence, and in this work we observed that it is the most suitable choice for the classification of EEG data. BiLSTM models are deployed to overcome the vanishing gradient problem that arises in traditional RNNs, which can make it difficult to train networks that rely on long-term dependencies. LSTM networks use a memory cell that allows information to flow through the network over time and gates that regulate the movement of data into and out of the cell [39]. The forget gate $f_t$ determines the information that has to be 'forgotten' (i.e. removed) at a specific timestep. As shown in equation (8), its value is calculated using the network weights ($W_f$, $b_f$), the previous hidden state $h_{t-1}$, and the input for the selected time step $x_t$:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{8}$$

The cell state value for the specific time step is calculated as displayed in equation (9):

$$C_t = f_t \odot C_{t-1} + i_t \odot C'_t \tag{9}$$

where $C_t$ represents the cell state at timestep $t$, $f_t$ is the forget gate, $i_t$ is the input gate, and $C'_t$ is the new cell state candidate. The remaining two vectors are the new candidate cell state $C'_t$ and the input gate $i_t$, which controls how the candidate cell state and the current cell state are combined when new data is received. They are computed in equations (10) and (11), each with its own weights and bias:

$$C'_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{10}$$

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{11}$$

where $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ and $\sigma$ is the logistic sigmoid. Both vectors depend on the input and hidden states. With all of this information, we can now see how the cell state works as an essential memory component. This memory is used to determine the hidden state, and therefore the final output, of the LSTM network.
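The gate computations discussed above can be written out as a single LSTM time step in NumPy. This is a sketch of the standard LSTM cell rather than the authors' implementation: the parameter names in `params`, the helper `init_params`, and the layer sizes are hypothetical, and the output gate `o_t` is included from the standard formulation even though the text does not give its equation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_size, hidden_size, seed=0):
    """Random weights of shape (hidden, hidden + input) and zero biases (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"W_{gate}"] = 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
        params[f"b_{gate}"] = np.zeros(hidden_size)
    return params


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = np.concatenate([h_prev, x_t])                       # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
    c_cand = np.tanh(params["W_c"] @ z + params["b_c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_cand                       # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate (standard LSTM)
    h_t = o_t * np.tanh(c_t)                                # hidden state
    return h_t, c_t


# Run a short random sequence through the cell.
input_size, hidden_size = 4, 3
params = init_params(input_size, hidden_size)
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in np.random.default_rng(1).standard_normal((5, input_size)):
    h, c = lstm_step(x_t, h, c, params)
```

Because the cell state is updated additively, with the forget gate scaling the old state, gradients can pass through many time steps without shrinking as quickly as in a plain RNN.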
The key advantage of BiLSTM networks is that they capture contextual information from both the past and the future for each element in the input sequence. The distinguishing characteristics of this model are as follows:
• Bidirectional data flow: the bidirectional LSTM makes it possible to model the temporal relationships in the EEG signal data in both directions. This design combines the strength of LSTM cells with bidirectionality to capture temporal relationships and contextual details in EEG feature vectors, which is crucial for MCI classification because the EEG signal changes with time.
• Sequential processing: EEG signals naturally follow a sequential pattern, which represents the brain's electrical activity with respect to time. Bidirectional processing enables the network to collect both past and future information, improving the interpretation of the EEG signal patterns.
• Handling variable-length sequences: because of differences in the duration of recorded brain activity, EEG data frequently include variable-length sequences. BiLSTM networks have the flexibility to handle input sequences of different lengths, eliminating the requirement for padding or fixed-size windows in EEG signal analysis.
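The bidirectional data flow listed above can be illustrated with a small wrapper that runs a recurrent cell over the sequence in both directions and concatenates the two state sequences. To keep the sketch short, a plain tanh recurrent cell stands in for the LSTM cell; the wrapper logic is the same for either cell. All names, sizes, and weights here are hypothetical.

```python
import numpy as np


def run_direction(xs, W_x, W_h, b):
    """Run a simple tanh recurrent cell over xs of shape (T, D); returns states (T, H)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)


def bidirectional(xs, fwd_params, bwd_params):
    """Concatenate forward-in-time and backward-in-time states at every time step."""
    forward = run_direction(xs, *fwd_params)
    # Process the reversed sequence, then flip the states back into time order.
    backward = run_direction(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)     # shape (T, 2H)


def make_params(input_size, hidden_size, seed):
    rng = np.random.default_rng(seed)
    return (0.5 * rng.standard_normal((hidden_size, input_size)),
            0.5 * rng.standard_normal((hidden_size, hidden_size)),
            np.zeros(hidden_size))


D, H = 4, 3
fwd, bwd = make_params(D, H, seed=0), make_params(D, H, seed=1)
xs = np.random.default_rng(2).standard_normal((6, D))
out = bidirectional(xs, fwd, bwd)

# The same weights work for a sequence of a different length.
out_short = bidirectional(xs[:3], fwd, bwd)
```

At each time step the first H values summarize the past and the last H values summarize the future, so a classifier placed on top sees context from both sides. Note that batching sequences of different lengths in common frameworks still usually involves padding with masking.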

Experiments
In this work, we used two freely available EEG datasets: the CAUEEG dataset and the SEED dataset. The CAUEEG dataset comprises 1379 EEG recordings obtained from 1155 participants. The dataset contains well-structured clinical descriptions as well as essential diagnostic labels. We calculated the PSD over 19 channels using EEG sequences of length 1280. We used EEG sequences comprising 63 AD, 63 MCI, and 63 normal EEG signals, with a sampling frequency of 200 Hz. The number of PSD features per channel is $N_{FFT}/2 + 1$, where $N_{FFT} = 256$ is the number of FFT points used in this study and the additional 1 accounts for the DC component, which provides additional detail about the signal behavior. Thus, we obtained 129 PSD features for each channel and 129 × 19 = 2451 PSD features for all 19 channels.
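The feature count above (129 per channel, 2451 in total) follows directly from the FFT size, as the short sketch below shows. How the 1280-sample sequence is reduced to a single 256-point spectrum is not stated in the text; averaging the spectra of five non-overlapping 256-sample segments is an assumption made for this example, and the random input stands in for real EEG.

```python
import numpy as np

N_FFT = 256        # FFT points, from the text
N_CHANNELS = 19    # EEG channels, from the text
SEQ_LEN = 1280     # samples per sequence, from the text


def psd_features(eeg, n_fft=N_FFT):
    """Map a (channels, samples) array to a flat vector of channels * (n_fft // 2 + 1) PSD values."""
    n_ch, n_samples = eeg.shape
    n_seg = n_samples // n_fft
    # Assumed scheme: split into non-overlapping segments and average their spectra.
    segments = eeg[:, : n_seg * n_fft].reshape(n_ch, n_seg, n_fft)
    spectra = np.fft.rfft(segments, axis=-1)              # n_fft // 2 + 1 bins, including DC
    psd = (np.abs(spectra) ** 2 / n_fft).mean(axis=1)     # shape (channels, n_fft // 2 + 1)
    return psd.reshape(-1)


eeg = np.random.default_rng(0).standard_normal((N_CHANNELS, SEQ_LEN))
features = psd_features(eeg)
bins_per_channel = N_FFT // 2 + 1
```

The one-sided FFT of a real signal of length 256 has bins 0 (DC) through 128 (Nyquist), which is where the 256/2 + 1 = 129 comes from.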
The SEED dataset is also used as an additional baseline for model validation. This dataset contains EEG and eye movement recordings from 15 distinct subjects. External factors such as subject movement and background noise can readily disrupt EEG signal collection. The presence of noise, artifacts, and interference in the EEG signal creates a number of difficulties in categorizing cognitive tasks [39]. To reduce artifacts and improve signal quality, the CAUEEG dataset is passed through a preprocessing step that includes baseline correction, filtering, and artifact removal. This ensures data quality before the data are sent to the BiLSTM network. The BiLSTM network was chosen to capture both the temporal information and the spatial correlations in the EEG data, which makes it possible for the network to learn patterns and changes in brain activity over time. The model is trained with a data split of 80% training and 20% testing on the CAUEEG dataset, using the diagnostic labels provided in the dataset. The model architecture and hyperparameters are determined and fine-tuned throughout the testing and optimization process. The details of the BiLSTM layers used in this work are described in figure 1. Table 3 provides the parameters used in this study and their descriptions.
In this work, the time complexity of the BiLSTM network is characterized by the number of training examples, the duration of the EEG signals, the number of layers in the network, and the number of neurons in each layer. The computational time is expressed in big-O notation as a function of T, L, and N, where T is the number of time steps, L is the length of the EEG signal, and N is the number of hidden units in each LSTM layer. The model is trained by iterating over the training data for multiple epochs, with a forward and a backward pass for each sample. The major goal of this investigation is to use EEG signals to examine the main neurophysiological differences between individuals with MCI and those without cognitive impairment.
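To give a feel for how the cost grows with the number of hidden units and time steps, the sketch below counts the parameters of a standard LSTM layer and the approximate multiply-accumulate operations for one pass of a bidirectional layer over a sequence. It is not the complexity expression used in this work: the input size of 19, the 64 hidden units, and the 129 time steps are assumed values chosen only for illustration, and the function names are hypothetical.

```python
def lstm_param_count(input_dim, hidden_units):
    """Weights and biases of one standard LSTM layer (four gates)."""
    return 4 * (hidden_units * (input_dim + hidden_units) + hidden_units)


def bilstm_param_count(input_dim, hidden_units):
    """A bidirectional layer holds two independent LSTMs."""
    return 2 * lstm_param_count(input_dim, hidden_units)


def bilstm_macs_per_sequence(time_steps, input_dim, hidden_units):
    """Approximate multiply-accumulates for one pass over one sequence."""
    per_step = 4 * hidden_units * (input_dim + hidden_units)
    return 2 * time_steps * per_step  # forward and backward directions


# Assumed sizes for illustration only
print(lstm_param_count(19, 64))               # 21504
print(bilstm_param_count(19, 64))             # 43008
print(bilstm_macs_per_sequence(129, 19, 64))  # 5481984
```

The cost is linear in the number of time steps and, through the recurrent weights, roughly quadratic in the number of hidden units.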

Evaluation metrics
Several evaluation metrics are employed to assess the effectiveness of the proposed work on the task of MCI prediction. To quantify the results, we adopted precision, recall, F1 score, and accuracy. Accuracy is defined as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where TP and TN represent true positives and true negatives, while FP and FN indicate false positives and false negatives, respectively.
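The four metrics can be computed from the confusion counts as in the minimal sketch below, which uses the standard definitions of precision, recall, and F1 score. The counts in the example call are made up for illustration and are not results from this study; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for one class treated as "positive"
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a three-class problem such as normal, MCI, and AD, these quantities are typically computed per class in a one-vs-rest fashion.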

Results and discussions
The model is trained using Google Colab, which provides a convenient environment for DL model training. The Keras library is used to develop the BiLSTM model. The model comprises an input layer, a BiLSTM layer made up of two LSTM layers, a dense layer, and a fully connected output layer followed by a softmax activation function that generates the final classification scores. Various parameters are used to fine-tune the model, and the best performance is obtained with the Adam optimizer, the MSE loss function, a batch size of 32, shuffle = false, and early stopping with 300 epochs. In this work, 80% of the data is used for training the model and 20% is used for testing. K-fold cross-validation (K = 10) is applied to the 80% training data to examine how well the model generalizes to new, previously unseen data. To accomplish this, the training data is partitioned into K folds. In each iteration, the model is trained on K−1 folds and validated on the remaining fold. This procedure is repeated K times, each time using a different fold as the validation set. The test set, which contains previously unseen data, is kept separate and is not used in the cross-validation procedure. Following cross-validation, the model is evaluated on the test set to obtain a fair assessment.
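The splitting scheme described above can be sketched with index lists only. The example assumes 189 recordings (63 per class, as in the dataset description) and a seeded shuffle before the 80/20 split; the shuffle and both helper names are choices made for this illustration and are not taken from the training configuration.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a test portion."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=10):
    """Yield (fit, validation) index lists; each fold is the validation set once."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


train_idx, test_idx = train_test_split_indices(n_samples=189)
splits = list(k_fold_indices(train_idx, k=10))

print(len(train_idx), len(test_idx))  # 151 38
print(len(splits))                    # 10
```

The held-out test indices never appear in any fold, so the final evaluation uses data the model has not seen during cross-validation.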
Table 4 shows the results of the BiLSTM model on the CAUEEG dataset. Using the BiLSTM network, we achieved an accuracy of 97.31% for MCI, 96.04% for AD, and 95.87% for normal. Similarly, we obtained an F1-score of 96.35% for MCI, 95.70% for AD, and 96.09% for normal.
In this work, we observed that the accuracy of the proposed BiLSTM model depends on various factors, including the model parameters, the training dataset, and the preprocessing techniques. We fine-tuned the model parameters and hyperparameters to obtain the best results. Table 5 shows the results achieved on the SEED dataset. Table 6 reflects the results obtained on both benchmark datasets.

Comparison with related algorithms
To evaluate the BiLSTM model, we compared it with other classification techniques that have previously been applied to categorize EEG data into MCI, normal, and AD. Table 7 displays the results of our BiLSTM network compared to other models, namely support vector machine (SVM), RF, and LSTM, on the CAUEEG dataset. The SVM technique is employed in this study to identify the best hyperplane for separating the EEG data classes. We utilized the cost parameter (C) to manage the trade-off between maximizing the margin and reducing the classification error. A polynomial kernel is utilized to map the data into a higher-dimensional feature space. The RF algorithm creates an ensemble of decision trees and combines their predictions to determine the classification of the EEG signal. The following parameters are employed in RF-based classification: n_estimators = 100, max_depth = 10, and random_state = 42.

The results indicate that BiLSTM surpassed the other techniques in terms of sensitivity, specificity, F1 score, and accuracy. Table 8 reflects the results obtained on the SEED dataset. The ML algorithms SVM and RF performed worse than the DL models LSTM and BiLSTM. However, it is important to note that the performance of these approaches can vary depending on factors such as the dataset, preprocessing, feature extraction, and fine-tuning of the model. Testing an ML model on a publicly available dataset can provide valuable insights into its performance and identify areas for improvement. In this study, we tested our BiLSTM classifier on two well-known EEG datasets and achieved high accuracy. This indicates that the model is effective in classifying MCI, AD, and normal subjects based on the features learned from the EEG data during training. The SVM algorithm obtained an accuracy of 89%, while the RF algorithm achieved an accuracy of 85.34%.
The proposed model is successful in differentiating between healthy people, MCI patients, and AD patients. Figure 2 shows the accuracy of the proposed model, and figure 3 shows its loss.

Comparison with state-of-the-art works
The proposed work is compared with both ML and DL algorithms. Table 9 presents the results of different state-of-the-art techniques against the proposed work, together with the number of participants and their conditions.
The study in [29] used an LSTM-based framework to predict ICU readmission. Their model performs well; however, they used a small number of features for prediction. ICA can separate non-stationary signals, which are common in EEG data, into their underlying sources. This is because ICA relies on the assumption that the sources are statistically independent, rather than on the assumption of stationarity that underlies methods such as PCA. To identify the neuroregulatory deficits associated with MCI, the study in [18] extracted certain sleep characteristics from the EEGs of 40 participants. To classify the features, they employed SVM and GRU networks. Using the GRU network, they obtained an accuracy of 93.46%. ICA can also separate highly correlated sources, which is useful for MCI and AD EEG data, where the sources may overlap strongly. In the study [41], sequential convolutional neural networks (SCN) are used to classify cognitive activities using EEG data. They also employed a multi-branch convolutional network (MBCN), which was influenced by the ResNeXt design. According to their performance evaluation, MBCN surpasses SCN, reaching an accuracy of 88.33%. The drawback of this research is that increasing the CNN size might boost accuracy while simultaneously increasing computational complexity.

The FFT is used for analyzing EEG signals and extracting features that can differentiate between normal and pathological brain activity. In the case of MCI, EEG tests can be used to detect subtle changes in brain activity that may be indicative of early-stage AD. Compared to other methods, the FFT is advantageous because it provides a detailed frequency-domain analysis of EEG signals, allowing researchers to identify specific frequency bands that are associated with MCI and AD. The FFT is also computationally efficient, making it well suited to analyzing large datasets. The quality of the EEG signal, the exact parameters used for the FFT analysis, and the choice of feature extraction method are some of the variables that affect how well the FFT extracts features from MCI and AD EEGs. The proposed approach employs ICA and FIR filtering to remove noise and artifacts and provide a clean EEG signal. The FFT is used to extract features, and BiLSTM is applied to improve classification. Compared with the state-of-the-art models, the proposed approach performs better.

Discussion
The purpose of this study was to uncover the primary neurophysiological differences between individuals with MCI and those aging normally, with the goal of providing useful ways to assess the success of treatments and, potentially, to forecast the risk of developing AD. We did this using ERPs, a type of EEG data that provides good temporal resolution of the underlying neural activity. We used ICA, an FIR filter, and the FFT to preprocess the ERP data and eliminate artifacts. These methods cleaned the data so that accurate and reliable inputs were fed into the model. The BiLSTM network was chosen because it can learn changes in temporal information over long periods of time while retaining the spatial correlations in the EEG data. This capacity made it well suited to our goal of capturing the complicated temporal fluctuations that reflect relationships across time in brain activity.

We tested the performance of our proposed technique on two benchmark datasets. For the classification of individuals into the three groups of normal, MCI, and AD, we attained a state-of-the-art accuracy of 97.31%. Figure 4 compares the proposed model with closely related algorithms. The proposed model achieves the highest scores on all four metrics: sensitivity, specificity, F1 score, and accuracy. BiLSTM captures dependencies in both directions by processing input sequences in the forward and backward directions concurrently. This is especially helpful when working with EEG data, where the order of the input is essential. The high accuracy attained by our proposed technique indicates its potential as a reliable means of discriminating between normal aging and cognitive impairment, with the additional advantage of forecasting the likelihood of AD development.
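The bidirectional processing described above can be illustrated with a minimal NumPy sketch of an LSTM run once forward and once over the time-reversed input, with the two state sequences concatenated. This is not the Keras model used in this work: the weights are random and untrained, and the sizes (129 time steps, 19 inputs, 8 hidden units) and function names are assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(input_dim, hidden, rng):
    """Random weights for the four gates, stacked as [input, forget, cell, output]."""
    return {
        "W": rng.standard_normal((input_dim, 4 * hidden)) * 0.1,
        "U": rng.standard_normal((hidden, 4 * hidden)) * 0.1,
        "b": np.zeros(4 * hidden),
    }


def lstm_forward(x, params):
    """Run an LSTM over x of shape (time, input_dim); return (time, hidden) states."""
    hidden = params["U"].shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in x:
        z = x_t @ params["W"] + h @ params["U"] + params["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm_forward(x, fwd_params, bwd_params):
    """Concatenate forward states with backward states re-aligned to the original order."""
    forward = lstm_forward(x, fwd_params)
    backward = lstm_forward(x[::-1], bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((129, 19))  # assumed: 129 time steps, 19 input features
fwd = init_lstm(19, 8, rng)
bwd = init_lstm(19, 8, rng)
states = bilstm_forward(x, fwd, bwd)
print(states.shape)  # (129, 16)
```

At each time step the first half of the state summarizes the past of the sequence and the second half summarizes its future, which is why a change to a later sample alters only the backward half of the earlier states.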

Conclusion
Precise and timely identification of MCI is required to prevent progression to Alzheimer's disease and other types of dementia. However, the indications of MCI are complex and are frequently mistaken for those brought on by the natural aging process. In future work, MCI could be predicted more accurately by using a transfer learning approach that draws on knowledge gained from other tasks, such as the classification of neurological disorders or other cognitive impairments.

Figure 2. Training and testing accuracy of the model.

Figure 3. Training and testing loss of the model.
To address this problem, we analyzed the effectiveness of the BiLSTM model for classifying EEG signals from individuals with MCI, individuals with AD, and healthy individuals. The EEG signals are preprocessed to reduce noise and artifacts. To enhance the model's performance, techniques such as cross-validation and hyperparameter tuning are employed. Our results showed that the BiLSTM model outperformed other classification methods and achieved high accuracy and F1-score. Our feature extraction analysis also revealed that the beta and gamma frequency bands are the most important for classifying EEG signals with respect to MCI. These findings also showed that preprocessing techniques such as filtering and ICA normalize the EEG signals, which in turn improves the classification accuracy.

Table 1. EEG signals and their characteristics.

Table 2. Characteristics of both datasets.

In the proposed system, we trained the BiLSTM model to classify the data obtained from EEG tests of various patients, including both normal individuals and those diagnosed with MCI and AD. To preprocess the EEG data, the system utilizes several techniques. The FIR filter is used to remove artifacts present in the data. ICA is then used to separate the signals obtained from different sources. This study also employs min-max scaling to normalize the data and the FFT for feature extraction from the preprocessed EEG data. The datasets used in this work are divided into 80% for training and 20% for testing the model. The BiLSTM model is then trained on the preprocessed EEG data to classify whether the patient has normal brain function, MCI, or AD. The system achieves promising results and can potentially be utilized as a diagnostic tool for the early diagnosis of MCI.
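The min-max normalization step can be sketched as follows. The example assumes that each channel is rescaled independently to the range [0, 1], which the text does not specify; the `eps` guard against constant channels and the function name are additions made for this illustration.

```python
import numpy as np


def min_max_normalize(eeg, eps=1e-12):
    """Rescale each channel (row) of eeg to the range [0, 1]."""
    lo = eeg.min(axis=1, keepdims=True)
    hi = eeg.max(axis=1, keepdims=True)
    # eps avoids division by zero when a channel is constant
    return (eeg - lo) / np.maximum(hi - lo, eps)


rng = np.random.default_rng(0)
eeg = rng.standard_normal((19, 1280)) * 50.0  # synthetic stand-in data
norm = min_max_normalize(eeg)
print(norm.min(), norm.max())  # 0.0 1.0
```

When a separate test set is used, the minimum and maximum would normally be estimated on the training data only and then reused for the test data, so that no information leaks from the test set.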

Table 5. Results obtained on the SEED dataset.

Table 6. Results obtained on both benchmark datasets.

Table 7. Comparison of BiLSTM to other models on the CAUEEG dataset.

Table 8. Comparison of BiLSTM to other models on the SEED dataset.

Table 9. Comparison of BiLSTM with state-of-the-art works.