SchizoNET: a robust and accurate Margenau–Hill time-frequency distribution based deep neural network model for schizophrenia detection using EEG signals

Objective. Schizophrenia (SZ) is a severe chronic illness characterized by delusions, cognitive dysfunctions, and hallucinations that impact feelings, behaviour, and thinking. Timely detection and treatment of SZ are necessary to avoid long-term consequences. Electroencephalogram (EEG) signals are one form of biomarker that can reveal hidden changes in the brain during SZ. However, EEG signals are non-stationary and have low amplitude, so extracting the hidden information from them is challenging. Approach. The time-frequency domain is crucial for the automatic detection of SZ. Therefore, this paper presents the SchizoNET model, which combines the Margenau–Hill time-frequency distribution (MH-TFD) and a convolutional neural network (CNN). The instantaneous information of EEG signals is captured in the time-frequency domain using MH-TFD. The time-frequency amplitude is converted to two-dimensional plots and fed to the developed CNN model. Results. The SchizoNET model is developed using three validation techniques (holdout, five-fold cross-validation, and ten-fold cross-validation) on three separate public SZ datasets (Dataset 1, 2, and 3). The proposed model achieved an accuracy of 97.4%, 99.74%, and 96.35% on Dataset 1 (adolescents: 45 SZ and 39 healthy control (HC) subjects), Dataset 2 (adults: 14 SZ and 14 HC subjects), and Dataset 3 (adults: 49 SZ and 32 HC subjects), respectively. We have also computed six performance parameters and the area under the curve to evaluate the developed model. Significance. The SchizoNET is robust, effective, and accurate, as it performed better than the state-of-the-art techniques. To the best of our knowledge, this is the first work to explore three publicly available EEG datasets for the automated detection of SZ. Our SchizoNET model can help neurologists detect SZ in various scenarios.


Introduction
Schizophrenia (SZ) is a complex neuropsychiatric and cognitive syndrome that appears to result from a disruption in brain development caused by hereditary or environmental factors, or both. SZ disturbs the thinking, behaviour, and feelings of an individual. According to reports published by the World Health Organization (WHO), about 21 million people, accounting for 1% of the global population, suffer from SZ (WHO 2022). The onset of SZ typically occurs between late adolescence and early adulthood. It emerges earlier in males (late adolescence to early 20s) than in females (early 20s to early 30s) (Bromet and Fennig 1999). It is one of the top 25 leading causes of disability worldwide (Jin and Mosweu 2017). The symptoms of SZ are heterogeneous, leading to reduced quality of life and functional impairments (Jin and Mosweu 2017). It is characterized by cognitive deficits and by negative and positive symptoms (Green and Horan 2015). The cognitive deficits involve language (speech that is difficult for others to understand), difficulty performing routine activities, lack of attention, and trouble with thinking (deviating from one subject to another with no logical reason). The negative symptoms (associated with negative SZ) include abnormal memory, while the positive symptoms (associated with positive SZ) are hallucinations, delusions, and confused speech (Oh et al 2019, Lai et al 2021, Sadeghi et al 2022). The epidemiological characteristics of SZ comprise three lows (low visit rate, low detection rate, and low compliance) and three highs (high disability rate, heavy disease burden, and high recurrence rate). SZ could damage various brain tissues and cause mental deterioration, resulting in severe mental disability. As a result, SZ has a negative impact on educational and occupational performance.
The risk of death is higher in SZ patients than in healthy people owing to preventable physical diseases (cardiovascular disorders, infections, and metabolic disease) (Siuly et al 2020). About 50% of SZ patients attempt suicide, and the mortality rate due to suicide is about 4%-6% (Caldwell and Gottesman 1990, Hettige et al 2017). About 69% of SZ patients do not get enough care and treatment, resulting in increased death, disability, and suicide rates (Baygin 2021). According to the WHO, timely detection of SZ may help experts to identify the stage and severity of SZ (WHO 2022). These factors create a need for timely and accurate detection of SZ. Various resources such as interviews, imaging, and signal-based techniques have been used to detect SZ. Interviews by a qualified expert take time and are susceptible to errors and, in some cases, bias (Lloyd et al 2017). Imaging tools (magnetic resonance imaging and computed tomography) are time-consuming, more expensive, and necessitate extra recordings (Talo et al 2019). Electroencephalogram (EEG) signals can reveal changes in brain activity and thereby identify various states of the brain (Khare and Bajaj 2021c, Khare et al 2022). During EEG recording, sensors placed at appropriate locations on the scalp capture hidden information about changes during SZ. In addition, EEG signals are well accepted by researchers for the automated identification of brain disorders such as Alzheimer's disease, seizures, and Parkinson's disease (Kumar and Bhuvaneswari 2012, Khare et al 2022).

Related work
Recently, many studies have been conducted to gain insight into the automatic classification of SZ using EEG as a biomarker. A summary of the existing models developed for SZ detection is given in table 1.

Findings and motivation
The summary of our findings from screening the literature is shown in table 2. It reveals that most automated SZ detection models have been developed on a single EEG dataset, recorded either in a resting state or while performing a task. We also noted that models such as long short-term memory (LSTM) and 1D CNN help to extract spatio-temporal information but perform worse than 2D CNN models (Cho and Jang 2020, Vareka 2021). A CNN allows automatic feature extraction and classification, but applying non-stationary EEG signals directly to a CNN may not yield the desired performance. Over the last decade, many CNN models have been developed whose architectures vary from tens to hundreds of layers, but no standard CNN model is specified for a particular application. Therefore, the selection and development of the CNN model depend on the user and the application. Even with deep models such as Visual Geometry Group (VGG) networks and ResNet, the desired performance is not obtained (Smith et al 2021). Also, existing techniques often involve handcrafted feature extraction, empirical selection of tuning parameters, rigorous statistical analysis for feature selection, and user-dependent classification techniques, which lower model performance.
The identified research gaps motivated us to develop the SchizoNET model, which comprises the Margenau-Hill time-frequency distribution (MH-TFD) and a CNN. The TFD gives detailed insight into EEG signals by capturing minute details of their time-frequency-amplitude content. A CNN model with a simple architecture is developed using multiple validation techniques, including holdout, five-fold, and ten-fold cross-validation (FCV), to extract and classify the deep features obtained from the TFD. The proposed model is tested and evaluated on three public EEG datasets of SZ. The working steps of the SchizoNET model are as follows: (i) the time-frequency-amplitude information of the EEG signals is extracted using MH-TFD, (ii) the TFD is fed to the developed CNN model for automated feature extraction and classification, and (iii) different performance parameters are evaluated and compared with the current state-of-the-art techniques. The contributions of the proposed SchizoNET model are listed as follows:
• To the best of our knowledge, we are the first group to develop automated SZ detection on three EEG datasets. Therefore, the SchizoNET model has good generalization ability on different datasets.
• Visual inspection of EEG signals is very tedious and prone to human error. Hence, the temporal and spatial information of EEG signals is studied in terms of time-frequency amplitude using MH-TFD.
• Traditional techniques require extensive parameter tuning, the selection of handcrafted features is time-consuming, and the appropriate choice of classifiers is difficult. Therefore, we developed a simple CNN architecture using fewer layers and tested it with multiple validation techniques to evaluate performance metrics (PM).
The paper is structured as follows. Sections 1 and 2 present the introduction and related work. Findings from the literature and the motivation are covered in section 3, and details about materials and methods are covered in section 4. Results are presented in section 5, a performance comparison with the current state-of-the-art is provided in section 6, a discussion is given in section 7, and conclusions are drawn in section 8.

Materials and methods
The proposed SchizoNET model involves the following steps: preparation of the EEG datasets, extraction of simultaneous temporal and spatial information using MH-TFD, and automatic feature extraction and classification using the CNN model. The schematic of the SchizoNET is shown in figure 1.

Datasets
The proposed method uses three publicly available EEG datasets to test the SchizoNET model. The first dataset is acquired from adolescents during resting state, the second dataset is comprised of resting-state EEG acquired from adults, and the third dataset is recorded during the press button task. The demographic details of these datasets are discussed below and presented in

Dataset 2
Dataset 2 comprises 14 subjects with paranoid SZ hospitalized at the Institute of Psychiatry and Neurology in Warsaw, Poland, and 14 HC subjects (Olejarczyk and Jernajczyk 2017). All patients met the ICD-10 criteria for paranoid SZ (F20). The inclusion criteria for SZ subjects were an ICD-10 diagnosis of F20, a minimum medication washout period of seven days, and a minimum age of 18. The exclusion criteria were organic brain pathology, presence of a general medical condition, pregnancy, first episode of SZ, and neurological diseases. The EEG was recorded for 15 minutes in an eyes-closed resting-state condition. A 19-channel EEG montage (P3, Fp1, Fp2, Pz, C4, F7, F3, Fz, P4, C3, Cz, F4, F8, O1, O2, T3, T4, T5, T6) built in accordance with the international 10-20 system was used.

Dataset 3
Dataset 3 is obtained from Kaggle and contains EEG signals of 81 subjects (button-tone https://www.kaggle.com/broach/button-tone-sz n.d.). SZ patients were diagnosed using the Structured Clinical Interview for DSM-IV. Subjects of both groups, i.e. SZ and HC, were matched for age, handedness (right), and gender. SZ subjects were required to have had no substance dependence in the past year, while HC subjects were excluded for a history of substance dependence, a current or past history of having a first-degree relative with a psychotic disorder, or a DSM-IV Axis I disorder. The data were band-pass filtered between 0.5 and 15 Hz and baseline-corrected using the −0.6 to −0.5 s interval. Artefact rejection was applied to EEG epochs with voltages exceeding ±100 μV at all scalp sites. The details about the dataset and acquisition steps can be found in Ford et al (2013). Previous studies have found that pressing a button to immediately generate a tone is helpful in distinguishing SZ from HC; hence, this condition is used for analysis in the current work (
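The baseline correction and amplitude-based artefact rejection described for Dataset 3 can be illustrated with a minimal sketch. It assumes each epoch is a list of per-channel sample lists in microvolts with a shared time axis in seconds, and it assumes an epoch is rejected when any site exceeds the threshold; the function names and the toy values are hypothetical and are not taken from Ford et al (2013).

```python
from statistics import fmean


def baseline_correct(channel, times, t_start=-0.6, t_end=-0.5):
    """Subtract the mean of the baseline window from one channel of an epoch."""
    baseline = [v for v, t in zip(channel, times) if t_start <= t <= t_end]
    offset = fmean(baseline)
    return [v - offset for v in channel]


def is_clean(epoch_channels, threshold_uv=100.0):
    """True when no sample on any channel exceeds +/- threshold_uv (assumed rule)."""
    return all(abs(v) <= threshold_uv for ch in epoch_channels for v in ch)


# Toy epoch: one channel, five samples, time axis in seconds (hypothetical values).
times = [-0.6, -0.55, -0.5, 0.0, 0.5]
raw = [12.0, 10.0, 8.0, 40.0, -20.0]
corrected = baseline_correct(raw, times)
keep = is_clean([corrected])
```

Here the baseline mean is 10 μV, so the corrected channel is shifted down by that amount and the epoch stays within the ±100 μV limit.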

Margenau-Hill time-frequency distribution (MH-TFD)
The frequency-domain and time-domain components of a signal help to study its characteristics. Since time-based representations use the entire frequency span in which the signal is defined, they ignore some hidden characteristics along frequency. Similar limitations also apply to frequency-based representations (Advanced Time-Frequency Signal and System Analysis 2016). To address this, TFD-based transformations are the best way to represent the time-dependent spectrum of non-stationary EEG signals. Linear TFDs such as the short-time Fourier transform (STFT) and wavelets use a window to localize behaviour in time and frequency. However, because they must satisfy the Heisenberg-Gabor inequality, their time-frequency resolution is limited by localizing window parameters such as duration and bandwidth. Choosing a shorter time duration results in a greater bandwidth and vice versa, owing to the compromise between time and frequency in linear TFDs (Advanced Time-Frequency Signal and System Analysis 2016). The time-frequency representation (TFR) based on the continuous wavelet transform (CWT) requires an appropriate selection of the mother wavelet, which is again tedious. The MH-TFD helps to overcome these limitations of linear TFDs because it uses neither localizing windows nor wavelet selection. MH-TFD uses the autocorrelation of a signal rather than windows; thus, it does not restrict the resolution in time and frequency. MH-TFD provides a better representation and decomposes the EEG signal components into a TFD. The time-frequency representation obtained using MH-TFD is denoted by

$$MH_y(t,f) = \mathrm{Re}\left\{\iiint e^{j2\pi\nu(u-t)}\, e^{j\pi\tau\nu}\, y\left(u+\frac{\tau}{2}\right) y^{*}\left(u-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\nu\, du\, d\tau\right\}$$

where y(·) denotes the signal to be analyzed, t and f represent time and frequency, * denotes the complex conjugate, τ and ν are the lag and frequency-shift variables, and the kernel function is denoted by exp(jπτν). The above-mentioned equation can be simplified as (Hatami et al 2016)

$$MH_y(t,f) = \mathrm{Re}\left\{ y(t)\, Y^{*}(f)\, e^{-j2\pi f t} \right\}$$

where Y(f) is the Fourier transform of y(t). However, MH-TFD produces interference called cross-terms, which impairs the readability of the representation when a multi-component signal is analyzed.
These cross-terms generate spurious components that severely distort the representation. The cross-term formulation of a signal is denoted as (Hatami et al 2016)

$$MH_{y_i + y_k}(t,f) = MH_{y_i}(t,f) + MH_{y_k}(t,f) + CT_{y_i, y_k}(t,f)$$

where the two components of the signal are denoted by y_i and y_k, with cross-term CT_{y_i, y_k}. The cross-term in the time and frequency domains can be minimized by using a kernel function. MH-TFD uses time and frequency cross-term reduction kernels of flexible length to minimize the cross-terms of a signal. For this reason, MH-TFD is a suitable choice for obtaining the time-frequency representation of a signal. The EEG segments of all channels in the three datasets are converted to TFD images. For dataset 1, EEG segments of 60 s (7680 samples) are transformed into TFD images using MH-TFD. Similarly, for datasets 2 and 3, EEG segments of 25 s (6250 samples) and 3 s (3072 samples), respectively, are converted to TFD images. As a result, we obtained 1344, 21 702, and 49 3824 TFD images for datasets 1, 2, and 3, respectively. These TFD images of all the channels are fed to the CNN model to distinguish SZ from HC EEG segments. Typical TFDs of SZ and HC EEG signals obtained by MH-TFD on the three datasets are shown in figure 3. The TFDs indicate that the energy content of EEG for both SZ and HC is dominant in the lower frequency range.
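As an illustration of the simplified form Re{y(t) Y*(f) exp(−j2πft)}, the following minimal sketch computes an unsmoothed discrete Margenau-Hill distribution with the Python standard library only. It is not the authors' implementation: it omits the time and frequency cross-term reduction windows, uses a plain O(N²) DFT that is only practical for short signals, and the function names and the test tone are assumptions made for this example. For real-valued EEG, the analytic signal is commonly used as input, which is likewise an assumption here.

```python
import cmath
import math


def dft(y):
    """Plain O(N^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(y)
    return [sum(y[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def margenau_hill(y):
    """Unsmoothed discrete Margenau-Hill distribution.

    Returns rows tfd[n][k] = Re{ y[n] * conj(Y[k]) * exp(-j*2*pi*k*n/N) },
    where n is the time index and k the frequency bin.
    """
    n_samples = len(y)
    spectrum = dft(y)
    return [[(y[n] * spectrum[k].conjugate()
              * cmath.exp(-2j * math.pi * k * n / n_samples)).real
             for k in range(n_samples)]
            for n in range(n_samples)]


# Demo: a complex tone at frequency bin 4 of a 32-sample signal.
N = 32
tone = [cmath.exp(2j * math.pi * 4 * n / N) for n in range(N)]
tfd = margenau_hill(tone)
peak_bin = max(range(N), key=lambda k: tfd[10][k])
```

For this single-component tone the energy at every time index concentrates in bin 4, and summing a row over frequency returns N·|y[n]|², which is the discrete counterpart of the time marginal.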

Convolutional neural network (CNN)
The 2D TFD obtained using MH-TFD is fed to a CNN model, an automated tool that enables the extraction and classification of deep features. Convolutional, pooling, dropout, dense, softmax, and classification layers are the main building blocks of a CNN. The extraction of deep features is handled by the convolutional, pooling, and dropout layers, while classification is done through the dense, softmax, and output (classification) layers. The convolutional layer is the heart of the CNN model; it comprises filters (kernels) that are moved along the tensor (image) in fixed steps called the stride. Convolutions of the kernel and tensor are evaluated to obtain output feature maps. Zero padding is applied to preserve the image size, while non-linearity is added to the network using an activation function. The pooling layer reduces the dimensions of the output feature maps while keeping the number of maps unaltered. A dense layer follows the pooling layer; it transforms the 2D matrix to 1D and assigns scores to the deep features extracted by the preceding layers. The softmax layer converts each feature score into a class probability. Finally, the classification layer assigns the output class. In addition, a CNN model also uses a normalization layer to bring all the feature maps to the same scale, which helps regularization and avoids overfitting. A dropout layer deactivates some of the neurons in the network to lessen generalization error and overfitting.
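The building blocks described above (strided convolution with zero padding, ReLU activation, max pooling, and softmax) can be sketched in a few lines of plain Python. This is a didactic single-channel illustration under assumed toy inputs, not the SchizoNET implementation; all function names and values are hypothetical.

```python
import math


def conv2d(image, kernel, stride=1, pad=0):
    """Slide a 2D kernel over a zero-padded 2D image (cross-correlation form)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            padded[r + pad][c + pad] = image[r][c]
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    return [[sum(padded[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Rectified linear unit applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling (stride equal to the pool size)."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
feature_map = relu(conv2d(image, identity, stride=1, pad=1))  # 4x4, size preserved
pooled = max_pool(feature_map, size=2)                        # 2x2
probs = softmax([2.0, 0.5])                                   # two classes: SZ, HC
```

With a 3×3 kernel, a padding of one pixel keeps the 4×4 input size, and 2×2 pooling then halves each spatial dimension, which mirrors the size-preserving and dimension-reducing roles described in the text.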
Various CNN models have been developed with different combinations of layers, which vary from application to application and user to user. Some use CNN models with fewer layers, while others use dense CNN models composed of hundreds of layers (Alom et al 2018). Even with different configurations of CNN models and multiple trials, the desired performance is not always achieved. Also, there is no standard model available for this application due to a lack of prior knowledge (Wolpert 1996). Therefore, a CNN model is developed with five convolutional, three pooling, three dense, and one output layer. In addition, the developed model uses the rectified linear unit (ReLU) as an activation function to increase non-linearity, max-pooling layers to reduce the dimensionality of the feature maps, and a dropout of 50% together with batch normalization layers to reduce overfitting. The summary of the proposed CNN model is shown in table 4.

Results
Traditional machine learning techniques require extensive statistical analysis for selecting handcrafted methods and features. Moreover, a precise selection of the classifier and its parameters is time-consuming and does not guarantee good performance. Thus, the SchizoNET model is developed for the automatic detection of SZ. Three public datasets comprising push-button task and resting-state EEG signals are employed for testing the SchizoNET model. The EEG epochs are transformed into a time-frequency representation using MH-TFD. To reduce the cross-terms of the TFD, Kaiser time and frequency windows of lengths 31 and 63 are used. The obtained TFD is converted into images and fed to the CNN model. The learning rate is 10^-4, the bias and weight learning factors are both fixed at 10, the adaptive moment estimation (Adam) optimizer is used to scale the learning rate of each weight, the batch size is 64, the total number of epochs is 60, and the validation frequency is 50. All the parameters are selected empirically and kept the same throughout the experimentation.
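The training settings listed above can be collected in a small configuration object, which also makes it easy to work out how many iterations a run would take. The sketch below is illustrative only: the class and field names are hypothetical, the 80% holdout share of the 1344 Dataset 1 images is used as an example, and dropping the last partial mini-batch and counting the validation frequency in iterations are assumptions not stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters reported in the text (field names are hypothetical)."""
    learning_rate: float = 1e-4
    batch_size: int = 64
    max_epochs: int = 60
    validation_frequency: int = 50   # assumed to be counted in iterations
    optimizer: str = "adam"
    weight_lr_factor: float = 10.0
    bias_lr_factor: float = 10.0


def iterations_per_epoch(n_train_images, batch_size):
    """Full mini-batches per epoch, assuming the last partial batch is dropped."""
    return n_train_images // batch_size


config = TrainingConfig()
n_train = int(0.8 * 1344)                        # 1075 training images
per_epoch = iterations_per_epoch(n_train, config.batch_size)
total_iterations = per_epoch * config.max_epochs
validation_passes = total_iterations // config.validation_frequency
```

Under these assumptions a Dataset 1 holdout run would perform 16 iterations per epoch and 960 iterations in total, with a validation pass roughly every three epochs.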
Deep learning (DL) models often offer very high performance; however, their stability is uncertain. Therefore, to verify the stability of our developed model, we have used holdout validation (80% of the data for training and 20% for testing) as well as five-FCV and ten-FCV. The accuracy (ACC) obtained for each dataset using these validation techniques is shown in table 5. The results show that the ACC for holdout validation is the highest on all the datasets because it is not averaged over folds, while it is slightly lower for the multi-fold CV techniques.
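The holdout and k-fold schemes can be sketched as index generators over the TFD images. The sketch assumes that images are shuffled and split at random regardless of subject, which the text does not state explicitly; the function names and the fixed seed are hypothetical.

```python
import random


def holdout_split(n_items, test_fraction=0.2, seed=0):
    """Shuffle indices once and reserve a fraction of them for testing."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_items, k, seed=0):
    """Yield (train, test) index lists; every item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


train_idx, test_idx = holdout_split(1344)        # Dataset 1 image count
ten_folds = list(k_fold_splits(1344, k=10))
```

In k-fold cross-validation the reported metric is the mean over the k test folds, whereas holdout reports a single split, which is consistent with the remark above that the holdout figure is not averaged.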
The SchizoNET model is evaluated using six performance measures: ACC, Cohen's Kappa (Kappa), precision (PRC), sensitivity (SEN), specificity (SPE), and F-1 measure. We have chosen these performance measures because our model is tested on both balanced and unbalanced datasets. Table 6 shows the performance measures obtained using holdout, five-FCV, and ten-FCV with our SchizoNET model for Datasets 1, 2, and 3. The results show that our developed SchizoNET model provides high performance on all three datasets with all validation techniques. Thus, the results of tables 5 and 6 confirm the robustness of our SchizoNET model, which obtains high performance in different validation scenarios for all three datasets. This indicates that our model has generated sufficiently distinct deep features to accurately categorize SZ and HC EEG segments. Figure 4 depicts the plots of accuracy and loss versus iteration obtained for the training, testing, and validation phases of the proposed SchizoNET.
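For reference, the six performance measures can all be derived from the four entries of a binary confusion matrix, with SZ taken as the positive class. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute ACC, SEN, SPE, PRC, F-1, and Cohen's Kappa from confusion counts."""
    total = tp + fn + fp + tn
    acc = (tp + tn) / total
    sen = tp / (tp + fn)            # sensitivity (recall of the positive class)
    spe = tn / (tn + fp)            # specificity
    prc = tp / (tp + fp)            # precision
    f1 = 2 * prc * sen / (prc + sen)
    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((fp + tn) / total) * ((fn + tn) / total)
    p_chance = p_pos + p_neg
    kappa = (acc - p_chance) / (1 - p_chance)
    return {"ACC": acc, "SEN": sen, "SPE": spe, "PRC": prc, "F1": f1, "Kappa": kappa}


# Hypothetical counts: 100 SZ and 100 HC segments.
metrics = binary_metrics(tp=90, fn=10, fp=5, tn=95)
```

Kappa corrects the accuracy for chance agreement, which is why it is informative on unbalanced datasets where accuracy alone can be misleading.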
A model with high accuracy but large variation around its mean value is not meaningful. Therefore, the standard deviation (STD) of each PM from its mean value is evaluated to measure the effectiveness of the SchizoNET. Table 7 provides the STD and margin of error (MoE) obtained for each PM at a 95% confidence interval (CI). The table shows that the model has a very small STD from the mean for each PM. On dataset 1, F-1 provides the lowest STD of ±0.63, while the highest is ±1.43 for Kappa. For dataset 2, SEN has the highest STD of ±0.44, while PRC provides the minimum STD of ±0.13. Finally, on dataset 3, the lowest STD of ±0.68 is obtained for F-1, whereas the highest is ±2.46 for SPE. The MoE obtained at the 95% CI on all datasets reveals that the PM on each fold during ten-FCV shows no significant variance, indicating that the SchizoNET method is reliable and effective.
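The STD and MoE of a metric across folds can be computed as in the minimal sketch below. The text does not say how the MoE was obtained; this example assumes the usual Student-t interval, MoE = t · s / √n, with the two-sided 95% critical values for five and ten folds. The fold accuracies are hypothetical and are not taken from table 7.

```python
import math
from statistics import mean, stdev

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRITICAL_95 = {4: 2.776, 9: 2.262}


def summarize_folds(scores):
    """Return the mean, sample standard deviation, and 95% margin of error."""
    n = len(scores)
    spread = stdev(scores)
    margin = T_CRITICAL_95[n - 1] * spread / math.sqrt(n)
    return mean(scores), spread, margin


# Hypothetical ten-fold accuracies in percent.
fold_acc = [96.0, 97.0, 98.0, 97.0, 96.0, 98.0, 97.0, 97.0, 96.0, 98.0]
avg, std, moe = summarize_folds(fold_acc)
```

A small MoE relative to the mean indicates that the fold-to-fold variation is minor, which is the sense in which the text calls the model stable.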
To get more insight into the SchizoNET model, the percentage confusion matrix is evaluated for three datasets using ten-FCV as shown in Further, we have evaluated receiver operating characteristics (ROC) and area under the curve (AUC) for our SchizoNET model, as shown in figure 5. It is evident that our developed model provided the AUC of 97.69%, 99.99%, and 96.52% for dataset 1, 2, and 3, respectively. This shows that our developed model accurately performs binary classification of SZ and HC.
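As a reminder of how ROC and AUC values are obtained from classifier outputs, the sketch below builds the ROC curve by sweeping a threshold over the predicted SZ probabilities and integrates it with the trapezoidal rule. The labels and scores are hypothetical and the function names are assumptions for this example.

```python
def roc_curve(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal integration of TPR over FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical outputs: label 1 = SZ, label 0 = HC.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
auc = area_under_curve(roc_curve(labels, scores))
```

In this toy example 15 of the 16 SZ-HC pairs are ranked correctly, so the AUC equals 15/16; an AUC close to 100% therefore means that nearly every SZ segment receives a higher score than nearly every HC segment.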

Performance comparison
The performance of the SchizoNET is evaluated further by comparing it with current state-of-the-art techniques. Tables 9, 10, and 11 show the performance comparison of the SchizoNET model on datasets 1, 2, and 3, respectively.  (2021) used filtering and FFT-based rhythm separation. Different features combining Hjorth parameters (mobility, complexity, and activity), spectral power, and mean spectral amplitude have been extracted from the delta, beta, gamma, alpha, and theta rhythms. These features are classified using deep learning classifiers like CNN and

Discussion
Methods based on nonlinear features require tuning of multiple parameters and show degraded performance in the presence of noise and artefacts. FFT-based techniques analyze the EEG signals in the time-frequency domain. Still, time-based representations of a signal use the entire frequency span over which it is defined and may ignore some hidden characteristics along frequency. Similar drawbacks also hold for frequency-based representations, resulting in poor time-frequency localization. The analysis of EEG signals using STFT assumes the signal to be stationary over a short duration and also requires the selection of the length and type of window. Discrete wavelet transform-based techniques decompose a signal into subbands by selecting a mother wavelet and decomposition level, which are difficult to choose. Empirical mode decomposition (EMD) extracts instantaneous information regarding amplitude and frequency but lacks a mathematical model and suffers from mode mixing. The TFD provided by STFT and CWT must satisfy the Heisenberg-Gabor inequality, so its time-frequency resolution is limited by localizing window parameters such as duration and bandwidth. The TFD obtained using the smoothed pseudo Wigner-Ville distribution (SPWVD) requires a kernel function and its length as parameters. Improper choice of these parameters may produce severe distortion in the TFR. These shortcomings result in overlapping information about SZ and HC EEG signals, which degrades system performance. Some CNN-based classification models use single-fold, fixed-length training and testing sets that might produce an overfitted model. Moreover, many studies are limited to evaluation on a single EEG dataset, which does not guarantee similar performance on other EEG datasets of the same problem. Our proposed SchizoNET model combines MH-TFD and CNN for automatic time-frequency-amplitude feature extraction and classification.
The MH-TFD does not require the choice of a localizing window but uses the autocorrelation of the signal to be analyzed. In addition, the problem of cross-terms is overcome by using cross-term reduction windows in the frequency and time domains. This enables the extraction of more hidden information from EEG signals, reflecting their representative and distinguishable characteristics. The CNN model enables automatic classification of SZ and HC EEG signals. The evaluation of SchizoNET on three different EEG datasets of SZ, using holdout, five-FCV, and ten-FCV, has provided the highest performance over existing state-of-the-art techniques on most of the datasets. It is evident from tables 9, 10, and 11 that this is the first study to develop a novel DL model which can be used for all three public datasets and yield the highest performance. In addition, our developed model is simple, as it has only five convolutional layers, compared to benchmark CNN models like AlexNet, VGG-16, and ResNet-50 (Smith et al 2021). Also, our SchizoNET model requires fewer learnable parameters, i.e. about 42.8 million, compared to AlexNet (approx. 61 million) and VGG-16 (approx. 138 million) (Smith et al 2021). The merits of our developed SchizoNET model are as follows:
• Robust: the SchizoNET model is robust because it is developed using three different EEG datasets.
• Accurate and stable: the developed model reported the highest and most consistent performance with holdout and cross-validation techniques.
• Simple and effective: our model is simple (only five convolutional layers) and has fewer learning parameters than benchmark CNN models.
The limitations of our proposed model are given below:
• The number of subjects in each of the three datasets is too small to explore the leave-one-subject-out (LOSO) validation technique.
• Our work does not localize the region of SZ.

Conclusion
The proposed SchizoNET model combines MH-TFD and CNN to automatically detect SZ patients using EEG signals. The TFD generated by MH-TFD has provided excellent resolution, hidden information, and detailed insight into EEG signals owing to the reduction in cross-terms. The TFD has enabled the CNN model to extract deep features, which drastically reduced manual effort. The simple architecture of the proposed CNN model has improved the system performance with fewer learnable parameters. Our developed model correctly identified 99.74% of SZ signals, the highest among current state-of-the-art techniques. Thus, the proposed SchizoNET model is robust, effective, accurate, and versatile, as it obtained the highest performance metrics on three EEG datasets. Also, the designed model can detect SZ from EEG acquired in a resting state, during evoked potentials, and during tasks. Our SZ model generalizes better because it does not require any feature engineering and can automatically extract and classify features. The limitation of our model is that it has not been developed using leave-one-subject-out (LOSO) cross-validation because of the small number of subjects in each of the three datasets. In the future, we will develop a subject-based and channel-wise SZ detection model by using more subjects in each class.