Non-invasive Detection of Ketum Users through Objective Analysis of EEG Signals

Ketum leaves are traditionally used to treat back pain and reduce fatigue. In recent years, however, people have used ketum leaves as a substitute for traditional drugs because they can be obtained easily and at low cost. A robust test for ketum detection is not currently available. Although ketum usage can be detected with a test strip, the result can be contaminated by other substances and can be manipulated. Brain signals have unique characteristics and are well established as a robust basis for recognition and disease detection. This study therefore aims to distinguish ketum users from non-users via brain signal characteristics. Eight participants were chosen: four heavy ketum users and four non-users with no health issues. Data were collected using the eegoSports device with the participants in a relaxed state. In pre-processing, a notch filter and Independent Component Analysis (ICA) were used to remove artifacts. Wavelet Packet Transform (WPT) was used to reduce the large data dimension and extract features from the brain signal. A T-Test was used to select the most significant features. Support Vector Machine (SVM), K-Nearest Neighbour, and Ensemble classifiers were used to categorize the input data into ketum users and non-users. The Ensemble classifier predicted the testing instances with 100% accuracy for both the open-eyes and closed-eyes tasks, with Teager energy and energy to standard deviation ratio as the features.


Introduction
Mitragyna speciosa, or ketum, is a plant found in abundance in the northern and east coast regions of Peninsular Malaysia. It has a variety of uses, and some people even use it as a substitute for the more expensive heroin. Although the clinical benefits of ketum are not yet fully understood, some studies have reported that chewing the leaves helps treat musculoskeletal pain, fever, cough, diabetes, and hypertension [1]. In Malaysia, however, ketum has long been classified as a banned substance for consumption because of its intoxicating properties. Mitragynine, the main active element in ketum leaves, is a psychoactive alkaloid controlled under the Poisons Regulations 1989 under the Poisons Act 1952. EEG signals of ketum users are believed to differ from those of non-users. Consuming ketum on a daily basis may reduce cognitive function, as its properties are similar to those of a drug. Ketum detection using a test strip is currently available. However, the result is easily manipulated and may be contaminated by other substances. EEG signals, on the other hand, are unique and can reveal an individual's physiological state. Furthermore, EEG signal analysis is non-invasive and robust, and the signals cannot be fabricated. This study discusses how to distinguish between the EEG signals of ketum users and non-users using machine learning algorithms.

Literature review
EEG signals are unique, and their characteristics can be used to analyze the state or activity of the human brain. EEG signal analysis thus allows the researcher to determine the subject's cognitive state and to differentiate between normal and abnormal subjects based on features in the signal.
Raw EEG signals suffer from noise originating in the environment, such as power line artifacts, and in the subject, such as eye blinks, ECG, and EMG noise [2][3][4]. This noise needs to be removed before proceeding to the next step. In studies by Chandra et al. [5], Sharma et al. [6], Li [7], and Nuamak et al. [8], Independent Component Analysis (ICA) was applied in the signal pre-processing stage to remove noise such as the electrocardiogram (ECG), electromyogram (EMG), and electrooculogram (EOG). Bandpass and notch filters were also applied in these studies to keep the signal within the desired frequency range and to remove the power line artifact at 50 Hz.
Upadhyay et al. [9], in a study on the detection of epilepsy using appropriate feature ranking techniques, stated that feature extraction and classification of EEG signals are the significant steps in an EEG-based epilepsy detection system. Because EEG signals are non-stationary, purely temporal or spectral (i.e., Fourier Transform (FT)) features cannot extract useful information from them. To solve this issue, transformations such as the Short-Time Fourier Transform (STFT), the Wavelet Transform (WT), and the S-transform are frequently employed to describe EEG data in the time-frequency domain. Moreover, WT was found to be the most suitable tool for analyzing the transient behavior of EEG signals in the time-frequency plane. In addition to feature extraction, feature selection is used to select, from the large number of available features, a subset that discriminates more robustly for classification purposes [10]. Singh et al. [11] proposed a method for analyzing and classifying EEG signals of epileptic patients using EEG rhythms. They found that the estimated mean frequency and root mean square (RMS) bandwidth were statistically different and able to distinguish inter-ictal, ictal, and healthy volunteer responses. Most researchers use SVM to differentiate two or more classes of people, such as Sharma et al. [6], Li et al. [7], Nuamak et al. [8], and Hadj-Youcef et al. [12].
From the discussions, it can be observed that EEG data is useful for recognition of pathological conditions and understanding brain responses. Therefore, the use of EEG analysis for identification of ketum abuse is a promising field to be explored. To the best of the authors' knowledge, current studies of ketum usage are limited to rats [13]. Therefore, this study will investigate the difference in EEG signals between ketum users and non-users in an effort to provide an alternative means of identifying ketum usage by utilizing EEG signals.

Subject selection
Four ketum users and four non-users took part in this study. The ketum users had been consuming ketum for at least one year. All participants were between 20 and 30 years old.

Experimental protocol
The experiment started with a briefing for the subject. The subject was then asked to fill in a consent form, wear an EEGO cap on their head, and sit comfortably on a chair about 1.5 meters from a blank screen. Each subject then completed two tasks. In the first task, the subject stared at a blank screen for 10 minutes; in the second task, the subject closed their eyes for another 10 minutes of recording. Figure 1 shows the flow of the experiment.
Figure 1. Experimental protocol.

Experimental setup
The EEG signals were collected using an EEGO sports device (ANT Neuro, Enschede, Netherlands) with 32 channels, each placed according to the 10-20 standard positioning system. The sampling frequency was set to 2048 Hz, and the frequency band was set to 0.1-200 Hz to capture frequencies up to the higher gamma band.
The data collection was done in a quiet environment with enough lighting to keep the subject from becoming distracted. During the recording, the subject wore the EEG cap, which was attached to the amplifier and a tablet running the EEGO64 software. Electrode gel was inserted into each electrode channel to improve conductivity and stabilise the connection between the electrode and the subject's scalp. Subjects were not allowed to move their bodies or blink their eyes during data gathering, to minimize EMG and EOG artifacts in the signal. Once the recording was stopped, the data were exported to personal data storage for further analysis using the MATLAB software. The ketum users were in a sober state during the recording of the data.

EEG signal processing
In EEG signal processing and analysis, there are three basic steps: pre-processing, feature extraction, and classification [14].

Pre-processing
During data collection, the sampling frequency was set to 2048 Hz. A sampling frequency of 2048 Hz yields more informative data, but it may lead to computational issues. The data were therefore down-sampled from 2048 Hz to 512 Hz. The data for each subject were then divided into 40 frames of six seconds.
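The down-sampling and framing step can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the use of SciPy's `decimate`, the FIR anti-aliasing filter, and the choice of consecutive, non-overlapping frames taken from the start of the recording are all assumptions, and the input is synthetic noise standing in for one EEG channel.

```python
import numpy as np
from scipy.signal import decimate

FS_RAW = 2048        # recording rate stated in the text (Hz)
FS_TARGET = 512      # rate after down-sampling (Hz)
FRAME_SECONDS = 6    # frame length stated in the text
N_FRAMES = 40        # frames per subject stated in the text


def downsample_and_frame(x, fs_raw=FS_RAW, fs_target=FS_TARGET,
                         frame_seconds=FRAME_SECONDS, n_frames=N_FRAMES):
    """Return an (n_frames, frame_len) array for one EEG channel."""
    factor = fs_raw // fs_target                      # 2048 / 512 = 4
    # decimate() low-pass filters before discarding samples (anti-aliasing).
    y = decimate(np.asarray(x, dtype=float), factor, ftype="fir", zero_phase=True)
    frame_len = frame_seconds * fs_target             # 6 s * 512 Hz = 3072 samples
    needed = frame_len * n_frames
    if y.size < needed:
        raise ValueError("recording is too short for the requested frames")
    # Assumption: frames are consecutive, non-overlapping, taken from the start.
    return y[:needed].reshape(n_frames, frame_len)


# Synthetic stand-in for one channel: 240 s of noise at 2048 Hz.
rng = np.random.default_rng(0)
channel = rng.standard_normal(FS_RAW * FRAME_SECONDS * N_FRAMES)
frames = downsample_and_frame(channel)
print(frames.shape)  # (40, 3072)
```

Each row of `frames` is one six-second frame of 3072 samples at 512 Hz.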
The raw EEG signal may be contaminated with noise and artifacts such as eye blinks, ECG, EMG, and power line interference. In Malaysia, power line interference occurs at a frequency of 50 Hz. To attenuate this artifact, a notch filter set to 50 Hz was used [15], [16], [17], [18]. Next, the eye blink artifacts and EMG signals were removed using Independent Component Analysis (ICA). ICA is a linear decomposition technique that aims to reveal the underlying statistical sources of mixed signals [19]. Artifacts must be identified correctly before they are removed. Usually, if the noise is not significant, it is left in place, because removing it might also remove part of the real EEG signal.
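A 50 Hz notch filter of the kind described above might look like the sketch below. The quality factor `Q_FACTOR` and the zero-phase `filtfilt` application are assumptions (the text gives only the notch frequency), and the test signal is synthetic: a 10 Hz tone with 50 Hz interference added.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch

FS = 512          # sampling rate after down-sampling (Hz)
NOTCH_HZ = 50     # power line frequency stated in the text
Q_FACTOR = 30     # assumed quality factor; not given in the text


def remove_power_line(x, fs=FS, f0=NOTCH_HZ, q=Q_FACTOR):
    """Apply a second-order IIR notch forwards and backwards (zero phase)."""
    b, a = iirnotch(f0, q, fs=fs)
    return filtfilt(b, a, x)


def amplitude_at(x, freq, fs=FS):
    """Magnitude of the FFT bin closest to `freq`."""
    spectrum = np.abs(np.fft.rfft(x))
    return spectrum[int(round(freq * len(x) / fs))]


# Synthetic six-second frame: 10 Hz activity plus 50 Hz interference.
t = np.arange(6 * FS) / FS
noisy = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
filtered = remove_power_line(noisy)

print(amplitude_at(filtered, 50) / amplitude_at(noisy, 50))  # strongly reduced
print(amplitude_at(filtered, 10) / amplitude_at(noisy, 10))  # close to 1
```

A narrow notch (high Q) removes the interference while leaving neighbouring EEG frequencies largely untouched; a lower Q widens the rejected band.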

Feature extraction
Feature extraction eliminates redundancy in the EEG signal, since the EEG signal is a redundant discrete-time sequence [20]. Wavelet packet transform (WPT) was therefore used. In WPT, the pre-processed data were decomposed into two parts at each level: approximation coefficients and detail coefficients. The decomposition was carried out to seven levels, which produces a total of 255 nodes. The nodes selected for this research were nodes 127 to 151, covering the 0-100 Hz frequency spectrum. Six features were extracted from the selected nodes for further analysis, as discussed below.

Teager energy.
The Teager energy of a continuous-time signal $x(t)$ is given in equation (1),
$$\Psi[x(t)] = \dot{x}(t)^2 - x(t)\,\ddot{x}(t) \qquad (1)$$
where $\dot{x}(t)$ is the first derivative and $\ddot{x}(t)$ is the second derivative of $x$ with respect to time $t$. Equation (2) shows the Teager energy for discrete-time signals.
$$\Psi[x[n]] = x^2[n] - x[n-1]\,x[n+1] \qquad (2)$$
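The discrete-time Teager energy of equation (2) can be sketched in a few lines. The operator itself is standard; reducing it to one value per node by averaging (`mean_teager_energy`) is an assumption, since the text does not say how the per-sample values are summarised. The check uses the known result that a pure tone $A\cos(\Omega n + \phi)$ has constant Teager energy $A^2\sin^2\Omega$.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy x[n]^2 - x[n-1]*x[n+1] for the interior samples."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def mean_teager_energy(x):
    # Assumption: one scalar feature per node, obtained by averaging.
    return float(np.mean(teager_energy(x)))


# Sanity check on a pure tone A*cos(omega*n + phase).
amplitude, omega = 2.0, 0.3
n = np.arange(1000)
tone = amplitude * np.cos(omega * n + 0.7)
psi = teager_energy(tone)
expected = amplitude ** 2 * np.sin(omega) ** 2

print(np.allclose(psi, expected))  # True
```

Because the result depends on both amplitude and frequency, the operator responds to changes in either, which is why it is used as an energy measure for oscillatory signals.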

Energy to standard deviation ratio (ESTD).
ESTD is used to measure the fluctuation of energy between different frequency components of the signals, as shown in equation (3),
$$\mathrm{ESTD} = \frac{\sum_{n} x^2[n]}{\sigma} \qquad (3)$$
where $\sigma$ is the standard deviation of $x$.

Energy to mean ratio (EM).
The energy to mean ratio is used to determine the distribution of energy in different frequency components of the signals, as shown in equation (4).
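The two energy ratios can be illustrated with the sketch below. It rests on stated assumptions rather than on the authors' exact definitions: energy is taken as the sum of squared coefficients, the standard deviation is the population standard deviation, and the mean is the plain arithmetic mean of the coefficients (a literal reading of the feature name, which becomes unstable when the mean is close to zero).

```python
import numpy as np


def energy(x):
    """Assumed definition: sum of squared samples."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))


def energy_to_std_ratio(x):
    """ESTD under the assumptions above: energy / population std."""
    sigma = float(np.std(x))
    if sigma == 0.0:
        raise ValueError("standard deviation is zero")
    return energy(x) / sigma


def energy_to_mean_ratio(x):
    """EM under the assumptions above: energy / arithmetic mean."""
    mu = float(np.mean(x))
    if mu == 0.0:
        raise ValueError("mean is zero")
    return energy(x) / mu


# Toy coefficients: energy = 30, mean = 2.5, std = sqrt(1.25).
coeffs = np.array([1.0, 2.0, 3.0, 4.0])
estd = energy_to_std_ratio(coeffs)
em = energy_to_mean_ratio(coeffs)
print(em)  # 12.0
```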
For the entropy-based feature of equation (5), $m = 1$ and $r = 0.2$ represent the embedding dimension and tolerance, respectively.

Hjorth parameters.
Two Hjorth parameters are used as features: mobility and complexity. Mobility represents the mean frequency and is proportional to the standard deviation of the power spectrum, whereas complexity measures how similar a signal is to a pure sine wave. The computation of the features is shown in equation (6), equation (7), and equation (8) [7].
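A minimal sketch of the Hjorth parameters, using their commonly cited definitions (activity as the signal variance, mobility as the square root of the ratio of the variance of the first difference to the variance of the signal, and complexity as the mobility of the first difference divided by the mobility of the signal). Whether these match equations (6) to (8) exactly is an assumption; derivatives are approximated by first differences, so mobility is expressed per sample rather than in Hz.

```python
import numpy as np


def hjorth_parameters(x):
    """Return (activity, mobility, complexity) using first differences."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return float(activity), float(mobility), float(complexity)


fs = 512
t = np.arange(10 * fs) / fs
sine = np.sin(2 * np.pi * 10 * t)                 # pure 10 Hz tone
noise = np.random.default_rng(0).standard_normal(t.size)

_, sine_mobility, sine_complexity = hjorth_parameters(sine)
_, noise_mobility, noise_complexity = hjorth_parameters(noise)

print(round(sine_complexity, 2))  # 1.0 for a pure sine wave
print(noise_complexity > sine_complexity)  # True
```

A complexity close to 1 indicates a signal shaped like a pure sine wave; broadband signals such as noise give larger values.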

Feature selection
Student's T-Test is used as a feature selection method to identify the optimal features for distinguishing ketum users from non-users [22]. This filter-based feature selection approach assesses the correlation or dependency between input variables, which can then be filtered using statistical measures to identify the most relevant features.

Statistical analysis: T-Test.
The T-Test is a statistical analysis that compares the means of two classes. The p-value indicates whether there is a significant difference between the two classes, ketum users and non-ketum users. A significant result is likely to reflect a real difference between the populations [23]. The results of the T-Test analysis of the features are shown in Table 1 and Table 2.
The p-value and t-score of the T-Test are important for selecting the features. The difference between groups is significant when the p-value is smaller than 0.005. The t-score is the ratio of the difference between the two groups to the difference within the groups. The greater the magnitude of the t-score, the greater the difference between groups. A negative t-score only reflects the direction of the effect being studied and has no impact on the significance of the difference between the groups. Based on the p-values and t-scores of the six extracted features, the two features chosen as input for the classifier are Teager energy and energy to standard deviation ratio (ESTD).
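The selection rule described above can be sketched with SciPy's two-sample Student's t-test. The data here are synthetic and the feature names are hypothetical; only the 0.005 threshold comes from the text. Features are ranked by the magnitude of the t-score and kept when the p-value is below the threshold.

```python
import numpy as np
from scipy.stats import ttest_ind

ALPHA = 0.005  # significance threshold stated in the text


def select_features(users, non_users, names, alpha=ALPHA):
    """Rank features by |t| and flag those with p < alpha.

    users, non_users: arrays of shape (n_samples, n_features).
    Returns a list of (name, t_score, p_value, selected) tuples.
    """
    result = ttest_ind(users, non_users, axis=0)  # equal-variance Student's t-test
    rows = [(name, float(t), float(p), bool(p < alpha))
            for name, t, p in zip(names, result.statistic, result.pvalue)]
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


# Synthetic example: feature_a separates the groups, feature_b does not.
base = np.linspace(0.0, 1.0, 40)
users = np.column_stack([base, base])
non_users = np.column_stack([base + 2.0, base + 0.001])

table = select_features(users, non_users, ["feature_a", "feature_b"])
selected = [name for name, _, _, keep in table if keep]
print(selected)  # ['feature_a']
```

The t-score of `feature_a` is negative here only because the first group has the smaller mean; swapping the group order flips the sign without changing the p-value.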

Classification
To quantify how well ketum users can be identified from features derived from EEG data, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Bagged Tree Ensemble classifiers are used. Ten-fold cross-validation is used to validate the classification results [24,25].

K-Nearest Neighbor (KNN).
In K-nearest neighbour, an object's classification is determined by majority voting of its neighbours: it is allocated to the most prevalent class among its k nearest neighbours [26,27]. The K-Nearest Neighbour (K-NN) classifier is a primary method in the pattern recognition community because of its simplicity, interpretability, and performance [28]. In this research, the cosine distance measure with 3 neighbours was used.

Support Vector Machine (SVM).
SVM [29] identifies the best hyperplane, the one with the largest margin, to split all data points into different classes. The primary principle of the SVM algorithm is to map the input data into a higher-dimensional space and then construct an optimal hyperplane that separates the two categories in the transformed space [30]. The quadratic kernel function was applied for the SVM model.

Bagged Tree Ensemble Classifier.
Meta classifiers, or ensemble classifiers, are learning algorithms that build a collection of classifiers and then classify new data points by combining their predictions [31]. A majority voting model produces the outcome for an unseen instance: the class with the most votes wins and represents the ensemble's prediction. A Bagged Tree ensemble with 30 learners was used in this research.
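The three classifiers and the ten-fold cross-validation can be approximated with scikit-learn as sketched below. This is not the authors' MATLAB configuration: mapping the quadratic kernel to `SVC(kernel="poly", degree=2, coef0=1.0)` and the bagged trees to `BaggingClassifier` over decision trees are assumptions, and the two-feature data set is synthetic, standing in for the two selected features.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Classifier settings taken from the text where stated, assumed otherwise."""
    return {
        # 3 neighbours with cosine distance, as stated in the text.
        "KNN": KNeighborsClassifier(n_neighbors=3, metric="cosine"),
        # Assumed scikit-learn equivalent of a quadratic kernel.
        "SVM": SVC(kernel="poly", degree=2, coef0=1.0),
        # 30 bagged decision trees, as stated in the text.
        "Bagged trees": BaggingClassifier(DecisionTreeClassifier(),
                                          n_estimators=30, random_state=seed),
    }


def cross_validated_accuracy(features, labels, folds=10):
    """Mean accuracy over stratified k-fold cross-validation per classifier."""
    return {name: float(cross_val_score(clf, features, labels, cv=folds).mean())
            for name, clf in build_classifiers().items()}


# Synthetic, well-separated two-feature data (not the study's data).
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[1.0, 4.0], scale=0.3, size=(80, 2))
class_b = rng.normal(loc=[4.0, 1.0], scale=0.3, size=(80, 2))
features = np.vstack([class_a, class_b])
labels = np.array([0] * 80 + [1] * 80)

scores = cross_validated_accuracy(features, labels)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```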

Result and discussion
Scatter plots are used to represent the relationship between ketum and non-ketum users, as shown in figure 2 for task 2. The graphs show that the Teager energy of ketum users and non-ketum users is quite distinct. From the figures, it can be observed that the cognitive processes of the two groups of subjects differ, and that the Teager energy feature captures this difference effectively. The classification performance for Teager energy and ESTD was analysed using the KNN, SVM, and Ensemble classifiers, and the results are reported in Table 3 and Table 4. The classification performance was evaluated based on accuracy, sensitivity, specificity, and precision for both tasks. From the analysis, it can be observed that both the Teager energy and ESTD features distinguish ketum users from non-users effectively. Although a comparison with similar studies cannot be made because of the lack of research on ketum usage, the ensemble classifier used in this study predicted the testing instances with 100% accuracy for both tasks and both features. This is because the ensemble classifier combines the predictions of several independent decision tree classifiers trained using the bagging technique. From the observations, it can be seen that the difference between the two groups in the energy derived from the EEG signals is statistically significant. This suggests that ketum users have more difficulty reaching a relaxed mental state than their non-user peers, even in a sober state.
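The four performance measures named above follow from the counts of a binary confusion matrix. The sketch below uses the usual definitions and treats ketum users as the positive class, which is an assumption; the labels are a toy example, not the study's results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and precision from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),
    }


# Toy example: 1 = ketum user (assumed positive class), 0 = non-user.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.75
```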

Conclusion
The study shows that Teager energy and ESTD are useful features for distinguishing ketum users from non-users, and that the ensemble is the best classifier for separating the classes. There was also a slight difference in performance between SVM and KNN. Although neither reached 100% accuracy, their classification accuracy was still greater than 80%. The relaxed mental states employed in task 1 and task 2 are therefore appropriate for identifying ketum users. The proposed algorithm provides a promising alternative, non-invasive method for detecting ketum usage in sober ketum users.