Enhancing clinical communication assessments using an audiovisual BCI for patients with disorders of consciousness

Objective. The JFK coma recovery scale-revised (JFK CRS-R), a behavioral observation scale, is widely used in the clinical diagnosis/assessment of patients with disorders of consciousness (DOC). However, the JFK CRS-R is associated with a high rate of misdiagnosis (approximately 40%) because DOC patients cannot provide sufficient behavioral responses. A brain–computer interface (BCI) that detects command/intention-specific changes in electroencephalography (EEG) signals without the need for behavioral expression may provide an alternative method. Approach. In this paper, we proposed an audiovisual BCI communication system based on audiovisual ‘yes’ and ‘no’ stimuli to supplement the JFK CRS-R for assessing the communication ability of DOC patients. Specifically, patients were given situation-orientation questions as in the JFK CRS-R and instructed to select the answers using the BCI. Main results. Thirteen patients (eight vegetative state (VS) and five minimally conscious state (MCS)) participated in our experiments involving both the BCI- and JFK CRS-R-based assessments. One MCS patient who received a score of 1 in the JFK CRS-R achieved an accuracy of 86.5% in the BCI-based assessment. Seven patients (four VS and three MCS) obtained unresponsive results in the JFK CRS-R-based assessment but responsive results in the BCI-based assessment, and four of those patients later improved their scores in the JFK CRS-R-based assessment. Five patients (four VS and one MCS) obtained unresponsive results in both assessments. Significance. The experimental results indicated that the audiovisual BCI could provide more sensitive results than the JFK CRS-R and therefore supplement the JFK CRS-R.


Introduction
Disorders of consciousness (DOC), such as vegetative state (VS) and minimally conscious state (MCS), are clinically diagnosed on the basis of behavior scales such as the JFK coma recovery scale-revised (JFK CRS-R) and the Glasgow coma scale (GCS), which rely on motor responses to external stimulation [1]. People in a vegetative state may awaken but show no awareness of themselves or their environment, whereas patients in a minimally conscious state may demonstrate inconsistent but reproducible signs of awareness [2]. Furthermore, emergence from MCS (EMCS) is characterized by a reliable and consistent demonstration of functional interactive communication or the functional use of two different objects [3]. Patients in locked-in state (LIS) similarly show very limited signs of awareness due to profound sensory and motor deficits but have retained self-awareness and normal or near-normal cognitive capacities [4]. Several studies have not included LIS as a DOC [3,5].
Among these behavior assessment methods, the JFK CRS-R is suggested to be reliable and widely applied in clinical assessments [6]. The JFK CRS-R was first introduced by Giacino et al in 1991 and was revised in 2004 [7]. The JFK CRS-R contains six subscales, including auditory, visual, motor, oromotor, communication and arousal functions. The score in each subscale depends on whether the DOC patient has specific behavior responses to sensory stimuli. For example, in the communication subscale, if clear, discernible and accurate responses occur for all situational orientation questions, the patient receives a score of 2. If clear, discernible responses occur for at least two but not all questions, the patient receives a score of 1. Otherwise, a score of 0 is given. However, behavior-based scales, such as the JFK CRS-R, have some problems [8]. First, many patients with DOC cannot maintain a stable state during the evaluation. Second, these patients are usually unable to make normal physical movements [9]. As a consequence, high rates of clinical misdiagnosis can occur. For example, recent studies have observed that 37% to 43% of patients diagnosed as being in a VS actually show signs of awareness [6,10,11].
Considering these limitations of behavioral assessment scales, brain-computer interfaces (BCIs) could be used as an assistance tool for clinical evaluation because they can detect command/intention-specific changes in brain signals, such as electroencephalography (EEG), without requiring any behavioral expression [9,12]. Several BCI systems have recently been applied in awareness detection for patients with DOC [13][14][15][16]. Cruse et al assessed 16 VS patients by having them complete a motor imagery (MI) task in which they were instructed to imagine movements of their right hand and toes in response to commands [13]. Three of the patients could repeatedly and reliably generate appropriate EEG responses to two distinct commands. Pan et al reported a visual hybrid BCI that combined P300 and SSVEP to detect awareness in four VS, three MCS and one LIS patients [14]. They successfully demonstrated command following in three patients (one VS, one MCS and one LIS). In our previous study, we applied an audiovisual BCI system to detect the awareness of seven DOC patients; the patients were instructed to selectively attend a target of two different number stimuli [15]. Five of the patients exhibited command following. These studies focused on DOC patients' overall awareness levels and did not evaluate the patients' functional consciousness in detail, as might be done with the JFK CRS-R. In addition, several studies have reported that BCIs could detect functional communication in LIS patients. Lulé et al tested a four-choice (Yes, No, Stop and Go) auditory oddball BCI with 18 patients (13 MCS, 3 VS and 2 LIS) [17]. Their results showed that one LIS patient had a correct response rate of 60% and therefore was able to communicate using the BCI. Overall, the performance of these BCIs designed for DOC patients is generally poor.
This is mainly because DOC patients' recognition levels, such as attention, are substantially lower than those of healthy subjects, and recognition levels are associated with the performance of a BCI system. Furthermore, to our knowledge, no BCI has been designed to directly assist in behavioral scale-based assessments. It is difficult to obtain sensitive results for several items in traditional behavioral scales. BCI methods have an advantage over clinical assessment in that they do not depend on the patients' behavioral responses. However, because the performance of BCIs in DOC patients is generally poor, a statistical test should be performed to show whether BCI results are significant. Furthermore, we need to develop novel BCIs that are suitable for DOC patients. Previous studies have validated the effectiveness of audiovisual BCIs [15,18]. In this study, we used an audiovisual paradigm to improve BCI performance.
In this paper, we propose a new protocol to assist the JFK CRS-R in assessing the communication ability of DOC patients. The new protocol was designed following the JFK CRS-R communication subscale-based assessment. Specifically, the new audiovisual BCI system was based on semantically congruent audiovisual stimuli, i.e. 'Yes' and 'No'. We imitated the JFK CRS-R communication subscale by administering situation-orientation questions to DOC patients. After a question was posed, two flashing buttons with the Chinese words 'Yes' and 'No' appeared on the computer screen. When a button was flashing, the corresponding spoken word 'Yes' or 'No' (in Chinese) was heard simultaneously. The patients were instructed to selectively focus on the flashing button corresponding to the correct answer ('Yes' or 'No') to the question and the corresponding spoken word. The BCI system determined the patient's choice by detecting event-related potentials (ERPs), such as P300. The detection result was presented as online feedback. Based on the feedback results, the clinical examiners could make a determination of the communication ability of the patient. Thirteen patients participated in our experiments with BCI- and JFK CRS-R-based communication assessments. The experimental results demonstrated the efficacy of our BCI approach in supplementing the JFK CRS-R assessment of communication in DOC patients.

Subjects
Thirteen patients (eight VS and five MCS; mean ± SD, 37 ± 12 years of age; twelve males; see table 1) from a local hospital participated in the experiment. None of the patients had a history of impaired visual or auditory acuity. The Ethics Committee of the General Hospital of the Guangzhou Military Command of PLA in Guangzhou, China, which complies with the Code of Ethics of the World Medical Association (Declaration of Helsinki), approved the experimental procedures. Each patient's legal guardian provided written informed consent for the experiment and for the publication of their individual details in this manuscript. The clinical diagnoses were based on the JFK CRS-R, which comprises six subscales addressing auditory, visual, motor, oromotor, communication and arousal functions [7]. Details, including the JFK CRS-R scores of the thirteen patients, are shown in table 1.

Data acquisition
EEG data were amplified using a SynAmps2 device (Compumedics, Neuroscan, Inc., Australia) sampled at 250 Hz and filtered between 0.01 and 30 Hz. The EEG signals were recorded from 30 electrodes using an EEG cap (LT 37) based on the international 10-20 system and referenced to the right mastoid. All electrode impedances were maintained below 5 kΩ during data collection.

Experimental procedure
The thirteen patients participated in two assessments: the JFK CRS-R-based behavioral assessment and the BCI-based communication assessment. The JFK CRS-R-based behavioral assessment was conducted by a clinician from the General Hospital of the Guangzhou Military Command of PLA. Following the standard protocol, the clinician administered the following six situation-orientation questions to the patient in random order. Questions 1-4: 'Am I touching my ear/nose right now?' (the clinician touched or did not touch his/her ear/nose); Questions 5-6: 'Am I clapping my hands right now?' (the clinician clapped or did not clap). The communication scale score depended on the patient's discernible verbal or nonverbal communication response. If clearly discernible and accurate responses were given for all six questions, the patient received a score of 2. If a clearly discernible response (e.g. head nod/shake, thumbs up) was given within 10 s for at least two of the six questions, the patient received a score of 1. If no discernible verbal or nonverbal communication response was given for any question, the patient received a score of 0. The thirteen DOC patients underwent three JFK CRS-R-based assessments: shortly before the BCI experiment, shortly after it, and again two months after it, as shown in table 1.
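The scoring rule for the communication subscale described above can be sketched as a small function. This is only an illustrative sketch, not the clinical instrument itself: the function name and the three outcome labels are hypothetical, and the case of exactly one discernible response, which the rule above does not spell out, is assumed here to score 0.

```python
def communication_score(outcomes):
    """Score the communication subscale from six question outcomes.

    Each outcome is one of (hypothetical labels):
      "accurate"    - clearly discernible and correct response
      "discernible" - clearly discernible response, not necessarily correct
      "none"        - no discernible response within the time limit
    """
    allowed = {"accurate", "discernible", "none"}
    if len(outcomes) != 6:
        raise ValueError("expected outcomes for six questions")
    if not set(outcomes) <= allowed:
        raise ValueError("unknown outcome label")

    n_accurate = sum(o == "accurate" for o in outcomes)
    n_discernible = sum(o != "none" for o in outcomes)

    if n_accurate == 6:
        return 2  # discernible and accurate on all six questions
    if n_discernible >= 2:
        return 1  # discernible on at least two, but not accurate on all
    return 0      # assumption: fewer than two discernible responses scores 0
```

For example, `communication_score(["accurate", "discernible", "none", "none", "none", "none"])` yields a score of 1 under these assumptions.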
The graphical user interface (GUI) of our audiovisual BCI system is shown in figure 1. First, a situation question and an instruction were presented in the upper part of the screen. The number of questions with an answer of 'yes' was equal to the number of questions with an answer of 'no', and the questions were presented in random order. Two word buttons with 'Yes' and 'No' in Chinese were randomly presented on the left and right sides of the screen. When an audiovisual stimulus was presented, the color of the corresponding button changed from green to black, and the color of the word included in the button changed from white to black. Simultaneously, the corresponding spoken word (65 dB) was presented in the headphone on the same side as the button. The experimental paradigm of the audiovisual BCI-based communication assessment is shown in figure 2. Before the online experiment, each patient performed a calibration run of 12 trials. The test run contained five blocks, and each block consisted of 12 trials. Each block was conducted on a separate day because patients were easily fatigued. The test run lasted from one to two weeks.
In the BCI experiment, each trial began with an audiovisual instruction of approximately 20 s. During the instruction, an experimenter administered a situation question to the patient twice; the questions were the same as those used in the clinical evaluation. The patient was asked to focus on the button with the answer and count its repetitions silently. The experimenter and family members explained the instructions repeatedly so that the patients paid attention to the audiovisual target stimuli. There were two rounds of audiovisual stimulations following the instruction period. In the first round, the audiovisual stimulus of one button (randomly chosen from the two buttons, such as 'No' in figure 2) was presented five times, and the audiovisual stimulus of the other button, such as 'Yes' in figure 1, was then repeated five times. The second round of audiovisual stimulations was the same as the first round. Each audiovisual stimulus lasted 300 ms. The time interval between every two adjacent audiovisual stimuli was randomly chosen from 700, 900, 1100, 1300, and 1500 ms. If the target was detected by the classification algorithm after two rounds of audiovisual stimulations, the sound of applause and the detected result were presented for 4 s as the feedback; otherwise, a cross appeared on the screen for 4 s. Finally, there was a break at the end of each trial of at least 10 s depending on the patient's level of arousal. In the BCI-based assessment, if the patients showed sustained eyelid closure during a trial, the recorded trial was discarded, and an arousal facilitation protocol was administered, such as presenting deep pressure stimulation to the shoulder. The next trial began after the patient reawakened. This arousal facilitation protocol was similar to that used in the clinical assessment.
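The stimulation sequence within a trial can be sketched as follows. This is a minimal sketch rather than the authors' presentation software: the function and constant names are hypothetical, and the randomly chosen interval is assumed to be measured from one stimulus onset to the next (the text does not say whether it is onset-to-onset or offset-to-onset).

```python
import random

STIMULUS_MS = 300                           # duration of each audiovisual stimulus
INTERVALS_MS = (700, 900, 1100, 1300, 1500)  # candidate inter-stimulus intervals


def build_trial_schedule(rng=None, rounds=2, repeats=5):
    """Return a list of (onset_ms, button) pairs for one trial.

    One button is picked at random to go first; it is presented `repeats`
    times, then the other button is presented `repeats` times. Every round
    uses the same button order.
    """
    rng = rng or random.Random()
    first, second = rng.sample(["Yes", "No"], 2)  # random button order
    schedule = []
    onset = 0
    for _ in range(rounds):
        for button in (first, second):
            for _ in range(repeats):
                schedule.append((onset, button))
                # Assumption: interval counted from onset to onset.
                onset += rng.choice(INTERVALS_MS)
    return schedule
```

With the default arguments this produces 20 stimuli per trial, ten for each button, which matches the ten stimulations per button that are later averaged into one feature vector.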

Data processing
For all trials, the EEG signals recorded from 30 channels were band-pass filtered (0.1-20 Hz). We then extracted epochs corresponding to each stimulus from 0 to 600 ms after stimulus onset for each channel. All epochs were baseline corrected using a baseline of 100 ms before the stimulus onset and downsampled by a factor of 5. Next, we concatenated the epochs from all 30 channels to obtain a data vector. Finally, a feature vector corresponding to each button was constructed by averaging the vectors from all ten corresponding stimulations. Using EEG data from the calibration run, we first trained an SVM classifier in which the feature vectors corresponding to the target and nontarget stimuli were labelled as +1 and −1, respectively. Furthermore, the classification model was updated after each test block using the data from this test block. For example, we used the data from Block 2 to re-train/update the SVM model for the test in Block 3. We used this model training method because (i) the blocks were conducted on separate days for each patient; and (ii) the patients' statuses did not allow them to perform a training run before each block. For each test trial, the trained SVM was applied to the two feature vectors corresponding to the two buttons, and the predicted result corresponded to the button with the higher score.
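The feature construction described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the input is assumed to be already band-pass filtered, the 250 Hz sampling rate is taken from the data acquisition description, and downsampling is done by keeping every fifth sample (the text does not say whether an additional anti-aliasing step was used). The `score_fn` argument stands in for the decision function of a trained SVM.

```python
import numpy as np

FS = 250            # sampling rate in Hz (from the acquisition setup)
EPOCH_MS = 600      # epoch length after stimulus onset
BASELINE_MS = 100   # baseline window before stimulus onset
FACTOR = 5          # downsampling factor


def button_feature(eeg, onsets):
    """Build one feature vector for one button.

    eeg    : array of shape (n_channels, n_samples), already filtered
    onsets : sample indices of the stimuli belonging to this button
    """
    n_epoch = EPOCH_MS * FS // 1000      # 150 samples
    n_base = BASELINE_MS * FS // 1000    # 25 samples
    vectors = []
    for onset in onsets:
        if onset < n_base or onset + n_epoch > eeg.shape[1]:
            raise ValueError("epoch or baseline falls outside the recording")
        epoch = eeg[:, onset:onset + n_epoch]
        baseline = eeg[:, onset - n_base:onset].mean(axis=1, keepdims=True)
        corrected = epoch - baseline            # baseline correction per channel
        reduced = corrected[:, ::FACTOR]        # assumption: plain subsampling
        vectors.append(reduced.reshape(-1))     # concatenate all channels
    return np.mean(vectors, axis=0)             # average over the stimulations


def pick_button(score_fn, feature_by_button):
    """Return the button whose feature vector gets the higher classifier score."""
    return max(feature_by_button, key=lambda b: score_fn(feature_by_button[b]))
```

With 30 channels, each epoch contributes 30 samples per channel after downsampling, so each button is represented by a 900-dimensional vector under these assumptions.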
For each patient, the online classification accuracy was calculated as the ratio of the number of trials with correct responses to the total number of presented trials. To assess the significance of the accuracy, we used a binomial test based on Jeffreys' Beta distribution in which the significance level in a two-class paradigm is calculated as follows [19,20]:
$$\lambda = \left\{ a + \frac{N - 2m}{2N(N + 3)} \right\} + z\sqrt{\frac{a(1 - a)}{N + 2.5}}$$
where N is the number of actual trials (in this experiment, N = 60 minus the number of discarded trials); m is the expected number of successful trials (in this study, m = N/2 for a two-class problem); a is the expected accuracy (0.5 in this study); λ is the accuracy rate; and z is the z-score based on the standard normal distribution. At a significance level of 0.05 for a one-sided test, z is 1.65. Using this formula, for our two-class BCI, the accuracy rate λ corresponding to the significance level was 60.4% for 60 trials in this study. Given an accuracy rate λ, we can calculate z using this formula and obtain the p value based on the standard normal distribution.
Table 2 summarizes the online accuracy of the BCI experiment and the JFK CRS-R communication subscale scores for each patient. One MCS patient (P13) achieved a score of 1 on the JFK CRS-R-based communication assessment and an accuracy of 86.5% in the BCI-based assessment. Five patients obtained unresponsive results (four VS and one MCS; P1, P2, P3, P4 and P9) in both the BCI-based and JFK CRS-R-based assessments. The other seven patients (four VS and three MCS; P5, P6, P7, P8, P10, P11 and P12), who all had scores of 0 on the JFK CRS-R communication subscale before the experiment, achieved accuracies ranging from 65.5% to 86%, which were significantly higher than the chance level (p < 0.05, the binomial test).
In addition, the results of the two JFK CRS-R-based communication assessments after the experiment showed that four patients (P8, P10, P11 and P12) who achieved responsive results in the BCI-based assessment improved on the scores they obtained in the JFK CRS-R-based assessment before the experiment.
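The significance criterion used for the BCI accuracies can be checked numerically. The sketch below assumes the threshold takes the form λ = a + (N − 2m)/(2N(N + 3)) + z·sqrt(a(1 − a)/(N + 2.5)), which reproduces the 60.4% threshold quoted for 60 trials; the function names are hypothetical, and m defaults to a·N (that is, N/2 in the two-class case).

```python
import math


def accuracy_threshold(n_trials, a=0.5, z=1.65, m=None):
    """Smallest accuracy rate regarded as significant at the given z-score."""
    if m is None:
        m = a * n_trials  # expected number of successful trials
    shift = (n_trials - 2 * m) / (2 * n_trials * (n_trials + 3))
    return a + shift + z * math.sqrt(a * (1 - a) / (n_trials + 2.5))


def p_value(accuracy, n_trials, a=0.5, m=None):
    """One-sided p value obtained by solving the same formula for z."""
    if m is None:
        m = a * n_trials
    shift = (n_trials - 2 * m) / (2 * n_trials * (n_trials + 3))
    z = (accuracy - a - shift) / math.sqrt(a * (1 - a) / (n_trials + 2.5))
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


threshold_60 = accuracy_threshold(60)   # about 0.604 for 60 trials
p_best = p_value(0.865, 60)             # far below 0.05
```

Because N is reduced by the number of discarded trials, the threshold for an individual patient can be slightly higher than 60.4%; for instance, `accuracy_threshold(55)` is larger than `accuracy_threshold(60)`.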

Results
Based on whether the online classification accuracy of the BCI experiment was significantly higher than the chance level, we classified the patients into two groups: the responsive group, which included patients P5, P6, P7, P8, P10, P11, P12 and P13, and the unresponsive group, which included the remaining patients. The average accuracy of the responsive group was 77.3% ± 4.88% (mean ± SD). For the patients in the responsive group, the group-average ERP waveforms and each patient's waveform from the 'Cz' channel are shown in figure 3. The ERP waveforms of each patient were extracted by time-locked averaging of the EEG signal across all trials in the test run for each stimulus type, whereas the group-average waveforms were obtained by averaging the ERP waveforms of all patients in that group. As shown in figure 3, obvious P300 responses were observed for the target stimuli in the group-average ERP waveforms and in each patient's waveforms. Figure 4 shows the group-average ERP waveforms and each patient's waveforms in the unresponsive group. In the group-average waveforms, P300 responses seemed to be elicited by both target and nontarget stimuli, but they could not be distinguished. Furthermore, for patients P1 and P4, no P300 responses were observed for the target stimuli. For patients P2, P3 and P9, P300 responses were elicited by both target and nontarget stimuli, but these responses could not be distinguished.
The results of the comparison between responsive and unresponsive groups based on several clinical parameters (age, time since onset, JFK CRS-R total scores) are reported in table 3. There were no significant differences between the responsive and unresponsive groups in age, time since onset and JFK CRS-R scores before the experiment. However, a significant difference was observed between the two groups in JFK CRS-R scores shortly after the experiment (p < 0.05) and the JFK CRS-R scores two months after the experiment (p < 0.01). This result implies that the patients in the responsive group recovered better than those in the unresponsive group.

Discussion
BCI-based methods could evaluate DOC patients' responses to external stimuli based on brain signals instead of behaviors and may provide an assistive tool for clinical evaluation. Previous studies mainly focused on using BCI to detect the awareness of DOC patients [13][14][15]. These studies determined whether patients were aware by verifying whether the patients followed commands. However, BCI-based methods have not been used to assess the individual consciousness functions included in behavioral scales, such as the six subscales in the JFK CRS-R. In this paper, we proposed an audiovisual BCI-based communication assessment system to supplement the JFK CRS-R in testing DOC patients' communication ability. The combination of the JFK CRS-R- and BCI-based assessments may provide more sensitive and precise diagnosis results than the JFK CRS-R alone.
In our previous study [15], we proposed an audiovisual BCI system for awareness detection in patients with DOC based on semantically congruent audiovisual stimuli, including visual and spoken numbers. Our experimental results for healthy subjects showed that the audiovisual BCI outperformed the corresponding visual-only and auditory-only BCIs, whereas the experimental results for the patients demonstrated satisfactory performance of the system in awareness detection. In this study, we used an audiovisual BCI that was a variant of the BCI system described in a previous study [15], to assess communication in DOC patients. Thirteen patients (eight VS and five MCS) participated in our experiment. Among these patients, one MCS patient (Patient 13) who received a score of 1 in the JFK CRS-R communication subscale before the experiment and a score of 2 in the JFK CRS-R-based assessment shortly after the experiment achieved an accuracy of 86.5% in the BCI experiment, which demonstrated the effectiveness of our BCI-based communication assessment system. The other twelve patients (eight VS and four MCS) received a score of 0 in the JFK CRS-R communication subscale before the experiment. However, seven of these patients (four VS and three MCS) achieved accuracies that were significantly higher than the chance level in the BCI-based assessment. This means that the seven patients could understand the examiner's situation questions and answer these questions through our BCI system.
In the JFK CRS-R communication subscale, a score of 1 indicates that the patient can answer some questions, and a score of 2 indicates that the patient can answer all questions correctly. Therefore, scores of 1 and 2 in the JFK CRS-R communication subscale are responsive results, and a score of 0 is an unresponsive result.
We now compare the sensitivity of the two assessment methods. First, we assumed that the responsive samples in the JFK CRS-R-based assessment were actual positives. The true positive rate (TPR) is often used to estimate the sensitivity of a test and can be defined as follows: TPR = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. The number of total actual positive samples (TP + FN) is denoted by a, and the number of TP in the BCI-based assessment is denoted by TP_b. Then, the TPR corresponding to the BCI-based assessment is TP_b/a, denoted by TPR_b. The numbers of TP in the three JFK CRS-R-based assessments are denoted by TP_c1, TP_c2 and TP_c3, and the corresponding TPRs are denoted by TPR_c1, TPR_c2 and TPR_c3. Because TP_c3 > TP_c2 > TP_c1 (see table 2), TPR_c3 > TPR_c2 > TPR_c1. Furthermore, all responsive results in the three JFK CRS-R-based assessments were detected by the BCI-based method (see table 2). Therefore, we had TP_b ⩾ TP_c3 > TP_c2 > TP_c1, and TPR_b ⩾ TPR_c3 > TPR_c2 > TPR_c1. However, as the number of responsive patients was small in this study, a statistical test of the above results was not feasible. Nevertheless, the discrepancy between the BCI- and JFK CRS-R-based assessments suggests a potentially higher sensitivity of the BCI-based assessment in consciousness detection.
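The sensitivity argument above rests on a simple set relation: if one method detects every positive that another method detects, its TPR cannot be lower. The sketch below illustrates this with made-up patient labels; the function name and the example sets are hypothetical and are not the study data.

```python
def true_positive_rate(detected, actual_positives):
    """TPR = TP / (TP + FN), with both arguments given as collections of IDs."""
    actual = set(actual_positives)
    if not actual:
        raise ValueError("at least one actual positive is required")
    true_positives = set(detected) & actual
    return len(true_positives) / len(actual)


# Hypothetical illustration only: four actual positives, labelled A to D.
actual = {"A", "B", "C", "D"}
behavioral_detected = {"A"}          # method that misses most positives
bci_detected = {"A", "B", "C"}       # superset of the behavioral detections

tpr_behavioral = true_positive_rate(behavioral_detected, actual)
tpr_bci = true_positive_rate(bci_detected, actual)
```

Because `bci_detected` contains `behavioral_detected` and the denominator is shared, `tpr_bci >= tpr_behavioral` holds by construction, which is the same reasoning that gives TPR_b ⩾ TPR_c3.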
The specificity of a test can be evaluated by the true negative rate (TNR), which is calculated as follows: TNR = TN / (TN + FP), where TN is the number of true negatives and FP is the number of false positives. The numbers of FP in the BCI- and JFK CRS-R-based assessments are denoted by FP_b and FP_c, and the TNRs in the BCI- and JFK CRS-R-based assessments are denoted by TNR_b and TNR_c. In this study, since we assumed that the responsive patients in the JFK CRS-R were true positives, FP_c = 0, and TNR_c = 1. Furthermore, FP_b ⩾ 0, thus TNR_c ⩾ TNR_b. However, because we had no evidence of FP in the BCI-based assessment, it is difficult to compare the specificity of the two methods. Among eight patients in the responsive group in the BCI experiment, one patient (P13) obtained responsive results in the JFK CRS-R-based assessment before the experiment. The JFK CRS-R-based assessments shortly after and two months after the experiment confirmed that four patients (one VS and three MCS; P8, P10, P11 and P12) could communicate to some extent. Although three other patients (P5, P6 and P7) who achieved significant accuracies in the BCI-based assessment still had scores of 0 in the JFK CRS-R communication subscale, their total JFK CRS-R scale scores later improved. These results indirectly indicate the effectiveness of our BCI-based assessment. An interesting observation in this study is that all eight patients in the responsive group in the BCI experiment had improved JFK CRS-R scores after the experiment (as shown in tables 1 and 3). Specifically, two MCS patients (P10 and P12) from the responsive group of the BCI experiment, who obtained unresponsive results in the JFK CRS-R-based assessments before and shortly after the experiment but responsive results in the JFK CRS-R two months after the experiment, recovered and were released from the hospital three months after the experiment.
Furthermore, as shown in figure 3, the ERP waveforms of the patients in the responsive group showed that the target stimuli elicited obvious P300 responses, whereas the nontarget stimuli did not. For patients in the unresponsive group, no P300 response was elicited by the target stimuli (see figure 4). These results and those in table 3 show that our BCI paradigm might indicate the prognosis for DOC patients to some extent but requires further study.
Recent studies have explored the effectiveness of BCI technology as a communication tool for patients with amyotrophic lateral sclerosis (ALS) [21,22]. For example, Sellers et al evaluated the effectiveness of a BCI operated by detecting a P300 elicited by one of four randomly presented stimuli (i.e. Yes, No, Pass and End) in three ALS patients and three controls. The offline analysis indicated that two of the three ALS patients' classification rates were equal to those achieved by the controls [22]. Nijboer et al reported that six ALS patients achieved mean online and offline accuracies of 62% and 82%, respectively, using a visual 6 × 6 P300 speller [21]. Several studies have also reported successful attempts to restore communication with brain-computer interfaces for LIS patients [17,23]. In Lulé et al's study [17], eighteen patients (three VS, thirteen MCS and two LIS) were evaluated with a four-choice (Yes, No, Stop and Go) auditory oddball BCI. One LIS patient obtained a significant online accuracy of 60%, whereas the online accuracies for the other patients were not significantly higher than the chance level. De Massari et al proposed a two-class (Yes and No) paradigm based on the slow cortical potentials for communication [23]. The offline analysis results involving three LIS patients indicated that one patient obtained an accuracy (70%) significantly higher than the chance level, whereas the other two did not. If the subjects achieved an overall accuracy higher than or equal to 80%, they were deemed proficient in the BCI [24]. In this study, an MCS patient obtained an accuracy higher than 80% in the BCI experiment, whereas the others did not. Therefore, only this patient could proficiently use the system for communication.

Conclusion
Overall, our experimental results demonstrate that the BCI-based method can provide some degree of correction for a doctor's clinical diagnosis and that a more precise evaluation could be obtained for DOC patients by combining the JFK CRS-R- and BCI-based assessments. Furthermore, one MCS patient (P13 in this study) could proficiently use this simple communication BCI system to exchange messages with the outside world. In the future, we will establish a practical communication system for this patient.