Preliminary study of pilot stress and mental workload monitoring through physiological signals

The aviation industry is moving towards single-pilot operations due to increased operating expenses and a shortage of pilots. The need for a digital cockpit assistant motivates the search for methods to assess pilots' stress and mental workload. In this study, twenty-eight healthy volunteers performed preliminary computerised cognitive tasks while their PPG, EDA, and temperature signals were recorded under four different stress and workload conditions. The results show that these signals are sensitive enough to support a binary classification between a relaxed and a more cognitively demanding condition.


Introduction
Human-machine interaction (HMI) systems have consistently prioritized achieving optimal safety levels. This focus on safety is driven by today's rapid expansion of the technology market, resulting in more frequent scenarios where operators interact with machines equipped with varying degrees of automation. It is crucial to maintain an adequate level of safety during such interactions [1]. One of the most influential sectors in this context is aviation, according to the EASA AI Roadmap [2]. In modern commercial aviation, two pilots share the cockpit and are assigned either Captain (CA) or First Officer (FO) roles. The CA has the primary legal responsibility for the aircraft's safe operation and has the final say in decision-making. In situations where the CA is unable to perform their duties, the FO takes control of the plane and is responsible for ensuring its safe operation. However, with the aviation industry's continuous evolution, there has been a growing emphasis on Single Pilot Operations (SPOs) [3]. The term SPO denotes a scenario in which one pilot is responsible for flying an aircraft while receiving piloting support services from either equipment or ground operators [4]. According to the National Aeronautics and Space Administration (NASA) [5], SPOs would make it possible to address two major issues affecting the aviation industry today, namely the declining trend in pilot availability and the rising operational expenses, particularly on short and medium-haul flights [6]. Hence, it is crucial to create a cockpit assistant capable of understanding the cognitive workload of pilots and their capacity to control the aircraft, potentially supplanting the current First Officer role [7].

However, estimating an operator's mental workload (MWL) is complex. This is reflected in the literature, which still lacks a single agreed way to assess this cognitive condition. In particular, three strategies are mainly adopted to assess MWL: subjective questionnaires, behavioural analysis (by exploiting the so-called primary and secondary task evaluation methods), and physiological measures [8]. The study of the physiological response to alterations in cognitive load is an increasingly crucial approach, particularly with today's expansion of the biomedical industry. However, a thorough method for evaluating this mental status has yet to be discovered because of the complexity and diverse nature of the cognitive workload, necessitating the development of new approaches to address this problem. Indeed, physiological signals such as cardiorespiratory measures, brain and electrodermal activity, body temperature, and eye parameters have been investigated in recent years to study their relationship with MWL. Nevertheless, a robust solution to this issue has yet to be found [9] [11] [13].

Hence, this paper's primary purpose is to examine the correlation between stress variation and mental workload by analyzing three physiological signals: heart rate, skin activity, and body temperature. These particular signals have been chosen due to the non-invasive nature of the sensors, which is vital for potential future applications within SPO cockpits. Due to the lack of reliable and consistent solutions for this kind of application, we decided to perform this preliminary study by exploiting two specific cognitive tests, Stroop and N-Back, on a population of 32 volunteer subjects. These two tests were adopted to induce different levels of external stress and MWL, respectively. This paper is organized into the following sections: Section 2 introduces a brief state-of-the-art analysis related to stress and MWL assessment in aviation; Section 3 presents the methods adopted in this preliminary study; Section 4 reports the obtained results; Section 5 discusses the conclusions.

Background
Due to the disruptive growth of Human-AI interaction systems in the aviation sector and, thus, the necessity of monitoring pilots' stress and MWL [2], it is necessary to clarify what these terms refer to. For many years, the Human Factors and Ergonomics (HFE) field has explored the idea of mental workload. This condition has no single definition because of its complexity and multidimensional character. In fact, over the years, a number of researchers have attempted to establish a common ground for MWL by offering various definitions pertinent to their respective application domains. Nevertheless, a comprehensive study by Van Acker et al. [15] tried to find a general framework by defining MWL as a physiological processing state that is experienced subjectively and shows how one's finite and diverse cognitive resources interact with the cognitive task demands they are being exposed to. In particular, as shown in Figure 1, MWL is part of the decision-making process, where attention and working memory are significant factors that limit operators in obtaining and interpreting information from the environment. Mental models and goal-directed behavior represent key methods for overcoming these limitations [16]. Furthermore, Figure 1 shows how, in this case, the pilot's workload and stress conditions can affect the correct perception of the future status of events, inducing errors in the decision-making process.

Stress likewise has no unique definition. In fact, this condition is different for different people in different situations and depends on the application field [18]. In the aviation field, Martins et al. define stress as "the response of the body to stimuli that affect the normal physiological balance of a person, causing physical, mental or emotional strain" [19]. Assessing this condition can be challenging, as what may cause stress for one individual might not do so for another.

Stress and mental workload assessment methods
As reported in [10], aviation safety requires a collective effort across the industry due to the complex and subjective nature of mental health. MWL and stress cannot be seen as two distinct things. The model by Debie et al. [8] views stress as a depleting element that influences an operator's MWL, suggesting a clear cause-and-effect relationship between the two. The literature shows primarily three methods for evaluating pilots' stress and MWL: subjective evaluations, behavioral analysis, and physiological measurements. Post-performance questionnaires, consisting of ad-hoc questions that gauge a pilot's subjective perception of mental workload and stress, are commonly utilized as an evaluation tool. This approach is favored because it is relatively easy to implement and cost-effective [20]. Behavioral measures involve monitoring a pilot's conduct during a flight and comparing it with a predetermined mission plan. By counting the number of wrong or right actions, it is possible to infer the MWL and stress levels of the operators [21]. Information about these two cognitive conditions can also be obtained by processing physiological signals. This method has gained increasing interest in recent years thanks to the rapid expansion of biomedical technologies [22]. The literature reveals several indicators that respond, albeit in different ways, to fluctuations in both MWL and stress. These signals include heart rate, skin conductance, eye movement, brain activity, respiration, body temperature, muscle activation, and voice patterns, as demonstrated in [12]. However, a complete, reliable physiologically based solution is still lacking [13].

Materials and Methods
This section presents the materials and methods adopted in this analysis. In particular, this paper aims to expand the research presented in our previous work [14].

Test characteristics
In this study, two different computerized cognitive tests widely adopted in this research area were performed to induce different levels of stress and mental workload in the participants. The Stroop test [23] is a method used to replicate external stress situations. It involves displaying a sequence of color-related words, written in the language of the country where the test is being conducted but colored with different hues, and requesting the user to click on the button that represents the color of the word. The N-Back test [24] is used to induce MWL. The test involves a square grid of nine boxes projected on the screen. A grey square moves within the grid and changes its position every 2.25 seconds. Participants are instructed to press a button when the square returns to the same position it occupied 1, 2, or 3 steps before. The test is

Population involved and equipment
The study was conducted with the approval of the Politecnico di Torino ethics committee (P.N. 1606). A total of 32 participants (64% men and 36% women) were recruited, with a mean age of 26. The g.tec hardware and software equipment was used to acquire synchronized multichannel signals. The g.HIAMP 144 Biosignal Amplifier was used as the signal acquisition device with the sensors listed in Table 1. The sampling frequency was fixed at 1200 Hz, a notch filter at 50 Hz was applied to all signals, and digital filters were applied as per Table 1. Four participants were excluded due to issues during acquisition, leaving twenty-eight participants for the analysis.
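To illustrate what a 50 Hz notch at a 1200 Hz sampling rate does to a signal, the following is a minimal offline sketch using a standard second-order (biquad) notch filter. It is not the filter implemented in the acquisition software: the quality factor `Q_FACTOR` and the helper names `notch_coefficients`, `biquad_filter`, and `rms` are assumptions introduced only for this example.

```python
import math

FS_HZ = 1200.0    # sampling frequency reported in the text
MAINS_HZ = 50.0   # notch frequency reported in the text
Q_FACTOR = 30.0   # assumed quality factor, not taken from the paper


def notch_coefficients(f0_hz, fs_hz, q):
    """Return biquad notch coefficients (b, a), normalised so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad_filter(x, b, a):
    """Apply a second-order IIR filter (direct form I) to the sequence x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def sine(freq_hz, seconds):
    """Unit-amplitude sine sampled at FS_HZ (synthetic test signal)."""
    n = int(seconds * FS_HZ)
    return [math.sin(2.0 * math.pi * freq_hz * i / FS_HZ) for i in range(n)]


b, a = notch_coefficients(MAINS_HZ, FS_HZ, Q_FACTOR)
# Compare the last second of each output, after the filter transient has decayed.
mains_tail = biquad_filter(sine(50.0, 2.0), b, a)[-1200:]
slow_tail = biquad_filter(sine(5.0, 2.0), b, a)[-1200:]
print("50 Hz residual RMS:", rms(mains_tail))
print("5 Hz residual RMS:", rms(slow_tail))
```

In this sketch the 50 Hz component is suppressed once the transient has died out, while a slow 5 Hz component passes almost unchanged. A narrower or wider notch can be explored by changing the assumed `Q_FACTOR`.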

Procedure and signal processing
We followed the procedure described in [14]. Initially, the volunteers' physiological baseline was established through a relaxation phase. The first test administered was the Stroop test, which consisted of three sub-phases with increasing external stressors. Following this, a five-minute rest phase was implemented to restore the physiological signals to the baseline level. Subsequently, the N-Back test was conducted, which comprised a visual, an auditory, and a dual phase (where both visual and auditory elements were combined). Each phase of the N-Back test was further divided into three sub-phases, corresponding to the 1-, 2-, and 3-back conditions.

Signal Processing
Here too, we followed the procedure described in detail in [14]. We extracted key characteristics pertaining to heart activity, skin activity, and body temperature. Heart monitoring characteristics were analyzed using PPG signal processing, yielding 19 features associated with PPG shape, Heart Rate (HR), and Heart Rate Variability (HRV). Parameters such as mean, median, and standard deviation were computed for PPG amplitude, duration, and rise time. Subsequently, the beats per minute (BPM) trend was analyzed to extract HR and HRV features in the time domain. Further assessment involved evaluating HRV features in the frequency domain through spectrogram analysis of the BPM. We opted to analyze Electrodermal Activity (EDA) to monitor skin activity, which comprises a slow and a fast component: Skin Conductance Level (SCL) and Skin Conductance Response (SCR), respectively. For SCL analysis, mean, standard deviation, and slope were determined. Additionally, the mean and standard deviation of SCR peak amplitude and rise time, along with the average number of peaks per phase, were evaluated. We also examined certain aspects of the raw body temperature signal and its first derivative, including initial, final, delta (final minus initial), mean, standard deviation, variation over time (change in value over a specified time interval), and variation over time slope (the first coefficient of the linear regression of temperature or its first derivative within the relative phase).
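As an illustration of the kind of features listed above, the following sketch computes a few time-domain HR/HRV features from inter-beat intervals and a few temperature features from one phase of a temperature recording. It is a minimal example under stated assumptions, not the processing chain of [14]: the function names, the input format (inter-beat intervals in milliseconds, temperature samples with timestamps in seconds), and the toy values are hypothetical.

```python
from statistics import mean, stdev


def hrv_time_features(ibi_ms):
    """BPM mean, BPM standard deviation, and pNN50 from inter-beat intervals (ms).

    pNN50 is the percentage of successive interval differences above 50 ms.
    Requires at least two intervals.
    """
    bpm = [60000.0 / ibi for ibi in ibi_ms]
    diffs = [abs(nxt - cur) for cur, nxt in zip(ibi_ms, ibi_ms[1:])]
    pnn50 = 100.0 * sum(d > 50.0 for d in diffs) / len(diffs)
    return {"bpm_mean": mean(bpm), "bpm_std": stdev(bpm), "pnn50": pnn50}


def linear_slope(t, y):
    """Least-squares slope of y against t (first coefficient of the regression)."""
    t_bar, y_bar = mean(t), mean(y)
    num = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    den = sum((ti - t_bar) ** 2 for ti in t)
    return num / den


def temperature_features(t_s, temp_c):
    """Initial, final, delta, mean, standard deviation, and slope for one phase."""
    return {
        "initial": temp_c[0],
        "final": temp_c[-1],
        "delta": temp_c[-1] - temp_c[0],  # final minus initial, as in the text
        "mean": mean(temp_c),
        "std": stdev(temp_c),
        "slope": linear_slope(t_s, temp_c),
    }


# Toy inputs, invented for the example only.
hrv = hrv_time_features([800.0, 860.0, 850.0, 900.0, 900.0])
temp = temperature_features([0.0, 1.0, 2.0, 3.0], [32.0, 32.5, 33.0, 33.5])
print(hrv)
print(temp)
```

The same `linear_slope` helper could be reused for the SCL slope, assuming the SCL component has already been separated from the raw EDA signal.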

Results
The previous section detailed the signal-processing approach that resulted in the extraction of 43 features for every participant across each testing phase. To analyze the outcomes, we chose to treat each test individually, which required four distinct datasets: Stroop, N-Back Visual, N-Back Auditory, and N-Back Dual. Each of these tests was further segmented into three sub-phases. Each dataset therefore comprised an equal volume of data partitioned across four phases: the three test sub-phases and an additional rest phase derived from the physiological data collected during the initial relaxation phase, as explained in [14].

Features statistical analysis
Once the four datasets mentioned above were composed, we implemented a statistical analysis to understand whether the features obtained for each phase (corresponding to different external stress and mental workload conditions) show differences among their distributions. This procedure allows us to understand whether there is a link between the variation of externally induced stress or MWL and the considered physiological signals. The same process was repeated four times to analyse the features' significance in the Stroop and the Visual, Auditory, and Dual N-Back tests. The procedure is structured as follows:

(i) The first step was to run a Kruskal-Wallis (K-W) test [25] for each feature of each dataset on the four distributions of data corresponding to the four phases of the test. The choice of the K-W test is justified by the impossibility of assuming a normal distribution of the data. More specifically, if the p-value was below 0.05, we considered the test significant and analysed the effect of our tests in more depth. A positive outcome of the K-W test means that, among the four distributions, at least one differs from the others; therefore, our tests generated a physiological response to the induced stress (related to the Stroop) or MWL (related to the N-Backs). Two examples are reported in Figure 2.

(ii) The second step was a more precise analysis to understand how many phases of the tests show statistical differences with respect to the others. For each dataset and feature that obtained a positive result in the K-W test, we adopted a Wilcoxon Paired (W-P) comparison [26] between each possible pair of the four test phases (for example, Rest vs. Stroop 1, Rest vs. Stroop 2, Rest vs. Stroop 3, Stroop 1 vs. Stroop 2, etc.). We considered the difference between the two populations statistically significant when p < 0.05.

(iii) The findings derived from this analysis are presented in Table 2. The first row displays the number of features that exhibited significant results in the Kruskal-Wallis test. Subsequent rows indicate the count of features that displayed one, two, and so forth positive outcomes in the pairwise comparisons.

Moreover, a further step was to perform the same procedure considering in the W-P comparison only the combinations between the rest and the other phases (Rest vs. Stroop 1, Rest vs. Stroop 2, and Rest vs. Stroop 3); the results are reported in Table 3. Table 2 shows that, from the fourth row onwards, there is a significant drop in the number of features giving a positive output in the W-P comparison. Table 3 explains this behaviour: the counts correspond almost exactly to the number of features that demonstrated significance in the comparisons against rest. Of the initial forty-three features, nearly half are sensitive enough to discriminate between rest and all three of the more cognitively demanding phases. Therefore, a first consideration emerging from this analysis is that a binary classification between rest and cognitively demanding phases is possible through this physiological approach. At the same time, only a few features can distinguish between the different mentally challenging stages (represented by the last three rows of Table 2). Another consideration is how many features are sensitive to both stress and MWL. This can be assessed by comparing, across the datasets, the relevant features corresponding to the fourth row of Table 3, resulting in 16 features. These features encompass various parameters such as the standard deviation of the PPG shape's amplitude, duration, and rise time; BPM standard deviation and pNN50 related to time-domain HRV; standard deviation of LF, HF, and LF/HF related to frequency-domain HRV; mean and standard deviation of SCL; initial, delta, standard deviation, and slope of body temperature; and finally, reaction time.
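The first step of the procedure above can be sketched as follows for a single feature: the Kruskal-Wallis H statistic is computed over the four phases and compared with the chi-square critical value for three degrees of freedom at the 0.05 level (about 7.815), which is the usual large-sample approximation. This is an illustrative sketch only; the helper names and the toy data are hypothetical, and the original analysis may have relied on a statistics package rather than a hand-written routine. The sketch also enumerates the pairs of phases that the subsequent pairwise comparison would visit.

```python
from itertools import combinations

CHI2_CRIT_DF3_005 = 7.815  # chi-square critical value, 3 degrees of freedom, alpha = 0.05


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Toy values for one feature in the four phases (invented for the example).
phases = {
    "Rest": [1.0, 2.0, 3.0],
    "Stroop 1": [4.0, 5.0, 6.0],
    "Stroop 2": [7.0, 8.0, 9.0],
    "Stroop 3": [10.0, 11.0, 12.0],
}
h_stat = kruskal_wallis_h(list(phases.values()))
significant = h_stat > CHI2_CRIT_DF3_005

# Pairs of phases considered by the pairwise follow-up step.
pairs = list(combinations(phases, 2))
rest_pairs = [p for p in pairs if "Rest" in p]
print(h_stat, significant, len(pairs), len(rest_pairs))
```

With four phases there are six possible pairs, three of which involve the rest phase; these are the pairs that matter for the binary rest versus task comparison.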

Conclusion
Monitoring pilots' stress and mental workload is gaining increasing attention in aviation due to the possible transition towards Single Pilot Operations. However, to reach this goal, there are some barriers to overcome, including the development of a digital cockpit assistant able to replace today's First Officer. In this context, being aware of pilots' mental workload and stress levels during flight is essential. Nowadays, there are three ways to assess these mental conditions: subjective evaluations, behavioural measures, and physiological evaluations. Of these methods, examining signals obtained from the human body has emerged as a dependable option, benefiting from the expansion of the biomedical market in recent years. This growth has led to the availability of more affordable, compact, and trustworthy sensors and technologies. This situation is opening new spaces for research to adapt the physiological approach to the needs of the aeronautical field in the transition toward SPOs. A reliable solution that links the variation of human body parameters to perceived stress and MWL has yet to be found. Therefore, this paper aims to enhance the preliminary analysis introduced in [14] by performing a deeper statistical analysis related to the physiological approach to assessing a person's mental workload and stress levels. We demonstrated that a binary classification between a rest and a more cognitively demanding condition is possible considering only three physiological signals: PPG, EDA, and body temperature. This study highlights the feasibility of the physiological multimodal approach to evaluating stress and MWL. The next step is to expand the number of participants and the set of acquired physiological signals in order to identify the most significant ones. This is the first step towards bringing this approach to the design of the next-generation cockpit for SPO aircraft.

Figure 1: The decision-making process. Situation Awareness, Mental Workload, and Stress influence an operator's response to external stimuli [17].

Figure 2: Two examples of the K-W test. Figure 2a shows the variation of the SCL mean during the Stroop test, while Figure 2b shows the BPM mean distributions during the Auditory N-Back test phases.

Table 1: The g.tec sensors adopted and the high- and low-pass frequencies considered in the g.Recorder software during the real-time acquisition sessions.

Table 2: Outcomes of the statistical method outlined in the text. The table displays the number of significant features identified for each dataset, depicted in the columns. The first row reports the features that exhibited significant results in the Kruskal-Wallis test. Subsequent rows report the count of features that displayed one, two, and so forth positive outcomes in the pairwise comparisons (for example, Rest vs. Stroop 1, Rest vs. Stroop 2, Rest vs. Stroop 3, Stroop 1 vs. Stroop 2, etc.), where a difference between two populations was considered statistically significant when p < 0.05.

Table 3: Outcomes of the statistical method used to explore potential binary classification. The table outlines the number of significant features identified for each dataset, depicted in the columns.