Speech spectrum analyses for estimating operator functionality

The paper deals with the problem of estimating the functional characteristics of a human operator by frequency analysis of the operator's speech. The analysis uses the operator's speech transfer function, defined by analogy with the classical transfer function of automatic control theory. Algorithms for calculating the operator's speech transfer function are presented, and experimental results are discussed.


Introduction
In recent years, man-machine interface tools have developed intensively, including modern audio technologies [1][2][3] and neural networks [4]. In [5], the correlation between the physiological state of a human operator and variations in the operator's speech is investigated.
This work is devoted to the analysis of the speech signal in the frequency domain for the purpose of estimating operator functionality. The analysis is based on the speech transfer function, which is introduced by analogy with the transfer function of the classical theory of automatic control [6].

Algorithm for calculating the frequency transfer function
Let us consider an audio recording of a word and divide it into intervals of 20 to 40 ms. Every interval contains a sequence $x(n)$, $n = 1, \dots, N$, where $N$ is the number of samples. Then a series of operations is performed [3,7], including signal pre-filtering. As a result, on each time interval of 20 to 40 ms duration, the speech signal is represented as a sequence of spectral density values averaged over frequency bands. This approach is based on the classical definition of the spectral density of a signal [6].
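The per-interval processing can be illustrated with a minimal sketch. Only the pre-filtering step and the band-averaged spectral density are stated in the text; the specific pre-filter (first-order pre-emphasis), the Hamming window, the equal-width bands and all function names below are assumptions made for illustration. A direct DFT from the standard library is used to keep the example dependency-free; an FFT routine would normally replace it.

```python
import cmath
import math


def pre_emphasis(samples, coeff=0.97):
    """First-order high-pass pre-filter (an assumed choice of pre-filter)."""
    return [samples[0]] + [
        samples[i] - coeff * samples[i - 1] for i in range(1, len(samples))
    ]


def power_spectrum(frame):
    """One-sided power spectrum of a Hamming-windowed frame via a direct DFT."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def band_spectra(samples, rate, frame_ms=20, n_bands=8):
    """Return one row per frame, each with n_bands band-averaged power values."""
    frame_len = int(rate * frame_ms / 1000)
    filtered = pre_emphasis(samples)
    rows = []
    for start in range(0, len(filtered) - frame_len + 1, frame_len):
        spec = power_spectrum(filtered[start:start + frame_len])
        # Equal-width bands over the one-sided spectrum (assumed band layout).
        edges = [b * len(spec) // n_bands for b in range(n_bands + 1)]
        rows.append(
            [sum(spec[lo:hi]) / (hi - lo) for lo, hi in zip(edges, edges[1:])]
        )
    return rows


# Usage: a synthetic 1 kHz tone sampled at 8 kHz for 100 ms.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(800)]
matrix = band_spectra(tone, rate)  # 5 frames x 8 bands
```

With 20 ms frames at 8 kHz, each frame holds 160 samples, and the tone's energy falls in the band covering 1 kHz. The absolute scaling of the spectrum is not important here, because the transfer function is built from ratios of such values.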
The algorithm described above is applied to each realization of a word. As a result, we obtain a matrix of spectral density values indexed by time interval and frequency band. When calculating the transfer function, we additionally average each frequency band over the $N_t$ time intervals. Finally, we obtain the modulus of the transfer function between the first and second speakers, expressed in decibels (dB).
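The averaging over time intervals and the conversion to decibels might look as follows. This is a sketch under stated assumptions, not the paper's exact formula: the matrices are assumed to hold power-like spectral densities (hence $10\log_{10}$ of the ratio rather than $20\log_{10}$), the result is taken as the second speaker relative to the first, and the function names are hypothetical.

```python
import math


def mean_band_power(matrix):
    """Average each frequency band (column) over all time intervals (rows)."""
    n_frames = len(matrix)
    return [sum(col) / n_frames for col in zip(*matrix)]


def transfer_function_db(matrix_first, matrix_second):
    """Per-band level of the second speaker relative to the first, in dB.

    Assumes power-like spectral densities, hence 10 * log10 of the ratio.
    """
    first = mean_band_power(matrix_first)
    second = mean_band_power(matrix_second)
    return [10 * math.log10(s / f) for f, s in zip(first, second)]


# Synthetic example: rows are time intervals, columns are frequency bands.
# The second recording has twice the averaged power in the middle band.
first = [[1.0, 2.0, 4.0], [3.0, 2.0, 4.0]]
second = [[2.0, 4.0, 4.0], [2.0, 4.0, 4.0]]
gain_db = transfer_function_db(first, second)
```

Identical averaged spectra give 0 dB in every band, which corresponds to the unity gain discussed for a speaker whose properties do not change.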

Dependence of the estimates of the speech transfer function on the structure and volume of speech material
To test the performance of the above algorithm, the following experiment was performed. Under normal conditions, the speaker pronounced the Russian words "пилотаж" (aerobatics), "масштаб" (scale) and "навигация" (navigation), 50 realizations each. Note that the analysis of the algorithm suggests that it does not depend on the language used. Indeed, in [7] it was shown for a similar algorithm, using Russian, Kazakh and English examples, that changing the language does not affect the accuracy of automatic word recognition.
The experiment was repeated after 4 hours, also under normal conditions. Based on the data obtained, the transfer functions were calculated separately for each word over all its realizations (Figure 1). As can be seen, the values of the function essentially correspond to unity gain (the 0 dB level), which indicates that the speaker's properties remained constant. Deviations from this level do not exceed ±1.5 dB. Figure 1 also characterizes the dependence of the estimates on the structure of the speech material: deviations of the estimates calculated for individual words from the average do not exceed ±1 dB. To establish how the estimates averaged over the three words depend on the number of realizations, we compared the values of the function calculated over 50 realizations, taken as the standard, with estimates for n = 3, 5, 10, 15, 20, 25 realizations. The standard deviation (s.d.) of each estimate from the standard was taken as the measure of error (Table 1). The table shows that even with 3 realizations the s.d. of the error does not exceed 0.5 dB, which is sufficient to detect significant changes.
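The comparison of reduced-size estimates with the 50-realization standard can be sketched as follows. The data here are synthetic stand-ins, not the measured values of Table 1. Averaging the curves directly in dB and the choice of 16 bands are simplifying assumptions, and the helper names are hypothetical.

```python
import math
import random


def mean_curve(curves):
    """Average a list of per-band dB curves band by band."""
    return [sum(vals) / len(curves) for vals in zip(*curves)]


def rms_deviation(estimate, standard):
    """Root-mean-square difference between two per-band curves, in dB."""
    sq = [(e - s) ** 2 for e, s in zip(estimate, standard)]
    return math.sqrt(sum(sq) / len(sq))


# Synthetic stand-in for measured data: 50 realizations, 16 bands,
# each a 0 dB curve plus Gaussian scatter of 1 dB (assumed values).
random.seed(0)
realizations = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(50)]

standard = mean_curve(realizations)  # estimate over all 50 realizations
errors = {
    n: rms_deviation(mean_curve(realizations[:n]), standard)
    for n in (3, 5, 10, 15, 20, 25)
}
```

Each entry of `errors` plays the role of one cell of the error table: the deviation, in dB, of the estimate built from the first n realizations from the estimate built from all 50.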

Influence of noise-protective headphones on speech
This section compares the transfer functions of speakers under normal conditions with those of the same speakers wearing noise-protective headphones. The resulting graphs show the effect of the headphones on the speaker. The reactions of individual speakers differ, but a change in amplitude of up to ±3 dB in the frequency range 6 to 11 kHz is typical.

Impact of noise on the state of a speaker over time
Consider how the speaker's state changes during the experiment. A set of 150 words was divided into 4 parts, and the transfer function was calculated for each part. The total duration of the experiment was 2 minutes, and each part lasted 0.5 minutes. During the experiment, the speaker was exposed to low-frequency noise at 90 dB in the frequency band up to 2 kHz. The graphs for the four parts are broadly similar, but the amplitude in the 4th part is lower. A possible explanation is that the speaker gets used to the noise over time and the volume of the voice decreases, that is, the Lombard effect weakens (the Lombard effect is the instinctive rise in a speaker's voice volume in a noisy environment [1]).
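The part-by-part analysis can be illustrated with a simplified sketch. It tracks a single broadband power value per word instead of a full per-band transfer function. The rule for splitting 150 words into four unequal parts, the reference power and the data are all assumptions for illustration.

```python
import math


def split_into_parts(items, n_parts):
    """Split a sequence into n_parts consecutive chunks of near-equal size."""
    base, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


def part_levels_db(word_powers, reference_power, n_parts=4):
    """Mean power of each part relative to a reference power, in dB."""
    levels = []
    for part in split_into_parts(word_powers, n_parts):
        mean_power = sum(part) / len(part)
        levels.append(10 * math.log10(mean_power / reference_power))
    return levels


# Synthetic data: 150 words whose power halves in the last quarter.
word_powers = [2.0] * 113 + [1.0] * 37
levels = part_levels_db(word_powers, reference_power=1.0)
```

In this synthetic case the first three parts sit about 3 dB above the reference and the fourth part returns to 0 dB, the kind of drop that a weakening Lombard effect would produce.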

The influence of hearing defects on the frequency properties of speech
Let us consider the application of the proposed function to identifying the speech features of a group of helicopter pilots with an occupational hearing disorder (hearing loss). A speaker with no diagnosed hearing or speech disorders was taken as the reference speaker. The results for three speakers in this group are shown in Figures 6-8. Comparison with the transfer function of a speaker without hearing disorders (Figure 1, where the spread is within ±1.5 dB) shows that the transfer functions for this group vary over a wider range: from -4 to 3 dB for speaker D5, from -5 to 2 dB for speaker D6, and from -6 to 4 dB for speaker D7. Thus, the method considered in this paper provides a useful diagnostic feature for identifying speakers with hearing disorders from their speech samples.
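One way to turn this observation into a simple check is sketched below. The ±1.5 dB limit is the spread reported for the speaker without hearing disorders. Using it as a decision threshold, the function names and the example curves are assumptions for illustration, not a procedure described in the paper.

```python
def spread_db(curve):
    """Return the (minimum, maximum) of a transfer-function curve in dB."""
    return min(curve), max(curve)


def exceeds_reference_spread(curve, limit_db=1.5):
    """True if any band deviates from 0 dB by more than limit_db."""
    return any(abs(value) > limit_db for value in curve)


# Hypothetical per-band curves in dB (not measured data).
reference_like = [0.4, -1.2, 0.9, 1.5, -0.3]
wide_spread = [-4.0, 1.0, 3.0, -2.5, 0.5]

flag_reference = exceeds_reference_spread(reference_like)
flag_wide = exceeds_reference_spread(wide_spread)
```

A curve confined to ±1.5 dB is not flagged, while a curve ranging from -4 to 3 dB is. Any practical threshold would need to be validated on a larger group of speakers.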

Conclusion
The paper reports the following main results: (1) an algorithm for calculating the speaker's speech transfer function, which makes it possible to analyze changes in the speech frequency spectrum caused by external conditions or by the speaker's state; (2) examples that characterize the ability of the proposed speech transfer function to detect changes in the speaker's state or in external conditions.