Assessment of human factor in critical infrastructures

A methodology is presented for assessing human error probability under emergency conditions, taking into account internal and external factors and the degree of uncertainty that can influence the reliability of a “man-machine” system. A model based on the human reliability analysis (HRA) methodology is used. The probability of a human error is determined by combining the nominal error with the influence of environmental conditions. This probability is compared with the rated (design) permissible critical value, which depends on the system state. This enables continuous monitoring of the technical state of the system. If the state is unsatisfactory, corrective actions are performed, the probability of a human error is reassessed, and the extent of the system improvement is tested. The methodology uses reliability criteria that take into account the degree of human involvement in controlling critical infrastructures. It includes efficiency analysis and assessment of the safeguarding system with regard to the human factor. Currently, there is no dedicated documentation that prescribes, by means of quantitative methods, the procedure for human behavior under emergency conditions. The methodology can serve as a basis for a practical manual for decision makers and can be used in training personnel to act in emergency situations. It is useful for assessing the influence of the human factor on the operation of critical infrastructures used for various purposes.


Introduction
The standard SSSRIEC 62508-2014 (the Russian State Standard Specification, the International Electrotechnical Commission) [1] presents the first manual on system reliability in the part concerning human activities. The standard can be applied in any field of man-machine interaction. It is intended for engineering personnel and heads of organizations. It addresses reliability related to the human factor (HF) and focuses on methods of a human-centered approach to design and reliability improvement. These techniques can be used at all stages of a system life cycle. The document makes it possible to account for the human effect on system reliability through the application of human engineering principles.
When analyzing the overall reliability of a technical system, it is necessary to consider all aspects of human activity, including its positive features, constraints, opportunities and areas for improvement [2][3][4][5][6][7]. If an operator is part of a complex technical system whose safety must be ensured, all possible unfavourable consequences of outages caused by HF (errors, violations, omissions or malicious acts) are critical. The standard presents only qualitative methods and a brief review of foreign quantitative techniques. The development of manuals on the quantitative assessment of HF remains an open challenge.

Methodology for human factor assessment in critical infrastructures
A system whose operation necessarily involves man-machine interaction includes the human (operators, process controllers), the machine (equipment, mechanisms), and the social and physical environment in which the interaction takes place. A person can be involved in various stages of the technical system life cycle and can influence its operational reliability through his or her activity and decisions in accordance with functional responsibilities (as a designer, a manual developer, a functional manager and observer, an operator, an instructor, or maintenance and repair personnel) [2, 8-13, 17-19].
Human error probability (HEP) is a measure of the likelihood that plant personnel will fail to initiate the correct, required, or specified action or response in a given situation, or by commission will perform a wrong action. The HEP is the probability of the human failure event. Performance shaping factor (PSF) is a factor that influences human performance and human error probabilities. In SPAR-H, this includes: time available, stress/stressors, complexity, experience/training, procedures, ergonomics/human-machine interface, fitness for duty, and work processes [2].
In SPAR-H, negative performance shaping factors (PSFs) are those PSF values that raise the HEP above its nominal value, i.e., PSF values greater than 1. They enter the overall HEP calculation together with the positive PSFs [2,3]. When the number of negative PSFs is three or greater, the HEP adjustment factor is applied.
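The rule above can be illustrated with a minimal Python sketch. The adjustment formula used here, HEP = NHEP · PSF_comp / (NHEP · (PSF_comp − 1) + 1), is the one given in the SPAR-H documentation [2]; it is not spelled out in the present text. The function names, the example multipliers and the capping of the unadjusted product at 1 are assumptions made for illustration only.

```python
from math import prod


def composite_psf(psfs):
    """Return the product of all PSF multipliers (the composite PSF)."""
    return prod(psfs)


def spar_h_hep(nominal_hep, psfs):
    """Combine a nominal HEP with PSF multipliers, SPAR-H style.

    If three or more PSFs are negative (multiplier > 1), the SPAR-H
    adjustment factor is applied so the result stays within [0, 1].
    Otherwise the plain product is used; capping it at 1.0 is an
    assumption of this sketch.
    """
    if not 0.0 <= nominal_hep <= 1.0:
        raise ValueError("nominal_hep must be a probability")
    if any(p <= 0 for p in psfs):
        raise ValueError("PSF multipliers must be positive")

    comp = composite_psf(psfs)
    negative_count = sum(1 for p in psfs if p > 1.0)

    if negative_count >= 3:
        # SPAR-H adjustment factor for three or more negative PSFs
        return nominal_hep * comp / (nominal_hep * (comp - 1.0) + 1.0)
    return min(1.0, nominal_hep * comp)


if __name__ == "__main__":
    # Hypothetical multipliers, e.g. for stress, complexity and time pressure
    print(spar_h_hep(0.01, [10, 5, 2]))   # three negative PSFs: adjusted
    print(spar_h_hep(0.01, [10, 1, 0.5])) # one negative PSF: plain product
```

Without the adjustment, the first example would give 0.01 · 100 = 1.0, which is why SPAR-H switches to the bounded form when several negative PSFs act together.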

Probability assessment of human error under emergency conditions
Initial data:
• The number of people (operators) involved in the system operation: $n$.
• Working time of the n-th operator.
• Parameter characterizing the reliability of the n-th operator.
• Characteristics of the environmental influence on human reliability (PSFs): $PSF_1, PSF_2, \ldots, PSF_m$.
• The number of external factors: $m$.
• Parameter describing the environmental conditions: the composite PSF, $PSF_{comp}$.
• Permissible critical value of human error probability: $HEP_{lim}$.

Main methodology stages
In accordance with the research in [8] (included in the book [8]), the following stages can be defined:
1. Setting the number $n$ of individuals (operators) who can be involved in the failure (the failure can be caused by chance, a system outage, a human error, etc.).
2. Assessing the probability of a human error with regard to internal and external factors that can influence human reliability. The methodology (algorithm) uses a hybrid model based on the HRA methodology, which takes into account: 1) internal human factors (based on the Weibull distribution function); 2) external factors related to the environmental conditions (using PSFs that specify the system performance).
In case 1, the Weibull distribution function determines the nominal probability of a human error as a function of working time, distinguishing the first hour of the operator's work from the other time periods. Here $HEP_{nom}$ is the rated (nominal) probability of a human error, dependent on human behavior only; $t$ is the moment of the operator's work; a further parameter specifies the reliability of the n-th operator; $\beta$ is the parameter defining the shape of the reliability curve (it is noted in [8] that human behaviour is best characterized by the Weibull distribution function with $\beta = 1.5$); and the remaining parameter depends on the reliability values of the n-th operator.

Figure 1 a, b shows six graphs obtained by combining two Weibull distributions with various combinations of these parameters; this combination of distribution functions is used to determine the probability of a human error in the intervals from 0 to 1 hour and from 1 to 8 hours of the operator's work. Only internal factors are taken into account in the calculated nominal probability of a human error.

In case 2, the product of all PSFs determines the total effect of environmental conditions on operators:

$$PSF_{comp} = \prod_{i=1}^{m} PSF_i.$$

The contextual probability of a human error combines the nominal human error with the influence of the environment and is calculated from $HEP_{nom}$ and $PSF_{comp}$ following [2,4,8], where $HEP_{cont}$ is the contextual probability of a human error, dependent on human behavior and environmental conditions; $HEP_{nom}$ is the nominal probability of a human error, dependent on human behavior only; and $PSF_{comp}$ is the parameter determining the total effect of environmental conditions on operators.
3. Comparing the contextual probability of a human error, $HEP_{cont}$, with the rated permissible critical value $HEP_{lim}$, which depends on the system state. If $HEP_{cont}$ is close to $HEP_{lim}$ or below it, the system needs no improvement. Otherwise, improvement is necessary to reduce the probability of a human error.
4. Determining a set of key performance indicators (KPIs) for the system analysis. In particular, the KPIs should address the external factors that influence the probability of a human error. If the influence of the PSFs representing environmental conditions is measurable, they can be used as KPIs; otherwise they should be transformed into measurable elements.

5. System improvement is performed in order to reduce the probability of a human error. In doing so, the KPIs are monitored using, for example, an instrument panel that keeps track of the KPI values.
6. At this stage, a new probability of a human error is assessed and the extent of the system improvement is examined. If the reassessed probability of a human error is lower than the rated critical value $HEP_{lim}$, this probability is considered permissible and the improvement measures are considered successful.
7. Reliability assessment of the "man-machine" system [14]. The human is represented as a system component. Cases are specified for assessing the reliability of a system in which technical facilities interact with a human operator, under the following assumptions: equipment failures and operator's errors are rare, simple and independent events; the occurrence of more than one failure of the same type during system operation from $t_0$ to $t_0 + t$ is practically impossible; and the operator's ability to compensate for errors and the ability to work without faults are independent properties of the operator.

If neither the errors made by the human operator nor the equipment failures can be compensated for, the probability of failure-free system operation is

$$P_1(t_0, t) = P_T(t_0, t)\, P_0(t),$$

where $P_T(t_0, t)$ is the probability of failure-free operation of the technical facilities over the interval $(t_0, t_0 + t)$; $P_0(t)$ is the probability of the operator's errorless work during time $t$, provided the equipment operation is trouble-free; $t_0$ is the total system operation time; and $t$ is the time interval under consideration. The probability of failure-free system operation is determined similarly for the case of immediate compensation of the operator's errors with probability $p$ and for the case in which only failures of the technical facilities are compensated. In both cases, system reliability improves to the extent that the ability to compensate for errors and failures has been improved.

If systems are classified by the extent to which human involvement in system control is continuous, a corresponding reliability criterion can be distinguished for each type of "man-machine" system (MMS). For systems of the first type, the criterion is the probability of failure-free, errorless and timely progress of the controlled process during a given time $t$.
Such progress of the process is possible in the following cases: 1) there are no outages in the equipment operation; 2) an outage occurred, but the operator accurately and promptly took the necessary measures to eliminate the emergency situation; 3) the operator made erroneous actions but corrected them in time. In the notation introduced earlier, the reliability of the MMS is written using the recoverability of the operator,

$$P_{rec} = P_{scs} \cdot P_{ds} \cdot P_{corr.err},$$

which estimates the probability that an error is corrected. Here $P_{scs}$ is the probability of a signal from the control system, $P_{ds}$ is the probability that the operator detects the signal, and $P_{corr.err}$ is the probability of correcting erroneous actions when the entire operation is performed again.

For MMSs of the second type, the reliability criterion is the probability of a failure-free, errorless and timely solution of the problem. The system can perform the task if, at the required moment, the operator is ready to receive the incoming information and, in addition: 1) there were no equipment outages during the temporary stop and during the task, and the operator took accurate and correct measures; or 2) there was an equipment outage, but the operator eliminated it in time and made no errors in solving the problem; or 3) the equipment operated without failure and the operator made an error but compensated for it promptly and accurately. The reliability is calculated as the probability of these cases.

For systems of the third type, the reliability criterion is similar to that for the second type.
The task can be considered performed by the system if: 1) at the required moment the equipment was in operable condition, there was no equipment outage while the task was performed, and the operator's actions were timely and errorless; or 2) the equipment that was not ready or was malfunctioning was restored in time and the operator made no error; or 3) there was no equipment outage, but the operator made an error and compensated for it in time. In this case, reliability can be calculated by

$$P_3 = K_r P_T P_{toa} + (1 - K_r) P_{ER} P_T P_{toa} + (1 - P_{toa}) K_r P_T P_{rec},$$

where $K_r$ is the equipment availability. Correct and valid consideration of HF at each of these stages contributes to ensuring maximal efficiency and reliability.

8. Efficiency analysis of the safeguarding system (SGS). There is uncertainty in determining the probability of a human error. An interesting approach to allowing for HF in the SGS efficiency analysis is considered in [15,16]. Risk assessments of safety disturbance caused by human errors make it possible to determine the residual entropy of the measures taken to eliminate HF as a function of $P_{hf}$, the efficiency of the safety hazard countermeasures related to HF. The risk of violating the security of a critical infrastructure, given the probability of a human error, is written as a logarithmic function of $P_{hf}$, where $C$ is the base of the logarithm.
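Stages 2, 3 and 6 above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The piecewise Weibull form HEP_nom(t) = 1 − k·exp(−α·|t − 1|^β), the calibration of α from reliability values at 1 h and 8 h, and the bounded SPAR-H-style combination used for the contextual HEP are assumptions chosen to match the verbal description (lowest error probability at the end of the first hour, β = 1.5, intervals 0 to 1 h and 1 to 8 h). All function and parameter names are hypothetical.

```python
import math

BETA = 1.5  # shape parameter quoted in the text for human behaviour


def scale_alpha(k, reliability_at_8h, beta=BETA):
    """Assumed calibration: reliability falls from k at t = 1 h
    to reliability_at_8h at t = 8 h."""
    if not 0.0 < reliability_at_8h <= k <= 1.0:
        raise ValueError("expected 0 < reliability_at_8h <= k <= 1")
    return -math.log(reliability_at_8h / k) / (8.0 - 1.0) ** beta


def nominal_hep(t, k, alpha, beta=BETA):
    """Assumed piecewise Weibull form of the nominal HEP.

    abs(t - 1) covers both branches: (1 - t) on [0, 1] and (t - 1) after
    the first hour, so the error probability is minimal (1 - k) at t = 1 h.
    """
    if t < 0:
        raise ValueError("t must be non-negative (hours)")
    return 1.0 - k * math.exp(-alpha * abs(t - 1.0) ** beta)


def contextual_hep(hep_nom, psfs):
    """Combine the nominal HEP with the composite PSF (product of all PSFs).

    The bounded SPAR-H-style form is an assumption of this sketch.
    """
    psf_comp = math.prod(psfs)
    return hep_nom * psf_comp / (hep_nom * (psf_comp - 1.0) + 1.0)


def is_acceptable(hep_cont, hep_lim):
    """Stages 3 and 6: compare the contextual HEP with the critical value."""
    return hep_cont <= hep_lim


if __name__ == "__main__":
    k = 0.99                       # hypothetical operator reliability at t = 1 h
    alpha = scale_alpha(k, 0.90)   # hypothetical reliability after 8 h
    hep_lim = 0.05                 # hypothetical permissible critical value
    for label, psfs in (("before improvement", [5.0, 2.0]),
                        ("after improvement", [2.0, 1.0])):
        hep = contextual_hep(nominal_hep(4.0, k, alpha), psfs)
        print(label, round(hep, 4), is_acceptable(hep, hep_lim))
```

In this sketch, stage 5 (system improvement) is represented only by replacing the PSF multipliers with lower values before the reassessment; in practice the multipliers would come from monitored KPIs.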

Future work
Further investigation of human behavior modeling in stress situations is necessary for various fields of industry. There is an urgent need to develop an algorithm based on this methodology, together with its software, for critical infrastructures used for various purposes. Further investigation and a complete implementation of the proposed methodology in a critical infrastructure of the fuel and energy sector are planned.

Conclusion
A comprehensive procedure for assessing the probability of a human error under emergency conditions has been developed. It is based on the human reliability analysis methodology and takes into account internal human factors (using a combination of Weibull distribution functions) and external factors related to the environmental conditions (using a table of factors that shape the system efficiency (serviceability)). The Weibull function makes it possible to determine the probability of a human error while distinguishing the first hour of the operator's work from the other time periods. The procedure assesses the reliability of a "man-machine" system and the efficiency of the safeguarding system in a critical infrastructure. Risk assessments of safety disturbance caused by human errors and the safeguarding system make it possible to determine the residual entropy of the measures taken to eliminate HF. Since there is currently no dedicated documentation that prescribes, by means of quantitative methods, the procedure for human behavior under emergency conditions, the proposed procedure can be used to fill this gap. It enables the development of adequate and efficient techniques for mitigating the harmful influence of human errors on the safety of critical infrastructures.