Ontology of experiment planning for obtaining a probabilistic model of single-phase electricity consumers

The paper presents the rationale for, and the problem of, forming a probabilistic model of single-phase electricity consumers. To solve the problem, the corresponding mathematical apparatus is stated and an ontology of experiment planning is developed. The applicability of the ontology is demonstrated on an open dataset from the Intelligent Systems Subcommittee (ISS) IEEE. As a result, the authors identified the characteristic energy consumption modes of a particular residential building and obtained datasets for the direct construction of a probabilistic model of single-phase electricity consumers, with a total data loss caused by filtering of less than 10%.


Introduction
In practical problems, the effectiveness of computer modeling is often determined by how closely the characteristics of the model of the investigated object match its real physical characteristics [1][2]. This is especially important when a system for the automatic control of a physical object is built on the results of modeling. To solve the problem of controlling interphase switching of single-phase electricity consumers [3], a model of consumer behavior is required. For this purpose, this work presents the mathematical apparatus and an ontology for planning an experiment to obtain a version of such a model, with specific examples of calculations.

Materials and methods
Previously, the authors solved the problem of controlling interphase switching of single-phase electricity consumers [3]. Its essence is formulated as follows:
• n non-industrial single-phase consumers (SC) of electricity are connected to the distribution network powered by the distribution transformer substation (TS);
• each SC is provided with means of dynamic switching between the phases L1, L2, L3 of the TS;
• the solution to the SC switching task is implemented dynamically at regular intervals.
The input data for finding a solution are:
• the original combination (1) of the numbers of the TS phases to which the SCs are connected;
• the set (2) of digitized load current oscillograms of all SCs, corresponding to the combination of connections (1).
The solution to the problem is to find, in real time (that is, in a time shorter than the switching interval), a new combination (3) of the numbers of the TS phases to which the SCs are connected. Compared with the original combination (1), and for the same set of oscillograms (2), the new combination must provide the smallest values of the number of SC phase switchings and of the indicators characterizing the asymmetry of currents and their total harmonic distortion.
To find the combination (3), the problem is reduced to a combinatorial optimization problem, whose solution is sought using binary algorithms of agent-based metaheuristics. The objective function is calculated on the basis of (2) and is presented in [3].
To perform the calculations in the stated problem, a system for generating the sets (2) is required. The most widespread spectral decomposition of periodic functions is their representation as a truncated Fourier series [4,5]:
$$f(t) \approx \frac{a_0}{2} + \sum_{k=1}^{m} \left( a_k \cos\frac{2\pi k t}{T} + a_{m+k} \sin\frac{2\pi k t}{T} \right), \qquad (4)$$
where $f(t)$ is a periodic function with main period $T$. We assume $a_0, a_1, \ldots, a_m, a_{m+1}, \ldots, a_{2m}$ to be random variables.
In this case, each SC, or each element in (2), represents a random process that can be described by the corresponding random vector $\bar{a} = (a_0, a_1, \ldots, a_m, a_{m+1}, \ldots, a_{2m})$.
Then, to build the model (2), the statistical parameters of the distribution of $a_0, a_1, \ldots, a_m, a_{m+1}, \ldots, a_{2m}$ must first be estimated. Solving this estimation problem requires corresponding statistical observations, from which, by means of (4), a statistical sample (5) of realizations of these variables must be formed. The statistical estimates obtained from (5) make it possible to determine the parameters of the distribution functions of each of these variables and to generate random vectors that correspond to the statistical sample (5) and determine the characteristics of the load currents (2) for a given number of individual consumers when searching for the combination (3). As the statistical sample for forming (5), we use the information on the amplitude spectrum of the current consumption from the open dataset Private Home 1 [6], provided by the Intelligent Systems Subcommittee (ISS) IEEE at [7].
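The generation of one element of the set (2) from a random coefficient vector can be illustrated by a minimal sketch. It rests on several assumptions that are not taken from the source: the coefficient layout (constant term first, then the cosine weights, then the sine weights), the halved constant term, the Weibull law with placeholder parameters, and the 50 Hz mains period. The function names are hypothetical.

```python
import math
import random


def sample_coefficients(m, rng):
    """Draw one random vector of 2m + 1 Fourier coefficients.

    The Weibull law and its parameters (scale 1.0, shape 1.5) are
    placeholders, not values fitted to any dataset.
    """
    return [rng.weibullvariate(1.0, 1.5) for _ in range(2 * m + 1)]


def synthesize(coeffs, period, t):
    """Evaluate a truncated Fourier series at time t.

    Assumed layout: coeffs[0] is the constant term (halved below),
    coeffs[1..m] weight the cosines, coeffs[m+1..2m] weight the sines.
    """
    m = (len(coeffs) - 1) // 2
    omega = 2.0 * math.pi / period
    value = coeffs[0] / 2.0
    for k in range(1, m + 1):
        value += coeffs[k] * math.cos(k * omega * t)
        value += coeffs[m + k] * math.sin(k * omega * t)
    return value


rng = random.Random(0)            # fixed seed for reproducibility
coeffs = sample_coefficients(3, rng)
period = 0.02                     # one 50 Hz mains cycle (assumed)
# 64 samples covering one period of the synthetic load current
waveform = [synthesize(coeffs, period, i * period / 64) for i in range(64)]
```

In a fuller model each coefficient would be drawn from its own fitted law within a given consumption mode rather than from a single placeholder distribution.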

Results
In general, experiment planning [8] is a means of constructing mathematical models of various processes that increases the efficiency of experimental research. It increases the reliability of the research results [9] and reduces the time and funds spent on the experiment. The main stages of the planned experiment [10] can be described as follows.
1) The goal of planning the experiment in this case is to obtain the parameters of the distribution functions (6) at a given level of confidence.
2) Regarding the conditions of the experiment, the following information is available: the tests were carried out on a single-phase 5.75 kVA installation in the period from 3 to 18 June 2011 for a single-family house, with a sampling period of 5 minutes, using a Chauvin Arnoux 8335 (France) power quality analyzer. The house has the following characteristics: a) number of occupants: 3 adults; b) rooms: 3 bedrooms, 3 bathrooms, 1 living room, 1 kitchen, 1 laundry room, 1 hallway and corridor; c) main electrical equipment used: 1 washer/dryer, 1 dishwasher, 1 electric stove, 1 electric oven, 1 refrigerator, 1 microwave, 1 range hood, 1 vacuum cleaner, 1 hair dryer, 3 TVs, 1 LCD display, 2 laptops, 1 router, 2 electric heaters.
3) The input data are time samples with a period of 5 minutes (registered parameter) and the corresponding random combinations of equipment switched on during each period (non-registered parameter). The output data are the amplitude spectrum of the consumed current.
4) The output data are presented as a set of 4320 measurements for each harmonic; the values are given in amperes to five decimal places.
5) Because the data come from the open ISS IEEE sets indicated above, the experiment can be considered passive: it is based on registering the input and output parameters that characterize the object of research, without interfering with the experiment while it is conducted. In this case, the experimental data are processed after the end of the experiment.
6) Statistical processing of the experimental results to construct a mathematical model of the behavior of the characteristics under study was performed as follows.
a) All measurements related to the residents' inactive period, from 00:00 to 09:00, were excluded from the general dataset. During this period, mainly low-consumption electrical equipment operates, in automatic mode, and it does not determine the main trend of daily energy consumption. As a result, the amount of data for research was reduced to 2700 measurements for each harmonic.
b) All measurements were ranked by the RMS value of the consumed current $I_{RMS}$, calculated from the effective values $I_h$ of the harmonics of the various orders $h$ according to the formula
$$I_{RMS} = \sqrt{\sum_{h} I_h^2}. \qquad (8)$$
c) For the sequence of data obtained in 6.b, subsamples were identified that define the typical energy consumption modes of a residential building. For this:
i. using the method of a sliding window of variable length [11,12], estimates were calculated of the significance level at which the set of measurements within the window bounds obeys one of the known distribution laws;
ii. for each of the laws involved, the possible bound configurations were recorded (numbers of the boundary elements and estimates of the significance level);
iv. for uncovered bounds (with a size of 100 or more elements and an insufficient significance level), the elements were filtered by binary optimization algorithms to obtain a configuration that corresponds to one of the known distribution laws at the required significance level while keeping the maximum possible number of elements.
The set of configurations obtained in this way determined the sizes, boundaries, data sets and distribution laws of the characteristic modes of energy consumption.
d) To obtain configurations for the effective values of the harmonics (within the limits of the power consumption modes defined in step 6.c), the measurements were likewise filtered for compliance with one of the known distribution laws at a given significance level while keeping the maximum number of elements (similarly to step 6.c.iv). The resulting set of configurations determined the numerical composition, boundaries, data sets and distribution laws of the characteristic modes of energy consumption at the level of the effective harmonic values.

7) The explanation of the results and the formulation of recommendations for their use rest on the following: the data sets obtained in steps 6.a to 6.d are associated with the statistical samples (5), from which estimates are calculated to determine the parameters of the distribution functions (6). In this case, generating the random vectors (7) within each characteristic mode of energy consumption of a residential building at the given significance level becomes trivial.
When constructing the graph (figure 2), bounds with the following configuration were displayed: window width of at least 205 elements and confidence level of at least 0.7. The distribution laws obeyed by the subsamples within the bounds are indicated by colors: magenta for Rayleigh, blue for Uniform, red for Lognormal, black for Weibull, green for Exponential.
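Steps 6.a, 6.b and 6.c.i, which produce bounds of the kind shown in figure 2, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the harmonic values are synthetic placeholders rather than the ISS IEEE data, only the Exponential law is tested, each window is shifted to start at zero before testing, and the p-value uses the asymptotic Kolmogorov formula with the law's parameter estimated from the same window, which makes it optimistic. All function names are hypothetical.

```python
import math
import random
from datetime import datetime, timedelta


def total_rms(harmonic_rms):
    """Total RMS current from the RMS values of the individual harmonics."""
    return math.sqrt(sum(i_h ** 2 for i_h in harmonic_rms))


def ks_pvalue_exponential(sample):
    """Approximate Kolmogorov-Smirnov p-value against an exponential law
    whose mean is estimated from the sample itself (an approximation)."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    if mean <= 0.0:
        return 0.0
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x / mean)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:  # the alternating series is effectively 1 here
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return min(1.0, max(0.0, q))


def scan_bounds(ranked, pvalue, min_len, level, stride):
    """Slide a window of variable length over the ranked data and record
    (start, end, p) for every window whose p-value reaches `level`."""
    bounds = []
    n = len(ranked)
    for start in range(0, n - min_len + 1, stride):
        for end in range(start + min_len, n + 1, stride):
            base = ranked[start]
            p = pvalue([x - base for x in ranked[start:end]])
            if p >= level:
                bounds.append((start, end, p))
    return bounds


rng = random.Random(0)
first = datetime(2011, 6, 3)
stamps = [first + timedelta(minutes=5 * i) for i in range(4320)]
# Placeholder RMS values for three harmonics per time stamp.
records = [(ts, [rng.expovariate(0.5), rng.uniform(0.0, 0.3),
                 rng.uniform(0.0, 0.1)]) for ts in stamps]
# Step 6.a: drop the inactive period 00:00 to 09:00.
active = [harmonics for ts, harmonics in records if ts.hour >= 9]
# Step 6.b: rank the measurements by total RMS current.
ranked = sorted(total_rms(h) for h in active)
# Step 6.c.i: window width of at least 205 elements, level of at least 0.7.
bounds = scan_bounds(ranked, ks_pvalue_exponential,
                     min_len=205, level=0.7, stride=250)
```

With 15 days of 5-minute samples the sketch starts from 4320 records, and keeping the hours from 09:00 onward leaves 180 records per day, that is, 2700 in total, which agrees with the counts reported in the text. A coarse stride is used only to keep the scan short; a full scan would use a stride of 1 and repeat the test for each candidate law.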
Subsequent data processing based on the graph (figure 2) made it possible to identify 6 typical power consumption modes, whose configurations are presented in table 1. The first mode corresponds to the first (from left to right in figure 2) uncovered bound, which, according to the filtering results, best matches the Weibull distribution law. The second mode is also defined on the basis of the Weibull distribution law (in figure 2, the first covered, black bound from left to right). The third mode is defined on the basis of the Lognormal distribution law (in figure 2, the first red bound on the left). The fourth mode is defined on the basis of the Uniform distribution law (in figure 2, the bound between the red and magenta bounds, partially covered by many blue bounds). The fifth mode is defined on the basis of the Rayleigh distribution law (in figure 2, the larger magenta bound). The sixth mode is defined on the basis of the Exponential distribution law (in figure 2, the rightmost bound, which starts from the left border of the magenta bound and is partially covered by a green bound). The complete final configurations of the power consumption modes are presented in the corresponding rows of table 1. The total data loss due to filtering for the collection of 2700 values was less than 9%. The highest percentage of relative losses was obtained for the first mode, whose value range accounts for only 0.36% of the total value range.
The set of configurations for the effective values of the harmonics (within the bounds determined by the mode configurations from table 1) was obtained in a similar way, by filtering the corresponding data sets with modifications of the binary algorithms of agent-based metaheuristics [13][14][15], with total data losses within 10%.
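The idea of filtering a data set so that it complies with a distribution law while keeping as many elements as possible can be illustrated with a small sketch. It deliberately swaps in a simple greedy removal in place of the binary agent-based metaheuristics of [13][14][15], tests only the Uniform law, and uses a threshold on the Kolmogorov-Smirnov distance instead of a significance level. The data, the threshold and the function names are hypothetical.

```python
def ks_distance_uniform(values):
    """KS distance between the sample and a uniform law on [min, max]."""
    xs = sorted(values)
    n = len(xs)
    lo, hi = xs[0], xs[-1]
    if hi == lo:
        return 1.0
    d = 0.0
    for i, x in enumerate(xs):
        cdf = (x - lo) / (hi - lo)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def greedy_filter(values, max_distance, min_keep):
    """Return a 0/1 mask over `values` (1 = keep).

    One element is dropped per pass, always the one whose removal lowers
    the KS distance the most, until the distance is acceptable, no single
    removal helps, or only `min_keep` (at least 2) elements remain.
    """
    mask = [1] * len(values)

    def distance(m):
        return ks_distance_uniform([v for v, keep in zip(values, m) if keep])

    current = distance(mask)
    while current > max_distance and sum(mask) > min_keep:
        best_i, best_d = None, current
        for i, keep in enumerate(mask):
            if not keep:
                continue
            mask[i] = 0              # try dropping element i
            d = distance(mask)
            mask[i] = 1              # restore it
            if d < best_d:
                best_i, best_d = i, d
        if best_i is None:           # no single removal improves the fit
            break
        mask[best_i] = 0
        current = best_d
    return mask


# Toy data: an even grid on [0, 1] distorted by a cluster of repeats at 0.5.
values = [i / 20 for i in range(21)] + [0.5] * 6
mask = greedy_filter(values, max_distance=0.06, min_keep=15)
kept = [v for v, keep in zip(values, mask) if keep]
loss = 1.0 - len(kept) / len(values)   # relative data loss due to filtering
```

The binary mask plays the same role as the binary solution vector of a metaheuristic: the objective is to keep the number of ones as large as possible subject to the goodness-of-fit constraint, and `loss` corresponds to the data loss figures quoted in the text.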

Discussion
The applicability of the described experiment planning ontology for obtaining a probabilistic model of single-phase electricity consumers is demonstrated on one of the open ISS IEEE sets. Its applicability to other datasets, including datasets of a harmonic nature from other subject areas, and the possibility of constructing the corresponding probabilistic models at high specified significance levels are subjects of further research.

Conclusion
The main results of the work are: 1) an ontology of experiment planning for obtaining a probabilistic model of single-phase electricity consumers; 2) a demonstration of its application on one of the open ISS IEEE sets; 3) a list of typical energy consumption modes for a particular residential building, obtained by processing the statistics of its energy consumption; 4) datasets for the direct construction of a probabilistic model of single-phase electricity consumers, obtained by filtering the measurement results with modifications of binary algorithms of agent-based metaheuristics, with total losses of less than 10% of the total volume of measurements.