Using the inverse distribution function method and the modified superposition method in the NMPUD computer system

This paper presents NMPUD, a computer system for modelling one-dimensional random variables developed in the laboratory of mathematical modelling of Lyceum No. 130 in Novosibirsk. We also present the results of numerical experiments and the considerations that justify using the following in the NMPUD system: the elementary densities constructed by the technology of sequential (inserted) substitutions; the densities representing weighted sums of elementary densities (which can be simulated using the modified discrete superposition method); the algorithms for the piecewise linear approximation of unknown densities using a given sample; and the algorithms of the modified superposition method for the computational modelling of random variables with piecewise linear densities.


The NMPUD computer system
When carrying out scientific research on many real processes and phenomena, it is important to construct an appropriate mathematical model. Such models make it possible both to refine the parameters of practical experiments (and sometimes to replace these experiments) and to predict new effects. In turn, the methods of computational mathematics make it possible to perform the analytical and approximate calculations associated with a particular mathematical model on modern computers. Stochastic (probabilistic) models are an important class of modern mathematical models. They are associated with computational (numerical) algorithms of statistical modelling (Monte Carlo methods). These algorithms always involve choosing low-cost numerical procedures for simulating sample values of the random parameters introduced into the model.
The NMPUD (Numerical Modelling of Probabilistic Univariate Distributions) computer system has been under development since 2020 at the laboratory of mathematical modelling of Lyceum No. 130 in Novosibirsk [1,2]. The NMPUD system is mainly used as a useful (and even necessary) tool for selecting important distributions that can be simulated efficiently, and it is intended for researchers and students who study, develop and (or) use computer stochastic models to solve important applied problems.
The NMPUD system bank includes formulas for the probability densities $f_\xi(u;\boldsymbol\theta)$, $u \in (a, b)$ (1.1) (see [3], section 2.6 in [4], and section 2 of this paper) of random variables $\xi \in (a, b)$, together with the corresponding formulas (less often, algorithms) for the computational (numerical) modelling (simulation) of sample values $\xi_0$ of these random variables, of the form $\xi_0 = \psi(\alpha_0; \boldsymbol\theta)$ (1.2), where $\alpha_0 \in (0,1)$ is a standard random number (see sections 1.1.1 and 2.4 in [3]), i.e. a sample value of the random variable $\alpha \in (0,1)$ uniformly distributed over the interval $(0,1)$; on a computer, this sample value is produced by special routines (called RAND or RANDOM in many programming languages).
In formulas (1.1) and (1.2), the letter $\boldsymbol\theta$ denotes a parameter (or a set of parameters) of the distribution. This set can include, among other things, the endpoints $a$ and $b$ of the interval $(a, b)$.
The bank also records an important characteristic of formula (algorithm) (1.2): the cost of modelling, i.e. the average time of one realization of formula (1.2) in nanoseconds.
To study a particular example of density (1.1) and modelling formula (algorithm) (1.2), the user of the NMPUD system can use the main page of the system (see figure). The density formula (1.1) and the corresponding distribution interval $(a, b)$ are presented at the bottom left of the page. The graph of this function (the white curve) is shown at the center of the page.
The modelling formula (algorithm) (1.2) is shown (or entered) at the bottom right of the main page. A histogram built from sample values repeatedly simulated by this formula is displayed as yellow bars at the center of the page; this process can be observed in stages. The correctness of the modelling formula (algorithm) (1.2) is judged by the visual proximity of the histogram to the graph of density (1.1).
The average time of one realization of formula (algorithm) (1.2) in nanoseconds is also shown on the right side of the screen (the large green number).
The system comes with instructions for entering density formulas (1.1) and modelling formulas (algorithms) (1.2), so that the computational efficiency of these formulas can be studied.
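The check performed on the main page (histogram versus density graph, plus the average cost of one realization) can be sketched in Python as follows. This is not the NMPUD implementation: the helper `study_formula`, the bin count and the sample size are our own illustrative choices, and timings of interpreted Python are far larger than those of compiled code, so only relative comparisons between formulas are meaningful.

```python
import math
import random
import time


def study_formula(density, psi, a, b, n=200_000, bins=20, seed=1):
    """Hypothetical helper mimicking the main-page check: simulate n values
    xi_0 = psi(alpha_0), build a normalized histogram on (a, b) and compare it
    with the density at the bar midpoints.

    Returns (max_deviation, mean_ns): the largest gap between a bar height and
    the density, and the average time of one realization in nanoseconds
    (including the generation of alpha_0).
    """
    rng = random.Random(seed)
    start = time.perf_counter_ns()
    # 1.0 - rng.random() lies in (0, 1], so formulas with log(alpha) never see 0.
    samples = [psi(1.0 - rng.random()) for _ in range(n)]
    mean_ns = (time.perf_counter_ns() - start) / n

    width = (b - a) / bins
    counts = [0] * bins
    for x in samples:
        if a <= x < b:  # values outside [a, b) are not counted
            counts[min(int((x - a) / width), bins - 1)] += 1

    max_deviation = 0.0
    for k, c in enumerate(counts):
        height = c / (n * width)      # bar height of the normalized histogram
        mid = a + (k + 0.5) * width   # bar midpoint
        max_deviation = max(max_deviation, abs(height - density(mid)))
    return max_deviation, mean_ns


# Density f(u) = 2u on (0, 1) with the modelling formula xi_0 = sqrt(alpha_0).
dev, ns = study_formula(lambda u: 2.0 * u, math.sqrt, 0.0, 1.0)
print(f"max |histogram - density| = {dev:.3f}, about {ns:.0f} ns per realization")
```

A wrong formula, for example `psi = lambda al: al` for the same density, produces a deviation close to 1, which is the numerical counterpart of the visual mismatch on the main page.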

Creating the bank of elementary densities for the NMPUD system
The main content of the density bank of the NMPUD system is the so-called elementary densities. We recall the corresponding definitions.
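The inverse distribution function method for a random variable $\xi \in (a, b)$ with a distribution function $F_\xi(u) = \int_a^u f_\xi(w)\,dw$ consists in solving the equation $F_\xi(\xi_0) = \alpha_0$ with respect to $\xi_0$, which gives the modelling formula $\xi_0 = F_\xi^{-1}(\alpha_0)$ (2.1). Densities for which this equation has an explicit, computationally efficient solution (2.1) are called elementary (see [3] and section 2.6 in [4]).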
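As an illustration of formula (2.1), consider our own example (not necessarily part of the NMPUD bank): the exponential density truncated to $(0, b)$, $f(u) = \lambda e^{-\lambda u} / (1 - e^{-\lambda b})$. Here $\lambda$ and $b$ play the role of the distribution parameters, including the interval endpoint. The sketch contrasts the explicit inverse, which is what makes simulation cheap, with a generic bisection solver of $F(\xi_0) = \alpha_0$ that needs about a hundred evaluations of $F$ per sample value. All function names are hypothetical.

```python
import math


def trunc_exp_cdf(u, lam, b):
    """F(u) for f(u) = lam * exp(-lam * u) / (1 - exp(-lam * b)), 0 < u < b."""
    return math.expm1(-lam * u) / math.expm1(-lam * b)


def trunc_exp_psi(alpha, lam, b):
    """Explicit formula of type (2.1): the root of F(xi_0) = alpha_0."""
    return -math.log1p(alpha * math.expm1(-lam * b)) / lam


def inverse_by_bisection(cdf, alpha, lo, hi, iters=100):
    """Generic numerical root of F(xi_0) = alpha_0 on (lo, hi) for an increasing F."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam, b = 2.0, 3.0  # distribution parameters, including the right endpoint b
for alpha in (0.1, 0.5, 0.9):
    explicit = trunc_exp_psi(alpha, lam, b)
    numeric = inverse_by_bisection(lambda u: trunc_exp_cdf(u, lam, b), alpha, 0.0, b)
    print(f"alpha_0 = {alpha}: explicit {explicit:.12f}, bisection {numeric:.12f}")
```

`expm1` and `log1p` are used instead of `exp(x) - 1` and `log(1 + x)` to avoid loss of precision for small arguments.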
To form a sufficiently large bank of efficiently modelled elementary densities, it is appropriate to use the following technology of sequential (inserted) substitutions. […] The name "technology of sequential (inserted) substitutions" for method 1 reflects the fact that the resulting density (2.4) can itself be taken as the initial density (2.3), and another transformation can then be applied to it.
During the development of the NMPUD system, we built a subsystem that generates an unlimited number of densities of the form (2.4), together with the corresponding modelling formulas of the form (2.5), from initial densities (2.3) using classical elementary functions (the nerdamer and mathjs symbolic algebra packages were used here). The only limitation of this process is the computer time required to evaluate these formulas, since formulas of the form (2.5) become more complex as the number of substitutions increases. By automating the technology of sequential (inserted) substitutions, we obtained more than 1000 densities; only about a hundred of them are simulated faster than 100 ns, which was the criterion for including a density in the NMPUD system bank.
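Since formulas (2.3)-(2.5) are not reproduced above, the following minimal numerical sketch rests on an assumption: one substitution is taken to be a strictly monotone change of variables $\xi = \varphi(\eta)$, so that $f_\xi(u) = f_\eta(\varphi^{-1}(u))\,|(\varphi^{-1})'(u)|$ and $\xi_0 = \varphi(\eta_0)$. The class `DensityEntry` and the function `substitute` are hypothetical. The NMPUD subsystem works symbolically (nerdamer, mathjs), whereas this sketch only composes Python functions.

```python
import math
import random


class DensityEntry:
    """Hypothetical bank entry: a density f(u) and its modelling formula psi(alpha)."""

    def __init__(self, density, psi):
        self.density = density
        self.psi = psi


def substitute(entry, phi, phi_inv, phi_inv_deriv):
    """One substitution xi = phi(eta) with a strictly monotone phi:
    f_xi(u) = f_eta(phi_inv(u)) * |phi_inv'(u)|  and  xi_0 = phi(eta_0)."""
    return DensityEntry(
        density=lambda u: entry.density(phi_inv(u)) * abs(phi_inv_deriv(u)),
        psi=lambda alpha: phi(entry.psi(alpha)),
    )


# Initial density f(v) = 2v on (0, 1), simulated by eta_0 = sqrt(alpha_0).
base = DensityEntry(lambda v: 2.0 * v, math.sqrt)

# Substitution 1: xi = -ln(eta) gives f(u) = 2 exp(-2u), u > 0.
step1 = substitute(base, lambda v: -math.log(v),
                   lambda u: math.exp(-u), lambda u: -math.exp(-u))

# Substitution 2, applied to the result: zeta = sqrt(xi) gives f(z) = 4z exp(-2z^2), z > 0.
step2 = substitute(step1, math.sqrt, lambda z: z * z, lambda z: 2.0 * z)

rng = random.Random(2)
# 1.0 - rng.random() lies in (0, 1], so the logarithm never receives 0.
zs = [step2.psi(1.0 - rng.random()) for _ in range(200_000)]
mean = sum(zs) / len(zs)
print(f"sample mean {mean:.4f}, exact mean sqrt(pi/8) = {math.sqrt(math.pi / 8):.4f}")
```

Each substitution wraps one more function call around the modelling formula, which illustrates why the cost of formulas of the form (2.5) grows with the number of substitutions.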

The modified superposition method
Formulas of the form (2.1), (2.5) of the inverse distribution function method are the most commonly used ones in practical calculations (see, for example, [3], [4]).
During the development of the NMPUD system, we discussed and investigated the possibility of using probability densities that can be simulated efficiently by methods other than the inverse distribution function method (2.1).
In particular, it turned out that the so-called modified discrete superposition method is efficient in a number of important cases.
This method is used when density (1.1) can be represented as a weighted sum of elementary densities: $f_\xi(u;\boldsymbol\theta) = \sum_{i=1}^{M} p_i f_i(u;\boldsymbol\theta)$, $p_i > 0$, $\sum_{i=1}^{M} p_i = 1$ (3.1). […]
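The superposition scheme for a sum of the form (3.1) can be sketched as follows. Algorithm 1 itself is not reproduced above, so we assume that the "modified" variant differs from the classical superposition method by reusing the residual of $\alpha_0$, rescaled to $[0, 1)$, instead of drawing a second standard random number. The number $m$ is found by the standard algorithm of successively subtracting $p_1, p_2, \ldots$ from $\alpha_0$; `make_superposition_sampler` is a hypothetical name.

```python
import random


def make_superposition_sampler(probs, inverse_cdfs):
    """Hypothetical sampler for f(u) = sum_i p_i f_i(u).

    The number m is found by the standard algorithm (subtract p_1, p_2, ...
    from alpha_0 until the result would become negative); the remaining part
    of alpha_0, rescaled to [0, 1), is then reused as the standard random
    number for the inverse distribution function of f_m. Reusing alpha_0 is
    our assumption about what makes the method "modified".
    """
    def psi(alpha):
        rest = alpha
        for p, inverse in zip(probs, inverse_cdfs):
            if rest < p:
                return inverse(rest / p)
            rest -= p
        return inverse_cdfs[-1](1.0)  # reached only through rounding (or alpha = 1)

    return psi


# Example: p_1 = 0.3 with f_1 uniform on (0, 1); p_2 = 0.7 with f_2(u) = 2u on (0, 1).
psi = make_superposition_sampler([0.3, 0.7], [lambda a: a, lambda a: a ** 0.5])
rng = random.Random(3)
values = [psi(rng.random()) for _ in range(200_000)]
exact_mean = 0.3 * 0.5 + 0.7 * 2.0 / 3.0
print(f"sample mean {sum(values) / len(values):.4f}, exact mean {exact_mean:.4f}")
```

With only a few summands, the linear search over $p_1, p_2, \ldots$ costs very little, which is the setting examined in the next section.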

The case of a small number of summands
Using the NMPUD computer system, we investigated the efficiency of algorithm 1 (in comparison with the formulas of the inverse distribution function method (2.1)) for the case of a small number of terms in sum (3.1).
Let, for example, $M = 2$ and $f_\xi(u;\boldsymbol\theta) = p_1 f_1(u;\boldsymbol\theta) + p_2 f_2(u;\boldsymbol\theta)$, where $0 < p_1 < 1$, $0 < p_2 < 1$, $p_1 + p_2 = 1$. […] Even more significant is the example of simulating the density of the scattering angle cosine for the Rayleigh law of molecular photon scattering in the atmosphere, which has the form $f(\mu) = \frac{3}{8}(1 + \mu^2)$, $-1 < \mu < 1$ (see example 11.1 in [3]). The consideration of equation (…) […] is 59.20 nanoseconds (i.e. more than five times smaller than for the inverse distribution function method).
CONCLUSION 3. Densities of the form (4.1) and modelling algorithms of the form (4.2) can significantly extend the bank of densities and modelling algorithms of the NMPUD computer system.
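For illustration, one possible representation of the Rayleigh density as a sum with $M = 2$ is $\frac{3}{8}(1 + \mu^2) = \frac{3}{4}\cdot\frac{1}{2} + \frac{1}{4}\cdot\frac{3}{2}\mu^2$. This decomposition is our own choice; the one used in the paper is not reproduced above. The inverse distribution function method, by contrast, requires solving the cubic equation $\mu^3 + 3\mu + 4 - 8\alpha_0 = 0$, here by Cardano's formula. The sketch implements both; Python timings would not be comparable with the 59.20 ns figure, so only the correctness of the two formulas is checked.

```python
import math
import random


def real_cbrt(x):
    """Real cube root, valid for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def rayleigh_cdf(mu):
    """F(mu) = (mu^3 + 3 mu + 4) / 8 for f(mu) = 3/8 (1 + mu^2), -1 < mu < 1."""
    return (mu ** 3 + 3.0 * mu + 4.0) / 8.0


def rayleigh_inverse(alpha):
    """Inverse distribution function method: the single real root of
    mu^3 + 3 mu + (4 - 8 alpha) = 0, by Cardano's formula."""
    half_q = 2.0 - 4.0 * alpha
    root = math.sqrt(half_q * half_q + 1.0)  # sqrt((q/2)^2 + (p/3)^3) with p = 3
    return real_cbrt(-half_q + root) + real_cbrt(-half_q - root)


def rayleigh_superposition(alpha):
    """Weighted sum 3/8 (1 + mu^2) = 3/4 * (1/2) + 1/4 * (3/2) mu^2, one alpha_0 per value."""
    if alpha < 0.75:
        return 2.0 * (alpha / 0.75) - 1.0                 # uniform density 1/2 on (-1, 1)
    return real_cbrt(2.0 * (alpha - 0.75) / 0.25 - 1.0)  # density (3/2) mu^2 on (-1, 1)


rng = random.Random(4)
alphas = [rng.random() for _ in range(200_000)]
for name, method in (("inverse", rayleigh_inverse), ("superposition", rayleigh_superposition)):
    second_moment = sum(method(a) ** 2 for a in alphas) / len(alphas)
    print(f"{name}: E[mu^2] estimated as {second_moment:.4f} (exact value 2/5)")
```

The superposition branch needs at most one cube root and no square root, whereas Cardano's formula needs a square root and two cube roots for every sample value.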

On the NMPUD system block for big data processing and modelling piecewise linear densities
As part of the development of the NMPUD computer system, we designed a special block of the system for big data processing. The functionality of this block makes it possible to solve several tasks at once.
[…] The "conditionality" of this optimization technology is due to the fact that the left-hand side of an inequality of the form (5.2) contains not the error of the algorithm itself but its upper bound. For example, for the so-called $L_2$-approach (where inequality (5.2) is required to hold in the mean-square sense), the following forms of the conditionally optimal parameters were obtained in [5] using method 2: […]
STATEMENT 1 (see section 1.6.4 in [3] and section 11.3 in [4]). For a random variable with an elementary composite density (in particular, with density (3.1), (5.5)), the inverse distribution function method coincides with the modified superposition method (algorithm 1) in which the number $m$ is simulated by the standard algorithm: the partial sums $\sum_{i=1}^{m} p_i$, $m = 1, 2, \ldots$, are subtracted from $\alpha_0 \in (0,1)$ until the first negative value is obtained.
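Statement 1 can be checked numerically on the simplest composite density, which equals $p_m / (y_m - y_{m-1})$ on consecutive intervals $[y_{m-1}, y_m)$ (the knots $y_m$ are our notation). The sketch compares the modified superposition method (number $m$ by successive subtraction of $p_1, p_2, \ldots$ from $\alpha_0$, then the rescaled residual, which is our assumption about the form of algorithm 1) with a generic numerical inversion of the distribution function. Because $F$ is continuous and strictly increasing, both must return the same value for every $\alpha_0$, up to rounding.

```python
def composite_cdf(u, knots, probs):
    """F(u) for the composite density equal to p_m / (y_m - y_{m-1}) on [y_{m-1}, y_m)."""
    total = 0.0
    for m, p in enumerate(probs):
        lo, hi = knots[m], knots[m + 1]
        if u >= hi:
            total += p
        else:
            if u > lo:
                total += p * (u - lo) / (hi - lo)
            break
    return total


def modified_superposition(alpha, knots, probs):
    """Number m by subtracting p_1, p_2, ... from alpha_0; then the rescaled residual."""
    rest = alpha
    for m, p in enumerate(probs):
        if rest < p:
            lo, hi = knots[m], knots[m + 1]
            return lo + (hi - lo) * rest / p  # inverse distribution function of f_m
        rest -= p
    return knots[-1]


def inverse_by_bisection(cdf, alpha, lo, hi, iters=100):
    """Generic numerical solution of F(xi_0) = alpha_0 for an increasing F."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


knots = [0.0, 1.0, 1.5, 4.0]  # hypothetical y_0 < y_1 < y_2 < y_3
probs = [0.2, 0.5, 0.3]       # p_1, p_2, p_3
for alpha in (0.05, 0.2, 0.45, 0.7, 0.95):
    x_super = modified_superposition(alpha, knots, probs)
    x_inverse = inverse_by_bisection(lambda u: composite_cdf(u, knots, probs),
                                     alpha, knots[0], knots[-1])
    print(f"alpha_0 = {alpha}: {x_super:.12f} vs {x_inverse:.12f}")
```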
It is easy to obtain formulas of the inverse distribution function method for the random variables $\tilde\xi^{(m)}$ with distribution densities $f^{(m)}(u;\boldsymbol\theta)$ from relations (5.5): […] This yields the following efficient algorithm for simulating a sample value $\xi_0$ of a random variable $\xi$ having the piecewise linear distribution density (5.1).
ALGORITHM 2.
1. Simulate a standard random number $\alpha_0 \in (0,1)$ and, using the most efficient suitable algorithm for simulating the integer-valued discrete random variable with the distribution $\{p_m\}$, find the number $m$ such that $\alpha_0 \in \Delta_m = \left[\sum_{i=1}^{m-1} p_i, \sum_{i=1}^{m} p_i\right)$.
2. Simulate the sample value $\xi_0$ of the random variable $\xi \in (a, b)$ with the distribution density (5.1) by the formula (…).
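Because formulas (5.1), (5.5) and the formula of step 2 are not reproduced above, the following is a sketch under our own assumptions. The density is taken to be a continuous polygon with heights $g_0, \ldots, g_M$ at knots $y_0 < \ldots < y_M$ (hypothetical notation), so that $p_m = (g_{m-1} + g_m)(y_m - y_{m-1})/2$. Step 2 inverts the quadratic distribution function on the chosen segment using the rescaled residual of $\alpha_0$. Step 1 here uses binary search over the partial sums, one possible "suitable" algorithm.

```python
import bisect
import itertools
import math
import random


class PolygonSampler:
    """Hypothetical sketch of algorithm 2 for a continuous piecewise linear
    density with heights g_0, ..., g_M at knots y_0 < ... < y_M."""

    def __init__(self, knots, heights):
        areas = [0.5 * (heights[m] + heights[m + 1]) * (knots[m + 1] - knots[m])
                 for m in range(len(knots) - 1)]
        total = sum(areas)
        self.knots = knots
        self.g = [h / total for h in heights]          # normalized heights
        self.p = [s / total for s in areas]            # probabilities p_m
        self.cum = list(itertools.accumulate(self.p))  # partial sums of p_m
        self.cum[-1] = 1.0                             # remove rounding in the last sum

    def sample(self, alpha):
        # Step 1: segment m (0-based) with alpha in [cum[m-1], cum[m]), by binary search.
        m = min(bisect.bisect_right(self.cum, alpha), len(self.p) - 1)
        left = self.cum[m - 1] if m > 0 else 0.0
        beta = (alpha - left) / self.p[m]  # residual of alpha rescaled to [0, 1)
        # Step 2: invert the quadratic distribution function on the segment,
        # written as beta (g0 + g1) / (g0 + sqrt(...)) to avoid cancellation.
        g0, g1 = self.g[m], self.g[m + 1]
        denom = g0 + math.sqrt((1.0 - beta) * g0 * g0 + beta * g1 * g1)
        t = beta * (g0 + g1) / denom if denom > 0.0 else 0.0
        lo, hi = self.knots[m], self.knots[m + 1]
        return lo + (hi - lo) * t


# Triangular density on (0, 3) with mode 1 (heights are normalized automatically).
sampler = PolygonSampler([0.0, 1.0, 3.0], [0.0, 1.0, 0.0])
rng = random.Random(5)
xs = [sampler.sample(rng.random()) for _ in range(200_000)]
print(f"sample mean {sum(xs) / len(xs):.4f}, exact mean {4.0 / 3.0:.4f}")
```

The rationalized form of the quadratic root also covers a flat segment ($g_0 = g_1$, giving $t = \beta$) and a segment starting at zero height ($g_0 = 0$, giving $t = \sqrt{\beta}$) without special cases.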
It must be taken into account that when a density $\hat f(u)$, $u \in (a, b)$, is approximated from a given sample, the number of half-intervals in formula (5.1) is large. Therefore, to simulate the number $m$ at the first step of algorithm 2, one should apply the quantile method or the Walker method (see sections 1.3.3, 1.3.4 in [3] and sections 10.6, 10.7 in [4]) instead of the standard algorithm of successively subtracting the sums $\sum_{i=1}^{m} p_i$ from $\alpha_0$.
There is also the idea of using a frequency polygon of the form (3.1), (5.5) in which all probabilities are equal: $p_1 = p_2 = \ldots = p_M = 1/M$. This can significantly increase the efficiency of algorithm 2, but it makes it more difficult to find the required conditionally optimal parameters (5.3) and (5.4). Whether identical probabilities are worth using requires a separate study.
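The Walker method mentioned above can be sketched as follows; the tables are built by Vose's numerically stable construction, and the function names are ours. To keep to one standard random number per sample value, the column is taken from the integer part of $M\alpha_0$ and the acceptance test from its fractional part; this is a common variant and not necessarily the exact form given in [3], [4].

```python
import random


def build_alias_tables(probs):
    """Walker's alias method, tables built by Vose's construction in O(M)."""
    n = len(probs)
    scaled = [p * n for p in probs]
    threshold = [0.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, big = small.pop(), large.pop()
        threshold[s] = scaled[s]          # keep column s with this probability
        alias[s] = big                    # otherwise return its alias
        scaled[big] += scaled[s] - 1.0    # the large entry gives away 1 - scaled[s]
        if scaled[big] < 1.0:
            small.append(big)
        else:
            large.append(big)
    for i in small + large:               # leftovers equal 1 up to rounding
        threshold[i] = 1.0
    return threshold, alias


def alias_sample(threshold, alias, alpha):
    """One number m (0-based) from one standard random number alpha in [0, 1):
    the integer part of n * alpha picks a column, its fractional part decides."""
    n = len(threshold)
    u = alpha * n
    k = min(int(u), n - 1)
    return k if u - k < threshold[k] else alias[k]


probs = [0.1, 0.2, 0.3, 0.4]
threshold, alias = build_alias_tables(probs)
rng = random.Random(6)
counts = [0] * len(probs)
for _ in range(200_000):
    counts[alias_sample(threshold, alias, rng.random())] += 1
print([c / 200_000 for c in counts])
```

After the $O(M)$ set-up, each value of $m$ costs one multiplication, one truncation and one comparison, independently of the number of half-intervals.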
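The possible gain from equal probabilities is easy to see in step 1 of algorithm 2: with $p_m = 1/M$, finding $m$ and the rescaled residual of $\alpha_0$ reduces to one multiplication and one truncation, with no tables and no search. The minimal sketch below (function names are ours) compares this with the successive-subtraction search for the same $\alpha_0$; it does not address the harder question of the conditionally optimal parameters.

```python
def select_equal(alpha, M):
    """Step 1 of algorithm 2 when p_1 = ... = p_M = 1/M: m = floor(M * alpha)
    (0-based) and the rescaled residual is M * alpha - m; no tables, no search."""
    u = alpha * M
    m = min(int(u), M - 1)
    return m, u - m


def select_by_subtraction(alpha, M):
    """The standard algorithm for comparison: subtract 1/M until the result would be negative."""
    rest = alpha
    for m in range(M):
        if rest < 1.0 / M:
            return m, rest * M
        rest -= 1.0 / M
    return M - 1, 1.0


for alpha in (0.0, 0.13, 0.5, 0.77, 0.99):
    print(alpha, select_equal(alpha, 8), select_by_subtraction(alpha, 8))
```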
All the described ideas are taken into account when developing the corresponding special block of the NMPUD computer system.

Conclusion
This paper presents NMPUD, a computer system for modelling one-dimensional random variables developed in the laboratory of mathematical modelling of Lyceum No. 130 in Novosibirsk. We have presented the results of numerical experiments and the considerations that justify using the following in the NMPUD system:
- the elementary densities constructed by the technology of sequential (inserted) substitutions;
- the densities representing weighted sums of elementary densities, which can be simulated using the modified discrete superposition method;
- the algorithms for the piecewise linear approximation of unknown densities using a given sample;
- the algorithms of the modified superposition method for the computational simulation of random variables with piecewise linear densities.