Experience with modeling values of the virtual catapult range

The development of a virtual catapult involves the synthesis of knowledge from several fields, namely physics, probability, statistics, regression analysis, and computer science. Our contribution does not describe the creation of the application from the software point of view; instead, it presents a method for generating the numerical values of the range of the virtual catapult together with their corresponding variability. When modeling these range values, it is possible to choose the classical theoretical approach, i.e., the physical approach using equations of motion, or to statistically process the measured values obtained by experimenting with a real catapult model. In our case, a statistical approach, specifically the Design of Experiments method, was chosen to estimate the values of the catapult range. The regression model obtained from the statistical analysis of the measured data was used for point estimates of the range at specific settings of selected parameters of a real catapult. For the simulated range values to correspond to the actual range values of the catapult, it was necessary to achieve a realistic fluctuation (randomness) of the simulated ranges around the predicted range value obtained from the regression model. This article suggests a way to ensure stochasticity when modeling such types of systems.


Introduction
Our virtual catapult was created as an educational tool for students at a technical university [1]. The simulated range values will be used in teaching the fundamentals of planning and evaluating experiments, particularly the statistical processing of the values obtained from the simulation. For time and organizational reasons, the real catapult will be replaced in the educational process by an application with a virtual catapult. Our students will learn how to plan an experiment properly according to the principles of the Design of Experiments (DoE) method, i.e., which values of the catapult range should be measured at which settings of the catapult parameters. They will then learn how to analyze the obtained numerical values statistically and how to interpret the results. Because the data are processed statistically, the simulated values for the selected catapult parameters must correctly reflect the actual range values of the real catapult, including the variability that is present in reality. To model shooting from a catapult in our application, it was necessary to select a suitable software platform, to create an appropriate model of the catapult that allows the selected parameter values to be changed, to express the equation of the catapult's range, and at the same time to ensure the fluctuation of the data around the calculated predicted range values. In this article, we focus on the last two aspects. We describe how we obtained a mathematical model that expresses the in-flight distance for selected parameter values of a real catapult. We also present a way of ensuring the variability of values around these predicted values.
The distance that the ball flies after being fired from the catapult can be expressed by a suitable regression model that approximates the measured values of the catapult range. The advantage of this approach over calculating the range from the physical equations of an oblique throw is that regression analysis works with the actual measured values, so it is not necessary to consider the various limiting conditions that simplify the problem in the physical solution of the task. The requirement is to have enough measured values at suitably selected levels of the parameters that affect the range of the catapult, which makes a statistical approach to the problem possible.

DoE
DoE is a branch of applied statistics that deals with the planning, execution, analysis, and interpretation of controlled experiments to evaluate the effect of factors on a response. It is a powerful data collection and analysis tool that can be used in a variety of experimental situations. The DoE method has been used successfully in many fields, including science, engineering, manufacturing, quality control, and research. We focus on industrial applications in the optimization of production processes, in the design of electromechanical components, etc. [2][3][4].

Experimentation
A full factorial design of the 2^4 type was used [5][6][7], i.e., a full two-level experiment in which four independent variables (factors) were chosen: A (arm length), B (launch angle), C (elastic band tension), and D (ball weight). The influence of these factors on the response, the range of the catapult, was investigated. The list of factors and their levels used in the catapult experiment is given in table 1. We performed five replications at all factorial points. To verify whether a linear regression model would be sufficient, a central point with 10 replications was added to the original factorial points. Figure 1 shows the arrangement in which we conducted the real catapult experiment. Eighty values of the range of the real catapult were measured at the factorial points and ten values at the central point, so a total of ninety measurements were made. The numerical values of the in-flight distances in cm for the selected experimental points can be seen in table 2. The results of the catapult experiment were analyzed using Minitab statistical software. For the statistical evaluation of the measured data, we had to work with coded factor values, which take the values ±1 after the linear transformation

$$x_{coded} = \frac{x - s}{\lambda},$$

where $x$ is the factor in original engineering units, $s$ is the center of the selected factor interval, and $\lambda$ is half its interval length. The central point has all parameters equal to zero.
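The layout of the experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `code_level` and `build_design` are hypothetical, and the engineering levels 10 and 20 used in the usage note are placeholders, not the values from table 1.

```python
from itertools import product

FACTORS = ("A", "B", "C", "D")


def code_level(x, low, high):
    """Map a value in engineering units to coded units: (x - s) / lambda."""
    center = (low + high) / 2.0       # s: center of the factor interval
    half_range = (high - low) / 2.0   # lambda: half of the interval length
    return (x - center) / half_range


def build_design(replicates=5, center_runs=10):
    """Return the coded runs of a replicated 2^4 design with center points."""
    corner_points = list(product((-1, 1), repeat=len(FACTORS)))  # 16 corners
    runs = [point for point in corner_points for _ in range(replicates)]
    runs += [(0,) * len(FACTORS)] * center_runs  # all factors at coded level 0
    return runs


design = build_design()
print(len(design))  # 16 * 5 + 10 runs in total
```

With the placeholder levels, `code_level(10, 10, 20)` gives the low coded level, `code_level(20, 10, 20)` the high one, and `code_level(15, 10, 20)` the central point.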

Analysis of the results of the experiment
By statistically analyzing the measured data, we obtained the following regression model for predicting the range of the catapult. A backward elimination procedure was used, and by successively eliminating insignificant terms from the regression function, we obtained a model that is hierarchical. Because of this hierarchy, the interaction term of factors B and D (B is Tension of the elastic band and D is Weight of the ball) had to remain in our regression model. The regression function $f(X, \beta)$ is written in coded units, where $X$ is the matrix of values of the independent variables and $\beta$ is the vector of model parameters.

𝑓(𝑋
The analysis of variance (ANOVA) results in table 3 show that the model is significant and adequate. The originally considered quadratic terms did not have to be included in the model, since the p-value for Lack-of-Fit is 0.256 > 0.05. The curvature is insignificant because its p-value is 0.057, which confirms our previous conclusion. A normal probability plot and a Pareto chart in figure 2 show the remaining significant factors and their interactions. The interaction of elastic band tension and weight of the ball was not excluded due to hierarchy. The VIF factor for all terms of the model is equal to 1.
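A VIF of 1 for every term is what one would expect from a balanced two-level factorial design, because its model columns are mutually uncorrelated. The following sketch checks this for the replicated 2^4 design with ten center runs. It is an illustration only: it uses all main effects and two-factor interactions as candidate terms, which is an assumption and not necessarily the set of terms retained in the final model.

```python
from itertools import combinations, product


def model_columns():
    """Columns of main effects and two-factor interactions in coded units."""
    corners = list(product((-1, 1), repeat=4))
    runs = [p for p in corners for _ in range(5)] + [(0, 0, 0, 0)] * 10
    names = "ABCD"
    columns = {name: [run[i] for run in runs] for i, name in enumerate(names)}
    for (i, a), (j, b) in combinations(list(enumerate(names)), 2):
        columns[a + b] = [run[i] * run[j] for run in runs]
    return columns


def correlation(u, v):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5


columns = model_columns()
max_abs_corr = max(
    abs(correlation(columns[p], columns[q])) for p, q in combinations(columns, 2)
)
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing term j on the
# other terms. If term j is uncorrelated with every other term, R_j^2 = 0
# and therefore VIF_j = 1.
print(max_abs_corr)
```

Since every pairwise correlation is zero, regressing any term on the remaining ones explains none of its variance, which is consistent with the reported VIF values.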
The adjusted coefficient of determination $R^2_{adj}$, which penalizes the model for redundant terms and thus helps reveal whether the regression model is already overfitted, is

$$R^2_{adj} = 1 - \frac{n - 1}{n - p}\left(1 - R^2\right),$$

where $p$ is the total number of terms in the regression model and $n$ is the sample size.
The predicted coefficient of determination $R^2_{pred}$ indicates how well a regression model predicts responses for new observations:

$$R^2_{pred} = 1 - \frac{PRESS}{SS_T},$$

where $SS_T$ is the total sum of squares. The PRESS value for a data set of size $n$ is calculated by omitting each observation in turn and using the remaining $n - 1$ observations to fit a regression equation, which is then used to predict the omitted response:

$$PRESS = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2,$$

where $y_i$ is the $i$-th measured response and $\hat{y}_{(i)}$ is the predicted value of the response for the $i$-th omitted observation. This statistical metric was obtained with the Minitab software.
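The leave-one-out idea behind PRESS can be made concrete with a small sketch. For brevity it uses a straight-line model with a single predictor and made-up numbers, so it does not reproduce the Minitab results; the names `fit_line` and `r2_metrics` are hypothetical, and the adjusted value assumes that p counts the constant term.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def r2_metrics(x, y, p=2):
    """Return (R^2, adjusted R^2, predicted R^2) for a straight-line fit."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict the omitted response.
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (c0 + c1 * x[i])) ** 2
    r2 = 1.0 - ss_error / ss_total
    r2_adj = 1.0 - (n - 1) / (n - p) * (1.0 - r2)
    r2_pred = 1.0 - press / ss_total
    return r2, r2_adj, r2_pred


# Made-up illustration data in coded units (not the measurements of table 2).
x_demo = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
y_demo = [10.2, 9.8, 15.1, 14.9, 20.3, 19.7]
r2, r2_adj, r2_pred = r2_metrics(x_demo, y_demo)
print(round(r2, 4), round(r2_adj, 4), round(r2_pred, 4))
```

A predicted value that stays close to the ordinary and adjusted values, as in this toy data set, is the pattern that suggests a model is not overfitted.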
All the metrics calculated above indicate that our regression model is good, is not overfitted, and has high predictive power.
The graphical analysis of the residuals is shown in figure 3, and it also indicates that the model is correct. Normality, homoscedasticity, and independence of the residuals are checked here. The normal probability plot, in which the residuals lie close to a straight line, supports the normality assumption. The histogram of standardized residuals has the typical shape of a normal distribution. The plot of residuals versus predicted values indicates that the variance of the residuals is approximately constant. The requirement that the residuals be randomly scattered around the horizontal axis without a visible pattern is met. The plot of residuals versus observation order shows no functional dependence. The first ten observations seem problematic, but the rest of the residuals over time appear to be fine. The standardized residuals do not exceed ±3 and are mainly centered near 0.
The measure of the tightness of the linear dependence between the simulated and the control measured values has a high value close to 1, which indicates a high degree of linear dependence between them. The fitted linear regression model is significant because the p-value for the F test is equal to 8.85E-26.
Figure 5 shows the differences between the measured and the simulated in-flight distances of the catapult ball. The differences $\Delta y_i = y_i^{\text{control measured}} - y_i^{\text{simulated}}$ for $i$ from 1 to 40 are expressed in cm. The distributions of the measured and simulated in-flight distances are compared in the boxplots in figure 6, where the actual ranges from the control measurement for the real catapult are on the left and the simulated ranges of the virtual catapult are on the right. In addition to the common and simple visual approach of evaluating models through the regression of predicted versus observed values via a scatter plot and the statistics mentioned above, there is also a comparison of the slope and intercept parameters of a linear fit [8]. Because we need the simulated range and the measured range to be in a 1:1 ratio, the best-fit line should be at a 45° angle and cross the y-axis at zero. We therefore expected the parameters $\beta_1 = 1$ and $\beta_0 = 0$ in the regression model $\hat{y} = \beta_1 x + \beta_0$. Based on this idea, we tested whether the slope is equal to one and the intercept is equal to zero. From table 4 of the regression analysis output, it can be read that the 95 % confidence interval for the slope $\beta_1$ is (0.89; 1.04), which contains 1. The 95 % confidence interval for the intercept $\beta_0$ is (-5.82; 17.36), which contains 0. Based on these test results, it is possible to accept the hypothesis that the measured and the predicted range values for the same catapult parameter levels are the same.
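The slope and intercept check can be sketched as follows. The data below are synthetic and serve only to show the mechanics; they are not the 40 control measurements. Taking the measured ranges as the regressor is an assumption, and the critical value 2.024 is the two-sided 95 % quantile of Student's t distribution with 38 degrees of freedom, hard-coded to keep the example free of external libraries.

```python
from math import sqrt


def slope_intercept_intervals(x, y, t_crit):
    """Confidence intervals for slope and intercept of a straight-line fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_error = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s = sqrt(ss_error / (n - 2))  # residual standard deviation
    se_slope = s / sqrt(s_xx)
    se_intercept = s * sqrt(1.0 / n + mean_x ** 2 / s_xx)
    slope_ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)
    intercept_ci = (intercept - t_crit * se_intercept,
                    intercept + t_crit * se_intercept)
    return slope_ci, intercept_ci


# Synthetic stand-ins for 40 paired ranges in cm (illustration only).
measured = [100.0 + 10.0 * i for i in range(40)]
pattern = [3.0, -2.0, 1.0, -4.0, 2.0, -1.0, 4.0, -3.0]
simulated = [m + pattern[i % 8] for i, m in enumerate(measured)]

T_CRIT_38 = 2.024  # 97.5 % quantile of Student's t with 40 - 2 = 38 d.f.
slope_ci, intercept_ci = slope_intercept_intervals(measured, simulated, T_CRIT_38)
one_to_one = (slope_ci[0] <= 1.0 <= slope_ci[1]
              and intercept_ci[0] <= 0.0 <= intercept_ci[1])
print(slope_ci, intercept_ci, one_to_one)
```

If 1 lies inside the slope interval and 0 inside the intercept interval, the data give no evidence against the 1:1 relationship at the chosen confidence level.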

Conclusion
In our paper, we describe a method of modeling a stochastic system, specifically the way we ensured the variability of the in-flight distances of the catapult in multi-run simulations. The basic idea behind the stochastic approximation of catapult ranges was to use noisy observations to construct a stochastic response surface regression function. Therefore, we calculated the range value of the catapult not via the physical equations of the trajectory of projectile motion; instead, we predicted this in-flight distance using a regression model found by the Design of Experiments method. The advantage of this approach over the theoretical calculation of in-flight distances using physics is that all conditions are captured, so there is no need to consider any simplifications and limitations.
From the performed measurements, we were able to express the necessary corresponding variability of the range values of the catapult by statistical analysis of the measured data.
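One simple way to picture such a stochastic simulation is to add a random term to the point prediction of the regression model. The sketch below assumes zero-mean Gaussian noise with a constant standard deviation, which is an assumption made here for illustration and not a statement of the exact procedure used. The coefficients and the residual standard deviation are hypothetical placeholders, not the fitted values.

```python
import random

# Hypothetical coefficients in coded units (placeholders, not fitted values).
COEFFICIENTS = {"const": 300.0, "A": 40.0, "B": 25.0, "C": 30.0,
                "D": -15.0, "BD": 5.0}
RESIDUAL_SD = 8.0  # hypothetical residual standard deviation in cm


def predicted_range(a, b, c, d):
    """Point estimate of the range from a regression function in coded units."""
    k = COEFFICIENTS
    return (k["const"] + k["A"] * a + k["B"] * b + k["C"] * c
            + k["D"] * d + k["BD"] * b * d)


def simulate_shot(a, b, c, d, rng):
    """One simulated shot: prediction plus zero-mean Gaussian noise."""
    return predicted_range(a, b, c, d) + rng.gauss(0.0, RESIDUAL_SD)


rng = random.Random(1)  # seeded only to make the illustration repeatable
shots = [simulate_shot(1, -1, 0, 1, rng) for _ in range(5)]
print([round(s, 1) for s in shots])
```

Repeated calls with the same settings then scatter around the predicted value, so the simulated data can be analyzed with the same statistical tools as real measurements.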
It should be noted that the disadvantage of our approach was the need to create a real catapult with variable parameter values and to perform measurements according to the DoE methodology. It is also disadvantageous that our model of the virtual catapult corresponds only to the real catapult on which the measurements were carried out, but this is a typical feature of all simulation

Figure 1. Implementation of an experiment with a real catapult.

Figure 2. Normal Probability Plot and Pareto Chart of the standardized effects of factors and their interactions.

Figure 5. The differences in the measured range values and the range values from the simulation.

Figure 6. Box-plots for measured range values according to control measurement and simulated range values.

Table 1. List of selected experimental control factors and their levels.

Table 2. Results of the experimental runs.