Comparison of regression models of linear and polynomial dependencies in moisture detection technique

The article describes a technique for detecting moisture in building materials and compares two methods of data analysis. More precisely, two linear regression models, with linear and polynomial dependencies, are presented. The readouts show the dependence between the moisture of ceramic brick, evaluated gravimetrically, and the permittivity values determined by the TDR technique. The correlation between the moisture content estimated by TDR and the moisture content determined gravimetrically is shown. Using the obtained data, the regression models are compared and the quality of both models is determined.


Introduction
Problems with air and wall moisture make it necessary to measure moisture in building partitions. This makes it possible to quantify the scale of the phenomenon, determine its sources and, most importantly, determine ways to counteract its negative effects and ensure the required indoor air quality. For this reason, the development of measurement techniques that enable fast and, where possible, noninvasive moisture detection is an important issue from the point of view of both environmental engineering and construction. Many techniques for measuring moisture exist [1], [2]. One of the most important is the reflectometry technique (TDR - Time Domain Reflectometry). It is an indirect method, which means that the measured parameter is not moisture itself but a quantity that depends on moisture. With the TDR apparatus, appropriate software and calibration equations, it is possible to conduct moisture tests in materials and building partitions, e.g. [3].
The essence of the measurement using the TDR technique is to determine the dielectric permittivity of the medium based on the time of flight of an electromagnetic pulse along the rods of the measuring probes. The dielectric permittivity $\varepsilon$ [-] is a measure of the behavior of matter particles when an external alternating electric field is applied [4]. When an electric field is applied to the medium, the water molecules rotate in the direction of the applied field.
The relationship between the dielectric parameters exhibited by wet porous media and the moisture of the medium is most often described by physical and empirical models [5], [6]. The advantage of physical models is a certain independence from calibration tests. Their disadvantage is a general, often complicated mathematical description. According to the original studies describing the dielectric parameters of porous media treated as a ternary mixture [7], the resultant (effective) dielectric permittivity of a porous medium can be bounded by the following formulas:

$$\varepsilon_{\mathrm{eff}} = \left( \sum_{i=1}^{3} \frac{\varphi_i}{\varepsilon_i} \right)^{-1} \tag{1}$$

$$\varepsilon_{\mathrm{eff}} = \sum_{i=1}^{3} \varphi_i \varepsilon_i \tag{2}$$

where: $\varepsilon_i$ - dielectric permittivities of individual phases (solid, liquid and gas), $\varphi_i$ - volume fractions of individual phases. Eq. (1) describes the so-called lower limit, and Eq. (2) the upper limit, of the effective dielectric permittivity of the mixture. Real mixtures show an effective permittivity within the range designated by these equations.

Another approach is offered by empirical models, created on the basis of laboratory measurements by correlating the results of moisture measurements using the gravimetric method with dielectric permittivity readings. The most frequently cited empirical models used in the practical assessment of medium moisture include the Topp model [4], which takes the form of a third degree polynomial:

$$\theta = -5.3 \cdot 10^{-2} + 2.92 \cdot 10^{-2}\,\varepsilon - 5.5 \cdot 10^{-4}\,\varepsilon^2 + 4.3 \cdot 10^{-6}\,\varepsilon^3 \tag{3}$$

where: $\theta$ - volumetric water content in the tested porous medium [cm³/cm³], $\varepsilon$ - dielectric permittivity of the medium measured with the TDR technique [-]. The volumetric moisture in Eq. (3) depends solely on the dielectric permittivity. However, this method does not always give satisfactory results. An alternative model, which increased the accuracy of the measurement, was proposed in [8]. This model takes into account the density of the material in a dry state, which allows a more accurate mapping of the moisture-dielectric permittivity relationship in materials characterized by various properties of the solid phase [9]. The model is expressed in the form of a semi-empirical formula:

$$\theta = \frac{\sqrt{\varepsilon} - 0.819 - 0.168\,\rho - 0.159\,\rho^2}{7.17 + 1.18\,\rho} \tag{4}$$

where: $\rho$ - dry material density [g/cm³]. The models presented above are characterized by great versatility, which is why they are commonly used in reflectometric research. They enable measurements of the moisture of porous media with acceptable accuracy.

The aim of the article is to compare two models that describe the relationship between permittivity and material moisture in different ways, with an indication of the quality of their fit ($R^2$) to the measurement data, the measurement uncertainties expressed as the Residual Standard Error (RSE) and the Root Mean Square Error (RMSE), and the significance of the coefficients.
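The two calibration models discussed above can be sketched in a few lines of Python. This is only an illustration: the coefficients are the commonly published values for the Topp polynomial [4] and for the density-dependent model of [8] and should be checked against those sources, and the function names and the sample readout are hypothetical.

```python
import math


def topp_moisture(eps):
    """Volumetric water content [cm3/cm3] from permittivity eps [-].

    Third degree polynomial with the commonly cited Topp coefficients
    (assumed here; verify against reference [4]).
    """
    return -5.3e-2 + 2.92e-2 * eps - 5.5e-4 * eps**2 + 4.3e-6 * eps**3


def density_corrected_moisture(eps, rho):
    """Volumetric water content [cm3/cm3] from permittivity eps [-]
    and dry material density rho [g/cm3].

    Semi-empirical form with commonly cited coefficients
    (assumed here; verify against reference [8]).
    """
    numerator = math.sqrt(eps) - 0.819 - 0.168 * rho - 0.159 * rho**2
    return numerator / (7.17 + 1.18 * rho)


# Hypothetical readout: eps = 10 for a material with rho = 1.5 g/cm3.
print(topp_moisture(10.0))
print(density_corrected_moisture(10.0, 1.5))
```

For the same permittivity the two functions return different moisture values, because only the second one accounts for the properties of the solid phase through the dry density.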

Materials and methods
The following materials and equipment were used in the research: red brick samples (apparent density 1500 kg/m³); a TDR multimeter manufactured by ETest, Lublin, Poland; a TDR surface sensor of the authors' own construction [10]; a VO-500 laboratory oven manufactured by Memmert, Germany; a WPT 6C1 laboratory scale manufactured by Radwag, Poland; and a PC serving as the control station.
Samples of red brick with dimensions of 220 mm × 120 mm × 40 mm were dried to constant mass. Then they were saturated to reach a moisture content equal to 36 vol.%. For each moisture level, including zero, the apparent permittivity tests were conducted using the TDR equipment.
The test conditions were as follows: a constant temperature of 20 ± 1 °C and a relative air humidity of 50 ± 5%. The sample was measured by means of the TDR setup. For the purposes of statistical analysis, the measurements were repeated 5 times. During the investigation, a set of TDR waveforms was acquired to determine the time intervals of signal propagation, which were in turn recalculated into apparent permittivity using the following formula [11]:

$$\varepsilon = \left( \frac{c \cdot t_p}{2L} \right)^2 \tag{5}$$

where $c$ - light velocity in vacuum [m/s], $t_p$ - time of signal propagation along the sensor [s], $L$ - distance between the TDR sensor markers [m].
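The recalculation of propagation time into apparent permittivity can be sketched as follows. The sketch assumes the usual two-way travel form of the TDR formula, ε = (c·tp / 2L)²; the function name and the example values of tp and L are hypothetical and are not taken from the measurements.

```python
C_VACUUM = 299_792_458.0  # light velocity in vacuum [m/s]


def apparent_permittivity(tp, length):
    """Apparent permittivity [-] from the propagation time tp [s]
    along a sensor with marker distance length [m].

    Assumes the signal travels the distance twice (there and back),
    hence the factor 2 in the denominator.
    """
    if tp <= 0 or length <= 0:
        raise ValueError("tp and length must be positive")
    return (C_VACUUM * tp / (2.0 * length)) ** 2


# Hypothetical example: 2 ns propagation time, 0.1 m between markers.
print(apparent_permittivity(2e-9, 0.1))
```

A useful sanity check is that a propagation time of exactly 2L/c, i.e. a pulse travelling in vacuum, gives a permittivity of 1.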
From the measurements, the dependencies between the material moisture evaluated gravimetrically and the apparent permittivity values were established. They can be written in the following general form:

$$Y = f(X) + \epsilon \tag{6}$$

which for the linear regression is more commonly written as:

$$Y = \beta_0 + \beta_1 X + \epsilon \tag{7}$$

and for the linear regression using the second degree polynomial as:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon \tag{8}$$

where $Y$ denotes the dependent variable, $X$ the independent variable, $x$ its realization and $\epsilon$ the random error. Symbols $\beta_0$, $\beta_1$ and $\beta_2$ denote the structural parameters of the model and are determined, e.g., by the method of least squares. This method consists in minimizing the sum of the squares of the vertical distances of all points from the line being determined. The dependency model for an $n$-element sample corresponding to Eq. (7) is of the form:

$$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, \ldots, n \tag{9}$$

where $y_i$ - the value of the explained (dependent) variable for the i-th observation, $x_i$ - the value of the explanatory (independent) variable for the i-th observation, $e_i$ - independent random disturbances with the distribution $N(0, \sigma^2)$, and symbols $\beta_0$ and $\beta_1$ denote the structural parameters of the model. An analogous dependency holds for Eq. (8).
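The least squares estimation of the structural parameters for both the first and the second degree polynomial can be sketched in plain Python by solving the normal equations. This is an illustrative sketch, not the authors' procedure (the paper reports using RStudio); the function name and the data points are hypothetical.

```python
def fit_polynomial(x, y, degree):
    """Least squares coefficients [b0, b1, ..., b_degree] of
    y = b0 + b1*x + ... + b_degree*x**degree.

    Builds the normal equations (X^T X) b = X^T y and solves them by
    Gaussian elimination with partial pivoting.
    """
    m = degree + 1
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(m)]

    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coef = [0.0] * m
    for row in range(m - 1, -1, -1):
        tail = sum(a[row][k] * coef[k] for k in range(row + 1, m))
        coef[row] = (b[row] - tail) / a[row][row]
    return coef


# Hypothetical (permittivity, moisture) pairs, for illustration only.
eps = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
theta = [0.02, 0.13, 0.22, 0.29, 0.34, 0.37]

print("model 1 (degree 1):", fit_polynomial(eps, theta, 1))
print("model 2 (degree 2):", fit_polynomial(eps, theta, 2))
```

Solving the normal equations directly is adequate for low polynomial degrees and a narrow range of permittivity values; for higher degrees a QR-based solver, as used by standard statistical packages, would be numerically safer.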
In the linear regression model given by Eq. (7), the coefficient $\beta_1$ is known as the slope coefficient. The value of the slope determines the effect of a unit change in the independent variable on the dependent variable. If $\beta_1$ is positive, an increase in $x$ by one unit means that, on average, an increase in $y$ by $\beta_1$ units can be expected. If $\beta_1$ is negative, then as $x$ increases by one unit, $y$ on average decreases by $|\beta_1|$ units. The coefficient $\beta_0$, known as the y-intercept, indicates the value the dependent variable can assume if the predictor has the value 0. Often this has no physical sense, because zero may lie outside the data range, and in that case the intercept cannot be interpreted in a meaningful way.
The most important measures of fit of the regression model are the determination coefficient $R^2$, the F test, and the RSE and RMSE, which express the measurement uncertainties. The determination coefficient is given by the formula:

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{10}$$

where $y_i$ - the i-th observation of the variable $y$, $\hat{y}_i$ - the theoretical value of the dependent variable (based on the model), $\bar{y}$ - the arithmetic mean of the empirical values of the dependent variable and $n$ - the number of observations. This coefficient determines what part (percentage) of the variance of the dependent variable is explained by the independent variables (i.e. the model). The closer it is to 100%, the better the regression model describes the behaviour of the examined dependent variable.

The basic tool for estimating the significance of all variables in the model is the analysis of variance test (F test). This test verifies the hypotheses:

$$H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0, \qquad H_1: \exists_{i \in \{1, 2, \ldots, k\}} \; \beta_i \neq 0.$$

Assuming that the hypothesis $H_0$ is true, the statistic $F$ follows the Fisher-Snedecor distribution with $k$ and $n - (k + 1)$ degrees of freedom. The null hypothesis $H_0$ can also be understood as the statement that the impact of all independent variables on the variable $Y$ is negligible, as opposed to the alternative hypothesis $H_1$ that the impact of at least one independent variable on the variable $Y$ is significant. Rejection of the null hypothesis means that at least one regression coefficient is significantly different from zero, so the linear relationship between the dependent variable and at least one independent variable is statistically significant.
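The determination coefficient and the F statistic described above can be computed as in the sketch below. It assumes the standard expression of the overall F statistic through R², F = (R²/k) / ((1 − R²)/(n − k − 1)), which is not written out in the text; the function names and the data are hypothetical, and the p-value (which requires the Fisher-Snedecor distribution function) is deliberately left out.

```python
def simple_linear_fit(x, y):
    """Least squares intercept b0 and slope b1 of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def r_squared(y, y_hat):
    """Explained sum of squares divided by total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_reg = sum((fi - mean_y) ** 2 for fi in y_hat)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return ss_reg / ss_tot


def f_statistic(r2, n, k):
    """Overall F statistic with k and n - (k + 1) degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
b0, b1 = simple_linear_fit(x, y)
y_hat = [b0 + b1 * xi for xi in x]
r2 = r_squared(y, y_hat)
print(r2, f_statistic(r2, n=len(y), k=1))
```

The ratio form of R² used here equals 1 − SSE/SST only for least squares fits that include an intercept, which is the case for both models considered in the paper.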
The Residual Standard Error (RSE) is a measure of how well a regression model fits a dataset. RSE is given by the formula:

$$RSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{df}} \tag{11}$$

where $y_i$ - the i-th observation of the variable $y$, $\hat{y}_i$ - the theoretical value of the dependent variable (based on the model) and $df$ - the degrees of freedom, i.e. the sample size minus the number of estimated parameters. The smaller the RSE, the better the regression model fits the dataset; conversely, the higher the RSE, the worse the fit. A regression model with a small RSE has data points located closely around the fitted regression line, because its residuals (the differences between the observed and the predicted values) are small. A large RSE means that the data points are more loosely scattered around the fitted regression line, so the residuals are larger.
The Root Mean Square Error (RMSE) is the standard deviation of the residuals (the errors of the model in predicting quantitative data). RMSE shows how concentrated the data are around the line of best fit. RMSE is given by the formula:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \tag{12}$$

where $y_i$ - the i-th observation of the variable $y$, $\hat{y}_i$ - the theoretical value of the dependent variable (based on the model) and $n$ - the number of observations. The value of RMSE is always nonnegative, and the lower the RMSE, the better the model. It is worth mentioning that RMSE is the square root of the Mean Square Error (MSE) [12], [13].

All statistical analyses included in this paper were carried out using RStudio [14]. The analysis concerns the readouts of the dielectric permittivity of the ceramic brick obtained with the TDR sensor. Regression models including the first and the second degree polynomial, respectively, were analyzed. The relationship between dielectric permittivity ($\varepsilon$) and moisture ($\theta$) in model 1 and model 2 is presented in figure 1. The blue lines represent the regression curves and the shaded areas the 95% confidence intervals.
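The two uncertainty measures, RSE and RMSE, differ only in the divisor under the square root, which the following sketch makes explicit. The function names and the sample values are hypothetical and serve only as an illustration.

```python
import math


def residual_sum_of_squares(y, y_hat):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


def rmse(y, y_hat):
    """Root Mean Square Error: divides by the number of observations n."""
    return math.sqrt(residual_sum_of_squares(y, y_hat) / len(y))


def rse(y, y_hat, n_params):
    """Residual Standard Error: divides by df = n - n_params, where
    n_params counts all estimated coefficients including the intercept
    (2 for the first degree model, 3 for the second degree model).
    """
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("not enough observations for this model")
    return math.sqrt(residual_sum_of_squares(y, y_hat) / df)


# Hypothetical observed and fitted values, for illustration only.
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [1.3, 2.1, 2.9, 3.7]
print(rmse(y, y_hat), rse(y, y_hat, n_params=2))
```

Because df is smaller than n, the RSE of a given model is always slightly larger than its RMSE, and the gap narrows as the number of observations grows.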

Results and discussion
The first regression formula representing the dependence between moisture ($\theta$) and dielectric permittivity ($\varepsilon$) was of the following form:

$$\theta = -0.14 + 0.06\,\varepsilon \tag{13}$$

It means that with an increase in permittivity by 1 unit, the moisture increases by 0.06. The determination coefficient $R^2 = 0.92$ means that 92% of the sample variation of the dependent variable is explained by the variability of the model (in this case, the variability of the $\varepsilon$ feature).
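As a short illustration of how the first regression formula is read, assume two hypothetical permittivity readouts, $\varepsilon = 5$ and $\varepsilon = 6$ (values chosen only for this example, not taken from the measurements):

$$\theta(5) = -0.14 + 0.06 \cdot 5 = 0.16, \qquad \theta(6) = -0.14 + 0.06 \cdot 6 = 0.22, \qquad \theta(6) - \theta(5) = 0.06.$$

The difference equals the slope coefficient, in line with its interpretation as the average change in moisture per unit change in permittivity.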
The second regression model, including the second degree polynomial, had the form:

$$\theta = \beta_0 + \beta_1 \varepsilon + \beta_2 \varepsilon^2 \tag{14}$$

The determination coefficient $R^2 = 0.97$ means that 97% of the sample variation of the dependent variable is explained by the variability of the model (the variability of the $\varepsilon$ feature). The summary of both models is presented in table 1 and table 2.

Conclusions
Based on the investigation presented in this paper, the following conclusions may be formulated:
• The determination coefficients suggest that the second degree polynomial fits the data better.
• Both models had statistically significant coefficients.
• The comparison of the parameters that express uncertainty, namely the RSE and RMSE, also confirms that the second degree polynomial provides the better model.
• The better fit of the model with the second degree polynomial is also visible in the scatter plots.
• Although model 2 is better, model 1 is also suitable in this case.
• Both regression formulas can be treated as calibration formulas for practical applications of the TDR sensing technique.

Figure 1. Scatter plots showing the relationship between permittivity and moisture content: (a) in model 1, (b) in model 2.

Table 1. Regression formulas representing dependences between moisture and dielectric permittivity.

According to table 1, the determination coefficient $R^2$ for the linear regression model with the second degree polynomial is greater than for the model with the first degree polynomial (a difference of 5 percentage points), so model 2 fits the data better. It describes the behavior of the examined dependent variable better than the first one. Both models had at least one statistically significant coefficient, because in the F test the p-value was below $2.2 \cdot 10^{-16}$ in both cases.

Table 2. P-values of particular coefficients.

In particular, all coefficients in both models are statistically significant, which is confirmed by the results gathered in table 2. Again according to table 1, in the first regression model the RSE was equal to 3.12 vol.% and the RMSE to 3.09 vol.%. In the second regression model the RSE was equal to 2.09 vol.% and the RMSE to 2.05 vol.%. The second model had smaller values of RSE and RMSE, which suggests that the regression model with the second degree polynomial is the better fitted one. The scatter plots presented in figure 1 also confirm that the second model fits the data better than the first one.

Table 3. Regression formulas representing dependences between moisture and dielectric permittivity - residuals.

Table 3 presents the results for the residuals. The analysis of the minimum, the maximum and the quartile values also confirms the advantage of the second model over the first one, because the differences in deviations are smaller.