The linear regression model to evaluate material moisture using reflectometric technique

The paper presents a linear regression model for a moisture detection technique applied to a building material (clinker brick). In particular, the assumptions of the linear regression model, which play a very important role, are emphasized. The data describe the dependence between the moisture of clinker brick, evaluated gravimetrically, and the permittivity values determined by the TDR technique. Using these data, the optimal regression model is obtained and the advantages of applying the linear regression model are discussed.


Introduction
The problem of moisture in building materials remains relevant. It is directly related to the functioning of buildings and has a considerable influence on the environment. Therefore, developing new techniques for detecting moisture in partitions, as well as improving and adapting the existing ones, plays a very important role.
The resistance, capacitive and reflectometric techniques of moisture detection all belong to the group of indirect methods, which means that the parameter being examined is not moisture itself, but an intermediate quantity that depends on it. In the case of the resistance technique, the measured parameter is the electrical conductivity or electrical resistance of the porous medium placed between the measuring electrodes. In the case of the capacitive and reflectometric techniques, the measured parameter is ε, the dielectric permittivity of the medium [3].
The resistance and capacitive measurement methods have two advantages: low price and the possibility of performing quick, non-invasive moisture measurements. However, these two methods have a serious disadvantage that often makes measurement difficult, and sometimes even impossible: sensitivity to the salinity of the tested material, resulting from the low operating frequency of the device. This defect does not apply to the TDR (time domain reflectometry) measurement technique, i.e. an electrical technique operating on the principle of reflectometric measurement of the dielectric parameters of the medium, which is used to obtain the measurement results examined in this work. The operation of the TDR technique is described in detail in [4][5][6].
One of the most popular methods of analyzing statistical data is regression analysis. The main idea of regression is prediction, i.e. forecasting the values of one variable based on other variables. The simplest regression model is a linear model. In order to apply it correctly, it is necessary to check the relevant assumptions. This article examines whether a linear regression model can be applied to clinker bricks by checking these assumptions. The model describes the relation between the moisture θ and the apparent permittivity ε.
Samples of clinker brick (220 mm × 120 mm × 40 mm) were dried to dry mass in a laboratory oven. Then they were moistened in steps of 2 vol.% until the saturation state (θ = 16 vol.%) was reached. For each moisture level, the apparent permittivity readouts were made using the TDR equipment.
All tests were performed at a constant temperature of 20 ± 1 °C and a relative air humidity of 50 ± 5%. All TDR measurements were repeated 5 times for each material moisture level. From the TDR readouts, the dependence between apparent permittivity and volumetric water content was established and described with the simplest mathematical model, linear regression.
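The measurement plan described above can be sketched in a few lines to check the expected sample size. This is only an illustration: the variable names are hypothetical, and counting the dry state (0 vol.%) as one of the moisture levels is an assumption, although it leads to 45 readouts, which agrees with the number of observations reported for the regression.

```python
# Sketch of the measurement design. Counting the dry state (0 vol.%)
# as a moisture level is an assumption, not stated explicitly in the text.
STEP = 2          # moisture step, vol.%
SATURATION = 16   # moisture at saturation, vol.%
REPEATS = 5       # TDR readouts per moisture level

# Moisture levels from the dry state up to saturation, inclusive.
moisture_levels = list(range(0, SATURATION + STEP, STEP))

# One (level, repetition) pair per planned TDR readout.
design = [(level, rep) for level in moisture_levels
          for rep in range(1, REPEATS + 1)]

print(moisture_levels)
print(len(design))
```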
The general form of the linear regression model is as follows:
$$y = \beta_0 + \beta_1 x + e,$$
where $y$ denotes the dependent variable, $x$ the independent variable, $e$ a random error, and $\beta_0$, $\beta_1$ are the structural parameters of this model, cf. [8]. However, in order to perform a valid linear regression analysis, it is necessary to check the following assumptions [9]:
• Linear relationship: there exists a linear relationship between the independent variable $x$ and the dependent variable $y$. If this condition is not met, the obtained results can be interpreted incorrectly and the obtained predictions will probably be wrong. There are several tests to check the linearity of the model. The most popular of them are the Rainbow Test and the Harvey-Collier Test. In these tests, the null hypothesis is that there is a linear relationship between the dependent and independent variables.
• The error term has a conditional mean of zero: the expected value of the random component must be equal to 0. The errors are the differences between the observed and estimated values in regression analysis. This condition can be verified on the basis of the plot of the regression curve for the tested model. The regression curve should lie in the middle of the data points, so that the errors sum to zero.
• Homoscedasticity: the variance of the residuals is the same for all observations. Homoscedasticity indicates whether the model predicts the dependent variable equally well for different values of the independent variable. There are several statistical tests for homoscedasticity; the most commonly used are the Goldfeld-Quandt Test, the Harrison-McCabe Test and the Breusch-Pagan Test. The null hypothesis of these tests is that the variance of the residuals is constant for each value of the independent variable.
• There is no autocorrelation between errors: the observation errors are independent, i.e. the residuals in the prediction of the dependent variable are not correlated with each other. Well-fitted regression models assume that the resulting residuals are distributed randomly, without a consistent pattern. This assumption can be verified using the Durbin-Watson Test and the Breusch-Godfrey Test. The null hypothesis is that the residuals from the regression are not autocorrelated.
• The residuals are normally distributed: this property is related to the analysis of the significance of the regression coefficients. The residuals should have a distribution close to the normal distribution. The normality of the distribution of residuals can be assessed using the Shapiro-Wilk Test and the Jarque-Bera Test. The null hypothesis assumes that the research sample comes from a normally distributed population.
• The number of cases must be greater than or equal to the number of parameters derived from the regression analysis: this condition is necessary to calculate the regression coefficients. In practice, it is assumed that there should be at least 15 or 20 observations per variable in a regression model. Thus, for a simple regression analysis with one predictor, the desired minimum sample size is 30-40 cases.
If all these conditions are satisfied, the model is considered correct, and its parameters are treated as unbiased, efficient and consistent [10].
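As an illustration of how the two structural parameters of a simple linear model y = β0 + β1·x + e are estimated by least squares, the sketch below uses the standard closed-form formulas. The function name `fit_simple_ols` and the demonstration data are hypothetical; the data are invented for illustration and are not the measurements from this study.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Least-squares fit of y = beta0 + beta1 * x + e.

    Returns (beta0, beta1, residuals).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_mean, y_mean = fmean(x), fmean(y)
    # Sum of squares of x and cross-products about the means.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta1 = sxy / sxx                 # slope
    beta0 = y_mean - beta1 * x_mean   # intercept
    residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    return beta0, beta1, residuals


# Invented demonstration data (not from the study).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, res = fit_simple_ols(x_demo, y_demo)
print(round(b0, 3), round(b1, 3))
```

The residuals returned by such a function are the quantities on which the assumptions listed above (constant variance, no autocorrelation, normality) are checked.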

Results and discussion
The results of the experiment, presented in figure 1, show a linear relationship between the moisture and the apparent permittivity of clinker brick obtained using the TDR setup.

Figure 1. The linear regression curve for the data
Therefore, a linear regression model can be created for the results obtained and its relevant assumptions analyzed. The linear regression model for clinker brick is as follows:
$$\hat{\theta} = -0.24 + 0.08 \cdot \varepsilon.$$
Based on this model, it can be concluded that as the permittivity ε increases by 1 unit, the moisture θ of clinker brick increases by 0.08. The determination coefficient $R^2 = 0.95$ means that 95% of the variability of the moisture is explained by the permittivity. This observation enables calibration of the reflectometric meter. To evaluate the model quality, a set of conditions ought to be checked to verify whether the linear regression model is appropriate. These assumptions are as follows:
• there exists a linear relationship between θ and ε;
• the error term has a conditional mean of zero;
• the variance of residuals is the same for all observations;
• there is no autocorrelation between errors;
• the residuals are normally distributed;
• the number of cases is greater than or equal to the number of parameters derived from the regression analysis.
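A minimal sketch of how the fitted model could serve as a calibration function, and how a determination coefficient is computed, is given below. The coefficients −0.24 and 0.08 are those reported above; the function names and the sample readings are hypothetical and invented for illustration, and the moisture is assumed to be expressed in the same units as the data used to fit the model.

```python
BETA0 = -0.24  # intercept reported in the text
BETA1 = 0.08   # slope reported in the text


def moisture_from_permittivity(eps):
    """Predicted moisture for an apparent permittivity reading."""
    return BETA0 + BETA1 * eps


def r_squared(observed, predicted):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented readings, for illustration only (not the study data).
eps_demo = [3.0, 4.0, 5.0]
theta_demo = [0.01, 0.07, 0.17]
pred_demo = [moisture_from_permittivity(e) for e in eps_demo]
print(pred_demo)
print(r_squared(theta_demo, pred_demo))
```

Note that the determination coefficient describes the share of the variability of the observed moisture reproduced by the model; it does not by itself verify any of the regression assumptions.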
All tests enabling verification of these conditions are presented in table 1. Applying the Rainbow Test and the Harvey-Collier Test, it can be seen that there is no reason to reject the null hypothesis of linearity of the model, cf. table 1. Moreover, the scatterplot of the data also shows the linear relationship between the moisture and the permittivity, see figure 1.
When a linear regression model is constructed with a y-intercept (in this model $\beta_0 = -0.24$), the method of least squares ensures that the arithmetic mean of the residuals is zero. Hence, the assumption that the error term has a conditional mean of zero is automatically fulfilled. Additionally, the regression curve lies in the middle of the data points, which also confirms this property, see figure 1.
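The zero mean of the residuals follows from the first normal equation of least squares. A short derivation is sketched below; the notation is assumed here for illustration (the index $i$ over the $n$ observations, the hats on the estimates and the symbol $S$ for the sum of squared deviations do not come from the text).

$$
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left(\theta_i - \beta_0 - \beta_1 \varepsilon_i\right)^2, \\
\left.\frac{\partial S}{\partial \beta_0}\right|_{(\hat{\beta}_0, \hat{\beta}_1)} &= -2 \sum_{i=1}^{n} \left(\theta_i - \hat{\beta}_0 - \hat{\beta}_1 \varepsilon_i\right) = 0, \\
\sum_{i=1}^{n} e_i &= 0, \qquad e_i = \theta_i - \hat{\beta}_0 - \hat{\beta}_1 \varepsilon_i .
\end{aligned}
$$

Dividing the last line by $n$ gives a residual mean of zero. The argument depends on the intercept being part of the model: for a regression forced through the origin, the equation for $\beta_0$ is absent and the residuals need not sum to zero.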
The assumption of homoscedasticity also holds. Using the Goldfeld-Quandt Test, the Harrison-McCabe Test and the Breusch-Pagan Test, it can be stated that there is no reason to reject the null hypothesis of equal variances, see table 1. Moreover, the Scale-Location plot and the Residuals vs Fitted plot confirm that the variance of residuals is the same for all observations, see figure 2.
Autocorrelation between errors is examined with the Durbin-Watson Test and the Breusch-Godfrey Test. Both indicate that there is no reason to reject the null hypothesis that the residuals from the regression are not autocorrelated, which means that this condition also holds, see table 1. Some serial correlation of errors is visible in the Residuals vs Fitted plot, but it is not detected by the tests, see figure 2.
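To make the autocorrelation check more concrete, the sketch below computes the Durbin-Watson statistic from its textbook definition. It is not the routine used to produce table 1; the function name and both residual series are invented for illustration, and the residuals are assumed to be supplied in a meaningful order (for example, the order of measurement).

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic.

    Ratio of the sum of squared successive differences of the residuals
    to the sum of squared residuals. It lies between 0 and 4; values
    near 2 indicate no first-order autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    return num / den


# Invented residual series, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips at every step
drifting = [1.0, 0.9, 0.8, -0.8, -0.9, -1.0]     # slowly varying pattern

print(durbin_watson(alternating))  # above 2: negative autocorrelation
print(durbin_watson(drifting))     # below 2: positive autocorrelation
```

The statistic alone is not a test decision; the decision additionally requires critical values or a p-value, which statistical packages report alongside it.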
The residuals are normally distributed, which is confirmed by the Shapiro-Wilk Test and the Jarque-Bera Test. According to both tests, there is no reason to reject the null hypothesis of normality of the distribution of residuals, see table 1. Furthermore, the Normal Q-Q plot shows that the condition of normal distribution of residuals is satisfied, see figure 3.
The last condition, i.e. that the number of cases must be greater than or equal to the number of parameters derived from the regression analysis, also holds, since there are 45 observations and two parameters derived from the regression model.
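As an illustration of the normality check, the sketch below computes the Jarque-Bera statistic from the sample skewness and kurtosis of the residuals, together with its asymptotic p-value. It is a textbook-formula sketch, not the routine used to produce table 1; the function names and the sample residuals are hypothetical. The p-value uses the chi-square distribution with 2 degrees of freedom, which is only an approximation for small samples.

```python
import math


def jarque_bera_statistic(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).

    S is the sample skewness and K the sample kurtosis, both based on
    central moments divided by n.
    """
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def jarque_bera_pvalue(statistic):
    """Asymptotic p-value: survival function of chi-square with 2 d.f."""
    return math.exp(-statistic / 2.0)


# Invented residuals, for illustration only.
res_demo = [-1.0, 0.0, 1.0]
jb = jarque_bera_statistic(res_demo)
print(jb, jarque_bera_pvalue(jb))
```

A large p-value means that there is no reason to reject the null hypothesis of normality, which is the pattern of reasoning applied to the tests above.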
Therefore, all assumptions of linear regression are satisfied, so the model is correct, and its parameters can be treated as unbiased, efficient and consistent. There are many models representing the relationship between moisture and permittivity. Most of them are polynomial models of the second order and higher. However, for some sensors and some materials, a linear model can be adopted, as it meets all the regression assumptions. Analyses using linear models similar to the one presented in this article can also be found in the literature [11,12].

Conclusions
Regression analysis is one of the statistical methods used to estimate the dependence between a dependent and an independent variable. This paper examines the relationship between the permittivity (independent variable) and the moisture content (dependent variable) of clinker bricks. The correctness of the regression analysis results depends on the extent to which its assumptions are satisfied. In this paper, a regression model for clinker bricks is presented and the regression assumptions are introduced. All assumptions of linear regression are satisfied, which means that the obtained model fits the data well, so the moisture is well explained by the permittivity.

Figure 2. The Residuals vs Fitted plot and the Scale-Location plot

Table 1. The tests checking the assumptions of linear regression for the clinker brick, performed using RStudio