Bayesian estimation of linear and nonlinear mixed models of fertilizer dosing with independent normally distributed random components

Many linear and nonlinear mixed response models are proposed to predict the optimum dose of fertilizer. However, a major restriction of this class of models is the normality assumption of the random parameter component. The purpose of this paper is to analyze the performance of linear and nonlinear mixed models of fertilizer dosing with independent normally distributed random parameter components. We compare the Linear Plateau, Spillman-Mitscherlich, and Quadratic random parameter models with different random effects distribution assumption, i.e. the normal, Student-t, slash, and contaminated normal distributions and the random errors following their symmetric normal independent distributions. The method is applied to datasets of multi-location trials of potassium fertilization of soybeans. The results show that the Student-t Spillman-Mitscherlich Response Model is the best model for soybean yield prediction.


Introduction
Many linear and nonlinear response models are commonly used to predict the optimum dose of fertilizer. One modeling approach is to fit a general quadratic form to the data by means of least squares under the assumption of a fixed effects model with independent normally distributed random error term with constant variances ([1]- [2]). However, this approach is unrealistic because it neglects the variability that probably exists between sites and/or years.
An alternative model is the mixed effects approach ([3]- [6]). This approach allows the parameters to have a random effect component that represent between sites or years variability. The random parameter models have been found to outperform the fixed parameter models to model dose-response relationships ( [5], [7]- [8]). Furthermore, the quadratic functional form commonly used is not always the best model. [7] and [9] showed that the stochastic linear plateau model and the Mitscherlich exponential type functions outperform the quadratic form. In a similar vein, [8] showed that the stochastic linear plateau function is more adequate than the stochastic quadratic plateau function for corn response to Nitrogen fertilizer.
The random parameter components and the errors are usually taken as normally distributed random variables ( [5]- [8]). However, the normality and symmetry assumptions may be too restrictive because in practice departures from normality is common. Particularly, [10] and [11] concluded that the field crop yield distributions are in general non-normal or non-lognormal. The degree of skewness and kurtosis vary by crop and by the amount of nutrients uptake. Rosa et al. [12] advocated the use of the Normal independent distribution for robust modeling of linear mixed models. Furthermore, [13] considered the Normal independent distribution for modeling of nonlinear mixed models. The Normal independent distribution is a class of symmetric, heavy-tailed distributions that includes the normal distribution, Student-t, slash, and contaminated distributions. The class of Normal independent distributions accomodate observations with heavy tails as well as the normal distribution.
Traditionally, fertilizer-dose response models are estimated by means of maximum likelihood estimation (ML) ([5]- [8]). However, for nonlinear models and small sample sizes ML is frequently biased ([14]). In addition, convergence can be a problem even with careful scaling and good starting values. Bayesian estimation is an alternative to ML. The advantages of Bayesian estimation are that the results are valid in small samples and that convergence in the case of nonlinear models is not an issue ( [14]- [15]).
The purpose of this paper is Bayesian estimation of random parameter dose (fertilization)-response (yield) models for yield data that is Normally independently distributed.

The normal mixed effects model
In general, a Normal mixed effects model reads: where the subscript is the subject index, ( ) is a vector of observed continuous responses for subject ( ) * ( ) ( )+ with ( ) the nonlinear or linear function of random parameters and covariate vector and are known design matrices of dimensions and , respectively, is the vector of fixed effects, is the vector of random effects, and is the vector of random errors, and denotes the identity matrix. The matrices ( ) with unknown parameter α is the unstructured dispersion matrix of the unknown variance of the error term. When ( ) is a nonlinear parameter function, we have the Normal NonLinear Mixed Model (N-NLMM); if ( ) is a linear parameter function, we have the N-Linear Mixed Model (N-LMM).

Normal independent (NI) distributions
A Normal independent distribution is defined as the p-dimensional random vector ⁄ where is a location vector, is a multivariate normal random vector with location vector , scale matrix and is a positive random variable with cumulative distribution function (cdf) ( ) and probability density function (pdf) ( ), is a scalar or vector of parameters indexing the distribution of the scale factor ([12]- [13], [16] The class of normal independent distribution is a group of symmetric heavy-tailed distribution of robust alternative to the routinely used of normal distribution for mixed effects model.

The NI-mixed effects model
Using the general framework (1), the following general NI-Mixed Model (NI-MM) is defined as: where the random effects are assumed to have a multivariate NI distributions and also the random errors.

Prior distributions and joint posterior density
Below, we apply a Bayesian framework based on the Markov Chain Monte Carlo (MCMC) algorithm to infer posterior parameter estimates. Following (1) and (2), the NI mixed model can be formulated in hierarchical for as: ) . Then, the complete likelihood function associated with ( ) , is given by To complete Bayesian specification, we need to consider prior distributions for all the unknown parameters ( ) . We consider [16]). For we take ( ⁄ ) ( ) for the Student-t (t) model, Gamma ( ) for the slash (SL) model. Furthermore, ( ) for and Beta ( ) for for the contaminated normal (CN) model.
Assuming independency of the parameter vector, the joint prior distribution of all unknown parameters is Combining the likelihood function and the prior distribution, the joint posterior density of all unknown parameters is

Model comparison criteria
The expected Akaike information criterion (EAIC) and the expected Bayesian information criterion (EBIC) are a deviance-based measure appropriate for Bayesian model selection ( [17]- [18]). Let and ( ) be the entire model parameters and data, respectively. Define where ̅ is the posterior mean of the deviance, is the number of parameters in the model, is the total number of observations. These criteria penalizing models with more complexity. Smaller value of EAIC and EBIC indicate a better fit ( [19]).

Data
The dataset is obtained from 19 multi-location trials of potassium fertilization of soybeans. The experiments were carried out between 2002 and 2014. The soil types are Ultisols, Inceptisols, Vertisols, and Oxisols with soil potassium contents varying from very low to very high. Common soybean varieties were used. Each experiments consisted of five levels of potassium fertilization. The doses applied were 0, 40, 80, 160 and 320 kg ha -1 of KCl. The plots were 6 by 5 m, or 4 by 5 m arranged in a randomized complete block design with three to nine replications. The response variable was soybean yield (t ha -1 ). The yields reported are averages over replications ( [20]- [22]).

Response functions
We consider three response functions: the Linear Plateau (LP), the Spillman-Mitscherlich (SM) and the Quadratic functions (Q).
The stochastic LP is defined as follows: where for location is the soybean yield; the potassium fertilizer dose; the intercept parameter; the linear response coefficient; the plateau yield; , and are the random effects; and is the random error term. In term of (1), ( ) ( ) ; ( ) and ( ). The stochastic SM reads: where is the maximum yield attainable by potassium fertilization; is the yield increase; is the ratio of consecutive increments of the yield; all other parameters, variables and distributions as in (3) The stochastic Q is defined as: where is the intercept parameter whose position (value) can be shifted up or down by the random effect ; is the linear response coefficient with random effect parameter ; is the quadratic response coefficient whose position can be shifted up or down by the random effect ; ( ) all other variables and distributions as in (3) ([7]- [9]).

Statistical analysis
The datasets was used to identify the model with the best fit among the random parameter models of fertilizer dosing. Several statistical models with differing distribution in the random effects and random errors are compared. These models are : Model 1: Normal distribution for the random effects and for the random errors (N-N) Model 2: Student-t distribution for the random effects and for the random errors (t-t) Model 3: Slash distribution for the random effects and for the random errors (SL-SL) Model 4: Contaminated normal distribution for the random effects and for the random errors (CN-CN).
The following independent priors were considered to perform the Gibbs sampler, ) for the slash model, ( ) and ( ) for the contaminated normal model, respectively.
For each of the models, we ran three parallel independent chains of the Gibbs sampler with size 50 000 iterations for each parameter with thinning of 5 and an initial burn in of 25 000. We monitored chain convergence using trace plots, autocorrelation plots and the Brooks-Gelman-Rubin scale reduction factor ( ̂) ( [23]). To avoid non-convergence, we normalized the original doses (subtracted the mean and divided by the standard deviation) which gave: -1.06, -0.70, -0.35, 0.35, and 1.76, respectively ( [24]). We fitted the models using the R2jags package available in R ( [25]). Figure 1 shows the histogram and normal Q-Q plot of soybean yield data for 19 locations, while the boxplot is presented in figure 2. The figures indicates non-normality (heavy-tailed) pattern. The Q-Q plot does not show a straight line, while the boxplot shows asymmetry and an outlier. Thus, it seems appropriate to fit a heavy-tailed model to the data.

Linear plateau response models
Based on the EAIC and the EBIC in table 1, we find that among the NI models the Student-t (t-t) Model gives the best fit, followed by the contaminated normal (CN-CN), slash (SL-SL), and normal (N-N) Model. We furthermore find that the heavy-tailed distributions outperform the normal distributions. Thus, the t-t Model is the best Linear Plateau Response Model.  Table 1 furthermore shows that for the t-t Model, all the fixed effects, i.e., the intercept parameter ( ), the linear response coefficient ( ), the plateau yield and the random effects ( ) are significant.

Spillman-Mitscherlich response models
Based on the EAIC and EBIC in table 2 we find the following rankings of the NI models: t-t < N-N < SL-SL < CN-CN. Therefore, the t-t Model is the best Spillman-Mitscherlich Response Model.  For the t-t Model, the fixed effects, i.e., the maximum yield coefficient ( ), the increase in yield ( ), the ratio of successive increment ( ) and the random effects ( ) are significant.

The Quadratic response models
Comparison of the EAIC and EBIC in table 3 leads to the following rankings: t-t < SL-SL < CN-CN < N-N. The results furthermore show that the heavy-tailed distributions outperform the normal distribution, and that overall the t-t Model is the best Quadratic Response Model. For the t-t Model, all the fixed effects, i.e., the intercept parameter ( ), the linear response coefficient ( ), the quadratic response coefficient ( ), and the variance component ( ) are significant.

Comparing the Linear plateau, Spillman-Mitscherlich and Quadratic models
Comparing the Linear Plateau (LP), Spillman-Mitscherlich (SM) and Quadratic (Q) models under four distributional assumptions, we find that the t-t Spillman-Mitscherlich model has the smallest EAIC and EBIC values among the competing models indicating that this is the best fit model for the soybean yield data (

Conclusion
We investigated the performance of linear and nonlinear mixed response models with normal independent (NI) distributions of random effects. We applied the Bayesian estimation framework to datasets of multi-location trials of potassium fertilization of soybeans. We compared the Linear Plateau, Spillman-Mitscherlich, and Quadratic random parameter models with different distributions of the random parameter component, i.e. the normal, Student-t, slash, and contaminated normal distributions and the random errors following their symmetric normal independent distributions. The overall results showed that for all three models of fertilizer dosing, the Student-t distributions outperform the normal distributions. The best model for soybean yield prediction is the Student-t Spillman-Mitscherlich Response Model.