The sandwich estimator approach accounting for inter-site dependence of extreme river flow in Sabah

Regional estimation is often used to estimate the parameters of a distribution when data are available at many sites in a region, so that inter-site dependence can be captured. In this paper, we fit the generalized extreme value (GEV) distribution independently to the river flow data at each site in Sabah to avoid complex extreme value modelling. Since this approach violates the dependence condition of spatial analysis, we use an adjusted standard error to correct for the wrong assumption of our marginal approach. As a result, we obtain appropriately corrected variances of the GEV parameters.


Introduction
Spatial extreme analysis is usually considered when extreme data are obtained from several sites in a region [1]. Well-known examples of extreme value analysis for modelling dependent environmental data can be found in [1] and [2]. In extreme value theory, the multivariate extreme value distribution is the most suitable tool for analysing extreme events at several locations, because each location is treated as one component of a multivariate variable, as mentioned by [3] and [4], and the distribution models the dependence between those variables. However, this method may lead to model complexity and computational issues.
Another approach to spatial extreme modelling that takes the dependence into account is the joint estimation proposed by [5]. This study is motivated by [6], which proposed fitting a univariate extreme value model independently at each site of the spatial extreme event, with the likelihood function applied independently at each site. This method rests on a wrong statistical assumption, because the dependence between sites is ignored. To capture that dependence, [6] proposed adjusting the standard errors of the parameters. The parameter estimates themselves remain unchanged, but the asymptotic variance must be modified using what is known as the sandwich estimator. The advantages and properties of the sandwich estimator have been discussed in [7]. In this study, we apply the sandwich estimator to model extreme river flow at several sites in Sabah. We also consider the small sample size of the observed extremes at each site. This study is an extension of the study by [8].
The GEV distribution is used for data that are available as block maxima or minima. The blocks have equal length in time; for example, the data may be separated into yearly, monthly, weekly or daily blocks. The shape parameter is the most important parameter of the GEV family, which consists of three distributions, as listed in table 1.

Table 1. The GEV family by sign of the shape parameter ξ.

Shape parameter    GEV family
ξ > 0              Fréchet distribution
ξ = 0              Gumbel distribution
ξ < 0              Weibull distribution

According to [2], the GEV distribution is highly recommended for model fitting under extreme event scenarios, because directly picking one distribution from the GEV family may bias the fit.
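To make the role of the shape parameter concrete, the following is a minimal sketch of the standard GEV distribution function and of the three-way classification in table 1. The function names and the parameter names `mu`, `sigma` and `xi` are illustrative assumptions, not notation taken from the paper.

```python
import math

def gev_cdf(y, mu, sigma, xi):
    """Standard GEV distribution function
    G(y) = exp(-[1 + xi * (y - mu) / sigma] ** (-1 / xi)),
    with the Gumbel form exp(-exp(-z)) used in the limit xi -> 0."""
    z = (y - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_family(xi, tol=1e-12):
    """Name the member of the GEV family selected by the sign of xi."""
    if xi > tol:
        return "Frechet"
    if xi < -tol:
        return "Weibull"
    return "Gumbel"

# Hypothetical values, chosen only to exercise the functions.
print(gev_family(0.2), gev_cdf(1.0, mu=0.0, sigma=1.0, xi=0.2))
```

Fitting the three-parameter GEV lets the data choose the sign of ξ, instead of fixing one of the three families in advance.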

Marginal Estimation
Marginal estimation is carried out by fitting an independent GEV distribution to the extreme river flow at each site in Sabah. Suppose we have sites $j = 1, \dots, m$, each with $i = 1, \dots, n$ years of observations, and assume that the extreme values are independent over the years. The likelihood function can then be expressed as

$$L(\theta) = \prod_{j=1}^{m} \prod_{i=1}^{n} f(y_{ij}; \theta_j), \qquad (1)$$

where $f(y_{ij}; \theta_j)$ is the GEV density of the observation $y_{ij}$ at site $j$ with parameters $\theta_j = (\mu_j, \sigma_j, \xi_j)$. The corresponding log-likelihood function is

$$\ell(\theta) = \sum_{j=1}^{m} \sum_{i=1}^{n} \log f(y_{ij}; \theta_j). \qquad (2)$$

In this study, a penalty function $P(\xi)$ is used to provide the likelihood with the information that the value of $\xi$ is smaller than 1, as proposed by [3]. The log penalized likelihood is then

$$\ell_P(\theta) = \ell(\theta) + \sum_{j=1}^{m} \log P(\xi_j). \qquad (3)$$

This method is an alternative to standard maximum likelihood for cases that involve a small sample of extreme events, and is known as penalized maximum likelihood estimation (PMLE).

The method proposed in Section 2.2 maximizes a likelihood function independently at each site, which violates the statistical assumption of dependence between sites. As proposed by [6], the sandwich estimator is used to modify the asymptotic variance so that it captures the data dependence:

$$V(\hat{\theta}) = H(\theta)^{-1} J(\theta) H(\theta)^{-1}. \qquad (4)$$
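Returning to the penalized likelihood of the marginal fit, the sketch below shows what the objective function for a single site might look like. It assumes a penalty of the Coles and Dixon type, $P(\xi) = \exp\{-\lambda (1/(1-\xi) - 1)^{\alpha}\}$ for $0 < \xi < 1$, with $P = 1$ for $\xi \le 0$ and $P = 0$ for $\xi \ge 1$, and $\alpha = \lambda = 1$. That exact form is an assumption, since the text only states that the penalty encodes $\xi < 1$. The function names and the flow values are hypothetical.

```python
import math

def gev_loglik(data, mu, sigma, xi):
    """GEV log-likelihood for one site; -inf outside the parameter support."""
    if sigma <= 0:
        return -math.inf
    total = 0.0
    for y in data:
        z = (y - mu) / sigma
        if abs(xi) < 1e-8:
            # Gumbel limit of the GEV log-density
            total += -math.log(sigma) - z - math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0:
                return -math.inf
            total += (-math.log(sigma)
                      - (1.0 + 1.0 / xi) * math.log(t)
                      - t ** (-1.0 / xi))
    return total

def log_penalty(xi, alpha=1.0, lam=1.0):
    """log P(xi) for the ASSUMED Coles-Dixon-type penalty (see lead-in)."""
    if xi <= 0:
        return 0.0          # no penalty for bounded or Gumbel tails
    if xi >= 1:
        return -math.inf    # xi >= 1 is ruled out entirely
    return -lam * (1.0 / (1.0 - xi) - 1.0) ** alpha

def penalized_loglik(data, mu, sigma, xi):
    """Log penalized likelihood for one site: log-likelihood + log P(xi)."""
    return gev_loglik(data, mu, sigma, xi) + log_penalty(xi)

# Hypothetical annual maxima, not data from the paper.
flows = [120.0, 95.0, 150.0, 210.0, 130.0]
print(gev_loglik(flows, 110.0, 30.0, 0.5))
print(penalized_loglik(flows, 110.0, 30.0, 0.5))
```

Maximizing `penalized_loglik` over `(mu, sigma, xi)` with a numerical optimizer would give the PMLE for that site. Because the penalty only acts for positive `xi`, estimates with a negative shape are left essentially at their MLE values.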

Accounting for inter-site dependence
In the sandwich estimator $H(\theta)^{-1} J(\theta) H(\theta)^{-1}$, the matrix $H(\theta) = -E\nabla^2 \ell_P(\theta)$ is the expected value of the negative Hessian (the matrix of second derivatives) of the log penalized likelihood $\ell_P(\theta)$ obtained from equation (3); in practice it is estimated by the observed Fisher information matrix. Its inverse, $H(\theta)^{-1}$, is the covariance matrix under the independence assumption. The matrix $J(\theta) = \mathrm{Var}\{\nabla \ell_P(\theta)\}$ is the variance of the partial derivatives of the log penalized likelihood function, which approximates the error in the likelihood estimation. The score function for $\theta$ is the gradient $\nabla$ of the log penalized likelihood $\ell_P(\theta)$ with respect to $\theta$:

$$\nabla \ell_P(\theta) = \frac{\partial \ell_P(\theta)}{\partial \theta}. \qquad (5)$$
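The sandwich calculation can be sketched in a few lines of pure Python. The example assumes that $J$ is estimated empirically as the sum over years of the outer products of the yearly score vectors, each summed across sites. This is one common choice and is not confirmed by the text. The helper names and the small matrices are hypothetical.

```python
def mat_inv(a):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mat_mul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sandwich_covariance(H, yearly_scores):
    """Return H^{-1} J H^{-1}, with J estimated as sum_i s_i s_i^T.

    H             : negative Hessian of the log penalized likelihood
                    evaluated at the estimate (p x p).
    yearly_scores : one score vector per year (length p), i.e. the gradient
                    of that year's contribution summed over all sites.
                    ASSUMED estimator of J, see lead-in.
    """
    p = len(H)
    J = [[sum(s[a] * s[b] for s in yearly_scores) for b in range(p)]
         for a in range(p)]
    H_inv = mat_inv(H)
    return mat_mul(mat_mul(H_inv, J), H_inv)

# Hypothetical two-parameter illustration, not values from the paper.
H = [[2.0, 0.0], [0.0, 4.0]]
scores = [[1.0, 0.0], [0.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]
naive = mat_inv(H)                        # variance under independence
adjusted = sandwich_covariance(H, scores)
print(naive[1][1], adjusted[1][1])
```

The adjusted standard errors are the square roots of the diagonal of the returned matrix. When $J = H$, the sandwich collapses to $H^{-1}$, so any difference between `naive` and `adjusted` reflects the correction.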

Case study: An application to river flow data
This section discusses the results of modelling the annual maximum river flow at different sites in Sabah. Let $Y_j$ be the annual maximum river flow at site $j = 1, \dots, m$, with $m = 18$. We fit the generalized extreme value (GEV) distribution to the data at each site, $Y_j \sim \mathrm{GEV}(\mu_j, \sigma_j, \xi_j)$. The record at every site covers fewer than 50 years (n < 50), so a method suited to small samples is needed to estimate the GEV parameters. The parameters are therefore estimated using the PMLE method stated in equation 3. As a result, positive shape parameter estimates move towards zero, which improves the tail behaviour, as claimed by [8]. For negative values of ξ, the PMLE estimate of the shape parameter is close to the maximum likelihood estimate (MLE). These results are in line with previous research, in which the PMLE method also shrinks positive shape parameter estimates towards zero. The shape parameter estimates obtained using MLE and PMLE are illustrated in figure 1, as shown by [8]. Since the model assumptions are violated by ignoring the dependence between sites, the sandwich estimator is then applied to correct the model. This method produces the same parameter estimates as the marginal fit, but modifies the standard errors to suit the spatial extreme data. Table 2 shows the GEV parameter estimates and their standard errors. These results are useful for predicting the return value of extreme river flow at each site in Sabah.
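As an illustration of how fitted GEV parameters translate into a return value, the sketch below evaluates the standard GEV quantile for a return period of T years, that is, the flow exceeded with probability 1/T in any one year. The function name and the parameter values are hypothetical and are not estimates from table 2.

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """T-year return level: the (1 - 1/T) quantile of GEV(mu, sigma, xi)."""
    if T <= 1:
        raise ValueError("return period T must exceed 1 year")
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        # Gumbel limit
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi

# Hypothetical parameter values, chosen only for illustration.
z50 = gev_return_level(50, mu=100.0, sigma=20.0, xi=0.2)
z100 = gev_return_level(100, mu=100.0, sigma=20.0, xi=0.2)
print(z50, z100)
```

Because the return level is a function of all three parameters, its uncertainty depends on their covariance matrix, which is where the adjusted standard errors matter.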

Conclusions
In this study, the method proposed by [9], known as penalized maximum likelihood estimation, is applied to improve inference on the GEV parameter estimates. The result is consistent with [9], which also found that the shape parameter estimates shrink towards zero. In this method, a penalty function is introduced and added to the standard maximum likelihood method. Other penalty functions exist (for example [10]) that serve different purposes, such as a smoothing parameter [3] and a smoothing function to capture the behaviour of spatial events [11]. Using the GEV distribution, the annual maximum river flow is modelled independently at each site. As mentioned earlier, this violates the statistical assumption of dependence between sites, so we applied the sandwich estimator to correct the variances of the GEV parameters. This is an alternative to multivariate extreme value distribution approaches for spatial extreme value modelling. Studies that use the multivariate extreme value distribution to model spatial extreme events can be found in [3] and [4]; however, that method may lead to computational issues due to high dimensionality. The correction to the variance grows as the data dependence is taken into account, as mentioned in the study by [7]. Implementing the sandwich estimator in this study helps to avoid high-dimensional mathematical computation. Therefore, we conclude that the sandwich estimator is an appropriate method for modelling spatial extreme river flow in Sabah.