Selection of Optimal Gridded Dataset for Application in Polish Sudetes Mountains

High-quality measured weather data (MWD) are limited or unavailable for many areas, and their time coverage can be relatively short. That is why gridded climatic data (GCD) are widely used in environmental studies. Although GCD are a valuable source of information, their accuracy can be insufficient for a particular study, so their applicability should be checked before the study is carried out. The objective of this study was to check the applicability of gridded data for dendroclimatological studies of larch in the Sudetes Mountains. To do so, we compared GCD with available MWD from several weather stations located in this area. Because many gridded time-series datasets are available, we also wanted to determine which dataset is best suited to the area. The analysis used high-resolution gridded data on monthly mean temperature and total precipitation, covering the common period 1901-2013 on a 0.5° x 0.5° grid, created by:
- Center for Climatic Research, Department of Geography, University of Delaware, Newark (UD; model V4.01 for precipitation and temperature data),
- Global Precipitation Climatology Centre, Deutscher Wetterdienst (GPCC; model V7 for precipitation data),
- Climatic Research Unit, University of East Anglia (CRU; model CRU TS v.4.01 for precipitation and temperature data).
The available MWD from the weather stations start in 1951, 1956 or 1957, so the common period for the analysis covered the years 1951 (1956 or 1957) to 2013. For the precipitation and temperature data, the agreement and biases between GCD and MWD were assessed with the mean error (ME), root mean square error (RMSE), L1-norm and Pearson correlation coefficient. Finally, linear regression analysis was performed to detect biases in the relationship between GCD and MWD, and the coefficient of determination (R²) was also calculated. GCD for precipitation show high similarity to MWD. Mean Pearson correlation coefficients were 0.87 for GPCC, 0.82 for CRU and 0.8 for UD. For temperature, the Pearson correlation values were relatively high and very similar: 0.97 for CRU and 0.96 for UD. The L1-norm, ME, RMSE and regression models confirmed small differences between the analysed GCD, with a better fit for the CRU GCD.


Introduction
High-quality measured weather data (MWD) are limited or unavailable for many areas, and their time span can be relatively short, which limits their applicability in environmental studies. An example is dendroclimatological studies, where long data records can be crucial for analysing the response of tree growth to climate. That is why gridded climatic data (GCD) are widely used. Many sources of GCD exist, offering a variety of variables as daily, monthly and annual summaries at spatial resolutions from 10' to 5°. Although GCD are a valuable source of information, their accuracy can be insufficient for a particular study, so their applicability should be checked before the study is carried out.
World Multidisciplinary Earth Sciences Symposium (WMESS 2018), IOP Conf. Series: Earth and Environmental Science 221 (2019) 012120, IOP Publishing, doi: 10.1088/1755-1315/221/1/012120
Studies that assess the quality of GCD by comparison with relevant MWD from weather stations are relatively rare [1][2][3][4]. Most of them were performed for the United States [4][5]; some exist for Africa and Asia [1], Australia [6] and Europe [7]. The quality of gridded data for dendroclimatological studies has also been checked [8][9][10].
Although several dozen weather stations exist in Poland, their coverage is still limited. In addition, the available data records are usually relatively short (they start in the 1950s or in later decades). That is why the use of gridded data in ongoing dendroclimatological studies of larch in the Sudetes would be very convenient. The aim of this study was to check the applicability of gridded data for this purpose. Because many gridded time-series datasets are available, we also wanted to check which of them is the most suitable for the area. The choice was based on the smallest differences between a GCD box and the corresponding weather station data. The analysis used high-resolution gridded data on monthly mean temperature and total precipitation on the most popular 0.5° x 0.5° grid.

Material and Methods
The comparison used MWD on the one hand and GCD from three different models on the other. A total of 14 MWD series of total monthly precipitation and 5 MWD series of average monthly temperature were compared with data from the corresponding GCD boxes. Corresponding pairs of time series were compared in terms of their similarity, using statistical and data mining methods.

Gridded data
Globally gridded precipitation and temperature data are available from many databases, for example www.esrl.noaa.gov, www.crudata.uea.ac.uk and www.climdex.org. All of them provide access to several gridded datasets, which differ in spatial and temporal resolution. GCD are modelled on the basis of historical weather station data, both daily and monthly, and interpolation algorithms. In this paper three publicly available GCD datasets, which cover the period 1901-2013 on a 0.5° x 0.5° grid, were analyzed:
- UD v4.01: The Center for Climatic Research, Department of Geography, University of Delaware, Newark (UD) created a grid model using a large number of weather stations from GHCN2 (Global Historical Climatology Network) and the archive of Legates & Willmott. In this analysis the newest model, V4.01, with temporal coverage from 1901/01 to 2014/12 for both precipitation and temperature data, was used [12]. Gridded data are available in the esrl.noaa.gov database [13].
- CRU TS v. 4.01: The Climatic Research Unit, University of East Anglia (CRU) calculated a climatic model from monthly observations for land areas. The gridded dataset includes six climatic variables (mean temperature, diurnal temperature range, precipitation totals, wet-day frequency, vapour pressure and cloud cover). The model includes diagnostics associated with each interpolated value that indicate the number of stations used in the interpolation, allowing the reliability of values to be determined in an objective way. In the presented study the newest version, CRU TS v. 4.01, was used [14][15]. Gridded data are available on the crudata.uea.ac.uk webpage [16].
- GPCC v7: The Global Precipitation Climatology Centre, Deutscher Wetterdienst (GPCC) calculated a precipitation climatology model for the global land areas, using data from more than 67,000 rain gauge stations. Furthermore, semi-automatic quality control processing and additional visual control were used to improve the quality of the GCD models. In the analyses we used the newest model, V7, with temporal coverage from 1901/01 to 2013/12 [17][18]. Gridded data are available in the esrl.noaa.gov database [13].
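Pairing a weather station with its GCD box on a 0.5° x 0.5° grid can be illustrated with a minimal Python sketch. It assumes that cell edges fall on multiples of the grid resolution, so that cell centres lie at .25 and .75 degrees; this should be verified against the coordinate arrays of the dataset actually used. The function name and the station coordinates are hypothetical and do not come from this study.

```python
import math


def grid_cell_center(lat, lon, res=0.5):
    """Return the (lat, lon) centre of the res x res degree cell containing a point.

    Assumption: cell edges fall on integer multiples of `res`
    (so centres of a 0.5 degree grid are at x.25 and x.75 degrees).
    """
    lat_c = (math.floor(lat / res) + 0.5) * res
    lon_c = (math.floor(lon / res) + 0.5) * res
    return lat_c, lon_c


# Hypothetical station at 50.74 N, 15.74 E (illustrative coordinates only)
print(grid_cell_center(50.74, 15.74))
```

The time series of the cell returned by such a lookup would then be compared with the station series.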

Statistical analyses
The best gridded dataset for the area was selected using statistical measures of similarity and dissimilarity. The commonly used mean error (ME) and root mean square error (RMSE) were computed for the pairs of time series (MWD and GCD) for precipitation and temperature. A dissimilarity measure (L1-norm) and the Pearson correlation coefficient were also calculated. Finally, linear regression analysis was performed to detect biases in the relationship between GCD and MWD, together with the coefficient of determination (R²) [8][19][20]. The comparison of precipitation and temperature data between weather stations and gridded data from the three datasets was carried out for the common period 1951-2013, separately for each pair of time series (MWD-GCD). There were two exceptions, Szczawno Zdrój and Twardocice, for which the possible periods of analysis were shorter (1956-2013 and 1957-2013, respectively).
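As an illustration only, the measures listed above can be sketched in Python with NumPy for a single MWD-GCD pair. Several details are assumptions rather than facts confirmed by this paper: the sign convention of the error (GCD minus MWD), the L1-norm taken as the sum of absolute differences, the regression direction (GCD regressed on MWD, with a0 as intercept and a1 as slope), and the function name. The sample values are synthetic.

```python
import numpy as np


def similarity_measures(mwd, gcd):
    """Compare one station series (mwd) with its grid-box series (gcd).

    Assumptions: errors are GCD minus MWD; L1-norm is the sum of absolute
    differences; the regression model is gcd = a0 + a1 * mwd.
    """
    mwd = np.asarray(mwd, dtype=float)
    gcd = np.asarray(gcd, dtype=float)
    diff = gcd - mwd

    me = diff.mean()                       # mean error (signed bias)
    rmse = np.sqrt((diff ** 2).mean())     # root mean square error
    l1 = np.abs(diff).sum()                # L1-norm (Manhattan distance)
    r = np.corrcoef(mwd, gcd)[0, 1]        # Pearson correlation coefficient
    a1, a0 = np.polyfit(mwd, gcd, 1)       # slope and intercept of the linear fit
    r2 = r ** 2                            # R^2 of a simple linear regression

    return {"ME": me, "RMSE": rmse, "L1": l1, "r": r,
            "a0": a0, "a1": a1, "R2": r2}


# Synthetic monthly values, for illustration only
station = [30.0, 45.0, 60.0, 80.0, 55.0, 40.0]
grid_box = [28.0, 50.0, 58.0, 85.0, 50.0, 43.0]
for name, value in similarity_measures(station, grid_box).items():
    print(f"{name}: {value:.3f}")
```

With this convention a negative ME means that the gridded values underestimate the station values, and a slope a1 close to 1 with an intercept a0 close to 0 indicates little systematic bias.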

Results
The results of the analyses are presented in figures 2 and 3. For precipitation, the biggest differences between the three GCD datasets were found for two parameters: the Pearson correlation coefficient and the coefficient of determination (R²). Generally, GPCC data showed a better fit to MWD at individual weather stations than CRU and UD. The exception was Kłodzko, for which the best results were observed for UD data. Mean Pearson correlation coefficients were 0.88 (GPCC), 0.82 (CRU) and 0.80 (UD). For GPCC data, six values of the correlation coefficient exceeded 0.9 (very high positive correlation). Correlation values for CRU data were lower than those for GPCC by 0.07. The lowest correlation coefficient was observed for UD data for Jelenia Góra (0.67; figure 2a). The results for the regression models were highly discrepant. The values of the a0 and a1 parameters differed significantly between datasets, as did the quality of the model fit, expressed by the coefficient of determination R². Again, the best match was shown by the GPCC dataset (with the exception of Kłodzko). Mean R² was 0.79 for GPCC, 0.9 for CRU and 0.66 for the UD dataset. Four of the five lowest values of R² were observed for UD (figure 2c). The low mean value of R² for CRU GCD was connected with poor regression model quality for Śnieżka (R² equal to 0.54).
The L1-norm dissimilarity measure confirmed the better fit of the GPCC dataset, with a difference of 4000 between the analyzed gridded datasets. The weaker L1-norm result for CRU data compared with GPCC was caused by a high misfit at two locations (Śnieżka, Kłodzko). Two of the three very high L1-norm values, exceeding 28000, were obtained for UD data (Jelenia Góra, Przesieka); see figure 2b.
The mean error revealed the cause of the poor fit of UD data. ME for UD data usually had larger values than for the other GCD (figure 2d), with both positive and negative signs. The largest ME value was calculated for Kłodzko (47) for the UD dataset. The ME results for GPCC and CRU data were highly correlated, with slightly lower ME values for GPCC (figure 2d). The last measure, root mean square error (RMSE), confirmed the problems with the UD dataset. For almost all locations RMSE is lower than 5. Mean values are 0.39 (GPCC), 0.84 (CRU) and 4.45 (UD). The highest RMSE value was calculated for UD data (Mieroszów, 48). Similarly to the metrics mentioned above, UD ...
The assessment for temperature likewise shows a difference between the two analysed datasets (i.e. CRU and UD). The Pearson correlation coefficient was higher for CRU than for UD (mean values of 0.98 and 0.96, respectively). The lowest correlation between MWD and GCD was obtained for Szczawno Zdrój (0.915) and the highest for Kłodzko (0.998). The biggest difference between the correlations for a single weather station, between CRU and UD data, was observed for Jelenia Góra (0.04; figure 3a). Values of the L1-norm showed high similarity between the analyzed data. For two weather stations (Szczawno Zdrój, Śnieżka) UD had smaller values than CRU, whereas for Jelenia Góra and Karpacz CRU data showed a better fit (lower L1-norm values). The highest L1-norm value was obtained for Śnieżka for CRU data (5003; see figure 3b). Regression parameters were more stable for temperature than for precipitation. Parameter a1 varied only slightly from 1, indicating the good quality of both gridded datasets. The coefficient of determination (R²) confirmed the high similarity of both GCD, with mean R² values of 0.93 for CRU and 0.85 for UD. The lowest value of R² was observed for Szczawno Zdrój (0.35) for CRU data (figure 3c). Mean error (ME) and root mean square error (RMSE) showed the same pattern. CRU data usually fitted better, with lower values of both parameters. However, for Szczawno Zdrój and Śnieżka the ME and RMSE results were better for UD than for CRU data (figures 3d and 3e).
Figure 3. Similarity assessment results for temperature data (CRU and UD); panel e) root mean square error (RMSE). The weather stations were ordered from the lowest (left) to the highest (right) location.

Conclusions
The results obtained for precipitation showed weaker similarity of GCD to MWD than those for temperature, which can be explained by the high spatial variability of this climatic factor. The weakest fit for precipitation was observed for the UD dataset. The CRU and GPCC datasets represent real data (MWD) better. In general, better results were obtained for the GPCC dataset (especially for R² and the Pearson correlation coefficient). It has to be noted, however, that the lower values of the parameters calculated for CRU data were mainly caused by the results for one weather station, the one at the highest altitude (Śnieżka). With the results for Śnieżka excluded, the results for CRU and GPCC are similar.
The results obtained for temperature show the high quality of GCD for the two analyzed datasets, although better results were obtained for CRU data. A clearly better fit for UD data was found for only one weather station (Śnieżka). Both models had some problems reproducing the data of some weather stations (Szczawno Zdrój, Śnieżka). UD GCD were underestimated, with ME equal to -2.8. A smaller ME, but with overestimation, was calculated for CRU GCD.
In addition, no relation was found between altitude and the values of the similarity measures (figures 2 and 3). All analyzed parameters were also checked for correlation with latitude and longitude; the correlations were negligible.
To sum up, smaller differences between gridded and weather station data for temperature were observed for CRU data. This is supported by higher correlation values and better results for the regression models: higher values of R² and lower values of mean error obtained for this factor. For the Śnieżka location, however, the UD data should be preferred. The precipitation results can be affected by the high spatial variability of this factor, especially in mountain areas, which can be difficult to preserve in gridded data. GPCC and CRU GCD were sufficiently similar to weather station data, as confirmed by the Pearson correlation coefficient, R² and L1-norm. Overall, it can be stated that the analyzed CRU and GPCC gridded data can be used in dendroclimatological studies in the area, but in some cases with caution, taking the difference in altitude into account.