Performance of various gridded temperature and precipitation datasets over Northwest Himalayan Region

This study evaluated the performance of 07 gridded datasets, viz. Asian Precipitation Highly-resolved Observational Data Integration towards Evaluation of Water Resources (APHRODITE), Climate Research Unit Time-Series (CRU-TS), University of Delaware (UDEL), Tropical Rainfall Measuring Mission (TRMM)/TMPA (TRMM Multi-Satellite Precipitation Analysis), Global Precipitation Climatology Centre (GPCC), Princeton Global Forcings Dataset (PGF) and European Reanalysis Interim (ERA-I), in capturing the amount, seasonality and trend of precipitation over different climatic zones of the Northwestern Himalaya (NWH), i.e. Lower Himalaya (LH), Greater Himalaya (GH) and Karakoram Himalaya (KH). A similar comparison was also done for temperature, but only with 05 datasets, viz. APHRODITE, CRU-TS, PGF, UDEL and ERA-I, since TMPA and GPCC are precipitation-only datasets. This study is a maiden attempt in which the in situ observations include data from elevations above 5000 m amsl (07 observatories) in NWH (Indian sub-region). Results reveal that for precipitation over NWH, ERA-I, GPCC and TMPA/TRMM are quite reliable datasets. For temperature, all datasets performed quite well, but CRU-TS and ERA-I provided more reliable estimates. The mean absolute error ranged from 13.5 mm/month to 150.7 mm/month for precipitation and from 0.75°C to 9.9°C for temperature. These high error values underpin the need for bias correction. On the basis of this analysis, monthly correction factors for wintertime temperature and precipitation have also been suggested for each dataset which, when multiplied with the corresponding datasets, yield closely approximated values for the area of interest. These results can serve as a guide for bias correction and for selecting appropriate gridded datasets for hydrological modeling studies over NWH.


Introduction
Recent decades have witnessed unprecedented warming globally, with record warming over the high-altitude Himalayan regions. The likely impacts of this warming on the cryosphere might endanger a steady water supply in the coming decades (Immerzeel et al 2010), underpinning the need for exhaustive studies on climatic changes and related hazards. However, the absence of long-term precipitation and temperature data, which are fundamental for such studies, has created a void in our understanding of the dynamics of the Himalayan climate. Harsh weather, inaccessibility and rugged terrain in the High Himalayan regions hinder the establishment and maintenance of observatories, leading to a sparse and nonuniform observatory network (Tong et al 2013). To overcome this problem, many modeling centers worldwide have constructed gridded datasets with wide spatial and temporal coverage (Gyalistras 2003, Haylock et al 2008, Belo-Pereira et al 2011, Herrera et al 2012). But such gridded datasets often disagree with each other owing to differences in raw data sources, orographic corrections and interpolation techniques. In this study, wintertime data from 19 SASE observatories, out of which 07 observatories have elevations above 5000 m amsl, were used for comparison purposes. The evaluation of 07 (05) datasets was done for precipitation (temperature), and that too using long-term data from very high elevations (>5000 m amsl), which makes this work a novel study. The evaluation was based on three aspects: (1) magnitude, (2) long-term trends and (3) seasonality. The performance of gridded datasets was evaluated for the winter period only, since long-term complete data series were available only for that period. Finally, this study suggests monthly correction factors to aid close approximation of climate variables over NWH using different gridded datasets.

Study area
NWH, with altitudes ranging between 2000 m amsl and 6000 m amsl and lying in the Indian sub-region, was selected as the study area (figure 1). On the basis of the prevailing snow-meteorological (snow-met) conditions, NWH has been divided into three climatic zones, viz. Lower Himalaya (LH), Greater Himalaya (GH) and Karakoram Himalaya (KH) (Sharma and Ganju 2000). For monitoring the snow-met conditions in the area, the Snow and Avalanche Study Establishment (SASE), India, has established a wide network of manual observatories where data have been recorded since 1985 as per WMO guidelines. The locations of the snow-met observatories in the different zones of NWH are shown in figure 1.

Data
Field observations
As explained in the preceding section, SASE observatories in all snow-climatic zones have data records spanning almost three decades. Thus, monthly wintertime data from a total of 19 SASE observatories were used in this study, where wintertime refers to the months of November-April. Table 1 gives the elevation and period of data records for each contributing observatory. Since NWH has a highly variable elevation range, and consequently variable snow-met conditions, we evaluated the performance of datasets separately for the three climatic zones: LH, GH and KH.
A total of 07, 03 and 09 observatories fall in LH, GH and KH, respectively. The parameters under consideration are wintertime mean temperature (the mean of maximum and minimum temperature) and wintertime precipitation expressed as snow water equivalent (SWE). SWE was calculated by multiplying the measured precipitation (snowfall) depths by their respective densities. The data were checked and corrected for missing gaps and inhomogeneity prior to their use in the analysis. The data of stations falling in one climatic zone were averaged to obtain zonal values.
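The SWE conversion and zonal averaging described above can be sketched in Python as follows. This is a minimal illustration, assuming snowfall depth is recorded in cm and density in g/cm³ (units the text does not state) and that zonal values are simple arithmetic means of the stations in a zone; the station names and numbers are hypothetical.

```python
from statistics import mean


def swe_mm(snow_depth_cm: float, density_g_cm3: float) -> float:
    """Snow water equivalent (mm w.e.) of a fresh-snow layer.

    A layer d cm deep with density rho (g/cm^3) holds d * rho cm of water,
    i.e. d * rho * 10 mm. Units are an assumption of this sketch.
    """
    return snow_depth_cm * density_g_cm3 * 10.0


def zonal_average(station_values: dict[str, float]) -> float:
    """Arithmetic mean of station values falling in one climatic zone."""
    return mean(station_values.values())


# Hypothetical stations and values for one month in one zone
monthly_swe = {
    "stn_A": swe_mm(120.0, 0.10),  # about 120 mm w.e.
    "stn_B": swe_mm(80.0, 0.12),   # about 96 mm w.e.
    "stn_C": swe_mm(60.0, 0.15),   # about 90 mm w.e.
}
print(round(zonal_average(monthly_swe), 1))  # 102.0
```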

Gridded datasets
Gridded precipitation and temperature datasets can be grouped into three main categories, namely (i) datasets obtained by geostatistical interpolation of ground-based observations (interpolated datasets), (ii) datasets generated by merging observations with model outputs, i.e. reanalysis datasets, and (iii) datasets derived from satellite-based observations (remote sensing) (Lutz and Immerzeel 2016). Information from five interpolated datasets (UDEL, APHRODITE, GPCC, CRU-TS and PGF), merged satellite observations (TRMM/TMPA) and a reanalysis (ERA-Interim) was utilized in the present study. It should be noted that none of the datasets evaluated in this study incorporates data from the SASE observatories: since these observatories are located in politically sensitive areas, their data have never been shared with external agencies for research purposes. Hence, the datasets being analyzed are completely independent of the reference data. A summary of the various precipitation products is given in table 2, and a brief description of each product is provided here.
The Tropical Rainfall Measuring Mission (TRMM) Multi-Satellite Precipitation Analysis (TMPA) 3B43 ver. 7 combines the information from the TRMM Microwave Imager (TMI), Precipitation Radar (PR) and Visible and Infrared Scanner (VIRS) with the Special Sensor Microwave Imager (SSM/I) and rain gauge data (Huffman et al 2007). Available on a monthly time scale from 1998 to the present day, the product is derived by averaging the TRMM 3B42 version 6 datasets. The dataset can be accessed at http://disc.sci.gsfc.nasa.gov.
Asian Precipitation Highly-resolved Observational Data Integration towards Evaluation of Water Resources (APHRODITE) is a state-of-the-art daily precipitation dataset with high-resolution grids (up to 0.25°) for Asia. Taking advantage of a dense network of rain gauges (5000-12 000 stations), the precipitation data were first interpolated at a resolution of 0.05°, taking sphericity and orography into account, and later re-gridded to 0.25° and 0.50° resolution using an area-weighted mean (Yatagai et al 2012). The daily precipitation (APHRO_V1101R1) and daily mean temperature (APHRO_V1204R1) datasets are utilized in the present study. The datasets can be obtained from http://www.chikyu.ac.jp/precip. Daily values were then converted into monthly data using appropriate conversion factors.
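A minimal sketch of a daily-to-monthly conversion is shown below. Because the text does not specify the conversion factors, it assumes that monthly precipitation is the sum of the daily values and monthly temperature is their mean; the function name `to_monthly` and the synthetic series are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def to_monthly(daily: dict[date, float], how: str) -> dict[tuple[int, int], float]:
    """Aggregate a daily series to (year, month) values.

    how="sum" gives monthly totals (assumed for precipitation);
    how="mean" gives monthly means (assumed for temperature).
    """
    groups: dict[tuple[int, int], list[float]] = defaultdict(list)
    for day, value in daily.items():
        groups[(day.year, day.month)].append(value)
    if how == "sum":
        return {key: sum(vals) for key, vals in groups.items()}
    if how == "mean":
        return {key: mean(vals) for key, vals in groups.items()}
    raise ValueError("how must be 'sum' or 'mean'")


# Synthetic example: 2 mm/day from 1 January to 29 February 2000 (a leap year)
start = date(2000, 1, 1)
precip = {start + timedelta(days=i): 2.0 for i in range(60)}
print(to_monthly(precip, "sum"))  # {(2000, 1): 62.0, (2000, 2): 58.0}

# Synthetic January temperatures rising linearly from -5.0 degC
temp = {start + timedelta(days=i): -5.0 + 0.1 * i for i in range(31)}
print(to_monthly(temp, "mean"))
```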
Average temperature and total precipitation data from the University of Delaware (UDEL) consist of monthly grids at 0.50° resolution. The gridded interpolation used data from numerous ground stations that form part of the Global Historical Climatology Network (GHCN2) (Willmott and Matsuura 2001). The datasets can be accessed from https://www.esrl.noaa.gov/psd/data/gridded/data.UDel_AirT_Precip.html.
The latest version of the Climate Research Unit Time-Series (CRU-TS) (ver. 4.10) dataset from the University of East Anglia has been utilized in the present study. CRU TS 4.1 data are produced using the same methodology as the previous version 3, which also included the correction of systematic errors in precipitation and temperature. The product consists of monthly gridded fields of precipitation and temperature over land areas based on daily values, from 1901 to 2016, with a spatial resolution of 0.5° (Mitchell and Jones 2005). The dataset can be obtained from http://catalogue.ceda.ac.uk/uuid/.
The Global Precipitation Climatology Centre (GPCC) dataset is based on World Meteorological Organization (WMO) reports (SYNOP and CLIMAT) collected from 7000-8000 stations via the Global Telecommunication System (GTS). These reports were interpolated and merged by GPCC to improve the spatial coverage and data quality (Schneider et al 2013). The GPCC product version 5 used in this study was obtained from ftp://ftp.dwd.de/pub/data/gpcc/html/gpcc_monitoring_v5_doi_download.html.
ERA-Interim is a recent global atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) (Dee et al 2011a). This dataset provides estimates of temperature and precipitation from 1979 onwards and is widely used as input for many Regional Climate Models (RCMs). Monthly data at a spatial resolution of 0.25° were utilized for this study.
The Princeton Global Forcing (PGF) dataset was created by combining the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis with observational datasets such as CRU-TS, the Global Precipitation Climatology Project (GPCP), TRMM and the NASA Langley monthly surface radiation budget (Sheffield et al 2006). It has global coverage and is available at different spatial resolutions. For this study, the dataset with a spatial resolution of 0.25° and temporal coverage from 1948-2016 was selected. The data were downloaded from http://hydrology.princeton.edu/data/pgf/.

Validation process
As can be seen from figure 1, observed data points are scarce and non-uniformly distributed, i.e. they are either highly clustered or quite sparse. Since much of the area in figure 1 is devoid of point observations, the available data points were not gridded/extrapolated to grid resolution, in order to avoid errors arising from the limitations of interpolation methods (Gao et al 2014). As mentioned by Ahmed et al (2019), gridded data can be compared with observations by first computing the areal average over a grid box from the available in situ measurements and then conducting a grid-to-grid comparison. In this study, the areal average of the observations within a grid cell was compared with the dataset's value for that grid cell. Given that different products and observatories have data available for different durations, the comparison was made for a common period, i.e. 1998-2014.
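The cell-wise comparison can be illustrated with the sketch below, which assigns each station to a cell of a regular latitude/longitude grid, averages stations sharing a cell, and pairs that average with the gridded value of the same cell. The grid origin (−90°, −180°), the cell-indexing scheme and all coordinates and values are illustrative assumptions, not the layout of any particular product.

```python
import math
from collections import defaultdict
from statistics import mean

Cell = tuple[int, int]


def cell_index(lat: float, lon: float, res: float,
               lat0: float = -90.0, lon0: float = -180.0) -> Cell:
    """Row/column of the grid cell containing (lat, lon) on a regular grid
    whose south-west corner is (lat0, lon0) and whose spacing is res degrees."""
    return (math.floor((lat - lat0) / res), math.floor((lon - lon0) / res))


def paired_values(stations: list[tuple[float, float, float]],
                  grid: dict[Cell, float], res: float) -> list[tuple[float, float]]:
    """Average all stations falling in the same cell and pair that areal
    average with the gridded value of the cell (cells absent from grid are skipped)."""
    cells: dict[Cell, list[float]] = defaultdict(list)
    for lat, lon, value in stations:
        cells[cell_index(lat, lon, res)].append(value)
    return [(mean(vals), grid[c]) for c, vals in sorted(cells.items()) if c in grid]


# Hypothetical stations (lat, lon, monthly value) and a 0.5-degree gridded field
stations = [(34.10, 74.80, 100.0), (34.20, 74.90, 120.0), (32.30, 77.10, 50.0)]
grid = {(248, 509): 95.0, (244, 514): 40.0}
print(paired_values(stations, grid, 0.5))  # [(50.0, 40.0), (110.0, 95.0)]
```

Pairing only cells that contain stations mirrors the decision above not to extrapolate point data to empty cells.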

Statistical indices
The performance of the aforementioned products was evaluated using four widely used statistical indicators, namely Pearson's correlation coefficient (r), Bias (%), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), computed on the seasonal time series.
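(a) Mean Absolute Error (MAE): The use of MAE was suggested by Fox (1981). MAE ranges between 0 and +∞. Its perfect score is 0, i.e. the estimated value (derived from the dataset) is the same as the observation, and MAE values closer to 0 are considered better. MAE is calculated as given in equation (1):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Est_i - Obs_i\right| \qquad (1)$$

(b) Root Mean Squared Error (RMSE) has been considered to be amongst the best indicators for summarizing model performance (Fox 1981). Its value also ranges between 0 and +∞; again, estimates with RMSE values closer to 0 are considered better. It is calculated as given in equation (2):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Est_i - Obs_i\right)^2} \qquad (2)$$

(c) Pearson's correlation coefficient (r) was calculated to study the pattern correlation between observations and estimated values. Its values range between −1 and +1, with a perfect score of +1. It is calculated as given in equation (3):

$$r = \frac{\sum_{i=1}^{n}\left(Obs_i - \overline{Obs}\right)\left(Est_i - \overline{Est}\right)}{\sqrt{\sum_{i=1}^{n}\left(Obs_i - \overline{Obs}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(Est_i - \overline{Est}\right)^2}} \qquad (3)$$

(d) Bias (%) expresses the total departure of the estimates from the observations relative to the observed total; negative values indicate underestimation (dry bias) and positive values overestimation (wet bias). It is calculated as given in equation (4):

$$\mathrm{Bias}\,(\%) = \frac{\sum_{i=1}^{n}\left(Est_i - Obs_i\right)}{\sum_{i=1}^{n} Obs_i}\times 100 \qquad (4)$$

Here n is the number of values; Obs_i and $\overline{Obs}$ refer to the observed value and the mean of observed values, and Est_i and $\overline{Est}$ refer to the estimated value (derived from the gridded dataset) and the mean of estimated values, respectively, in equations (1)-(4). For temperature, Bias was not converted into a percentage but was expressed as the Mean Bias Error (°C), $\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(Est_i - Obs_i\right)$.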
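The indices in equations (1)-(4), plus the Mean Bias Error used for temperature, translate directly into code. The sketch below uses only the Python standard library and assumes `obs` and `est` are equal-length sequences of paired seasonal values; the sign convention (estimate minus observation, so negative values indicate a dry or cold bias) is an assumption consistent with how bias is described in this study, and the sample values are hypothetical.

```python
import math
from statistics import mean


def mae(obs, est):
    """Mean Absolute Error, equation (1)."""
    return mean(abs(e - o) for o, e in zip(obs, est))


def rmse(obs, est):
    """Root Mean Squared Error, equation (2)."""
    return math.sqrt(mean((e - o) ** 2 for o, e in zip(obs, est)))


def pearson_r(obs, est):
    """Pearson's correlation coefficient, equation (3)."""
    mo, me = mean(obs), mean(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    return cov / (so * se)


def bias_percent(obs, est):
    """Bias (%), equation (4); negative means underestimation (dry bias)."""
    return 100.0 * sum(e - o for o, e in zip(obs, est)) / sum(obs)


def mbe(obs, est):
    """Mean Bias Error in the units of the data (degC for temperature)."""
    return mean(e - o for o, e in zip(obs, est))


# Hypothetical seasonal values (mm)
obs = [100.0, 200.0, 300.0]
est = [90.0, 210.0, 280.0]
print(round(mae(obs, est), 2), round(rmse(obs, est), 2),
      round(bias_percent(obs, est), 2), round(pearson_r(obs, est), 3))
# 13.33 14.14 -3.33 0.989
```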
(e) Trends: Wintertime trends for the period 1998-2014 were calculated for all datasets and field observations using MAKESENS, an Excel template that uses Sen's slope estimator for the slope and the Mann-Kendall test for the significance of the slope (Salmi et al 2002).
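MAKESENS itself is an Excel template; the Python sketch below is not a port of it but an independent, simplified implementation of the same two techniques: the Mann-Kendall test (normal approximation with continuity correction, and without the tie correction of the variance) and Sen's slope estimator. The 17-value series stands in for a 1998-2014 winter record and is hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(x: list[float]) -> tuple[int, float, float]:
    """Mann-Kendall S statistic, normal-approximation Z and two-sided p-value.

    Simplification: the variance of S is not corrected for tied values.
    """
    n = len(x)
    s = sum((xj > xi) - (xj < xi) for xi, xj in combinations(x, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p


def sens_slope(x: list[float]) -> float:
    """Sen's slope: median of pairwise slopes (x_j - x_i) / (j - i) for i < j."""
    slopes = [(x[j] - x[i]) / (j - i) for i, j in combinations(range(len(x)), 2)]
    return median(slopes)


# Hypothetical 17-value winter series (one value per year) with an upward drift
series = [1.2, 0.8, 1.5, 1.1, 1.9, 1.4, 1.7, 2.2, 1.6, 2.0, 2.4, 1.9, 2.6, 2.3, 2.1, 2.8, 2.5]
s, z, p = mann_kendall(series)
print(s, round(z, 2), round(p, 4), round(sens_slope(series), 3))
```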

Ranking of datasets
For ranking of datasets, the Weighted Product Method (WPM) was used. It is one of the most commonly used techniques for decision making and is considered better than the Weighted Sum Model (WSM) because it yields a dimensionless analysis: it uses relative rather than actual values, thereby eliminating units. The details of this method are explained by Triantaphyllou (2000). The performance of an alternative is assessed using equation (5):

$$P(A_k) = \prod_{j=1}^{n}\left(a_{kj}\right)^{w_j} \qquad (5)$$

where n is the total number of criteria (five in our study), a_kj refers to the value of the k-th alternative (dataset) in terms of the j-th criterion, w_j is the weight of the j-th criterion, and P(A_k) refers to the performance score of alternative A_k. In the maximization case, the alternative with the maximum score is considered the best. Weights were assigned to the criteria using the Equal weights method (Wang et al 2009), according to which the weights of the criteria are defined as:

$$w_j = \frac{1}{n}, \qquad j = 1, 2, \ldots, n$$

Since we assessed the performance of datasets in accurately capturing amount, seasonality and trends based on 05 statistical indicators, we assigned equal importance to each criterion. The Equal weights method makes minimal use of the decision makers' priorities and is believed to produce results on par with optimal weighting methods (Dawes and Corrigan 1974).
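The ranking step can be sketched as below, assuming strictly positive criterion values and equal weights w_j = 1/n. The text does not state how error-type criteria (where smaller is better) enter equation (5); in this sketch they are given a negative exponent, which is one way to keep "higher score is better" for every criterion. The criterion values for the three hypothetical datasets are invented for illustration.

```python
from math import prod
from typing import Optional


def wpm_scores(matrix: dict[str, list[float]], benefit: list[bool],
               weights: Optional[list[float]] = None) -> dict[str, float]:
    """Weighted Product Model score P(A_k) = prod_j a_kj ** (+/- w_j).

    All criterion values must be strictly positive. Benefit criteria (larger is
    better, e.g. r) get exponent +w_j; cost criteria (smaller is better, e.g. MAE)
    get exponent -w_j (an assumption of this sketch).
    """
    n = len(benefit)
    w = weights if weights is not None else [1.0 / n] * n  # equal weights, w_j = 1/n
    return {
        alt: prod(a ** (wj if is_benefit else -wj)
                  for a, wj, is_benefit in zip(values, w, benefit))
        for alt, values in matrix.items()
    }


def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank 1 goes to the highest performance score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {alt: i + 1 for i, alt in enumerate(ordered)}


# Hypothetical criteria per dataset: |Bias %|, MAE, RMSE, r, |trend error|
benefit = [False, False, False, True, False]
matrix = {
    "dataset_A": [10.0, 20.0, 25.0, 0.95, 0.5],
    "dataset_B": [40.0, 60.0, 70.0, 0.90, 1.0],
    "dataset_C": [25.0, 35.0, 45.0, 0.97, 0.8],
}
print(ranks(wpm_scores(matrix, benefit)))
# {'dataset_A': 1, 'dataset_C': 2, 'dataset_B': 3}
```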

Precipitation amount and its distribution
The North-Western Himalaya receives much of its precipitation during winter from Western Disturbances (WDs). Climatologically, during their eastward movement, WDs first hit LH and discharge much of their moisture there, leading to maximum precipitation at LH followed by GH and KH (Dimri and Dash 2012). To ascertain whether the datasets could capture this spatial variability, observations and datasets were compared. The cumulative wintertime precipitation (observed and gridded) for the different climatic zones during 1998-2014 is given in table S1 (available online at stacks.iop.org/ERC/2/085002/mmedia). The comparison and distribution of wintertime monthly precipitation (observed versus gridded) at the different climatic zones are shown as box charts in figures 2(a)-(c). Observations reveal that wintertime precipitation is highest at LH, followed by GH and KH. This pattern was not reproduced by all datasets: ERA-I exceptionally overestimated the precipitation for GH, while PGF estimated the highest precipitation for KH. Additionally, the box charts revealed that the majority of datasets exhibited a dry bias, which underpins the need for bias correction.
Results (table S1) indicate that for LH, the average observed wintertime precipitation is ∼937 mm. ERA-I (∼1021 mm) was closest to the observations, followed by GPCC (∼545 mm). APHRODITE, CRU-TS and TMPA showed a dry bias, whereas the largest underestimation was shown by PGF (∼33 mm) and UDEL (∼67 mm). For GH, the average observed precipitation is ∼630 mm. The UDEL (∼580 mm) and GPCC (∼556 mm) estimates were quite close to the observations. Again, APHRODITE, CRU-TS, PGF and TMPA underestimated wintertime precipitation, whereas ERA-I showed highly overestimated values (∼1424 mm).
For KH, the average observed wintertime precipitation is ∼370 mm. ERA-I showed slightly overestimated values (∼498 mm), while CRU-TS, GPCC, APHRODITE, PGF and TMPA underestimated the precipitation by considerable amounts. UDEL showed the largest underestimation (∼67 mm).
The calculated values of Bias (%), MAE and RMSE (table 3) are in line with the above results on precipitation amount. For LH, the ascending order of the calculated absolute bias (%), MAE and RMSE is ERA-I < GPCC < APHRODITE < TMPA < CRU-TS < UDEL < PGF. Consequently, ERA-I was the top performer and PGF the worst performer in all three indices.
Similarly, for GH the order of calculated absolute bias (%), MAE and RMSE is UDEL < GPCC < TMPA < APHRODITE < CRU-TS < PGF < ERA-I. Here, UDEL was the best performer, whereas ERA-I performed the worst.

Mean temperature magnitude and its distribution
The average wintertime temperature (observed and gridded) over the different climatic zones is shown in table S2, and the distribution of temperature values (box charts) over the zones is shown in figures 3(a)-(c). Results indicate that the average wintertime mean temperatures over LH, GH and KH are 2.1°C, −2.61°C and −14.71°C, respectively. Clearly, moving from LH to KH, temperatures shift from above zero (LH) to subzero (GH) and then to very low values (∼ −14.7°C) at KH. All the datasets captured this temperature gradient with elevation.
For LH, the observed mean temperature is above zero (∼2.1°C). All datasets estimated temperatures quite close to the observations, except UDEL (∼7.8°C) and PGF (∼4.0°C), which showed a warm bias. The temperature estimated by ERA-I was closest to the observations. For GH, the wintertime mean temperature drops below zero (∼ −2.6°C). Interestingly, all datasets coherently estimated subzero temperatures for this elevation range. UDEL closely estimates the temperature value (in terms of magnitude), i.e. Tmean (Observed) = −2.6°C and Tmean (CRU-TS) = −3.0°C. All datasets showed a cold bias in estimated temperature, but the maximum cold bias was shown by ERA-I (−12.5°C).
KH is characterized by very high elevation and thus experiences very low temperatures, i.e. ∼ −14.7°C. For GH, the performance of datasets (best to worst) is as follows: MAE: PGF > UDEL > CRU-TS > APHRODITE > ERA-I; RMSE: UDEL > PGF > CRU-TS > APHRODITE > ERA-I. The results for MBE were exactly the same as those obtained for RMSE.
For KH, the performance of datasets (best to worst) is as follows: MAE: ERA-I > UDEL > CRU-TS > APHRODITE > PGF. The results for RMSE and MBE were exactly the same as those obtained for MAE.

Seasonal trends
Seasonal trends were calculated for the period 1998-2014 using the in situ measurements as well as the gridded data. Trend magnitudes were estimated with Sen's slope and their significance was tested with the non-parametric Mann-Kendall test. Here too, the performance of datasets was judged by their ability to capture the observed trend.

Precipitation trends
As explained in the preceding sections, trend values were also calculated for precipitation from the gridded datasets and compared with the observed trends. The slope values of the linear trend (hereafter referred to as b) are shown in figure 4 and table 3.
For KH, a decreasing trend (b = −9.7) was observed for 1998-2014. This declining trend in precipitation was captured by CRU-TS, ERA-I and PGF only. The performance of datasets (descending order) was ERA-I > PGF > CRU-TS > GPCC > TRMM > UDEL > APHRODITE.
Table 4 and figure 5 depict the magnitude of the linear trend values over different elevations as observed and as estimated by the different datasets. During 1998-2014, LH showed cooling trends, while warming was observed at GH and KH. Spatially, the maximum rate of warming was observed at GH, which does not support the prevalence of elevation-dependent warming (EDW) over NWH, in contrast to the higher rates of warming at the highest elevations reported globally by many researchers (Pepin et al 2015). Recently, Kumar et al (2020) reported a high rate of glacier thinning in the Greater Himalaya region and attributed it to rising temperatures in the region using NCEP-NCAR data.

Seasonal temperature trends
For LH, observations revealed a declining temperature trend (b ≈ −0.036°C year⁻¹) during 1998-2014. This declining trend was captured by all datasets except APHRODITE. The performance of datasets in capturing trends is as follows: PGF > CRU-TS > ERA-I > UDEL > APHRODITE.
For GH, observed data showed a warming trend of 0.067°C year⁻¹ during 1998-2014. The warming trend was captured by APHRODITE only, whereas all other datasets showed cooling trends. The performance of datasets in capturing temperature trends over GH during 1998-2014 is as follows: PGF > ERA-I > CRU-TS > UDEL > APHRODITE.
For KH, observed data showed a rising trend of 0.26°C year⁻¹ during 1998-2014. All datasets except UDEL and PGF depicted rising trends of varying magnitude. The performance of datasets in capturing temperature trends over KH during 1998-2014 is as follows: APHRODITE > ERA-I > CRU-TS > PGF > UDEL.

Seasonality of precipitation
The relative comparison of estimated precipitation for individual winter months is shown in figure 6. Observed precipitation at all zones of the North West Himalaya depicts a unimodal distribution during winter, with peak values in February. This pattern was reproduced by all datasets except UDEL, which showed the highest precipitation in March. At KH, only ERA-I and APHRODITE showed a precipitation distribution similar to the observations, while CRU-TS and GPCC showed peak precipitation in April. Except for ERA-I, which overestimated values at all zones, all other datasets underestimated the winter cycle. Moreover, figure 6 shows that the precipitation estimated by ERA-I for November, December and January was almost equal to the observed precipitation at LH. Apart from visual comparison, a Taylor diagram (figures 7(a)-(c)), which summarizes the normalized standard deviation, normalized RMSE (NRMSE) and correlation to evaluate model performance, was constructed to determine the degree of agreement between observations and gridded datasets (Taylor 2001). The correlation coefficients and their significance levels are given in table 3. Interestingly, for all climatic zones, the precipitation estimated by all datasets shows a high correlation with observed precipitation, i.e. R > 0.7. The highest correlation was shown by ERA-I for LH (R = 0.970**) and GH (R = 0.980**), while for KH, APHRODITE (R = 0.991**) showed the maximum correlation, followed by ERA-I (R = 0.965**).
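The quantities summarized by a Taylor diagram can be computed as in the sketch below: the standard deviation of the estimates and the centered RMS difference, both normalized by the observed standard deviation, together with r. Population statistics are assumed, and the November-April values are hypothetical. The closing comment states the law-of-cosines relation that lets all three statistics be plotted on one diagram (Taylor 2001).

```python
import math
from statistics import mean, pstdev


def taylor_stats(obs: list[float], est: list[float]) -> tuple[float, float, float]:
    """Return (normalized std, normalized centered RMS difference, r).

    Both the standard deviation of the estimates and the centered RMS
    difference are divided by the standard deviation of the observations.
    """
    n = len(obs)
    mo, me = mean(obs), mean(est)
    so, se = pstdev(obs), pstdev(est)
    r = sum((o - mo) * (e - me) for o, e in zip(obs, est)) / (n * so * se)
    crmsd = math.sqrt(sum(((e - me) - (o - mo)) ** 2 for o, e in zip(obs, est)) / n)
    return se / so, crmsd / so, r


# Hypothetical monthly means for November-April in one zone (mm)
obs = [20.0, 45.0, 60.0, 75.0, 55.0, 30.0]
est = [15.0, 30.0, 50.0, 55.0, 40.0, 25.0]
sigma_n, e_n, r = taylor_stats(obs, est)
# Geometric relation behind the diagram: e_n**2 = sigma_n**2 + 1 - 2 * sigma_n * r
print(round(sigma_n, 3), round(e_n, 3), round(r, 3))
```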

Seasonality of temperature
The comparison of monthly mean temperature during winter months at each climatic zone is shown in figures 8(a)-(c). As can be seen from the figures, minimum values occur in January at all zones, a pattern coherently reproduced by all datasets. As for precipitation, Taylor diagrams (figures 9(a)-(c)) were constructed to assess the performance of datasets. The correlation coefficients are given in table 4. All datasets showed high correlation with observed data, i.e. r > 0.9. For LH, the maximum correlation was shown by APHRODITE (r = 0.985**), followed by CRU-TS (r = 0.984**), and the minimum by ERA-I (r = 0.949**). For GH, the maximum correlation was shown by APHRODITE (r = 0.993**), followed by UDEL (r = 0.991**), and the minimum by ERA-I (r = 0.976**).
For KH, the maximum correlation was shown by UDEL (r = 0.982**), followed by APHRODITE (r = 0.971**), and the minimum by ERA-I (r = 0.912**). Although the correlation coefficients differed slightly amongst datasets at all climatic zones, even the worst-performing dataset had a correlation above 0.9; we therefore believe that all products are good enough to reproduce the seasonality of mean temperature during winter months at the different zones of NWH.

Ranking of datasets
In the preceding sections, the performance of datasets was assessed using various statistical indicators. Datasets that performed well in reproducing one statistical property of the observations failed to reproduce another. Hence, the Weighted Product Model was used to select the best dataset. The performance scores were calculated using equation (5), and the lowest ranks (best performance) were assigned to the datasets with the highest performance scores. The performance scores and ranks are given in table 5.
Thus, it can be inferred that for precipitation at LH, the top 03 performers are ERA-I>GPCC>TMPA. For precipitation at GH, best datasets include UDEL>GPCC>TMPA. Likewise, for precipitation at KH, best datasets are ERA-I>PGF>TMPA.
For temperature at LH, CRU-TS>ERA-I>PGF were best performers. For temperature at GH, the ranking of datasets is PGF>UDEL>CRU-TS. For temperature at KH, best datasets are ERA-I>UDEL>CRU-TS.

Correction factors
Gridded temperature and precipitation datasets that correctly represent the meteorological conditions are crucial components for hydrological modelling of high mountain areas. Our results reveal a wide range of bias, within and between elevation zones, in the temperature and precipitation estimated by the datasets. Bias correction of the existing datasets would enhance the reliability of hydrological models in the region (Lutz et al 2014). Thus, we have calculated elevation-wise seasonal correction factors by comparing the observed and the estimated values, to account for the inherent uncertainties in existing datasets. The factors were calculated by dividing the observed values by the estimated values (gridded product) over each climatic zone. A similar exercise was done by Dahri et al (2016). The results are shown in tables 6(a) (precipitation) and 6(b) (mean temperature). The respective datasets should be multiplied by the suggested correction factors to obtain a close approximation of wintertime temperature and precipitation over the corresponding zone in the area of interest.
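The ratio-based correction described above can be sketched as follows, assuming one factor per month and zone, computed as the mean observed value divided by the mean estimated value over the common period. The function names and the February numbers are hypothetical; the factor is undefined when the estimated mean is zero, so the sketch raises an error in that case.

```python
from statistics import mean


def monthly_correction_factors(obs: dict[str, list[float]],
                               est: dict[str, list[float]]) -> dict[str, float]:
    """Factor per month = mean observed value / mean estimated value."""
    factors = {}
    for month, obs_values in obs.items():
        est_mean = mean(est[month])
        if est_mean == 0:
            raise ValueError(f"mean estimated value is zero for {month}")
        factors[month] = mean(obs_values) / est_mean
    return factors


def apply_correction(values: list[float], factor: float) -> list[float]:
    """Multiply gridded values by the correction factor."""
    return [v * factor for v in values]


# Hypothetical February precipitation (mm) for one zone over two years
factors = monthly_correction_factors({"Feb": [180.0, 220.0]}, {"Feb": [110.0, 140.0]})
print(factors)  # {'Feb': 1.6}
print(apply_correction([100.0, 125.0], factors["Feb"]))
```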

Discussion
Glaciers of high mountain Asia support millions of people living downstream; hence, timely monitoring of their mass changes is crucial for future water supply (Immerzeel et al 2010). But such hydrological modelling studies demand gridded temperature and precipitation inputs that accurately represent the hydroclimatology of the area of interest (Lutz et al 2014). As explained in the preceding sections, different datasets do not always agree with each other owing to differences in the coverage of raw data sources, orographic corrections and the interpolation techniques used. These differences make each dataset's performance distinct from region to region. Thus, given these inherent differences in how the datasets are generated, a reliability assessment prior to their use cannot be overlooked.
In the case of precipitation, the highest overestimation (wet bias) was shown by ERA-I at all zones. This overestimation by ERA-I over high altitudes has been reported by many researchers in the past (Palazzi et al 2013, Tong et al 2013, Immerzeel et al 2015, Dahri et al 2016). Palazzi et al (2013) reported that ERA-I estimates total precipitation instead of separate rain and snow components, which leaves room for overestimated values, especially during winter when the contribution of liquid precipitation to total precipitation is minimal. Although ERA-I precipitation appears overestimated for other regions, a few researchers support the use of this dataset for the Upper Indus Basin, since only such high precipitation values can sustain the huge extent of snow and glacier cover in the area (Reggiani and Rientjes 2015). Interestingly, ERA-I precipitation was found to be quite close to the observations for KH, where most of the glaciated terrain is found. Thus, the findings of Reggiani and Rientjes (2015) and our results strengthen confidence in the utility of ERA-I over KH.
It was also observed that interpolated datasets like APHRODITE, CRU-TS, PGF and UDEL showed a dry bias, i.e. underestimated wintertime precipitation over NWH. The accuracy and utility of such datasets for a particular region are mainly decided by the number of stations used in creating them. Although APHRODITE is known to be the best dataset for representing precipitation over the Himalayas, its accuracy is limited by sparse observations at high altitudes (Lutz et al 2014). It was also reported that APHRODITE has no station above 5000 m amsl, which makes this product unsuitable for elevations above 5000 m amsl (Kumar et al 2015). APHRODITE reportedly uses observations from approximately 5000-12 000 observatories, while UDEL employs data from about 4100-22 000 observatories (Ahmed et al 2019). Palazzi et al (2013) also reported that for locations with a sparse station network, datasets like APHRODITE, CRU-TS and GPCC interpolate observations to unsampled sites, which introduces a significant element of uncertainty. If the observations used are representative of valley conditions, the estimates usually show a dry bias, i.e. underestimated precipitation.
The satellite-based estimates (TMPA) also showed a dry bias over NWH. TMPA is known to show large errors during winter because land covered with snow and ice affects the accuracy of passive sensing (Hussain et al 2017). It was also reported that the correlation of TMPA/TRMM estimates with observed values decreases significantly, especially above an elevation of 2000 m amsl. Bharti and Singh (2015) reported that TMPA showed weak correlation with APHRODITE (used as the source of ground truth) when daily and monthly precipitation amounts were compared. Dahri et al (2016) reported that the underestimation of precipitation by TRMM at very high altitudes could be due to the inclusion of valley-based stations for correction or validation of the datasets. Also, Wang et al (2017) reported that TMPA generally overestimates light precipitation (0-10 mm) and underestimates heavy precipitation (>10 mm).
GPCC performed exceptionally well compared with the other interpolated datasets, owing to the inclusion of data from ∼85 000 stations worldwide. The collected data then pass through stringent quality-check procedures, which further enhance the reliability of this dataset (Sun et al 2018). Moreover, the interpolation technique used by a dataset influences its capability to capture precipitation variability, especially in mountainous areas. GPCC uses the SPHEREMAP technique, which uses elevation as an input, thereby giving accurate estimates for high-elevation areas.
For temperature magnitude, the interpolated and reanalysis datasets CRU-TS, ERA-I, PGF and UDEL outperformed APHRODITE at all zones, which could be attributed to the lower complexity of temperature estimation compared with precipitation over mountainous regions. Except for microclimates influenced by local factors, in situ temperature observations over an area can be adjusted to higher elevations using standard lapse rates or lapse rates derived for that particular area. Thus, the inclusion of in situ observations improved the accuracy of interpolated and reanalysis temperature datasets like CRU-TS, PGF, ERA-I and UDEL. Lutz et al (2014) noted that existing gridded datasets should be bias corrected to increase their utility in hydrological modelling; they multiplied APHRODITE precipitation by a constant factor to derive accurate estimates for the Upper Indus Basin. Dahri et al (2016) have also recently reported such correction factors for different basins. On the basis of our analysis, we have also suggested monthly correction factors which, when multiplied with the corresponding datasets, would help obtain accurate estimates of wintertime temperature and precipitation over NWH.
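The lapse-rate adjustment mentioned above can be illustrated with a one-line formula. The sketch assumes a constant standard environmental lapse rate of 6.5°C per km; as the text notes, an area-specific derived lapse rate could be substituted through the keyword argument. The function name and the station and target elevations are hypothetical.

```python
def adjust_to_elevation(t_station_c: float, z_station_m: float, z_target_m: float,
                        lapse_rate_c_per_km: float = 6.5) -> float:
    """Temperature (degC) at z_target_m extrapolated from a station at z_station_m,
    assuming temperature falls linearly with height at lapse_rate_c_per_km."""
    return t_station_c - lapse_rate_c_per_km * (z_target_m - z_station_m) / 1000.0


# Hypothetical valley station at 3000 m with a winter mean of 2.0 degC
print(adjust_to_elevation(2.0, 3000.0, 5000.0))  # -11.0
```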

Conclusion
This study evaluated the performance of 07 gridded datasets, viz. APHRODITE, CRU-TS, PGF, UDEL, ERA-I, TMPA and GPCC, for precipitation and 05 (the same datasets except TMPA and GPCC) for temperature over different climatic zones of NWH. Only wintertime (November-April) data were considered for comparison. Evaluation and ranking were based on the capability of datasets to capture the amount, seasonality and trends of temperature and precipitation. Statistical measures such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Pearson's correlation coefficient and linear slopes were calculated to assess the performance of the datasets. Finally, the datasets were ranked using the WPM. Based on the final ranks, ERA-I, GPCC and TMPA were found to be reliable datasets for precipitation. For temperature, the interpolated and reanalysis datasets CRU-TS, ERA-I, PGF and UDEL performed better at all zones. Nevertheless, all datasets showed biases in the estimated temperature and precipitation. The calculated MAE ranged from 13.5 mm/month to 150.7 mm/month for precipitation and from 0.75°C to 9.93°C for temperature. These high error values underpin the need for bias correction. Thus, our study also suggests monthly correction factors for the winter period to assist researchers working in this field. However, the authors feel that the performance of the datasets should also be validated against other ground-based sources of information such as mass balance and runoff values.