Impact of land-use land-cover datasets and urban parameterization on weather simulation over the Jakarta Metropolitan Area

Human-caused changes in land use and land cover (LULC) are most visible in metropolitan areas, where the majority of the land has been converted to urban or built-up land. This study presents a modeling approach for simulating the spatiotemporal distribution of the urban microclimate with the Weather Research and Forecasting (WRF) model using four urban parameterization schemes, namely a bulk scheme, an urban canopy model (UCM), building effect parameterization (BEP), and a building energy model (BEM). The WRF model is set up at 1 km spatial resolution over the Jakarta Metropolitan Area to study the model’s sensitivity to the use of alternative LULC datasets: the default MODIS dataset and its 2017 modification. The results show that the UCM and BEM schemes appear to be reliable in mapping urban weather conditions for all meteorological parameters examined. Because the LULC categories in urban areas remained unchanged, changing the LULC in the model did not result in a large difference in error. The updated LULC dataset can, however, provide information on suburban areas that continue to grow along with urbanization. LULC updates can provide insight into how much temperature rise is occurring in urban areas and how it affects climate change.


1. Introduction
Land use indicates how people use land, while land cover indicates the physical type of land. Any change in land use and land cover (LULC) alters the fluxes of mass and energy [1,2]. Global land use has changed significantly [3], and this needs attention because researchers agree that LULC change and climate change are related [4]. LULC changes affect atmospheric variables such as temperature, humidity, wind, and precipitation, as well as extreme rainfall events [5][6][7].
LULC changes caused by human activities are most noticeable in urban areas, where most of the land has been turned into urban or built-up land. Because urban areas are generally warmer than the surrounding (less developed) land, a method for examining and evaluating the potential effects of climate change is required. One such method, the Weather Research and Forecasting (WRF) model, has been widely used for urban climate studies in large cities around the world. WRF, which is a mesoscale model, is integrated with an urban modelling system to address urban environmental issues at the microscale [8].
Weather and climate simulations can be carried out with four different urban parameterization schemes: bulk, a single-layer urban canopy model (UCM), a multilayer UCM with building effect parameterization (BEP), and a multilayer UCM with an integrated building energy model (BEM). The bulk scheme was used to simulate both air and land surface temperatures in Berlin, Germany, and gave good results [9]. Compared to the bulk scheme, the single-layer UCM reproduced the observed diurnal variation of urban temperatures in Athens, Greece, more accurately [10]. Simulation with the multilayer UCM was found to be more accurate than the bulk scheme in terms of modeled near-surface wind speed, relative humidity, and total precipitation compared to observations during an extreme heat event in Ottawa, Canada [11]. Over the Metropolitan Area of Barcelona, BEP had the best correlation with observations, whereas BEM had the lowest bias and RMSE for temperature and relative humidity and was shown to be more sensitive than the other schemes [12].
IOP Publishing doi: 10.1088/1755-1315/1039/1/012036
For further analysis, especially of urban heat islands, the LULC dataset used in the simulation matters in addition to the choice of scheme. Where available, LULC datasets for the study area that are more detailed than the global datasets are preferable for modeling. LULC datasets compared by previous researchers include the United States Geological Survey (USGS), Urban Atlas (UA), CORINE Land Cover (CLC), Moderate Resolution Imaging Spectroradiometer (MODIS), and North American Land Change Monitoring System (NALCMS) datasets, as well as other land-use datasets developed by the user or based on a country's national database [10,13]. The sensitivity of the WRF simulation is also determined by the selected physical parameterizations (microphysics, cumulus, PBL, radiation, land surface). The better the selected physical parameterizations match the regional climatic characteristics, the better the WRF simulation performs [11]. The aim of this study is to evaluate the performance of the WRF model and its urban schemes over the Jakarta Metropolitan Area (Indonesia) and to investigate the sensitivity of the model to the use of different LULC datasets and different spatial resolutions.

2. Methodology
In this study, the non-hydrostatic WRF model version 4.2 coupled with the Noah land surface model was used to simulate urban meteorology around the Jakarta Metropolitan Area for the period 1-31 May 2017. This period was chosen because it is the beginning of the dry season. Climatological data show that May is the warmest month apart from October, but with higher humidity [14].

2.1. Area of interest and data sources for model validation
The study area is the Jakarta Metropolitan Area, which includes the urban area of Jakarta city (Jakarta Province) and its surrounding buffer cities (3 cities in Banten Province and 5 cities in West Java Province) and is located at 106°14'-107°18' E and 5°23'-6°46' S. The core urban area is 662 km², while the entire Jakarta Metropolitan Area covers more than 7,000 km². Jakarta, the largest tropical city in equatorial Southeast Asia, is characterized by high annual temperatures. The Asian monsoon system causes a rainy season (October-March) and a dry season (April-September). The Jakarta Metropolitan Area is topographically diverse, with coastal areas in the north and mountainous regions in the south. Thus, the sea breeze (onshore wind) and katabatic (anabatic) winds dominate the boundary-layer wind system. Five climate stations in the Jakarta Metropolitan Area that record historical climate data were used to compare the different WRF model experiments. Three stations are in urban areas, one is in a suburb, and one is in a rural area. Hourly observations of 2 m air temperature (T2m), wind speed (WS), 2 m relative humidity (RH2m), and precipitation (PREC) are recorded at the stations; all datasets were collected by the Meteorological, Climatological, and Geophysical Agency of Indonesia (BMKG). The characteristics of each observational site are listed in Table 1, and the locations of the sites are shown in figure 1.
Figure 2 shows the WRF domain configuration used in this study. Eight experiments with the high-spatial-resolution WRF model were conducted to evaluate the ability of the model to reproduce near-surface meteorological diurnal cycles (temperature, wind speed, and humidity) and rainfall accumulation. Three domains with different resolutions are used in the simulation: d01 (10 km), d02 (3 km), and d03 (1 km), with 101x101, 160x154, and 169x178 grid points, respectively.
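To give a feel for the size of the three domains, the following minimal sketch multiplies the grid spacing by the number of grid points quoted above. It assumes the grid counts are given as west-east x south-north, which the text does not state, and it ignores grid staggering, so the extents are only approximate. The names DOMAINS and approx_extent_km are illustrative.

```python
# Domain settings as quoted in the text. The order of the grid counts
# (west-east x south-north) is an assumption, not stated in the source.
DOMAINS = {
    "d01": {"dx_km": 10.0, "nx": 101, "ny": 101},
    "d02": {"dx_km": 3.0, "nx": 160, "ny": 154},
    "d03": {"dx_km": 1.0, "nx": 169, "ny": 178},
}


def approx_extent_km(domain):
    """Approximate (width, height) in km as grid spacing times grid count."""
    return domain["dx_km"] * domain["nx"], domain["dx_km"] * domain["ny"]


for name, dom in DOMAINS.items():
    width, height = approx_extent_km(dom)
    print(f"{name}: about {width:.0f} km x {height:.0f} km")
```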

WRF experiments
The first LULC dataset is the default MODIS LULC [15], obtained from the MODIS database. The other is the MODIS LULC dataset for 2017, which we refer to as the modification (MODIS 2017) [16]. Atmospheric initial and boundary conditions were taken from the National Centers for Environmental Prediction (NCEP) GDAS/FNL 0.25 Degree Global Tropospheric Analyses and Forecast Grids (ds083.3) [17], and sea surface temperature (SST) data from the National Oceanic and Atmospheric Administration (NOAA) Optimum Interpolation 0.25 Degree Daily Sea Surface Temperature (OISST) Analysis, Version 2 [18]. The physical parameterization schemes applied in all domains are: (a) the WRF single-moment 6-class scheme for microphysical processes [19], (b) the Bougeault-Lacarrère (BouLac) scheme for the planetary boundary layer [12], (c) the MM5 similarity scheme for the surface layer, (d) the Noah LSM for land surface processes [20], and (e) the RRTMG scheme for both short-wave and long-wave radiation [21]. Cumulus parameterization, using the New Tiedtke scheme [22], was applied only in d01 and d02.
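The physics choices listed above could be written into the &physics section of a WRF namelist.input file. The sketch below renders such a section from a Python dictionary. The option numbers are an assumption based on the usual WRF namelist convention and are not given in the source, in particular the value chosen for the MM5 similarity surface layer, so they should be checked against the WRF Users' Guide for the version in use. The names PHYSICS and render_physics_block are illustrative.

```python
# Assumed mapping from the schemes named in the text to WRF namelist option
# numbers (one value per domain: d01, d02, d03). The source does not list
# these numbers; verify them against the WRF Users' Guide.
PHYSICS = {
    "mp_physics": [6, 6, 6],          # WRF single-moment 6-class microphysics
    "bl_pbl_physics": [8, 8, 8],      # BouLac planetary boundary layer
    "sf_sfclay_physics": [1, 1, 1],   # MM5 similarity surface layer (assumed variant)
    "sf_surface_physics": [2, 2, 2],  # Noah land surface model
    "ra_lw_physics": [4, 4, 4],       # RRTMG long-wave radiation
    "ra_sw_physics": [4, 4, 4],       # RRTMG short-wave radiation
    "cu_physics": [16, 16, 0],        # New Tiedtke on d01 and d02, off on d03
}


def render_physics_block(options):
    """Render a dictionary of per-domain options as a &physics namelist section."""
    lines = ["&physics"]
    for key, values in options.items():
        joined = ", ".join(str(v) for v in values)
        lines.append(f" {key:<20}= {joined},")
    lines.append("/")
    return "\n".join(lines)


print(render_physics_block(PHYSICS))
```

Keeping one value per domain makes the cumulus setting explicit: the scheme is active on the two coarser domains and switched off on the 1 km domain, as described in the text.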
The first WRF experiment in this study uses no urban scheme and is hereinafter denoted the WRF Bulk experiment; it represents the zero-order effects of urban surfaces. This option assumes common parameter values for the entire urban domain and ignores the variation in surface morphology between neighborhoods. The second WRF experiment uses the single-layer UCM, which was designed to represent urban geometry, including street canyons, walls, roofs, and roads. It ranks the various urban classes based on their thermal properties. The UCM calculates the anthropogenic heat using a fixed temporal profile that is added to the sensible heat flux from the street canyon. The third WRF experiment, referred to as the WRF BEP experiment, uses the building effect parameterization, a multilayer UCM that characterizes the impact of urban surfaces [23]. BEP was developed to capture the direct interaction of buildings with the planetary boundary layer. It considers the three-dimensional urban surfaces and accounts for the vertical exchange of heat, moisture, and momentum. This model estimates heat emissions from the canopy by considering the drag force, diffusion factor, and radiation properties. BEP also accounts for anthropogenic heat emission in the urban canopy. The fourth WRF experiment uses BEM, which works with the same options as BEP. BEM can accurately estimate anthropogenic heat emission and calculate the energy exchange between building interiors and the outside atmosphere. Experiments 5 through 8 used the same schemes, namely Bulk, UCM, BEP, and BEM, but with the MODIS 2017 LULC dataset.
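The eight experiments form a simple 2 x 4 matrix of LULC dataset and urban scheme, which can be enumerated as in the sketch below. The sf_urban_physics values (0 for no urban scheme, 1 for the single-layer UCM, 2 for BEP, 3 for BEP with BEM) follow the usual WRF namelist convention and are an assumption here, since the source does not quote them. The names URBAN_SCHEMES, LULC_DATASETS, and build_experiments are illustrative.

```python
from itertools import product

# Assumed sf_urban_physics values; not quoted in the source.
URBAN_SCHEMES = {"Bulk": 0, "UCM": 1, "BEP": 2, "BEM": 3}
LULC_DATASETS = ["MODIS default", "MODIS 2017"]


def build_experiments():
    """List the eight experiments in the order described in the text."""
    experiments = []
    # product() varies the last argument fastest, so experiments 1-4 use the
    # default LULC and experiments 5-8 use MODIS 2017.
    pairs = product(LULC_DATASETS, URBAN_SCHEMES)
    for number, (lulc, scheme) in enumerate(pairs, start=1):
        experiments.append({
            "number": number,
            "lulc": lulc,
            "scheme": scheme,
            "sf_urban_physics": URBAN_SCHEMES[scheme],
        })
    return experiments


for exp in build_experiments():
    print(exp["number"], exp["lulc"], exp["scheme"], exp["sf_urban_physics"])
```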

3. Results and discussion
The spatial distribution of the dominant land cover within domain 3 for MODIS default and MODIS 2017 is presented in figure 3. According to the simulation, 10 of the 17 LULC categories provided by MODIS occur in the study area: five natural vegetation classes (evergreen broadleaf forest, woody savanna, savanna, grassland, and permanent wetland), three human-altered classes (cropland, urban and built-up, and cropland/natural vegetation mosaic), and two non-vegetated classes (barren and water bodies). MODIS 2017 contains more urban grid cells than the default, and almost the entire study area is built-up land. The model's output shows that the number of grids [...]. Previous research has documented the changes in LULC over time, namely that almost all urban areas in Jakarta and surrounding cities have been converted to built-up land [24,25]. This change in LULC is related to an increase in temperature, with built-up land having a higher temperature than natural and vegetated land types [26].

Evaluation of LULC schemes
The RMSE of the simulated T2m, RH2m, WS, and PREC compared to the observations is shown in Table 2. For T2m, simulations of all schemes using the default LULC give a smaller RMSE than those using the MODIS 2017 LULC in urban areas (Kemayoran, Pondok Betung, and Curug); in contrast, in the suburban (Bogor) and rural (Citeko) areas, the RMSE is smaller for the MODIS 2017 LULC. The RMSE ranges from 0.2 to 2.3 degC and the MAE from 0.2 to 1.9 degC. All simulations agree well with the observations, with high correlation coefficients (above 0.92). For RH2m, simulations using the default LULC give a smaller RMSE than those using the MODIS 2017 LULC, except at the rural location. At the suburban location, the LULC update gives a larger error (5.1% to 12.6%). The urban schemes give different results at each location, with RMSE ranging from 4.1% to 13.2% and MAE from 3.3% to 12.9%. The correlation for RH2m is very strong (above 0.84) for all LULC datasets and urban schemes, except in the rural area with the MODIS default LULC, where it is strong (around 0.78). For WS, the RMSE with the MODIS 2017 LULC is higher except at the urban locations of Kemayoran and Pondok Betung. The difference in model performance among the eight experiments is quite small, with RMSE between 0.8 and 2.1 m/s. For precipitation, the RMSE varies for both LULC datasets. The lowest mean RMSE is in Kemayoran (17.7 mm) and the largest in Citeko (39.1 mm). The minimum average MAE is 8.8 mm and the maximum is 282.8 mm, with a correlation of approximately 0.28. This low correlation arises because the simulation period is at the beginning of the dry season, when the probability of rain is very low. Except in rural areas, the MODIS default LULC simulations have a lower RMSE, implying better performance than the MODIS 2017 LULC simulations. The slight difference in RMSE between the two LULC simulations is due to the lack of land-cover change in urban and suburban areas, whereas in rural areas the surrounding environment has changed from cropland to built-up land.
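The three statistics used in this evaluation (RMSE, MAE, and the correlation coefficient) can be computed from paired hourly series as in the minimal sketch below. It assumes the standard definitions and the Pearson correlation; the source does not state which correlation coefficient was used. The function name evaluate is illustrative, and pairs containing a missing value (NaN) are skipped.

```python
import math


def evaluate(simulated, observed):
    """Return RMSE, MAE and Pearson correlation for paired series.

    Pairs in which either value is NaN are ignored.
    """
    pairs = [
        (float(s), float(o))
        for s, o in zip(simulated, observed)
        if not (math.isnan(s) or math.isnan(o))
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")

    errors = [s - o for s, o in pairs]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    mean_s = sum(s for s, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in pairs)
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    r = cov / math.sqrt(var_s * var_o)

    return {"rmse": rmse, "mae": mae, "r": r}


# Synthetic example: a simulation that is uniformly 1 degC too warm.
stats = evaluate([27.0, 29.0, 31.0, 30.0], [26.0, 28.0, 30.0, 29.0])
print(stats)
```

A constant offset such as the one in the synthetic example leaves the correlation at 1 while the RMSE and MAE equal the offset, which is why the three statistics are reported together.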

Evaluation of urban schemes
The mean diurnal variations of observed and simulated T2m at the stations in the Jakarta Metropolitan Area during the simulation period, using the MODIS default LULC, are shown in figure 4. In general, all WRF experiments capture the diurnal variation of T2m. The diurnal variations of mean temperature at the three urban locations are similar. Compared to observations, the bulk scheme overestimates, while the UCM scheme underestimates. At the suburban and rural stations, the differences between the simulations become unclear, as all schemes underestimate. The wide variation among schemes in Kemayoran explains the large RMSE range there (Table 2). The UCM scheme seems to give the smallest error of all the schemes. The BEM scheme also gives a moderate error at urban locations, while at rural locations none of the urban schemes gives results close to the observed values. This may be because the urban physics option provided by the urban parameterization scheme is not used at grid points classified as non-urban, as in rural areas. The appropriate urban scheme depends strongly on an urban area's complexity [27]. The UCM simulation coupled to the Noah land surface model performed best in simulating urban meteorology in this tropical city, as it did in a semiarid urban environment [11].
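A mean diurnal cycle of the kind compared in figure 4 can be obtained by grouping the hourly values by hour of day and averaging each group over the month. The sketch below shows this under the assumption that the timestamps are already in local time; WRF output is normally in UTC and would first need shifting. The function name mean_diurnal_cycle and the synthetic values are illustrative and are not data from the study.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mean_diurnal_cycle(times, values):
    """Average hourly values by hour of day; returns {hour: mean}.

    Entries whose value is None are treated as missing and skipped.
    """
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        if v is not None:
            buckets[t.hour].append(v)
    return {hour: sum(vs) / len(vs) for hour, vs in sorted(buckets.items())}


# Synthetic two-day example: the second day is 1 degC warmer than the first.
start = datetime(2017, 5, 1, 0)
times = [start + timedelta(hours=h) for h in range(48)]
t2m = [25.0 + 0.1 * (h % 24) + (h // 24) for h in range(48)]
cycle = mean_diurnal_cycle(times, t2m)
print(cycle[0], cycle[12])
```

Applying the same function to the observed and to each simulated series gives curves that can be compared hour by hour.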
The second row in figure 4 shows the mean diurnal variations of observed and simulated RH2m. At urban locations, the WRF experiments generally underestimate the observations, while at rural locations they overestimate. At the suburban location, all urban schemes give similar results, underestimating during the day and overestimating at night. As with temperature, the UCM and BEM schemes give a fairly small RMSE for RH2m. Differences between schemes are clearer at night, while during the day all schemes reach almost the same value. The Bulk scheme gives larger errors than the other schemes (its RMSE is more than 10%).
The simulations follow the observed mean diurnal WS pattern more closely at the urban grid points than at the suburban and rural locations (figure 4, third row). However, the WRF model appears to underestimate WS during the day and overestimate it at night. Compared to the other urban schemes, the BEM scheme has the lowest RMSE. The model cannot reach the very high daytime WS observed in the suburban area (RMSE 1.9 m/s). Meanwhile, in rural areas, [...]
The WRF model's ability to simulate hourly accumulated precipitation (PREC) is shown in figure 5. The WRF model was unable to reliably reproduce the monthly accumulated precipitation: except in Kemayoran, where the BEP and BEM schemes underestimated, all experiments overestimated at all locations. The best RMSEs are 16.9 mm and 16.1 mm for the UCM and BEM schemes, respectively. This could be because most precipitation events during the dry season were stratiform, driven by synoptic-scale systems (easterly waves), and were unaffected by local factors such as topography and land cover [28]. Furthermore, because the physics options have not produced high accuracy in all scenarios, it may be necessary to use different parameterizations depending on geographic location to minimize the resulting bias [29].

Spatial distribution of the simulated weather
The spatial distribution of modeled mean T2m during the simulation period is shown in figure 6 for selected LULC datasets and urban schemes. The WRF simulations capture a higher temperature in the urban area (~26-33 degC) than in the surrounding suburban (~25-31 degC) and rural (~18-24 degC) neighborhoods, which characterizes the general condition of the urban heat island phenomenon. For the Bulk scheme with the MODIS 2017 LULC, the simulation exhibits a uniform air temperature across the city domain, which illustrates the urban heat island effect. Although the daytime temperature is the same throughout the urban domain, at night the temperature is higher only in part of the city. A slight urban heat island effect can also be seen in this experiment at the suburban location (Bogor). Not all areas with urban and built-up LULC show high temperatures; high temperatures occur only in the city center and in surrounding areas that seem to have a large urban fraction. Air temperature depends not only on land cover but is also affected by air advection [9].
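One simple way to summarize the urban and non-urban contrast seen in such a map is the difference between the mean T2m over urban grid cells and the mean over all other cells. The sketch below illustrates this; it is not a metric computed in the study. It assumes a land-use index grid in which the urban and built-up class has index 13, as in the MODIS IGBP legend, and it counts every other cell, including water, as non-urban. The names URBAN_CATEGORY and urban_minus_nonurban are illustrative, and the grid values are synthetic.

```python
URBAN_CATEGORY = 13  # assumed index of "urban and built-up" (MODIS IGBP legend)


def urban_minus_nonurban(t2m_grid, lu_index_grid, urban_category=URBAN_CATEGORY):
    """Mean T2m over urban cells minus mean T2m over all other cells."""
    urban, other = [], []
    for t_row, lu_row in zip(t2m_grid, lu_index_grid):
        for t, lu in zip(t_row, lu_row):
            if lu == urban_category:
                urban.append(t)
            else:
                other.append(t)
    if not urban or not other:
        raise ValueError("need at least one urban and one non-urban cell")
    return sum(urban) / len(urban) - sum(other) / len(other)


# Synthetic 2 x 2 grid: top row urban, bottom row cropland and grassland.
t2m = [[30.0, 31.0], [24.0, 26.0]]
lu_index = [[13, 13], [12, 10]]
print(urban_minus_nonurban(t2m, lu_index))
```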
Humidity follows the opposite pattern to temperature. Because the air temperature is higher during the day than at night, the humidity is lower during the day than at night. The highest RH2m is found early in the morning, before sunrise, when the temperature is at its lowest. Figure 7 shows the spatial distribution of RH2m, which is lower in urban areas than in the surrounding areas. RH2m in the city (~52-82%) is lower than in the surrounding suburbs (~56-84%) and rural area (~82-95%). The figure also shows that the humidity is higher to the south of the city center, which is a mountainous area.
Sea and land breezes have a strong influence on the study area, which is located near the coast [30]. However, the dominant influence of sea breezes can be hampered by high temperatures in urban areas, which cause urban areas to become low-pressure centers. This is noticeable during the day, when the prevailing wind does not blow consistently from north to south. Kemayoran, which is closest to the coast, has a higher WS (~0.9-3.8 m/s) than the other urban areas. The average WS is ~0.5-3.4 m/s in urban areas, ~0.5-2.9 m/s in the suburban area, and ~0.9-3.0 m/s in rural areas. The high WS in rural areas is influenced more by their location in the mountains (figure 8).

Comparison of schemes
Compared to the default, the simulation using MODIS 2017 shows a temperature increase. Except for the BEP scheme, all schemes overestimate in urban areas, whereas they underestimate in rural areas (figure 9a). In the suburban area, the difference between schemes with the default LULC is not substantial, and all simulations underestimate. With MODIS 2017, however, the pattern resembles that of the urban areas, where the Bulk, UCM, and BEM schemes overestimate. No single urban scheme gives the best results for all of the meteorological variables examined in this study. The UCM and BEM schemes nevertheless appear to be quite good at mapping urban conditions, particularly observed temperature. Different urban schemes do not give significantly different results in rural areas, but they can identify suburban areas with the potential to grow into large cities with higher temperatures than their surroundings.
In contrast to the temperature results, the BEP scheme produces RH2m simulations that are closer to the observed data. This could be because BEP was designed to capture the direct interaction of buildings with the planetary boundary layer and to calculate the vertical exchange of heat, moisture, and momentum (figure 9b). The wind speed and direction pattern is determined by the city's location and the dominant wind blowing at the time. Because the resolution used is insufficient to describe intra-urban conditions, these have not been seen clearly. However, the UCM scheme, which considers the urban canopy, can be used to overcome these challenges (figure 9c). As with other models, the use of urban schemes to simulate precipitation should take the timing of the simulation into account, because results are poor when there is very little rain (figure 9d).
Because the LULC at the study site did not change over time, the LULC modification in the model did not produce a significant difference. The LULC dataset can nevertheless provide information about urban areas that are expanding as a result of urbanization, and the LULC update can indicate how much temperature rise occurs in urban areas and how it affects climate change.

Impact of spatial resolution on the accuracy of WRF
The RMSE of the mean hourly T2m, observed and simulated, from all experiments for each domain (resolutions of 10 km, 3 km, and 1 km) is shown in figure 10. The results vary and do not indicate that the highest resolution gives the smallest RMSE; the outcome depends on the LULC dataset and urban scheme applied. For example, for the UCM scheme using the default MODIS LULC, the best simulation in Kemayoran and Bogor is at 1 km resolution, while the best simulation in Pondok Betung, Curug, and Citeko is at 10 km. For the BEM scheme with the default LULC, the smallest RMSE is obtained at 3 km in Kemayoran, at 10 km in Pondok Betung, Curug, and Citeko, and at 1 km in Bogor. When the resolution of the land-cover data is sufficient, higher resolution cannot improve model performance further [9]. Nevertheless, a higher resolution describes environmental conditions more realistically and is therefore valuable for subsequent environmental analysis and policy making.

4. Conclusion
This study evaluates the performance of the WRF model and its urban schemes over the Jakarta Metropolitan Area (Indonesia). The model's sensitivity to the use of different LULC datasets and spatial resolutions was also investigated. In urban areas, the RMSE for T2m, RH2m, and WS using the default MODIS LULC is smaller than with the modified MODIS 2017 LULC; conversely, in suburban and rural areas, the RMSE is smaller for the modified LULC. Changes in LULC in the surrounding area affect meteorological conditions in suburban and rural areas. Although all WRF experiments were able to capture the diurnal variation, UCM and BEM, which take urban geometry and anthropogenic heat into consideration, provide better results, particularly for T2m. The general state of the UHI can be described as a temperature pattern in which the developed region is warmer than the surrounding area both day and night. The inclusion of a comprehensive urban scheme, as well as the LULC update, aids high-resolution modeling in providing more accurate weather predictions, particularly in urban areas.