A new detailed long-term hydrometeorological dataset: first results of extreme characteristics estimations for the Russian Arctic seas

A detailed long-term hydrometeorological dataset for the Russian Arctic seas has been created for 1980-2016 by hydrodynamic modelling with the regional nonhydrostatic atmospheric model COSMO-CLM. This paper presents the techniques used to evaluate the long-term experiments and a preliminary analysis of the dataset. The experiments were conducted for a model domain including the Barents, Kara, and Laptev Seas, with a 12-km grid. Many test experiments were evaluated to determine the best model configuration, which includes the new model version 5.06, the “spectral nudging” technique, and the ERA-Interim reanalysis as forcing data. A reinitialization scheme with additional “assimilation” of soil properties from reanalysis data is suggested to avoid possible errors, particularly those due to soil draining in the model. The primary assessment has shown that the wind speed climatology based on the COSMO-CLM experiments is very close to the ERA-Interim pattern, apart from many details of the wind speed distributions in different Arctic regions. At the same time, the frequencies of high wind speeds based on the COSMO-CLM data are higher than those in ERA-Interim, especially over the Barents Sea, the Arctic islands (Novaya Zemlya), and some seacoasts and mainland areas. Regional details are manifested in a wind speed increase of up to 0.5-1 m/s, which is well marked for large lakes and orography, as well as over the polar region. At the same time, there is a mesoscale wind speed decrease compared to the ERA-Interim data for the Pechora and Laptev Sea coasts and the New Siberian Islands. A comparison of two periods (1980-1990 and 2010-2016) has shown that the spatial distributions of high wind speed frequencies are very similar, but there are some differences in detail. The frequencies of wind speeds above 17.2 and 20.8 m/s decreased in the later period over Novaya Zemlya, southwest of Svalbard, the Northern Atlantic, and continental Middle Siberia; at the same time, they increased between Franz Josef Land and Severnaya Zemlya and in the polar regions. The preliminary assessment of the results has revealed that the dataset is promising for the analysis of regional wind speed regimes and the estimation of severe wind speed risks. The next steps of this work are to run simulations on a 3-km grid and to collaborate with the scientific community to make full use of this dataset.


Introduction
The Arctic climate and extreme weather events attract increasing attention because of the Arctic amplification of global warming and the accompanying environmental changes. It is well known that the Arctic is the region most sensitive to global climate change; in particular, the temperature increase here is the most intense on the globe [1,2,3]. The Arctic climate system has many complex feedbacks manifested in different atmospheric circulation features and diverse regional anomalies of opposite signs [1, 4-7]. Arctic warming exceeds the "global warming" signal and results mainly from dynamic processes in the atmosphere that provide poleward heat advection [8,9].
It is well known that many severe atmospheric processes develop at mesoscales, i.e. with typical sizes of the order of the first tens of km, which corresponds to meso-γ and meso-β scale processes [10]. Usually, the most severe hydrometeorological events are caused by a combination of large-scale hydrodynamical conditions, surface properties, and mesoscale circulations (e.g., polar lows), and are often an essential part of synoptic-scale systems [11-14]. Severe events have a devastating impact on coastal ports, transport infrastructure, and offshore oil and gas production facilities, and lead to significant and costly damage and, occasionally, to human casualties. At the same time, information about the spatial structure of these features is very scarce, because the Arctic is one of the regions of the world least covered by a ground observation network. Satellite data are very useful as a source of spatial information, but they have not yet reached the level of reliability and detail required to reconstruct the 3D structure, and they are not regular in time.
The insufficiency of the existing observation network creates a need for reanalysis data that address mesoscale processes. Indeed, reanalysis is the only tool that provides long-term hydrometeorological fields uniformly detailed in space and time. However, the global reanalyses, ERA-Interim [15], NCEP/NCAR [16], MERRA [17], ERA-5 [18], and NCEP-CFSR [19], have too coarse a spatial resolution to adequately reproduce many regional features that affect extreme events caused by mesoscale circulations and convection. The Arctic System Reanalysis (ASR), v1 and v2 [20,21], is worth mentioning as the only current example of a pan-Arctic regional reanalysis on long-term timescales (2002-2012). It was obtained by dynamical downscaling of ERA-Interim data using a polar version of the well-known regional atmospheric model WRF [22] for a domain covering the whole Arctic with a 30-km grid (version 2 has a 15-km grid). For example, it was established [23] that ASR reproduces polar lows more adequately than global reanalyses. However, even the 15-km grid only allows one to reproduce features with a horizontal scale of about 50 km and more; it certainly excludes from the analysis a broad spectrum of severe events of meso-β and particularly meso-γ scales.
Considering the increase in the number of severe events, Arctic coastal development, and the prospects of the Northern Sea Route, the task of providing the region with detailed long-term hydrometeorological and climatic information with horizontal scales of at least several km becomes more pressing. Regional climate modelling can help to solve this problem. Long-term simulations allow one to obtain better justified estimates of the current regional and mesoscale Arctic climate changes, as well as of the frequency of extreme weather events. The main goal of this work was to create such a detailed Arctic dataset. This paper describes the first stage of these simulations, covering the western Russian Arctic with a ~12-km horizontal grid. Creating the next-stage dataset with a ~3-km horizontal grid is planned for the near future.

Model description
The COSMO-CLM model (ver. 5.06) was used as the main tool for creating this long-term meteorological archive. COSMO-CLM is the climate version of the well-known non-hydrostatic mesoscale atmospheric model COSMO, with some modifications and extensions adapted to long-term numerical experiments. It is developed by the German Weather Service (DWD) and the CLM-Community [24-26]. The COSMO-CLM model is based on the primitive Navier-Stokes equations describing the dynamics of a compressible fluid in a moist atmosphere. The model equations are solved on a rotated latitude-longitude grid (λ, φ) with a tilted pole, which helps to minimize the problem of the convergence of meridians near the pole. The numerical scheme is implemented on an Arakawa C-grid [27], and the vertical coordinate is a terrain-following hybrid Gal-Chen coordinate µ (σ-z system), which is an analogue of the σ-coordinate from the surface (Z_0) up to an intermediate level Z_F and a simple Z-coordinate above the Z_F level [28,29]. This representation avoids problems associated with surface heterogeneity.
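As an illustration of the hybrid vertical coordinate described above, the following minimal Python sketch maps a coordinate value µ to a geometric height. It assumes a simple linear terrain-following form below Z_F, which is one common way to write a Gal-Chen-type coordinate; the exact formulation used in COSMO-CLM is given in [28,29] and may differ. The function name `level_height`, the terrain height, and the value of Z_F are hypothetical.

```python
import numpy as np

def level_height(mu, h, z_f):
    """Geometric height of a coordinate surface mu over terrain of height h.

    Assumed form (not taken from the model source):
      below z_f: z = mu + h * (1 - mu / z_f)   (terrain-following, sigma-like)
      above z_f: z = mu                        (plain height coordinate)
    """
    mu = np.asarray(mu, dtype=float)
    return np.where(mu < z_f, mu + h * (1.0 - mu / z_f), mu)

# Hypothetical values: 500 m terrain, interface level at 11 km.
mu_levels = np.array([0.0, 5500.0, 11000.0, 15000.0])
heights = level_height(mu_levels, h=500.0, z_f=11000.0)
print(heights)
```

In this sketch the lowest surface follows the terrain exactly, the terrain influence decays linearly with height, and the surfaces become flat at and above Z_F.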
The standard configuration of the COSMO-CLM model includes the Runge-Kutta integration scheme with fifth-order advection. The Ritter and Geleyn radiation scheme [30] is based on the δ two-stream version of the radiative transfer equation. Precipitation formation is described by a bulk microphysics parameterization, and Tiedtke mass-flux schemes with an equilibrium closure based on moisture convergence are used for moist and shallow convection [31]. Turbulence is described by a prognostic TKE-based scheme with a 2.5-order closure [32]; Smagorinsky diffusion is included. There is an option of applying the spectral nudging technique [33]. A full description of the COSMO model physics, dynamics, and parameterizations is available in [34].

Experimental design
The general scheme of the long-term experiments, and hence of the test experiments, is as follows. The 'dynamical downscaling' ('nesting domains') technique is applied to obtain detailed meteorological fields for 1980-2016 with an hourly time increment. The external parameters describing surface properties, including orography, soil properties, LAI, NDVI, etc., were obtained with the EXTPAR v.5.2.1 tool [35] from GLOBE (surface orography), MODIS (soil properties and albedo), ECOCLIMAP (forests and plant cover, root depth, land fraction, and many others), and many other datasets.
The first stage of the model runs is executed over a coarse-resolution base domain (grid step 0.108°, ~12 km) covering most of the Russian Arctic, with global reanalysis data as driving conditions. At the second stage, the output of these experiments will be interpolated and used as initial and driving conditions for model runs over small fine-resolution domains (0.03°, ~3 km grid) for three key regions of the Russian Arctic: the Barents, Kara, and Laptev Seas. The model domains are shown in Figure 1. The second stage will be carried out in the near future.
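The correspondence between the grid steps in degrees and in kilometres can be checked with a short calculation. The Python sketch below converts an angular step to an arc length on a sphere; the mean Earth radius of 6371 km and the function name `grid_step_km` are assumptions made for illustration, and the result applies near the equator of the rotated grid, where the cells are nearly square.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from the paper

def grid_step_km(step_deg, radius_km=EARTH_RADIUS_KM):
    """Arc length (km) of a grid step given in degrees along a great circle."""
    return math.radians(step_deg) * radius_km

base_km = grid_step_km(0.108)   # base domain step
nested_km = grid_step_km(0.03)  # nested domain step
print(f"{base_km:.1f} km, {nested_km:.1f} km")  # prints "12.0 km, 3.3 km"
```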
Many test experiments over the base domain, which includes the Barents, Kara, and Laptev Seas, were evaluated to determine the best model configuration; they covered August-September 2015 (summertime) and December 2012-January 2013 (wintertime). As a result, the model errors were reduced in the new model version 5.06 compared to version 5.0 owing to refined turbulence parameterizations, as well as by using the "spectral nudging" technique. Also, there were no significant differences between the ERA-Interim and the new, more detailed ERA5 reanalyses as driving conditions; therefore, the final experiments were performed using the ERA-Interim reanalysis. A more detailed description of the test experiments and their results is given in [36].
An important feature was implemented specifically for the long-term experiment runs. It is well known that regional climate simulations can accumulate systematic errors associated with the so-called 'long-term' memory, which leads to biases because the model soil properties adapt incorrectly to changing atmospheric conditions [37]. A scheme of additional "assimilation" of the soil properties from the ERA-Interim global reanalysis was suggested to avoid possible error growth, particularly due to soil draining in the model. Every monthly run was started from a combined data file that merged the most important atmospheric and surface variables from the last model output file with the main deep-soil variables from the reanalysis. The atmospheric and surface variables included wind speed, temperature, precipitable water, vertically integrated cloud content of different phases, the temperatures of the soil surface and of the 1st and 2nd soil layers, lake parameters, ice temperature and sea ice thickness, and the deviation from the reference pressure. The temperature and water content of the remaining soil layers (from the 3rd to the last, 9th, layer) were taken from the standard reanalysis file interpolated onto the model grid. This merged file was used as the initial forcing of the monthly run. At the end of the monthly run, the last model output file was again combined with the reanalysis data for the next month to form the next initial file, which initiated the next monthly run, and so on for many months in sequence. In other words, the reanalysis soil data were used as initial and boundary conditions, i.e. they reinitialized the model every month. This scheme could introduce some biases by changing the initial data each month, because the deep-soil conditions at the end of a monthly run were, generally speaking, slightly different from the corresponding reanalysis data.
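The merging step of the reinitialization scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual processing chain: the dictionary keys (`soil_temperature`, `soil_water`, `t_2m`), the array layout (layer, y, x), and the function name `build_restart_state` are hypothetical, and the sketch treats soil water content in layers 1 and 2 in the same way as soil temperature, which the text does not state explicitly.

```python
import numpy as np

N_SOIL_LAYERS = 9      # the text mentions soil layers 1 to 9
MODEL_SOIL_LAYERS = 2  # layers 1 and 2 are kept from the model state

def build_restart_state(model_last, reanalysis_next):
    """Merge the last model output with deep-soil fields from reanalysis.

    Atmospheric and surface fields and the two upper soil layers come from
    the model; soil layers 3..9 come from the reanalysis, assumed to be
    already interpolated onto the model grid.
    """
    state = {name: field.copy() for name, field in model_last.items()}
    for name in ("soil_temperature", "soil_water"):
        # Indices 2..8 correspond to layers 3..9.
        state[name][MODEL_SOIL_LAYERS:] = reanalysis_next[name][MODEL_SOIL_LAYERS:]
    return state

# Tiny synthetic example on a 2 x 2 grid.
shape = (N_SOIL_LAYERS, 2, 2)
model_last = {
    "soil_temperature": np.full(shape, 270.0),
    "soil_water": np.full(shape, 0.10),
    "t_2m": np.full((2, 2), 265.0),
}
reanalysis_next = {
    "soil_temperature": np.full(shape, 272.0),
    "soil_water": np.full(shape, 0.20),
}
state = build_restart_state(model_last, reanalysis_next)
```

In a monthly chain, the returned state would initialize the next run, and the same merge would be repeated with the reanalysis fields of the following month.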
Test experiments were carried out for June-July and November-December of 2010 and 2013 to show that this scheme does not introduce noise above the random level. These periods were chosen because of their significantly different background soil conditions (extremely dry in the summer of 2010 and normal in 2013) and different seasons, with 1 month of 'cold start'. There were three series of experiments: standard continuous runs for 3 months; runs with the reinitialization scheme for 3 months; and long-term continuous runs for a year. Verification of these runs did not reveal any significant differences in errors between the 2010 and 2013 periods of the corresponding seasons. This allows one to conclude that the suggested reinitialization scheme does not worsen the simulation results and can be applied in the long-term simulations. At the same time, the scheme brought no improvements, which could indicate a minor role of soil drainage over the Arctic region in long-term model simulations. This can be explained by the climatically low soil moisture content in boreal regions. Finally, the suggested reinitialization scheme was used in the long-term simulations.
All experiments were run using the shared HPC research facilities of Lomonosov Moscow State University, on the "Lomonosov-2" supercomputer [38]. The final long-term sequential experiments over the base domain took about 8 months on 144 nodes of the MSU Supercomputer Complex "Lomonosov-2" and produced more than 120 TB of data, excluding many secondary files. About a hundred different hydrometeorological parameters are present in the output files, including surface quantities as well as three-dimensional quantities within the atmosphere and soil. The list of main variables is given in the Appendix.
ENVIROMIS 2020, IOP Conf. Series: Earth and Environmental Science 611 (2020) 012044, IOP Publishing, doi:10.1088/1755-1315/611/1/012044
The simulations were computationally demanding and were finished only recently; therefore, so far we have made preliminary estimates of the surface wind speed and temperature for two periods only, 1980-1990 and 2010-2016, which are presented below.

Results and discussion
Assessment of the obtained dataset included a comparison with the driving dataset, i.e., the ERA-Interim reanalysis (~0.75° grid). This comparison makes it possible to estimate the 'added value' of regional climate modelling and to show the advantages and, in some cases, the limitations of this method. The primary assessment was made for the surface wind and temperature characteristics for 1980-1990 and 2010-2016, which allowed us to estimate some changes in the Arctic region using the ~12-km grid dataset.
The average 10-m wind speed and the frequencies of extreme wind speeds above the 17.2 and 20.8 m/s thresholds (corresponding to "gale" and "strong gale", numbers 8 and 9 on the Beaufort scale) according to ERA-Interim and the COSMO-CLM dataset were compared spatially.
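The frequency of wind speeds above a threshold can be computed per grid cell as the fraction of time steps that exceed it. The Python sketch below illustrates this on a small synthetic array; the array layout (time, y, x), the function name `exceedance_frequency`, and the use of a strict "greater than" comparison are assumptions, since the text does not specify how the frequencies were computed.

```python
import numpy as np

GALE = 17.2         # m/s, Beaufort 8 threshold from the text
STRONG_GALE = 20.8  # m/s, Beaufort 9 threshold from the text

def exceedance_frequency(wind_speed, threshold):
    """Fraction of time steps with wind speed above threshold, per grid cell.

    wind_speed is assumed to have shape (time, y, x).
    """
    return (wind_speed > threshold).mean(axis=0)

# Synthetic example: 4 time steps on a 1 x 2 grid.
wind = np.array([
    [[5.0, 18.0]],
    [[21.0, 19.0]],
    [[10.0, 25.0]],
    [[17.2, 3.0]],
])
freq_gale = exceedance_frequency(wind, GALE)
freq_strong = exceedance_frequency(wind, STRONG_GALE)
mean_speed = wind.mean(axis=0)
```

With hourly output, multiplying such a frequency by the number of hours in the period gives the number of hours above the threshold.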
Comparing the average wind speed for 1980-1990 (Figure 2), we can see good agreement between the two datasets in the general patterns, which indicates that the COSMO-CLM dataset reproduces large-scale processes well. The maximal wind speeds occupy the western Atlantic, the Barents Sea, and part of the Kara Sea (above 7 m/s), including the northern coastal part of Western Siberia and Taymyr (above 5-6 m/s). The mainland regions of minimal wind speeds, Eastern Siberia and Scandinavia, are also well captured by the COSMO-CLM dataset. However, there are many physically reasonable mesoscale differences, e.g. wind speed acceleration over western Svalbard, Severnaya Zemlya, and especially the north of Novaya Zemlya. The latter is evidently associated with the better resolution of the local downslope wind, the Novaya Zemlya bora, by the COSMO-CLM model. The regional mainland and coastal differences are attributed to lakes (Ladoga, Onega, and other Karelian lakes), the detailed coastline, and relief. The most striking examples are the Taymyr and Kola Peninsulas, the Subpolar Urals, and the Eastern Siberian highlands. At the same time, there is a mesoscale wind speed decrease compared to the ERA-Interim data over the Pechora and Laptev Sea coasts and the New Siberian Islands. These features suggest that the COSMO-CLM model reproduces the large scale adequately while capturing many regional mesoscale circulations and revealing patterns associated with a more detailed surface description. There are no significant differences from the 2010-2016 patterns (not shown), except for wind speed enhancement over the northern parts of the Barents and Kara Seas and south of Svalbard, which is captured by both the ERA-Interim and COSMO-CLM datasets; however, COSMO-CLM shows more detail in these patterns. At the same time, the Pechora and Yamal coasts show lower wind speeds in the COSMO-CLM dataset during both periods.
An analysis of the frequencies of wind speeds above 17.2 m/s (not shown) has shown that the differences mentioned for the average wind speed become more significant, especially over Svalbard, Severnaya Zemlya, the Putorana Plateau, and Tiksi Bay. The most striking increase is southwest of Svalbard and over Novaya Zemlya. The increase in the frequency over the White Sea in the COSMO-CLM dataset is probably due to the better resolution of this relatively small sea by the mesoscale model. These features are caused by the mesoscale processes permitted by the COSMO-CLM model, which demonstrates the key advantage of regional mesoscale modelling of extreme mesoscale events. Since the model configuration was successfully verified previously, such estimates contribute significantly to the 'added value' of regional modelling.
The same analysis for the 20.8 m/s threshold (Figure 3) has shown that the main maxima are located in the same places, although the Barents Sea and Northern Atlantic areas are reduced. The COSMO-CLM dataset has a spottier pattern over the Barents Sea, and the frequencies between Svalbard and Franz Josef Land are lower. The other differences and relations between the ERA-Interim reanalysis and the COSMO-CLM dataset are the same as mentioned above.
The two periods, 1980-1990 and 2010-2016, were also compared (Figure 4). For both threshold values, the areas of higher frequencies over Novaya Zemlya shrink, and the Svalbard maxima become lower but spread over a larger area. The northern Atlantic maxima shift further southwest, beyond the domain boundaries. Gale winds become more frequent in the western Barents Sea, whereas the opposite pattern occurs in its eastern part. Gale wind speeds become more frequent in 2010-2016 only over the waters of Severnaya Zemlya and Franz Josef Land and in the polar area; at the same time, they become less frequent over the continental areas of Middle Siberia.
The same analysis was performed for the 2-m temperature fields. The spatial patterns of the average temperature and of the 1 % (coldest) and 99 % (warmest) percentiles were analyzed. The average temperatures for 1980-1990 (Figure 5) in both the ERA-Interim and COSMO-CLM datasets are consistent with the well-known large-scale climatological patterns. A more detailed analysis revealed a slight poleward shift of the 'near-zero' area in the COSMO-CLM dataset, especially over the Barents and Kara Seas. The reason for this displacement could be the reproduction of the sea-ice margin; however, the source of sea-ice variables for the COSMO-CLM model was ERA-Interim, which were only [...] The pattern in recent years (2010-2016, Figure 6) changed remarkably, including an overall increase in the background temperatures and a significant eastward and poleward shift of the 'near-zero' area. These features could be attributed to the Arctic amplification of global warming. Significant changes affected the islands, the Norwegian Sea, and inland areas; at the same time, most coastal areas did not undergo significant changes. In general, the temperature gradient over the Arctic region decreased significantly, except for Scandinavia, the Atlantic, and the western Barents Sea. The eastern Siberian inland patterns did not change at all, except for increasing temperatures over the southernmost areas.
For the Arctic region, minimal temperatures and their spatial and temporal changes are the most relevant; therefore, the 1 % temperature percentile is of particular interest. Its patterns (Figures 7 and 8) show the same features as mentioned above, but in most cases they are less pronounced. The relief-related differences become more pronounced, especially in Eastern Siberia, and are almost absent in ERA-Interim. Overall warming is observed, including the western sea areas and the polar region. Regional features are prominent over Lakes Onega and Ladoga and the western Kara Sea; there is slight warming of the islands and of the Eastern Siberian valleys, and a clearer differentiation between ridges and valleys. In most cases, the regional and subregional differences are caused by a more detailed reproduction and resolution of the coasts, relief, and mesoscale processes by the COSMO-CLM model. The reliability of these features is supported by the preceding detailed verification and by the agreement of the large-scale patterns with the recognized ERA-Interim reanalysis dataset.
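The temperature statistics used in this section (the mean and the 1 % and 99 % percentiles) can be obtained per grid cell from a time series of 2-m temperature fields. The Python sketch below shows one way to do this with NumPy; the array layout (time, y, x), the function name `temperature_summary`, and the default linear interpolation between ranked values are assumptions, since the text does not describe the percentile method.

```python
import numpy as np

def temperature_summary(t2m):
    """Mean, 1st and 99th percentiles along the time axis of a (time, y, x) array."""
    mean = t2m.mean(axis=0)
    p01, p99 = np.percentile(t2m, [1, 99], axis=0)
    return mean, p01, p99

# Synthetic example: 101 time steps with values 0, 1, ..., 100 in a single cell.
t2m = np.arange(101, dtype=float).reshape(101, 1, 1)
mean, p01, p99 = temperature_summary(t2m)
print(mean[0, 0], p01[0, 0], p99[0, 0])  # prints "50.0 1.0 99.0"
```

Differences between two periods can then be mapped by applying the same function to each period and subtracting the resulting fields.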

Conclusions and perspectives
The above preliminary assessment of the COSMO-CLM dataset with a ~12-km grid has revealed that the wind speed and temperature patterns are very close to the ERA-Interim reanalysis data, although many details over the Arctic regions are reproduced that do not appear in the global dataset. The frequencies of high wind speeds based on the COSMO-CLM dataset are generally higher than those in ERA-Interim, especially over the Barents Sea, Novaya Zemlya, and some inland areas. The two periods (1980-1990 and 2010-2016) have similar wind speed patterns; however, there are many details that can be attributed to regional climate changes. Similar but slighter changes are observed in the mean and extreme temperature patterns, which are probably caused by the Arctic amplification of global warming. Therefore, the COSMO-CLM dataset is promising for the analysis of regional wind speed and temperature climatology and for the estimation of extreme values.
Further tasks are to provide open access to the data, which is difficult because of the huge dataset volume, and to collaborate with the scientific community to make full use of this dataset. It could provide new, more thorough, and better justified estimates of current regional climate changes and extreme weather events. The data can be used for environmental studies and research on modern environmental change, as well as for scientific applications such as forcing in the modelling of ocean characteristics (wind waves and dynamics) and coastal ecosystems (turbulent heat and moisture fluxes, greenhouse gases), experiments and more detailed studies of individual phenomena on nested domains (extreme situations, hazardous weather events, etc.), analysis of trends in the frequency of extreme events and of their spatial distributions, and the climatology and tracking of polar mesocyclones.