Data-driven prediction of mean wind turbulence from topographic data

This study presents a data-driven model to predict mean turbulence intensities at desired generic locations, for all wind directions. The model, a multilayer perceptron, requires only information about the local topography and a historical dataset of wind measurements and topography at other locations. Five years of data from six different wind measurement mast locations were used. A k-fold cross-validation evaluated the model at each location, where four locations were used for the training data, another location was used for validation, and the remaining one to test the model. The model outperformed the approach given in the European standard, for both performance metrics used. The results of different hyperparameter optimizations are presented, allowing for uncertainty estimates of the model performances.


Introduction
Wind turbulence, in the atmospheric boundary layer, is an important phenomenon in the design of civil structures for both static and dynamic wind loads, and for the safe operation of transport vehicles. It arises from both mechanical and thermal sources. Frictional forces between the moving air and the Earth's surface are the main drivers of atmospheric turbulence and are closely linked to the local topography. Thermal sources such as surface heating/cooling and downbursts can also cause turbulence in the atmosphere by convection.
Measuring the wind properties at some desired locations can be challenging, despite promising advances in remote sensing [1][2][3][4]. Cheynet et al. [5] showed a high heterogeneity of wind turbulence in a fjord with the wind direction, which can significantly impact the design of wind-sensitive bridges and other man-made structures. In these situations, wind measurements, when available, are often only found at nearby locations. If there is enough diversity in the topography of the available measurement locations and sufficient wind data is available, it is in principle possible to use machine learning to learn the complex effects that the upstream topography has on the wind turbulence.
Artificial neural networks (ANNs) (see e.g. [6,7]) can be of different types. Among them, multilayer perceptrons [8,9] have been used in many problems in atmospheric sciences [10]. They have been used, for example, to predict wind speeds from ocean surface images [11] and to effectively identify topographic features such as water bodies, hills and vegetation [12]. They are thus deemed adequate for the simplified vector inputs used in this study, despite a broader support for e.g. convolutional neural networks and transformers in more challenging computer vision tasks [13,14]. To the authors' knowledge, this study is the first attempt to use topographic information to predict mean wind turbulence intensities at new locations, without explicitly parametrizing the topography. Parametric models representing terrain effects are inherently imperfect and are based on numerous simplifications and difficult assessments in an attempt to systematically represent a complex terrain. They were previously proposed in e.g. the Eurocode [15], Engineering Science Data Unit [16] and Bitsuamlak et al. [17]. Other studies [18][19][20] model the dependencies between wind measurements at different locations and predict wind speeds, but are unable to predict mean wind characteristics at new locations where no measurements were available, given only information about the local topography. Bodini et al. [21] predict the turbulent kinetic energy dissipation rate while condensing the effects of the upstream topography into two variables, namely the standard deviation of the terrain elevation and the mean vegetation height, but also test their model at previously trained locations.
IOP Publishing, doi: 10.1088/1757-899X/1201/1/012005
The model developed in this study is trained, validated and tested using measured along-wind turbulence intensities that are averaged within 1-degree-wide wind direction sectors, here denoted sectoral averages, and the topographic data associated with each sector, for each measurement mast location. The model hyperparameters were optimized after each iteration of a so-called k-fold cross-validation, and uncertainty estimates were provided for the model performance on each tested location.
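The sectoral averaging described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the sample values and the choice of sectors spanning [k, k+1) degrees are assumptions.

```python
import math
from collections import defaultdict

def sectoral_averages(observations, sector_width_deg=1.0):
    """Average turbulence intensities within wind direction sectors.

    observations: iterable of (direction_deg, turbulence_intensity) pairs,
    one per 10-minute interval. Returns {sector_index: mean intensity};
    sectors without observations are simply absent.
    ASSUMPTION: sector k covers directions in [k, k + 1) degrees.
    """
    n_sectors = int(round(360.0 / sector_width_deg))
    sums = defaultdict(float)
    counts = defaultdict(int)
    for direction_deg, intensity in observations:
        # Wrap the direction into [0, 360) and find the sector it falls in.
        wrapped = direction_deg % 360.0
        sector = int(math.floor(wrapped / sector_width_deg)) % n_sectors
        sums[sector] += intensity
        counts[sector] += 1
    return {s: sums[s] / counts[s] for s in sums}

# Hypothetical 10-minute observations: (direction in degrees, intensity).
obs = [(10.2, 0.10), (10.8, 0.14), (359.5, 0.08), (-0.5, 0.12)]
averages = sectoral_averages(obs)
```

Sectors with no observations are left out rather than filled, which is consistent with each mast contributing "up to" 360 samples.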

Wind measurement data
Five years of wind data, between 2015 and 2020, from six measurement masts in the region around the Bjørnafjord, in Norway, are used. The locations and names of these masts are shown in Figure 1.
Each mast has three sonic anemometers (model: Gill WindMaster Pro) that measure the three components of the wind with a sampling frequency of 10 Hz. The anemometers are located at 13, 33 and 48 meters above ground. To avoid measurements affected by smaller nearby obstacles such as trees and buildings, which are not represented in the topographic data, only the data recorded at 48 m height was used, for simplicity. Thus, the turbulence intensities being predicted at the different locations also refer to a 48 m height above ground. The data is pre-processed to address faulty and missing data. An outlier detection is performed through a Z-score analysis, where the 99.99% most probable data is kept.
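A Z-score filter of this kind could look like the sketch below. It assumes a two-sided threshold derived from a normal distribution (about 3.89 standard deviations for 99.99%); the source does not state which quantity the filter was applied to, so the function name and the sample data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def zscore_filter(samples, keep_probability=0.9999):
    """Keep samples whose Z-score lies within the central
    keep_probability mass of a normal distribution (two-sided)."""
    # Two-sided threshold: about 3.89 standard deviations for 99.99 %.
    z_max = NormalDist().inv_cdf(0.5 + keep_probability / 2.0)
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0.0:
        return list(samples)  # all samples identical, nothing to reject
    return [x for x in samples if abs(x - mu) / sigma <= z_max]

# Hypothetical wind speed samples (m/s) with one spurious spike.
samples = [8.0 + 0.1 * (i % 5) for i in range(200)] + [80.0]
cleaned = zscore_filter(samples)
```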
For each 10-minute interval in the five-year period, the mean wind speed, the mean wind direction and the along-wind turbulence intensity are recorded for each anemometer at 48 m height, when available. A threshold of 5 m/s was adopted and observations with smaller mean wind speeds were discarded. High threshold values require more data but help to remove turbulence observations that are not likely governed by friction, but by e.g. local thermal effects.
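As an illustration, the three 10-minute statistics and the 5 m/s threshold could be computed as sketched below. The along-wind turbulence intensity is taken here as the standard deviation of the along-wind velocity component divided by the mean wind speed, which is the usual definition. The function names, the use of horizontal components only and the meteorological direction convention are assumptions, not details given in the source.

```python
import math
from statistics import mean, pstdev

def ten_minute_statistics(u_east, u_north):
    """Mean speed, mean direction and along-wind turbulence intensity
    from the horizontal velocity samples of one 10-minute interval.

    u_east, u_north: velocity components (m/s) of the air motion towards
    east and north. ASSUMPTION: the direction is the one the wind blows
    FROM, in degrees clockwise from north.
    """
    mean_e, mean_n = mean(u_east), mean(u_north)
    mean_speed = math.hypot(mean_e, mean_n)
    if mean_speed == 0.0:
        raise ValueError("mean wind vector is zero; direction undefined")
    # Unit vector of the mean flow; the along-wind component of each
    # sample is its projection onto this vector.
    ex, ey = mean_e / mean_speed, mean_n / mean_speed
    along = [e * ex + n * ey for e, n in zip(u_east, u_north)]
    direction_from = (math.degrees(math.atan2(mean_e, mean_n)) + 180.0) % 360.0
    turbulence_intensity = pstdev(along) / mean_speed
    return mean_speed, direction_from, turbulence_intensity

def keep_interval(mean_speed, threshold=5.0):
    """Discard intervals whose mean wind speed is below the threshold."""
    return mean_speed >= threshold
```

For example, samples alternating between 9 and 11 m/s towards the east give a mean speed of 10 m/s, a wind direction of 270° (from the west) and a turbulence intensity of 0.1.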

Topographic data
The Norwegian mapping authority provides freely accessible Digital Terrain Models of Norway [22]. A 10 × 10 meter resolution model was used (DTM 10), consistently represented in the map projection system UTM 33. For each mast and for each 1-degree-wide wind sector, a 10 km long upstream terrain profile aligned with the wind was obtained. Note that a 10 km fetch is also suggested in NS-EN 1991-1-4:2005+NA:2009 NA.4.3.2(2) (901.1). The heights above sea level of 45 points along the profile, at the upstream distances [0, 10, 30, 60, 100, 150, …, 9900] (meters), were collected into a normalized terrain profile vector, where for each single point a min-max normalization is performed from that point's extreme values (for all masts and directions), as exemplified in Figure 2. Note the linearly increasing distance between points. This decrease in resolution assumes that, far upstream, only larger topographic features still affect turbulence (see e.g. [15], NA.4.3.2(2) (901.2)). Different sizes of the terrain profile vector, between 15 and 60 points, were also tested, with roughly similar results. To consider the effect of the different categories of terrain roughness, a roughness vector was added to the data used. Two categories were considered, sea and ground, normalized into a binary vector, but more terrain categories could be included.
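The quoted distances (0, 10, 30, 60, 100, 150, …, 9900 m) are consistent with gaps that grow by 10 m from one point to the next, which gives exactly 45 points. The sketch below reproduces them under that assumption and illustrates the per-point min-max normalization; the function names are hypothetical.

```python
def upstream_distances(n_points=45, step=10.0):
    """Distances whose spacing grows linearly: 0, 10, 30, 60, 100, ..."""
    # The k-th distance is step * (1 + 2 + ... + k) = step * k * (k + 1) / 2.
    return [step * k * (k + 1) / 2.0 for k in range(n_points)]

def min_max_normalize_columns(profiles):
    """Normalize each profile point with the extremes of that same point
    taken over all profiles (i.e. over all masts and directions)."""
    n_points = len(profiles[0])
    lows = [min(p[j] for p in profiles) for j in range(n_points)]
    highs = [max(p[j] for p in profiles) for j in range(n_points)]
    normalized = []
    for p in profiles:
        row = []
        for j in range(n_points):
            span = highs[j] - lows[j]
            # A point with no variation across profiles is mapped to 0.
            row.append((p[j] - lows[j]) / span if span > 0 else 0.0)
        normalized.append(row)
    return normalized

distances = upstream_distances()
```

With these defaults, `distances` has 45 entries and ends at 9900 m, matching the profile length stated above.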

Artificial neural network
An artificial neural network (ANN) was established using PyTorch (v.1.9.0), a Python library for deep learning. A multilayer perceptron arrangement was used, whose representation is shown in Figure 3. The ANN predicts the sectoral (directional) averages of the along-wind turbulence intensities, i.e., the mean value of all 10-minute turbulence intensities within each 1-degree-wide wind sector, at each wind mast, at 48 m above ground. A k-fold cross-validation method is used where the data is divided into six folds and where each fold corresponds to the data of one measurement mast location. This forces the model to predict turbulence intensities at locations it has never "seen" before. Each fold contains up to 360 data samples, one for each wind sector. The procedure for training, validating, optimizing and testing is further detailed in Figure 3. The domain of hyperparameters investigated is described in Table 1. A min-max normalization is applied to all inputs and target outputs to improve learning and stability. The target values are compared with the predicted values through a loss function and learning is achieved by backpropagation. Batch gradient descent was found suitable due to the limited data size and the use of GPU-accelerated algorithms. The hyperparameters were optimized to maximize the R² value (coefficient of determination) of the validation data predictions, using 500 iterations of the so-called Tree-structured Parzen Estimator approach. This is preferable to grid and random searches and has been shown to offer a good balance between performance and computational efficiency when compared to other methods such as Gaussian processes and random forests [23,24]. Since the resultant "optimal" hyperparameters depend on the initial conditions, 20 initial sets of arbitrary hyperparameters, and thus 20 different models, were used to estimate the uncertainty of the R² of the final testing data predictions.
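The fold rotation, with one mast held out for testing, one for validation and four for training, can be sketched as below. How the validation mast was paired with each test mast is not stated in the text, so the cyclic choice used here is an assumption, as is the function name.

```python
MASTS = ["Ospøya 1", "Ospøya 2", "Landrøypynten",
         "Nesøya", "Svarvhelleholmen", "Synnøytangen"]

def mast_folds(masts):
    """Yield (train, validation, test) mast names for each fold.

    Each mast is the test location exactly once. ASSUMPTION: the
    validation mast is the next one in the list (cyclically); the source
    does not state how it was chosen.
    """
    n = len(masts)
    for i, test_mast in enumerate(masts):
        validation_mast = masts[(i + 1) % n]
        train_masts = [m for m in masts
                       if m not in (test_mast, validation_mast)]
        yield train_masts, validation_mast, test_mast

folds = list(mast_folds(MASTS))
```

Because the split is by location rather than by random sample, no data from the test mast can leak into training or hyperparameter selection.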
Lastly, when predicting the sectoral averages instead of each 10-minute turbulence intensity observation, the topographic effects are better isolated and other time- and thermal-related effects can be disregarded.

Norwegian Standard - Eurocode NS-EN 1991-1-4
For comparison purposes, the along-wind turbulence intensity is also estimated following the Norwegian Standard and Eurocode NS-EN 1991-1-4 (ref. [15]). The measurement masts presented in this study are in a region with strong contrasts of terrain roughness, namely sea water (terrain cat. 0) and forests in relatively small hills (terrain cat. III). This transition in the upstream terrain roughness is considered in the Eurocode NA.4.3.2(2) (901.2.2). Different orographic effects on turbulence could also be considered. Those described in NA.4.3.3 (901.2.1) and NA.4.3.3 (901.3.2) can be applicable to some of the studied locations. However, in NA.4.4 it is not clear how to combine these effects with those from the different terrain roughnesses upstream, so only the latter ones are considered. Also, the orography factor is intended to represent isolated hills and escarpments, not undulating and mountainous regions.
To consider the upstream roughness heterogeneity, the upstream terrain is divided into two continuous patches of either terrain category 0 or III. The lengths of the two patches and the location of the transition between them were found iteratively for each mast and wind direction, by minimizing the number of misclassifications when compared to the original binary roughness vector.
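One way to find such a two-patch approximation is an exhaustive scan over all transition positions and both patch orders, as sketched below. This is not necessarily the iterative procedure used in the study; the function name, the 0 = sea / 1 = ground encoding and the example vector are assumptions.

```python
def fit_two_patches(roughness):
    """Approximate a binary roughness vector by two contiguous patches.

    roughness: sequence of 0 (sea) / 1 (ground), ordered from the mast
    towards upstream. Returns (near_category, far_category, split_index,
    misclassified): points [0, split_index) get near_category and the
    remaining points get far_category.
    """
    n = len(roughness)
    ones_total = sum(roughness)
    best = None
    ones_before = 0  # number of ground points in [0, split)
    for split in range(n + 1):
        zeros_before = split - ones_before
        ones_after = ones_total - ones_before
        zeros_after = (n - split) - ones_after
        candidates = [
            # Near patch sea, far patch ground: ground points before the
            # split and sea points after it are misclassified.
            (ones_before + zeros_after, 0, 1),
            # Near patch ground, far patch sea: the opposite case.
            (zeros_before + ones_after, 1, 0),
        ]
        for errors, near, far in candidates:
            if best is None or errors < best[3]:
                best = (near, far, split, errors)
        if split < n:
            ones_before += roughness[split]
    return best

# Hypothetical roughness vector: sea near the mast, ground upstream,
# with one isolated sea point that cannot be represented.
result = fit_two_patches([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])
```

A split index of 0 or of the full vector length corresponds to a single uniform patch, so homogeneous fetches are covered by the same search.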

Results and discussion
The main results of the data-driven analysis are presented in Figure 4, Figure 5 and Figure 6.
In Figure 4, the predictions of one ANN model, per location, are plotted. The plotted models were those that had their performance (R²) closest to the average performance of all models for a given location (dark red dots in Figure 5). Displaying only the best performing ANN models would lead to bias, due to a regression to the mean of future dataset test performances, and is thus avoided. Contour and line plots are shown for each mast location. The contour plots show the upstream topography for each mean wind direction, with the same resolution as given in the input data for the terrain profile and roughness vectors (see Section 2.2). A blue color is superposed to represent the sea water, with lower surface roughness. The line plots show the measurements and ANN predictions of the sectoral averages of the along-wind turbulence intensities. The sectoral averages of the mean wind speeds, from the data described in Section 2.1, are also included for completeness.
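Selecting the model whose test performance is closest to the average of all models at a location amounts to the short sketch below; the function name and the score values are hypothetical.

```python
from statistics import mean

def closest_to_average(scores):
    """Index of the model whose score is closest to the mean score."""
    average = mean(scores)
    return min(range(len(scores)), key=lambda i: abs(scores[i] - average))

# Hypothetical R² test scores of a few models at one location.
r2_scores = [0.52, 0.61, 0.58, 0.70, 0.49]
representative = closest_to_average(r2_scores)
```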
Upstream hills close to the masts affect the results to a greater extent than hills further away. Long upstream fetches of water are characterized by low turbulence intensities. The ANN predictions are best at Ospøya 1 and Ospøya 2, as expected, due to the proximity (260 meters) and topographic similarity between them. Some predicted values at Ospøya 2, Landrøypynten and Nesøya seem slightly misaligned with the measurements. This can be due to local deflections of the wind direction around hills and/or to discrepancies between reported and real anemometer orientations. At Svarvhelleholmen, the ANN underestimates turbulence for southern winds due to the absence of such high turbulence intensities in its training database. The Eurocode prediction also underestimates turbulence, but it could be argued that the alternative procedure in NA.4.3.3 (901.3.2) ("Lower lying construction site downstream of a hill or escarpment") would lead to slightly higher turbulence intensities for this particular site and direction. At Synnøytangen, the presence of nearby buildings and tall trees presumably affects the measurements to some extent for some directions.
In Figure 5, the R² values between the predictions of all ANN models and the tested measurements are shown as an indication of the model performances. Note that the hyperparameter optimization is a chaotic process that is dependent on the initial conditions, hence the 20 models per tested location and the associated R² uncertainty estimates. A value of R² = 1 indicates a perfect fit, whereas R² = 0 indicates a fit that is as good as a simple average of all 360 measured sectoral averages (which is unknown a priori). Another performance metric, accuracy, is also included, taken as 100% − (mean absolute percentage error).
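The two performance metrics can be written out as in the following sketch, using the standard definitions of the coefficient of determination and of the mean absolute percentage error; the function names and sample values are hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 for a perfect fit, 0 for a fit
    as good as predicting the mean of the measurements."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def accuracy_percent(measured, predicted):
    """Accuracy taken as 100 % minus the mean absolute percentage error."""
    mape = 100.0 * sum(abs((m - p) / m)
                       for m, p in zip(measured, predicted)) / len(measured)
    return 100.0 - mape

# Hypothetical sectoral averages of turbulence intensity.
measured = [0.10, 0.20, 0.30]
predicted = [0.11, 0.18, 0.33]
r2 = r_squared(measured, predicted)
acc = accuracy_percent(measured, predicted)
```

Note that R² can be negative when a model performs worse than the mean of the measurements, and that the accuracy metric is undefined for measured values of zero.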
In Figure 6, seven histograms show the final choices of the hyperparameters for all the different ANN models tested, after all optimization iterations were complete. It took roughly 110 hours to compute the 6 mast locations × 20 ANN models × 500 optimization iterations, on a laptop PC (Intel Core i7-8850H, 64 GB 2666 MHz RAM, Nvidia Quadro P4200).
For all masts, the ANN predictions were able to roughly capture the main trends of the mean turbulence intensities with the location and wind direction, showing overall better performances than the Eurocode predictions. Nonetheless, it remains a challenging task to accurately predict turbulence, regardless of the model adopted.

Conclusions
A data-driven model was developed to predict mean wind turbulence intensities for each mean wind direction in a complex terrain, where no wind measurements are available. The model consists of an artificial neural network, namely a multilayer perceptron, whose hyperparameters were systematically optimized to improve the predictions. First, a database of topographic data and measured turbulence intensities at 48 meters height above ground, at different locations, for each wind direction, was used to train the model. Each topographic data sample consisted of 45 terrain elevation points, associated with a location, a wind direction and an upstream terrain profile, plus 45 binary classifications of those points' roughnesses into "ground" or "sea". Then, the model required only the topographic data at the desired new location to predict the mean turbulence intensities at the same height above ground, for each mean wind direction.
For the six locations studied, prediction accuracies between 72% and 87% were obtained, despite the relatively small training databases with only four or five locations. The model outperformed the procedures given in the relevant standard (Eurocode NS-EN 1991-1-4), which inherently require numerous simplifications that are difficult to implement and systematize in a complex terrain. The model is simple to establish, and the suggested framework can be easily adapted to include other input features and/or to predict other wind properties.
These findings can be useful when estimating the design wind loads on structures in complex terrains as a function of the wind direction. The proof-of-concept presented could also encourage other stakeholders to establish a comprehensive and global database, with a larger number of measurement locations and diversified topographies, which could lead to an increase in model accuracy and reliability. Such a database and model could significantly impact the design, safety and cost-effectiveness of wind-sensitive structures.

Recommendations for further work
A few recommendations and ideas on how to expand the current work are as follows:
• Wind measurements at different heights above ground should be collected, to expand the scope of the model and capture the turbulence relationship with the height above ground.
• More terrain categories, or a continuous roughness parameter, could be directly estimated as in [25,26], using e.g. the finer point cloud models available in [22] (0.25 × 0.25 m resolution).
• The crosswind and vertical turbulence intensities, often assumed to have a linear relationship with the along-wind turbulence, could be included in the model.
• Expanding the inputs to "see" a wider upstream topography, such as a ±15° sector around the wind direction, could improve the predictions and capture effects such as wind deflection around hills and the horizontal diffusion of turbulence. All-around topographies could also be considered, to capture channeling and downstream blockage effects. In the present study, with its limited data, this resulted in no obvious gains in accuracy.
• Convolutional neural networks and other state-of-the-art computer vision models could be used to capture the spatial information of the expanded inputs mentioned above.
• A hybrid ANN + Eurocode model could be pursued, where the Eurocode predictions could be added to the ANN inputs.
• Predefined probability density functions of wind turbulence could be predicted instead of the sectoral mean turbulence intensity. Attempts in the present study have shown that functions with more parameters resulted in a better representation of the real data, but led to worse predictions, and vice-versa, presumably due to the lack of data in some wind sectors and the small number of mast locations in the database.