Validation of a Lattice Boltzmann Solver Against Wind Turbine Response and Wake Measurements

Two of the major limitations facing the adoption of large-eddy simulation (LES) to the industry today are a lack of validation against full-scale measurements and the high computational cost. The lattice Boltzmann method is an approach to conduct LES that is suitable for parallelization on graphics processing units, leading to reduction in energy-to-solution by multiple orders of magnitude compared to Navier-Stokes solvers. We validate the lattice Boltzmann solver VirtualFluids against the measurements published in the SWiFT benchmark and the results obtained with LES by the participants in the benchmark. We compare inflow, turbine response and wake quantities and show that our method yields similar results. While the other LES methods vary in the required energy by one order of magnitude, our methodology is always about one to two orders of magnitude more efficient. The benchmark allows for a comparison to a large number of models, however, the scale of the turbine is not representative of modern turbines and therefore important challenges of modern turbines, such as blade deflection, could not be validated.


Introduction
The current modelling challenges in wind energy require fast, yet accurate simulation methods.When modelling large scale phenomena such as wind farm blockage and stratification or power and load optimization, low-fidelity models can often not reach the accuracy of high-fidelity models, namely large-eddy simulations (LES) [1,2,3].
However, LES is computationally very expensive, and users have to weigh the increased accuracy against the increased computational cost.In addition, even results from LES simulations have been shown to vary across different codes, see for example [4].Differences in numerical method, configuration of the simulation or even small differences in implementation details can alter the results.The vast number of parameters thus makes comparisons between codes and between simulation and experiment difficult.Validation is required to be able to quantify the trade-off in accuracy versus computational effort, to build trust in the capabilities of the used models and to gain experience in applying LES to real world scenarios.
A number of efforts have been made in the wind farm modelling community to provide validation cases and cross-code comparisons.Martínez-Tossas et al. compared four large-eddy simulation codes and the influence of turbulence model and parameters in [5].To highlight the effect of the discretization scheme, a turbine was simulated under highly idealized conditions.Blade forces and velocity deficit in the near wake only differed slightly between all tested models and independent of the Smagorinsky coefficient.However, higher order accuracy pseudo spectral solvers exhibited lower diffusivity and thus later near-to far-wake transition.Six model frameworks are compared to results from the DanAero experiment in [6].The benchmark includes models of different fidelity and found that integral quantities such as power were well captured by all models, whereas LES predicted second-order statistics of the flow field more accurately.The authors highlighted the difficulty in generating inflow conditions for the LES simulations.The SWiFT benchmark compares a variety of simulation tools based on different approaches to measurements from a campaign conducted at the Scaled Wind Technology Facility (SWiFT) [7].The collected data includes detailed measurements of velocities in the turbine wake as well as turbine quantities.In the original publication 13 different tools were compared and since then the benchmark was reproduced with another LES code in [8].
In this work we aim to validate a new lattice Boltzmann framework for wind farm modelling with the neutral SWiFT Benchmark case.The lattice Boltzmann method is a promising approach to reduce the computational cost of LES simulations by utilizing Graphics Processing Units (GPUs).It has been applied to wind energy applications in recent years and also been validated in a highly idealized comparison [9,10,11].We introduce the methodology used in the framework, describe the setup of the validation case and then compare the results to the measurements and results of participants of the benchmark.Finally, we provide some remarks on the computational performance before concluding the paper.

Methodology
The flow around the wind turbine within the atmospheric boundary layer is simulated via the lattice Boltzmann method (LBM) coupled to an actuator line.The general-purpose solver VirtualFluids is used as a flow solver while the Wind Farm Interface (WiFI) package computes the turbine response and models the turbine controller.

The Lattice Boltzmann Solver
LBM is an alternative approach to solve the weakly compressible Navier-Stokes equations.It is based on the kinetic theory of gases.The primary variable is the particle distribution function f , describing the distribution of particles in space, time and velocity space.The evolution of f in time is described by the Boltzmann equation.The discretized version of the Boltzmann equation, called the lattice Boltzmann equation, reads The discretized distribution is f i , the location is x, t is time, ∆t is the size of the timestep and c i are the lattice speeds.The function Ω(f ) is called the collision operator, which represents diffusion on a macroscopic level.In this work the particle distribution function is discretized using 27 distributions, resulting in a so-called D3Q27 lattice.The grid spacing ∆x is related to the timestep through the lattice speed c = ∆x ∆t .For further details of the basics of the methods the reader is referred to [12].
A variety of choices exist for the collision operator.Here we make use of the cumulant operator as presented in [13].The cumulant collision operator is fourth order accurate in diffusion and exhibits excellent stability even at very high Reynolds numbers.It is implemented in the GPU-resident general purpose flow solver VirtualFluids [14].Slip and no-slip boundaries are implemented via bounce-forward and bounce-backward approaches, see Appendix E in [15] for details.A stress boundary is modelled via the inverse momentum exchange method (iMEM), that allows for the use of a wall model [16].In this work, the wall stress is computed via Monin-Obukhov similarity theory and a time averaged velocity sampled at 1.5∆x above the wall, just like in [16].Outflow boundary conditions are applied through a zero-gradient approach [15].To interpolate between different grid resolutions the compact second order interpolation is used [17].The wind turbine is represented through an actuator line.The implementation follows the original methodology as described in [18].

Precursor
For the first time we also present a precursor boundary condition in combination with the LBM.A precursor can be either implemented as a time-varying velocity boundary, similar to a Navier-Stokes solver, or by directly recording and prescribing the distribution functions at the boundary.The latter approach requires 9 values per node per timestep but comes at a very low complexity and highest possible accuracy.We have opted for the latter method.We conducted experiments with reducing the frequency of reading and writing timesteps in order to reduce memory loads and disk space consumption but found a drastic decrease in accuracy.

Turbine Response
The computation of the turbine response as well as control of the turbine are handled with the Wind Farm Interface (WiFI) package 1 .WiFI allows for the definition of complex controllers or the incorporation of external controller frameworks.In this work a simple variable speed controller is used, similar to the one used in OpenFAST.It consists of three regions: region 2 for wind speeds below rated, where the turbine is controlled to maximize power, following an ω 2 law, region 3 for wind speeds above rated where a constant torque is set, and region 2.5, a linear transition from region 2 to region 3.

Coupling
The flow field and turbine response are coupled in a staggered fashion in order to parallelize the computation of the flow field and the turbine response and thus eliminate overheads of computing the turbine response.At timestep t i , VirtualFluids extracts the velocities at the blade nodes, passes them to WiFI and then advances to the next timestep.At time t i+1 , WiFI corrects the velocities for the spurious induction due to the Gaussian smearing of the actuator line with the smearing correction by Meyer-Forsting et al. as presented in [19].Subsequently, WiFI computes the force response based on tabulated drag and lift data and the controller torque and pitch settings.At time t i+2 , the forces computed by WiFI are applied in the flow domain.Therefore, the body forces of the turbine are applied with a delay of two timesteps.However, due to the explicit discretization, the timestep in an LBM simulation is typically much smaller than with implicit methods.

The SWiFT Benchmark
A measurement campaign was carried out at the Scaled Wind Farm Technology (SWiFT) facility, situated in Lubbock, Texas, USA.An extensive description of the benchmark cases can be found in [20].At the site, the wake of a Vestas V27 turbine with a rotor diameter of D = 27 m and a hub height of z hub = 32.1 m was measured.
To characterize the inflow, a meteorological tower collected measurements 2.5D upstream in the predominant wind direction, including wind speed and direction at various heights.A more detailed description of the measured data can be found in [21].Additionally, the rotor speed as well as generator power and torque were recorded.The wake of the turbine was measured with a DTU SpinnerLidar [22].Wind speeds were measured between 1D and 5D, although the wake was only defined after 2D downstream of the turbine.
Three cases with varying stability were compiled for the original benchmark.Here, we only consider the neutral case.Six ten-minute periods with no synoptic forcing present, low yaw offset and a stability parameter close to zero were selected from the measured data and used to define the benchmark.Ensemble mean values of the quantities of interest are presented in Table 1. 13 models participated in benchmark, including three analytical models (Gaussian,  [20]. GaussianIQ and FarmShadow), three Reynolds-averaged Navier stokes models (EllipSys3D-ABL, EllipSys3D-MOST and WakeBlaster), two dynamic wake meandering models (FASTFarm-LES and FASTFarm) and five LES models (NaluWind, SOFWA, SOFWA-2, EllipSys3D and PALM).For details on the different models, we refer to [7].In a validation study similar to this one, the solver Meso-NH was compared as well, the results are included in the comparison in section 4 as well [8].

Precursor setup
A precursor simulation is conducted to generate a realistic inflow for the simulation.The precursor has a domain size of 6 km × 6 km × 1 km in streamwise, lateral and vertical direction and is discretized with an isotropic grid with a grid spacing of ∆x = 7.8125 m.The domain is periodic in the streamwise and lateral direction x and y, respectively.At the upper boundary a slip boundary condition is applied and at the lower boundary we apply the inverse momentum exchange method as discussed in subsection 2.1 with a roughness length of z 0 = 0.014 m.The flow is driven via a constant pressure gradient of ∂p ∂x = 1.849 × 10 −4 N/m 3 .We neglect Coriolis forces.The QR-model with the model constant set to 1/5 is used to model the subgrid-stresses, see [23].The limiter of the cumulant operator is set to 0.01 [24].The domain is initialized with the desired mean flow velocity superimposed with sinusoidal perturbations in the streamwise and lateral velocity.The simulation is spun up for 23 hours.After the spin up time the distributions necessary for the boundary of the successor are saved every timestep.

Successor setup
The successor simulation contains three layers of refinement inside a coarse mesh.The coarse mesh has a size of 6 km×6 km×1 km at the same resolution as the precursor, i.e., ∆x = 7.8125 m.The center of the turbine rotor is placed at (3000 m, 2500 m, 32 m).The domain is successively refined towards the turbine.The finest grid of size 38 D × 8 D × 6 D = 1026 m × 216 m × 162 m begins 23 diameters upstream of the turbine and is centered around the turbine in lateral direction.The grid spacing in the inner grid ∆x inner = ∆x/8 ≈ 1 m.In total the domain comprises 160 million nodes.The lateral boundaries are periodic and the bottom and top boundaries are the same as in the precursor.The outflow boundary condition as described in subsection 2.1 is applied.After a spin up time of ten minutes, flow fields at 2.5 diameters upstream and 2,3,4 and 5 diameters distance downstream of the turbine are sampled.Additionally, the generator torque M , thrust force, rotor speed ω, and generator power P gen are recorded by the turbine.All measured quantities are recorded for 10 minutes, instantaneous quantities are saved with a frequency of 1 Hz.

Results
In the following we compare the results obtained from the simulation as described in section 3 to the measurements and results given in [7] and, if applicable, in [8].We begin by comparing the inflow, measured 2.5D upstream of the turbine.The mean inflow velocity u ∞ , the turbulent kinetic energy T KE and the wind direction β are presented in Figure 1.The velocity profile is well-matched in shape in all simulations, including ours.However, the wind speed obtained with VirtualFluids is lower than in the other simulations.This is due to the spatial variability of velocity in the precursor simulation, which contains large scale structures of higher and lower velocity.This effect can be alleviated through a shift in the periodic boundary [25].We chose a window with constant wind speed even though the average wind speed was lower than that of most measurements.As most quantities presented in the following are normalized with the inflow velocity, the differences are expected to be negligible.The profile of turbulent kinetic energy is almost constant with height.It is also lower than all measured profiles.However, the ratio between inflow velocity and turbulent kinetic energy, the turbulence intensity, is matched quite well.For our simulation we calculated a value of 9.5%, compared to an ensemble average of the measurements of 10.7%.The wind direction is closely aligned with the streamwise direction, indicating that no major streaks of high or low velocity passed through the domain.Due to the absence of Coriolis force in the simulation we do not observe the veer found in the measurements.

Turbine response
The turbine behavior is examined in Figure 2. The turbine operates close to the rated speed.All simulations, including ours, match the rotor speed to a high degree.The variation between the simulations in generator power is larger.Most simulations overestimate the rotor power, except the results obtained in PALM.The mean power obtained in our simulation also lies within the 95th percentile, albeit that the variability is higher in our simulation.We observed that the power and torque are very sensitive to the accurate definition of region 2.5, as the turbine operates in that region a large part of the time.The results for the generator torque M are very similar to the ones of the generator power.Finally, we compare the thrust coefficient.We have defined the thrust coefficient C T by normalizing the instantaneous thrust force exerted by the blade with the inflow velocity as measured 2.5 diameters upstream of the turbine at the same The values range from 0.75 (PALM, VirtualFluids) to 0.84 (SOWFA-2).The differences can be due to various causes including the use of the smearing correction in our approach.The smearing correction reduces the overprediction of the normal and tangential force at the blade root and tip, thus leading to an overall decrease of torque and thrust.We can conclude that the turbine response in our simulation is similar to that in other simulations and matches the measured values well, in both magnitude and variability.

Wake
The wake of a wind turbine is characterized by a velocity deficit.The relative velocity deficit is computed from the averaged velocity field u and the inflow velocity field u ∞ via ∆u = u/u ∞ − 1.
We compare the measured velocity deficit at three diameters downstream of the turbine with the results from our simulation in Figure 3.The contour shows that the simulation matches the measured data well, both in magnitude of the velocity deficit as well as the average shape of the wake.The measured wake is slightly shifted to the side compared to the simulation.In Figure 4 we present a comparison of the velocity deficit profile at hub height, including the measured data, the LES data provided by the other modelers and our simulation at the four downstream distances.At x = 2D, some simulations, including ours, show a local maximum of velocity in the center of the wake, indicative of the near-wake region.This local maximum is not present in the measurements.The wake has therefore already transitioned.Overall, we find a large spread in the velocity deficit profiles.Most models tend to underestimate the velocity deficit.Our results exhibit a larger velocity deficit at x = 2D and x = 3D, but are very close to the measured values further downstream.
Velocity deficit profiles at 2, 3, 4 and 5 diameters downstream of the turbine.Black lines are the averages obtained from the lidar measurement and the grey shaded area represents the 95 % confidence interval.
To characterize three specific phenomena, namely wake strength, wake recovery and wake meandering, three different quantities are compared in the following.The wake strength is indicated by the largest velocity deficit, signified by a minimum of ∆u, shown in Figure 5 (a).The wake recovery is measured by the difference of the maximum velocity deficit relative to the velocity deficit at x = 2D, presented in Figure 5 (b).To characterize the wake meandering, the wake center is determined with the samwich toolbox2 .To ensure comparability to the other available data, we used the same parameters as [7].The standard deviation of the horizontal and vertical wake center coordinate, Γ y and Γ z , respectively, are shown in Figure 6.The strength of the wake deficit is underestimated by all other modelers at x = 2D, yet the predictions become more accurate further downstream.Our results overestimate the wake strength at the first plane but match the measurements the closest at all other downstream distances.
The wake recovery, shown in Figure 5 (a), is matched well by many modelers.Only SOWFA underestimates the recovery.The authors in [7] connect this deviation to the lower turbulence intensity in the inflow.Our simulations match the wake recovery well in the second plane.However, further downstream the recovery is overestimated.This observation does not fit the explanation given by Doubrawa et al.We attribute the difference mainly to two factors: first, the transition of the wake happens at a later point in our simulation, compared to the measurements.However, the speed of wake recovery significantly changes from near to far wake.This leads us to the second point, namely that the displayed quantity, i.e., the difference in the minima of velocity deficits, is not a very robust measure of the wake recovery.We have included it here nonetheless to provide comparability to the other results presented in the benchmark.The final quantity, the wake meandering, is displayed in Figure 6.Unfortunately, too few wake measurements are available to properly characterize the wake meandering.In general, the horizontal meandering is stronger than the vertical, which is to be expected as the wake is not constricted by the ground and the meandering gets stronger with downstream distance.All modelers exhibit these trends, however the strength of the meandering differs, possibly due to differences in the large scale motions of the inflow.The values obtained with VirtualFluids for the vertical meandering are similar to those reported by the other modelers, however the lateral meandering in VF is higher.

Computational cost
As we discussed in the beginning, users often have to make a choice between accuracy and computational performance when picking the model to apply.As Doubrawa et al. point out, the low-fidelity models have almost negligible computational cost and are capable of capturing average wake quantities with some accuracy.The more computationally expensive LES simulations capture dynamic quantities as well, but also have drawbacks.These include the difficulty of matching prescribed inflow conditions and the general complexity of setting up simulations.Computational performance comparisons are difficult, as the results differ greatly depending on hardware, resolution and a variety of other factors.With the use of GPUs, it becomes even more cumbersome due to the different processor architecture.Therefore, we only intent to give a general impression of the computational performance of our model.2) vs energy-to-solution, E2S.Data for other simulations taken from [7] and converted from processor hours to energy-to-solution.
i.e. the normalized root-mean-square difference of the velocity deficits at all downstream distances [7].A more universal metric, that also allows for the comparison of different hardware architectures, is energy-to-solution, E2S, the total amount of energy consumed by the hardware to arrive at the solution.Since detailed hardware description is not given in the original publication, we convert processor hours to E2S by assuming each CPU has 32 cores and a thermal design power (TDP) of 150 W, which is a favorable assumption for modern server processors.Our simulation was carried out on a single NVIDIA A100 with a TDP of 400 W and a single AMD Epyc 7742 with TDP = 225 W. Simulating 10 minutes took on average about 2000 s.We show the result of that conversion in Figure 7.The energy required for a single conventional LES computation ranges from 30 kWh to more than 300 kWh.In comparison, our simulation only requires about 0.5 kWh and is therefore more than 1, often more than 2 orders of magnitude more energy efficient.Indeed, the energy consumption is closer to that of the RANS simulations, albeit with significantly higher accuracy.

Conclusion
The goal of this paper is to validate our lattice Boltzmann-based framework for wind farm simulations against the neutral SWiFT benchmark.We introduced the general methodology of our framework, consisting of the LBM solver VirtualFluids to compute the flow, and WiFI to compute the turbine response.We also presented our approach to incorporating a precursor simulation.
We conducted a simulation with the aforementioned framework and showed that our results matched well with the measured inflow.We also compared the turbine quantities such as rotor speed, generator power, generator torque and thrust coefficient to measured data and other LES models and found a good agreement in all quantities.Finally, we validated the wake by comparing wake strength, wake recovery and wake meandering to measured data and results from other modelers.Our model predicts all wake characteristics with similar accuracy as other LES models.A comparison of energy-to-solution among the LES models but also the less computationally expensive approaches demonstrated that our model requires 1 to 2 orders of magnitude less energy than the other LES models, while being able to achieve the same accuracy.
In conclusion, we were able to show that our methodology is capable of accurately predicting the turbine response and wake of a single turbine subjected to the lower atmospheric boundary layer, while being considerably more cost efficient than other models with the same accuracy.
While this validation case serves well to present the general capabilities of our model, the turbine used in the measurements is not representative of current or future turbines.Therefore, a number of important characteristics of modern turbines could not be validated in this study, including flexible and twisting turbine blades and in general rotors placed much higher in the atmospheric boundary layer.

5 T 2 267 1 Figure 1 .
Figure 1.Mean inflow velocity (left), turbulent kinetic energy (center) and wind direction (right).The averages of the measurement periods are shown as black dashed lines.

1 Figure 2 .
Figure2.Average rotor speed, power, generator torque and thrust coefficient.The black dashed line is the average obtained from the measurements and the grey shaded area represents the 95% confidence interval.The horizontal black bars represent the 95% confidence interval from the simulations.Note that SOWFA uses a constant rotor speed.

1 Figure 3 .
Figure 3. Contour of the average velocity deficit at 3 diameters distance downstream of the turbine.The lidar measurements (a), the results obtained with VirtualFluids (b) and the difference between the lidar measurement and simulation (c) are shown.The black circle represents the swept area of the rotor.

1 Figure 5 .
Figure 5. Maxima of the average velocity deficits (a) and maxima of average velocity deficit relative to the maximum velocity deficit at x = 2D (b).Black dots represent the measurements and the 95% confidence interval is marked with error bars.

1 Figure 6 .
Figure 6.Standard deviation of the wake center in and vertical direction.

Table 1 .
Ensemble mean values of various quantities of the SWiFT neutral benchmark case as presented in Doubrawa et al. reported the processor core hours by the modelers versus an error measured defined as y (∆u M eas (y) − ∆u Sim (y)) 2 y ∆u 2 M eas (y) + z (∆u M eas (z) − ∆u Sim (z)) 2 z ∆u 2 M eas (z)   ,(2)