OpenMP performance for benchmark 2D shallow water equations using LBM

Shallow water equations, commonly referred to as the Saint-Venant equations, are used to model fluid phenomena. These equations can be solved numerically by several methods, such as the lattice Boltzmann method (LBM), SIMPLE-like methods, the finite difference method, Godunov-type methods, and the finite volume method. In this paper, the shallow water equations are approximated using the LBM, a combination known as LABSWE, and the performance of its parallel implementation using OpenMP is simulated. To compare the performance of the 2-thread and 4-thread parallel algorithms, ten different grid sizes Lx and Ly are examined. The results show that the OpenMP platform decreases the computational time for solving LABSWE. For instance, using grid size 1000 × 500, the CPU times with 2 and 4 threads are observed to be 393.54 s and 333.243 s respectively.


Introduction
Fluid flow phenomena such as flows in rivers, channels, and lakes can be described by a well-known mathematical model called the shallow water equations (SWE) [1]. Consisting of the mass equation (1) and the momentum equations (2)-(3), the SWE in two dimensions can be given as follows:

$$\frac{\partial h}{\partial t} + \frac{\partial (hu)}{\partial x} + \frac{\partial (hv)}{\partial y} = 0, \qquad (1)$$

$$\frac{\partial (hu)}{\partial t} + \frac{\partial}{\partial x}\left(hu^2 + \frac{gh^2}{2}\right) + \frac{\partial (huv)}{\partial y} = 0, \qquad (2)$$

$$\frac{\partial (hv)}{\partial t} + \frac{\partial (huv)}{\partial x} + \frac{\partial}{\partial y}\left(hv^2 + \frac{gh^2}{2}\right) = 0, \qquad (3)$$

where h denotes the water depth, u the velocity in the x-direction, v the velocity in the y-direction, and g the gravitational acceleration. The variables x, y and t denote space and time respectively. An approximate solution of Equations (1)-(3) can be obtained using several methods. The most popular is the grid-based finite volume method (FVM). This method has been developed into several numerical schemes that are widely applied and have proven robust; see for instance the references [2,3,4,5] and [6]. However, the deficiency of this method lies in solving general water flows with complex structures. The LBM, in contrast, is a numerical method based on particles. The LBM has already been applied in many fluid flow applications [7]. Its advantage is that the complexity of the boundary conditions can be avoided [8]. However, the LBM can incur a higher computational cost than other methods. In order to accelerate the computation, parallel computing techniques are elaborated here. In references [9,10,11,12] and [13], parallel computing is shown to be a good way to accelerate numerical computations. Thus the goal of this paper is to evaluate the performance of parallel computing for simulating LABSWE, using the shared-memory parallel architecture of the OpenMP platform.

Approximation Solution of SWE using LBM
The lattice Boltzmann method (LBM) evolved from lattice gas automata (LGA) as a method for solving discrete computational equations [14]. The LBM is composed of three important parts: the lattice Boltzmann equation (LBE), the lattice pattern formed by the arranged particles, and the local equilibrium distribution function.

Lattice Boltzmann equation
In the LBM there are two important processes: the streaming step and the collision step. In the streaming step, the particles move to nearby lattice sites along their velocity directions, which is written as

$$f_\alpha(x + e_\alpha \Delta t, t + \Delta t) = f_\alpha(x, t) + \frac{\Delta t}{N_\alpha e^2} e_{\alpha i} F_i, \qquad (4)$$

where $f_\alpha(x + e_\alpha \Delta t, t + \Delta t)$ is the particle distribution function after streaming, $f_\alpha(x, t)$ the distribution function before streaming, $F_i$ the force function in the i-direction, $\Delta x$ the uniform distance between particles over the lattice, and $\Delta t$ the time step. The lattice velocity is given by $e = \Delta x / \Delta t$, $e_\alpha$ is the particle velocity vector along link $\alpha$, and the constant $N_\alpha$ is determined by the lattice pattern through

$$N_\alpha = \frac{1}{e^2} \sum_\alpha e_{\alpha i} e_{\alpha i}. \qquad (5)$$

In the collision step, particles change their velocities through the influence of the other particles. This step can be described by the equation

$$f_\alpha(x + e_\alpha \Delta t, t + \Delta t) = f_\alpha(x, t) + \Omega_\alpha, \qquad (6)$$

where $\Omega_\alpha$ is called the collision operator, which controls the rate of change of $f_\alpha$. In general, this operator is determined from the microscopic dynamics. Higuera and Jiménez [15] introduced a linearization of the collision operator about the equilibrium condition, so that it simplifies to

$$\Omega_\alpha = -\frac{1}{\tau}\left(f_\alpha - f_\alpha^{eq}\right), \qquad (7)$$

where $\tau$ is the relaxation time and $f_\alpha^{eq}$ the local equilibrium distribution function.
Finally, combining the streaming and collision steps, Equations (4), (6) and (7), gives the following well-known formula, the lattice Boltzmann equation (LBE):

$$f_\alpha(x + e_\alpha \Delta t, t + \Delta t) = f_\alpha(x, t) - \frac{1}{\tau}\left(f_\alpha - f_\alpha^{eq}\right) + \frac{\Delta t}{N_\alpha e^2} e_{\alpha i} F_i. \qquad (8)$$

Lattice pattern
The lattice pattern represents the grid points and determines the particle motions in the LBM. From the lattice pattern one can immediately tell whether the LBM describes a 1D, 2D, or 3D model. In addition, the constant $N_\alpha$ in Equation (8) is determined by the lattice pattern. In 1D models, the lattice pattern depends on the number of particle speeds at the lattice points; it is generally known as the line lattice, with two- or three-speed models. Meanwhile, there are commonly two kinds of lattice patterns in 2D models: the hexagonal and the square lattice. A hexagonal lattice can have 6- or 7-speed models, while a square lattice can have 4-, 5-, 8-, or 9-speed models.
According to a recent study, the most accurate lattice pattern among all 2D models is the square lattice with 9 speeds (D2Q9) [16]. Hence this paper uses D2Q9 to simulate the 2D SWE; the lattice model and its coordinate system are shown in Figure 1. In the D2Q9 lattice (Figure 1 (b)), each particle moves to its neighbors with a certain velocity and direction along the links 1-8, where 0 indicates the rest link with zero speed. The particle velocity vectors are defined by

$$e_\alpha = \begin{cases} (0, 0), & \alpha = 0, \\ e\left(\cos\frac{(\alpha - 1)\pi}{4}, \sin\frac{(\alpha - 1)\pi}{4}\right), & \alpha = 1, 3, 5, 7, \\ \sqrt{2}\, e\left(\cos\frac{(\alpha - 1)\pi}{4}, \sin\frac{(\alpha - 1)\pi}{4}\right), & \alpha = 2, 4, 6, 8, \end{cases} \qquad (9)$$

and the D2Q9 lattice has the following basic features [14]:

$$\sum_\alpha e_{\alpha i} = \sum_\alpha e_{\alpha i} e_{\alpha j} e_{\alpha k} = 0, \qquad (10)$$

$$\sum_\alpha e_{\alpha i} e_{\alpha j} = 6 e^2 \delta_{ij}, \qquad (11)$$

$$\sum_\alpha e_{\alpha i} e_{\alpha j} e_{\alpha k} e_{\alpha l} = 4 e^4 \left(\delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}\right) - 6 e^4 \Delta_{ijkl}, \qquad (12)$$

where $\Delta_{ijkl} = 1$ if $i = j = k = l$, and $\Delta_{ijkl} = 0$ otherwise. Using Equation (9), the value of $N_\alpha$ can be found by evaluating Equation (5), giving

$$N_\alpha = 6. \qquad (13)$$

Substituting Equation (13) into Equation (8) leads to

$$f_\alpha(x + e_\alpha \Delta t, t + \Delta t) = f_\alpha(x, t) - \frac{1}{\tau}\left(f_\alpha - f_\alpha^{eq}\right) + \frac{\Delta t}{6 e^2} e_{\alpha i} F_i. \qquad (14)$$

Finally, Equation (14) is the form of the LBM used for simulating fluid flows.

Local equilibrium distribution function
Based on the LGA, the equilibrium function is originally obtained from the Maxwell-Boltzmann equilibrium distribution function [17,18]. Here the equilibrium function is expressed, to second order in the flow velocity, as

$$f_\alpha^{eq} = w_\alpha \rho \left(1 + \frac{3 e_{\alpha i} u_i}{e^2} + \frac{9 e_{\alpha i} e_{\alpha j} u_i u_j}{2 e^4} - \frac{3 u_i u_i}{2 e^2}\right), \qquad (15)$$

where $w_\alpha$ is a lattice weight and $u_i$ and $u_j$ are the components of the flow velocity, i.e. $u_i = u$ and $u_j = v$. Equation (15) has become a popular equation since it has been successfully applied for approximating the solution of many flow models [19,20]. Since the lattice in Figure 1 (b) shares the symmetry of the equilibrium function, the expressions for the coefficients $Q_\alpha$, $R_\alpha$ and $S_\alpha$ follow the same pattern, and the local equilibrium function for the D2Q9 lattice (Figure 1 (b)) can be written as

$$f_\alpha^{eq} = \begin{cases} P_0 + S_0 u_i u_i, & \alpha = 0, \\ P + Q e_{\alpha i} u_i + R e_{\alpha i} e_{\alpha j} u_i u_j + S u_i u_i, & \alpha = 1, 3, 5, 7, \\ \bar{P} + \bar{Q} e_{\alpha i} u_i + \bar{R} e_{\alpha i} e_{\alpha j} u_i u_j + \bar{S} u_i u_i, & \alpha = 2, 4, 6, 8. \end{cases}$$
Basically, the coefficients $P_0$, $P$ and $\bar{P}$ (and likewise the others) are determined by constraints on the equilibrium distribution function, which are based on the conservation laws of mass and momentum. To obtain the local equilibrium distribution function for the SWE, three conditions should be satisfied:

$$\sum_\alpha f_\alpha^{eq}(x, t) = h(x, t),$$

$$\sum_\alpha e_{\alpha i} f_\alpha^{eq}(x, t) = h(x, t)\, u_i(x, t),$$

$$\sum_\alpha e_{\alpha i} e_{\alpha j} f_\alpha^{eq}(x, t) = \frac{g}{2} h^2(x, t)\, \delta_{ij} + h(x, t)\, u_i(x, t)\, u_j(x, t).$$

The details of these equations can be found in the book by Zhou [20]; the resulting scheme is known as the Lattice Boltzmann Method for the Shallow Water Equations (LABSWE).

Parallel Architecture
To accelerate the computation of the 2D SWE using the LBM, the process is parallelized. Because of the regular particle pattern in the LBM, the computation of the distribution functions at each grid point can be separated and carried out on different processors. In this paper, the parallel algorithm is implemented using the OpenMP application programming interface (API).
OpenMP is an API that provides multithreading on shared memory [21,22,23,24,25,26]. Multithreading is the capability of a CPU to run several processes at the same time; this capability can reduce the runtime of a process and increase the efficiency of an algorithm. Moreover, shared memory is a parallel architecture that allows each running process to access the memory proportionally. With this feature, access time to storage is kept to a minimum and the runtime of the process is reduced. In the LABSWE algorithm, the parallelization is applied to the streaming and collision steps over all grid points. The parallel algorithm for LABSWE is shown in more detail in the flowchart of Figure 2. In the serial part, all parameters are declared and the initial condition of the water flow is set. Meanwhile, in the parallel part, the procedures that compute and update the collision and streaming steps of all lattice particles are executed.

Numerical Results and Parallel Performances
Here, the initial conditions of the 2D benchmark LABSWE are given as follows. A wall boundary condition is imposed, and the wave is generated at the boundary. The results for the water height h(x, y, t) at two final observation times can be seen in Figure 3. It can be seen clearly that the water waves are generated at the boundary x = 0 and produce a highest wave amplitude of 0.053 m. The waves propagate to the right side of the domain. Moreover, the velocity profiles in the x and y directions at the final simulation times t = 30 and t = 105 seconds are given in Figure 4. From Figure 4, there is a discrepancy between the velocities v and u at the two final times: the velocity u fluctuates more than the velocity v, since the water waves propagate along the x-direction.
This paper focuses on the parallel performance of the previous simulation. The results of the parallel performance can be found in Table 1. Two experiments are given, using 2 and 4 threads, and ten different grid sizes L_x and L_y are used to evaluate the parallel performance.
From Table 1, the OpenMP platform is shown to accelerate the computational time for all grid sizes. For the large grid with L_x = 1000 and L_y = 500, the CPU time of the serial code is 463.624 s, while the CPU times of the parallel code using 2 and 4 threads are 393.54 s and 333.243 s respectively. This yields speedups of 1.178 and 1.391 using 2 and 4 threads respectively. Moreover, the efficiency is 58.9% using 2 threads and 34.8% using 4 threads.
Comparing the experiments with 2 and 4 threads, every simulation achieves a better speedup with 4 threads than with 2 threads. In contrast to the speedup, however, the efficiency obtained with 2 threads is better than with 4 threads.

Conclusion
According to the results, the parallel algorithm for solving LABSWE decreases the execution time of the serial algorithm, which means the parallel algorithm is successfully applied to this problem. Furthermore, the parallel code with 2 threads has a lower average speedup than with 4 threads. However, the average efficiency of the parallel code using 2 threads is higher than using 4 threads. For instance, with grid size 1000 × 500, the CPU times using 2 and 4 threads are 393.54 s and 333.243 s, corresponding to speedups of 1.178 and 1.391 respectively. Moreover, the efficiency of 2 and 4 threads is 58.9% and 34.8% respectively.