
Application of PDSLin to the magnetic reconnection problem


Published 9 January 2013 © 2013 IOP Publishing Ltd
Citation: Xuefei Yuan et al 2013 Comput. Sci. Discov. 6 014002, DOI 10.1088/1749-4699/6/1/014002


Abstract

Magnetic reconnection is a fundamental process in a magnetized plasma at both low and high magnetic Lundquist numbers (the ratio of the resistive diffusion time to the Alfvén wave transit time), which occurs in a wide variety of laboratory and space plasmas, e.g. magnetic fusion experiments, the solar corona and the Earth's magnetotail. An implicit time advance for the two-fluid magnetic reconnection problem is known to be difficult because of the large condition number of the associated matrix. This is especially troublesome when the collisionless ion skin depth is large so that the Whistler waves, which cause the fast reconnection, dominate the physics (Yuan et al 2012 J. Comput. Phys. 231 5822–53). For small system sizes, a direct solver such as SuperLU can be employed to obtain an accurate solution as long as the condition number is bounded by the reciprocal of the floating-point machine precision. However, SuperLU scales effectively only to hundreds of processors or less. For larger system sizes, it has been shown that physics-based (Chacón and Knoll 2003 J. Comput. Phys. 188 573–92) or other preconditioners can be applied to provide adequate solver performance. In recent years, we have been developing a new algebraic hybrid linear solver, PDSLin (Parallel Domain decomposition Schur complement-based Linear solver) (Yamazaki and Li 2010 Proc. VECPAR pp 421–34 and Yamazaki et al 2011 Technical Report). In this work, we compare numerical results from a direct solver and the proposed hybrid solver for the magnetic reconnection problem and demonstrate that the new hybrid solver is scalable to thousands of processors while maintaining the same robustness as a direct solver.


1. Introduction

Most of the visible universe is in the plasma state, and plasma phenomena are of major importance in space, solar and ionospheric physics. A plasma is an ionized gas consisting of positively charged ions and negatively charged electrons. In a plasma, the microscopic processes are dominated by collective charged-particle interactions: charge separation between ions and electrons gives rise to electric fields, and charged-particle motions result in currents and, consequently, magnetic fields. Electric and magnetic field configurations can be quite intricate and provide the foundation for a wide range of phenomena of overwhelming complexity. The mathematical description of a plasma appropriate for describing its global dynamics is magnetohydrodynamics (MHD) [5].

The simplest form of MHD is ideal MHD, in which the fluid has so little resistivity that it can be treated as a perfect conductor [6, 7]. Because of this negligible resistivity, the topology of the magnetic field is fixed and energy can be stored in the moving fluid. This energy can be released when the conditions of ideal MHD break down, and resistive MHD must then be considered [8–10]. Resistive MHD describes magnetized fluids with nonzero electrical resistivity, which allows the magnetic topology to change, and the presence of the Hall term introduces Whistler waves into the equations [5, 11, 12].

Among the multitude of plasma phenomena, the magnetic reconnection problem deserves special attention [13]. Magnetic reconnection is a fundamental process in a magnetized plasma: two magnetic flux tubes come close together at some point and are broken and reconnected in a different way owing to finite resistivity and other non-ideal effects. In this process the overall topology of the magnetic field changes, and magnetic field energy is converted into particle heat and bulk kinetic energy over a relatively short period of time [11]. Such phenomena occur in a wide range of laboratory and space plasmas, e.g. magnetic fusion experiments, the solar corona and the Earth's magnetotail [11, 14, 15].

To fully capture the change of magnetic field topology in magnetic reconnection, in the present work we focus on a four-field Hall MHD model valid in the low guide-field limit, in which Whistler waves are the dominant two-fluid (ion and electron) effect. This set of extended MHD equations with hyper-resistivity terms is derived from a set of two-dimensional basic MHD equations describing an incompressible, two-fluid, quasi-neutral plasma [16–18]. These equations are a subset of the full, compressible two-fluid MHD equations, which have been studied in two [19] and three [20, 21] dimensions.

However, the simulation of magnetically confined, reconnecting plasma presents numerical challenges [18, 19, 22, 23]. This is a result of many factors, including the complexity of models that accurately represent burning plasmas, as well as the resolution of the large range of spatial–temporal scales at which significant physical processes occur [24]. Even in the simpler ideal MHD model, a symmetric hyperbolic system that is a subset of the two-fluid or extended MHD equations, there are three distinct wave types with a wide separation of propagation speeds and with complex polarizations when applied to magnetized plasma conditions typical of fusion plasmas. When discretized on a finite difference or finite element mesh, these alone lead to a range of timescales and accuracy requirements that are a challenge to address with a single simulation [25].

A key consequence of this temporal stiffness is that traditional explicit methods for solving such models may require prohibitively small time steps compared with the dynamical time scales of macroscopic stability and plasma fueling. The Courant–Friedrichs–Lewy (CFL) condition [26, 27] imposes time step limits for numerical stability that, for the dispersive and diffusive terms in these equations, scale quadratically with the mesh spacing. Implicit schemes alleviate this coupling between the time step size and the mesh spacing.
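As an order-of-magnitude illustration of this scaling (generic estimates, not values taken from the present model): an explicit scheme must resolve the fastest wave supported on the grid, and the Whistler wave frequency grows roughly as $\omega \sim d_{i}v_{\mathrm{A}}k^{2}$, so that

$\Delta t_{\mathrm{explicit}} \lesssim \frac{1}{\omega_{\max}} \sim \frac{\Delta x^{2}}{d_{i}\,v_{\mathrm{A}}},$

which is the quadratic dependence on the mesh increment noted above; the shear Alfvén wave, by contrast, only requires $\Delta t \lesssim \Delta x/v_{\mathrm{A}}$.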

We present implicit numerical methods in a backward difference formula (BDF)–Newton solution framework. After discretization, the system of partial differential equations becomes a set of nonlinear finite difference equations, F(u) = 0. For such a set of nonlinear algebraic equations, inexact Newton methods [28–30] are applied at each implicit time step to iterate to a solution through a sequence of linear problems (the Newton update equations), starting from an initial guess given by the solution at the previous time step, and a linear solver is applied to each Newton update equation. Moreover, such an implicit time advance for the two-fluid magnetic reconnection problem is known to be difficult because of the large condition number of the associated matrix. This is especially troublesome when the collisionless ion skin depth is large, so that the Whistler waves, which cause the fast reconnection, dominate the physics [1].

Large-scale sparse linear systems of equations, such as the Newton update equations arising in numerical simulations of magnetic reconnection, become a central challenge in large-scale simulations because the complexity of traditional direct techniques grows superlinearly with problem size. If such a system is solved with a direct solver by factorization, the memory requirement is extremely large; if it is solved with an iterative solver, preconditioning is required, and tuning the preconditioner parameters to obtain fast convergence is quite complex [1].

Recently, a new library called PDSLin (Parallel Domain decomposition Schur complement-based Linear solver) [3, 4] was introduced as a hybrid direct/iterative solver based on Schur complement methods. The Schur complement system is solved with a Krylov method (the Generalized Minimal RESidual (GMRES) method) from PETSc [31], using the exact LU factors of an approximation of the Schur complement as the preconditioner. Finally, the interior systems are solved in parallel using the already computed LU factors of the subdomains. PDSLin is implemented in C with a Fortran interface and uses MPI for message passing on distributed-memory machines.

In section 2, we state the four-field extended MHD equations including hyper-resistivity terms. Section 3 describes the inexact Newton methods. In section 4, the hybrid linear solver library PDSLin is introduced. The numerical experiments and comparisons for the direct solver and hybrid solver are discussed in section 5. Section 6 provides the conclusions of this work.

2. The mathematical model for the magnetic reconnection problem

The reduced two-fluid MHD equations in two dimensions in the limit of zero electron mass can be written as [18, 32]

Equation (1)

Here, ϕ and ψ are stream functions for the in-plane components of the ion velocity and magnetic field, respectively, and V and B are the z components of the ion velocity and magnetic field, respectively. Hence, the ion velocity and magnetic field are expressed as $\mathbf {V} = \nabla \phi \times \hat {z} + V\hat {z}$ and $\mathbf {B} = \nabla \psi \times \hat {z} + B\hat {z}$ . In equations (1), η is the electrical resistivity, di is the collisionless ion skin depth, μ is the fluid viscosity, ν is the hyper-resistivity (or electron viscosity) and h is the hyper-viscosity coefficient added to damp spurious oscillations that might otherwise develop. It must be verified that the physical results converge to a unique value, independent of these coefficients, over some range [18]. The Poisson bracket is $[f,g] \equiv \nabla f \times \nabla g \cdot \hat {z} = \frac {\partial f}{\partial x}\frac {\partial g}{\partial y} - \frac {\partial f}{\partial y}\frac {\partial g}{\partial x}$ , the advection terms are written as V·∇f = −[ϕ,f] (f = ∇2ϕ, V, ψ, B), and the out-of-plane current density is the negative Laplacian of the magnetic flux, j = −∇2ψ.
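As an aside, a minimal sketch (not the authors' code; the grid layout with the first array index along x and the second along y is an assumption) of evaluating the Poisson bracket with second-order centered differences on the interior points of a uniform grid:

```python
import numpy as np

def poisson_bracket(f, g, hx, hy):
    """Centered-difference approximation of [f, g] = f_x g_y - f_y g_x
    on the interior points of a uniform grid with spacings hx, hy."""
    fx = (f[2:, 1:-1] - f[:-2, 1:-1]) / (2.0 * hx)   # df/dx
    fy = (f[1:-1, 2:] - f[1:-1, :-2]) / (2.0 * hy)   # df/dy
    gx = (g[2:, 1:-1] - g[:-2, 1:-1]) / (2.0 * hx)   # dg/dx
    gy = (g[1:-1, 2:] - g[1:-1, :-2]) / (2.0 * hy)   # dg/dy
    return fx * gy - fy * gx
```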

It has been shown that equations (1) are valid in the low guide-field limit in which Whistler waves are the dominant two-fluid effect [17], and that a very similar set of equations is valid in the high guide-field limit in which the kinetic Alfvén wave is prominent [16]. Thus, we take equations (1) to be typical of the extended MHD equations in two dimensions. The hyper-viscosity term is present only to damp grid-scale oscillations. However, the hyper-resistivity term is necessary for the equations to be mathematically well behaved in the neighborhood of the reconnection layer. It has been shown that a unique converged result is obtained if the hyper-resistivity decreases as the square of the typical zone size, $h^{2}$, as h approaches zero [19, 33].

The finite difference approximation to equations (1) is obtained by applying standard second-order space-centered finite difference operators for the spatial discretization and variable-order BDF methods for the temporal discretization [1]. Our physical domain is $\Omega = [-\frac {L_{x}}{2},\frac {L_{x}}{2}]\times [-\frac {L_{y}}{2},\frac {L_{y}}{2}]$ , Lx = 25.6, Ly = 12.8 [18], with periodic boundary conditions in the x-direction and Dirichlet boundary conditions at $y=\pm \frac {L_{y}}{2}$ . In practice, we take the first quadrant $\tilde {\Omega } = [0,\frac {L_{x}}{2}]\times [0,\frac {L_{y}}{2}]$ as the computational domain, and solutions in Ω are obtained by mirroring the solutions from this quadrant, because ϕ and B in equations (1) are anti-symmetric about the x-axis and y-axis while V and ψ are symmetric about both axes. Therefore, the boundary conditions of equations (1) in $\tilde {\Omega }$ become (i) Dirichlet at $y=\frac {L_{y}}{2}$ and (ii) anti-symmetric for ϕ and B and symmetric for ψ and V at y = 0, x = 0 and $x=\frac {L_{x}}{2}$ .
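The mirroring step can be sketched as follows (an illustration only; the array layout, the handling of the duplicated axis lines and the function name are assumptions, not the production code):

```python
import numpy as np

def mirror_to_full(q, symmetric):
    """Extend a field on the first quadrant [0, Lx/2] x [0, Ly/2] to the full
    domain by reflection: symmetric=True for psi and V, False for phi and B
    (anti-symmetric about both axes). Axis 0 is x, axis 1 is y; the grid lines
    on the axes are duplicated in this simple sketch."""
    s = 1.0 if symmetric else -1.0
    half = np.concatenate([s * q[::-1, :], q], axis=0)        # reflect in x
    return np.concatenate([s * half[:, ::-1], half], axis=1)  # reflect in y
```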

We define a Harris equilibrium and perturbation similar to that used in the geospace environmental modeling (GEM) magnetic reconnection challenge [34], and take it as the initial condition for ψ. The other three fields (ϕ,V,B) are initialized to zero:

Equation (2)

The GEM initial conditions also included a perturbation in the fluid density, which we take to be constant in this four-field model.

3. The nonlinear solver: inexact Newton methods

After spatial and temporal discretization, the nonlinear partial differential equations (1) become a set of nonlinear algebraic equations F(u) = 0 with F = (Fϕ, FV, Fψ, FB) and u = (ϕ, V, ψ, B). We advance the system in time, with (ϕk, Vk, ψk, Bk) and (ϕk−1, Vk−1, ψk−1, Bk−1) denoting the solutions for (ϕ, V, ψ, B) at time levels k and k − 1, respectively. A high-order BDF method requires sufficient solution history, which must be accumulated at the beginning of the time integration. In our approach, the time integration starts from an initial guess at t = 0 with a BDF method of order one (backward Euler), and the order is gradually increased up to the desired value as more solution history becomes available (see footnote 7).
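For reference, the standard constant-step BDF formulas involved in this startup (written here for a generic system du/dt = f(u); the uniform time step Δt is an assumption) are

$\frac{\mathbf{u}^{k}-\mathbf{u}^{k-1}}{\Delta t} = \mathbf{f}(\mathbf{u}^{k})$ (BDF1, backward Euler) and $\frac{3\mathbf{u}^{k}-4\mathbf{u}^{k-1}+\mathbf{u}^{k-2}}{2\,\Delta t} = \mathbf{f}(\mathbf{u}^{k})$ (BDF2),

so the second-order formula can only be applied once two levels of solution history are available.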

For the system of nonlinear residual equations F(u) = 0, we can write a multivariable Taylor expansion about the current iterate um:

$\mathbf {F}(\mathbf {u}_{m} + \delta \mathbf {u}_{m}) = \mathbf {F}(\mathbf {u}_{m}) + \mathbf {J}(\mathbf {u}_{m})\,\delta \mathbf {u}_{m} + \mathrm {h.o.t.},$

where h.o.t. denotes higher-order terms. Requiring F(um + δum) = 0 and neglecting h.o.t. yields the strict Newton iteration over a sequence of linear systems:

$\mathbf {J}(\mathbf {u}_{m})\,\delta \mathbf {u}_{m} = -\mathbf {F}(\mathbf {u}_{m}), \qquad \mathbf {u}_{m+1} = \mathbf {u}_{m} + \delta \mathbf {u}_{m}, \qquad m = 0, 1, 2, \ldots \qquad (3)$

for a given u0. Here, F(u) is a vector-valued function of nonlinear residuals, $\mathbf {J} = \frac {\partial \mathbf {F}}{\partial \mathbf {u}}$ is its Jacobian matrix, u is the vector to be found and m is the nonlinear iteration index.

We assume that the vector-valued function F has the following properties: there exists a u* with F(u*) = 0; F is continuously differentiable in a neighborhood of u*; and the Jacobian matrix J(u*) is nonsingular.

The Newton iteration is stopped when the norm of the nonlinear residual has dropped sufficiently,

$\|\mathbf {F}(\mathbf {u}_{m})\| \leq \epsilon _{r}\,\|\mathbf {F}(\mathbf {u}_{0})\|,$

and/or when the Newton update is sufficiently small,

$\|\delta \mathbf {u}_{m}\| \leq \epsilon _{s}$

(the tolerances εr and εs used in this work are given in section 5).

Newton methods are attractive because they converge rapidly from any sufficiently good initial guess u0. In transient problems, a good initial guess at each time step is the solution at the previous step. However, Newton's method has a drawback when solving the Newton correction (Newton update equation) at each step: computing the exact solution with a direct method such as Gaussian elimination is very expensive when the number of unknowns is large, and such accuracy may not be justified when um is far from u*. It is therefore reasonable to use iterative methods to solve the Newton equation only approximately.

There is a class of inexact Newton methods that computes an approximate solution of the Newton equations such that

$\|\mathbf {r}_{m}\| \leq \eta _{m}\,\|\mathbf {F}(\mathbf {u}_{m})\|,$

where the residual rm is given by

$\mathbf {r}_{m} = \mathbf {F}(\mathbf {u}_{m}) + \mathbf {J}(\mathbf {u}_{m})\,\delta \mathbf {u}_{m}$

and the non-negative forcing sequence {ηm} is used to control the level of accuracy [30].

Then the inexact Newton method is

$\|\mathbf {F}(\mathbf {u}_{m}) + \mathbf {J}(\mathbf {u}_{m})\,\delta \mathbf {u}_{m}\| \leq \eta _{m}\,\|\mathbf {F}(\mathbf {u}_{m})\|, \qquad \mathbf {u}_{m+1} = \mathbf {u}_{m} + \delta \mathbf {u}_{m}. \qquad (4)$

Here ηm may depend on um. When ηm = 0, we recover the Newton method.
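As a minimal sketch of this iteration (illustrative only: the Jacobian–vector product Jv, the constant forcing term eta and the tolerances are assumptions, not the settings used in this work), an inexact Newton loop can be written with a Krylov method as the inner solver:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def inexact_newton(F, Jv, u0, eta=1e-2, eps_r=1e-8, max_it=20):
    """Inexact Newton: at each step solve J(u_m) du = -F(u_m) only until
    ||F(u_m) + J(u_m) du|| <= eta * ||F(u_m)|| (constant forcing term here)."""
    u = u0.copy()
    r0 = np.linalg.norm(F(u0))
    for m in range(max_it):
        Fm = F(u)
        if np.linalg.norm(Fm) <= eps_r * r0:           # relative residual drop
            return u
        J = LinearOperator((u.size, u.size), dtype=float,
                           matvec=lambda v, u=u: Jv(u, v))
        du, info = gmres(J, -Fm, rtol=eta)             # 'tol=' in older SciPy releases
        u = u + du
    return u
```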

4. The parallel domain decomposition Schur complement-based linear solver

At each Newton (nonlinear) iteration, we need to solve linear system (3) for Newton methods or (4) for inexact Newton methods. In this section, we introduce the parallel domain decomposition Schur complement-based linear solver PDSLin.

The hybrid linear software library PDSLin is designed to solve a large-scale linear system:

$A\mathbf {x} = \mathbf {b}, \qquad (5)$

where A is a square real or complex general matrix, b is a given right-hand-side vector and x is the solution vector. It uses a non-overlapping domain decomposition technique called the Schur complement method [35]. The original linear system is first reordered into a 2 × 2 block system of the following form:

$\begin{pmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{pmatrix}\begin{pmatrix} \mathbf{x}_{1}\\ \mathbf{x}_{2}\end{pmatrix} = \begin{pmatrix} \mathbf{b}_{1}\\ \mathbf{b}_{2}\end{pmatrix}, \qquad (6)$

where A11 corresponds to the interior subdomains, A22 to the separators, and A12 and A21 to the interfaces between them. To eliminate the unknowns associated with A11, an equivalent system

$\begin{pmatrix} A_{11} & A_{12}\\ 0 & S \end{pmatrix}\begin{pmatrix} \mathbf{x}_{1}\\ \mathbf{x}_{2}\end{pmatrix} = \begin{pmatrix} \mathbf{b}_{1}\\ \hat{\mathbf{b}}_{2}\end{pmatrix} \qquad (7)$

is obtained. In this system, the Schur complement S is defined as

$S = A_{22} - A_{21}A_{11}^{-1}A_{12}, \qquad (8)$

and the right-hand side

$\hat{\mathbf{b}}_{2} = \mathbf{b}_{2} - A_{21}A_{11}^{-1}\mathbf{b}_{1}. \qquad (9)$

The solution of the global system can be achieved by solving the Schur complement system

$S\,\mathbf{x}_{2} = \hat{\mathbf{b}}_{2} \qquad (10)$

first and then solving the interior system

$A_{11}\mathbf{x}_{1} = \mathbf{b}_{1} - A_{12}\mathbf{x}_{2}. \qquad (11)$
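For illustration, a dense NumPy toy version (no relation to the PDSLin implementation, which works with distributed sparse blocks) of the elimination described by equations (6)–(11):

```python
import numpy as np

def schur_solve(A11, A12, A21, A22, b1, b2):
    """Solve the 2x2 block system (6) via the Schur complement (dense toy version)."""
    Z = np.linalg.solve(A11, A12)        # A11^{-1} A12
    w = np.linalg.solve(A11, b1)         # A11^{-1} b1
    S = A22 - A21 @ Z                    # Schur complement, equation (8)
    b2_hat = b2 - A21 @ w                # modified right-hand side, equation (9)
    x2 = np.linalg.solve(S, b2_hat)      # Schur complement system, equation (10)
    x1 = w - Z @ x2                      # interior system, equation (11)
    return x1, x2
```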

If k interior subdomains are extracted, the coefficient matrix of (6) has the form

$A = \begin{pmatrix} D_{1} & & & E_{1}\\ & \ddots & & \vdots\\ & & D_{k} & E_{k}\\ F_{1} & \cdots & F_{k} & A_{22}\end{pmatrix}, \qquad (12)$

where Dl is the lth subdomain and El and Fl are the interfaces between Dl and A22. There are two types of processor groups, gl and gs: the processors in gl factorize the subdomain Dl, with the rows of Dl and El and the columns of Fl distributed among these processors (see footnote 8); the processors in gs solve the Schur complement system (10).

In parallel, the Schur complement S in (8) is computed as

$S = A_{22} - \sum_{p=1}^{n_{p}} W^{(p)}G^{(p)} \qquad (13)$

for the LU factorizations $D_{l} = L_{l}U_{l}$. Here, np is the number of cores used to solve the entire system, and the matrices W(p) and G(p) are given by (see footnote 9)

$W^{(p)} = W(:,\, j_{p}:j_{p+1}-1), \qquad G^{(p)} = G(j_{p}:j_{p+1}-1,\, :), \qquad (14)$

such that the pth processor owns the jpth through (jp+1 − 1)th columns of $W = A_{21}U_{11}^{-1}$ and the corresponding rows of $G = L_{11}^{-1}A_{12}$, where the block factorization $A_{11} = L_{11}U_{11}$ is given by the factors $L_{l}$ and $U_{l}$, l = 1,...,k.

When computing W(p) and G(p), their approximations $\tilde {W}^{(p)}$ and $\tilde {G}^{(p)}$ are calculated by discarding nonzeros with magnitudes less than a prescribed drop tolerance σ1, and the approximate update matrix is $\tilde {T}^{(p)} =\tilde {W}^{(p)}\tilde {G}^{(p)}$. If the pth processor belongs to the processor group gs, then to compute its local portion of the approximate Schur complement it gathers the corresponding rows of $\tilde {T}^{(q)}$ from all processors and computes

$\hat {S}^{(p)} = A_{22}(i_{p}:i_{p+1}-1,\,:) - \sum_{q=1}^{n_{p}} \tilde {T}^{(q)}(i_{p}:i_{p+1}-1,\,:),$

where the pth processor owns the ipth through the (ip+1 − 1)th rows of A22. Moreover, small nonzeros are dropped from $\hat {S}^{(p)}$ to form its approximation $\tilde {S}^{(p)}$ with drop tolerance σ2. In the process of computing the approximate Schur complement $\tilde {S}$ , several performance-enhancing techniques are employed [3].
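The dropping step itself is conceptually simple; the following SciPy sketch (illustrative only, unrelated to PDSLin's internal distributed data structures; the CSR storage and the function name drop_small are assumptions) shows how entries below a tolerance such as σ1 or σ2 can be removed from a sparse matrix:

```python
import scipy.sparse as sp

def drop_small(M, sigma):
    """Return a copy of the sparse matrix M with entries |m_ij| < sigma removed,
    as done conceptually when forming the approximations W~, G~ and S~."""
    M = sp.csr_matrix(M, copy=True)
    M.data[abs(M.data) < sigma] = 0.0   # zero out small entries
    M.eliminate_zeros()                 # drop the explicit zeros from storage
    return M
```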

There are three main phases: extracting and factorizing the interior subdomains A11; computing an approximate Schur complement $\tilde {S}$ of S in (8); and computing the solution. The challenge in these phases is to develop a robust, efficient and general-purpose hybrid solver whose parallel implementation scales to thousands of processors.

When the parallel nested dissection algorithm implemented in PT-SCOTCH [36] is used to extract the interior subdomains, multiple processors are assigned to each interior subdomain, which allows us to increase the processor count without increasing either the number of subdomains or the size of the Schur complement. This two-level parallel approach differs from the common one-level approach (see footnote 10) in that it does not need a large number of subdomains to use a large number of processors; therefore, the size of the Schur complement does not grow. To compute an approximate Schur complement as a preconditioner, we have to deal with load imbalance and communication both within a processor group assigned to the same subdomain and between the processor groups assigned to different subdomains [3]. When computing the solution, a preconditioned Krylov method in PETSc [31] is used to solve the Schur complement system, with the exact LU factorization of the approximate Schur complement $\tilde {S}$ , computed with SuperLU [37], as the preconditioner. Finally, the interior systems are solved in parallel, using the already computed LU factors of the subdomains.
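As an illustration of this last phase only (a SciPy sketch under simplifying assumptions, not the PETSc/SuperLU code path used by PDSLin), the Schur complement system (10) can be solved with GMRES preconditioned by the exact LU factors of a sparsified approximation $\tilde {S}$ , for example one produced by the drop_small sketch above:

```python
import scipy.sparse.linalg as spla

def solve_schur_system(S, S_tilde, b2_hat):
    """GMRES on S x2 = b2_hat (equation (10)), preconditioned by the
    exact LU factors of the sparsified approximation S_tilde."""
    lu = spla.splu(S_tilde.tocsc())                    # LU of the approximate Schur complement
    M = spla.LinearOperator(S.shape, matvec=lu.solve,  # preconditioner action: apply S~^{-1}
                            dtype=S.dtype)
    x2, info = spla.gmres(S, b2_hat, M=M, rtol=1e-12)  # 'tol=' in older SciPy releases
    return x2, info
```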

The most challenging phase of the parallel implementation is the computation of the approximate Schur complement $\tilde {S}$ , especially in the two-level parallel framework, where multiple processors are assigned to one subdomain. The benefit of this two-level approach is that it limits the size of the Schur complement when thousands of processors are used; however, it is difficult to manage the load imbalance and communication both within a processor group assigned to the same subdomain and between the processor groups assigned to different subdomains.

Two other hybrid linear solvers, HIPS [38] and MaPHyS [39], are also based on Schur complement methods, but their parallel implementations take different approaches. HIPS uses the one-level approach, in which multiple subdomains are assigned to a single processor, whereas PDSLin and MaPHyS use the two-level approach. As a result, HIPS has a larger Schur complement system as the number of cores increases. The Schur complement system is treated as a global system in PDSLin and HIPS, but as a local system (see footnote 11) in MaPHyS. When solving the Schur complement system, HIPS uses a block level-based ILU and MaPHyS uses an additive Schwarz preconditioner; PDSLin uses the LU factorization of the approximate Schur complement $\tilde S$ as the preconditioner, which provides robust preconditioning for highly indefinite or ill-conditioned systems.

5. Numerical results

Our numerical experiments are carried out on the Cray XE6 Hopper, a leading petascale supercomputing system at the National Energy Research Scientific Computing Center (NERSC). Hopper is NERSC's first petascale system, with a peak performance of 1.28 petaflops, 153 216 processor cores for running scientific applications, 212 TB of memory and 2 petabytes of online disk storage. Hopper has 6384 compute nodes, each with two twelve-core AMD 'MagnyCours' 2.1 GHz processors; 6000 nodes have 32 GB of DDR3 1333 MHz memory per node and 384 nodes have 64 GB per node (www.nersc.gov/systems/hopper-cray-xe6/). All calculations were carried out in 64-bit arithmetic.

There are five physical parameters: the electrical resistivity η, the fluid viscosity μ, the hyper-viscosity h = C1hxhy, the hyper-resistivity ν = C2hxhyη and the collisionless ion skin depth di, where hx × hy is the cell size. We choose as defaults the values given in the GEM problem specification: η = 0.005, μ = 0.05, C1 = 4.0, C2 = 1.0 and di ∈ [0,1] [18, 34].

For the nonlinear solver, the relative convergence tolerance is $\epsilon_{r} = 10^{-8}$ and the convergence tolerance on the norm of the change in the solution between steps is $\epsilon_{s} = 10^{-7}$.

5.1. Condition numbers

Starting from the initial state (2), the system (1) evolves in time. When the collisionless ion skin depth di = 0.0 (resistive MHD), because ϕ = V = B = 0.0 at t = 0, the second and fourth equations in (1) imply that V and B remain unchanged as time advances, and the out-of-plane current density develops a large gradient in the mid-plane. As expected, a thin current layer forms in the mid-plane, known as the Sweet–Parker layer [40, 41], and no x-point appears. As di changes from 0.0 to 1.0, the reconnection region essentially changes character from a y-point to an x-point, as expected [42] (see figure 1). For the system (1), the reconnected magnetic flux is defined as $\Psi (t) = \frac {1}{2}[\psi (0,0,t) - \psi (L_{x}/2,0,t)]$ , and the reconnection rate is the time derivative of Ψ(t): R(t) = ∂Ψ(t)/∂t.
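As a small post-processing sketch (the probe time series and the function name are assumptions), Ψ(t) and R(t) can be evaluated from the stored flux values as:

```python
import numpy as np

def reconnection_rate(psi_origin, psi_edge, t):
    """Psi(t) = [psi(0,0,t) - psi(Lx/2,0,t)] / 2 and R(t) = dPsi/dt,
    given the two probe time series and the time axis t."""
    Psi = 0.5 * (psi_origin - psi_edge)
    return Psi, np.gradient(Psi, t)     # np.gradient handles non-uniform t
```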

Figure 1. The plots of the negative current density −j in $\Omega = [-\frac {L_{x}}{2},\frac {L_{x}}{2}]\times [-\frac {L_{y}}{2},\frac {L_{y}}{2}]$ at time t = 0.0 (left), time t = 40.0 with di = 0.0 (middle) and time t = 40.0 with di = 1.0 (right).

The implicit time advance for this two-fluid magnetic reconnection problem is known to be difficult because of the large condition number of the associated matrix. This is especially troublesome when the collisionless ion skin depth di is large, so that the Whistler waves, which cause the fast reconnection, dominate the physics [1].

Figures 2(a) and (b) show Ψ(t) and R(t), respectively, for collisionless ion skin depths di = 0.1, 0.5 and 1.0 with dt = 0.1. When di = 1.0, the reconnection rate reaches its maximum at R(t = 26.6) = 0.03. Figure 2(c) compares the condition numbers for di = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0 with dt = 1.0. When di = 1.0, the condition number reaches its maximum of 1.95 × 10^7 at t = 11.0.

Figure 2. The reconnected magnetic flux Ψ(t) (a), the reconnection rate R(t) (b) for di = 0.1, 0.5, 1.0 with dt = 0.1 and the condition numbers (c) for di = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0 with dt = 1.0 from time t = 0.0 to T = 40.0 for η = 0.005, μ = 0.05, C1 = 4.0 and C2 = 1.0 on a 128 × 128 grid.

Moreover, the condition number increases as the problem size or the time step size increases [1]. Table 1 lists the condition numbers of the associated matrices of the time-dependent nonlinear system for different problem sizes. The problem size (Nx × Ny), the size of the associated matrix (size(A)), the number of nonzeros in the matrix (nnz(A)), the number of nonzeros in L + U (nnz(L + U)), the fill ratio, the condition number (cond(A)) and the memory usage (mem) in gigabytes are listed in the table. As the problem size increases, the fill ratio increases and therefore the memory usage increases. The maximum memory per node on Hopper is 64 GB, which is not enough to evaluate the condition number of the associated matrix for the 1024 × 1024 problem: if the fill ratio is 100, the estimated memory requirement is about 98 GB. The sequential direct solver package SuperLU 4.3 [43] is used to obtain these condition numbers.

Table 1. The condition numbers of associated matrices of the magnetic reconnection problem for t = 0.0 with dt = 0.5, di = 1.0, η = 0.005, μ = 0.05, C1 = 4.0 and C2 = 1.0.

Nx×Ny     size(A)     nnz(A)       nnz(L+U)        Fill ratio  cond(A)     mem (GB)
64×64     16 384      358 674      13 698 180      38          6.30×10^4   0.15
128×128   65 536      1 470 802    71 554 872      48          1.48×10^6   0.88
256×256   262 144     5 956 050    401 569 028     67          2.41×10^7   5.00
512×512   1 048 576   23 970 514   2 133 808 772   89          3.93×10^8   28.54

Two matrices are chosen for the numerical experiments: the first has a size of 1 048 576 with 23 970 514 nonzeros, and we call the related linear system mcomp; the other has a size of 4 194 304 with 96 175 314 nonzeros, and we call the related linear system bcomp.

5.2. The hybrid linear solver: the parallel domain decomposition Schur complement-based linear solver

In this section, we use the PDSLin library as the linear solver for the matrices associated with the magnetic reconnection problem and present numerical results and the parallel performance of this hybrid linear solver. In PDSLin, SuperLU_DIST 2.4 [37] is used as the direct solver for the interior subdomains, and the Schur complement systems are solved using a preconditioned Krylov method in PETSc (see footnote 12) [31]. The stopping criterion for the Krylov solver is based on the l2-norm of the residual: when it has been reduced by a factor of at least 10^{12} relative to the initial residual, we consider the solution to be converged (see footnote 13).

The first experiment compares the total time required by the direct solver and by the hybrid solver (both the one-level and the two-level parallel frameworks) to solve the mcomp linear system. In the one-level approach, the number of interior subdomains k is set equal to the total number of cores, and half of the total number of cores is used to solve the Schur complement system; in the two-level approach, the number of interior subdomains is fixed at 32, and the processors are distributed equally among the interior subdomains.
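For example, on np = 4096 cores the one-level approach therefore uses k = 4096 subdomains and assigns 2048 cores to the Schur complement system, whereas the two-level approach keeps k = 32 subdomains with 4096/32 = 128 cores per subdomain.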

To enhance the performance of the hybrid solver, the drop tolerance σ1 is used to enforce the sparsity of $\tilde {E}$ and $\tilde {F}$ , and the drop tolerance σ2 is used to enforce the sparsity of $\tilde {S}$ . Figure 3 compares the total time for σ1 = 10^{-6}, σ2 = 10^{-5} and the number of cores np = 32, 64, ..., 4096: PDSLin scales better than SuperLU_DIST, and the scaling of the one-level and two-level approaches is similar. For example, SuperLU_DIST does not scale beyond 128 cores, while PDSLin scales well up to 512 cores for the one-level approach and up to 2048 cores for the two-level approach.

Figure 3. The total time required to solve the linear system mcomp: the direct solver SuperLU_DIST 3.0 (black dotted line), the one-level approach (red solid line) and the two-level approach (blue dashed line). The tolerances are σ1 = 10^{-6} and σ2 = 10^{-5}.

Although a similar scaling of the one-level and two-level approaches is observed here, this is not always the case. For matrices from other applications [3], the two-level approach scales better than the one-level approach. In the one-level approach, the number of interior subdomains k increases with the total number of cores, and thus the size of the Schur complement increases. In general, the iterative solver has difficulty converging for large Schur complement systems: the iteration count grows with the size of the Schur complement. Thus, for most matrices, the scaling of the one-level approach is not as good as that of the two-level approach, which keeps the number of subdomains, and hence the size of the Schur complement, fixed as the number of cores increases.

In the two-level approach for solving the mcomp linear system, the size of the Schur complement is 38 056 × 38 056 with 303 189 610 nonzeros on average for the fixed 32 subdomains, and the number of iterations is 70; in the one-level approach, the size of the Schur complement increases with the core count. Table 2 lists the number of subdomains (k, equal to the number of cores np), the size of the Schur complement (size), the number of nonzeros in the LU factors of $\tilde {S}$ (nnz($LU(\tilde {S})$)) and the number of iterations of the iterative solver for the Schur complement system (its).

Table 2. The Schur complement system in the one-level approach: the number of subdomains, the size of the Schur complement, the number of nonzeros in $LU(\tilde {S})$ and the number of iterations of the iterative solver.

k      size      nnz($LU(\tilde {S})$)   its
32     38 056    280 975 663             70
64     53 610    365 279 060             85
128    80 780    554 788 925             124
256    110 672   637 663 812             120
512    128 236   575 865 038             62
1024   178 925   671 580 408             75
2048   251 229   788 066 929             80
4096   346 378   860 595 342             93

The number of nonzeros grows more slowly than the size of the Schur complement: while the number of subdomains increases by a factor of 128 (from 32 to 4096), the size of the Schur complement increases only about nine-fold (from 38 056 to 346 378) and the number of nonzeros in $LU(\tilde {S})$ only about three-fold (from 280 975 663 to 860 595 342). Moreover, there is a dip in the number of nonzeros when the number of cores increases from 256 to 512, and the number of iterations correspondingly decreases from 120 to 62. This explains why the one-level and the two-level approaches have a similar scaling for the present reconnection model.

A similar scaling is also found for the bcomp linear system in both the one-level and the two-level approaches. Moreover, we examine the times (in seconds) for the LU decomposition of the interior subdomains Dl (LU(D)), the computation of the approximate Schur complement $\tilde {S}$ (comp(S)), the LU decomposition of $\tilde {S}$ (LU(S)) and the computation of the solution vector (Solve) for bcomp. Figure 4 shows the seven cases np = 32, 64, ..., 2048 for both the one-level and the two-level approaches applied to the bcomp linear system.

Figure 4. The time for LU factorization of the interior domains LU(D), computation of the approximate Schur complement comp(S), LU of the approximate Schur complement LU(S), and solution of the system for the one-level (a) and the two-level (b) approaches for bcomp.

In general, in the one-level approach LU(S) increases with the number of cores because doubling the number of cores doubles the number of subdomains and typically doubles the size of the Schur complement, so twice as many cores must factorize a roughly twice larger Schur complement. This is unfortunate because one-level parallelization will then not scale to a large number of cores, whereas two-level parallelism can scale further by fixing a small number of subdomains. Nevertheless, we can see from figure 4(a) (one-level parallelization for bcomp) and figure 4(b) (two-level parallelization for bcomp) that all of the times scale well here, and the good scaling of LU(S) allows even the one-level parallelization to scale to thousands of cores.

6. Conclusions

In this paper, we have presented implicit numerical methods in a BDF–Newton solution framework as the nonlinear solver for the four-field extended MHD equations (the magnetic reconnection problem). Two types of linear solvers are compared for the linear system (the Newton update equations): a direct solver and the hybrid solver. Numerical experiments show that, in solving the linear system associated with the magnetic reconnection problem, the parallel hybrid solver (both the one-level and the two-level parallelization) scales up to thousands of processors. Moreover, the one-level parallelization scales as well as the two-level parallelization, which is not usually observed for other linear systems, for example those arising from the numerical simulation of an accelerator cavity design [3]. The flexibility of having both the one-level and the two-level parallelization enables PDSLin to solve linear systems with different properties. When solving the nonlinear system in time, we expect that PDSLin can provide better scaling than a direct solver while maintaining the same robustness.

Acknowledgments

The majority of the work described in this paper was supported by the Petascale Initiative in Computational Science which is funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research under contract No. DE-AC02-05CH11231 at the National Energy Research Scientific Computing Center (NERSC) and the Center for Simulation of RF Wave Interactions with Magnetohydrodynamics (CSWIM), which is funded by the US Department of Energy, the Office of Science. Computational resources were provided by the NERSC, which is supported by the Office of Science of the US Department of Energy under contract No. DE-AC02-05CH11231.

Footnotes

7. The highest order available in the hand-coded program is fourth order, and numerical experiments are carried out to second order in time.

8. The nonzeros of Dl and El are stored in the compressed row storage format and the nonzeros of Fl are stored in the compressed column storage format.

9. Fortran notation is used here.

10. In a one-level parallel approach, a single processor is assigned to factorize one or more interior subdomains.

11. MaPHyS computes the local Schur complements associated with the subdomains explicitly to construct a set of parallel local preconditioners.

12. The version is 3.1.09.

13. When using the Krylov solver to solve the system (10), a very accurate solution x2 is required because it is used to obtain x1 from the system (11) and hence the full solution of the linear system Ax = b.
