FastQSL: A Fast Computation Method for Quasi-separatrix Layers

PeiJin Zhang; Jun Chen; Rui Liu; ChuanBing Wang

doi:10.3847/1538-4357/ac8d61

1. Introduction

Chromosphere flare ribbons often coincide with the footprints of separatrices or quasi-separatrix layers (QSLs) (Priest & Démoulin 1995; Démoulin et al. 1996, 1997), which embed the favorable sites for 3D magnetic reconnection, such as null points, separators (intersection of two separatrices), and quasi-separators (intersection of two QSLs) of the magnetic field (Priest & Forbes 2000; Pontin 2011), where a large gradient of magnetic connectivity is present.

The squashing factor Q quantifies the connectivity change of magnetic field lines (Titov et al. 2002; Titov 2007); Q is defined through a mapping from one surface S₁, which a magnetic field line threads at (x₁, y₁), to another surface S₂, which the field line threads at (x₂, y₂);

$\begin{eqnarray}&&\mathop{{\rm{\Pi }}}\limits_{1\,2}:({x}_{1},{y}_{1})\to ({x}_{2},{y}_{2}).\end{eqnarray} \tag{ 1 }$

The Jacobian matrix of differential mapping is expressed as

$\begin{eqnarray}\mathop{D}\limits_{1\,2}=\left(\begin{array}{cc}\tfrac{\partial {x}_{2}}{\partial {x}_{1}} & \tfrac{\partial {x}_{2}}{\partial {y}_{1}}\\ \tfrac{\partial {y}_{2}}{\partial {x}_{1}} & \tfrac{\partial {y}_{2}}{\partial {y}_{1}}\end{array}\right)\equiv \left(\begin{array}{cc}a & b\\ c & d\end{array}\right).\end{eqnarray} \tag{ 2 }$

A full description of Q that considers the variance of the covariant metric tensor on S₁ and S₂ is defined by Equations (11), (12), and (14) in Titov (2007). If S₁ and S₂ are the two boundaries of a cuboid coordinated with the same Cartesian coordinate system, the value of Q at (x₁, y₁) is

$\begin{eqnarray}&&Q\left({x}_{1},{y}_{1}\right)=\displaystyle \frac{{a}^{2}+{b}^{2}+{c}^{2}+{d}^{2}}{\left|\det \,\mathop{D}\limits_{1\,2}\right|}.\end{eqnarray} \tag{ 3 }$

$\left|\det \,\mathop{D}\limits_{1\,2}\right|$ is often replaced by its equivalence $\left|{B}_{n,1}/{B}_{n,2}\right|$ for mitigating numerical error, where

$\begin{eqnarray}&&{B}_{n,1}={\left.({\boldsymbol{B}}\cdot {\boldsymbol{n}})\right|}_{{S}_{1}}\end{eqnarray} \tag{ 4 }$

is the component normal to S₁ of (x₁, y₁) and n is the normal unit vector of S₁; the form is similar for B_n,2. Since Titov et al. (2002) proved $Q\left({x}_{1},{y}_{1}\right)=Q\left({x}_{2},{y}_{2}\right)$ , Aulanier et al. (2005) expanded the definition of Q into 3D space by

$\begin{eqnarray}&&{\boldsymbol{B}}\cdot {\rm{\nabla }}Q=0,\end{eqnarray} \tag{ 5 }$

i.e., values of Q are invariant along a field line. Separatrices are located where Q = ∞, and QSLs are located where Q ≫ 2, the theoretical minimum of Q. Along with tracing field lines, the twist number, a measure of how many turns two infinitesimally close field lines wind about each other, can be calculated without much additional effort by Equation (16) in Berger & Prior (2006)

$\begin{eqnarray}&&{{ \mathcal T }}_{w}={\int }_{L}^{}\displaystyle \frac{{\rm{\nabla }}\times {\boldsymbol{B}}\cdot {\boldsymbol{B}}}{4\pi {B}^{2}}{dl},\end{eqnarray} \tag{ 6 }$

where the integral range L is a segment of a magnetic field line.

In this work, we take advantage of graphics processing unit (GPU) computing, which is more efficient and economic compared to traditional central processing unit (CPU) computing (Portegies Zwart 2020), to obtain the 3D distribution of Q and ${{ \mathcal T }}_{w}$ with high efficiency. Feng et al. (2013) achieved an acceleration ratio of about 8 times by rewriting the model into a GPU compatible form for the magnetohydrodynamics (MHD) simulation of space weather. Caplan et al. (2019) accelerated the solar MHD code based on OpenAcc; the GPU version has about 3 times the efficiency of the CPU version, with a comparable cost level of hardware. Tassev & Savcheva (2017) have implemented the computation of QSLs with a GPU based on OpenCL, and achieved the efficiency of obtaining a representative 3D QSL map within a few hours.

The rest of this paper is arranged as follows: in Section 2, the algorithm and program structure is presented. In Section 3, we use the extrapolated potential field from a solar active region on 2010 October 16 19:00 UT (AR11112), an analytical quadrapole field, and an analytical field from Titov & Démoulin (1999; TD99 model) to demonstrate the method. In Section 4, we present detailed benchmarks and comparisons for different algorithms and computation architectures. In Section 5, we discuss and summarize the result.

2. Method

FastQSL is developed from the published source code by Liu et al. (2016), hereafter Code2016.⁸

2.1. Calculation of Q

In the computation of Q, the essential and most computational consuming step is numerically deriving magnetic field lines by solving the equation

$\begin{eqnarray}&&\displaystyle \frac{d\,{\boldsymbol{r}}(s)}{d\,s}=\displaystyle \frac{{\boldsymbol{B}}}{B},\end{eqnarray} \tag{ 7 }$

where s is the arc length coordinate of a field line, and r (s) is the coordinates function of the field line.

To accurately map the field-line footpoints on a surface, one must solve the field-line equation in high precision. In Code2016, Equation (7) is integrated by the classic RK4. We terminate the integration where it goes outside the data cube, and then move one step back to get the field-line coordinates at the boundary.

$\mathop{D}\limits_{1\,2}$ in Equation (2) is derived from the changes in the mapped footpoint coordinates with respect to the footpoints of neighboring field lines. Q on a cross section can be calculated with the method 3 introduced in Pariat & Démoulin (2012; hereafter Method I), which introduces an auxiliary cross section S₀(x₀, y₀) to obtain

$\begin{eqnarray}\mathop{D}\limits_{1\,2}=\left(\begin{array}{cc}\tfrac{\partial {x}_{2}}{\partial {x}_{0}} & \tfrac{\partial {x}_{2}}{\partial {y}_{0}}\\ \tfrac{\partial {y}_{2}}{\partial {x}_{0}} & \tfrac{\partial {y}_{2}}{\partial {y}_{0}}\end{array}\right)\left(\begin{array}{cc}\tfrac{\partial {x}_{0}}{\partial {x}_{1}} & \tfrac{\partial {x}_{0}}{\partial {y}_{1}}\\ \tfrac{\partial {y}_{0}}{\partial {x}_{1}} & \tfrac{\partial {y}_{0}}{\partial {y}_{1}}\end{array}\right),\end{eqnarray} \tag{ 8 }$

and

$\begin{eqnarray}\begin{array}{rcl}\left(\begin{array}{cc}\tfrac{\partial {x}_{0}}{\partial {x}_{1}} & \tfrac{\partial {x}_{0}}{\partial {y}_{1}}\\ \tfrac{\partial {y}_{0}}{\partial {x}_{1}} & \tfrac{\partial {y}_{0}}{\partial {y}_{1}}\end{array}\right) & = & {\left(\begin{array}{cc}\tfrac{\partial {x}_{1}}{\partial {x}_{0}} & \tfrac{\partial {x}_{1}}{\partial {y}_{0}}\\ \tfrac{\partial {y}_{1}}{\partial {x}_{0}} & \tfrac{\partial {y}_{1}}{\partial {y}_{0}}\end{array}\right)}^{-1}\\ & = & \left(\begin{array}{cc}\tfrac{\partial {y}_{1}}{\partial {y}_{0}} & -\tfrac{\partial {x}_{1}}{\partial {y}_{0}}\\ -\tfrac{\partial {y}_{1}}{\partial {x}_{0}} & \tfrac{\partial {x}_{1}}{\partial {x}_{0}}\end{array}\right)/\left|{B}_{n,0}/{B}_{n,1}\right|,\end{array}\end{eqnarray} \tag{ 9 }$

where B_n,0 is the component normal to S₀, which has a similar form to Equation (4). Method I is used in Code2016.

Tassev & Savcheva (2017) published their code QSL Squasher ⁹ that achieved a high efficiency to identify QSLs, and Scott et al. (2017) gave a detailed analysis of the code. Taking U , V as a pair of orthonormal unit vectors on S₀, Scott et al. (2017) then proposed a method of obtaining Q without the information of neighboring mapping coordinates by solving

$\begin{eqnarray}&&\displaystyle \frac{d\left\{{\boldsymbol{r}},{\boldsymbol{U}},{\boldsymbol{V}}\right\}}{{ds}}=\left\{\displaystyle \frac{{\boldsymbol{B}}}{B},{\boldsymbol{U}}\cdot {\rm{\nabla }}\displaystyle \frac{{\boldsymbol{B}}}{B},{\boldsymbol{V}}\cdot {\rm{\nabla }}\displaystyle \frac{{\boldsymbol{B}}}{B}\right\},\end{eqnarray} \tag{ 10 }$

and they proved that

$\begin{eqnarray}&&Q=\displaystyle \frac{{\tilde{{\boldsymbol{U}}}}_{1}^{2}\,{\tilde{{\boldsymbol{V}}}}_{2}^{2}+{\tilde{{\boldsymbol{U}}}}_{2}^{2}\,{\tilde{{\boldsymbol{V}}}}_{1}^{2}-2({\tilde{{\boldsymbol{U}}}}_{1}\cdot {\tilde{{\boldsymbol{V}}}}_{1})({\tilde{{\boldsymbol{U}}}}_{2}\cdot {\tilde{{\boldsymbol{V}}}}_{2})}{{\left({B}_{n,0}\right)}^{2}/({B}_{n,1}\,{B}_{n,2})}\end{eqnarray} \tag{ 11 }$

is equivalent to Equation (3), where

$\begin{eqnarray}&&{\tilde{{\boldsymbol{U}}}}_{1}={\boldsymbol{U}}-{\left.\displaystyle \frac{{\boldsymbol{U}}\cdot {\boldsymbol{n}}}{{\boldsymbol{B}}\cdot {\boldsymbol{n}}}{\boldsymbol{B}}\right|}_{{S}_{1}};\end{eqnarray} \tag{ 12 }$

the form is similar for ${\tilde{{\boldsymbol{U}}}}_{2},{\tilde{{\boldsymbol{V}}}}_{1},{\tilde{{\boldsymbol{V}}}}_{2}$ . In this paper, the method of Scott et al. (2017) based on Equation (11) is referred to as Method II.

An alternative set of codes for calculating QSLs has been published¹⁰ (hereafter CodeYang); the first version still followed Method I and was first applied in Yang et al. (2015). As of 2018 October, CodeYang adopted Method II.

Different selections of S₁, S₂ will result in different values of Q even for the same start point on S₀ (see the example in Section 4.1 of Titov 2007). In Code2016 and FastQSL, S₁, S₂ by Method I are the boundaries where a field line terminates. Therefore Code2016 and FastQSL record these boundaries for every field line. If one locally rotates S₁, S₂ to be perpendicular to the magnetic field line, Equation (3) and Equation (11) will give Q_⊥ (Titov 2007). Q_⊥ removes the projection effect on boundaries, and therefore quantifies the property of volume QSL more precisely than Q does. The reason that Q is still often used rather than Q_⊥ is for its numerical simplicity. QSL Squasher and CodeYang set S₁, S₂ to always be perpendicular to B , therefore providing Q_⊥ only.

Method I requires tracing at least four neighboring field lines for the central difference of footpoint coordinates, resulting in a numerical difficulty in cases where four field lines have different S₁, S₂, which gives NaN in Code2016. Method I also has difficulty accurately applying the formula of Q_⊥ of Titov (2007), especially on a polarity inversion line (PIL). Method II can give Q and Q_⊥ directly without introducing the error of coordinate difference by tracing Equation (10) alone, but solving Equation (10) is less efficient than Equation (7) because of the need of calculating ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ at every step. In addition, since Q changes sharply around a separatrix, the high-Q positions could be inside cells whose surrounding grids still have low values of Q; then, these separatrix segments cannot be captured. In contrast, with Method I, Code2016 traces field lines at refined grids, and implements the central difference of the mapping coordinates in terms of refined grids for Equations (8) and (9); here the grid spacing is denoted as δ in Pariat & Démoulin (2012). Consequently, a separatrix can be captured with refined grids, and characterized by an extremely high value of Q. Briefly speaking, Method I has the advantage of locating the position of thin QSLs, especially separatrices (except that S₁, S₂ are not exactly the same for four neighboring field lines); Method II has the advantage of giving accurate values of Q and Q_⊥. These characteristics are shown in Section 3 (Figures 2 and 4). Since Q is mostly used to locate the position of QSLs and separatrices, Method I still has its advantage. FastQSL provides the option of both methods.

2.2. Magnetic Field at the Input

For Code2016 and FastQSL, the input magnetic field is assumed to be in Cartesian grids. Code2016 requires uniform grid spacings, while FastQSL additionally supports general stretched (but still rectilinear) grids. CodeYang and QSL Squasher can run in spherical coordinates; QSL Squasher can run in stretched grids.

We assume that the input magnetic field B is known at every 3D Cartesian grid [x_i, y_j, z_k]. Then the B at r = (x, y, z) in a cubic unit cell [x_i, x_i+1] × [y_j, y_j+1] × [z_k, z_k+1] is interpolated by

$\begin{eqnarray}&&\begin{array}{l}{{\boldsymbol{B}}}_{\mathrm{interp}}(x,y,z)\\ =\displaystyle \sum _{m=0}^{1}\displaystyle \sum _{n=0}^{1}\displaystyle \sum _{p=0}^{1}{\omega }_{x,m}\,{\omega }_{y,n}\,{\omega }_{z,p}\,{\boldsymbol{B}}({x}_{i+m},{y}_{j+n},{z}_{k+p}),\end{array}\end{eqnarray} \tag{ 13 }$

$\begin{eqnarray}&&{\omega }_{x,0}=\displaystyle \frac{{x}_{i+1}-x}{{x}_{i+1}-{x}_{i}},\end{eqnarray} \tag{ 14 }$

$\begin{eqnarray}&&{\omega }_{x,1}=1-{\omega }_{x,0},\end{eqnarray} \tag{ 15 }$

where ω_y,n, ω_z,p have similar forms to Equation (14) and Equation (15). For uniform grids, simply flooring {x, y, z}/spacing is {i, j, k} in Equation (13). While for stretched grids, we apply a binary search for the determination of i, j, k, which is much more time-consuming than the flooring, and the final performance is reduced to 45%–80% (depending on settings at the input) by this determination.

2.3. Tracing Scheme

Code2016 and CodeYang utilizes the classic RK4 to solve Equation (7) and Equation (10). QSL Squasher was updated to version 2.0 from 2019 January, then the option of tracing scheme of Cash–Karp method is removed and only Euler integration is retained. Previous versions are not available online currently. All of them use uniform fixed step-size (hereafter step).

FastQSL updates the tracing scheme with the 3/8-rule RK4 (Kutta 1901), which introduces a smaller step error than the classic RK4, and additionally provides RKF45 (Fehlberg 1969) for further acceleration. RKF45 calculates the difference between RK4 and RK5 of each step, tol is the maximum tolerated difference, and the unit of tol is the original grid spacing. If the difference is larger than tol, which is a failed-trial step, then the step-size is adjusted to a smaller value and we repeat the tracing step from the same point. If the difference is smaller than tol, RKF45 accepts this tracing step and then adjusts the step-size to a larger value according to the last difference and tol. A smaller value of tol will result in a more precise output, but takes more computational resources.

If grids are stretched, we adopt a self-adaptive fashion of step and tol in cells of different shapes for a better performance. The scaling of step and tol in a cell are

$\begin{eqnarray}&&\begin{array}{l}\mathrm{scaling}\\ =1/\sqrt{{\left(\displaystyle \frac{{B}_{x}/B}{{x}_{i+1}-{x}_{i}}\right)}^{2}+{\left(\displaystyle \frac{{B}_{y}/B}{{y}_{j+1}-{y}_{j}}\right)}^{2}+{\left(\displaystyle \frac{{B}_{z}/B}{{z}_{k+1}-{z}_{k}}\right)}^{2}},\end{array}\end{eqnarray} \tag{ 16 }$

$\begin{eqnarray}&&{{\mathtt{tol}}}_{\mathrm{cell}}={\mathtt{tol}}\times \mathrm{scaling},\end{eqnarray} \tag{ 17 }$

$\begin{eqnarray}&&{{\mathtt{step}}}_{\mathrm{cell}}={\mathtt{step}}\times \mathrm{scaling}.\end{eqnarray} \tag{ 18 }$

tol and step here are dimensionless. tol _cell and step _cell that are actually applied in the cell have the same units as x_i, y_j, z_k.

2.4. Code and Algorithm

Code2016 and FastQSL also provide ${{ \mathcal T }}_{w}$ and field-line length, which are also extended to 3D by remaining constant along a field line, like Equation (5). Different from the Q calculation, a low level of (even without) refinement of ${{ \mathcal T }}_{w}$ is good enough for data analysis. FastQSL additionally provides the coordinates of ending points of field lines, which had been used to locating flaring ribbons in an MHD simulation (Jiang et al. 2021), and can help the calculation of slip-squashing factors (Titov et al. 2009) if the flows at the boundaries are known.

FastQSL provides two sets of code. The first set is directly developed from Code2016 and still runs on IDL (the reliance of SolarSoftWare of Code2016 is removed)+Fortran.¹¹ This set is optimized in many details. For example, it can be compiled by both gfortran and ifort while Code2016 is designed only for ifort.

The second set is accelerated by an NVIDIA GPU. The flowchart of the GPU program with Method I is shown in Figure 1; the differences from Method II are described in the caption of Figure 1. The program is based on Python¹² and CUDA/C¹³ . The major input of this program is the data cube of the 3D magnetic field. The data is loaded with Scipy.io,¹⁴ which is capable of reading various file format of data (e.g., .sav, .mat, and unformatted). For the preparation of GPU computing, the field data, the parameters, and the coordinate array of points to be calculated are then transferred to GPU memory. The most computational-power consuming part is to trace magnetic field lines, for which the RKF45 solver is implemented in CUDA/C and compiled as a callable module TraceAllBline (as the green blocks shown in Figure 1) by Cupy.¹⁵ The compiled module calculates magnetic field lines and derives the footpoint mapping. Then the footpoint mapping results are transferred back to host memory for the calculation of Q. Also, with the magnetic field line computed in TraceAllBline, ${{ \mathcal T }}_{w}$ can be simply derived with Equation (6). After the calculation, Q and ${{ \mathcal T }}_{w}$ are visualized with matplotlib¹⁶ (for 2D) and pyvista¹⁷ (for 3D).¹⁸

3. Results

We apply FastQSL to three common scenarios of magnetic field analyzing: (1) a potential field extrapolated from a solar active region; (2) an analytical quadrapole field; and (3) a flux rope from a TD99 model. These magnetic fields are used to demonstrated and benchmark FastQSL.

3.1. An Extrapolated Potential Field

The first test is done with the potential field extrapolated from an active region (NOAA AR 11112) on 2010 October 16 at 19:00 UT, which is the same field for analyzing "dome-plate QSL" in Chen et al. (2020). Strictly speaking, it should be "dome-plate separatrices" with a singular Q since there is a null point (Titov et al. 2011; Scott et al. 2021). As shown by Figure 2(d), Q_⊥ cannot show any footprints of a bald patch separatrix (Titov et al. 1993; Titov & Démoulin 1999). For example, in the region bounded by the black box in Figure 2(b), there is a footprint of a bald patch separatrix on a PIL between the green and the purple field lines (the neighboring field lines at two sides of a PIL that is the footprints of a bald patch separatrix should depart with each other), which also satisfies (B_x, B_y, 0) · ∇B_z∣_PIL > 0 (the purple lines in Figure 2(a)), while such topology is missing in Figure 2(d). As in the discussion above, the footprints of QSLs in Figures 2(d) and (e) are much thinner than those in Figure 2(b), even appearing discontinuous at the thinnest points. Figures 2(c) and (f) are calculated with original grids. Figure 2(c) keeps the continuity of footprints, while Figure 2(f) shows many more discontinuities than Figure 2(e), indicating that Method II requires a higher level of refinement for displaying continuous separatrices.

**Figure 2.** Magnetic configuration of NOAA AR 11112 on 2010 October 16 at 19:00 UT. (a) PILs (cyan) and the footprints of bald patch separatrix (purple) superimposed on the magnetogram. (b)–(f) Map of QSL at the photosphere; the method and the factor of refinement of grids (refined grid spacing = 1/factor × original grid spacing) are denoted in the image title. The coordinates are the same as those in Chen et al. (2020). The green and the purple field lines present a double check of the existence of a bald patch in the black box in panel (b).
Download figure:
Standard image High-resolution image

With the significant improvement in efficiency, the 3D distribution of Q can be obtained with ease within the timescale of minutes. Figure 3 shows the 3D magnetic topologies above the magnetogram. As shown in Figure 3, the dome structure in Figure 4 of Chen et al. (2020) is well represented.

3.2. An Analytical Quadrapole Field

In the first case, Q decays exponentially away from the null point (Pontin et al. 2016). There are still some remnants of the high-Q region around separatrices that can be captured by Method II. An analytical quadrapole field can clearly present the shortage of Method II on capturing separatrices. The field is

$\begin{eqnarray}&&{{\boldsymbol{B}}}_{\mathrm{quadrapole}}({\boldsymbol{r}})=\sum _{i=1}^{4}\,{q}_{i}\,\displaystyle \frac{{\boldsymbol{r}}-{{\boldsymbol{r}}}_{i}}{{\left|{\boldsymbol{r}}-{{\boldsymbol{r}}}_{i}\right|}^{3}},\end{eqnarray} \tag{ 19 }$

where r ₁ = (−1.5, 0, −0.5), r ₂ = (−0.5, 0, −0.5), r ₃ = (0.5, 0, −0.5), r ₄ = (1.5, 0, −0.5) are the locations of four magnetic charges, and {q₁, q₂, q₃, q₄} = {1, −1, 1, −1} are the strengths of the magnetic charges. This analytical field is uniformly discretized for FastQSL, and the grid spacing is 0.02. There are four Sun spots on the photosphere (Figure 4(a)), and the length of the field line can sharply jump at some places (Figures 4(b) and (e)), where there must be separatrices. Method I can fully capture all these separatrices (Figures 4(c) and (f)). While in Figure 4(g) that from Method II presents a blank because all calculated Q are below 10 in the region of Figure 4(g); the case is the same if we plot ${\mathrm{log}}_{10}({Q}_{\perp })$ . In Figure 4(c), in the area closed by the inner separatrices that are labeled with "in" or in the area outside of the outer separatrices that are labeled with "out," considering the symmetry of the magnetic field, the field-line mapping (1) should be x₂ = − x₁, y₂ = y₁. According to Equation (3), the values of Q in the area "in" and "out" should identically be 2. The real distribution of Q around these separatrices should like the Dirac delta function. As the discussions of Section 2.1, since most refined grids are not on the zero-thickness separatrices, Method II cannot capture these separatrices (Figures 4(d) and (g)).

**Figure 4.** Magnetic structures of the quadrapole field. (a) Magnetogram. (b), (e) Field-line length map. (c), (f) Map of ${\mathrm{log}}_{10}(Q)$ from Method I. (d), (g) Map of ${\mathrm{log}}_{10}(Q)$ from Method II.
Download figure:
Standard image High-resolution image

**Figure 4.** Magnetic structures of the quadrapole field. (a) Magnetogram. (b), (e) Field-line length map. (c), (f) Map of ${\mathrm{log}}_{10}(Q)$ from Method I. (d), (g) Map of ${\mathrm{log}}_{10}(Q)$ from Method II.
Download figure:
Standard image High-resolution image

Another case is that B = (x, −y, 0) has a similar distribution of Q. We set x² + y² = 1 as S₁(θ, z), S₂(θ, z) of the Q calculation (Figure 5), where θ is the radian measure. For 0 < θ < π/2 (the case is similar for π/2 < θ < 2 π), according to the symmetry of the magnetic field, the mappings are θ₂ = π/2 − θ₁, z₂ = z₁, and all values of Q are 2 with Equation (3). Separatrices only appear at θ = 0, π/ 2, π, 3 π/ 2.

3.3. A TD99 Model

A TD99 model at the setting of R = 110 Mm, d = 34 Mm, L = 55 Mm, a = 49.4 Mm, I = 4 × 10¹² A, I₀ = 1.66 × 10¹² A, q = 10¹⁴ T m² is also tested. This analytical field is uniformly discretized and the grid spacing is 3.04 Mm. This setting represents a hyperbolic flux tube (HFT; Titov et al. 2002) topology around the flux rope; the 3D structure of the HFT and the 3D distribution of the twist number are presented by Figure 6.

**Figure 6.** 3D structures of the TD99 model. Left: the purple semitransparent surface is the contour of Q = 5000, different types of magnetic field line are rendered in different colors. Right: the distribution of the twist number.
Download figure:
Standard image High-resolution image

4. Benchmark

A benchmark is presented for comparing the efficiency of FastQSL with published codes (i.e., QSL Squasher, CodeYang, and Code2016). The quality of resultant images and the time consumed depend on the choices of step of RK4 or tol of RKF45 and the method. Images with different choices of parameters are shown in Figure 7. There is a marginal value of step or tol for the quality of the resultant image. For a specific row, comparing horizontally, i.e., comparing with those resultant images with any lower values, the image with the marginal value should not show any recognizable difference. And above the marginal value, the difference is recognizable. In Figure 7, columns are labeled with "A, B, C, D" to mark an image that is the ground truth, marginal ground truth, distinguishable, and unlikeness. The marginal value depends on the smoothness of the field. For example, compared to column A and B, most areas in column C are satisfying, but some areas are not; one should check the quality of image to make decision. The image quality is not very sensitive to the value of step or tol, which the efficiency is sensitive to. For some analysis that do not require a high quality, even column D can provide an acceptable result, and can achieve a very fast performance.

**Figure 7.** The quality of the Q map with different codes, parameters, and methods. The rows of `QSL Squasher` and CodeYang show ${\mathrm{log}}_{10}({Q}_{\perp })$ ; other rows show ${\mathrm{log}}_{10}(Q)$ . Code and method are marked at the left of the image. The value of `step` or `tol` is in each panel. This test is done with the TD99 model, cut from a cross section at a height of 30.4 Mm; the unit of `step` and `tol` is the original grid spacing of the magnetic field (corresponding to 3.04 Mm). Column B presents images with a marginal value that are without any indistinct area, compared to column A. The numbers following Method II indicate the first or the second way of calculating ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ .
Download figure:
Standard image High-resolution image

**Figure 7.** The quality of the Q map with different codes, parameters, and methods. The rows of `QSL Squasher` and CodeYang show ${\mathrm{log}}_{10}({Q}_{\perp })$ ; other rows show ${\mathrm{log}}_{10}(Q)$ . Code and method are marked at the left of the image. The value of `step` or `tol` is in each panel. This test is done with the TD99 model, cut from a cross section at a height of 30.4 Mm; the unit of `step` and `tol` is the original grid spacing of the magnetic field (corresponding to 3.04 Mm). Column B presents images with a marginal value that are without any indistinct area, compared to column A. The numbers following Method II indicate the first or the second way of calculating ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ .
Download figure:
Standard image High-resolution image

For Method I, if we fix the tracing parameter step or tol, we find the angle $\theta =\arcsin (| {B}_{n,0}/B| )$ between the magnetic field line and the cross section S₀ will affect image quality, with a smaller θ giving poorer results. To mitigate this effect, our empirical formulas are

$\begin{eqnarray}&&{\left.{\mathtt{step}}\right|}_{{S}_{0}}=\max \left(\left[{{\mathtt{step}}}_{\perp }\times \left|{B}_{n,0}/B\right|,{{\mathtt{step}}}_{\min }\right]\right),\end{eqnarray} \tag{ 20 }$

$\begin{eqnarray}&&{\left.{\mathtt{tol}}\right|}_{{S}_{0}}={{\mathtt{tol}}}_{\perp }\times | {B}_{n,0}/B{| }^{1.5},\end{eqnarray} \tag{ 21 }$

where step _⊥, ${{\mathtt{step}}}_{\min }$ and tol _⊥ are constants and max() is the operation of taking the maximum value. When field lines are tangent to S₀, B_n,0 → 0, introducing spurious high-Q structures. In order to avoid this artifact, when ∣B_n,0/B∣ < 0.05, we locally rotate S₀ so that it is perpendicular to the field line, and traces four neighboring field lines at the new cross section, and then calculate Q. Therefore, the step or the tol at the input of FastQSL is for the perpendicular case, then adjusted by Equations (20) or (21) for every field line. Since Code2016 fixes the step, it needs a small marginal step of 0.4 in Figure 7. For Method II, FastQSL always sets S₀ to the perpendicular case; this adjustment is not applied.

For Method II , there can be two strategies to calculate ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ . One prepares a 3D array of ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ at the beginning by a second order finite difference; for example, the formula for uniform grids is

$\begin{eqnarray}\begin{array}{rcl}\displaystyle \frac{\partial \,{\boldsymbol{B}}/B}{\partial \,x}({x}_{i},{y}_{j},{z}_{k}) & = & \displaystyle \frac{1}{({x}_{i+1}-{x}_{i-1})}\\ & & \times \,\left[\displaystyle \frac{{\boldsymbol{B}}({x}_{i+1},{y}_{j},{z}_{k})}{B({x}_{i+1},{y}_{j},{z}_{k})}-\displaystyle \frac{{\boldsymbol{B}}({x}_{i-1},{y}_{j},{z}_{k})}{B({x}_{i-1},{y}_{j},{z}_{k})}\right].\end{array}\end{eqnarray} \tag{ 22 }$

The form is similar for $\tfrac{\partial \,{\boldsymbol{B}}/B}{\partial \,y}({x}_{i},{y}_{j},{z}_{k}),\tfrac{\partial \,{\boldsymbol{B}}/B}{\partial \,z}({x}_{i},{y}_{j},{z}_{k})$ . Then one does an interpolation that is similar to Equation (13). The other is to interpolate $\tfrac{{\boldsymbol{B}}}{B}$ at the neighboring points of r like r ± (0.001, 0, 0), r ± (0, 0.001, 0), r ± (0, 0, 0.001), and then ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ is given by central differences. The first way gives a smoother distribution than the second. As shown by the bottom two rows of Figure 7, the marginal tol by the first way is 10^−2.9 while that by the second way is 10^−4.4, which means the image quality by the first way is better with the same tol. The second way needs to calculate an additional six values of $\tfrac{{\boldsymbol{B}}}{B}$ ; therefore, it is slower than the first way. But the first way asks for additional storage for the 3D array of ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ (3 times the array of B is occupied) while the second does not. CodeYang applies the second way. As the gradient of Equation (13) can be analytically given in every cell, QSL Squasher applies a skill that is mathematically similar to the second way, but the calculation of one value of ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ consumes the similar duration as the first way. The first set of FastQSL uses the first way, while the second set applies the second way due to the limitation of GPU memory. Compared with Method I (the row of Code2016 of Figure 7), Method II (the top two rows of Figure 7) allows a much larger step.

Providing a completely impartial benchmark to all codes and both methods is extremely difficult. We just show the benchmark at the marginal values (column B of Figure 7), to make a sense of the time consumed. By using the TD99 model as an input, and calculating for the same region (the 3D region of Figure 6) and at the same resolution, the efficiency is measured by the count of calculated values of Q per unit time.

The original CodeYang cannot accept a step > 1, because it extends only one ghost layer to boundaries. We modified the code to have 10 ghost layers for the benchmark.

QSL Squasher allows adaptive mesh refinement; the possible locations of QSL are quickly identified by the Field-line Length Edge (FLEDGE) map, and then the code only calculates Q_⊥ at these locations. Tassev & Savcheva (2017) claimed an order-of-magnitude speed-up with adaptive refinements. We first do not apply adaptive refinements. We directly set the grid resolution for Q_⊥ that is the same as other panels of Figure 7; then we achieve the marginal step of 0.3, and use this setting for the benchmark (Table 1). A larger step can shrink the resulting QSLs more significantly (Figure 7), and even slightly affects them at the step of 0.3. We infer that this artifact comes from the relatively large step error of Euler integration, because our codes will also have such shrinking if we change the tracing scheme to Euler integration. We also try the adaptive refinements. At the proper choices of the threshold of length jump for a refinement and the maximum times of refinements for giving a satisfying image that looks like column B of Figure 7, the performance by GPU is 25 kQ s⁻¹, which is surprisingly lower than 189 kQ s⁻¹ in Table 1, which is without the adaptive refinements. We guess that the gradient of field-line length in the thick QSL could be small still; then the adaptive refinements cannot improve the performance.

As shown by Table 1, Code2016 is faster than QSL Squasher and CodeYang. QSL Squasher is slowed down by Euler integration, double-precision floating-point format, adaptability for stretched girds, the way of calculating ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ , and other potential aspects. CodeYang is slowed down by double-precision floating-point format, the way of calculating ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ , and other potential aspects. Comparing to Code2016, FastQSL achieves a significant speed-up. For the first set of FastQSL, compiling by ifort is slightly faster than by gfortran. Tracing by RKF45 is slightly faster than by RK4, but this comparison may be not true for all kinds of magnetic field. For a highly twisted field, many failed-trial steps could happen in that by RKF45, and then that by RK4 could be even faster. If traced by RK4, the efficiency by Method II is slightly faster than that by Method I; the comparison is reversed if traced by RKF45. For the second set of FastQSL that is optimized with GPU, it achieves the best efficiency by Method I. If Q is calculated by Method II, since the second set uses the second way to calculate ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ , it is even slower than the first set.

Table 1. The Computational Efficiency of Different Methods, Measured as the Capability of Calculating Q per Second

Note. In this test, the CPU is Intel Core i9 10900K and the GPU is RTX 3070 OC. The version of gfortran is 9.4.0 and the version of ifort is 2021.6.0. The columns of "Processor," "Compiler," and "Performance" are only for the computation of QSL; others like input and output, preprocessing, and visualization are not involved. The numbers following II indicate the first or the second way of calculating ${\rm{\nabla }}\tfrac{{\boldsymbol{B}}}{B}$ . The column of "Parameter" shows the marginal value of a satisfied image in column B of Figure 7. QSL Squasher is tested with a smaller data cube of the same TD99 due to its high requirement of memory.

5. Discussion and Summary

Compared with Code2016, the computational efficiency increased by 24 times in this work. The increase in computational efficiency majorly comes from two aspects: new computing architecture and the improvement of algorithm. In the computational tasks of data inspection like QSL identification and quantification, GPU acceleration can improve the efficiency of interactive inspection and help to discover new features in data. In the computational tasks of simulation (Feng et al. 2013), the massive parallel computation with GPU can help explore more parameter space with a given amount of computational resource. Also, GPU computing is more environmentally friendly by reducing carbon emission (Stevens et al. 2020; Portegies Zwart 2020).

To summarize, we developed a reliable method of calculating the squashing factor Q and twist number in data cubes of magnetic fields on Cartesian grids. This method can achieve unprecedented computation efficiency, with which one can obtain maps of Q and the twist number within a few seconds for 2D input and a few minutes for 3D input. The high efficiency can benefit the analysis of magnetic topology, especially for the analysis of MHD simulations, which may require the computation of 3D Q and the twist number in a time series.

J.C. thanks the constructive discussions with Guo, Yang, and Roger B. Scott, and acknowledges the support by the China Scholarship Council (No. 201706340140). The research was supported by the National Natural Science Foundation of China (42188101, 41974199, 41574167, 41774150, and 11925302), the B-type Strategic Priority Program of the Chinese Academy of Sciences (XDB41000000), and STELLAR project (952439).

FastQSL: A Fast Computation Method for Quasi-separatrix Layers

Article metrics

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction