ODYSSEY: A PUBLIC GPU-BASED CODE FOR GENERAL RELATIVISTIC RADIATIVE TRANSFER IN KERR SPACETIME

Hung-Yi Pu; Kiyun Yun; Ziri Younsi; Suk-Jin Yoon

doi:10.3847/0004-637X/820/2/105

1. INTRODUCTION

Theoretical studies of observable features in the strong gravity environment around black holes, such as emission-line profiles (e.g., Fuerst & Wu 2004, 2007; Müller & Camenzind 2004; Younsi et al. 2012), reverberation (e.g., Reynolds et al. 1999), light curves of a hot spot (e.g., Schnittman & Bertschinger 2004; Broderick & Loeb 2005), quasi-periodic oscillations (QPOs; e.g., Schnittman & Bertschinger 2004; Schnittman & Rezzolla 2006; Fukumura & Kazanas 2008), and black hole shadow images (e.g., Falcke et al. 2000; Takahashi 2004), provide useful insights into understanding the nature of these systems. Recent millimeter very long baseline interferometry (VLBI) observations of Sgr A* and M87 by the Event Horizon Telescope (EHT) helped to place constraints on the core size to within a few Schwarzschild radii and demonstrated the potential capability of the EHT to directly image the shadow of a black hole (Doeleman et al. 2008, 2012). In the near future, images of black hole shadows are expected to be observed by millimeter/submillimeter VLBI observations, e.g., the EHT, BlackHoleCam, and Greenland Telescope (GLT) projects (Inoue et al. 2014).

As the resulting image and spectrum from black hole accretion and jets can vary owing to the dynamics of the accretion flow (e.g., free-fall versus Keplerian sphere of plasma; Falcke et al. 2000; Broderick & Loeb 2006) and are also strongly dependent on the distribution of (both thermal and nonthermal) electrons (e.g., Akiyama et al. 2015; Chan et al. 2015a), a systematic study of the observational predictions of many different models throughout physically relevant parameter space is essential. As such, the rapid, efficient and accurate computation of both geodesics and the solution of the radiative transfer equations along these geodesics in the Kerr spacetime is a very important topic.

The integration of geodesics in the Kerr spacetime can be performed either by using the transfer function method (Cunningham 1975; Fabian et al. 2000), by using the elliptic function method (e.g., Rauch & Blandford 1994; Agol 1997; Dexter & Agol 2009; Yang & Wang 2013), or by direct numerical integration of the geodesic equations of motion (e.g., Fuerst & Wu 2004; Levin & Perez-Giz 2008; Psaltis & Johannsen 2012; Younsi et al. 2012). In the absence of scattering, the integration of the radiation transfer equation can be performed by dividing each ray into a series of small steps. While the elliptic function method, being semianalytic in form, is efficient for the calculation of emission from axisymmetric and optically thick objects like a geometrically thin accretion disk, it does not fare so well for systems that do not possess the necessary symmetry, such as the highly turbulent, nonsymmetric, and magnetized flows found in GRMHD simulations of accretion onto black holes. Direct numerical integration of the geodesic equations of motion by the Runge–Kutta method is more suitable for general relativistic radiative transfer (GRRT) computations as it makes no assumptions of the underlying geometry or thermodynamics of the accretion flow or spacetime being considered. Additionally, because a complex change of variables is not needed, the direct integration method is more straightforward to implement numerically, and its incorporation into more sophisticated and physically realistic models is transparent.

The calculation of geodesics and radiative transfer along each ray (geodesic) can be efficiently boosted through parallel computation. In capitalizing on the advantages offered by parallel programming, much attention has been paid in recent years to the graphics processing unit (GPU). Being somewhat analogous to the messaging passing interface (MPI) model composed of multiple CPUs, the GPU model is built around the concept of multiple streaming multiprocessors (SMs) containing several hundred threads. These threads work as a CPU does, i.e., concurrently, with a single-instruction, multiple-thread (SIMT) model on a single graphics card. However, a graphics card may comprise several thousand processors. In order to take advantage of the architecture of the GPU, one of the major graphics card manufacturers, NVIDIA, released the Compute Unified Device Architecture (CUDA) platform for General-Purpose computing on Graphics Processing Units (GPGPU). Given that ray tracing, in the absence of scattering, is a trivially parallelizable problem (each ray may be treated as independent from all other rays), no communication between threads is necessary during the calculation of each ray. This makes the GPU model particularly appealing.

In this article we present a public, GPU-based GRRT code, Odyssey, based on the ray-tracing algorithm presented in Fuerst & Wu (2004) and the radiative transfer formulation described in Younsi et al. (2012), implemented in CUDA C/C++. The performance on a single NVIDIA GPU graphics card exceeds 1 ns per Runge–Kutta step per geodesic, similar to or slightly better than that reported in another GPU-based ray-tracing code, GRay (Chan et al. 2013), in which a different ray-tracing algorithm is adopted. One of the direct applications of Odyssey is for studying the spectra and images from black holes at horizon scales. This is an important observational goal of current and future millimeter/submillimeter VLBI observations of the accretion flows onto supermassive black holes.

The article is organized as follows. In Section 2 the ray tracing and radiative transfer formulation are derived and the numerical solution of this formulation is discussed. In Section 3 we introduce and discuss the GPU scheme on which Odyssey is based. In Section 4 we present the results of several different benchmarking calculations for Odyssey, including the calculation of images, spectra, and light curves necessary for astrophysical calculations and comparisons with future observations. In Section 5 we assess the performance of Odyssey, presenting timing benchmarks and comparisons with other ray-tracing codes. Section 6 is devoted to the summary and discussion.

2. FORMULATION

In this article we adopt the natural unit convention (c = G = 1), wherein the gravitational radius of a black hole of mass M is given by r_g = M. For a rotating (Kerr) black hole the spacetime metric may be written in Boyer–Lindquist coordinates as

$\begin{eqnarray}{{ds}}^{2} & = & -\left(1-\displaystyle \frac{2{Mr}}{{\rm{\Sigma }}}\right){{dt}}^{2}-\displaystyle \frac{4{aMr}{\mathrm{sin}}^{2}\theta }{{\rm{\Sigma }}}{dtd}\phi +\displaystyle \frac{{\rm{\Sigma }}}{{\rm{\Delta }}}{{dr}}^{2}\\ & & +{\rm{\Sigma }}d{\theta }^{2}+\left({r}^{2}+{a}^{2}+\displaystyle \frac{2{a}^{2}{Mr}{\mathrm{sin}}^{2}\theta }{{\rm{\Sigma }}}\right){\mathrm{sin}}^{2}\theta d{\phi }^{2},\end{eqnarray} \tag{ 1 }$

where ${\rm{\Sigma }}\equiv {r}^{2}+{a}^{2}{\mathrm{cos}}^{2}\theta$ and ${\rm{\Delta }}\equiv {r}^{2}-2{Mr}+{a}^{2}$ . Hereafter the black hole mass is set to unity, which is equivalent to normalizing the length scale to r_g and the timescale to r_g/c. From the separability of the Hamilton–Jacobi equations for a Kerr black hole, the geodesic equations of motion may be reduced to a problem of quadratures (four constants of motion [see Section 2.1] and four ordinary differential equations) with the variables ( $\dot{t}$ , ${\dot{r}}^{2}$ , ${\dot{\theta }}^{2}$ , $\dot{\phi }$ ), where an overdot denotes differentiation with respect to the affine parameter. Complications related to the uncertainty in the signs of r and θ at turning points in the geodesic motion arise from this approach, and so to circumvent this issue, we instead consider the second derivatives of r and θ and integrate six differential equations for $(\dot{r},\dot{\theta },\dot{\phi },\dot{t},\dot{{p}_{r}},\dot{{p}_{\theta }})$ , where (p_r, p_θ) are components of the covariant four-momentum of the geodesic (Fuerst & Wu 2004).

Such an algorithm is sufficiently straightforward to implement into GRRT calculations. The covariant four-momentum ${p}_{\alpha }$ is computed at each integration step and can be directly used to compute the relative energy shift, γ, between radiation emitted from material circulating around the black hole with four-velocity u^α and radiation received by a distant observer:

$\begin{eqnarray}&&\gamma \equiv \displaystyle \frac{\nu }{{\nu }_{0}}=\displaystyle \frac{{p}_{\alpha }{u}^{\alpha }{| }_{\infty }}{{p}_{\alpha }{u}^{\alpha }{| }_{\lambda }}.\end{eqnarray} \tag{ 2 }$

The scalar product p_αu^α is a frame-invariant quantity that may be calculated in any desired frame at that position in spacetime. For simplicity, we choose to evaluate p_αu^α in the observer reference frame at that particular point along the ray (indicated by the affine parameter, λ). Subscripts "0" and " $\infty$ " denote quantities evaluated in the local fluid rest frame and in the reference frame of a distant (stationary) observer, respectively. For completeness, in this section we summarize the governing differential equations (Fuerst & Wu 2004), initial conditions, and radiative transfer formulation (Younsi et al. 2012) used in Odyssey.

2.1. Ray-tracing Algorithm

The Kerr spacetime is of Petrov type D and, being independent of both t and ϕ coordinates, possesses two Killing vectors. These give rise to the conservation of energy, E, and conservation of angular momentum, L_z, where L_z is the projection of the particle angular momentum along the black hole spin axis. The rest mass, μ, of the particle (0 for photons and massless particles and −1 for particles with mass) and the Carter constant, Q, are also conserved along each geodesic. As such, the Kerr black hole possesses four constants of motion. From the Lagrangian, the covariant four-momenta components of a geodesic may be written as

$\begin{eqnarray}&&{p}_{{\rm{t}}}=-E,\end{eqnarray} \tag{ 3 }$

$\begin{eqnarray}&&{p}_{{\rm{r}}}=\displaystyle \frac{{\rm{\Sigma }}}{{\rm{\Delta }}}\dot{r},\end{eqnarray} \tag{ 4 }$

$\begin{eqnarray}&&{p}_{\theta }={\rm{\Sigma }}\dot{\theta },\end{eqnarray} \tag{ 5 }$

$\begin{eqnarray}&&{p}_{\phi }={L}_{{\rm{z}}}.\end{eqnarray} \tag{ 6 }$

In addition, E and L_z may be derived from the initial conditions of the ray via the following formulae:

$\begin{eqnarray}&&{E}^{2}=\left(\displaystyle \frac{{\rm{\Sigma }}-2r}{{\rm{\Sigma }}{\rm{\Delta }}}\right)({\rm{\Sigma }}{\dot{r}}^{2}+{\rm{\Sigma }}{\rm{\Delta }}{\dot{\theta }}^{2}-{\rm{\Delta }}\mu )+{\rm{\Delta }}{\dot{\phi }}^{2}{\mathrm{sin}}^{2}\theta ,\end{eqnarray} \tag{ 7 }$

$\begin{eqnarray}&&{L}_{{\rm{z}}}=\displaystyle \frac{({\rm{\Sigma }}{\rm{\Delta }}\dot{\phi }-2{arE}){\mathrm{sin}}^{2}\theta }{{\rm{\Sigma }}-2r}.\end{eqnarray} \tag{ 8 }$

The corresponding geodesic equations of motion may then be written as

$\begin{eqnarray}&&\dot{t}=E+\displaystyle \frac{2r({r}^{2}+{a}^{2})E-2{ar}\ {L}_{{\rm{z}}}}{{\rm{\Sigma }}{\rm{\Delta }}},\end{eqnarray} \tag{ 9 }$

$\begin{eqnarray}&&{\dot{r}}^{2}=\displaystyle \frac{{\rm{\Delta }}}{{\rm{\Sigma }}}(\mu +E\dot{t}-{L}_{{\rm{z}}}\dot{\phi }-{\rm{\Sigma }}{\dot{\theta }}^{2}),\end{eqnarray} \tag{ 10 }$

$\begin{eqnarray}&&{\dot{\theta }}^{2}=\displaystyle \frac{1}{{{\rm{\Sigma }}}^{2}}[Q+({E}^{2}+\mu ){a}^{2}{\mathrm{cos}}^{2}\theta -{L}_{{\rm{z}}}^{2}{\mathrm{cot}}^{2}\theta ],\end{eqnarray} \tag{ 11 }$

$\begin{eqnarray}&&\dot{\phi }=\displaystyle \frac{2{arE}+({\rm{\Sigma }}-2r){L}_{{\rm{z}}}{\mathrm{csc}}^{2}\theta }{{\rm{\Sigma }}{\rm{\Delta }}},\end{eqnarray} \tag{ 12 }$

where

$\begin{eqnarray}&&Q\equiv {p}_{\theta }^{2}+[{L}_{{\rm{z}}}^{2}{\mathrm{csc}}^{2}\theta -{a}^{2}({E}^{2}+\mu )]{\mathrm{cos}}^{2}\theta .\end{eqnarray} \tag{ 13 }$

Equations (10) and (11) may be replaced with two equations for the covariant four-momenta (for details, see Fuerst & Wu 2004) as follows:

$\begin{eqnarray}{\dot{p}}_{{\rm{r}}} & = & \displaystyle \frac{1}{{\rm{\Sigma }}{\rm{\Delta }}}\{[({r}^{2}+{a}^{2})\mu -\kappa ](r-1)+r{\rm{\Delta }}\mu \\ & & +2r({r}^{2}+{a}^{2}){E}^{2}-2{{aEL}}_{{\rm{z}}}\}-\displaystyle \frac{2{{p}_{r}}^{2}(r-1)}{{\rm{\Sigma }}},\end{eqnarray} \tag{ 14 }$

$\begin{eqnarray}&&{\dot{p}}_{\theta }=\displaystyle \frac{\mathrm{sin}\theta \mathrm{cos}\theta }{{\rm{\Sigma }}}\left[\displaystyle \frac{{L}^{2}}{{\mathrm{sin}}^{4}\theta }-{a}^{2}({E}^{2}+\mu )\right],\end{eqnarray} \tag{ 15 }$

where $\kappa \equiv Q+{L}_{{\rm{z}}}^{2}$ + ${a}^{2}({E}^{2}+\mu )$ = ${p}_{\theta }^{2}$ + ${L}_{{\rm{z}}}^{2}{\mathrm{csc}}^{2}\theta +{a}^{2}$ $({E}^{2}{\mathrm{sin}}^{2}\theta +\mu )$ . The final equations of motion for the six parameters $(r,\theta ,\phi ,t,{p}_{r},{p}_{\theta })$ are then given by Equations (3), (4), (9), (12), (14), and (15).

In Odyssey, a fifth-order Runge–Kutta scheme with adaptive step size (Press et al. 1992) is used to integrate these equations. In addition, to avoid numerical problems when a photon passes too close to the pole $(\theta =0,\pi )$ , $\mathrm{sin}\theta$ is set to be 10⁻⁸ when $\mathrm{sin}\theta \lt {10}^{-8}$ . As shown later in Section 4, such a consideration is acceptable in practical ray-tracing and GRRT calculations.

2.2. Initial Conditions

For an observer who receives the ray at an inclination angle θ_obs and radial position ${r}_{\mathrm{obs}}=\infty$ , the celestial coordinates in the observer's image frame, $(\alpha ,\beta )$ , are calculated as (Chandrasekhar 1983)

$\begin{eqnarray}&&\alpha ={L}_{{\rm{z}}}\mathrm{csc}{\theta }_{\mathrm{obs}},\end{eqnarray} \tag{ 16 }$

$\begin{eqnarray}&&\beta =\sqrt{Q/{E}^{2}+{a}^{2}{\mathrm{cos}}^{2}{\theta }_{\mathrm{obs}}-{L}_{{\rm{z}}}^{2}{\mathrm{cot}}^{2}{\theta }_{\mathrm{obs}}},\end{eqnarray} \tag{ 17 }$

where β is also equal to the initial value of $-{p}_{\theta }$ . However, we wish to define an observer at some arbitrary position $(r,\;\theta ,\;\phi )$ in space (not necessarily infinitely removed from the black hole). Such an approach has several advantages, including reducing the time needed to integrate the geodesic from the observer to the black hole. The initial conditions of a ray arriving in this observer's image plane are calculated as follows.

The observer grid is constructed as a left-handed rectangular coordinate system with the z-axis oriented toward the black hole center. The observer's axes are denoted by $\bar{{\boldsymbol{x}}}\equiv {(x,y,z)}^{{\rm{T}}}$ . The black hole coordinate system is right-handed, rectangular, and denoted by $\bar{{\boldsymbol{x}}}^{\prime} \equiv {(x^{\prime} ,y^{\prime} ,z^{\prime} )}^{{\rm{T}}}$ . The observer is located at a distance r_obs from the black hole center, at an angle θ_obs from the positive black hole z'-axis (coinciding with the spin axis) and at an angle ϕ_obs with respect to the black hole's x'-axis. While the value of ϕ_obs is arbitrary, since the Kerr metric does not depend on ϕ, there are situations where specifying the observer's azimuthal position is important. For instance, in time-dependent radiation transfer calculations or imaging fully three-dimensional anisotropic accretion flows such as those found in state-of-the-art GRMHD simulations (e.g., McKinney et al. 2013), one may wish to resolve particular local and perhaps transient features, e.g., outflows, magnetic reconnection events, and shocks.

It is assumed that the observer's image plane is a two-dimensional grid with zero curvature and that all rays received by the observer arrive perpendicular to this grid. Close to the black hole, spacetime curvature becomes significant and the observer image plane must possess some curvature-dependent distortion, which would in turn distort the calculated image. Rays would no longer arrive perpendicular to the image plane. To image a black hole closer to the event horizon would require defining an appropriate orthonormal tetrad basis in which to place the observer (e.g., Bardeen et al. 1972; Marck 1996). Since we concern ourselves only with calculating what a distant observer would actually observe, we place our observer at a distance of ${10}^{3}\;{r}_{{\rm{g}}}$ from the black hole. At this distance the deviation of geodesics from Minkowski spacetime is smaller than the numerical precision used to integrate the geodesic itself. Therefore, the spacetime may be taken as Euclidean, and we can safely employ the reverse ray-tracing method under the aforementioned assumptions.

In order to determine the initial conditions of rays starting on the observer's grid, the observer coordinate system $\bar{{\boldsymbol{x}}}$ must be transformed into the black hole coordinate system $\bar{{\boldsymbol{x}}}^{\prime}$ . This may be accomplished through the following series of transformations: (i) rotate clockwise by $(\pi -{\theta }_{\mathrm{obs}})$ about the x-axis ( ${{\boldsymbol{R}}}_{x}$ ), (ii) rotate clockwise by $(2\pi -{\phi }_{\mathrm{obs}})$ about the z-axis ( ${{\boldsymbol{R}}}_{z}$ ), (iii) reflect in the plane y = x ( ${{\boldsymbol{A}}}_{y=x}$ ), (iv) translate $\bar{{\boldsymbol{x}}}$ so that the origins of both coordinate systems coincide ( ${{\boldsymbol{T}}}_{{\boldsymbol{x}}\to {\boldsymbol{x}}^{\prime} }$ ). This may be calculated as follows:

$\begin{eqnarray}{\boldsymbol{x}}^{\prime} & = & {{\boldsymbol{A}}}_{y=x}\ {{\boldsymbol{R}}}_{z}(2\pi -{\phi }_{\mathrm{obs}}){{\boldsymbol{R}}}_{x}(\pi -{\theta }_{\mathrm{obs}}){\boldsymbol{x}}+{{\boldsymbol{T}}}_{{\boldsymbol{x}}\to {\boldsymbol{x}}^{\prime} }\\ & = & \left(\begin{array}{c}{ \mathcal D }(y,z)\mathrm{cos}{\phi }_{\mathrm{obs}}-x\mathrm{sin}{\phi }_{\mathrm{obs}}\\ { \mathcal D }(y,z)\mathrm{sin}{\phi }_{\mathrm{obs}}+x\mathrm{cos}{\phi }_{\mathrm{obs}}\\ ({r}_{\mathrm{obs}}-z)\mathrm{cos}{\theta }_{\mathrm{obs}}+y\mathrm{sin}{\theta }_{\mathrm{obs}}\end{array}\right),\end{eqnarray} \tag{ 18 }$

where

$\begin{eqnarray}&&{ \mathcal D }(y,z)\equiv (\sqrt{{r}_{\mathrm{obs}}^{2}+{a}^{2}}-z)\mathrm{sin}{\theta }_{\mathrm{obs}}-y\mathrm{cos}{\theta }_{\mathrm{obs}}.\end{eqnarray} \tag{ 19 }$

The transformation from Cartesian coordinates to Boyer–Lindquist coordinates is given by

$\begin{eqnarray}&&r=\sqrt{\displaystyle \frac{w+\sqrt{{w}^{2}+4{a}^{2}{z}^{\prime 2}}}{2}},\end{eqnarray} \tag{ 20 }$

$\begin{eqnarray}&&\theta =\mathrm{arccos}\left(\displaystyle \frac{z^{\prime} }{r}\right),\end{eqnarray} \tag{ 21 }$

$\begin{eqnarray}&&\phi =\mathrm{atan}2(y^{\prime} ,x^{\prime} ),\end{eqnarray} \tag{ 22 }$

where $w\equiv {x}^{\prime 2}+{y}^{\prime 2}+{z}^{\prime 2}-{a}^{2}$ . Substituting the components of Equation (18) into Equations (20)–(22) gives the initial $(r,\;\theta ,\;\phi )$ conditions for a photon on the observer grid.

Next, the initial velocities of the ray must be determined. Each ray arrives perpendicular to the image plane, moving parallel to the z-axis; hence, we set $(\dot{x},\;\dot{y},\;\dot{z})=(0,\;0,\;1)$ . Subsequent differentiation of Equation (18) yields the Cartesian components of the ray's velocity in black hole coordinates:

$\begin{eqnarray}&&\dot{x}^{\prime} =\left(\begin{array}{c}-\mathrm{sin}{\theta }_{\mathrm{obs}}\mathrm{cos}{\phi }_{\mathrm{obs}}\\ -\mathrm{sin}{\theta }_{\mathrm{obs}}\mathrm{sin}{\phi }_{\mathrm{obs}}\\ -\mathrm{cos}{\theta }_{\mathrm{obs}}\end{array}\right).\end{eqnarray} \tag{ 23 }$

Finally, to obtain the ray's velocity components in Boyer–Lindquist coordinates, we differentiate Equations (20)–(22) with respect to affine parameter, solve for $(\dot{r},\;\dot{\theta },\;\dot{\phi })$ , and substitute for Equation (23). Upon simplification this yields

$\begin{eqnarray}&&\dot{r}=-\displaystyle \frac{r{ \mathcal R }\mathrm{sin}\theta \mathrm{sin}{\theta }_{\mathrm{obs}}\mathrm{cos}{\rm{\Phi }}+{{ \mathcal R }}^{2}\mathrm{cos}\theta \mathrm{cos}{\theta }_{\mathrm{obs}}}{{\rm{\Sigma }}},\end{eqnarray} \tag{ 24 }$

$\begin{eqnarray}&&\dot{\theta }=\displaystyle \frac{r\mathrm{sin}\theta \mathrm{cos}{\theta }_{\mathrm{obs}}-{ \mathcal R }\mathrm{cos}\theta \mathrm{sin}{\theta }_{\mathrm{obs}}\mathrm{cos}{\rm{\Phi }}}{{\rm{\Sigma }}},\end{eqnarray} \tag{ 25 }$

$\begin{eqnarray}&&\dot{\phi }=\displaystyle \frac{\mathrm{sin}{\theta }_{\mathrm{obs}}\mathrm{sin}{\rm{\Phi }}}{{ \mathcal R }\mathrm{sin}\theta },\end{eqnarray} \tag{ 26 }$

where ${ \mathcal R }\equiv \sqrt{{r}^{2}+{a}^{2}}$ and ${\rm{\Phi }}\equiv (\phi -{\phi }_{\mathrm{obs}})$ . We now have initial conditions for $(r,\;\theta ,\;\phi ,\;t,\;{p}_{r},\;{p}_{\theta })$ .

2.3. Radiative Transfer

To compute the jet emission and image in the observers' reference frame, one can use the ray-tracing scheme outlined in Section 2.1 to trace the received ray backward in time with initial conditions as described in Section 2.2. The frequency shift is related to the plasma motion and varies from point to point along the ray; see Equation (2). Consequently, a frequency correction from the observed frequency to the local frequency is required at every point along the ray, because physical processes take place in the local comoving frame.

Along each ray the covariant GRRT equation may be written as

$\begin{eqnarray}&&\displaystyle \frac{d{ \mathcal I }}{d{\tau }_{\nu }}=-{ \mathcal I }+\displaystyle \frac{\eta }{\chi },\end{eqnarray} \tag{ 27 }$

where the Lorentz-invariant intensity ( ${ \mathcal I }$ ) is related to the specific intensity (I) as ${ \mathcal I }={I}_{\nu }/{\nu }^{3}={I}_{{\nu }_{0}}/{\nu }_{0}^{3}$ , where ν is the frequency of radiation. The invariant absorption coefficient (χ) and invariant emission coefficient (η) are given by $\chi =\nu {\alpha }_{\nu }$ and $\eta ={j}_{\nu }/{\nu }^{2}$ , respectively, where ${\alpha }_{\nu }$ and ${j}_{\nu }$ are, respectively, the specific emission and absorption coefficient evaluated at a frequency ν. The optical depth of the medium at a given frequency ν is denoted by ${\tau }_{\nu }$ .

Defining the source function ${ \mathcal S }\equiv \eta /\chi$ , Equation (27) may be directly integrated, yielding

$\begin{eqnarray}&&{ \mathcal I }({\tau }_{\nu })={{ \mathcal I }}_{0}{e}^{-{\tau }_{\nu }}+{\displaystyle \int }_{{\tau }_{0}}^{{\tau }_{\nu }}{ \mathcal S }({\tau }_{\nu }^{\prime }){e}^{-({\tau }_{\nu }-{\tau }_{\nu }^{\prime })}d{\tau }_{\nu }^{\prime },\end{eqnarray} \tag{ 28 }$

where the optical depth ${\tau }_{\nu }$ is given by

$\begin{eqnarray}&&{\tau }_{\nu }(\lambda )=-{\displaystyle \int }_{{\lambda }_{0}}^{\lambda }{\alpha }_{0,\nu }(\lambda ^{\prime} ){k}_{\alpha }{u}^{\alpha }{| }_{\lambda ^{\prime} }d\lambda ^{\prime} ,\end{eqnarray} \tag{ 29 }$

and λ is the affine parameter.

By combining Equations (28) and (29), the solution of the radiative transfer equation can be can be reduced to two decoupled differential equations (Younsi et al. 2012)

$\begin{eqnarray}&&\displaystyle \frac{d\tau }{d\lambda }={\gamma }^{-1}{\alpha }_{0,\nu },\end{eqnarray} \tag{ 30 }$

$\begin{eqnarray}&&\displaystyle \frac{d{ \mathcal I }}{d\lambda }={\gamma }^{-1}\left(\displaystyle \frac{{j}_{0,\nu }}{{\nu }^{3}}\right)\;\mathrm{exp}(-\tau ),\end{eqnarray} \tag{ 31 }$

which are easily integrated along with the geodesic as the ray is propagated backward in time.

3. GPU SCHEME

GPUs enable high multithreading and are designed for massively parallel computation.⁶ Using a mapping of one CUDA thread to one pixel, the method of parallel computation done by CUDA can be illustrated in Figure 1.

**Figure 1.** Mapping the threads to pixels of the image, as discussed in Section 3.
Download figure:
Standard image High-resolution image

In this analogy, the observer's image plane is composed of ${N}_{\alpha }\times {N}_{\beta }$ (hereafter assume ${N}_{\alpha }={N}_{\beta }=N$ ) pixels in the α and β directions and is decomposed into multiple Grids. Each Grid is subdivided into a two-dimensional hierarchy of blocks and threads, and a CUDA kernel function run on each GPU is performed, Grid by Grid, until the entire image plane is covered. To ensure that there are enough threads to cover an image composed of N² pixels, the number of Grids in each direction ( $\alpha ,\beta$ ) must satisfy the following condition:

$\begin{eqnarray}&&{k}_{i}=\lceil \displaystyle \frac{N}{{\mathrm{Dim}}_{i}^{\mathrm{Block}}\times {\mathrm{Dim}}_{i}^{\mathrm{Grid}}}\rceil \end{eqnarray} \tag{ 32 }$

Note again that Dim^Grid represents the number of blocks, Dim^Block represents the number of threads, and $i=(\alpha ,\beta )$ is the dimension index. The notation $\lceil k\rceil$ is the smallest integer not less than k (the "ceiling" function).

The global memory on each GPU is allocated by the CUDA command cudaMalloc(), which is composed of two parts. First, for each working thread, some global memory is required for saving specific variables until each thread finishes its computation. Let P be the corresponding memory for each thread; then an amount P × T of memory is required, where T is the number of working threads. Second, part of the global memory must be assigned for saving the computed data that will later be returned to the CPU. For each pixel, if Q is the amount of information for each pixel to be later transported from CUDA to the host CPU, then Q × N² of memory is required for the entire image. By manipulating the value of P and Q, the user can easily design their own task and assign multiple outputs (e.g., α, β, γ, ${I}_{\nu }$ , ...) to each pixel of the image for subsequent calculations.

4. BENCHMARK ASTROPHYSICAL CALCULATIONS

4.1. Grid Projection

In this section we perform several standard benchmark tests of the performance of Odyssey. First, we calculate the projection of an evenly spaced rectangular grid in the observer's image plane $(\alpha ,\beta )$ onto the equatorial plane of a Kerr black hole (see Schnittman & Bertschinger 2004; Dexter & Agol 2009; Chan et al. 2013) via the following coordinate transformation:

$\begin{eqnarray}&&x=\sqrt{{r}^{2}+{a}^{2}}\mathrm{cos}\phi ,\end{eqnarray} \tag{ 33 }$

$\begin{eqnarray}&&y=\sqrt{{r}^{2}+{a}^{2}}\mathrm{sin}\phi .\end{eqnarray} \tag{ 34 }$

This is illustrated in Figure 2, which recovers the results of the aforementioned previous studies, e.g., Figure 2 of Schnittman & Bertschinger (2004) and Figure 3 of Dexter & Agol (2009). Note that the horizontal and vertical axis in Figure 2 corresponds to the $-\beta$ and $-\alpha$ direction, as adopted by those authors.

**Figure 2.** Projection of uniform grids in the observer's frame $(\alpha ,\beta )$ onto the equatorial plane (x,y) of a nonrotating (a = 0; left panels) and rapidly rotating (a = 0.95; right panels) black hole. The observer inclination angle is set to be $i\simeq 0^\circ$ (top panels) and i = 60° (bottom panels). This figure recovers the results of previous studies, e.g., Figure 2 of Schnittman & Bertschinger (2004) and Figure 3 of Dexter & Agol (2009). The horizontal and vertical axes correspond to the $-\alpha$ and $-\beta$ direction, respectively, as adopted by the aforementioned authors.
Download figure:
Standard image High-resolution image

In Figure 3 we also consider a more illustrative example by projecting an evenly spaced grid in the equatorial plane (x, y) of the black hole onto the observer's image plane $(\alpha ,\beta )$ for nonspinning (a = 0, left) and spinning (a = 0.95, right) black holes. The observer's image frame is shown in Figure 3; therefore, the horizontal and vertical axes simply correspond to the α and β directions, respectively. The bending of rays emitted around the black hole results in an expansion of the grid on the equatorial plane, as can be clearly seen in the face-on case (top panels). Compared with the grid in front of the black hole, the thickness of the grid behind is magnified by gravitational lensing (bottom panels). When the black hole is rotating (right panels), the rays are dragged by the rotation of the spacetime around the black hole (the frame-dragging effect). This effect results in the grid being distorted in an anticlockwise direction (i.e., in the same direction as the rotation of the black hole).

4.2. Black Hole Shadows

The appearance of the black hole shadow, which is the shadow cast by the photon capture surface of the black hole (and not the event horizon itself), can be determined by simply plotting all rays that are captured by the black hole. The black hole shadow profile is associated with the black hole spin and inclination angle of the observer, as studied in, e.g., Falcke et al. (2000), Takahashi (2004), and Chan et al. (2013). In Figure 4 we plot the black hole shadow for cases of a Schwarzschild (a = 0) and a Kerr (a = 0.998) black hole, when the observer is in the x−y plane, i.e., i = 90°. For a = 0, the shape of the black hole shadow, or more accurately the shadow of the photon capture region, follows the analytic description ${\alpha }^{2}+{\beta }^{2}=27$ . The analytic solution for the general case, $a\ne 0$ , is discussed in, e.g., Grenzebach et al. (2014) and Abdujabbarov et al. (2015). The analytic solutions (dashed red curves) are plotted in Figure 4 for comparison.

**Figure 4.** Black hole shadow images for black holes with spin parameters of a = 0 (top panel) and a = 0.998 (bottom panel), as viewed along the equatorial plane. The dashed red curves indicate the analytic solution for the photon ring. The image resolution is 512 × 512 pixels.
Download figure:
Standard image High-resolution image

4.3. Keplerian Thin Disk

We now consider an infinitesimally thin, Keplerian disk around a rotating black hole. The inner edge r_in of the disk is located at the marginally stable orbit (Bardeen et al. 1972), r_ISCO (innermost stable circular orbit). The outer edge, r_out, is assumed to be located at $50{r}_{g}$ . For an observer inclination angle satisfying $\mathrm{cos}{\theta }_{\mathrm{obs}}=0.25$ , the relative energy/frequency shift, γ, is shown in Figure 5 as a series of solid blue (blueshift) and solid red (redshift) contours. The approaching (left) side is blueshifted (γ > 1), and the receding (right) side is redshifted (γ < 1). The dotted line denotes the region of zero redshift. The radial contours (r = constant; see the grid profile in the bottom left panel of Figure 3) are also shown as a series of concentric solid rings.

**Figure 5.** Energy shift of a geometrically thin, Keplerian disk around a nonrotating black hole, with $\mathrm{cos}{\theta }_{\mathrm{obs}}=0.25$ . The disk is rotating anticlockwise (i.e., in the ϕ-direction). Only rays emitted directly from the disk are considered. The same contour values used in Figure 5.2 of Agol (1997) are adopted: the radial contour plot shows $r/{r}_{{\rm{g}}}=6.1,11,16,\cdots ,41,46,50$ , and the redshift contour plot shows $\gamma =0.55,0.6,\ldots ,0.95$ (red solid lines), 1.0 (black dotted line), 1.05, ..., 1.25, 1.3 (blue solid lines). This figure recovers Figure 5.2 of Agol (1997).
Download figure:
Standard image High-resolution image

**Figure 5.** Energy shift of a geometrically thin, Keplerian disk around a nonrotating black hole, with $\mathrm{cos}{\theta }_{\mathrm{obs}}=0.25$ . The disk is rotating anticlockwise (i.e., in the ϕ-direction). Only rays emitted directly from the disk are considered. The same contour values used in Figure 5.2 of Agol (1997) are adopted: the radial contour plot shows $r/{r}_{{\rm{g}}}=6.1,11,16,\cdots ,41,46,50$ , and the redshift contour plot shows $\gamma =0.55,0.6,\ldots ,0.95$ (red solid lines), 1.0 (black dotted line), 1.05, ..., 1.25, 1.3 (blue solid lines). This figure recovers Figure 5.2 of Agol (1997).
Download figure:
Standard image High-resolution image

We next compute spectra from a multitemperature disk as described in Novikov & Thorne (1973) and Page & Thorne (1974). The relation between the disk flux, F, and its effective temperature, T_eff, may be written as

$\begin{eqnarray}&&F(r)=\displaystyle \frac{3\dot{M}}{8\pi {r}^{2}}\displaystyle \frac{M}{r}f(r)=\sigma {T}_{{\rm{eff}}}^{4}(r),\end{eqnarray} \tag{ 35 }$

where M is the mass of the black hole, $\dot{M}$ is the accretion rate, f(r) is a correction factor related to the inner boundary of the disk and relativistic effects, and σ is the Stefan–Boltzmann constant. Under the assumption that the viscous stress vanishes at r_ISCO, the value of f (r) (and hence F(r)) also vanishes there.

The spectrum from a disk with a = 0.999, i = 85°, M = 10 M_⊙, $\dot{M}={10}^{19}\;{\rm{g}}\;{{\rm{s}}}^{-1}$ , and distance D = 10 kpc is shown in Figure 6. The spectrum is plotted in terms of photon number flux density, f_E (photons cm⁻² s⁻¹ keV⁻¹), instead of energy flux density, F_ν (erg cm⁻² s⁻¹ Hz⁻¹).⁷ We fix the disk inner radius at r_in = r_ISCO and vary the outer radius of the disk as r_out = 50r_g and 500r_g to demonstrate how the turnover at lower energies depends on the disk's outer radius. Conversely, the high-energy part of the spectrum manifests predominantly from regions with higher T_eff (i.e., from the inner edge of the disk) and therefore remains almost completely unchanged.

Incorporated into the X-ray spectral fitting package XSPEC, the KERRBB model is a multitemperature model of a geometrically thin, steady accretion disk (see Li et al. 2005). Disk self-irradiation, torque at the inner boundary of the disk, and limb darkening may be included within KERRBB by adjusting the appropriate model parameters. For comparison, the result of the KERRBB model calculation with similar parameters is plotted as a dashed line in Figure 6 (for details, see the figure caption).

4.4. Keplerian Hot Spot

A photon received by a distant observer at an observer time ${t}_{{\rm{obs}}}$ is emitted at a coordinate time, ${t}_{{\rm{emm}}}={t}_{{\rm{obs}}}-{\rm{\Delta }}t$ , where ${\rm{\Delta }}t$ is the time taken for the photon to travel from its point of emission to its point of reception on the observer image plane (see Schnittman & Bertschinger 2004).

The top panel of Figure 7 shows the spectrogram (time-dependent spectrum) of the direct emission from an orbiting hot spot rotating in a clockwise direction around the central black hole. The center of the hot spot, ${{\boldsymbol{x}}}_{{\rm{spot}}}(t)$ , orbits on the equatorial plane of a nonrotating black hole at r_ISCO, observed at i = 60°. The emissivity is assumed to be a function of the distance (d) from the hot spot center, where $d=| {\boldsymbol{x}}-{{\boldsymbol{x}}}_{{\rm{spot}}}(t)|$ . Within z > 0 and d < 4 r_spot, r_spot = 0.5 r_g and the emission is modeled as Gaussian in profile, in pseudo-Cartesian coordinates (Equations (33), (34), and $z=r\mathrm{cos}\theta$ )

$\begin{eqnarray}&&j({\boldsymbol{x}})\propto \mathrm{exp}\left(-\displaystyle \frac{{d}^{2}}{2{r}_{{\rm{spot}}}^{2}}\right).\end{eqnarray} \tag{ 36 }$

The hot spot is assumed to be optically thick, and therefore the computation is terminated once the ray intersects the hot spot surface. The shape and corresponding energy shift of the hot spot at specific orbital phases are shown in the bottom panel of Figure 7. The contribution of rays emitted from different locations of the hot spot at different local coordinate times, coupled with relativistic (e.g., length contraction and time dilation) and general relativistic (e.g., gravitational lensing) effects, results in a distorted hot spot image (see Younsi & Wu 2015). Again, the emission region is magnified when the hot spot is "behind" the black hole, farthest away from the observer. The light curve of the hot spot emission can be obtained by integrating the intensity over all frequency bins for each image, as shown in Figure 8. The spectrogram and light curve in Figure 7 recover those found in Figures 4 and 6 of Schnittman & Bertschinger (2004).

**Figure 7.** Top panel: spectrogram of a circular hot spot with r_spot = 0.5 r_g, orbiting a nonrotating black hole at r_ISCO. The observer inclination angle is i = 60°. This result recovers Figure 4 of Schnittman & Bertschinger (2004). Bottom panel: shape and energy shift of the hot spot at specific observer times, colored by energy shift, and also indicated by arrows A–E in the top panel.
Download figure:
Standard image High-resolution image

**Figure 8.** Light curves of the hot spot described in Figure 7 for different observer inclination angles. The intensity is normalized to the sum of total intensity over one orbital period. This result is in agreement with Figure 6 of Schnittman & Bertschinger (2004).
Download figure:
Standard image High-resolution image

4.5. Keplerian Shell

In order to test the radiative transfer formulation, we now consider the emission from a Keplerian shell of plasma in rotation around a black hole (see Broderick & Loeb 2006). By setting u^θ = 0, the remaining components of the 4-velocity of the flow follow the same description as that of a Keplerian disk as given by, e.g., Cunningham (1975). As a demonstrative calculation, we consider thermal synchrotron radiation from a distribution of relativistic electrons with an underlying relativistic Maxwellian profile. The angle-averaged emissivity coefficient for relativistic thermal synchrotron radiation may be written as (Mahadevan et al. 1996)

$\begin{eqnarray}&&{j}_{\nu }(T)=\nu \displaystyle \frac{4\pi {{ne}}^{2}}{\sqrt{3}{{cK}}_{2}(1/{{\rm{\Theta }}}_{{\rm{e}}})}M({x}_{M}),\end{eqnarray} \tag{ 37 }$

where

$\begin{eqnarray}&&M({x}_{M})=\displaystyle \frac{4.0505}{{x}_{M}^{1/6}}\left(1+\displaystyle \frac{0.40}{{x}_{M}^{1/4}}+\displaystyle \frac{0.5316}{{x}_{M}^{1/2}}\right)\mathrm{exp}(-1.8899{x}_{M}^{1/3}).\end{eqnarray} \tag{ 38 }$

Here ${x}_{M}\equiv \nu /{\nu }_{{\rm{c}}}$ and

$\begin{eqnarray}&&{\nu }_{c}=\left(\displaystyle \frac{3{eB}}{4\pi {m}_{e}c}\right){{\rm{\Theta }}}_{e}^{2},\end{eqnarray} \tag{ 39 }$

where ${{\rm{\Theta }}}_{e}\equiv {k}_{{\rm{B}}}T/{m}_{{\rm{e}}}{c}^{2}$ is the dimensionless electron temperature and K_n is a modified Bessel function of the second kind of order ${\rm{n}}$ . Since thermal synchrotron radiation is the only radiation source being considered, the absorption coefficient may be calculated via Kirchoff's law.

The Keplerian shell model has previously been used to simulate the image and spectrum of Sgr A* (e.g., Broderick & Loeb 2006; Broderick et al. 2011). We adopt the same self-similar electron number density profile and temperature profile given in Equation (1) and Table 1 of Broderick & Loeb (2006), for the cases of black hole spin parameters of a = 0, 0.5, and 0.998. The GRRT computation is performed within a shell of outer radius 500 r_g, with a central black hole mass of 4 × 10⁶ M_⊙. The resulting images at 150, 340, and 1000 GHz are plotted in Figure 9. The spectrum is shown in Figure 10. It is clear that, besides the blackbody-like spectral component contributed by the thermal synchrotron emission, an additional power-law component is further needed to explain the observed data of Sgr A*. On the other hand, because thermal synchrotron dominates the total emission of the image at the frequencies of interest, Figure 9 in general shows good agreement with Figure 1 of Broderick & Loeb (2006), in which the contribution from a nonthermal electron population is also included.

**Figure 9.** Thermal synchrotron radiation image of a Keplerian shell around a black hole at 150 GHz (left column), 340 GHz (central column), and 1000 GHz (right column), as viewed at an observer inclination angle of 45°. From top to bottom, a = 0, 0.5, and 0.998, respectively. The analytic solutions for the photon rings are shown by dashed green curves for reference. The image intensity is plotted on a linear scale.
Download figure:
Standard image High-resolution image

**Figure 10.** Corresponding spectrum of the emission from a Keplerian shell around a nonrotating black hole as illustrated in Figure 9. The cases a = 0, 0.5, and 0.998 are plotted as thick, medium, and thin solid lines, respectively. Observational data from Sgr A* (Serabyn et al. 1997; Falcke et al. 1998; Bower et al. 2015) are overlaid for comparison.
Download figure:
Standard image High-resolution image

5. TIMING BENCHMARK

To test the speed of Odyssey, we perform multiple runs on different image sizes for the same parameters as used to calculate the bottom right panel of Figure 2, a benchmark also used in the GPU-based ray-tracing code, GRay (Chan et al. 2013). The timing benchmark was performed with a single NVIDIA GeForce GTX 780 Ti graphics card (with 2880 CUDA cores and 3 GB of GDDR5 RAM) and computed in double-precision floating-point arithmetic. In Table 1, we record the run time and total number of steps used for the Runge–Kutta method. Although the run time and number of Runge–Kutta steps depend on the accuracy required by the adaptive step size (Press et al. 1992), the average time per Runge–Kutta step, per photon, remains roughly uniform with increasing numerical precision. Because the radiative transfer integration is also performed piecewise according to these steps, the average time per integration step, per photon provides a better standard for comparison with other ray-tracing codes with different underlying numerical schemes. The value for the averaged run time per Runge–Kutta step, per photon as a function of image size is given in the last column of Table 1.

Table 1. Timing Performance of Odyssey Code

Image Size	Run Time^a	Total R-K Steps	Average Time
(Number of Geodesics)	(ms)		(ns/step/photon)
2²	56.921631	5.718000e+003	2488.703693
4²	113.811523	2.484900e+004	286.257806
8²	116.409309	9.876600e+004	18.416211
16²	114.792221	3.929330e+005	1.141180
32²	117.429955	1.548845e+006	0.074041
64²	238.210617	6.183181e+006	0.009406
128²	605.967957	2.468902e+007	0.001498
256²	1873.401367	9.866115e+007	0.000290
512²	7090.669434	3.944880e+008	0.000069
1024²	25732.214844	1.577673e+009	0.000016

Note.

^aTime required to compute the bottom right panel of Figure 2 by one NVIDIA GeForce GTX 780 Ti graphics card with double-precision floating-point arithmetic. The GeForce GTX 780 Ti graphics card has 15 SMs and 192 cores per SM, giving 2880 CUDA cores in total.

Download table as: ASCII Typeset image

We are also interested in how GPUs can boost the computation speed compared to conventional serial CPU and parallel CPU cluster codes with the same numerical algorithm. Therefore, we apply the same algorithm described in Section 2 to a serial code and a parallel code in MPICH, which is a high-performance implementation of the MPI standard. These two codes are both written in C, using double-precision floating-point arithmetic. We compare the run-time result in Figure 11. The profile for the case of Odyssey and MPICH flattens when integrating a small number of geodesics because the run time is dominated by the time taken to invoke the CUDA or MPICH kernels. For small numbers of photons, the serial code outperforms the parallelized codes. However, parallel computations efficiently reduce the run time for larger numbers of geodesics. Comparing the unit price of a CPU cluster that can deliver computational results in an equal amount of time compared to one GPU card, it is clear that for the same unit price GPUs deliver results faster.

**Figure 11.** Comparison of the computational time required to obtain the result in the bottom right panel of Figure 2, for an image size of 2ⁿ × 2ⁿ pixels ( $n=1,2,\cdots ,10$ ), using three different programming architectures: serial code (blue triangles), parallelized by MPICH (green squares), and CUDA (red circles). For small numbers of geodesics, the computational time is dominated by the time to launch the CUDA or MPICH kernel. The serial code is run on an Intel(R) Core(TM) i7-4940 K 4.00 GHz CPU. The MPICH code was run on two Intel(R) Xeon(R) E5-2620 2.00 GHz CPUs (each CPU has six cores), with hyperthreading enabled. Our GPU code (`Odyssey`) was run on a single NVIDIA GeForce GTX 780 Ti graphics card, as summarized in Table 1.
Download figure:
Standard image High-resolution image

**Figure 11.** Comparison of the computational time required to obtain the result in the bottom right panel of Figure 2, for an image size of 2ⁿ × 2ⁿ pixels ( $n=1,2,\cdots ,10$ ), using three different programming architectures: serial code (blue triangles), parallelized by MPICH (green squares), and CUDA (red circles). For small numbers of geodesics, the computational time is dominated by the time to launch the CUDA or MPICH kernel. The serial code is run on an Intel(R) Core(TM) i7-4940 K 4.00 GHz CPU. The MPICH code was run on two Intel(R) Xeon(R) E5-2620 2.00 GHz CPUs (each CPU has six cores), with hyperthreading enabled. Our GPU code (`Odyssey`) was run on a single NVIDIA GeForce GTX 780 Ti graphics card, as summarized in Table 1.
Download figure:
Standard image High-resolution image

GRay is a GPU-based ray-tracing code, based on the ray-tracing algorithm of Psaltis & Johannsen (2012) and Bauböck et al. (2012). With the same benchmarks (bottom right panel of Figure 2), Odyssey and GRay reveal similar profiles in run-time measures (comparing Figure 11 with Figure 4 in Chan et al. 2013). Odyssey also reaches similar levels of performance to those reported by GRay (∼1 ns per photon, per time step). The average run time for Odyssey can be less than 1 ns when the image size is larger than 32² pixels.⁸

6. SUMMARY AND DISCUSSION

Odyssey is an accurate, flexible, and efficient GPU-based code for ray tracing and radiative transfer in the Kerr spacetime. Compared with GRay (Chan et al. 2013), Odyssey adopts a different ray-tracing algorithm and CUDA code structure, with similar performance for the ray tracing (≲1 ns per step, per photon). The source code for Odyssey is freely available (Pu et al. 2016)⁹ , with two default tasks for computing Figure 5 and the top middle panel of Figure 9. Users can easily modify the source code to suit their needs and efficiently perform GRRT calculations on their computer with a CUDA-capable GPU graphics card.

An immediate and important application of Odyssey is the rapid computation of images, spectra, and light curves of different black hole accretion and/or jet models, provided either semianalytically (e.g., Broderick & Loeb 2006; Broderick & Loeb 2009; Pu et al. 2015) or numerically, e.g., from simulations of current state-of-the-art general relativistic magnetohydrodynamic (GRMHD) simulation codes, such as HARM3D (Noble et al. 2009; Mościbrodzka et al. 2014, 2016) and RAISHIN (Mizuno et al. 2006). Post-processing GRMHD simulation data for GRRT calculations has been considered in several studies to calculate the simulated spectrum and VLBI images from Sgr A* and M87 (e.g., Mościbrodzka et al. 2009; Dexter et al. 2012; Mościbrodzka & Falcke 2013). The works of Chan et al. (2015a, 2015b) provide a good example of how GPUs can help to accelerate these computations significantly, one important application of which is to a time series of GRMHD data, in particular to extract the observed spectra and light curves from many different accretion models both accurately and efficiently. Such calculations are important for calculating the time-dependent emission from accretion onto black holes and warrant a detailed separate study.

Time-dependent GRRT involving a full consideration of the light-crossing time of each ray is needed when the light-crossing timescale is comparable with the dynamical timescale of the system, for example, near the black hole event horizon or strong shock regions. Starting from the observer's frame, the photon plane moving backward in time with the same increment ${\rm{\Delta }}t$ resembles a plane on which all emitted photons will reach the observer simultaneously. We therefore term this plane the frozen photon plane. At large distances, the frozen photon plane resembles a conventional Euclidean plane because length-contraction and time-dilation effects are negligible far from the black hole. However, closer to the black hole, the frozen photon plane becomes distorted, as demonstrated in Figure 12. Although the GRRT integration along each ray between successive frozen photon planes requires only the data within two planes, near the black hole every photon on the same plane requires different numbers of Runge–Kutta steps to reach the final photon plane (i.e., the distant observer). This is essentially the "fast light" approximation, whereby it is assumed that all rays from everywhere within the computational time, at that particular observer time, arrive simultaneously. This enables ray tracing on static time slices of GRMHD simulation data and is trivially parallelizable.

**Figure 12.** Evolution of photon plane (backward in time) near a nonrotating black hole at various coordinate times t, at which the emitted photons all reach the observer simultaneously. The time sequence is from left to right, top to bottom, and the time interval between snapshots is ${dt}=4$ .
Download figure:
Standard image High-resolution image

**Figure 12.** Evolution of photon plane (backward in time) near a nonrotating black hole at various coordinate times t, at which the emitted photons all reach the observer simultaneously. The time sequence is from left to right, top to bottom, and the time interval between snapshots is ${dt}=4$ .
Download figure:
Standard image High-resolution image

Relaxing this approximation and considering the dynamical evolution of the medium as the ray propagates through it, along with the arrival time delays between neighboring rays, is a significant computational challenge. Owing to the limited amount of memory on board each GPU and the size of even modest 3D GRMHD data, post-processing GRMHD simulation results with time-dependent GRRT is not currently feasible with GPUs. However, recent progress in parallel computing, most notably with hybrid CPU and GPU programming architectures like OpenCL and CUDA-Aware MPI, offers several possibilities to approach this problem. Through sufficient load balancing of the computation between CPUs and GPUs, one can in principle access the large amounts of RAM required for this task. We leave this to a future update of Odyssey.

Finally, we note that Odyssey is developed in the Microsoft Visual Studio environment, and as such it is possible to combine the Odyssey algorithm with DirectX for visualizing the propagation of rays in the Kerr spacetime. Together with the Graphical User Interface, powered by DirectX, we also present a public software package and educational tool Odyssey-Edu, for demonstrating null geodesics around a Kerr black hole.¹⁰

We thank the anonymous referee for useful comments and suggestions that helped improve the manuscript. We thank Steven V. Fuerst for sharing his code, Hsi-Yu Schive for helpful discussions about GPU programming, and K. Akiyama for discussions on spectral data from Sgr A*. H.-Y.P. and Z.Y. are grateful for numerous helpful and stimulating discussions with Kinwah Wu. H.-Y.P. also thanks members of the GLT team for their support and encouragement. H.-Y.P. is supported by the Ministry of Science and Technology (MOST) of Taiwan under the grant MOST 103-2112-M-001-038-MY2. Z.Y. is supported by an Alexander von Humboldt Fellowship and acknowledges support from the ERC Synergy Grant "BlackHoleCam—Imaging the Event Horizon of Black Holes" (Grant 610058). S.-J.Y. acknowledges support from the National Research Foundation of Korea to the Centre for Galaxy Evolution Research (No. 2010-0027910), from the Mid-Career Researcher Program (No. 2015-008049) through the National Research Foundation (NRF) of Korea, and from the Yonsei University Future-leading Research Initiative of 2014–2015. This research has made use of NASA's Astrophysics Data System.

ODYSSEY: A PUBLIC GPU-BASED CODE FOR GENERAL RELATIVISTIC RADIATIVE TRANSFER IN KERR SPACETIME

Article metrics

Permissions

Author affiliations

Author notes

ORCID iDs

Dates

ABSTRACT

1. INTRODUCTION

2. FORMULATION

2.1. Ray-tracing Algorithm

2.2. Initial Conditions

2.3. Radiative Transfer

3. GPU SCHEME

4. BENCHMARK ASTROPHYSICAL CALCULATIONS

4.1. Grid Projection

4.2. Black Hole Shadows

4.3. Keplerian Thin Disk

4.4. Keplerian Hot Spot

4.5. Keplerian Shell

5. TIMING BENCHMARK

6. SUMMARY AND DISCUSSION

Footnotes

ODYSSEY: A PUBLIC GPU-BASED CODE FOR GENERAL RELATIVISTIC RADIATIVE TRANSFER IN KERR SPACETIME

Article metrics

Permissions

Share this article

Author affiliations

Author notes

ORCID iDs

Dates

ABSTRACT

1. INTRODUCTION

2. FORMULATION

2.1. Ray-tracing Algorithm

2.2. Initial Conditions

2.3. Radiative Transfer

3. GPU SCHEME

4. BENCHMARK ASTROPHYSICAL CALCULATIONS

4.1. Grid Projection

4.2. Black Hole Shadows

4.3. Keplerian Thin Disk

4.4. Keplerian Hot Spot

4.5. Keplerian Shell

5. TIMING BENCHMARK

6. SUMMARY AND DISCUSSION

Footnotes