This site uses cookies. By continuing to use this site you agree to our use of cookies. To find out more, see our Privacy and Cookies policy.

Efficient Parallel Algorithm for Estimating Higher-order Polyspectra

, , and

Published 2019 August 20 © 2019. The American Astronomical Society. All rights reserved.
, , Citation Joseph Tomlinson et al 2019 AJ 158 116 DOI 10.3847/1538-3881/ab3223

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

1538-3881/158/3/116

Abstract

Nonlinearities in the gravitational evolution, galaxy bias, and redshift-space distortion drive the observed galaxy density fields away from the initial near-Gaussian states. Exploiting such a non-Gaussian galaxy density field requires measuring higher-order correlation functions, or, its Fourier counterpart, polyspectra. Here, we present an efficient parallel algorithm for estimating higher-order polyspectra. Based upon the Scoccimarro estimator, the estimator avoids direct sampling of polygons using the fast Fourier transform, and the parallelization overcomes the large memory requirement of the original estimator. In particular, we design the memory layout to minimize the inter-CPU communications, which excels in the code performance.

Export citation and abstract BibTeX RIS

1. Introduction

The primordial density field in the universe measured by the temperature anisotropies and polarizations of cosmic microwave background radiation is very close to Gaussian (Planck Collaboration 2016). In contrast, the observed galaxy density field shows strong non-Gaussianity due to late-time effects such as nonlinear gravitational instability, nonlinear galaxy bias, as well as nonlinear redshift-space distortion.

The analysis of the galaxy density field in the literature is yet focusing on the galaxy two-point correlation function, or its Fourier counterpart, the galaxy power spectrum (Tegmark et al. 2006; Blake et al. 2011; Parkinson et al. 2012; Reid et al. 2012; de la Torre et al. 2013; Pezzotta et al. 2017). At the same time, the interest in the higher-order correlation functions and polyspectra has emerged to complement the two-point statistics in exploiting such a non-Gaussian galaxy density field. In Fourier space, these are, from third order, bispectrum, trispectrum, quadspectrum, pentaspectrum, hexaspectrum, and so on.

Thus far, the analysis on the higher-order correlation functions and polyspectra focuses mainly on the galaxy bispectrum and three-point correlation function (Scoccimarro et al. 2001; Verde et al. 2002; Gaztañaga et al. 2005; Nichol et al. 2006; Nishimichi et al. 2007; Marín 2011; McBride et al. 2011b, 2011a; Marín et al. 2013; Gil-Marín et al. 2015), but they have not been as informative as their lower-order counterparts. This is due to both the high computational cost in estimating higher-order correlation functions and the lack of accurate modeling needed for detailed data analysis.

One challenge in studying the higher-order polyspectrum in Fourier space is to find an efficient estimator; when estimating nth order polyspectrum, the complexity of the naive estimator grows as the number of all possible n-gons in the three-dimensional Fourier space scales as ${N}_{\max }^{3(n-1)}$. Here, Nmax ≡ kmax/δk is the number of one-dimensional discrete grid points of interest, with kmax being the maximum wavenumber of the analysis, usually set by the smallest scale that we can reliably model the nonlinearities in the galaxy density field, and δk being the size of the binning in Fourier space.

One can alleviate this computational cost using the Fourier space representation of the Dirac-delta operator that dictates the statistical homogeneity. Originally done by Scoccimarro (2000) for estimating the bispectrum, in this way, we can estimate a polyspectrum by taking advantage of the fast Fourier transformation algorithm. As we shall outline in Section 2, the calculation of the polyspectrum then boils down to a sum of products of the Nmax distinct three-dimensional Fourier volumes derived from the original density field. This method avoids directly sampling all n-gons and is computationally much faster for all high-order polyspectra.

The implementation of the Fourier-based method, however, requires allocating a large amount of memory. To avoid the aliasing problem associated with the Fourier method, the three-dimensional Fourier grid must be at least (sNmax)3 for the s-point polyspectrum; as we need a total of Nmax for such three-dimensional Fourier volumes, the memory requirement grows as ${s}^{3}{N}_{\max }^{4}$. For example, the memory requirement to estimate the bispectrum (s = 3) with Nmax = 128, when using single precision, is already 27 Gigabytes.

In this paper, we shall improve the performance of the polyspectra estimators by parallelizing them. The most straightforward and naive approach might be to distribute each of the Fourier volumes to a different processor. To compute the polyspectum, then we need to calculate the sum of the multiplication of the Fourier volumes in different processors. This proves to be quite slow and, if not slower, only marginally faster than even the direct calculation from sampling all possible n-polygons. Instead, we distribute the Fourier volumes to parallel nodes so that the summation can be done almost entirely locally. Here, most of the inter-processor communication occurs during the Fourier transform step and the performance is significantly faster than the serial estimator. In this paper, we shall demonstrate the performance of this parallel algorithm for efficient calculation of higher-order polyspectra.

This paper is organized as follows. In Section 2 we discuss the Scoccimarro estimator and its extension to higher-order polyspectra and the limitations of such an extension. In Section 3 we explain our efficient method of parallelization for this estimator. Section 4 analyzes the results of our application of the estimator to a grid of unity, equivalent to calculating the number of polygons, for a variety of high-order polyspectra and also includes an analysis of the scaling of our parallel algorithm. We conclude in Section 5. Appendix A discusses the conversion of our general estimator from integral to summation form for direct application to calculations. Finally, Appendix B discusses some approximately analytic solutions to the number of n-gons in three-dimensional Fourier space.

We use the following convention

Equation (1)

for the Fourier transformation, and

Equation (2)

for the n-point polyspectrum. Here, δD is the Dirac-delta operator, and we use the shorthanded notation of ${{\boldsymbol{k}}}_{1\cdots n}\,\equiv {{\boldsymbol{k}}}_{1}+\cdots {{\boldsymbol{k}}}_{n}$. We denote the amplitude of a vector ${{\boldsymbol{k}}}_{i}$ as ki.

2. The Scoccimarro Estimator

The estimator for polyspectra that we shall parallelize here is based on the Scoccimarro estimator (Scoccimarro 2000, 2015) that we summarize in this section. In this section, we shall mainly discuss the algorithm with the equations in the continuous limit and summarize the discrete implementation in Appendix A.

Let us begin with the bispectrum estimator. The estimator for the monopole bispectrum $B({k}_{1},{k}_{2},{k}_{3})$ takes the average of $\delta ({{\boldsymbol{q}}}_{1})\delta ({{\boldsymbol{q}}}_{2})\delta ({{\boldsymbol{q}}}_{3})$ for all possible combinations of three vectors $| {q}_{i}-{k}_{i}| \lt \delta {k}_{i}/2$ (i = 1, 2, 3) that satisfy the triangle condition (${{\boldsymbol{q}}}_{123}=0$). We denote the bin size in the ith direction as δki, although, throughout the paper, we choose the same bin size for all three directions for simplicity. The extension of the method to arbitrary bin size is trivial. We may formulate the bispectrum estimator as

Equation (3)

with N(123) being the number of triplets $({{\boldsymbol{q}}}_{1},{{\boldsymbol{q}}}_{2},{{\boldsymbol{q}}}_{3})$ satisfying the binning condition ($| {q}_{i}-{k}_{i}| \lt \delta {k}_{i}$, which is denoted as qi ∼ ki) and the triangle condition ${N}_{(123)}={\sum }_{{{\boldsymbol{q}}}_{1},{{\boldsymbol{q}}}_{2},{{\boldsymbol{q}}}_{3}}{\delta }^{{\rm{D}}}({{\boldsymbol{q}}}_{123})$ with the same summation as Equation (3). Throughout the paper, we impose the condition that ${k}_{1}\geqslant {k}_{2}\geqslant {k}_{3}$ to avoid the duplication.

In the continuous limit, the bispectrum estimator becomes (see, Appendix A for the details about the normalization factor)

Equation (4)

where Vf is the volume of a fundamental Fourier cell, (2π)3/V, and ${V}_{(123)}^{B}$ is the Fourier space volume defined as

Equation (5)

The key step for the Scoccimarro estimator is promoting the Dirac-delta operator with its Fourier representation,

Equation (6)

with which we can recast the three integrals for sampling triangles in Equation (4) into a single integration,

Equation (7)

Here,

Equation (8)

is the Fourier transformation of the function ${\tilde{I}}_{{k}_{i}}({\boldsymbol{q}})$ that is defined with the density field in the spherical shell defined by $| q-{k}_{i}| \lt \delta {k}_{i}/2$ and zero otherwise. That is, instead of the time-consuming operation of sampling all possible triangles, the Scoccimarro estimator involves Fourier transformation for computing ${I}_{{k}_{i}}({\boldsymbol{x}})$ at each wavenumber bin, and summing over the product of three ${I}_{{k}_{i}}{\rm{s}}$.

Note that the complexity, or the scaling of the computation time as a function of system size, of the Scoccimarro estimator is the same as the direct summation's scaling of ${N}_{\max }^{6}$ for the case of bispectrum. The Scoccimarro estimator, however, reduces the computation time in two ways. First, we have fewer total numerical operations. The direct sampling involves looping over the full ranges of −Nmax ∼ Nmax for each component of ${{\boldsymbol{k}}}_{1}$ and ±Nmax/2 ∼ ±Nmax for each component of ${{\boldsymbol{k}}}_{2}$, because our convention of ${k}_{1}\geqslant {k}_{2}\geqslant {k}_{3}$ with a triangle condition constrains ${k}_{1}/2\leqslant {k}_{2}\leqslant {k}_{1}$. On the other hand, the Scoccimarro estimator reduces it to the matrix inner product (${N}_{\max }^{3}$ operation) only for the amplitude triplets $({k}_{1},{k}_{2},{k}_{3})$. Second, for the Scoccimarro estimator, the three-dimensional array $\delta ({\boldsymbol{q}})$ is accessed in an ordered manner that is easier for the CPU to cache. In contrast, the direct sampling accesses $\delta ({{\boldsymbol{q}}}_{3})$ in an irregular manner, which is inevitable because the value for ${{\boldsymbol{q}}}_{3}$ is determined by ${{\boldsymbol{q}}}_{3}=-{{\boldsymbol{q}}}_{1}-{{\boldsymbol{q}}}_{2}$. Combining these two factors, the serial version of the Scoccimarro estimator is a factor of 10 faster than the direct sampling in our implementation.

The reduction of operation time is even more striking as we calculate the higher-order correlation functions. When computing the angle-averaged n-point polyspectra that are only a function of the Fourier wavenumbers (Sefusatti 2005), the Scoccimarro estimator's complexity increases only by one more power of Nmax, ${ \mathcal O }({N}_{\max }^{n+3})$ while the direct sampling method's complexity grows by three powers of Nmax at each order, ${ \mathcal O }({N}_{\max }^{3(n-1)})$. Needless to say that the base operation time for the Scoccimarro estimator is much faster than the direct sampling case, as we have already seen in the case of the bispectrum where the two method have the same complexity.

Extending the bispectrum estimator in Equation (7) for the angle-averaged higher-order polyspectra is straightforward:

Equation (9)

with the Fourier volume

Equation (10)

occupied by ${{\boldsymbol{q}}}_{1},\cdots ,\,{{\boldsymbol{q}}}_{n}$, which contribute to the estimation of angle-averaged nth order polyspectrum. Note that we can compute ${V}_{\tilde{n}}$ with the exact same method,

Equation (11)

with

Equation (12)

In this paper, we shall implement Equation (11) to study the performance of the parallel algorithm.

2.1. On the Dimensionality of nth Order Polyspectra

At this point, a cautionary remark on the dimensionality of nth order polygon is in order. While we only consider the angle-averaged polyspectra in this paper, specifying the full shape of the n-point (n ≥ 3) correlation function (or polyspectra, its Fourier counterpart) in the three-dimensional space would require $3n-6$ real numbers. It is because adding one more point to a configuration of (n − 1) points introduces three additional real numbers (the coordinate of the new point relative to the n − 1 points) to specify the configuration of n points. Starting the recursion from bispectrum (or the three-point correlation function) that requires three numbers, therefore, each configuration of the n-point correlation function is specified by $3n-6$ real numbers.

This argument does not apply for the two-point function (n = 2) and the three-point function (n = 3) because of the underlying symmetry. For the two-point correlation function, translational symmetry removes three (we can always set the coordinate of the first point as the origin) and the rotational symmetry removes two (the correlation function must be independent of the orientation of the second point) real dimensions. As a result, we are left with one real dimension for the power spectrum. For the three-point correlation function, the rotational symmetry around the axis defined by existing two points removes one real dimension. As a result, we are left with three real dimensions for the bispectrum.

Along the same reasoning, adding redshift-space distortion (Kaiser 1987) would require only two more real dimensions to the problem, because the coordinate of the newly added point is enough to determine the orientation of the point relative to the existing n − 1 points. That is, from the two angles specifying the orientation of any three points relative to the line-of-sight direction, we can determine the relative orientation of all other points with respect to the line-of-sight direction.

Although we shall not discuss further in this paper, one can also implement the polyspectra estimator with the full dependence using a similar method. For example, in addition to the amplitude of four wavevectors, the general trispectrum also depends on the diagonals ${d}_{1}=| {{\boldsymbol{k}}}_{1}-{{\boldsymbol{k}}}_{2}| $ and ${d}_{2}=| {{\boldsymbol{k}}}_{1}-{{\boldsymbol{k}}}_{3}| $, and takes the form

Equation (13)

One may use, again, Fourier transformation to calculate the double convolution. Note that the result must reduce to the angle-averaged trispectrum when integrating over all possible d1 and d2.

3. Parallelization

The key to the efficient parallelization of the polyspectra estimator is minimizing the inter-CPU communication, and we achieve that using the slab decomposition scheme.

The slab decomposition slices the data cube along a single dimension, generally the fastest varying index, and assigns each chunk to a different process. As each process stores a full range of other dimensions and a limited range of the sliced dimension, this separates the data into slabs of the domain hence the name slab decomposition. We show a visualization of this decomposition in Figure 1. For the specific case of the polyspectra estimator, this decomposition takes the form of decomposing along the fastest varying Fourier grid dimension while the other two grid dimensions and the wavenumber dimension remain fully accessible in each process. For example, for the Julia implementation, we assign each process ${\tilde{I}}_{{k}_{i}}({k}_{1\min }\,:{k}_{1\max },{k}_{{\rm{F}}}\,:{k}_{\max },{k}_{{\rm{F}}}\,:{k}_{\max })$, and for C implementation, we assign each process ${\tilde{I}}_{{k}_{i}}({k}_{{\rm{F}}}\,:{k}_{\max },{k}_{{\rm{F}}}\,:{k}_{\max },{k}_{3\min }\,:{k}_{3\max })$ for all i from 1 to Nmax. As the arrays ${\tilde{I}}_{{k}_{i}}({\boldsymbol{k}})$ are zero except for the shell defined as $| {\boldsymbol{k}}| \sim {k}_{i}$, for which we assign the value from the original data cube $\delta ({\boldsymbol{k}})$, generating Nmax of ${N}_{\max }^{3}$-cube arrays takes only the time that we read through the original data cube. That is, at no point does the generating process need more than a single slab of data at once in memory.

Figure 1.

Figure 1. Visualization of the decomposition scheme. From the Fourier space density contrast field $\delta ({\boldsymbol{k}})$, we construct Nmax Fourier data cubes ${\tilde{I}}_{{k}_{i}}({\boldsymbol{k}})$ for i = 1 to Nmax and distribute the array to m different processes depicted as different gray scale colors. The data cubes ${\tilde{I}}_{{k}_{i}}({\boldsymbol{k}})$ are spread out over each process such that each one only has access to a single contiguous chunk of one index and the full range of the others. For example, the first process stores ${\tilde{I}}_{{k}_{i}}({k}_{{\rm{F}}}\,:{k}_{1\max },{k}_{{\rm{F}}}\,:{k}_{\max },{k}_{{\rm{F}}}\,:{k}_{\max })$ from all ${\tilde{I}}_{{k}_{1}}$ to ${\tilde{I}}_{{k}_{{N}_{\max }}}$. In this distribution scheme, the integration in Equation (9) is done mostly in each local process, and the inter-CPU communication is only required at the last moment when we compute the total sum.

Standard image High-resolution image

This memory distribution is also natural for the Fourier transformation as the Fastest Fourier Transform in the West (FFTW; Frigo & Johnson 2005) implements distributed Fourier transforms through slab decomposition and transposing, allowing for a very natural calculation without a need to shuffle around memory between processes to get the right format. The only other step in the calculation is the product and sum: the product is always able to be done locally within each process and the sum is a trivially parallelizable calculation with very little extra memory usage or communication as each processes calculates a local sum before the main processes collects and finalizes the sum. All together this memory distribution scheme allows us to easily and quickly calculate this estimator with minimal memory footprint and very little inter-CPU communication.

This method must be contrasted with the naïve parallelization that each process contains the full ${\tilde{I}}_{{k}_{i}}({\boldsymbol{k}})$ and ${I}_{{k}_{i}}({\boldsymbol{x}})$ for each wavenumber bin ki. For this alternative case, however, the time saved in the Fourier transform step is dwarfed by the significantly more expensive final step where we calculate sums of products across wavenumbers. As a result, this somewhat more natural decomposition is significantly slower than the slab decomposition.

4. Code Tests

In order to test the accuracy and performance of our implementation of the parallel estimator, we compute the number of polygons satisfying the homogeneity condition and the binning condition

Equation (14)

for which we have present an analytical estimation (see Section 4.1 for details). Here, Vf is the volume of the fundamental cell in the Fourier space, which is related to the fundamental Fourier wavenumber kF by ${V}_{f}={k}_{{\rm{F}}}^{3}$.

In practice, we estimate the number of polygons by setting $\delta ({\boldsymbol{k}})=1$ for all Fourier modes and estimating the angle-averaged polyspectra. Therefore, counting the number of polygons requires exactly the same procedure as estimating the polyspectra.

4.1. Number of Polygons

To obtain the number of n-gons satisfying the binning conditions, we need to evaluate the 3n-dimensional integration in Equation (14). Again, using the Fourier representation of the Dirac-delta operator, we reduce the problem to the three-dimensional integration of multiplications of ${\iota }_{k}({\boldsymbol{x}})$ (Equation (11)), which is

Equation (15)

Using the expression to leading order in the bin size δk, we further reduce the calculation of the 3n-dimensional volume to the one-dimensional integration

Equation (16)

In Appendix B, we find a general expression for the radial integration,

Equation (17)

and apply Equation (17) for the n = 3 and n = 4 cases to find the number of triangles and quadrilaterals, which are given by

Equation (18)

and

Equation (19)

4.2. Result

In this section, we shall present number of polygons that we calculated using the parallelized polyspectra estimator and compare the result with the analytical estimation that we find in Section 4.1. We shall show the result for triangles (n = 3), quadrilaterals (n = 4), pentagons (n = 5), and hexagons (n = 6) that are estimated using, respectively, the parallel estimators of bispectrum, trispectrum, quadspectrum, and pentaspectrum.

To show the full shape dependence of the number of polygons as a function of their side length (wavenumbers), we need to visualize them in n-dimensional space of $({k}_{1},{k}_{2},\cdots ,\,{k}_{n})$. For our purpose, however, to compare the outcome of the parallel computation and analytical estimation, it suffices to flatten n-dimensional data points to the one-dimensional ordered sequence. Following the convention in McCullagh et al. (2016), we impose the condition ${k}_{i}\geqslant {k}_{i+1}$ to avoid duplication and form the sequence by taking the row-major order of the n-dimensional points (kn is the fastest varying dimension).

In all comparison plots that we present here, the x-axes show the index of the n-dimensional point $({k}_{1},\cdots ,\,{k}_{n})$ in the flattened sequence, and y-axes show the number of polygons whose n sides are $({k}_{1},\cdots ,\,{k}_{n})$. We use blue to show the numerical result and orange to show the analytical estimation.

First, we show the number of triangles using the parallel bispectrum estimator in Figure 2. The left panel shows the numerical result (blue) and analytical prediction (Equation (18)), while the middle and right panels show the absolute number (middle) and fractional (right) differences. On large scales (or small indices), we expect that the analytical estimation is not accurate because the bin's size is of order the wavenumber (ki ≃ δk). For small scales, although the absolute residuals increases with wavenumbers (the lowest order error term goes as $\prod {k}_{i}$ just like the Ntri in Equation (18)), the fractional residual between the numerical and analytical estimation are consistently off by 10% ∼ 20%, and the biggest differences happen for the folded triangles (k1 = k2 + k3). This result suggests that the variance of bispectrum estimated using Equation (18; for example, in the Fisher information forecast in Desjacques et al. 2018) must be off by the same factor. For an accurate estimation of the variance of the bispectrum, one must use the numerically estimated number of triangles.

Figure 2.

Figure 2. Left: the number of triangles satisfying the binning conditions measured from the parallel bispectrum estimator. The blue curve is the output from the numerical estimation, and the orange curve is according to our analytical estimation in Equation (18; see Appendix B for the derivation). Middle: the difference between the numerical measurement and theoretical estimation. Right: the fractional difference between the two. Besides the largest scales (earlier indices), which we expect the analytical estimation breaks down, the analytical estimation in Equation (18) is good to about 20% accuracy.

Standard image High-resolution image

We perform the same analysis for the number of quadrilaterals using the parallel estimator for trispectrum. The left panel of Figure 3 shows the comparison between the numerical result and the analytical estimation in Equation (19). It turns out that the biggest discrepancies are for the colinear quadrilaterals (${k}_{1}={k}_{2}+{k}_{3}+{k}_{4}$) where the analytical formula, Equation (19), predicts zero. To remedy the situation, we add the exact corrections for the colinear quadrilaterals, Equation (59), and the analytical estimate indeed matches the numerical calculation better, as shown in the middle panel of Figure 3.

Figure 3.

Figure 3. Left: the number of quadrilaterals satisfying the binning conditions measured from the parallel trispectrum estimator. The blue curve is the output from the numerical estimation, and the orange curve is according to our analytical estimation in Equation (19). Middle: the same numerical result (blue) along with the improved analytical calculation, using the exact expression for ιk(x) (see, Equation (57)) and the calculated correction for the colinear quadrilaterals as in Equation (59). Right: the fractional difference between the two curves in the middle panel. The small-scale (large wavenumber) discrepancies are consistently about 40%.

Standard image High-resolution image

We can easily extend the parallel estimator for the bispectrum and trispectrum to measure any n > 4 polyspectra. We extend the estimator to n = 5 and n = 6 order polyspectra, and show the result in Figures 4 and 5, respectively. In all cases the fractional errors between numerical calculation and analytical estimate are dominated by a collection of 100% error terms. These occur, just as the colinear case for the quadrilaterals, because the leading-order approximation in the analytical calculation yields zeros when the actual calculation yields small but non-zero numbers. This happens for all n > 4 cases, and it further reinforces the need for manually calculating the number of polygons when estimating the polyspectra, or estimating their variance.

Figure 4.

Figure 4. Flattened number of pentagons/angle-averaged quadspectrum of unity. The orange is according to the number predicted by Appendix B and the blue curve is the output of the estimator run on a grid of unity on a spherical shell. The quadspectrum shows very similar error behavior to the trispectrum and for much of the same reasons that it cannot be practically used when calculating the actual quadspectrum.

Standard image High-resolution image
Figure 5.

Figure 5. Flattened number of hexagons/angle-averaged pentaspectrum of unity. The orange is according to the number predicted by Appendix B and the blue curve is the output of the estimator run on a grid of unity on a spherical shell. The pentaspectrum displays near identical error behavior to the other high-order polyspectra.

Standard image High-resolution image

4.3. Code Performance

In this section, we present the performance of the parallel algorithm. All of our benchmarking work was done with 2.2 GHz Intel Xeon Processors on nodes with 40 CPU/node, 1 TB of RAM, 10 Gbps Ethernet, and FDR Infiniband.

We have implemented two versions of the parallel bispectrum estimator, one using C and the other using Julia. These two pieces of software perform the same algorithm. In Figure 6, we compare the performance of the two implementations. Overall, as a function of domain length, the size of one-dimensional Fourier grid N, and the maximum wavenumber Nmax = kmax/kF, the performance scales as ${ \mathcal O }({N}^{3}{N}_{\max }^{n})$, as we have analyzed in Section 2. If we apply the aliasing condition, which requires that $N\gt {{nN}}_{\max }$, we can rewrite the scaling as ${ \mathcal O }({n}^{3}{N}_{\max }^{n+3})$.

Figure 6.

Figure 6. Total time spent calculating the bispectrum vs. the length of one grid dimension for 40 processors. It roughly follows the ${ \mathcal O }({N}^{3})$ one would expected given the sum in the algorithm with fixed Nmax, modulo the overhead dominated small grid sizes. Each grid size was tested 50 times and then averaged. While the Julia version is slightly slower than the C version, the time difference is relatively consistent with around a constant 30 s offset for small grid sizes and is mostly negligible for large grid sizes given the overall length of the computation.

Standard image High-resolution image

The C implementation (blue) outperforms the Julia implementation (orange) for all domain lengths. The improvement, however, stays constant at about 30 s offset for reasonable grid sizes. As it is easier to read and modify the Julia version, we have extended the software to the higher-order polyspectra estimator only using Julia.

Figure 7 shows the performance of our Julia implementation as a function of the number of processors. Here, we use the domain size of N = 256 (left panel) and N = 512 (right panel). Initially, adding more CPUs improves the performance and reduces the execution time, but the improvement stops at some point. In fact, when the grid size per CPU is sufficiently small then adding more processes actually slows down the performance; the increased overhead time cancels the small gains in extra discretization.

Figure 7.

Figure 7. Left: the scaling of an N = 256 length grid over various numbers of processes. There are clear gains when initially increasing the number of processes but at a certain point overhead begins to dominate causing a slight slowdown. Right: the scaling of an N = 512 length grid over various numbers of processes. It takes more processes to reach the top speed, and the relative speedup between the maximum and the Ncpu = 4 case is only a factor of 2 compared to the factor of 4 for the N = 256 case. This is because the overhead scales very poorly with the grid size. Each number of processes was tested 50 times and then averaged.

Standard image High-resolution image

How long does it take to estimate the bispectrum? In order to avoid the aliasing effect, the grid size must satisfy $N\gt 3{N}_{\max }$ for estimating the bispectrum. Therefore, probing the bispectrum up to Nmax ≃ 340 requires the one-dimensional Fourier grid of N = 1024; we can estimate the bispectrum in 20 minutes using 40 cores.

For a more concrete example, the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) survey (Hill et al. 2008) has kF ≈ 0.0045 h/Mpc. Measuring the galaxy bispectrum to kmax = 0.2 h/Mpc only requires Nmax = 45. With the parallel version of the Scoccimarro estimator, we can measure the galaxy bispectrum for HETDEX in ∼40 s with 2 GB of memory distributed over 40 processors. Here, we did not run the FFTW in place, and kept both ${I}_{{k}_{i}}({\boldsymbol{x}})$ and ${\tilde{I}}_{{k}_{i}}({\boldsymbol{q}})$. Similarly, the Wide Field Infrared Survey Telescope (WFIRST; Spergel et al. 2015) has kF ≈ 0.0028 h/Mpc corresponding to ${N}_{\max }=75$ for kmax = 0.2 h/Mpc. For our implementation over 40 processors this corresponds to ∼50 s and 11 GB of memory distributed over 40 processors to calculate the galaxy bispectrum for WFIRST.

With the scaling that we have discussed in Section 2, we can extend the previous example to find a general relationship. For a cosmological survey with volume V and maximum wavenumber kmax, we calculate the number of one-dimensional sampling points as

Equation (20)

Because the performance of the Scoccimarro estimator scales as ${ \mathcal O }({N}_{\max }^{6})$, the estimated computing time becomes

Equation (21)

over 40 processors. For the higher-order polyspectra, we can extend this relationship by

Equation (22)

again, over the 40 processors. This relationship matches with the execution time for the polyspectra that we have measured for generating the figures in this paper.

5. Conclusion

In this paper, we present a novel parallel algorithm for estimating polyspectra of a complex field given on three-dimensional regular grids. While inheriting the efficiency of the Scoccimarro estimator, the parallel algorithm alleviates its rather stringent memory requirement by distributing the array into multiple processors. In addition, the key for efficient parallization is the slab decomposition scheme that maintains only a low level of inter-CPU data communication, avoiding massive data transportation.

Although we have only presented the case for the monopole (angle-averaged) polyspectra as a function of their side lengths (that is, the wavenumbers, ki), for estimating the redshift-space galaxy bispectrum, one can easily extend the method to the angular multipoles by applying the slab decomposition to the multipole-weighed fields ${\delta }_{n}({\boldsymbol{q}})$ as described in Scoccimarro (2015). Beyond the galaxy bispectrum, measuring the polyspectra with the full (3n − 6) parameter dependencies requires a little more elaboration, as the calculation involves convolution instead of a matrix inner product (Equation (13)). We shall leave this as a future project.

With the Julia implementation, we have demonstrated that for the galaxy surveys such as HETDEX and WFIRST, we can estimate the galaxy bispectrum down to k ≃ 0.2 h/Mpc in about a minute, and even higher-order polyspectra in timescales of less than an hour. With these execution timescales, it is feasible to study the high-order polyspectra by directly measuring them from the massive mock galaxy catalogs (Monaco et al. 2013; Kitaura et al. 2014; Avila et al. 2015; Izard et al. 2016; Agrawal et al. 2017; Munari et al. 2017; Taruya et al. 2018). If it is not for their own sake, studying the galaxy trispectrum (the four-point correlation function in Fourier space) and the galaxy pentaspectrum (the six-point correlation function) shall certainly elucidate the study of covariance matrix for, respectively, the galaxy power spectrum and the galaxy bispectrum.

We have stored our Julia implementation for the n-point polyspectra at https://github.com/JosephTomlinson/PolyspectrumEstimator, and those who are interested in the C implementation for the bispectrum estimator may contact the authors.

We thank Emiliano Sefusatti for useful discussions. D.J. and J.T. were supported at Pennsylvania State University by NSF grant (AST-1517363) and NASA ATP program (80NSSC18K1103).

Appendix A: Polyspectrum Estimator in Discrete Fourier Transformation

In the main manuscript, we have presented the polyspectra estimator in an contiguous limit as an integral form. In the practical implementation, we have converted the estimator into a discrete summation as we present here. For our discrete implementation, following the FFTW convention, we use the unnormalized discrete Fourier transform and explicitly pull out all normalization factors. For more details, authors refer to Chapter 7 of Jeong (2010). Here, we note the discrete version of variables by square bracket $[{\boldsymbol{x}}]$.

Let us first summarize the continuous version of the polyspectra estimator. Starting with the integral form of the general estimator Equation (9),

Equation (23)

We convert the Dirac-delta operator into its Fourier representation form,

Equation (24)

which, when inputed back into the estimator, results in

Equation (25)

where

Equation (26)

is the Fourier Transform of ${I}_{{k}_{i}}({\boldsymbol{q}})$ which is defined to be the same as $\delta ({\boldsymbol{q}})$ within a spherical shell with a Fourier space radius of $| {\boldsymbol{q}}| \simeq {k}_{i}$, and zero otherwise.

In the FFTW convention of unnormalized discrete Fourier transform, we find

Equation (27)

with which we get a partially discretized estimator of

Equation (28)

We then fully discretize the estimator by converting the final integral to a sum:

Equation (29)

Finally, the quantity in parenthesis $\tfrac{{V}_{f}^{n-1}}{{V}_{\tilde{n}}}$ is the reciprocal of the number of polygons, and we can compute them by applying the same polyspectra estimator to a Fourier grid of unity. Note, however, that the normalization is different here.

Equation (30)

where we define ${\tilde{\iota }}_{{k}_{i}}[{\boldsymbol{q}}]$, analogous to ${\tilde{I}}_{{k}_{i}}[{\boldsymbol{q}}]$ above, as the Fourier grid which takes unity within the spherical shell bounded by $| k-{k}_{i}| \lt \delta k/2$ and zero elsewhere.

Appendix B: Approximate Analytic Number of Polygons

In the polyspectra estimator, we have the normalization constant $\tfrac{{V}_{\tilde{n}}}{{V}_{f}^{n-1}}$, which is the number of n-gons that satisfy the closing condition (Dirac-delta) for a given set of side-length wavenumber ki. In this appendix, we shall present the analytical approximation to this quantity.

Using the Dirac-delta (Equation (24)), we rewrite the 3n-dimensional volume ${V}_{\tilde{n}}$ as

Equation (31)

for which we can compute the each qi integration as

Equation (32)

Combining the two results, we reduce the number of n-gons to one-dimensional integration as

Equation (33)

which is an analytically tractable integral, given the assumptions that ${k}_{i}\geqslant {k}_{i+1}$ (to uniquely define the polygon), and the polygon condition that $({\sum }_{i\ne j}{k}_{i})\geqslant {k}_{j}$.

In what follows, we shall show the calculations. First, we show the general strategy of integrating ${{ \mathcal I }}_{n}$ in Appendix B.1, which is followed by the explicit result for the n = 3 (Appendix B.2) and n = 4 (Appendix B.3) cases. We then present the calculation for the colinear case, ${k}_{1}={k}_{2}+\cdot \,+\,{k}_{n}$, where the approximation in Equation (32) results as zero for all n > 4 polygons. We therefore show the explicit expression without approximation in Equation (32) in Appendix B.4.

B.1. The Analytical Calculation: General Case

To simplify the notation, let us define following function

Equation (34)

where the integrand is an even function of x. Here, we show the general strategy of solving for the function ${{ \mathcal I }}_{n}$. First, we use the identity

Equation (35)

to transform the integration as

Equation (36)

We then use

Equation (37)

to convert the integrations in terms of the Dirac-delta operator,

Equation (38)

That is,

Equation (39)

Going beyond this result requires the conditions to form polygons. For general polygons in three-dimensional there is only one condition: the longest side must be less than or equal to the sum of all the other sides. If we do not want to be bothered by the ordering, we can simply state as following. For any $j\in [1,n]$,

Equation (40)

With this condition, we can see that there must be a solution for ${\mu }_{i}\in (0,1)$ satisfying

Equation (41)

For later use, we find the following identity for the integration of the Dirac delta useful:

Equation (42)

Here ${\rm{\Theta }}({x}_{1},{x}_{2},\cdot ,\,{x}_{n})$ is a multi-dimensional Heaviside-Theta function, which is 1 only if none of the xi are not positive.

B.2. The Analytical Calculation for Triangles (n = 3)

For the bispectrum (n = 3), Equation (39) reduces to

Equation (43)

Without a loss of generality, let us order the wavenumbers so that k1 ≥ k2 ≥ k3. Then, from the polygon inequality in Equation (40), the first term becomes

Equation (44)

and the second term is

Equation (45)

Therefore,

Equation (46)

and

Equation (47)

B.3. The Analytical Calculation for Trispectrum

For the trispectrum (n = 4), Equation (39) reduces to

Equation (48)

Again, without a loss of generality, we can order the wavenumbers so that k1 ≥ k2 ≥ k3 ≥ k4.

We first consider the case where ${k}_{1}={k}_{2}+{k}_{3}+{k}_{4}$. The integration then becomes

Equation (49)

That is, the number of trispectrum using Equation (32) vanishes for the colinear quadrilaterals. We present the expression without using the approximation in Appendix B.4.

For the other cases, k1 < k2 + k3 + k4, we integrate ${{ \mathcal I }}_{4}$ as follows. The first Heaviside-Theta function vanishes unless

Equation (50)

so that we integrate the first term as

Equation (51)

From Equation (40), we know that ${k}_{1}\gt -{k}_{2}-{k}_{3}+{k}_{4}$ and −k1 < k2 − k3 + k4, and, of course, k2 − k3 + k4 > −k2 − k3 + k4. Therefore,

Equation (52)

The second Heaviside-Theta function vanishes unless

Equation (53)

Using that, we find, from Equation (40), k1 < k2 + k2 + k4 and −k1 < −k2 + k3 + k4, and of course, −k2 + k3 + k4 < k2 + k3 + k4. Therefore,

Equation (54)

Combining all, we find that

Equation (55)

We then calculate the VT as

Equation (56)

B.4. Exact Analytic Colinear Case

For the higher-order polyspectra, the lowest order theory predicts zero polygons for the colinear case ${k}_{1}={\sum }_{2}^{n}{k}_{i}$, but this is clearly not the case through either direct summation tests or estimation through our method. Simply adding a few higher-order terms in the series of Equation (32) causes the integration to diverge so we instead need to solve the exact case directly.

The full calculation for the integration that we approximate in Equation (32) yields

Equation (57)

Restricting to the colinear case, we calculate the Fourier volume integration for the bispectrum as

Equation (58)

and for the trispectrum

Equation (59)

using the Mathematica package.

For the bispectrum case Equation (58) differs only about 1% from the lowest order estimation. For trispectrum, where the lowest order approximation predicts zero polygons, Equation (59) dramatically improves the match between the numerical calculation and analytical estimation, as we show in Figure 3.

Please wait… references are loading.
10.3847/1538-3881/ab3223