Maps of the Number of H i Clouds along the Line of Sight at High Galactic Latitude

G. V. Panopoulou; D. Lenz

doi:10.3847/1538-4357/abb6f5

1. Introduction

The ability to reconstruct the 3D distribution of matter in the Galactic interstellar medium (ISM) is important for astrophysics and cosmology. 3D maps inform us about nearby Galactic spiral structure (e.g., Rezaei et al. 2018, and references therein), dust evolution (e.g., Schlafly et al. 2017), star-forming regions (e.g., Zucker et al. 2019), and the ISM magnetic field (e.g., Van Eck et al. 2017). Such maps are also necessary for accurately modeling polarized dust emission (e.g., Martínez-Solaeche et al. 2018), which acts as a dominant foreground contaminating the polarization of the cosmic microwave background (CMB; e.g., Planck Collaboration et al. 2016).

In recent years 3D mapping capabilities have improved drastically, largely due to the availability of large, high-accuracy photometric data sets (e.g., Pan-STARRS, Kaiser et al. 2002) and information on stellar distances from Gaia (Gaia Collaboration 2016). Most 3D maps exploit the differential reddening of stars located at various distances to infer the 3D distribution of intervening ISM dust (e.g., Marshall et al. 2006; Lallement et al. 2014, 2019; Green et al. 2015, 2018, 2019; Capitanio et al. 2017; Chen et al. 2019; Leike & Enßlin 2019). A different, more classical method uses kinematic information from molecular (CO) and atomic (H i) line emission as a tracer of the ISM density and relies on a model for the Galaxy's rotation curve to locate clouds as a function of distance (e.g., Blitz 1979; Kulkarni et al. 1982; Levine et al. 2006; Wenger et al. 2018). It is possible to combine these two approaches, as demonstrated by Tchernyshyov & Peek (2017), to construct a 4D map of ISM clouds in the Galactic plane. These recent improvements in mapping complement our existing large-scale view of Galactic spiral arms (obtained by astrometric measurements of masers in star-forming regions; see Reid et al. 2019) by revealing the solar neighborhood structure in detail.

Existing 3D dust maps have complementary strengths (e.g., higher angular resolution vs. absence of finger-of-God effect; see comparison in Green et al. 2019). However, all share a common problem: they perform best at lower Galactic latitudes, whereas cosmology experiments avoid these high-dust emission regions. Maps that do cover regions far from the Galactic plane (such as those by Green et al. 2019; Lallement et al. 2019) do not detect reddening toward many high Galactic latitude sight lines. This may arise partly from photometric uncertainties, which are comparable to the reddening in such regions (e.g., Lenz et al. 2017). The maps also do not extend to large distances at high Galactic latitude. The kinematic distance method does not apply in these parts of the sky, since cloud motions are affected by processes other than Galactic rotation (e.g., Westmeier 2018).

This shortcoming is particularly troublesome for CMB science and the search for primordial B-mode polarization of the CMB (Kamionkowski et al. 1997; Seljak & Zaldarriaga 1997). Experiments that are searching for this signal target regions at high Galactic latitude where the Galactic foregrounds, such as dust and synchrotron emission, are lowest (e.g., BICEP2 & Keck Array Collaborations et al. 2015). In order to separate the Galactic dust from the cosmological signal, this foreground emission must be modeled to great accuracy. An important challenge in this modeling comes from the fact that we do not know a priori the level of complexity of the ISM in these regions (e.g., as discussed by Rocha et al. 2019). The observed signal could arise from multiple dust components (Tassis & Pavlidou 2015), each with different properties (composition, temperature, magnetic field orientation). A large effort is underway to understand the effects of assumed dust models on the retrieval of cosmological parameters (e.g., Remazeilles et al. 2016; Poh & Dodelson 2017; Thorne et al. 2017; Hensley & Bull 2018). Innovative, data-driven approaches are being developed to characterize the ISM properties (Clark et al. 2015; Clark 2018; Philcox et al. 2018; González-Casanova & Lazarian 2019; Zhang et al. 2019). Recent studies try to construct realistic realizations of dust foregrounds using 3D dust maps (Martínez-Solaeche et al. 2018) or H i data (Ghosh et al. 2017; Clark & Hensley 2019; Lu et al. 2020; Adak et al. 2020; Hu et al. 2020).

When modeling dust in 3D, studies typically assume a model in which dust emission arises from discrete layers of dust along the line of sight (Planck Collaboration 2016a). A key unknown in the modeling is the number of such layers that occupy a single pixel on the sky. Some studies assume an arbitrary value for the number of layers (clouds) along the line of sight (e.g., Tassis & Pavlidou 2015; Hensley & Bull 2018). Other studies adopt a more physically motivated assumption about the number of clouds per pixel. For example, Ghosh et al. (2017) and Adak et al. (2020) assume three dust layers, each arising from a different gas phase of the ISM. The approach of Martínez-Solaeche et al. (2018) assumes that each distance bin in the 3D dust map of Green et al. (2019) is a different dust layer.

In this work we aim to measure the number of clouds along each line of sight—a critical but missing input to CMB foreground modeling. We focus on the high Galactic latitude sky, where the 3D reconstruction of the dust distribution based on stellar reddening is incomplete. This difficulty can be surpassed by the use of an indirect but more sensitive tracer of dust: H i line emission. The H i column density correlates well with dust in the diffuse ISM (e.g., Boulanger et al. 1996; Planck Collaboration 2014; Lenz et al. 2017). In contrast to existing 3D dust maps, H i data show detection of the H i line throughout the sky—there is no sight line free of H i emission (HI4PI Collaboration et al. 2016).

Though the H i line does not directly provide distances, important knowledge can be acquired through the kinematic information it carries. ISM clouds that lie at different distances likely have different kinematic properties, thus appearing as distinct kinematic components in H i spectra. This is especially true for the high Galactic latitude sky, where the H i emission can be cleanly separated into three classes of clouds: low-, intermediate-, and high-velocity clouds (LVCs, IVCs, HVCs) based on their radial velocities with respect to the local standard of rest (v_LSR; e.g., Wakker 1991; Kuntz & Danly 1996).

Various approaches have been adopted to separate such kinematically distinct structures in H i spectra. Haud (2008) decomposes H i spectra into a set of Gaussian components (GCs) using all-sky data from the Leiden/Argentine/Bonn survey (Kalberla et al. 2005). They identify IVCs and HVCs by searching for overdensities in the parameter space of all GCs found across the sky. Within smaller sky regions, Planck Collaboration (2011) and Martin et al. (2015) define the velocity range occupied by LVCs, IVCs, and HVCs based on the standard deviation of the H i spectrum. Murray et al. (2019) construct smoothed H i spectra toward the SMC by use of a Gaussian kernel and measure the number of peaks in the resulting spectra (e.g., Martin et al. 2015).

In this work we develop a method for cloud identification that builds on the complementary strengths of the aforementioned studies: it (a) locates overdensities in GC parameter space and (b) operates locally, within small subregions of the sky. The problem of identifying clouds by such means has been extensively addressed in the case of CO emission (e.g., Williams et al. 1994; Rosolowsky et al. 2008). Recently, Miville-Deschênes et al. (2017, hereafter MML17) developed a two-step method for cloud identification in CO spectral cubes. The MML17 method is based on a Gaussian decomposition of the data. It uses a hierarchical cluster analysis to identify clouds that are distinct in position–position–velocity (PPV) space. The MML17 method was applied to the Dame et al. (2001) CO survey and resulted in a cloud catalog of unprecedented completeness.

The nature of the molecular medium allows for a well-defined process for cloud identification: CO is much more concentrated in space than H i and allows one to draw boundaries to isolate distinct clouds (e.g., Williams et al. 1994). Due to the much more diffuse nature of the atomic medium, a method that aims to define cloud boundaries, such as that of MML17, is not generally applicable to H i. Exceptions include cases of very distant and/or compact clouds (e.g., Wakker & van Woerden 1991; Saul et al. 2012; Moss et al. 2013), or restricting the analysis to the most cold and narrow H i components, as done by Haud (2010). However, we can use the basic principles of the MML17 approach for the specific problem of identifying distinct low-, intermediate-, and high-velocity components far from the Galactic plane that lie along the same line of sight. We develop a method that makes use of a Gaussian decomposition (as in many of the aforementioned works) but that only seeks to identify clouds within the same sight line and uses limited spatial information from neighboring pixels for this task.

We briefly describe the Gaussian decomposition in Section 2 and defer a detailed presentation to a separate paper (D. Lenz 2020, in preparation). The method developed here takes this decomposition as input and searches for prominent velocity components within groupings of neighboring pixels (as described in Section 3). Preprocessing steps and validation are described in Appendices A and B. We apply our method to regions of interest for CMB polarization studies: the sky at high Galactic latitude, and the region targeted by the BICEP/Keck experiment (Section 4). We discuss our results in relation to CMB foreground analysis, as well as the limitations of our method, in Section 5. The software and resulting data products are made publicly available and are presented in Section 6. We summarize in Section 7.

2. Data and Gaussian Decomposition

We use the all-sky survey of the H i line presented in HI4PI Collaboration et al. (2016) (HI4PI Survey). The data set has an angular resolution of 16.2 arcmin and is sampled on a HEALPix grid of N_side = 1024. It merges data from the Effelsberg-Bonn H i Survey (EBHIS; Winkel et al. 2010; Kerp et al. 2011; Winkel et al. 2016) and the Galactic-All-Sky Survey (GASS; McClure-Griffiths et al. 2009; Kalberla et al. 2010; Kalberla & Haud 2015) to create a full-sky database of Galactic atomic neutral hydrogen. The velocity range is (−470, 470) km s⁻¹ for the part of the sky covered by the southern hemisphere survey and (−600, 600) km s⁻¹ for the northern hemisphere survey, and the spectral resolution is 1.49 km s⁻¹ (the channel separation is 1.29 km s⁻¹).

Each spectrum can be decomposed into a set of Gaussian basis functions, yielding a compressed description of the data (e.g., Haud 2000). This approach has been adopted by many previous works for studying the properties of different ISM phases (e.g., Haud & Kalberla 2007; Roy et al. 2013; Lindner et al. 2015; Murray et al. 2017; Kalberla & Haud 2018; Marchal et al. 2019; Riener et al. 2019), as well as detecting different classes of clouds (e.g., Haud 2008, 2010).

D. Lenz (2020, in preparation) created a Gaussian decomposition of this data set that is publicly available. For each spectrum, the intensity I as a function of velocity v is decomposed in a basis of the form

$\begin{eqnarray}&&I(v)=\displaystyle \sum _{i}^{N}{A}_{i}\,\displaystyle \frac{1}{{\sigma }_{i}\sqrt{2\pi }}\,{e}^{\tfrac{{\left(v-{v}_{0,i}\right)}^{2}}{2{\sigma }_{i}^{2}}},\end{eqnarray} \tag{ 1 }$

where N is the number of GCs, ${v}_{0,i}$ is the centroid velocity of the ith GC, A_i is its amplitude, and ${\sigma }_{i}$ is its standard deviation. The spectrum is fit approximately 100 times with different numbers of components and initial parameters. The best compromise between complexity and goodness of fit is then selected via the Bayesian information criterion. The priors used for all three parameters of the GCs are uniform. A is constrained so that the column density for each GC is within the range $[5\times {10}^{18},5\times {10}^{22}]\,{\mathrm{cm}}^{-2}$ . The centroid velocity, ${v}_{0}$ , must lie within the HI4PI band. Finally, σ is related to the H i kinetic temperature and is constrained to be within the range $[50,4\times {10}^{4}]\,{\rm{K}}$ . The model is fit to the velocity range [−300, +300] km s⁻¹ and results in high-quality fits. We define the relative residual as the ratio of the difference between the model column density and the total column density, ${\rm{\Delta }}{N}_{{\rm{H}}{\rm\small{I}}}$ , over the total column density of the pixel, N_{H i}. The mean relative residual, is 1%, while 65% of pixels have a relative residual less than 5%.

Figure 1 shows examples of the Gaussian decomposition for three neighboring pixels at high Galactic latitude. The original spectrum in each pixel is fit well by the sum of the GCs. The three pixels show similar spectra but have been fit by a different set of GCs (the number of components varies from 7 to 9). This is to be expected, since Gaussians do not form an orthogonal basis. The nonuniqueness of the fit gives rise to differences in the description of emission spectra that may not reflect actual changes in ISM properties (e.g. as discussed by Marchal et al. 2019). Such differences can be reduced by imposing spatial coherence criteria when performing the decomposition (see Marchal et al. 2019), with the disadvantage that this can be computationally expensive, especially for large sky areas. In the following section, we describe a method that overcomes this difficulty by combining the Gaussian decompositions from multiple neighboring pixels.

3. Cloud Identification

We define a cloud as a distinct peak appearing in the H i spectrum. Peaks are often fit by multiple GCs, the properties of which can vary between neighboring pixels (Figure 1). However, if Gaussians from multiple neighboring spectra appear to cluster around a certain region of parameter space (specifically in velocity space), then this is a strong indication that they are probing the same peak in the spectrum—a cloud. This idea of searching for a "consensus" between adjacent pixels was used, for example, in the method of MML17 and in Haud (2010) to identify clouds in PPV space. Here we adopt a different approach: we search for a "consensus" of GCs only in velocity space and consider GCs within a specified area, termed a superpixel.

We collect information from multiple spectra by segmenting the sky into large superpixels, sampled on the HEALPix grid (Górski et al. 2005; Zonca et al. 2019). The choice of superpixel size is a free parameter in the method. We have found that the method works well when there is a statistically significant number of GCs in each superpixel. For illustrative purposes, we choose a superpixel size of N_side = 128. Each superpixel thus contains 64 pixels of the HI4PI data. This superpixel size corresponds to 0 fdg 46 on each side, comparable to the angular resolution of the Lite satellite for the studies of B-mode polarization and Inflation from cosmic background Radiation Detection (LiteBIRD) at CMB frequencies (Sugai 2020). We investigate the effect of slightly varying the N_side parameter in Appendix B.

We remove Gaussians associated with known artifacts. The part of the HI4PI survey that is based on the EBHIS data (northern hemisphere) exhibits residual stray radiation patterns, which are inevitably fit by the Gaussian decomposition. We identify regions where this effect is more pronounced and remove the associated GCs as explained in Appendix A. We also remove all GCs within the 13' beam centered on (l, b) = (209 fdg 018, −19 fdg 37), where the signal drops significantly to negative values.

Now that we have a set of GCs that is free of artifacts, we proceed to analyze the distribution of mean velocities, ν₀, within each superpixel (Figure 2). We estimate the probability density function (pdf) of ν₀ using a kernel density estimator (KDE) of Gaussian shape. The size of the kernel is selected through validation tests described in Appendix B. The selected kernel standard deviation is four channels wide (5.1 km s⁻¹).

**Figure 2.** Method of cloud identification. (a) Distribution of mean velocity of all GCs within a superpixel, after applying the quality cuts described in the text. (b) KDE of the distribution in panel (a) (red line). Maxima (black upward-pointing triangles) and minima (open downward-pointing triangles) of the pdf, as well as maxima of the second derivative (open black circles), are used to define the location and velocity range of each peak. Peaks containing fewer than 20 GCs are later discarded (marked as open upward-pointing triangles here). The velocity range of each peak is shown with a differently colored vertical band. (c) Individual Gaussians colored according to the peak they belong to. (d) Mean spectrum (average of GCs) for each peak.
Download figure:
Standard image High-resolution image

Next, we identify local maxima in the constructed pdf following the procedure described in Lindner et al. (2015). We search for local minima of negative curvature (bumps), i.e., locations where all the following conditions are met:

1.
The third derivative of the pdf changes sign.
2.
The second derivative is negative.
3.
The fourth derivative is positive.

Figure 3 shows the pdf of ν₀ and its derivatives up to fourth order in an example superpixel. Local maxima in the pdf indicate the presence of a "consensus": multiple Gaussians are tracing the same velocity component.

We now wish to assign each Gaussian to its corresponding peak. For this we must estimate the extent of each peak (the range of velocities belonging to each local maximum). To decide where a local maximum begins and ends, we find the following three types of points:

I.
Local minima of the pdf (sign change of the first derivative coincident with positive value of the second derivative).
II.
Points where the pdf is null (in practice, where its value is less than 10⁻⁴ of the maximum, meaning that there are no GCs at these velocities).
III.
Points where the second derivative is maximum.

For each local maximum, we find the nearest point of any of the aforementioned types on either side of the maximum. We now have a velocity range assigned to each maximum (see Figure 2, panel (b)). Figure 2 shows an example pdf with these maxima and velocity ranges. In this example, the right border of the peak at −30 km s⁻¹ is a local minimum. The right border of the peak at −4 km s⁻¹ is a point of type III and coincides with the left border of the peak at +20 km s⁻¹. Type III points are useful when there is no minimum between two peaks. In cases where a point of type I or II is next to a point of type III with no maximum between them, points of the former types take precedence in border placement. This ensures that there are no intervals where GCs exist but were not assigned to a peak.

It is important to note that not all peaks should be considered as clouds. First, when evaluating the pdf, edge effects can cause peaks at the beginning and end of the velocity range. We control for this by adding padding to the velocity axis prior to evaluating the pdf. Second, it is possible that a very small number of GCs are assigned to a given peak (for example, the leftmost peak in Figure 3). If there are fewer than 20 GCs that are associated with this peak, we discard the peak. We also discard clouds that cover less than 1 beam size (approximately 3 N_side = 1024 pixels), which should remove point sources from our data set.

We validate the code by testing it on mock data as presented in Appendix B. We find that the method is able to recover clouds at the correct velocity for over 85% of cases with a false-positive rate of less than 10%. We have also checked that our method performs well in assigning the majority of the H i emission to clouds. Using the HI4PI data at high Galactic latitude, we demonstrate in Appendix B that only a negligible number of pixels (1%) show residual emission (not assigned to clouds) that is more than 10% of the total column density. We note that our definition of a cloud is more general than that used by Heiles & Troland (2003) and includes not only the cold neutral medium (CNM) but also the warm neutral medium (WNM).

The method outputs the following information for each superpixel: (a) the number of identified clouds, and (b) the parameters of all GCs that belong to each cloud. We post-process the output to compute cloud properties. For each cloud we calculate its mean column density in the superpixel, ${N}_{{\rm{H}}\,{\rm\small{I}}}^{\mathrm{Cloud}}$ , by averaging the column density of the cloud's GCs over all N_side = 1024 pixels that they occupy. We construct a mean spectrum of the cloud, $\langle \,I(v)\,{\rangle }^{\mathrm{Cloud}}$ , by averaging the intensities of all of the cloud's GCs at each velocity channel. The centroid velocity of the cloud, v^Cloud, is calculated as

$\begin{eqnarray}&&{v}^{\mathrm{Cloud}}=\displaystyle \frac{{\displaystyle \sum }_{i}^{{N}_{\mathrm{chan}}}{v}_{i}\cdot \langle \,I({v}_{i})\,{\rangle }^{\mathrm{Cloud}}}{{\displaystyle \sum }_{i}^{{N}_{\mathrm{chan}}}\langle \,I({v}_{i})\,{\rangle }^{\mathrm{Cloud}}},\end{eqnarray} \tag{ 2 }$

where ${v}_{i}$ is the velocity of the ith velocity channel and the summation is performed over all velocity channels ( ${N}_{\mathrm{chan}}$ ) in the range [−300, 300] km s⁻¹. As a measure of the width of the cloud spectrum, we calculate the square root of its second moment:

$\begin{eqnarray}&&\delta {v}^{\mathrm{Cloud}}=\sqrt{\displaystyle \sum _{i}^{{N}_{\mathrm{chan}}}\displaystyle \frac{\langle I({v}_{i}){\rangle }^{\mathrm{Cloud}}\cdot {\left({v}_{i}-{v}^{\mathrm{Cloud}}\right)}^{2}}{{\displaystyle \sum }_{i}^{{N}_{\mathrm{chan}}}\langle I({v}_{i}){\rangle }^{\mathrm{Cloud}}}.}\end{eqnarray} \tag{ 3 }$

Since we retain the information on the GCs, we can also calculate quantities at the native resolution of the HI4PI data. For example, we compute the total N_{H i} per N_side = 1024 pixel from GCs belonging to clouds. This is used to make morphological comparisons with higher-resolution data (Section 4) and to check the output of the algorithm, as explained in Appendix B.

4. Results

Our method is best suited for inferring 3D information about the dust distribution in regions with low dust content, where the H i and dust column densities are linearly correlated (Planck Collaboration 2014). Additionally, because the method relies on identifying kinematicaly distinct features, it is expected to perform best in areas where H i spectra are relatively simple (with ideally one velocity component within the chosen KDE bandwidth).

For these reasons, we have chosen to apply our method to the high Galactic latitude sky (Section 4.1.1), with column density ${N}_{{\rm{H}}{\rm\small{I}}}\lt 4\times {10}^{20}\,{\mathrm{cm}}^{-2}$ , where Lenz et al. (2017) find the best correspondence between H i and dust emission. We also present results for the region targeted by the BICEP/Keck CMB experiment, as an illustration of the ability of our method to inform dust modeling for CMB foreground subtraction (Section 4.2).

In our analysis we examine H i emission within the velocity range $| {v}_{\mathrm{LSR}}| \lt 70\,\mathrm{km}\,{{\rm{s}}}^{-1}$ , thus excluding HVCs that do not contribute traceable amounts of dust (e.g., Wakker & Boulanger 1986; Lenz et al. 2017).

4.1. Polar Areas

We apply our method to all pixels in the HI4PI data with ${N}_{{\rm{H}}{\rm\small{I}}}\lt 4\times {10}^{20}\,{\mathrm{cm}}^{-2}$ , which covers most of the sky at Galactic latitude $| b| \gt 30^\circ$ . The total N_{H i} is calculated at the native resolution of the HI4PI data (N_side = 1024), and the resolution is then downgraded to produce a column density map at N_side = 128. Pixels of the low-resolution map that exceed the threshold in column density are masked. The total area covered is 17,494 square degrees (42% of the sky).

4.1.1. The Number of Clouds per Sight Line at High Galactic Latitude

We produce maps of the number of clouds, N_clouds, per ${N}_{\mathrm{side}}=128$ pixel (resolution 0 fdg 46) (Figure 4). The maps are centered on the north and south Galactic poles for better visualization. A total of 94% of pixels have N_clouds = 1−4, with the maximum of seven components found in a single pixel. There is no sight line free of clouds, as expected from the H i spectra, which show detections everywhere. The maps show large-scale coherence, despite the fact that each pixel has been treated independently. These large-scale regions of similar N_clouds per pixel mark the presence of clouds that are spread out over hundreds of square degrees. The patterns are very different between the north and south Galactic polar regions. We quantify this difference by looking at the 1D distribution of N_clouds per pixel for the north and south separately (inset of Figure 4, same data as in the maps). The distribution in the north has a mean of N_clouds = 3 compared to 2.5 in the south and is skewed toward larger values.

fdg — **Figure 4.** Number of clouds (N_clouds) per N_side = 128 pixel (resolution 046) in the north (left) and south (right) regions in the velocity range $| {v}_{\mathrm{LSR}}| \lt 70\,\mathrm{km}\,{{\rm{s}}}^{-1}$ . The maps are centered on the Galactic poles and are in orthographic projection. Parallels are spaced by 30° in latitude. The inset shows the 1D distribution of N_clouds per pixel for the north (gray) and south (red) regions.
Download figure:
Standard image High-resolution image

In order to understand these differences, we turn to the physical properties of the identified clouds. We examine the column density and centroid velocity of the clouds in Figure 5 for the north and south regions separately. Clouds are found in the entire velocity range considered ( $| {v}_{\mathrm{LSR}}| \lt 70$ km s⁻¹). Their column densities span the range $5\times {10}^{18}\,{\mathrm{cm}}^{-2}-5.7\times {10}^{20}$ cm⁻². Some clouds exceed the threshold ${N}_{{\rm{H}}{\rm\small{I}}}\lt 4\times {10}^{20}\,{\mathrm{cm}}^{-2}$ placed when defining the mask for the high-latitude regions.This happens because some superpixels that lie at the edge of the mask contain high-resolution pixels with N_{H i} exceeding the applied threshold. There are 1348 superpixels that contain clouds with column densities higher than the column density threshold of the sky mask, which make up only 0.2% of pixels in the studied sky area.

**Figure 5.** Physical properties of identified clouds (left two panels): 2D distribution of cloud H i column density ( ${N}_{{\rm{H}}\,{\rm\small{I}}}^{\mathrm{Cloud}}$ ) vs. cloud centroid velocity for the north and south Galactic polar regions. The dashed vertical lines mark the velocity range that separates LVCs and IVCs (defined in the rightmost panel). Horizontal bars (solid black for the north and red dotted for the south) mark the 1st and 99th percentile range of the distribution of centroid velocities of clouds above a minimum cloud N_{H i} (see also right panel). Determination of velocity boundary that separates LVCs and IVCs (right panel): horizontal bars mark the 1st and 99th percentile range of cloud velocity for different thresholds of minimum cloud column density (black solid for clouds in the north, red dotted for clouds in the south). Dashed vertical gray lines mark the adopted ${v}_{\mathrm{LSR}}$ boundaries at −12 and 10 km s⁻¹.
Download figure:
Standard image High-resolution image

**Figure 5.** Physical properties of identified clouds (left two panels): 2D distribution of cloud H i column density ( ${N}_{{\rm{H}}\,{\rm\small{I}}}^{\mathrm{Cloud}}$ ) vs. cloud centroid velocity for the north and south Galactic polar regions. The dashed vertical lines mark the velocity range that separates LVCs and IVCs (defined in the rightmost panel). Horizontal bars (solid black for the north and red dotted for the south) mark the 1st and 99th percentile range of the distribution of centroid velocities of clouds above a minimum cloud N_{H i} (see also right panel). Determination of velocity boundary that separates LVCs and IVCs (right panel): horizontal bars mark the 1st and 99th percentile range of cloud velocity for different thresholds of minimum cloud column density (black solid for clouds in the north, red dotted for clouds in the south). Dashed vertical gray lines mark the adopted ${v}_{\mathrm{LSR}}$ boundaries at −12 and 10 km s⁻¹.
Download figure:
Standard image High-resolution image

The highest column density clouds are found in the range of v_LSR that traditionally corresponds to LVCs (±35 km s⁻¹; chapter by Albert & Danly in van Woerden et al. 2004). Clouds within this velocity range exhibit similar ranges of N_{H i} in the north and south regions. At more negative velocity, cloud properties in the north and south regions differ: there are more IVCs at negative velocity with ${N}_{{\rm{H}}{\rm\small{I}}}\gt 0.5\times {10}^{20}$ cm⁻² in the north than in the south. The excess of IVC emission at negative velocities in the north is well documented (e.g., van Woerden et al. 2004) and has been known since early H i surveys (e.g., Blaauw & Tolbert 1966; Wesselius & Fejes 1973).

In the literature, the velocity that marks the boundary between LVCs and IVCs varies by tens of kilometers per second (see, e.g., Wakker 1991, 2001; Magnani & Smith 2010, where the cut is at ±20 km s⁻¹, ±30 km s⁻¹ and ±40 km s⁻¹, respectively). Here we can use the second dimension of cloud N_{H i} to select a more suitable velocity cut for the selected sky regions. We will exploit the apparent similarity of LVC properties in the north and south.

For our selected regions, we expect LVCs in the north and south to have statistically similar physical properties and to reside within a few hundred parsecs of the Sun, based on several lines of evidence. First, the fact that all high column density clouds are LVCs is consistent with the scale height of the H i gas (Kalberla et al. 2007). Second, linear features in H i at low velocity are correlated with interstellar polarization of stars within a few hundred parsecs of the Sun (Clark et al. 2014). Third, the majority of interstellar polarization arises just outside the Local Bubble at high latitude (Santos et al. 2011; Berdyugin et al. 2014), as does the majority of dust extinction detected in 3D dust maps (Lallement et al. 2019). In contrast, most negative-velocity IVCs are more distant objects, associated with the intermediate-velocity (IV) Arch, IV Spur, and other well-known complexes (Kuntz & Danly 1996). Constraints on the value of the distance of northern IVCs from the Galactic midplane range widely (e.g., Wakker 2001; Welsh et al. 2004; Puspitarini & Lallement 2012). Most sight lines have height brackets of 0.5–2 kpc (Wakker 2001). Exceptions exist for both cloud categories: extragalactic gas from parts of the Magellanic Stream is known to have low v_LSR (e.g., D'Onghia & Fox 2016), while the molecular cloud IVC 135+54 (IV21 Kuntz & Danly 1996) lies at a distance of ∼300 pc (Benjamin et al. 1996), as do some intermediate-latitude parts of the IV Arch (Welsh et al. 2004). Both these exceptions, however, only affect a small percentage of sight lines of the high-latitude sky. Consequently, we expect that the physical properties of LVCs studied here differ statistically from those of IVCs.

The higher column density of LVCs and their expected similarity between hemispheres allow us to define a simple criterion for selecting the LVC velocity range. Figure 5 (right) shows the 1st and 99th percentiles of the distribution of cloud velocities after selecting all clouds above a threshold in N_{H i}. As the threshold is increased, we find improving agreement between the percentiles of the northern and southern v_LSR distributions. For ${N}_{{\rm{H}}{\rm\small{I}}}\gt 2.5\times {10}^{20}$ cm⁻² the north and south velocity ranges are the same to within the first decimal. We thus adopt a velocity range for LVCs of $-12\,\mathrm{km}\,{{\rm{s}}}^{-1}\,\lt {v}_{\mathrm{LVC}}\lt 10\,\mathrm{km}\,{{\rm{s}}}^{-1}$ . Clouds with centroid velocities outside this range fall in the IVC category.

Note that placing a cut on the centroid velocity of clouds results in a more natural separation between LVCs and IVCs than applying a threshold in the H i spectra, as is commonly done. This has the advantage of allowing for a gradual fall of the cloud signal, which can extend past the border in velocity. The emission of a cloud is thus not truncated arbitrarily.

Having defined a velocity cut to separate LVCs from IVCs, we examine how these classes are distributed on the sky. Figure 6 shows maps of N_clouds per pixel for the LVC and IVC range separately. The number of distinct kinematic components in the LVC range is significantly smaller than that in the IVC range: 95% of pixels have ${N}_{\mathrm{clouds}}\leqslant 1$ in the LVC range, a value that only 36% of pixels in the IVC range have. LVCs cover almost the entire selected area, while IVCs are more clustered and occupy a smaller sky fraction in the south than in the north. The patterns seen in Figure 4 can be attributed primarily to IVCs, as can be seen by comparing with the lower left panel of Figure 6.

In the north, at longitudes l = 90°–270°, there are more clouds per pixel than in the rest of the area. This area is primarily occupied by negative-velocity IVCs, that are associated with extraplanar gas structures, such as the IV Arch (Kuntz & Danly 1996). However, not all IVCs are extraplanar. Positive-velocity IVCs at longitudes l = 300°–350° are likely associated with planar gas that does not lie in our immediate vicinity. In the south, there are pixels that contain emission from the Magellanic Stream, which coincides with low v_LSR and can easily be confused with Galactic emission. A more detailed analysis of the kinematics of the Galactic gas and the Magellanic Stream is necessary in order to separate the two contributions (e.g., Nidever et al. 2010).

The fact that the IVC velocity range has a higher number of clouds per pixel than that of LVC gas can be due to the large distance to IVCs (e.g., Wakker et al. 2007, 2008). A pixel of 0 fdg 45 covers 8 pc in angular size at a distance of 1 kpc, and only 0.8 pc at 100 pc (nearest distance to Local Bubble wall). Moreover, it is likely that the majority of LVCs reside at the boundary of the Local Bubble, a structure with simple kinematics. At the same time, structures like the IV Arch may originate from a Galactic fountain process (Kuntz & Danly 1996), thus inheriting complex kinematics. However, another important factor is that the velocity range occupied by LVCs is only 20 km s⁻¹ wide, much narrower than that of IVCs. At the selected KDE bandwidth of 5 km s⁻¹, our method cannot distinguish between velocity components that differ by less than ∼12 km s⁻¹. For the goal of separating IVCs from LVCs, the selected bandwidth is sufficient and necessary to avoid the detection of spurious peaks in the pdf of GC velocities (see Appendix B).

In our discussion so far, we have not introduced any weighting in counting the number of components along the line of sight. However, the column density of clouds varies both along the line of sight and on the plane of the sky (Figure 6, right panels). We define a measure of the complexity of the gas distribution along the line of sight that takes into account the cloud column density:

$\begin{eqnarray}&&{{ \mathcal N }}_{c}=\displaystyle \sum _{i=1}^{{N}_{\mathrm{clouds}}}\displaystyle \frac{{N}_{{\rm{H}}{\rm\small{I}}}^{i}}{{N}_{{\rm{H}}{\rm\small{I}}}^{\max }},\end{eqnarray} \tag{ 4 }$

where ${N}_{{\rm{H}}{\rm\small{I}}}^{i}$ is the column density of the ith cloud along the sight line (within the superpixel) and ${N}_{{\rm{H}}{\rm\small{I}}}^{\max }$ is the column density of the cloud with highest N_{H i} in the superpixel. If a sight line has two clouds of equal column density, ${{ \mathcal N }}_{{\rm{c}}}$ will be equal to 2. If one of two clouds has half the column density of the other, then ${{ \mathcal N }}_{{\rm{c}}}=1.5$ , and so forth.

Figure 7 shows maps of ${{ \mathcal N }}_{{\rm{c}}}$ for the north and south regions. We again find an asymmetry between the two polar areas, with the north showing a higher fraction of pixels with ${{ \mathcal N }}_{{\rm{c}}}\gt 1$ . The 1D distributions of ${{ \mathcal N }}_{{\rm{c}}}$ differ significantly from those of N_clouds (Figure 4). In the north, the percentage of pixels with ${{ \mathcal N }}_{{\rm{c}}}\lt 1.5$ is 60%; in the south it is 90%. Thus, most sight lines are dominated by the column density of one cloud, with notable exceptions existing owing to the presence of northern IVCs. In the north, 17% of pixels have ${{ \mathcal N }}_{c}\geqslant 2$ .

The distributions of ${{ \mathcal N }}_{c}$ do not show tails extending to large values, unlike the distributions of N_clouds. This stems from the fact that a large population of low column density clouds exist in our sample (see Figure 5). Clouds with ${N}_{{\rm{H}}{\rm\small{I}}}\lesssim 2\times {10}^{19}\,{\mathrm{cm}}^{-2}$ may be affected by certain systematics, as discussed in Section 5.3. This population is most sensitive to the choice of superpixel size in the cloud identification step (Appendix B). The column-density-weighted number of clouds, ${{ \mathcal N }}_{c}$ , is insensitive to these problems. We show that the distribution of ${{ \mathcal N }}_{c}$ remains stable for different choices of the superpixel size in Appendix B. Compared to N_clouds, ${{ \mathcal N }}_{c}$ is a more robust measure of ISM complexity along the line of sight, in the context of CMB foreground subtraction.

4.1.2. Signatures of IVCs in Planck Dust Emission

The previous section shows that higher values of N_clouds and ${{ \mathcal N }}_{c}$ are associated with the presence of IVCs. Dust properties have been found to differ between IVCs and LVCs in the 14 fields analyzed by Planck Collaboration (2011). We wish to investigate whether there are signs of varying dust properties associated with IVCs throughout the entire northern high Galactic latitude sky. For this we do not attempt to repeat the joint analysis of H i and Planck dust emission presented in Planck Collaboration (2011), as it is beyond the scope of the present work. Instead, we search for signatures of varying dust properties in the maps of dust model parameters produced by Planck Collaboration (2016b).

We use the map of dust temperature, T_d, created by fitting a modified blackbody to multifrequency dust emission maps.⁴ The Planck and IRIS data were processed with the Generalized Needlet Internal Linear Combination (GNILC) algorithm (Remazeilles et al. 2011) to separate the contribution of the cosmic infrared background to the observed emission. We also make use of the χ² map that resulted from the model fit to GNILC-processed dust emission maps (Planck Collaboration 2016b).⁵ We downgrade both maps from their native resolution of ${N}_{\mathrm{side}}=2048$ to N_side = 1024, to match the resolution of the HI4PI data. The morphological patterns that appear on the maps under investigation typically cover sky areas much larger than the pixel size at any of these resolutions. Averaging model parameters in this way is only used here for investigating such large-scale spatial correlations (see also Hensley et al. 2019).

We create maps of H i column density of LVCs and IVCs separately at a resolution of N_side = 1024 (as mentioned in Section 3). They are used to make a map of the ratio, ρ, of IVC to LVC column density:

$\begin{eqnarray}&&\rho =\displaystyle \frac{{N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}}{{N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{LVC}}}.\end{eqnarray} \tag{ 5 }$

In pixels containing an LVC but no IVC, we set $\rho ={10}^{-3}$ . In pixels where only IVCs were detected, we set $\rho ={10}^{3}$ . Masking out these pixels instead has no effect on the result. Pixels where neither IVCs nor LVCs are found are masked. We apply this mask to the T_d map as well.

Figure 8 shows the maps of T_d and IVC column density ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ . The T_d map shows large-scale, coherent variations with a significant enhancement of T_d near the pole (as noted in Planck Collaboration 2014). There is an apparent tendency for pixels with higher ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ to show higher T_d, though the correspondence is not one-to-one. This morphological similarity was found in Planck Collaboration (2014) and remains in our maps despite the different methods of creating the ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ map and the different v_LSR cut used to separate LVCs and IVCs (±35 km s⁻¹ was used in their work).

A simple correlation of T_d and ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ yields the very small Pearson correlation coefficient of 0.2. However, we need to take into account that T_d is also found to correlate with the total N_{H i}, as discussed by Hensley et al. (2019). If IVC intrinsic properties are responsible for the change in T_d, then we should expect that sight lines with stronger IVC emission show a higher T_d when compared to LVC-dominated regions with the same total N_{H i}. This is indeed what we find, as demonstrated in Figure 8 (right panel). By binning pixels in the N_{H i}–ρ plane, we find that the average T_d for pixels with $\rho \gt 1$ is higher than that for $\rho \lt 1$ . For the same range of N_{H i}, namely, [1.0, 2.8] $\times \,{10}^{20}$ cm⁻², the mean T_d changes from 19.4 to 20.4 K as we transition from sight lines dominated by LVCs ( $\rho \lt 1$ ) to those dominated by IVCs ( $\rho \gt 1$ ). Beyond ρ = 1 there is very little variation in T_d. The mean T_d remains higher for $\rho \gt 1$ compared to $\rho \lt 1$ even if we do not restrict N_{H i} to the aforementioned range (in which case we find a mean T_d of 20.1 and 19.3, respectively).

We hypothesize that the change in T_d is most prominent for a restricted range of N_{H i} because that is the range where negative-velocity IVCs dominate. At lower latitude and higher column, positive-velocity IVCs dominate, and these are most likely not extraplanar gas. To test this, we select pixels where the N_{H i} from negative velocity, ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVCneg}}$ , is at least half of the total IVC N_{H i}. We find that 54% of pixels that satisfy this condition have IVC N_{H i} in the range [1.0, 2.8] $\times \,{10}^{20}$ cm⁻². If we look at pixels that also have $\rho \gt 1$ , we find that a significantly larger percentage of those (74%) lies in the aforementioned column density range. Therefore, it is likely that the negative-velocity IVCs are those responsible for the increase in T_d within the column density range [1.0, 2.8] $\times \,{10}^{20}$ cm⁻².

The evidence that dust in IVCs is different from that in LVCs implies that the dust spectral energy distribution (SED) might be more complex than a modified blackbody along sight lines where both classes of cloud exist. If deviations from a modified blackbody SED exist in such sight lines, then the model used in Planck Collaboration (2014) might yield a slightly poorer fit than the rest of the high-latitude sky. We investigate whether this is the case by using the χ² statistic (Figure 9).

**Figure 9.** Morphological correlation between χ² (top row) and ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ (bottom row). From left to right: maps showing the northern Galactic polar region, zoom-ins centered on *l, b* = (136°, 54°) and (142°, 75°) (image sizes are 7° and 10°, respectively).
Download figure:
Standard image High-resolution image

A correlation with ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ is not as obvious in the χ² map as it was in the T_d map. However, we do find certain regions where the χ² is consistently elevated with patterns that morphologicaly match certain IVCs. Figure 9 shows zoomed-in portions of the χ² and ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ maps centered on prominent features in the former. The cloud shown in the middle panels is the well-studied diffuse molecular cloud IVC 135+54-45 (IV21; Benjamin et al. 1996; Kuntz & Danly 1996; Lenz et al. 2015), known to have lower metallicity than LVC gas (Hernandez et al. 2013). The feature in the right panels extends over ∼10°, is part of the IV Arch, and overlaps with clouds IV2, 4, 11, 17 in the catalog by Kuntz & Danly (1996). We also find close morphological matches between χ² and ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ for other known molecular IVCs that are less extended on the sky. We visually inspected the maps of χ² and ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ at the locations of the MIVCs in the catalog of Röhser et al. (2016) (within an area of 4° × 4°). We found 36 locations with clear similarities, which amounts to 40% of MIVCs within the footprint of our ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ map.

We note that our high-resolution ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ maps contain evident artifacts of the pixelization used in the cloud identification method. These arise from the IVC/LVC selection and are largely absent in the total ${N}_{{\rm{H}}{\rm\small{I}}}$ map. We discuss this artifact in Section 5. For the purpose of quantifying the number of clouds along the line of sight (relevant to CMB foreground modeling), the statistics of individual sight lines are the primary target. While a more advanced method that includes spatial correlations on scales larger than the superpixel size should be pursued in order to resolve such artifacts, we do not expect any significant effect on the statistics of N_clouds or ${{ \mathcal N }}_{c}$ .

So far, we have investigated the effect of IVCs on the total intensity of dust emission. We briefly examine whether IVCs leave a traceable imprint on the polarized dust emission measured by Planck at 353 GHz. We follow Planck Collaboration (2020a) to construct smooth maps of Stokes I, Q, and U at a resolution of 80'. We subtract the zero-level offset of 389 $\mu {K}_{\mathrm{CMB}}$ from the I map. After smoothing, the map resolution is downgraded to ${N}_{\mathrm{side}}=128$ . We construct a map of fractional linear polarization, p₃₅₃, by using the equation ${p}_{353}\,=\sqrt{{Q}^{2}+{U}^{2}}/I$ .

Figure 10 (top) shows the joint distribution of p₃₅₃ and the column-density-weighted number of clouds per pixel ( ${{ \mathcal N }}_{c}$ ) from Section 4.1.1. Intriguingly, there is a marked decrease of the median and maximum p₃₅₃ at high ${{ \mathcal N }}_{c}$ . As discussed in Section 4.1.1, larger values of ${{ \mathcal N }}_{c}$ are found primarily in regions where IVCs are prominent. The decrease of p₃₅₃ with ${{ \mathcal N }}_{c}$ may indicate some level of line-of-sight depolarization caused by IVCs. Note that we have not taken into account the uncertainties in p₃₅₃, which may cause positive bias for pixels with low signal-to-noise ratio. The trend shown in Figure 10 (top) is distinct from the decrease in p₃₅₃ with total N_{H i} found at higher column density by Planck Collaboration (2020a). In the high Galactic latitude regions studied here, p₃₅₃ is primarily flat with N_{H i} (Figure 10, bottom). We find that p₃₅₃ is related to the IVC column density, rather than the total column density: a trend similar to that in Figure 10 (top) is seen also when comparing with IVC N_{H i} instead of ${{ \mathcal N }}_{c}$ .

Together, the results of this section indicate that IVCs are likely an important contributor to the Planck dust emission, both in total and in polarized intensity.

4.2. The BICEP/Keck Field

A primary motivation for this work is to inform CMB foreground modeling efforts. To this end, we focus on the region targeted by the BICEP/Keck experiment (Chiang et al. 2010). We examine the region as defined by the mask provided on the experiment's website,⁶ after downgrading from N_side = 512 to 128. We use the method outlined in Section 3 to quantify the complexity of H i spectra. As in Section 4.1.1, we analyze the velocity range $| {{\rm{v}}}_{\mathrm{LSR}}| \leqslant 70$ km s⁻¹ to avoid most HVC emission. We note that parts of the Magellanic Stream in this area have $| {{\rm{v}}}_{\mathrm{LSR}}|$ ∼70 km s⁻¹ (Westmeier 2018) and are therefore not removed by this cut.

The dominant cloud in each pixel lies within the LVC range, as can be seen by examining the joint distribution of cloud column density and cloud centroid velocity (Figure 11, left). The majority of high-N_{H i} clouds are found at low velocity, in agreement with the results presented previously for larger sky regions. Clouds with very low N_{H i} ( ${N}_{{\rm{H}}{\rm\small{I}}}\lt 0.5\times {10}^{20}\,{\mathrm{cm}}^{-2}$ ) are found at intermediate velocities. The tail of low-N_{H i} clouds that extends to high (positive) velocity is associated with the Magellanic Stream and thus will likely not contain traceable amounts of dust.

Figure 11 (middle) shows the column-density-weighted number of clouds per pixel, ${{ \mathcal N }}_{c}$ . ${{ \mathcal N }}_{c}$ ranges from 1 to 2.3, with 75% of pixels having ${{ \mathcal N }}_{c}\lt 1.1$ (for a two-cloud sight line this would imply that one cloud has 1/10 of the column density of the other). Thus, one cloud dominates the column density in the majority of pixels in the BICEP/Keck region. We calculate the relative contribution of the highest-N_{H i} cloud per pixel compared to the total N_{H i} of the sight line. We find that for 85% of pixels the highest-N_{H i} cloud contributes more than 80% of the column.

We note that our definition of a cloud depends on the choice of KDE bandwidth. The method is not able to distinguish between velocity components that differ by less than ∼2.5 times the chosen bandwidth. Therefore, it is possible that within the LVC range there may exist kinematicaly distinct features that are identified as a single "cloud". For a bandwidth of four channels (5 km s⁻¹) we find that there is unresolved substructure in the identified clouds. We quantify the amount of substructure by identifying maxima in the mean spectrum of each cloud. Two or more maxima are found for 40% of LVCs in the region. Thus, while IVCs are likely not a concern for foreground modeling in this region, the LVC substructure may be cause to consider models with multiple dust components. The very nature of the multiphase neutral medium may cause such velocity substructure, as well as variations in dust properties (e.g., Clark et al. 2019; Murray et al. 2020), motivating the incorporation of phase structure into foreground modeling (Ghosh et al. 2017; Adak et al. 2020).

We further investigate the velocity substructure by varying the choice of bandwidth. We repeat the analysis by choosing a bandwidth of three channels (3.8 km s⁻¹) and show the resulting ${{ \mathcal N }}_{c}$ in the right panel of Figure 11. Much of the substructure in the cloud spectra is now resolved into separate clouds. ${{ \mathcal N }}_{c}$ has a maximum value of 3.0, with 54% of pixels having ${{ \mathcal N }}_{c}\lt 1.1$ . Pixels with ${{ \mathcal N }}_{c}\gt 1.5$ take up 20% of the area, while those with ${{ \mathcal N }}_{c}\gt 2$ amount to only 4%. The fraction of pixels where the highest-N_{H i} cloud contributes more than 80% of the column is now 61%. The conclusion remains that N_{H i} is dominated by one cloud per superpixel for most of the area (within $| {{\rm{v}}}_{\mathrm{LSR}}| \leqslant 70$ km s⁻¹).

Substructure in the mean spectrum of clouds exists even after using the higher velocity resolution (bandwidth = three channels). An example is shown in Figure 12 (top), where two maxima are seen in the mean spectrum of an LVC. To quantify this effect, we construct a map of the number of maxima per cloud. In each pixel we identify the highest column density cloud in the LVC range and measure the number of maxima in its mean spectrum. The resulting map is shown in Figure 12 (bottom). We find spatially coherent regions with approximately two maxima, primarily in the western and eastern areas of the map. Two or more maxima are found in 30% of pixels.

**Figure 12.** Substructure in the mean spectrum of LVCs. Top: the red dashed line (right axis) shows the pdf of GC mean velocity of a superpixel in the region. The black lines (left axis) show the average spectrum of clouds in the superpixel (at each velocity we find the average intensity from the collection of all GCs of the cloud). The pixel shown is centered at (l, b) = (2856, −524). The mean spectrum that peaks at ∼5 km s⁻¹ has two maxima, revealing structures unresolved by the cloud identification. Bottom: map of the number of maxima in LVCs identified in the region. A red filled circle marks the pixel shown in the top panel. A KDE bandwidth of three channels was used.
Download figure:
Standard image High-resolution image

fdg — **Figure 12.** Substructure in the mean spectrum of LVCs. Top: the red dashed line (right axis) shows the pdf of GC mean velocity of a superpixel in the region. The black lines (left axis) show the average spectrum of clouds in the superpixel (at each velocity we find the average intensity from the collection of all GCs of the cloud). The pixel shown is centered at (l, b) = (2856, −524). The mean spectrum that peaks at ∼5 km s⁻¹ has two maxima, revealing structures unresolved by the cloud identification. Bottom: map of the number of maxima in LVCs identified in the region. A red filled circle marks the pixel shown in the top panel. A KDE bandwidth of three channels was used.
Download figure:
Standard image High-resolution image

In Section 4.1.2 we found that regions containing multiple clouds (IVCs and LVCs) sometimes showed elevated values of χ² of the dust SED model fit. With the absence of IVCs in the region, it is interesting to investigate whether there are any morphological correlations between the χ² map and the complexity of H i spectra. The distribution of χ² on the sky does show morphological similarities with the total column density map (middle and left panels of Figure 13, respectively). In particular, there are two regions of elevated χ² that correlate with the H i column. These are marked with white circles in the middle panel of Figure 13. The lower part of the map shows a ring feature that is not related to the H i morphology. The right panel of Figure 13 shows the 2D distribution of χ² and N_{H i}. The two marked regions correspond to two correlated regions in this plot. The eastern region shows higher values of χ² ( ${\chi }^{2}\gt 0.1$ ) but relatively modest N_{H i}. The western region contributes to the correlation seen at $4\times {10}^{20}\,{\mathrm{cm}}^{-2}\,\lt {N}_{{\rm{H}}{\rm\small{I}}}\lt 7\times {10}^{20}\,{\mathrm{cm}}^{-2}$ . We separate the two regions with a simple cut on Galactic longitude of l = 315°. Within these subregions we find a Pearson correlation coefficient of 0.68 for the eastern region and 0.40 for the western region.

**Figure 13.** Maps of N_{H i} (left) and dust emission χ² (middle) in the BICEP/Keck field and their 2D distribution (right). In the middle panel, white lines encircle two regions that show spatial correlation between the two maps. The highly correlated population of pixels with ${\chi }^{2}\gt 0.1$ at the low-N_{H i} range (right panel) corresponds to the eastern circled region in the middle panel. The more modest correlation seen between N_{H i} of 4 and 7 $\,\times \,{10}^{20}\,{\mathrm{cm}}^{-2}$ corresponds to the western marked region in the middle panel.
Download figure:
Standard image High-resolution image

**Figure 13.** Maps of N_{H i} (left) and dust emission χ² (middle) in the BICEP/Keck field and their 2D distribution (right). In the middle panel, white lines encircle two regions that show spatial correlation between the two maps. The highly correlated population of pixels with ${\chi }^{2}\gt 0.1$ at the low-N_{H i} range (right panel) corresponds to the eastern circled region in the middle panel. The more modest correlation seen between N_{H i} of 4 and 7 $\,\times \,{10}^{20}\,{\mathrm{cm}}^{-2}$ corresponds to the western marked region in the middle panel.
Download figure:
Standard image High-resolution image

In Section 4.1.2 we found that the existence of IVCs is sometimes correlated with elevated values of χ². The origin of the spatially coherent high-χ² values in the BICEP/Keck region, however, must be different, as there is no significant contribution of IVCs to the total H i column density. Therefore, if the increased χ² in this region is due to the presence of multiple dust components, then these components would correspond to clouds in the LVC range. If that is the case, we expect that LVCs in the regions of elevated χ² will show more complex kinematics. The western region does indeed show elevated values of ${{ \mathcal N }}_{c}$ (Figure 11), as well as LVC spectra with multiple components (Figure 12). The eastern region, which has the highest χ² values, does not appear to have elevated ${{ \mathcal N }}_{c}$ . However, this region shows a larger proportion of spectra with multiple (unresolved) components (Figure 12). We select pixels in this region by imposing ${\chi }^{2}\gt 0.1$ . We find that 38% of these pixels have at least two maxima in the mean spectrum of clouds, compared to 26% in the rest of the map.

The regions that exhibit higher χ² fall in the outskirts of the BICEP/Keck region and are assigned lower weights than the central pixels in the power spectrum analysis of BICEP2 & Keck Array Collaborations et al. (2015). We isolate pixels where the mask values (weights) are higher than 0.9 and investigate the complexity of their associated H i emission. We find that in 15% of pixels the mean spectrum of the LVC component shows two maxima. These components may not exhibit as stark differences in dust properties as seen between LVCs and IVCs, as the χ² is low throughout this central region of the BICEP/Keck field.

The required precision with which dust must be modeled as a CMB foreground depends on the target sensitivity of the experiments. Further work is needed to assess the effect of multiple clouds on the polarized dust SED in the region. The results presented in this section can aid future investigations by providing estimates of the expected dust emission signal from each cloud along the line of sight.

5. Discussion

By analyzing the H i emission at high Galactic latitude, we have produced a measure of the number of discrete components per line of sight that may contribute to the observed dust emission at microwave frequencies. Our primary motivation was to inform CMB foreground modeling. We discuss our results in the context of the latest literature that aims to quantify line-of-sight effects in the ISM relative to CMB foreground subtraction.

5.1. Comparison of the Number of Clouds per Sight Line with Previous Works

In recent years it has been realized that a major uncertainty in CMB foreground modeling comes from the unknown complexities of the dust distribution along the line of sight (e.g., Tassis & Pavlidou 2015). Various estimates of the number of dust components along the line of sight have been used in order to understand the effect of such complexities on the recovery of cosmological parameters. The estimates differ drastically in the literature. In one of the first works assessing such effects, Poh & Dodelson (2017) studied the region around the north Galactic pole. They used two models: one with nine clouds per kiloparsec and another based on the 3D dust extinction map by Green et al. (2015). Planck Collaboration (2016a) use a model of discrete layers of dust to describe the dust emission in the south Galactic cap ( $b\lt -60$ ). They find that four to nine layers can reproduce the one-point statistics of the polarized dust emission from Planck. In their approach, the number of layers was simply a free parameter of the model, unrelated to a specific physical quantity. In contrast, the model of Ghosh et al. (2017) and Adak et al. (2020) employs three discrete layers of dust, each associated with a different phase of the ISM. Their analysis is based on HI4PI data in the Galactic polar caps and also succeeds in reproducing observables in the polarized dust emission. The all-sky model of Martínez-Solaeche et al. (2018) is based on the 3D reddening map of Green et al. (2015). Dust emission at high Galactic latitudes in their map arises from approximately one to two layers.

The variety of assumptions in such models was a prime motivator for the present work. Our measure of the number of H i clouds along the line of sight shows that on average 2.5 (south) and 3 (north) kinematicaly distinct components are found at high Galactic latitude (Section 4.1.1). The relative contribution of these components along the sight line varies from pixel to pixel. We introduced a separate measure of the number of clouds that takes into account the relative column density of clouds. The southern Galactic polar region is dominated by the column density of one cloud per sight line. In contrast, the dominant cloud contributes less than two-thirds of the total column density for 40% of northern pixels (where ${{ \mathcal N }}_{c}\gt 1.5$ ).

Our determination of the average number of clouds per pixel differs drastically from that assumed in some of the first works mentioned previously (Planck Collaboration 2016a; Poh & Dodelson 2017). These large numbers of "layers" have been reduced in the physically motivated models of Ghosh et al. (2017), Adak et al. (2020), and Martínez-Solaeche et al. (2018).

When assuming three dust components, each associated with a discrete phase of the neutral ISM (CNM, WNM, and unstable neutral medium), Adak et al. (2020) find that the northern Galactic polar region model lacks a necessary fourth component to account for all the observed dust. This fourth component corresponds to IVCs. In our determination of the number of clouds per pixel we do not distinguish between ISM phases, but we find that there are on average three kinematicaly distinct components along the line of sight. Given the wide bandwidth chosen for the application of our method, it is natural that these components correspond to discrete multiphase clouds. An improved description for dust modeling would combine the two approaches and take into account not only the ISM phase information but also the discrete nature of LVCs and IVCs (see also Murray et al. 2020).

Most recently, a different class of models has been developed in which H i velocity channel maps are treated as "layers" of emission along the line of sight (Clark & Hensley 2019; Lu et al. 2020; Hu et al. 2020). These approaches utilize morphological information and the column density to successfully reconstruct properties of the polarized dust emission that are related to the 3D structure of the ISM magnetic field. To fully model the dust emission Stokes parameters, these approaches must make assumptions on the temperature and spectral index of dust within each "layer." These assumptions can be informed by the results presented in Section 4, in particular regarding the constraints on differences between dust in IVCs and LVCs.

5.2. Are IVCs Important for Polarized CMB Foreground Subtraction?

At high Galactic latitude the observed H i column density that is correlated with reddening originates from LVC and IVC gas (e.g., Planck Collaboration 2014; Lenz et al. 2017). These two classes of clouds may exhibit systematic differences in their dust and magnetic field properties, potentially causing variations of the polarization angle of dust emission with frequency (as proposed by Tassis & Pavlidou 2015).

Evidence for differences in the dust properties of IVCs compared to LVCs was found when analyzing the total intensity of dust emission by Planck Collaboration (2011). The increased dust temperatures and lower dust emission cross sections in IVCs were attributed to smaller grain sizes in IVCs arising from dust shattering in the Galactic halo. This interpretation was also considered by Planck Collaboration (2014) to explain the lower dust specific luminosity found in Galactic polar regions with IVCs. In this work we noted two additional pieces of evidence that point to the different dust properties of IVCs compared to LVCs. One is a morphological correlation between the column density of some IVCs and the map of reduced χ² from the modified blackbody fit to the dust emission SED performed by Planck Collaboration (2016b). The elevated values of χ² in regions with (primarily molecular) IVCs imply that the single-component modified blackbody yields a significantly poorer fit to the dust SED. The second is a systematic increase in the dust temperature fit parameter, T_d, for pixels where IVCs contribute more to the column density than LVCs. This correlation is true for a given column density. The correlation is therefore not driven by the fitting degeneracy between T_d and dust amplitude at low signal-to-noise ratio regions that was discussed in Hensley et al. (2019).

Some evidence for differences in magnetic field properties between IVCs and LVCs exists in the literature. To the best of our knowledge there are two indications of such variations based on analysis of starlight polarization. Clark et al. (2014) note a possible loss of alignment between starlight polarization and H i morphology toward parts of the IV Arch. Panopoulou et al. (2019) perform a tomographic decomposition of stellar polarization as a function of distance toward a sight line at intermediate Galactic latitude that intercepts LVC and IVC gas. They find that the inferred plane-of-sky magnetic field orientation differs by 60° between the LVC and the IVC. Additional evidence comes from the Faraday rotation map of Oppermann et al. (2012). The IV Arch is found to correlate with a large region of positive Faraday depth, which could point to a systematic difference in the line-of-sight component of the magnetic field compared to local gas. Tritsis et al. (2019) apply a novel method to estimate the plane-of-sky magnetic field strength in LVCs and IVCs toward Ursa Major. They find magnetic field strengths that vary both on the plane of the sky and along the line of sight. These results are not surprising, as the distance to most IVCs (of order 1 kpc) is much larger than the correlation length of the Galactic magnetic field (∼200 pc; e.g., Beck et al. 2016).

In Section 4.1.2 we find tentative evidence that IVCs may be contributing to the polarized dust emission at 353 GHz (by adding a depolarizing effect). This is in apparent contrast with the finding of Skalidis & Pelgrims (2019) that at high latitudes the Planck polarized intensity is dominated by dust in the Local Bubble wall. These authors compared the Planck polarized dust emission with starlight polarization at different distances. They found that the match between the two tracers at high Galactic latitude is best at distances of 200–300 pc. However, as noted by the authors, the stellar polarization sample does not uniformly cover the high-latitude sky. In particular, there is a lack of stellar measurements at distances farther than 400 pc toward the general area occupied by IVCs (see Figure D.1 in Skalidis & Pelgrims 2019). Future stellar polarization surveys at high latitude (e.g., Tassis et al. 2018) will help clarify what fraction of the polarized dust emission arises from farther than the Local Bubble wall.

The large angular scale coverage of IVCs, combined with their common astrophysical origin (likely a Galactic fountain process; Kuntz & Danly 1996), may act to produce a systematic effect in the polarized dust SED (in the form of frequency decorrelation). So far, frequency decorrelation remains undetected in the Planck data (Planck Collaboration 2020b; Sheehy & Slosar 2018). However, if such a signal is to be searched for in the high Galactic latitude sky, the regions with significant IVC contribution to the column density would present a prime target. The maps of IVC column density presented in this work could serve as templates for forward modeling the polarized dust emission SED in order to predict the level of decorrelation that is to be expected owing to their presence. Another possible avenue of investigation would be to perform parametric fits to the polarization data that explicitly take into account contributions from IVCs and LVCs separately, as has been done for total intensity (Planck Collaboration 2011).

As the required accuracy of polarized foreground modeling increases, analyses that take into account IVCs may also serve for testing/improving modeling of Galactic synchrotron emission (the dominant low-frequency CMB foreground). Current models assume a single correlation parameter between the synchrotron and dust emission polarized signals (BICEP2 Collaboration & Keck Array Collaboration 2018; Planck Collaboration 2020b). However, this assumption would not be accurate if the detected synchrotron emission and dust emission did not probe the same path length over all sight lines. For example, while dust emission from IVCs might dominate that of local gas in some sight lines, synchrotron emission from the Galactic halo could be suppressed compared to local emission. It is known that the cosmic-ray density is significantly lower toward certain IVCs compared to the Galactic plane (Tibaldo et al. 2015). The magnetic field in the Galactic halo is also observed to be lower than in the Galactic plane (specifically, the line-of-sight component of the field; Sobey et al. 2019). Determining whether the polarized synchrotron emission at the location of IVCs is detectable (and if so, at what frequency) would help inform choices in combined dust–synchrotron foreground models.

5.3. Limitations of the Current Work

5.3.1. Spatial Coherence

Treating HEALPix superpixels independently means that there is no imposed spatial coherence at scales larger than the superpixel size. This can lead to abrupt changes between pixels. Discontinuities can arise when, for example, two components that are nearby in velocity appear distinct in one pixel but are blended in a neighboring pixel. The method has no way of distinguishing the two in the latter case. The fact that our maps do show large-scale continuity in the number of identified clouds per pixel indicates that there exist multiple distinct velocity components that are separate enough in velocity for the algorithm to identify them individually.

In Figure 9 the IVC column density map shows discontinuities toward certain pixels. These discontinuities arise from the selection of the LVC/IVC velocity boundary. It happens that in these regions a cloud is peaked very near the boundary of −12 km s⁻¹. In some pixels the velocity centroid of the cloud falls in the IVC range, and in neighboring pixels it is found in the LVC range. An improved IVC column density map could be constructed by iteratively finding the IVC/LVC boundary by considering not only the global statistics of clouds (Figure 5) but also the local properties of clouds in a region. Methods that take into account solutions in neighboring pixels could be adjusted for this (e.g., Green et al. 2019).

5.3.2. Confusion in Velocity

As explained in Section 3, we smooth the spectral information of the original data during the construction of the pdf of GC velocities with a KDE. This means that we are losing the ability to identify clouds that are separated in velocity by less than $2.5\times 4$ channels (∼12 km s⁻¹). Pixels with distinct peaks that differ by less than the bandwidth will be identified by the method as a single cloud. An example is shown in Figure 14 (bottom panel). A single peak in the pdf is made of Gaussians with a bimodal distribution of velocity, leading to a double-peaked spectrum for the "cloud." This effect can be mitigated by using a smaller KDE bandwidth, at the expense of introducing more spurious cloud identifications (Section B).

**Figure 14.** Top: example of mean spectrum of a cloud that suffers from confusion with signal from a nearby bright peak (cloud spectrum, leftmost black line) Bottom: example of mean cloud spectrum with substructure that has not been detected by the method (second-from-the-right black line). The dashed red line shows the pdf of GC velocities in the specific superpixel. Vertical gray lines mark the centroid velocity of each cloud. The boxes in the upper right corner show the coordinates of the superpixel.
Download figure:
Standard image High-resolution image

A small fraction of the fainter clouds identified by the method are subject to certain systematics. First, in the north, there exists emission from residual stray radiation. Even though this issue is addressed by our preprocessing routine (Appendix A), it is likely that some part of this spurious signal remains and is attributed to clouds. Second, a problem occurs when a strong peak in the pdf is located near a fainter peak (e.g., Figure 14, top). Gaussians that belong to the stronger peak may lie at a velocity that has been assigned to the weaker peak. As a result, the fainter cloud may show a skewed mean spectrum. This affects mostly faint clouds that are adjacent to very bright ones in velocity space. This does not affect our estimates of the number of clouds per pixel. Its main effect is on the estimate of N_{H i} of fainter clouds, biasing it to slightly higher values.

We quantify the percentage of clouds that are affected by the latter issue by calculating a measure of skewness of the spectrum of each cloud. We calculate the difference δ between the centroid velocity and the peak velocity of the mean spectrum of each cloud. The values of δ are in the range [−12, 12] channels, with the 10th and 90th percentiles being −2 and −1.9 channels. The most extreme values of δ ( $| \delta | \gt 5$ channels) are found in 80% of cases in clouds with ${N}_{{\rm{H}}{\rm\small{I}}}\lt 8\times {10}^{19}\,{\mathrm{cm}}^{-2}$ . Clouds of even lower column density ${N}_{{\rm{H}}{\rm\small{I}}}\lt 1\times {10}^{19}\,{\mathrm{cm}}^{-2}$ make up 65% of extreme-δ cases.

Our method can identify components of emission that are kinematicaly distinct. It is worth noting that the velocity differences found in this work are much larger (typically tens of kilometers per second) than what is expected to arise from natural velocity dispersion (thermal/turbulence) within individual clouds. For gas temperatures of 100–1000 K, the isothermal sound speed is 1–3 km s⁻¹. While different velocity components do not necessarily map to discrete structures along the line of sight (Beaumont et al. 2013; Clarke et al. 2018), any velocity substructure is smoothed out owing to (a) the spectral resolution of the HI4PI data (∼1 km s⁻¹) and (b) our KDE smoothing kernel (5 km s⁻¹).

It is possible that multiple clouds exist within one pixel that cannot be separated in velocity. Because of this, our estimate of the number of clouds per pixel is a lower limit. This limitation can be surpassed by using complementary methods and data, for example, by mapping stellar reddening as a function of distance (see references in the Introduction), or by searching for H i filaments with different orientations in different velocities (Clark 2018). The present method can be used in conjunction with such approaches in order to improve reconstruction of the 3D ISM.

5.4. Future Directions

The data presented here can be used to improve 3D maps of reddening at high Galactic latitude. For example, the number of clouds per pixel can be used as a prior in fitting the line-of-sight reddening profile in models like that of Zucker et al. (2019). Additionally, the IVC and LVC column density maps can be used as spatial templates, extending the analysis used for molecular cloud data to the diffuse ISM (Zucker et al. 2018). Knowledge of the number of components along the line of sight and their properties in combination with starlight polarization can also facilitate 3D mapping of the ISM magnetic field (Panopoulou et al. 2019).

Given the increasing sensitivity of ongoing and upcoming CMB experiments, there is an urgent need for modeling dust foregrounds with unprecedented precision (at the nanokelvin level; Sugai 2020; Abazajian et al. 2019). Our results can be used to inform more realistic models of polarized dust emission in the near future. For example, the number of clouds per line of sight can serve as an informative prior for parametric component separation methods (e.g., Eriksen et al. 2008), where the foreground signal is modeled independently in each pixel of the sky.

6. Data Products

We provide data⁸ in the following forms:

1.
We provide two HEALPix FITS files for the polar areas studied in Section 4.1 and two HEALPix FITS files for the BICEP/Keck region (Section 4.2). For each region, one file contains the column density maps: ${N}_{{\rm{H}}{\rm\small{I}}}$ , ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{LVC}}$ , ${N}_{{\rm{H}}{\rm\small{I}}}^{\mathrm{IVC}}$ . The other file contains maps of the number of cloud statistics: N_clouds and ${{ \mathcal N }}_{c}$ . The maps are given at a resolution of ${N}_{\mathrm{side}}=128$ , corresponding to a pixel size of 046 on each side.
2.
For each of the studied regions, we provide a file in hdf5 format that holds properties of the entire sample of clouds (including velocities that were excluded for the analysis in Section 4). The reported properties are the cloud's column density ( ${N}_{{\rm{H}}\,{\rm\small{I}}}^{\mathrm{Cloud}}$ ), the superpixel index in which it belongs, the centroid velocity (v^Cloud) and second moment (δv^Cloud) of the cloud spectrum, the number of Gaussians that make up the cloud, and the number of maxima in the cloud spectrum.

The data are provided for the optimal choice of bandwidth parameter (four channels, or 5 km s⁻¹). A Python implementation of the method discussed in Section 3 is publicly available on github.⁹ The data are accompanied by a Python tutorial for using the data products.

We caution against smoothing the N_clouds and ${{ \mathcal N }}_{c}$ maps to obtain lower-resolution versions. The reason is simply illustrated with the following example. Let us suppose that one pixel is occupied by a single cloud (c₁), while its neighboring pixel contains two clouds ( ${c}_{1}^{{\prime} },{c}_{2}^{{\prime} }$ ). The value for the number of clouds in the area that covers both pixels would be 1.5 if we naively averaged N_clouds in the region, which is clearly in error. An alternative would be to assign the value of N_clouds that is maximum among the superpixels that make up the area (giving two clouds in this case). This would give the correct result if c₁ is the same cloud as either ${c}_{1}^{{\prime} }$ or ${c}_{2}^{{\prime} }$ , but an incorrect result if the three clouds are all distinct from each other. Lower-resolution products are created by repeating the analysis of Section 3 with a different choice of N_side and can be made available upon request.

7. Summary

We have developed a method to identify clouds of H i, defined as kinematicaly distinct components of emission. The method uses a Gaussian decomposition of H i spectra and searches for overdensities in the pdf of GC velocities within HEALPix superpixels of size ∼0 fdg 5 (N_side 128). The value of the main parameter of the method, the kernel bandwidth, is optimized through tests with real and mock data. These tests show that the method identifies clouds at the correct velocity for over 85% of cases with a false-positive rate of less than 10% (Appendix B).

We implemented this method for the high Galactic latitude sky. The method does well in assigning the majority of the emission to clouds, with very little residual signal left out (median relative residuals of 0.5%; Appendix B). We present maps of the number of components along the line of sight and statistical properties of the identified clouds. We analyze clouds in the velocity range $| {v}_{\mathrm{LSR}}| \lt 70$ km s⁻¹. Throughout the high-latitude sky, the number of clouds per pixel is small: less than six clouds per pixel are found to contribute to the H i column. The northern sky has on average a larger number of clouds per pixel (3) than the southern (2.5). This is mostly due to the prominence of IVCs in the north.

We introduced a measure of the number of clouds that takes into account the column density of different clouds along the line of sight. This quantity, ${{ \mathcal N }}_{c}$ , is more robust to uncertainties in the method. We find that the majority of pixels at high latitude have a low value of ${{ \mathcal N }}_{c}\lt 1.5$ (60% in the north and 90% in the south). The statistics of the number of clouds per pixel are dominated by IVCs, which show large-scale spatial coherence, especially over the northern sky. We find that the presence of IVCs affects the fit to the dust emission SED by Planck.

We also implemented the method on the region targeted by the BICEP/Keck CMB experiment. We find evidence for multiple components along the line of sight. However, for most of the area the column density is dominated by a single component in each pixel. We find that regions with elevated χ² from the fit to the dust SED show evidence for more complex kinematics in the H i gas.

We discuss our results in the context of CMB foreground modeling and highlight the potential importance of IVCs. Our results can aid in informing future 3D dust mapping efforts, as well as CMB foreground analyses. The data are made publicly available as described in Section 6.

The authors thank Kostas Tassis and Vincent Pelgrims for comments on the paper. G.V.P. thanks Alberto Krone-Martins, Marc-Antoine Miville-Deschênes, and Brandon Hensley for helpful discussions. We thank Vincent Guillet for suggesting the use of a column-density-weighted number of clouds metric. We thank Mathieu Remazeilles for providing the χ² map of the dust model from Planck. G.V.P. acknowledges support from the National Science Foundation, under grant No. AST-1611547. Support for this work was provided by NASA through the NASA Hubble Fellowship grant No. HST-HF2-51444.001-A awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Incorporated, under NASA contract NAS5-26555. This work made use of healpy (Zonca et al. 2019) and Astropy (Astropy Collaboration et al. 2013; Price-Whelan et al. 2018).

Facilities: Effelsberg - Effelsberg Radio Telescope, Parkes - , Planck - .

Software: astropy (Astropy Collaboration et al. 2013), healpy (Zonca et al. 2019).

Appendix A: Removing Contamination from Uncorrected Stray Radiation

The part of the HI4PI survey that was conducted from the northern hemisphere (Effelsberg-Bonn HI Survey, EBHIS) contains low-amplitude artifacts that originate from residual uncorrected stray radiation. These artifacts appear as faint increases in brightness with a distinctive square pattern (approximately 5° × 5°, reflecting the scanning pattern). They are usually visible at velocities where Galactic emission is very faint or absent. See also Martin et al. (2015) for more details.

These artifacts have not been removed prior to performing the Gaussian decomposition in D. Lenz et al. (2020, in preparation). Gaussians associated with these artifacts will result in detections of spurious "clouds" by our method. To avoid this, we preprocess the Gaussian decomposition in order to remove as much contamination as possible.

The artifacts were identified by eye in the spectral channel maps of the EBHIS part of the survey. We removed GCs that were associated with artifacts by the following process:

1.
For each noise square, we create a list of all pixels within a circular region of 6° in diameter.
2.
We flag any GC in the decomposition if (a) it is within a region affected by noise (defined as above) AND (b) its mean is within 0.5σ of the velocity channel where a noise square was identified.
3.
All flagged GCs are removed before proceeding with the analysis.

Figure 15 compares the results of the cloud identification method before and after performing the preprocessing step. We compare the map of the mean velocity of clouds identified in a region centered on the north Galactic pole (with a radius of 30°). The map shows only clouds within a range of velocities where artifacts are prominent (40–50 km s⁻¹). The run with preprocessing (right panel) effectively eliminates the square-patterned residuals present when no preprocessing is applied (left panel). We are confident that our method removes all artifacts with maximum amplitude more than 0.19 K and rms (within the square) of 0.12 K or higher.

Appendix B: Validation Tests and Parameter Optimization

B.1. The KDE Bandwidth Parameter

The method presented in Section 3 constructs the pdf of GC velocities by use of a KDE. The choice of KDE size (bandwidth) determines the resolution of the cloud identification in velocity. Figure 16 shows the effect of varying the value of this parameter (from two to four channels) on the resulting clouds. For the finest resolution, there are spurious features in the pdf. Since the method locates clouds as separate peaks in the pdf, the spurious local maxima found when using a small bandwidth lead to the "detection" of many peaks. However, the slightly larger bandwidths of three and four channels produce smoother pdf's that lack these spurious detections. The cloud identification for the bandwidth of three channels is in agreement with that of four channels for this selected pixel.

**Figure 16.** Effect of KDE bandwidth on clouds found in real data. From left to right: bandwidth of two, three, and four channels. The red dashed line is the resulting pdf of the mean velocity of Gaussians in one N_side = 128 pixel. A black line shows the sum of the emission from all Gaussians that have been identified as belonging to the same cloud (there is more than one cloud in this sight line). The selected pixel is centered on l, b = (5609, 6307).
Download figure:
Standard image High-resolution image

fdg — **Figure 16.** Effect of KDE bandwidth on clouds found in real data. From left to right: bandwidth of two, three, and four channels. The red dashed line is the resulting pdf of the mean velocity of Gaussians in one N_side = 128 pixel. A black line shows the sum of the emission from all Gaussians that have been identified as belonging to the same cloud (there is more than one cloud in this sight line). The selected pixel is centered on l, b = (5609, 6307).
Download figure:
Standard image High-resolution image

We investigate the optimal choice of KDE bandwidth through the following test. We generate mock data of Gaussian velocities for 10⁴ superpixels. Each 1D distribution of velocities has properties similar to what can be found in a superpixel of the HI4PI data set at high Galactic latitude. For each superpixel, we first determine the number of peaks in the distribution of Gaussian velocities (clouds) by drawing from a skewed-normal distribution that peaks at two components, with a tail out to eight components. The number of Gaussians that belong to each cloud is drawn from a normal distribution centered on 10, with a standard deviation of 400. We take the absolute value of the distribution to ensure positive-definite numbers. We assume that each cloud has a distribution of GC velocities that is normal. The mean velocity for each cloud is drawn from a normal centered on −30 km s⁻¹, with a standard deviation of 40 km s⁻¹, covering the entire range of velocities observed in the high-latitude sky, with the exception of HVCs. The standard deviation is drawn from a uniform distribution in the range [4,13] km s⁻¹. We repeat this process 10⁴ times, thus generating mock data for 10⁴ superpixels.

We then run the cloud identification code with different values of KDE bandwidth (from two channels to six channels in steps of one). One way of evaluating the method's success is to compare the number of peaks it has identified in a single pixel with the true number of peaks (input in generating the mock data for that pixel). Figure 17 shows the difference between the recovered number of peaks (N_found) and the true number of peaks (N_true) as a function of the latter, for all pixels in the test. When a bandwidth of 2 is used, the code finds more peaks than were originally input. These are spurious peaks owing to the noisiness of the pdf. As the bandwidth increases, fewer and fewer of such spurious peaks are found. At the same time, however, the reduced spectral resolution when using larger bandwidths means that some peaks that were nearby in velocity space cannot be separated and are counted as a single peak. This is why at a bandwidth of 6 there are many pixels in which the method finds less peaks than were input. There is a trade-off in the choice of bandwidth. We opt for an intermediate bandwidth: one that minimizes the amount of spurious peaks, while at the same time not significantly compromising the spectral resolution.

**Figure 17.** Evaluation of the KDE bandwidth from tests on mock distributions of Gaussian parameters. The difference is between the number of peaks found by the method, N_found, and the true number of peaks (N_true) as a function of N_true. Each panel shows the results for a different value of the KDE bandwidth (two, three, four, five, and six channels). A horizontal black dashed line marks 0 (perfect match with true solution). The horizontal red lines mark the 1st and 99th percentile of the distribution of ${N}_{\mathrm{found}}-{N}_{\mathrm{true}}$ (for bandwidth = 2 the 99th percentile is outside the shown range). Larger bandwidths result in less spurious detections of clouds at the cost of reduced spectral resolution.
Download figure:
Standard image High-resolution image

**Figure 17.** Evaluation of the KDE bandwidth from tests on mock distributions of Gaussian parameters. The difference is between the number of peaks found by the method, N_found, and the true number of peaks (N_true) as a function of N_true. Each panel shows the results for a different value of the KDE bandwidth (two, three, four, five, and six channels). A horizontal black dashed line marks 0 (perfect match with true solution). The horizontal red lines mark the 1st and 99th percentile of the distribution of ${N}_{\mathrm{found}}-{N}_{\mathrm{true}}$ (for bandwidth = 2 the 99th percentile is outside the shown range). Larger bandwidths result in less spurious detections of clouds at the cost of reduced spectral resolution.
Download figure:
Standard image High-resolution image

A second way of evaluating the method's success is by measuring how well the peak velocities of recovered clouds compare to the input peak velocities. For this we calculate the difference between the mean velocity of each recovered cloud and its nearest (in velocity) counterpart in the original input of the test. We find that there is no bias; the distribution of velocity differences has a mean of zero channels for all bandwidths used. However, the distribution shows wide wings for a bandwidth of two channels, something that is reduced for larger bandwidths. The 38th percentile of the distribution of velocity differences is four channels for a bandwidth of 2. For larger bandwidths (3, 4, 5, 6) four channels corresponds to the 64th, 85th, 93th, and 96th percentile. Thus, by choosing a bandwidth of 4 or larger, more than 85% of clouds will be identified by the method within four channels of their true value. The percentage of clouds that will be identified within two channels of their true value is 75% for the same choice of bandwidth.

We also examine the false-positive rate, that is, the fraction of peaks that have no true peak within their assigned range of velocities. This fraction is 70%, 30%, and 10% at bandwidths of two, three, and four channels and drops to 3% and 1% at bandwidths of five and six channels, respectively.

From the above tests, we conclude that a bandwidth of 4 is optimal: it combines high enough spectral resolution for identifying separate peaks while minimizing spurious detections and ensures that the recovered clouds will be located within a few channels of their true value.

B.2. The N_side Parameter

The maps presented in the main text were created by applying the method to superpixels of N_side = 128. Here we repeat the analysis of the high Galactic latitude sky to investigate the effect of changing the N_side to 64 (the immediately lower resolution in a HEALPix pixelization).

We expect our method to work well as long as there is a statistically large enough sample of Gaussians per pixel that are used to create the velocity pdf. This is true for both resolution choices. For N_side = 64, the number of Gaussians per pixel ranges from 300 to 2480 for the southern Galactic cap. In 80% of pixels more than 880 Gaussians are used to construct the pdf. For the higher resolution of N_side = 128, the minimum number of Gaussians per pixel is 68 and the maximum is 664.

First, we compare the distributions of N_clouds that result from applying the method with a superpixel N_side of 64 and 128. The mean, 10th percentile, and 90th percentile are 2.8, 2, and 4, respectively, for ${N}_{\mathrm{side}}=128$ . These values change to 3.3, 2, and 5 for ${N}_{\mathrm{side}}=64$ . The main reason for these differences is that small changes in the pdf (that can arise from different superpixel size choices) can easily affect the detection (or not) of low-N_{H i} clouds. By imposing a threshold on the cloud N_{H i} we find improved agreement between the distribution of N_clouds at different N_side. This is shown in Figure 18 (left).

Next, we compare the distribution of ${{ \mathcal N }}_{c}$ for the two resolutions (Figure 18, right). The distributions at different resolution agree for all values of the N_{H i} threshold. This is to be expected, as ${{ \mathcal N }}_{c}$ down-weighs low column density clouds, which are sensitive to the choice of superpixel size. Visual inspection of the ${{ \mathcal N }}_{c}$ maps shows the same large-scale features arising for both choices of superpixel size.

B.3. Evaluation of Residuals

The final step in validating our method is to examine whether there is emission that is not picked up by our cloud identification. Figure 19 shows the difference between the integrated emission of the HI4PI data and the integrated emission from the GCs assigned to clouds for pixels in the north and south sky regions. We find very small residuals, with a median of −0.8%. Only 1.2% of pixels have a relative residual ${{\rm{\Delta }}}_{\mathrm{NH},\mathrm{rel}}\lt -10 \%$ , and 0.005% have ${{\rm{\Delta }}}_{\mathrm{NH},\mathrm{rel}}\gt 10 \%$ . This shows that the clouds we identified are responsible for all but a negligible fraction of the total extinction. In the residual map, the regions where there is subtraction of stray light radiation residuals (Appendix A) appear as patches of underestimated column density (by 10%–20%). Regions where cloud N_{H i} overestimates the total column by more than 20% are associated with point sources or the Magellanic system, where there is significant signal outside the range of velocities used in the Gaussian decomposition. Additionally, the N_{H i} calculated by integrating the H i spectra may in rare cases include summation over negative values, due to presumably improper baseline subtraction. The GCs are forced to have positive amplitude, and this causes, by default, overestimation. This effect is minor.

Maps of the Number of H i Clouds along the Line of Sight at High Galactic Latitude

Article metrics

Permissions

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

2. Data and Gaussian Decomposition

3. Cloud Identification

4. Results

4.1. Polar Areas

4.1.1. The Number of Clouds per Sight Line at High Galactic Latitude

4.1.2. Signatures of IVCs in Planck Dust Emission

4.2. The BICEP/Keck Field

5. Discussion

5.1. Comparison of the Number of Clouds per Sight Line with Previous Works

5.2. Are IVCs Important for Polarized CMB Foreground Subtraction?

5.3. Limitations of the Current Work

5.3.1. Spatial Coherence

5.3.2. Confusion in Velocity

5.4. Future Directions

6. Data Products

7. Summary

Appendix A: Removing Contamination from Uncorrected Stray Radiation

Appendix B: Validation Tests and Parameter Optimization

B.1. The KDE Bandwidth Parameter

B.2. The N_side Parameter

B.3. Evaluation of Residuals

Footnotes

Maps of the Number of H i Clouds along the Line of Sight at High Galactic Latitude

Article metrics

Permissions

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

2. Data and Gaussian Decomposition

3. Cloud Identification

4. Results

4.1. Polar Areas

4.1.1. The Number of Clouds per Sight Line at High Galactic Latitude

4.1.2. Signatures of IVCs in Planck Dust Emission

4.2. The BICEP/Keck Field

5. Discussion

5.1. Comparison of the Number of Clouds per Sight Line with Previous Works

5.2. Are IVCs Important for Polarized CMB Foreground Subtraction?

5.3. Limitations of the Current Work

5.3.1. Spatial Coherence

5.3.2. Confusion in Velocity

5.4. Future Directions

6. Data Products

7. Summary

Appendix A: Removing Contamination from Uncorrected Stray Radiation

Appendix B: Validation Tests and Parameter Optimization

B.1. The KDE Bandwidth Parameter

B.2. The Nside Parameter

B.3. Evaluation of Residuals

Footnotes

B.2. The N_side Parameter