Revisiting the Vertical Distribution of H i Absorbing Clouds in the Solar Neighborhood

The vertical distribution of cold neutral hydrogen (H i) clouds is a constraint on models of the structure, dynamics, and hydrostatic balance of the interstellar medium. In 1978, Crovisier pioneered a method to infer the vertical distribution of H i absorbing clouds in the solar neighborhood. Using data from the Nançay 21 cm absorption survey, Crovisier determined the mean vertical displacement of cold H i clouds, 〈∣z∣〉. We revisit that author’s analysis and explore the consequences of truncating the H i absorption sample in Galactic latitude. For any nonzero latitude limit, we find that the quantity inferred by Crovisier is not the mean vertical displacement but rather a ratio involving higher moments of the vertical distribution. The resultant distribution scale heights are thus ∼1.5 to ∼3 times smaller than previously determined. In light of this discovery, we develop a Bayesian Monte Carlo Markov Chain method to infer the vertical distribution of H i absorbing clouds. We fit our model to the original Nançay data and find a vertical distribution moment ratio 〈∣z∣3〉/〈∣z∣2〉 = 97 ± 15 pc, which corresponds to a Gaussian scale height σ z = 61 ± 9 pc, an exponential scale height λ z = 32 ± 5 pc, and a rectangular half-width W z,1/2 = 129 ± 20 pc. Consistent with recent simulations, the vertical scale height of cold H i clouds appears to remain constant between the inner Galaxy and the Galactocentric distance of the solar neighborhood. Local fluctuations might explain the large-scale height observed at the same Galactocentric distance on the far side of the Galaxy.


INTRODUCTION
The structure of the interstellar medium (ISM) is determined by the complex interplay of many physical processes (e.g., McKee & Ostriker 1977).In particular, the vertical distribution (i.e., perpendicular to the Galactic plane) of interstellar matter is set by the local gravitational potential, energy injection by stellar feedback and supernovae, the radiation field, and the physical conditions (e.g., pressure, turbulence, magnetic support) of the ISM (e.g., Kim et al. 2010;Ostriker & Kim 2022).Characterizing the vertical distribution of the various phases of the ISM ultimately constrains models and simulations of ISM physics, star formation, and galactic evolution (e.g., Hopkins et al. 2018).
Neutral atomic hydrogen (H i) is pervasive in the ISM and exists in pressure equilibrium in two stable phases: twenger2@wisc.edu the cold neutral medium (CNM) and the warm neutral medium (WNM) (Field et al. 1969;McKee & Ostriker 1977;Wolfire et al. 2003).It is within clumps of CNM gas that molecular clouds, and ultimately stars, form (McKee & Ostriker 2007).The WNM is easily traced via the 21 cm hyperfine transition of H i in emission, and its vertical distribution has been extensively studied.Levine et al. (2006), for example, find a strong anticorrelation between the vertical scale height of H i emission and spiral structure inferred from the H i surface density.This suggests that the gravitational potential of spiral arms acts to squeeze the WNM (e.g., Kalberla & Kerp 2009).Given its low kinetic and spin temperature (T ∼ 100 K), the CNM is most easily observed via 21 cm absorption toward background continuum sources (e.g., Dickey et al. 1983) or in self-absorption toward background H i emission (e.g., Kavars et al. 2003).Such H i absorption observations are challenging to interpret in the Milky Way ISM due to the blending of H i features in both emission and absorption along sight-lines through the Galactic midplane (e.g., Dickey et al. 2003;McClure-Griffiths et al. 2023, and references therein).Nonetheless, several studies have explored the differences between the vertical H i distribution as traced by H i emission and absorption.Dickey et al. (2009), for example, explore the Galactocentric radial and azimuthal variation in H i scale height in the outer Galaxy.They find similar scale heights between the two H i phases.
In the inner Galaxy, Dickey et al. (2022) use H i absorption observations from the Galactic Australia Square Kilometer Array Pathfinder (GASKAP) pilot survey to constrain the vertical scale height of CNM clouds.The complications of interpreting H i absorption spectra in the inner Galaxy are mitigated by considering only those components near the tangent point, where distance ambiguities are minimized.They find a compact vertical CNM cloud distribution, with an inferred scale height σ z ≃ 50 − 90 pc.This distribution closely matches the scale height of the thin molecular cloud layer in the inner Galaxy, σ z ≃ 40 − 60 pc (Bronfman et al. 1988;Heyer & Dame 2015;Su et al. 2021).Furthermore, Dickey et al. (2022) infer a scale height of σ z ≃ 200 − 700 pc for H i absorbing clouds beyond the solar circle in the fourth Galactic quadrant, which suggests significant flaring of the CNM in the Galactic outskirts.This is consistent with Galactic simulations where the H i disk becomes more vertically extended at the edges of galaxies due to the decreased pressure of the ISM, although the effect is seen only in the warm H i and not the CNM (Smith et al. 2023).Dickey et al. (2022) also estimate the vertical scale height of CNM clouds near the solar orbit by considering those H i absorption detections with velocities relative to the local standard of rest (LSR) near 0 km s −1 .At these velocities, their sample includes both local H i absorption as well as H i absorption near the solar Galactocentric distance, R 0 , along a Galactic longitude ℓ ≃ 340 • , which, for R 0 = 8.34 kpc (Reid et al. 2014), corresponds to a heliocentric distance d ≃ 15.6 kpc.They find that the latitude distribution of these H i absorption features is a blend of two Gaussian distributions with different widths, presumably corresponding to the local and distant CNM clouds.Considering only the narrow component, they estimate a vertical scale height of σ z ≃ 160 pc for the CNM at the solar orbit.
There are few experiments investigating the vertical distribution of the CNM in the solar neighborhood.In fact, nearly every reference that we could find in the literature ultimately pointed back to a single experiment: Crovisier (1978).In this work, the authors develop a statistical model to infer the vertical scale height of local H i absorbing clouds.Using data from the Nançay 21-cm absorption survey (Crovisier et al. 1978), Crovisier (1978) constrains the mean absolute vertical displacement of CNM clouds in the solar neighborhood as ⟨|z|⟩ = 107 ± 29 pc.They claim that the data are inconsistent with an exponential vertical distribution, and the inferred ⟨|z|⟩ implies a scale height σ z = 134 ± 36 pc for a Gaussian vertical distribution.Belfort & Crovisier (1984) expand this experiment to include more H i absorption data, but the estimated mean absolute vertical displacement is similar: ⟨|z|⟩ = 92±12 pc for clouds with 5 • < |b| < 30 • .These results are consistent with the Dickey et al. (2022) inference at R 0 in the fourth Galactic quadrant.Altogether we begin to develop a clear picture of a vertical CNM distribution that is thin in the inner-Galaxy, mimicking the molecular gas distribution, and slowly flares with Galactocentric distance (e.g., see Section 6.4.2. and Figure 10 in McClure-Griffiths et al. 2023).
In this work, we revisit the Crovisier (1978) analysis in preparation for its application to the ever-growing sample of H i absorption spectra (e.g., hibigcat, McClure-Griffiths et al. 2023) and next-generation H i surveys (e.g., GASKAP, Dickey et al. 2013).In particular, we discover a statistical bias that alters the interpretation of the Crovisier (1978) result, which we demonstrate through both a simple simulation as well as a robust derivation.We develop a Bayesian model that performs a similar inference as in the original work and, using the original Nançay data, implement a Monte Carlo Markov Chain (MCMC) method to infer the vertical distribution of CNM clouds in the solar neighborhood.Finally, we discuss the implications of these results in light of modern Galactic ISM simulations.

METHOD
Our goal is to infer the shape of the vertical distribution of H i absorbing clouds (or any other tracer) using only their sky positions and radial velocities.Crovisier (1978) introduce a statistical method to perform this inference, which we briefly summarize here (see their Section 2 for more details).For convenience and simplicity, we adopt a slightly different notation: d is the heliocentric distance and z ≡ |z| is the absolute vertical displacement of the cloud from the Galactic midplane.We assume that the Sun is located at z = 0 and the midplane is defined by Galactic latitude b = 0 • .
The observed radial velocity of a cloud is related to its distance due to Galactic rotation.For a cloud located at a given Galactic longitude and latitude, (ℓ, b), the radial velocity in the LSR frame is where V ⊙,LSR is the solar motion with respect to the LSR projected onto the line of sight, V rot is the cloud's motion due to Galactic rotation projected onto the line of sight, and V t is the random velocity of the cloud along the line of sight.Crovisier (1978) use a local approximation for Galactic rotation defined by the first Oort constant, but any model that predicts V rot for a given distance and sky position is appropriate.
Given that radial velocity is only a weak indicator of distance in the solar neighborhood, d cannot be determined for any individual cloud with sufficient precision to infer the shape of the vertical distribution of clouds.If we assume a plane-parallel model for clouds in the solar neighborhood, then, Crovisier (1978) argues, we can relate the first moment of a cloud's distance probability distribution (i.e., the expectation value), ⟨d⟩, to the first moment of the vertical distribution of clouds (i.e., the mean absolute vertical displacement), ⟨|z|⟩, by With a sufficiently large sample of clouds, the central limit theorem suggests that we can infer ⟨|z|⟩ from these distance point estimates despite their imprecision.Finally, we assume that V t is truly random; that is, we assume that there are no systematic departures from the Galactic rotation model in our sample of clouds.Crovisier (1978) uses a least-squares algorithm to minimize where the sum is taken over all of the clouds, in order to infer the mean absolute vertical displacement, ⟨|z|⟩, the root-mean-square of V t , and the parameters that define the solar non-circular motion (V ⊙,LSR ) and Galactic rotation (V rot ).

The Bias
We must first derive equation 2, which relates the first moment of a cloud's distance probability distribution to the first moment of the vertical distribution of clouds, before we can demonstrate the bias introduced by a subtle assumption.Consider a population of clouds that follows an absolute vertical displacement distribution defined by the probability density P Z (z).We adopt a notation where the arguments of P represent random variables and the subscripts of P represent the space over which the probability is defined.In this case, z ∈ Z = R + .Assuming that the clouds are distributed uniformly across the Galactic plane (i.e., a plane-parallel model), then the distance probability density for a cloud with Galactic latitude b is where z = d sin |b|.Thus, P Z (z) = P D (d)/ sin |b| for sin |b| > 0. The first moment of the distance probability distribution is for sin |b| > 0, where ⟨|z|⟩ is the first moment of the absolute vertical displacement distribution.Presumably, this is how Crovisier (1978) arrive at equation 2.
A simple simulation reveals a problem with equation 2 when we truncate the data in Galactic latitude.Crovisier (1978) motivates the need for latitude truncation in order to (1) restrict the sample to local H i clouds and (2) remove complex H i absorption spectra in the Galactic plane.We demonstrate the problem by drawing a random sample of clouds from a uniform distribution in the Galactic plane out to some maximum midplane distance, R max , such that (x, y) ∼ U (−R max , R max ), and a half-normal distribution in absolute vertical displacement with some variance, σ 2 z , such that z ∼ N 1/2 (σ 2 z ).For a half-normal distribution, the standard deviation is related to the first moment by ⟨|z|⟩ = σ z 2/π.The midplane distance is r = x 2 + y 2 and the Galactic latitude is b = tan −1 (z/r).For a random sample of 1,000,000 clouds with R max = 10 kpc and σ z = 100 pc, we use equation 2 to calculate ⟨|z|⟩ = ⟨d sin |b|⟩ ≃ 80 pc, which is consistent with the expectation for a halfnormal vertical distribution, ⟨|z|⟩ = σ z 2/π ≃ 80 pc.If we remove the clouds with |b| < b min , however, then the discrepancy emerges.For the same simulated data with b min = 1 • , for example, we find ⟨d sin |b|⟩ ≃ 120 pc.For b min ≥ 2 • , we find ⟨d sin |b|⟩ ≃ 160 pc.There is a factor of two difference from the equation 2 prediction that appears to be independent of b min for b min ≥ 2 • (i.e., for R max ≫ σ z / tan b min ). Figure 1 demonstrates the result of this simple simulation.

The Correction
Equation 2 is only correct for b min = 0 • .For any b min > 0 • , we must derive the distance expectation value while properly considering the joint dependency between d, z, and b.Consider again a plane-parallel geometry where clouds are distributed uniformly in the Galactic midplane out to a maximum midplane distance R max and follow some absolute vertical displacement distribution P Z (z).The joint probability distribution for a cloud's location in the heliocentric Galactic Cartesian frame is The transformation to the cylindrical frame is and to the spherical frame is Marginalized over distance, the probability density is If we require that P Z (z) → 0 as z → R max tan |b| (i.e., the artificially truncated midplane distribution does not appreciably truncate the z distribution along the line of sight toward b), then where ⟨|z| 2 ⟩ = ∞ 0 z 2 P Z (z) dz is the second moment of the z distribution.The conditional probability distribution -the probability of finding a cloud at a given distance along a given line of sight -is The first moment of this distribution is Thus we find that, for any b min > 0 • , the distance expectation value is not directly related to the first moment of the absolute vertical displacement distribution, ⟨|z|⟩, as in equation 2, but rather the ratio of the third moment (skewness) to the second moment (variance), Equipped with this corrected derivation, we can test the result against our simple simulation.For a half-normal distribution, ⟨|z|⟩ = σ z 2/π, ⟨|z| 2 ⟩ = σ 2 z , and For our simulation parameters R max = 10 kpc and σ z = 100 pc, the assumptions of our derivation are valid for b min ≥ 2 • since P Z (R max tan b min ) ≃ 0 and P D|LB (R max / cos b min |ℓ, b min ) ≃ 0. The moment ratio is related to the standard deviation for a half-normal distribution by Here we see the erroneous factor of two.For our simulation parameters, we predict ⟨d sin |b|⟩ ≃ 160 pc, which is consistent with our simulation results.
For the other vertical distribution shapes considered by Crovisier (1978), we calculate the factor by which they overestimate the distribution width due to latitude truncation, which is simply the ratio (⟨|z| 3 ⟩/⟨|z| 2 ⟩)/⟨|z|⟩.As demonstrated for the half-normal distribution, this factor is 2. For an exponential distribution with scale height λ z , the moments are ⟨|z| n ⟩ = n!λ n z , and thus the factor is 3. Finally, for a rectangular distribution parameterized by the half-width, W z,1/2 , the moments are ⟨|z| n ⟩ = W n z,1/2 /(n + 1), so the factor is 1.5.Thus, for a Gaussian, exponential, or rectangular distribution, Crovisier (1978) overestimates the distribution width by a factor of 2, 3, or 1.5, respectively, due to the subtle bias introduced by latitude truncation.by Crovisier et al. (1978), so we assign a LSR velocity uncertainty of 1.0 km s −1 .
Note-Table 1 is published in its entirety in the machinereadable format.A portion is shown here for guidance regarding its form and content.

DATA & ANALYSIS
Here we use the original Crovisier et al. (1978) data to reanalyze the vertical distribution of H i absorpting clouds in the solar neighborhood in light of the aforementioned latitude truncation bias.Our objectives are to (1) reproduce the Crovisier (1978) original results, (2) perform a similar least squares analysis using a modern Galactic rotation model, and (3) infer the vertical cloud distribution using a robust, outlier-tolerant Bayesian model that incorporates the uncertainties in the assumed Galactic rotation model.

Data
The Crovisier (1978) data come from the Nançay 21cm absorption survey (Crovisier et al. 1978).At 21 cm, the Nançay radio telescope has a half-power beam with of 3 ′ .1 in right ascension and 21 ′ in declination.The survey results contain H i absorption spectra toward 819 extragalactic continuum sources with a spectral resolution of 6 kHz (∼1.25 km s −1 ).Unfortunately, we were unable to find these data in a machine readable format, so we transcribed the data by hand from Table 2 in Crovisier et al. (1978).Following the selection criteria in Crovisier (1978), we only copied rows for which the Galactic latitude is |b| > 10 • and the absorption detection is not marked as "dubious" (remark "P" in their Table 2).The source names, Galactic coordinates, and LSR velocities for these 297 absorption features are listed in Table 1.Crovisier (1978) claims to have 299 absorption detections meeting these criteria, but we are unable to reproduce this number.The absorption parameters for three detections (two toward 0538+49 and one toward 0725+14) were determined "by eye" by Crovisier et al. (1978), and thus do not have an associated velocity uncertainty.Figure 2 shows the Galactic longitude, latitude, and LSR velocity distributions for this sample.There are minor differences between our LSR velocity distribution and the similar distribution in Crovisier (1978) (the top panel of their Figure 1), which we attribute to the slightly different sample size of our transcribed data.

Least Squares
We begin our analysis using the least squares method described by Crovisier (1978) in order to reproduce their results and verify the accuracy of the transcribed data.Following Crovisier (1978), we adopt a local approximation for Galactic rotation defined in terms of Oort's A constant, where ℓ 0 is the nodal deviation and A = 15 km s −1 kpc −1 .The solar motion with respect to the LSR is defined as where U ⊙ , V ⊙ , and W ⊙ are the solar motion velocities in the direction of the Galactic center, solar orbit, and north Galactic pole, respectively.We assign each cloud to its distance expectation value (equation 13) given the z distribution moment ratio, ⟨|z| 3 ⟩/⟨|z| 2 ⟩, and we use a least squares algorithm to search for the solar motion velocities, z distribution moment ratio, and nodal deviation that minimize the velocity residuals (equation 3).Note that, following Crovisier (1978), we ignore the LSR velocity uncertainties of individual H i absorption detections since the velocity residuals are dominated by the random component, V t .The results of this analysis, applied to the same Galactic latitude-selected and LSR velocity-selected subsets of data listed in Tables 1 and 2 of Crovisier (1978), are presented in Table 2.For each subset, we list the number of absorption features, N , optimized model parameters, U ⊙ , V ⊙ , W ⊙ , ℓ 0 , and ⟨|z| 3 ⟩/⟨|z| 2 ⟩, and root mean square residual velocity, σ V .
As expected, the results of our reproduction of the Crovisier (1978) least squares analysis are nearly identical to the original results, except that we now correctly interpret the inferred quantity as ⟨|z| 3 ⟩/⟨|z| 2 ⟩ rather than ⟨|z|⟩.This similarity suggests that we have accurately transcribed the original data and have implemented the least squares analysis correctly.We are unable to rectify the minor differences in sample size, and we attribute the small differences in inferred values to the slightly different samples.

Updated Least Squares
A modern re-analysis of the Crovisier (1978) data is warranted in order to quantify the effect of an updated Galactic rotation model.Here we adopt the A5 model of Reid et al. (2019), which is an axisymmetric Galactic rotation curve fit to the positions and kinematics of masers associated with high-mass star forming regions.Since our sample of clouds is limited to the solar neighborhood, the most important difference between this model and the local approximation used in the previous analysis is the shape of the rotation curve at the solar Galactocentric distance.This shape is parameterized by Oort's A constant, which, for the Reid et al. (2019) model, is A ≃ 15.2 km s −1 kpc −1 and not so different from the previous assumption.Nonetheless, by incorporating this non-local Galactic rotation model, we can be confident that second-order effects for potentially distant H i absorbing clouds are properly considered.
We use another least squares analysis to infer the model parameters by minimizing the velocity residuals given by equation 3 and again ignoring the LSR velocity uncertainties of individual clouds.We fix R 0 and the Galactic rotation model parameters a 2 and a 3 to the mean point estimates from the Reid et al. (2019) A5 model but allow the parameters defining the solar motion with respect to the LSR to vary.This least squares method has one fewer free parameter (the nodal deviation) than the original Crovisier (1978) analysis.Table 2 shows the results of the least squares analysis using the Reid et al. (2019) Galactic rotation model for the same subsets of data.Unsurprisingly, since the local shape of the Galactic rotation curve is similar between both anal-  yses, the inferred distribution of H i absorbing clouds is nearly the same between the two methods.

MCMC Moment Ratio
There are two important deficiencies in the aforementioned least squares analyses: outliers and uncertainties in the Galactic rotation model.As noted by Crovisier (1978) and demonstrated in Table 2, the inferred shape of the H i absorbing cloud distribution strongly depends on the LSR velocities included in the sample.This dependency suggests that the data are not sufficiently described by the model; outliers bias the inference.Furthermore, we use point estimates for the Galactic rotation model parameters (i.e., Oort's A constant as in Crovisier (1978), or fixed R 0 , a 2 , and a 3 in the Reid et al. (2019) A5 model), yet these parameters have associated uncertainties that should be propagated into our inferred z distribution.A Bayesian model is well-suited to overcome these complications.
Here we develop a Bayesian model, parameterized in terms of the z distribution moment ratio, ⟨|z| 3 ⟩/⟨|z| 2 ⟩, to predict the LSR velocity distribution of H i absorbing clouds.As in the least squares analyses, each cloud is assigned to its distance expectation value following equation 13, at which we evaluate the expected LSR velocity from the Reid et al. ( 2019) A5 Galactic rotation model, V model .By assigning each cloud to its distance expectation value, we are marginalizing over the latent distance of each cloud, which we do not care to infer, under the assumption that it is poorly constrained by the velocity.We assume that the random component of the LSR velocity residuals is normally distributed with variance σ 2 V , so the likelihood that a cloud with observed LSR velocity V i is drawn from the model is a normal distribution with mean V i − V model and variance σ 2 V .To account for the possibility of outliers in our data, we include a second, zero-centered, normally-distributed component in the likelihood distribution with a variance σ 2 V,outlier .The total likelihood that cloud i is drawn from this mixture model is thus where w = (w 0 , w 1 ) are mixture weights that quantify the fraction of data belonging to the model or outlier population, respectively, such that w 0 + w 1 = 1, and θ is the set of model parameters that defines the model, including ⟨|z| 3 ⟩/⟨|z| 2 ⟩ and the Galactic rotation model parameters.We again ignore the LSR velocity uncertainties of individual clouds in this likelihood calculation since they are typically much smaller than σ V and thus do not contribute significantly to the likelihood determination.
In addition to the likelihood, we must also specify our prior knowledge of the model parameters.For the z distribution moment ratio, we adopt a k = 2 gamma distribution with a scale parameter θ = 50 pc.The gamma distribution is a convenient specification both because it is broad (variance kθ 2 ) -and thus encompasses a reasonable range of values -as well as because it approaches zero probability for unphysically small moment ratios.For the likelihood distribution widths, σ V and σ V,outlier , we use half-normal distributions with widths 10 km s −1 and 50 km s −1 , respectively, which again are quite broad and encompass physically reasonable expectations.The mixture weight prior is a Dirichlet distribution with equal concentration.Finally, for the Galactic rotation model parameters, we adopt as a prior a multivariate normal distribution with means equal to the Reid et al. (2019) A5 model posterior point estimates and covariances inferred from the A5 model posterior distribution standard deviations and correlation matrix (Reid, M., private communication).Random samples from these prior distributions (i.e., prior predictive checks) cover the range of observed data, and thus we are confident that the chosen prior distributions are reasonable.
We draw posterior samples from our Bayesian model using the Hamiltonian Monte Carlo No-U-Turn sampler (NUTS; Hoffman & Gelman 2014) implemented in pyMC, a probabilistic programming framework in python (Oriol et al. 2023).To diagnose convergence and ensure that the sampler is not confined to local maxima of the posterior distribution, we draw 500 tuning samples followed by 500 posterior samples across four independent Markov chains.We check that the chains are converged by computing the R statistic, which compares inter-chain variance to that across all chains (Vehtari et al. 2021), as well as the effective sample size of each parameter.In this and all subsequent MCMC analyses, we find R < 1.01 and effective sample sizes >1000 for all parameters, which is indicative of convergence.Furthermore, we draw random samples from the posterior distribution (i.e., posterior predictive checks) and confirm that the sampled posterior distribution is able to reproduce the observations.
The results of our moment ratio MCMC analysis are listed in the first column of Table 3.Here, we use all of the H i absorption spectra with |b| > 10 • ; we do not exclude the high-velocity features since our model is capable of identifying them as outliers.The model parameters include the z distribution moment ratio, ⟨|z| 3 ⟩/⟨|z| 2 ⟩, the fraction of data belonging to the outlier distribution, f outlier ≡ w 1 , the width of the model and outlier distributions, σ V and σ V,outlier , respectively, and the parameters that describe the Reid et al. ( 2019) A5 Galactic rotation model: the solar Galactocentric distance, R 0 , the shape parameters, a 2 and a 3 , and the solar motion with respect to the LSR, U ⊙ , V ⊙ , and W ⊙ .Although we do not sample the shape parameters directly, we infer the Gaussian scale height, σ z , exponential scale height, λ z , and rectangular distribution halfwidth, W z,1/2 , from the moment ratio posterior distributions and include them in Table 3.A visual inspection of the marginalized posterior distributions for each parameter reveals that they are uni-modal and approximately normally distributed.The parameter values and uncer-tainties in Table 3 represent the posterior mean point estimates and 68% highest posterior density credible intervals, respectively.

MCMC Shape Parameter
If we assume the shape of the z distribution of H i absorbing clouds, then we can craft a Bayesian model to infer the shape parameter of that distribution directly.Such a model is beneficial in that it allows us to perform model comparison and determine which distribution shape best represents the data.The only difference between this model parameterization and the former is the relationship between the z distribution shape parameter and the distribution moment ratio.For a Gaussian, exponential, and rectangular distribution parameterized by Gaussian scale height σ z , exponential scale height λ z , and rectangular half-width W z,1/2 , respectively, the z distribution moment ratio, ⟨|z| 3 ⟩/⟨|z| 2 ⟩, is 2σ z 2/π, 3λ z , and 3W z,1/2 /4, respectively.We use as a prior on the shape parameters a k = 2 gamma distribution with a scale parameter θ = 50 pc, which again is a convenient distribution both because the prior probability goes to zero for unphysically small shape parameters as well as because the distribution is wide enough to cover reasonable shape parameter values.Prior predictive checks demonstrate that our choice of priors reasonably represents the data.
The results of our shape parameter MCMC analyses are listed in the last three columns of Table 3 for the Gaussian, exponential, and rectangular models.Again, we include all H i absorption features with |b| > 10 • .We list the derived z distribution moment ratio, although this parameter is not sampled directly for these models.All models converge after 500 tuning samples and 500 posterior samples as determined by R < 1.01 for each parameter, an effective sample size > 1000 for each parameter, and a visual inspection of the posterior predictive checks.The marginalized posterior distributions are again uni-modal and normally distributed in all cases, so the parameter values and uncertainties in Table 3 represent the posterior mean point estimates and 68% highest posterior density credible intervals, respectively.
Finally, we determine which of the three assumed z distribution shapes best represents the data using leaveone-out cross-validation (Vehtari et al. 2017) as implemented in ArviZ, a python package for exploratory analysis of Bayesian models (Kumar et al. 2019).For each shape parameter model, the last row of Table 3 lists the expected log pointwise predictive density (ELPD), which is a measure of the out-of-sample predictive fit.
The model with the largest ELPD is preferred.

Vertical Distribution
Although the statistical bias introduced by latitude truncation alters the interpretation of the Crovisier (1978) analysis, the statistical model that they develop to infer the vertical distribution of H i absorbing clouds is robust.Based on our least squares implementation of this model and using the original Crovisier et al. (1978) data, we are able to reproduce the Crovisier (1978) results and infer the shape of the z distribution of cold H i clouds.Depending on the LSR velocity selection of the sub-sample, we infer a z distribution shape defined by the ratio of the third moment (skewness) to the second moment (variance), ⟨|z| 3 ⟩/⟨|z| 2 ⟩ ≃ 110 ± 15 pc (see Table 2).For an assumed Gaussian distribution, this moment ratio corresponds to a scale height σ z ≃ 70 pc, which is a factor of two smaller than that inferred by Crovisier (1978) due to the latitude truncation bias discussed in Section 2.1.
The least squares analysis results are sensitive to the LSR velocity criteria of each subsample, which implies that the data include outliers that are not explained by the model.Therefore, we develop two Bayesian models -one to infer the z distribution moment ratio and one to infer the shape parameter of an assumed distribution directly -that include an outlier component in the likelihood distribution.We find that each model converges to a similar z distribution moment ratio, ⟨|z| 3 ⟩/⟨|z| 2 ⟩ ≃ 97 ± 15 pc, which corresponds to a Gaussian scale height σ z = 61 ± 9 pc, an exponential scale height λ z = 32 ± 5 pc, and a rectangular halfwidth W z,1/2 = 129 ± 20 pc (see Table 3).The ELPD is nearly identical between models, which indicates that the data are not able to distinguish between these three assumed distributions.Furthermore, we find that ∼13% of the Crovisier et al. (1978) H i absorption detections appear to be poorly explained by our model and instead belong to an outlier population.Figure 3 shows the Galactic longitude-velocity distribution of the Crovisier et al. (1978) H i absorption detections, where we highlight those data with a > 50% probability of being a member of the outlier population based on samples from the posterior distribution of the moment ratio model.Clearly, the outliers belong to a population of intermediate-velocity H i absorption features.These detections may represent a collection of intermediatevelocity clouds (e.g., Wakker 2004;Lehner et al. 2022), perhaps the remnants of infalling high-velocity clouds (e.g., Begum et al. 2010), but we defer further investigation to a future work.

Implications
The fact that we have reduced the apparent vertical distribution of cold H i clouds by a factor of two compared to the original Crovisier (1978) analysis has potentially significant implications for our understanding of hydrostatic balance in the solar neighborhood ISM.Our result suggests that, in the solar neighborhood, the scale height of the CNM is nearly the same as that of the molecular gas layer (σ z ≃ 50 pc, e.g., Heyer & Dame 2015).This complicates the already-limited characterization of the vertical distribution of the CNM given the few observational constraints (McClure-Griffiths et al. 2023) and draws into question the claimed variation in CNM scale height with Galactocentric distance.
It is possible that the solar neighborhood gives a biased view of the vertical distribution of CNM clouds.Consider, for example, the analysis of Dickey et al. (2022), which uses H i absorption detections near 0 km s −1 to infer the vertical distribution of cold H i clouds near the solar Galactocentric distance.They find a latitude distribution of H i absorption features composed of two Gaussian distributions, the broader of which they attribute to local H i and the narrower of which they attribute to H i at the solar Galactocentric distance on the far side of the Galaxy.Given a heliocentric distance d ≃ 15.6 kpc, the latitude distribution width of ∼0 • .59corresponds to a z distribution scale height σ z = 160 pc.This is more than a factor of two greater than our inferred CNM scale height in the solar neighborhood.Since Dickey et al. (2022) se-lect their sample kinematically, they may be biased by intermediate-velocity cloud interlopers that exist at a variety of distances and thus complicate the interpretation of the latitude distribution.It is also possible, however, that the CNM scale height varies due to local effects like the stellar surface density and star formation history.
Recent isolated galaxy simulations have attempted to overcome the resolution and physical limitations of galaxy evolution models in order to predict realistic distributions of different ISM phases on sub-parsec scales.One such simulation, that of Smith et al. (2023), uses the hydrodynamic galaxy models of Tress et al. (2020) and Tress et al. (2021) to reach parsec resolution (or better) in the cold gas.The Smith et al. (2023) simulation assumes a constant metallicity across the disk, but they test the effect of a varying FUV interstellar radiation field that is tied to the time-averaged star formation rate surface density distribution compared to a constant radiation field.In both cases, they find that the CNM scale height is smaller than the total H i scale height everywhere outside of the Galactic center.This is in contrast to the Dickey et al. (2009) study which finds a similar scale height between the WNM and CNM in the outer Galaxy.Furthermore, the Smith et al. (2023) simulations demonstrate that, although the azimuthallyaveraged total H i scale height increases with Galactocentric distance, the azimuthally-averaged CNM scale height remains roughly constant at σ z ≃ 50 pc for the varying radiation field and increases slightly from ∼50 pc to ∼100 pc for the constant radiation field.These simulations are thus consistent with our solar neighborhood inference, although their Figure 10 suggests that there could be local variations in the CNM scale height on the order of a factor of ∼2.Such local variations could explain the Dickey et al. (2022) scale height at the solar Galactocentric distance in the fourth quadrant.

Galactic Rotation
In addition to the vertical distribution of H i absorbing clouds, our Bayesian model also infers the Reid et al. (2019) Galactic rotation model parameters, including the solar Galactocentric distance.We use as priors a multivariate normal distribution defind by the means, standard deviations, and correlation coefficients of the Reid et al. (2019) A5 model posterior distribution.Except for the solar peculiar velocity components, our model posterior distribution means match the prior distribution means within the uncertainties and the widths of the posterior distributions are similar to those of the priors (see Table 3).These two facts suggest that the Crovisier et al. (1978) data are not able to further constrain R 0 and the Galactic rotation model parameters.On the other hand, both our least squares analysis and Bayesian models converge to similar estimates for the solar peculiar velocity components, (U ⊙ , V ⊙ , W ⊙ ) = (12.0± 0.5, 14.4 ± 0.7, 8.2 ± 0.5), which are more precise than the prior distributions.These parameter estimates are consistent with those inferred from stellar kinematics (Schönrich et al. 2010;Zbinden & Saha 2019), although the stellar results appear sensitive to both the method of analysis and the stellar sample (e.g., Ding et al. 2019).

Code Package
The Bayesian model, least squares, and MCMC methods used this work are provided to the community as an open source package: kinematic scaleheight 1 (Wenger 2024).Although we have only demonstrated it as applied to H i absorption data, it is possible to use these models and algorithms to infer the vertical distribution of any tracer in the solar neighborhood for which there is a reasonable expectation that the tracer follows a given Galactic rotation model.

CONCLUSION & FUTURE WORK
The vertical distribution of the ISM, perpendicular to the Galactic plane, is a constraint on models of ISM hydrostatic balance and galaxy evolution.In particular, the distribution of CNM clouds appears to be set gravitationally by the stellar surface density and kinematically by the energy injection from star formation (McKee & Ostriker 1977;Smith et al. 2023).Due to the observational challenges associated with identifying individual CNM clouds, few studies have explored their vertical distribution across the Galactic disk.In the solar neighborhood, the statistical analysis of Crovisier (1978) and the extension by Belfort & Crovisier (1984) are the only such studies.
We identify a bias in the Crovisier (1978) method that results in an overestimate of the scale height of local CNM clouds by a factor of ∼1.5 to ∼3.With a corrected least squares method as well as a novel Bayesian model, we use the Crovisier et al. (1978) H i absorption data to infer that the vertical distribution of H i absorption in the solar neighborhood is best described by ⟨|z| 3 ⟩/⟨|z| 2 ⟩ = 97 ± 15 pc, where ⟨|z| 3 ⟩ and ⟨|z| 2 ⟩ are the third (skewness) and second (variance) moments of the distribution, respectively.This moment ratio corresponds to a Gaussian scale height σ z = 61 ± 9 pc, an exponential scale height λ z = 32 ± 5 pc, and a rectangular half-width W z,1/2 = 129 ± 20 pc, although the Crovisier et al. (1978) data are unable to distinguish be-tween these three possible vertical distribution shapes.Furthermore, our model suggests that ∼13% of the local H i absorption detections may belong to a population of intermediate-velocity cloud interlopers, since they are not well-described by our model.Both the least squares and Bayesian models converge to estimates of the solar peculiar motion that are consistent with those inferred from stellar kinematics (e.g., Schönrich et al. 2010): (U ⊙ , V ⊙ , W ⊙ ) = (12.0± 0.5, 14.4 ± 0.7, 8.2 ± 0.5).Our analysis tools are provided to the community as an open-source software package (Wenger 2024).
The scale height of CNM clouds in the solar neighborhood is similar to both the inferred inner-Galaxy CNM scale height (σ z ≃ 50 − 90 pc, e.g., Dickey et al. 2022) as well as the scale height of the molecular gas layer in the solar neighborhood (σ z ≃ 50 pc, e.g., Heyer & Dame 2015).These results are consistent with simulations, which suggest that the scale height of cold H i should remain roughly constant out to the edge of the star forming disk (Smith et al. 2023).At the solar Galactocentric distance on the far side of the Galaxy, however, Dickey et al. (2022) find a 2 − 3× larger scale height.Either there are significant variations in the scale height due to local effects, or their inference is biased due to intermediate-velocity H i absorption interlopers that complicate the interpretation of their kinematically-selected sample.A comprehensive analysis of modern H i absorption observations in the solar neighborhood is warranted and will be the subject of an upcoming study.

Figure 1 .
Figure 1.The binned heliocentric (x, z) distribution of simulated clouds drawn from a uniform distribution in x and y and a half-normal distribution in z with standard deviation σz = 100 pc.Note that the scale of the x axis is much larger than that of the z axis.The top panel includes all clouds whereas the bottom panel only includes clouds with b > bmin = 10 • .The solid black lines indicate ⟨|z|⟩ = σz 2/π ≃ 80 pc for a half-normal distribution.The dashed red lines indicate ⟨d sin |b|⟩ derived from the simulated clouds.Clearly, ⟨|z|⟩ ̸ = ⟨d sin |b|⟩ for bmin > 0 • .

Figure 3 .
Figure 3. Galactic longitudes and LSR velocities, VLSR, of the Crovisier et al. (1978) H i absorption detections.Those data with a > 50% posterior probability of being outliers in the moment ratio model are highlighted in blue.
This research has made use of NASA's Astrophysics Data System Bibliographic Services.T.V.W. is supported by a National Science Foundation Astronomy and Astrophysics Postdoctoral Fellowship under award AST-2202340.D.R.R. is supported by a National Science Foundation Astronomy and Astrophysics Postdoctoral Fellowship under award AST-2303902.The authors acknowledge Interstellar Institute's program "II6" and the Paris-Saclay University's Institute Pascal for hosting discussions that nourished the development of the ideas behind this work, as well as J. M. Dickey for providing comments that improved the quality of this paper.Software: ArviZ (Kumar et al. 2019), pyMC (Oriol et al. 2023), Matplotlib (Hunter 2007), NumPy & SciPy (van der Walt et al. 2011), kinematic scaleheight (Wenger 2024),

Table 1 .
H i Absorption Sample a These absorption parameters were determined "by eye"

Table 2 .
Least Squares Analysis Results

Table 3 .
MCMC Analysis Results