Bayesian inversion with α-stable priors

We propose using Lévy α-stable distributions to construct priors for Bayesian inverse problems. The construction is based on Markov fields with stable-distributed increments. Special cases include the Cauchy and Gaussian distributions, with stability indices α = 1, and α = 2, respectively. Our target is to show that these priors provide a rich class of priors for modeling rough features. The main technical issue is that the α-stable probability density functions lack closed-form expressions, and this limits their applicability. For practical purposes, we need to approximate probability density functions through numerical integration or series expansions. For Bayesian inversion, the currently available approximation methods are either too time-consuming or do not function within the range of stability and radius arguments. To address the issue, we propose a new hybrid approximation method for symmetric univariate and bivariate α-stable distributions that is both fast to evaluate and accurate enough from a practical viewpoint. In the numerical implementation of α-stable random field priors, we use the constructed approximation method. We show how the constructed priors can be used to solve specific Bayesian inverse problems, such as the deconvolution problem and the inversion of a function governed by an elliptic partial differential equation. We also demonstrate hierarchical α-stable priors in the one-dimensional deconvolution problem. For all numerical examples, we use maximum a posteriori estimation. To that end, we exploit the limited-memory BFGS and its bounded variant for the estimator.


Introduction
Inverse problems is the mathematical theory and practical interpretation of noise-perturbed indirect observations. Bayesian statistical inversion is the effort to formulate real-world inverse problems as Bayesian statistical estimation problems [1,2]. Bayesian inverse problems can be found in medical and subsurface imaging, industrial applications, and near-space remote sensing. The objective, for example in industrial tomography, is to detect different materials, which may have isotropic, anisotropic, or inhomogeneous features. This means that we typically aim to reconstruct a hidden substance from indirect noise-perturbed measurements. Inhomogeneities include, for example, material interfaces and rough features, and these are the main topics of this paper.
Inverse problems are often formulated using a noise-perturbed measurement equation where y ∈ R M are noisy finite-dimensional measurements, G is a linear or non-linear mapping from some function space to R M , u : D → R is the unknown with D ⊆ R d , d = 1, 2, 3, and η is noise, which we assume to be Gaussian. Our aim is to estimate u given one realization of y.
Inverse problem methods can be roughly divided into deterministic and statistical methods. We model y, u, η as random objects in a statistical framework [2]. For practical computations, we discretize the unknown u, and denote it by u. Then, within the Bayesian inverse problem framework, the solution can be represented through probability distributions via Bayes' theorem, that is, the posterior distribution π(u | y) = π(y | u)π(u) π(y) ∝ π(y | u)π(u), (2) where π(y | u) is the likelihood, and π(u) is the prior distribution of the unknown. We omit the normalization constant π(y), and instead use the unnormalized posterior distribution from now on. The choice of the prior π(u) is practically the only tunable object in inversion. The traditional choices in inverse problems are Gaussian and total variation priors for smoothing and edge-preserving inversion, respectively [2][3][4][5]. In this paper, we build upon the research line starting from the observation that total variation priors do not provide invariant estimators under mesh refinement [5]. Besov priors on a wavelet basis were proposed as a solution to this problem [6]. Here, we extend the study from Cauchy [7] priors to α-stable priors, of which the Cauchy priors are special cases with α = 1, and Gaussian priors are similarly special cases with α = 2. In order to leverage α-stable laws for Bayesian inverse problems, we need approximations of α-stable probability densities evaluated very fast with reasonable precision [8,9]. Our particular interest is to implement and use discretized α-stable random fields in Bayesian continuous-parameter estimation.
We note that traditionally, stable distributions have been employed in financial applications like the modeling of asset time series [10]. They have also been used in biomedical engineering [11], remote sensing [12], network traffic statistical analysis [13], and digital signal processing [14], to name a few. We extend the application to inverse problems here.

Contributions
Our main contribution is to implement computationally feasible numerical approximations of symmetric α-stable priors for Bayesian inverse problems, which requires evaluating the univariate or multivariate probability density functions of α-stable random variables. The symmetric α-stable probability density functions do not have elementary function expressions, except for the two special cases of Gaussian and Cauchy distributions. Thus, the evaluation of the probability densities requires the incorporation of an appropriate approximation method.
Unfortunately, none of the existing α-stable density function approximation methods is alone optimal for our needs. Being based on either series expansions, integral expressions or Fourier transforms [8,9,[15][16][17][18][19][20], they are either computationally too heavy to evaluate within Bayesian inversion, are not applicable for arbitrary values of r and stability indices α, or do not provide a consistent seamless approximation in the sense of the partial derivatives.
For this reason, we introduce a fast hybrid method to approximate the α-stable laws that leverage both bicubic spline interpolations at precomputed probability density grids and asymptotic series approximations. The novelty of the hybrid method is multi-part. First, the presented method is orders of magnitude faster to evaluate than the prevalent methods that are based on numerical integration. The method approximates the log densities of the α-stable distributions within 100-400 nanoseconds on a typical Intel Xeon-based workstation. For the second, the method enables the evaluation of the partial derivatives of the log densities with respect to both α and r with similar performance, which is crucial for the estimators in the Bayesian inference. Finally, the method provides a consistent and discontinuity-free approximation of the log densities on a wide range of stability indices α and for any argument r as opposed to the series expansion methods.
We establish error bounds for the method through comprehensive computer-assisted analysis of the partial derivatives of the α-stable probability density functions [21] and argue that the presented approach is accurate enough for Bayesian inversion. Furthermore, we demonstrate various α-stable priors on a range of Bayesian inverse problems. To our knowledge, general α-stable priors have not been approximated and implemented numerically for Bayesian inverse problems previously. The numerical examples include deconvolution problems in onedimensional and two-dimensional grids. Additionally, we illustrate nonlinear Bayesian inversion governed by an elliptic partial differential equation (PDE) through α-stable priors. In the numerical examples, we resort to maximum a posteriori (MAP) estimators: We showcase the applicability of the presented methodology in a hierarchical α-stable prior, of which the stability index α or the scale σ are processes of their own. The presented hybrid method proved to be very useful in such a hierarchical scenario because of the variable stability indices. In essence, we extend previous works on parametric deep Gaussian processes [22] to simple two-layer α-stable processes. The α-stable priors have the ability to model both discontinuities and smoothness simultaneously in a function of interest, which is a desired property in Bayesian continuous-parameter estimation. The results are promising for further development of α-stable random field priors. For instance, the hierarchical α-stable priors may be very effective tools to model processes that have simultaneously properties of a stationary Gaussian process and a Cauchy random walk.

Outline
This paper is organized as follows: we provide the necessary background material for the paper as an introduction to α-stable priors in section 2. This will lead onto section 3, where we briefly review the existing methods and our hybrid method for approximating α-stable probability density functions, and provide error bounds related to our method. Numerical experiments with the α-stable priors are provided in section 4, where we test our priors on the example problems. A summary of our findings and future work is provided in section 5.

Models
In this section, we review and discuss the necessary prior forms based on α-stable distributions. We also present some basic properties and then present the multivariate setting and how they can be defined.

Stable distributions
A random variable W corresponding to a symmetric stable distribution, also known as α-stable and Lévy α-stable distribution, can be characterized in terms of a stability index α ∈ (0, 2] (sometimes also called the tail index or the characteristic exponent), and a scale parameter σ > 0, in the sense that its characteristic function is given by in which case we write The parameter α is called the stability index because if W 1 and W 2 are two independent copies of W and A, B > 0, then with Hence, the symmetric α-stable distributions are a family of continuous probability distributions that are infinitely divisible and closed under convolution. The monograph [15] is the standard reference for stable distributions, including the wide class of non-symmetric stable distributions, which we do not consider here. It is clear from (4) that for α = 2, W is normally distributed (with zero mean and variance 2σ 2 ). Likewise, for α = 1, W has a Cauchy distribution (with zero median and scale parameter being σ). Besides these two special cases of α-stable laws, there are no known closedform expressions based on elementary functions for the density functions of symmetric stable distributions (the other special cases where closed-form expressions are known to consist of non-symmetric distributions, such as the Lévy distribution).
Multivariate stable distributions can be defined in a similar but more complicated manner, with a spectral measure Λ in place of the scale parameter σ; see [15, ch 2]. For our purposes, it suffices to recall the definition of spherically contoured stable distributions: a random vector W := (W 1 , W 2 , . . . , W d ) on R d is said to have a spherically contoured stable distribution if its characteristic function is given by where α ∈ (0, 2] is again a stability index, σ > 0 is a scale parameter and | · | stands for the standard ℓ 2 -based Euclidean norm on R d . We refer to [9] for a treatment of such distributions.

α-stable priors
A stochastic process (W t ) t⩾0 is a symmetric α-stable process if ∑ n j =1 a j W tj is a symmetric α-stable random variable for all finite {t 1 , . . . , t n } ⊂ [0, ∞) and {a 1 , . . . , a n } ⊂ R. We refer to [15, chapter 3] for the existence and construction of a wide variety of such processes. If the previous definition is satisfied for the multivariate case {t 1 , . . . , t n } ⊂ R K , the process is called an α-stable field.
In particular, we aim to apply discretized priors corresponding to a Lévy α-stable motion, which we take to mean an α-stable process (W t ) t⩾0 with some given initial distribution W 0 ∼ µ and independent increments that satisfies For α < 2, the Lévy α-stable motion generally does not have continuous sample paths. However, according to [23, theorem 11.1], there exists a version of this process with cádlág paths satisfying We more generally refer to [23] for an overview of the analytical properties of Lévy α-stable motions and related processes, including a description of their infinitesimal generators. A Lévy α-stable motion with initial distribution µ can be discretized as follows. For ∆ ∈ (0, 1), define the Markov chain (u ∆ k ) k⩾0 by u ∆ 0 ∼ µ, and u ∆ k+1 − u ∆ k ∼ S α (∆ 1/α ) independently for all k ⩾ 0. Then, by writing (W ∆ t ) t⩾0 for the appropriately-scaled, piecewise constant cádlág process stemming from the Markov chain (u ∆ k ) k⩾0 , i.e. W ∆ t := u ∆ ⌊t/∆⌋ . Using the basic properties of stable distributions along with (6), it is easy to verify that lim ∆→0 + W ∆ = W in the sense of finite-dimensional distributions, i.e.
for all finite {t 1 , . . . , t n } ⊂ [0, ∞) and bounded and continuous functions h : R n → R. An alternative way to construct such a discretization, localized to a finite interval, is to partition the interval by N equispaced points and define the unnormalized density function of u := (u i ) N i =1 on these points as where µ is the initial distribution of the above-mentioned process (W t ) t⩾0 and f( · ; α, σ) stands for the stable density function with stability index α and appropriately-chosen scale parameter σ. The only two-dimensional α-stable field we consider in this paper is a simple generalization of the quasi-isotropic Cauchy first-order difference prior [7], defined analogously to (7). That is, the probability density function of an α-stable random field u discretized through finite differences on a two-dimensional rectangular domain Ω ⊂ R 2 is proportional to where ∂Ω denotes the set of the left and bottom indices on the grid, and f B (·, ·; α, σ) the symmetric bivariate α-stable probability density function. The probability density function π ∂Ω is applied on the grid points at the left and bottom boundary of the grid to make the resulting distribution of u proper.

Hierarchical α-stable priors
Hierarchical priors are dominantly used within Gaussian priors [22,24,25]. With these priors, we can model discontinuities and other features with varying scale or smoothness at the target function. Unfortunately, the computational complexity of the canonical Gaussian priors is cubic with respect to the number of training points, unless a special formulation of the process is employed, like a stochastic PDE [26]. The hierarchical priors might require several layers on top of each other to perform well, while having too many layers may not offer any additional expression capability [27] but rather overfit in the data. We intend to build and demonstrate simple, two-layer Markovian hierarchical α-stable priors that could be useful without the computational or implementation complexity of hierarchical Gaussian processes. We model the scale or the stability of a discretized α-stable process as another α-stable process. This is possible due to the first-order difference prior's simple Markovian construction, which effectively allows expressing the normalization constant of the joint distribution of the discretized process and its parameters in a closed form. A hierarchical α-stable difference process u with scale σ = G(c) and stability α = H(s) based on other discretized α-stable difference processes could be constructed as follows: where H and G are nonlinear functions with Range(G) ⊆ R + , and Range(H) ⊆ (0, 2]. f(·; α, σ) denotes the probability density function of a symmetric univariate α-stable random variable with stability α, scale σ. The conditional distribution π(u | c, s) integrates to a constant, regardless of c and s, as do the priors π(c) and π(s), because σ s , σ c , α s and α c are fixed. The overall joint prior distribution is thus proper. The convergence properties to the continuous limit of the hierarchical α-stable processes in the continuous-time limit are unknown-a sum of two Lévy α-stable random variables with different stability indices does not obey an α-stable distribution. However, continuous-time processes with a local stability index varying with the state of the process, commonly called stable-like processes, are well-studied in the literature; see e.g. chapter 7 in [28] and theorem 5.2 in [29].
Further studies are thus needed regarding the matter, but as we demonstrate in the numerical experiments, the hierarchical α-stable process constructions are promising. Unfortunately, the presented hierarchical priors cannot be applied to the α-stable difference priors when the spatial dimension is greater than one. That is because the normalization constants of the priors are intractable due to their construction upon the distributions of increments between nearest neighbors. However, a Matérn-like stochastic PDE prior could be optionally employed instead of the difference priors [7], which would effectively allow incorporating deep α-stable processes thanks to the tractable normalization constants.

Approximation of α-stable probability density functions
Before presenting our hybrid method of approximating symmetric spherically-contoured αstable density functions, we perform a brief literature overview of the existing approximation methods as a motivation for our contributions. Later, we provide various relative error bounds for the probability density approximations given by our method. For simplicity, we denote with r both the argument of the univariate probability density functions and the Euclidean distance of the arguments of the multivariate α-stable probability density functions to the origin, that is, r := ∥r∥.
Unless otherwise specified, the approximations are applied for σ = 1. Recall that for general symmetric α-stable laws, the probability density functions for the other scale parameters are given by f(r; α, σ) = 1 σ d f( r σ ; α, 1), where d stands for the dimensionality of the distribution [9].

Prevalent approximation methods for symmetric univariate and bivariate α-stable laws
A canonical method to approximate the α-stable probability density functions is to evaluate the inverse Fourier transform of the characteristic function. For the symmetric univariate αstable distributions given by (4) with σ = 1, the probability density function can be expressed as [30] f(r; α) = 1 πˆ∞ 0 cos(rt) exp(−t α )dt.
In theory, numerical integration allows evaluating the density of any α-stable distribution at an arbitrary point r with specified precision. The integral may be impractical to evaluate for large r and small α due to the severe oscillations [30], so oscillatory integral techniques have been proposed to address the issue [17]. Furthermore, the discrete Fourier transform method [31] exploits the low computational complexity of the fast Fourier transform, but requires interpolation to evaluate the density outside the grid [18,32]. Additionally, there is an alternative integral representation formula [8], that we call Nolan's method for short, for the univariate α-stable density function when α ̸ = 1. For simplicity, if we consider only symmetric α-stable distributions and α ̸ = 1, the probability density function for an α-stable random variable with µ = 0 and σ = 1 given by the method is [8] f(r; α) = α|r| where cos(t) . In contrast to the inverse Fourier method, the integrand in Nolan's method is compactly supported and non-oscillatory. Unfortunately, when |α − 1| < 0.02, the integrand becomes extremely peaky and narrow, and then the method is impractical unless arbitrary precision arithmetic is used for the integration [8]. Even otherwise Nolan's method can be tricky to evaluate since the domain of integration should be evaluated in parts near the peak of the integrand. The peak is located at t p , which satisfies the equation [8]. Thus, the univariate symmetric α-stable laws can be accurately evaluated with either Nolan's method or the inverse Fourier transform method depending on the values of α and r. The approximation methods based on the fast Fourier transform [18,[31][32][33] are also worth mentioning. They are simple to implement and relatively fast to evaluate but must be used in conjunction with interpolation to approximate the density at a point that is not part of the FFT grid. It has been reported that the FFT-based approximation is accurate only for large α [18,34].
The approximations based on the integral representations of α-stable laws are complemented by series expansions. A well-known series expansion for the univariate α-stable density is of form [35] which is an asymptotic expansion for α ∈ (1, 2) for r → ∞, and converges pointwise to the true density for α ∈ (0, 1). There is a similar series expansion, also outlined in [35], which is an asymptotic series for α ∈ (0, 1) and a converging series for α ∈ (1, 2) at r → 0 + . Furthermore, there are methods that provide a converging series approximation for the symmetric univariate probability densities for α ∈ (0, 2) by combining two separate power series expansions [19], or approximate the inverse Fourier transform of the characteristic function by domain partitioning and implementing different series expansions within them [16]. The methodology for approximating spherically contoured multivariate α-stable distributions is similar to the univariate one. There are several integral expressions for their probability density functions (see e.g. [9]), such as where J ν is the Bessel function of the first kind. Analogously to the univariate case, multivariate spherically contoured α-stable laws have an absolutely converging series expansion for r > 0 and α ∈ (0, 1), which is an asymptotic expansion for α ∈ (1, 2) and r → ∞ [9]: Similarly, there is an absolutely converging series expansion for r > 0 and α ∈ (1, 2), which is an asymptotic expansion for α ∈ (0, 1) and r → 0 + [9]. None of the existing approximation methods as such is suitable to be used in Bayesian inversion. The presented approximation techniques based on numerical integration are not applicable for the tails of the α-stable distributions because the relative accuracy of the integration cannot be ensured due to limited floating point precision. They and the advanced series expansions [16] may be too time-consuming to perform within Bayesian continuous-parameter estimation since the probability density functions must be evaluated up to several hundreds of thousands of times even in modest-dimensional settings. Some of the methods are also applicable only on fixed α [18], and not on a continuous range of them. Certain approximation methods fail to approximate the log density of the α-stable distributions with a close to uniform relative accuracy for all r because they are based on combining two power series expansions [19], which may affect negatively the estimators. Finally, the typical asymptotic series expansions are fast to evaluate when the number of used terms is low, but not accurate enough or even applicable for all r, let alone α.

Hybrid method for approximating α-stable laws
To address the issues of the existing approximation methods in Bayesian inversion, our hybrid approximation method is divided into parts, and different techniques are employed within them. When r is small, we approximate the α-stable laws with two-variable bicubic splines that are fitted on grids of precomputed α-stable log densities with varying radius r and stability α. To our knowledge, a similar rationale has been presented by [18], but our method works also for α < 1. Furthermore, our method approximates the probability densities on a continuous range of the stability indices α with consistent accuracy and regularity, and for a continuous range of α. The feature is essential for hierarchical α-stable prior constructions. We employ the Julia library Interpolations.jl to evaluate the bicubic splines. Figure 1 depicts the overall approximation method. We present the details of the parts of the hybrid method below.
3.2.1. The first bicubic spline approximation. The first bicubic spline grid of precomputed log densities is applied when r ∈ [0, 0.9], α ∈ [0.5, 1.9]. This region is illustrated in turquoise in figure 1. We divide the domain [0, 1.0] × [0.5, 1.9] uniformly to the intervals of h r = 0.01 and h α = 5 · 10 −4 , and evaluate numerically the densities using the integral methods. For the onedimensional α-stable density we use the Fourier integral formula from (10) for |α − 1| < 0.2, and otherwise we employ Nolan's method of (11). Instead of introducing improvised heuristics for the integration and domain partitioning, we count on the Fourier integral for the aforementioned stability values that are particularly tricky in Nolan's method. The numerically evaluated densities of both methods agree each with a least 12 decimals for |α − 1| ⩾ 0.2, so using only the Fourier approach would suffice.
We use the very same spline grid for the bivariate symmetric α-stable laws, and we employ the integral expression with the Bessel function in (13). In both cases, we set the derivative with respect to r at r = 0 to zero in the spline to ensure the method approximates the partial derivative with respect to r properly at r = 0. At r = 0.9, we set the partial derivatives with respect to r to agree with those given by the second spline, which guarantees the C 1 continuity of the spline approximation for the log density at r = 0.9, which is the upper limit of the first spline.

The second bicubic spline approximation.
The second bicubic spline grid is constructed in r ∈ [0, 30], α ∈ [0.5, 1.9] with the intervals of h r = 0.01 and h α = 5 · 10 −4 for the nodes of the grid. The domain is illustrated in violet in figure 1. We apply the same approach as in the first grid For the one-dimensional α-stable laws, we use Nolan's method for approximating the log densities in the grid points |α − 1| > 0.2, and otherwise we use the inverse Fourier method. Likewise, we use the integral expression incorporating a Bessel function for the bivariate α-stable laws to construct the log density grid.
We apply natural boundary conditions with respect to r and α in the spline, that is, the second derivatives with respect to them are set to zero on the boundaries of the spline. However, we limit the usage of the spline only for r ∈ (0.9, 29.6), α ∈ [0.5, 1.9]. Again, we want the overall approximation and its partial derivatives to be continuous. That is why we intentionally let this second spline grid, which is also the largest of them all, overlap with the domains of the first and the third spline grids, and modify their boundary conditions to match the partial derivatives with respect to r of the second spline.

Series expansions for approximating the tails.
As we pointed out in the overview, approximating the α-stable laws for r ⩾ 0 is ineffective with numerical integration-based approaches, so we use the asymptotic series expansions for the tails of the distributions. For the univariate α-stable log densities, we use (12), and for the bivariate tails, we use (14). We use three terms in both series expansions, which we found to provide feasible accurate tail density approximations. The domain of the series expansions is depicted in yellow in figure 1.
The lower limit of r = 30 for the series expansions was selected because it has been reported to provide a practical rounded bound for the asymptotic tail expansions of the univariate stable densities up to α = 1.999. However, the number of required terms in the series can be several tens or over a hundred for very high accuracy [20]. Naturally, such a high number of terms would result in a loss of computational efficiency. We keep the same lower limit for the tail expansions of the symmetric bivariate α-stable distributions since we have numerically verified the difference of the log densities given by the tail approximation and the numerical integration-based approach are of the same magnitude as in the univariate case.

The transition region.
The switch from the spline grids to the asymptotic tails series is made seamlessly with the help of the third bicubic spline. The spline is employed for r ∈ [29.6, 30], α ∈ [0.5, 1.9], and we call this region the transition region of the approximation. The purpose of the transition region is to ensure the log-density approximation and its partial derivatives are continuous everywhere in its domain. The domain of the transition region spline is illustrated in red in figure 1. We use the same grid node intervals of h r = 0.01 and h α = 5 · 10 −4 in this domain as we do in the other two grids, and the methodology is identical for both in the univariate and bivariate α-stable laws. The spline is constructed as follows. Let us denote r a = 29.6, r b = 30.0, and ∆ = r b − r a . Let the log-density approximation given by the asymptotic series expansion from (12) ( (14) in the bivariate case) at the spline grid node r a , α j be f s a,j and its derivative with respect to r be D s a,j . Additionally, let f d b,j stand for the log-density approximation given by the numerical integration at r b , α j , and D d b,j denote its derivative with respect to r. We set the value in the transition region spline grid point value at i, j to follow an auxiliary cubic Hermite interpolation as follows: where q i = r i − r a . Equation (15) is applied for each stability α j within the transition grid separately. The Hermite interpolation is only applied during the construction of the transition spline because the evaluation of the constructed grid is performed by the bicubic spline library in Julia. Introducing the Hermite interpolated data points as an additional step at the transition region helps to avoid abrupt changes in the derivatives of the log densities near the boundary of the transition region and the tail approximation.
The derivatives of the spline with respect to r on the boundary r = 29.6 are set to follow the values given by the second spline. The same derivative at r = 30 is set to follow the derivative given by the asymptotic tail expansion, respectively.

Error bounds of the hybrid method
We obtain the following error bound for the symmetric α-stable log densities in the domain of the spline grids: where f T := f(r; α) stands for the (true) density of the symmetric α-stable distribution with σ = 1, and f A := f A (r; α) denotes density given by the bicubic spline interpolation. The error estimate for the bivariate log density is orders of magnitudes higher than for the univariate one due to the significantly larger suprema for the partial derivatives within the domain of the splines, particularly with small α. If the lower bound of α of the approximation domain was increased to 0.7, the bivariate log-density error estimate would decrease to 0.013, accordingly. The accuracy of the approximations is enough for our needs in the Bayesian inversion. For the relative error bounds of the tails, we have the following estimates. Denoting by S 3 (r; α) the sum in (12)  For clarity, we briefly explain the rationale of the procedures behind the derived bounds. First, we assume the integration error of the probability densities to be zero within the spline grid points. For the second, we ignore the transition region spline from the error estimates, because it can be left out of the hybrid method at the expense of having a less regular overall approximation. Then we make use of the properties of bicubic splines as follows [36]. Theorem 3.1. Let the true α-stable density and the bicubic spline approximated density be denoted as above. The error resulting from the bicubic spline approximation can then be approximated by [36,37] where h r and h α stand for the intervals of the log density interpolation grid cells in the directions of the radius and the stability index, respectively, and the superscripts (i,j) stand for partial derivatives of the form ∂ i+j ∂r i ∂α j .
Due to the lack of any sort of a closed-form expression for f T , estimating the partial derivatives of log f T appearing in the suprema in (16) involves estimating the partial derivatives of f T from above, and f T itself from below. Both types of estimates are tricky to do in a precise manner. When estimating the partial derivatives of f T , we will use several strategies that are variably efficient for different regions of (r; α), and then use the first-order variants of these estimates in conjunction with a precomputed grid and the fundamental theorem of calculus for the lower bounds of f T . The full details of these estimates are presented in the supplementary material.
First, we can obtain crude uniform bounds (with respect to r) for each (f T ) (i,j) by e.g. differentiating (10) under the integral sign and eliminating the resulting oscillatory term simply using the triangle inequality. We can somewhat refine this pointwise for 'moderate' r > 1 by using standard oscillatory integral techniques (which basically amount to partial integration against a sufficiently quickly vanishing function, resulting in an upper bound of order r −1 ). This is largely the best we can do for moderate values of r, where neither of the asymptotic expansions (see (12) and the subsequent discussion) come close to approximating f T well with only a couple summands-the region of said 'moderate' values will, of course, depend on α.
For larger values of r, we may exploit the expansion (12) pointwise with e.g. 2-3 summands and an explicit (albeit complicated) expression for the remainder term, due to Bergström [35]. Some of the integrals we encounter here are highly intractable in the mathematical sense of the word, but non-oscillating and well-behaved enough for efficient numerical estimation, yielding an upper bound that decreases to order roughly comparable to that of f T for r → ∞, and where the asymptotic constants stay sufficiently tame for α ∈ [0.5, 1.9]. This all holds, mutatis mutandis, for r → 0 + as well.
As a result, we obtain several different kinds of upper bounds for |(f T ) (i,j) (r; α)|, pointwise with respect to (r, α). With some minor additional work, we may loosen the estimates very slightly so that they will be uniform for the spline grid cells r ∈ [r j , r j + h r ] and α ∈ [α i , α i + h { α], where i and j are indexed over the numbers of the spline grid points. With the upper estimates for the partial derivatives of f T , we then estimate f T from below with reasonable accuracy. Namely, noting that f T (r; α) is for fixed α always a decreasing function of r, we may precompute f T (r j ; α i ) at the nodes of the discretized grid, and thus use the fundamental theorem of calculus to obtain the lower bound for f T (r j ; α i ) within the spline grid cells.
The discussion above also applies to the bivariate case, with the additional difficulty of the Bessel function of the first kind J 0 in the representation (13). In the supplementary material, we present analogous asymptotic expansions with respect to r → 0 + and r → ∞ with quantitative remainder term estimates. In particular for r → ∞, we present a modification of Bergström's [35] complex-analytical treatment, which allows us to obtain estimates for the remainder term in the bivariate case which are not immediate from the asymptotic expansions presented in [9].

Practical considerations on the hybrid method
We consider the computational efficiency of the presented hybrid scheme as good. The evaluation of an α-stable log density takes approximately 100 nanoseconds in the domain of the spline grids, and about 400 nanoseconds in the asymptotic tail expansions on a workstation equipped with Intel Xeon CPU E5-2698 v4 central processing unit. For instance, evaluation of the univariate α-stable density through Nolan's method or the inverse Fourier transform approach with a relative tolerance of 10 −10 takes approximately 10-150 microseconds for α = 1.6 r ∈ [0, 30] using Julia's QuadGK.jl library, which applies adaptive Gauss-Kronrod quadrature. However, for α = 1.6 r = 150, both integration-based methods take over 2 milliseconds to evaluate, and the computational efficiency degrades the further the greater the radius argument r is because the integrals are close to zero, and hence slower to evaluate with the specified relative tolerance. On a typical Bayesian inverse problem, the number of separate evaluations of probability density functions can be tens or hundreds of thousands in a posterior distribution. Furthermore, an approximation based on combining power series expansions such as the method of [19] is likely not any faster than the hybrid method that combines splines with wellknown asymptotic tail expansions since the method requires evaluation of two different series expansions simultaneously. That is why a hybrid method is essential from a computational efficiency viewpoint, and we argue that our method certainly meets those requirements.
In the presented methodology, we do not consider the stability indices of α < 0.5 or α > 1.9. In theory, values of α close to 2 − 10 −6 can be effectively approximated with the presented hybrid method, because the series approximation and the integration-based spline interpolation agree well at r = 30 for that high α. However, it must be borne in mind that the closer the stability index is to 2, the worse the relative error of the approximation will be because of the magnitudes of the partial derivatives with respect to α. As a remedy to that, an extra spline grid could be incorporated in the approximation for 1.9 ⩽ α ⩽ 2 − 10 −6 , but the problem in the approach is to smoothly join the splines through their boundary conditions with respect to α, which would further complicate the overall workflow. On the other hand, low stability indices are not in our interest, and including them would require increasing the number of nodes in the precomputed log-density grids to sustain the accuracy of the approximation. Alternatively, yet another spline grid could be employed for smaller α with a possibly greater range of r, which would again make the method more complex. We remark that the inverse Fourier transformbased method copes badly at approximating the univariate α-stable densities with a small stability index, whereas Nolan's method still sustains its effectiveness thanks to the compact integrand it adapts.
Finally, we elaborate on the selection of the spline parameters. The first and the transition region splines (r ∈ [0, 0.9] and r ∈ [29.6, 30.0]) are useful also because the resulting systems of equations of the spline coefficients involving the boundary conditions of the splines are smaller, and hence easier to solve than directly incorporating them into the coefficients of the largest spline grid. That is because the interpolation library Interpolations.jl applies effectively the symmetries of the spline coefficient equations when natural boundary conditions are used. The limit of r = 29.6 for the transition region splines was selected as a good compromise between the regularity of the approximation and its relative accuracy. The limit of the first spline (r = 0.9) can be argued to be adequate for the first spline because the partial derivative with respect to r is the largest with the small α, and the largest magnitudes are obtained for α < 0.9. We have not optimized the limit further because all spline grids have the same node intervals. However, it would be straightforward to decrease the spacing h r in the first spline, if the lower stability indices will be of interest in the future. As a proof of concept, we regard the current grid intervals of h r = 0.01 and h α = 5 · 10 −4 as a practical compromise between accuracy and the required memory to store grids. The current parameters result in a grid of approximately 3000 × 3000 nodes in the second interpolation spline. Although all the grids in the study need only a few hundred megabytes of storage in total, we emphasize that the asymptotic series expansions are virtually the largest source of the relative error in the overall approximation despite the proven error bounds. Thus, we consider decreasing the node intervals impractical for now.

Numerical examples
We demonstrate the α-stable priors in three numerical experiments. We employ the priors first in a deconvolution, which is a well-known linear inverse problem. Moreover, the same priors are used in estimating the conductivity field of an elliptic PDE in two spatial dimensions. For the time being, we only use MAP estimators in the reconstructions because full Bayesian inference with the presented random field priors requires the usage of MCMC methods, which have been shown to struggle with such heavy-tailed priors [7]. Since the assessment of the reconstructions in inverse problems cannot be usually accomplished in a unified manner, we do not intentionally tabulate any metrics of the reconstructions, such as L 2 errors of the reconstructions, in the manuscript. We demonstrate the MAP estimators of the α-stable priors by varying the stability index α and the scale σ in the examples. We did not optimize their ranges in the experiments. Rather, they were selected so that their effect on the estimators would be meaningful to illustrate. The Julia codes of the experiments can be found at https://github.com/ suurj/alpha-stable [38].

MAP estimation
Evaluation of the MAP estimators (3) in Bayesian continuous-parameter estimation is usually performed with the help of a nonlinear conjugate gradient algorithm, a quasi-Newton method, a matrix-free truncated Newton method, or a combination of them [39][40][41][42].
In our numerical examples, the maximization of the log posteriors is done through the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method in the deconvolution experiments. Moreover, we resort to the bounded L-BFGS algorithm [43] at the inversion of the conductivity field of a linear elliptic PDE. As the numerical implementations of the limitedmemory BFGS algorithms, we use Optim.jl [44] for the unconstrained L-BFGS, and a Julia wrapper LBFGSB.jl of the original Fortran-implementation of L-BGFS-B [43]. Lastly, we want to emphasize that the presented α-stable random field priors often make the posteriors multimodal [7], and finding global maxima from such distributions is difficult. Moreover, different optimization algorithms may converge to different local minima.

One-dimensional deconvolution
In the first numerical experiment, we demonstrate the properties of the first-order α-stable difference priors. We discretize the target function u, which includes both discontinuities and piecewise realizations of a Gaussian process with Matérn covariance, at 500 equispaced grid points on [−1, 1]. We convolve u with a normalized Gaussian kernel of though a matrix approximation, which is denoted by F. We construct the noise-corrupted data y at 60 equispaced points within the support of the target function by adding white Gaussian noise to the discrete convolution as follows: Thus, the likelihood π(y | u) of the forward model is Gaussian. We avoid committing an inverse crime by reconstructing u on a 120-dimensional equispaced grid at [−1, 1] and use the corresponding matrix approximation for the convolution with the kernel (17) on the reconstruction grid to evaluate the forward model and the likelihood. The ground truth function u and the measurement data y are depicted in figure 2(a).
We employ four different α-stable priors in the one-dimensional setting. Namely, we exploit the hierarchical α-stable priors defined in (9), so that (i) fix the scale σ and stability index α of the prior of the increments of u, (ii) consider the stability index of the prior of the increments of u as a process that depends on another α-process s and fix the scale of the increments, (iii) consider the scale of the prior of the increments of u as a process that depends on another α-process c and fix the stability index of the increments, (iv) and consider both the stability index and the scale of the prior of the increments of u as processes that depend on another α-processes s and c.
The MAP estimates with the priors are illustrated in figures 2-5. The selected discretization of the processes implies that the dimensions of the posterior distributions are 120-, 240-, 240-, and 360-dimensional, respectively.
The MAP estimates with the non-hierarchical α-stable first-order difference priors in figure 2 demonstrate the effect of altering the stability α or scale σ of the distribution of the increments in the prior of u. As a rule of thumb, the smaller the stability α is, the stronger the prior favors non-Gaussian increments, so they are usually close to zero. The larger the scale σ is, the greater the variability is allowed within the increments. Stability indices of 1 ⩽ α ⩽ 2, are particularly useful for reconstructing the target function in this example case. Those priors are able to favor the existence of Gaussian-like parts of the ground truth function when needed. If the estimation was done using stationary Gaussian priors, the MAP estimate would be either over-smoothed and incapable to locate the discontinuity at the boxcar, or it could detect the discontinuity at the expense of being very sensitive to noise.
Considering the stability of the prior of u as another first-order α-stable process, turned out to be less successful. We let the scale of the process u to follow an α-stable process with scale σ = 0.01 and set its untransformed stability process s to follow an α-stable process with the parameters tabulated in figure 3. To guarantee that 0.51 ⩽ α ⩽ 1.9, we apply a transformation (9) α = H(s) = 0.51 + 1.39S(s), (19) where S(x) = 1 1+exp(−x) . In both figures 3 and 5, the stability processes are shown in their transformed values. The stability process seems to be either close to constant (≈1.25) or decreasing towards the right side of the domain in all the tabulated cases. However, there is some variation in the stability process in the middle of the domain when the untransformed process has the parameters α = 0.8, σ = 0.1. The phenomenon may suggest that the stability index being a process does not work well as a prior. When the stability of the untransformed stability process is α s = 1.4 and its scale σ s = 0.05 (equation (9)), the reconstruction of u is smooth at first, but as the stability decreases, the reconstruction becomes more discontinuous and non-Gaussian.
In the one-dimensional deconvolution experiment, the best results in terms of the reconstruction agreement with the ground truth are obtained when the scale of u process is considered a process instead of its stability. We fix the stability of u to α = 1.9 and instead model the untransformed discretized scale process c with another α-stable process with the parameters shown in figure 4. The final scale process (9) is given by a transformation The reconstructions where the untransformed scale process has the scale of 0.05 ⩽ σ c ⩽ 0.1 and stability between 0.8 ⩽ α c ⩽ 1.9, agree well with the ground truth and even with each other. For the last, setting both the scale and the stability of u as α-stable processes seem to suffer from the same issue as the stability process case. We set the scale of the untransformed stability index process to σ s = 0.05 (equation (9)), and the stability of the untransformed scale parameter process to α c = 1.9. Hence, the scale parameters σ in figure 5 refer to the scale of the untransformed stability process (σ s ), and the stability indices to the untransformed scale process (α c ). We transform the processes with the same sigmoid functions as in the other two cases, using equations (20) and (19). Either one or both of the parameter processes remain close to constant throughout the domain, and the MAP estimates for u are no better than in the simpler hierarchical α-stable priors. Whether the poor reconstructions are caused by overfitting, poorly selected hyperparameters of the processes s and c, unidentifiability, or something else, shall be investigated in further studies.

Two-dimensional deconvolution
We also conduct a deconvolution experiment in two dimensions with similar settings as in the one-dimensional case. The ground truth function and the reconstructions are plotted in figure 6. We aim to reconstruct the blurred test function with the help of spherically symmetric bivariate α-stable first-order difference priors (8). The ground truth function is supported on [−1, 1] 2 . It is evaluated at a uniform equispaced grid of size 333 × 333, after which is interpolated at 100 2 points that are scattered according to a low-discrepancy sequence within the domain of the target function. The dataset y is then generated with the help of a matrix approximation for the convolution with a Gaussian kernel of ψ: and noncorrelated Gaussian noise with variance of 0.05 2 is added to the result, analogously to equation (18). We use the interpolation and the non-gridded evaluation points to ensure that there are no artifacts in the reconstructions that could be explained by too-regular lattice discretizations of u in the data generation step. An equispaced grid of 256 × 256 nodes is used in the MAP estimators to prevent committing an inverse crime, and we employ a matrix approximation for the convolution model also in the reconstruction step. The MAP estimates of the reconstructions are consistent with the one-dimensional deconvolution experiment. Increasing the stability of the α-stable difference priors manifests in more Gaussian-like features in the MAP estimates. The distribution α = 0.51 and σ = 0.01 is probably too spiky and heavy-tailed as the difference prior since the reconstruction lacks any features resembling the ground truth objects. A notable feature is the existence of diagonal discontinuities at certain MAP estimates, like in the case α = 0.8 and σ = 0.1 in the object that consists of two overlapping spheres. Although the construction of the α-stable difference prior incorporates bivariate symmetrically contoured α-stable distributions, the prior is likely not fully isotropic. In fact, even the isotropic and upwind total variation priors are not perfectly isotropic, and a method has been proposed to alleviate the issue [45]. Unfortunately, the technique cannot be applied to the presented α-stable priors, so the matter of alleviating the diagonally anisotropic reconstructions must be considered separately.

Inversion of an elliptic PDE
As the third numerical experiment with the α-stable priors, we consider the nonlinear inverse problem of estimating a conductivity field k ∈ L ∞ (Ω), with Lipschitz domain Ω ⊂ R 2 of an elliptic partial differential equation (PDE): with prescribed zero Dirichlet boundary conditions, where u ∈ H 1 0 (Ω) denotes the solution of the PDE, and g ∈ L ∞ (Ω). The inversion is done using noisy observations y of the solution  of the PDE as the likelihood for k. We discretize the PDE with the standard finite difference method. The noise is assumed to be Gaussian with observations taking the form Like in the other two experiments, we evaluate only the MAP estimator of the problem. We use a bounded limited-memory BGFS algorithm (L-BFGS-B) to calculate the MAP estimate. We employ the constraint 10 −5 < k < 10 2 to ensure the well-posedness of the elliptic PDE and to keep the condition number of the matrix of the system of equations small. The gradient of the log posterior with respect to the discretized k is calculated through a discrete adjoint method [46,47]. In other words, we solve the adjoint equation to obtain the adjoint q via the equation ( ∂E ∂u where E denotes the system of finite difference equations of the discretized PDE, and π (y | u(k)) the Gaussian likelihood function, which merely consists of solving the PDE with given k and evaluating the fidelity of the obtained solution with respect to y. The gradient of the log posterior π(k | y) with respect to the discretized conductivity field k is then since the likelihood depends on k only through u.
We estimate the conductivity field with the same bivariate α-stable difference priors as we do in the two-dimensional deconvolution. We use a reconstruction grid of 128 × 128. To simulate the measurements, and to avoid committing an inverse crime, we calculate the solution of the PDE using a larger finite difference grid with a size of 223 × 223 and interpolate the solution at 25 × 25 points that are positioned on the reconstruction grid according to a lowdiscrepancy sequence, after we add the noise to the dataset according to equation (22). The source term function g of (21), the solution of the PDE, and the ground truth conductivity, as well as the reconstructions, are plotted in figure 7.
The shape of a double-sphere object in the conductivity field is captured the best with the priors with smaller stability indices while increasing the stability seems to favor smooth reconstructions. On the other hand, having a large scale σ may make the prior uninformative. Judging by the shape and distribution of the values within the reconstruction, the prior with α = 0.8 and scale σ = 0.1 could be the best of the tested parameter choices in this case.

Conclusion
This work was motivated by the desire to implement approximations of the α-stable random field priors for Bayesian inverse problems. Because both the Cauchy and Gaussian fields are special cases of the α-stable random fields, our objective was to extend the prior selection to general α-stable priors, which could prove useful in reconstructions where both Gaussian and non-Gaussian features are present. As the α-stable density functions mostly lack the closedform expressions, we introduced a computationally feasible hybrid method for approximating the symmetric univariate and bivariate α-stable probability density functions. The novelty of the presented method in comparison with the existing approximation methods is its accuracy and, especially, its performance. The method allows evaluation of the α-stable probability logdensity functions within a stability index range of α ∈ [0.5, 1.9] and radius argument range of r ∈ [0, ∞). Furthermore, we provided error bounds for the log-density approximations through careful computer-assisted analysis. The range of the stability index and the relative accuracy of the hybrid method could be potentially improved by introducing more spline grids for large α and small α. Another further improvement would be to extend the domain of the splines to higher r and optimize the spacing of the spline grids for greater accuracy.
In the numerical experiments, we employed finite-difference approximations of the α-stable first-order random motion priors at one-dimensional and two-dimensional deconvolution, and we also addressed the estimation of a function governed by an elliptic PDE with the same priors. The MAP estimation was implemented through the standard L-BFGS method and its bounded variant. Our objective was to illustrate how the parameters, such as the stability and scale of the α-stable priors, can affect the estimation of the unknown functions. The results are promising in the sense that the presented priors are both computationally viable, manifest in useful MAP estimates, and are yet novel compared to the existing random field priors like Gaussian, Cauchy, Besov, and total variation priors. We also introduced hierarchical α-stable priors in our one-dimensional deconvolution example. Although the results are already encouraging, we believe the full potential of the hierarchical α-stable priors is yet to be found. An approximation method like ours is needed for hierarchical α-stable priors because the density functions must be evaluated on a continuous range of stability indices.
As we introduced new α-stable priors and provided examples through MAP-estimates, we consider extending the estimators to full inference as well as other α-stable priors. For future work, we will consider Bayesian neural networks [48] with α-stable weights, which are possibly non-symmetric in contrast to all the distributions in this study. We believe the developed approximations to turn out useful in that case due to the recent studies on Bayesian neural networks with Cauchy and Gaussian weights [49]. Alternatively, α-stable random field approximations through the stochastic PDE approach could be beneficial [7,26,50]. Another consideration would be to test these priors on ensemble Kalman methods [51][52][53], which have been used and tested with hierarchical Cauchy processes.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/suurj/alpha-stable [38].