
Maximum a posteriori estimators in $\ell^p$ are well-defined for diagonal Gaussian priors


Published 3 May 2023 © 2023 The Author(s). Published by IOP Publishing Ltd
Citation: Ilja Klebanov and Philipp Wacker 2023 Inverse Problems 39 065009, DOI 10.1088/1361-6420/acce60


Abstract

We prove that maximum a posteriori estimators are well-defined for diagonal Gaussian priors µ on $\ell^p$ under common assumptions on the potential Φ. Further, we show connections to the Onsager–Machlup functional and provide a corrected and strongly simplified proof in the Hilbert space case p = 2, previously established by Dashti et al (2013 Inverse Problems 29 095017); Kretschmann (2019 PhD Thesis). These corrections do not generalize to the setting $1 \leqslant p \lt \infty$, which requires a novel convexification result for the difference between the Cameron–Martin norm and the p-norm.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Let $(X,\|\boldsymbol{\cdot} \|_{X})$ be a separable Banach space and µ a centered and non-degenerate Gaussian (prior) probability measure on X. We are motivated by the inverse problem of inferring the unknown parameter $u\in X$ via noisy measurements

$y = G(u) + \varepsilon, \qquad (1.1)$

where $G: X\to \mathbb{R}^d$ is a (possibly nonlinear) measurement operator and ε is measurement noise, typically assumed to be independent of u. The Bayesian approach to solving such inverse problems (Stuart 2010) is to combine prior knowledge given by µ with the data-dependent likelihood into the posterior distribution µy given by

$\mu^{y}(\mathrm{d}u) = \frac{1}{Z}\, \exp(-\Phi(u))\, \mu(\mathrm{d}u). \qquad (1.2)$

Here, the so-called potential $\Phi: X\to \mathbb{R}$ depends on G and the statistical structure of the measurement noise ε, while $Z:= \int_{X} \exp(-\Phi(u)) \mu(\mathrm{d}u)$ is simply the normalization constant, which is well defined under suitable conditions on Φ (see assumption 2.1 later on). If, for example, the measurement noise is distributed according to a centered Gaussian measure on $\mathbb{R}^d$, $\varepsilon \sim N(0, \Gamma)$ with symmetric and positive definite covariance matrix $\Gamma \in \mathbb{R}^{d\times d}$, then $\Phi(u) = \frac{1}{2} \|\Gamma^{-1/2}(y-G(u))\|^2$, but we will use the general formulation (1.2) as the starting point for our considerations. For an overview of the Bayesian approach to inverse problems and a discussion of its well-posedness we refer to (Stuart 2010) and the references therein.
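In finite dimensions this Gaussian-noise potential is straightforward to evaluate. A minimal numerical sketch (the nonlinear measurement operator G on $\mathbb{R}^2$ and the diagonal noise covariance Γ below are illustrative assumptions, not taken from the analysis above):

```python
# Illustrative setup: X = R^2, d = 2, diagonal noise covariance Gamma.
GAMMA_DIAG = [0.5**2, 1.0**2]  # variances of the two noise components (assumed)

def G(u):
    """A hypothetical (nonlinear) measurement operator G: R^2 -> R^2."""
    return [u[0]**2, u[0] + u[1]]

def potential(u, y):
    """Phi(u) = 1/2 * ||Gamma^{-1/2}(y - G(u))||^2 for centered Gaussian noise."""
    residual = [yi - gi for yi, gi in zip(y, G(u))]
    return 0.5 * sum(r**2 / v for r, v in zip(residual, GAMMA_DIAG))

# Noise-free data generated from the "true" parameter u* = (1, 2):
y = G([1.0, 2.0])
print(potential([1.0, 2.0], y))  # 0.0: the true parameter has zero misfit
print(potential([0.0, 0.0], y))  # positive misfit away from u*
```

In particular, Φ is non-negative here, consistent with the remark that typical Bayesian inverse problems satisfy the lower bound in assumption 2.1(a).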

Our focus lies on the analysis of the so-called 'maximum a posteriori (MAP) estimator' or 'mode', i.e. the summary of the posterior µy in the form of a single point $u_{\textrm{MAP}} \in X$. In the finite-dimensional setting $X = \mathbb{R}^{k}$, if µy has a continuous Lebesgue density ρy , a MAP estimator is simply defined as a parameter of highest posterior density, $u_{\textrm{MAP}} = \mathrm{argmax}_{u\in\mathbb{R}^{k}} \rho^{y}(u)$ (note that such maximizers may not be unique or may fail to exist).
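For a concrete finite-dimensional instance of this argmax definition, consider the conjugate one-dimensional setting with linear $G(u) = u$, prior ${\mathscr{N}}(0,1)$ and noise ${\mathscr{N}}(0,\gamma^2)$ (all illustrative choices), for which the MAP estimator has the closed form $u_{\textrm{MAP}} = y/(1+\gamma^2)$. A grid search over the posterior log-density recovers it:

```python
Y, GAMMA = 2.0, 1.0  # observed datum and noise standard deviation (illustrative)

def log_density(u):
    """Unnormalized posterior log-density: -Phi(u) - u^2/2,
    with Phi(u) = (y - u)^2 / (2 * gamma^2) and prior N(0, 1)."""
    return -(Y - u)**2 / (2 * GAMMA**2) - u**2 / 2

# MAP estimator as the parameter of highest posterior density (grid argmax).
grid = [-5 + i * 1e-4 for i in range(100001)]
u_map = max(grid, key=log_density)

print(u_map, Y / (1 + GAMMA**2))  # grid argmax vs closed form y/(1+gamma^2)
```

The grid argmax agrees with the closed-form expression up to the grid resolution.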

Unfortunately, this definition does not generalize to measures without a continuous Lebesgue density; in particular, it cannot cover infinite-dimensional settings, where there is no equivalent of the Lebesgue measure.

For this reason, Dashti et al (2013, definition 3.1) suggested defining MAP estimators as 'maximizers of infinitesimally small ball (posterior) mass'; see definition 1.3 below. To simplify notation, we first introduce the following shorthand for ratios of ball masses:

Notation 1.1. For a separable metric space X and a probability measure ν on X, we denote the open ball of radius δ > 0 centered at $x\in X$ by $B_{\delta}(x)$. Further, for $w,z \in X$ with $\nu(B_\delta(z)) \gt 0$, we set

$\mathfrak{R}_{\nu}^{\delta}(w,z) := \frac{\nu(B_{\delta}(w))}{\nu(B_{\delta}(z))}, \qquad \mathfrak{R}_{\nu}^{\delta}(w,\sup) := \frac{\nu(B_{\delta}(w))}{\sup_{z\in X}\nu(B_{\delta}(z))}.$

Similarly, we set $\mathfrak{R}_{\nu}^{\delta}(\sup,w) := \mathfrak{R}_{\nu}^{\delta}(w,\sup)^{-1}$ whenever $\nu(B_\delta(w)) \neq 0$.

Remark 1.2. Note that $\sup_{z\in X}\nu(B_\delta(z)) \gt 0$ follows from the separability of X: assume that $(z_{n})_{n\in{\mathbb{N}}}$ is dense in X, δ > 0 and $\nu(B_{\delta}(z_{n})) = 0$ for each $n\in{\mathbb{N}}$. Then $\nu(X) \leqslant \sum_{n\in{\mathbb{N}}} \nu(B_{\delta}(z_{n})) = 0$ (since $X \subseteq \bigcup_{n\in{\mathbb{N}}} B_{\delta}(z_{n})$) and ν could not be a probability measure.

We work with the following rather general definition of MAP estimators:

Definition 1.3 (Ayanbayev et al 2021a, definition 3.6).

Let X be a separable metric space and ν be a probability measure on X. A strong mode for ν is any $z\in X$ satisfying

$\lim_{\delta\searrow 0} \mathfrak{R}_{\nu}^{\delta}(z,\sup) = 1. \qquad (1.3)$

If $\nu = \mu^y$ is a Bayesian posterior measure given by (1.2), then we call any strong mode a MAP estimator.

Other sources, especially from the physics community, see e.g. (Dürr and Bach 1978), (informally) define the MAP estimator as the minimizer of the so-called Onsager–Machlup (OM) functional, which can be thought of as a generalization of the negative posterior log-density (Dashti et al 2013):

Definition 1.4. Let µ be a Gaussian (prior) measure on a separable Banach space X with Cameron–Martin space $(E,\lvert \boldsymbol{\cdot} \rvert_{E})$ and $\Phi\colon X \to \mathbb{R}$ be such that $\exp(-\Phi)$ is µ-integrable. We define the Onsager–Machlup (OM) functional $I\colon E\to \mathbb{R}$ corresponding to µy given by (1.2) by

$I(u) := \Phi(u) + \frac{1}{2} \lvert u \rvert_{E}^{2}, \qquad u \in E. \qquad (1.4)$

The connection between OM minimizers and MAP estimators is non-trivial in general separable Banach spaces. Natural questions arising in this context are

  • whether (or under which conditions) MAP estimators exist and
  • whether MAP estimators can equivalently be characterized as minimizers of the OM functional.

One fundamental ingredient, and the most direct reason why small-ball probabilities are related to the functional I, is the following theorem about the OM functional:

Theorem 1.5 (Dashti et al 2013, theorem 3.2).

Let assumption 2.1 hold. Then for $z_1,z_2\in E$,

$\lim_{\delta\searrow 0} \mathfrak{R}_{\mu^{y}}^{\delta}(z_1,z_2) = \exp\big(I(z_2) - I(z_1)\big).$

However, theorem 1.5 does not yield the full answer regarding the connection of MAP estimators and OM minimizers—not only is it restricted to elements of the Cameron–Martin space E, also it only provides pairwise comparisons of two points $z_{1},z_{2}\in E$, while MAP estimators require consideration of the ratio $\mathfrak{R}_{\mu^{y}}^{\delta}(z_1,\sup)$ and its limit as $\delta\searrow 0$.

Remark 1.6. Note that I amounts to a Tikhonov–Phillips regularization of the misfit functional Φ, so the results in this manuscript are also to be understood in the context of regularized optimization.

Dashti et al (2013) discussed, for the first time, the existence of MAP estimators as well as their connection to minimizers of the OM functional, in the specific setting of a Bayesian inverse problem of type (1.1). More precisely, they claim to prove the following statements for every separable Banach space X under assumption 2.1 below (Dashti et al 2013, theorem 3.5):

  • (I)  
    Let $z^{\delta} = \mathrm{argmax}_{z\in X} \mu^{y}(B_{\delta}(z))$. There exists a subsequence of $(z^{\delta})_{\delta \gt 0}$ that converges strongly in X to some element $\overline{z}\in E$.
  • (II)  
    The limit $\overline{z}$ is a MAP estimator of µy (this proves existence of such an object) and it is a minimizer of the OM functional.

However, while the ideas of Dashti et al (2013) are groundbreaking, their proof of the above statements, as well as the corrections provided by Kretschmann (2019), rely on techniques that hold in separable Hilbert spaces rather than separable Banach spaces, see section 1.1.

Further, neither Dashti et al (2013) nor Kretschmann (2019) show the existence of the δ-ball maximizers zδ above, which are the central objects in their proofs. It turns out that the existence of zδ is a highly non-trivial issue and has recently been discussed by Lambley and Sullivan (2022), who proved their existence for certain measures (including posteriors arising from non-degenerate Gaussian priors on $\ell^p$) and gave counterexamples for others.

Our approach relies on asymptotic maximizers in the following sense, which are guaranteed to exist by the definition of the supremum (in fact, even for arbitrary families $(\varepsilon^\delta)_{\delta \gt 0}$ in $(0,1)$).

Definition 1.7. Let X be a separable metric space and ν be a probability measure on X. A family $(\zeta^\delta)_{\delta \gt 0}\subset X$ is called an asymptotic maximizing family (AMF) for ν, if there exists a family $(\varepsilon^\delta)_{\delta \gt 0}$ in $(0,1)$ such that $\varepsilon^\delta \searrow 0$ as $\delta\searrow 0$ and, for each δ > 0,

$\nu(B_{\delta}(\zeta^{\delta})) \geqslant (1-\varepsilon^{\delta})\, \sup_{z\in X} \nu(B_{\delta}(z)). \qquad (1.5)$

Lemma 1.8. For any separable metric space X and any probability measure ν on X, there exists an AMF for ν. Further, if $\bar{z}$ is a MAP estimator for ν, then the constant family $(\bar{z})_{\delta \gt 0}$ forms an AMF for ν.

Proof. This follows directly from definitions 1.3 and 1.7 and the definition of the supremum (in fact, for any family $(\varepsilon^\delta)_{\delta \gt 0}$ a corresponding asymptotic maximizing family can be found).

The statements corresponding to (I)–(II) are given in conjecture 2.3. Note that we strengthen those statements by stating the equivalence of MAP estimators, minimizers of the OM functional and limit points of AMFs. The latter equivalence, in particular, cannot be expected for the δ-ball maximizers zδ , even when they exist and are unique: it is easy to construct MAP estimators that are not limit points of $(z^{\delta})_{\delta \gt 0}$ as $\delta \searrow 0$, even for continuous measures on $\mathbb{R}^{1}$. Apart from their guaranteed existence, this is yet another advantage of working with AMFs $(\zeta^{\delta})_{\delta \gt 0}$ rather than with $(z^{\delta})_{\delta \gt 0}$.
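As a sanity check of definition 1.7, an explicit AMF is easy to exhibit for $\nu = {\mathscr{N}}(0,1)$ on $\mathbb{R}$, where $\sup_{z}\nu(B_\delta(z)) = \nu(B_\delta(0))$ by symmetry and unimodality. The off-center choices $\zeta^\delta = \delta$ with $\varepsilon^\delta = \delta^2$ (both illustrative, verified numerically below for a few values of δ) satisfy (1.5) while converging to the strong mode 0:

```python
import math

def std_normal_cdf(t):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2)))

def ball_mass(center, delta):
    """nu(B_delta(center)) for nu = N(0, 1)."""
    return std_normal_cdf(center + delta) - std_normal_cdf(center - delta)

for delta in [0.5, 0.1, 0.01]:
    zeta = delta  # candidate AMF member zeta^delta (illustrative choice)
    ratio = ball_mass(zeta, delta) / ball_mass(0.0, delta)  # R^delta(zeta, sup)
    print(delta, ratio, ratio >= 1 - delta**2)
```

The printed ratios lie in $(1-\delta^2, 1]$ and increase towards 1 as δ shrinks, so the family is asymptotically maximizing even though no single $\zeta^\delta$ is an exact δ-ball maximizer.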

1.1. Why this paper is necessary

The contribution of this paper is twofold:

  • 1.  
    Remedy the shortcomings of previous work on the existence of MAP estimators mentioned above and listed in detail below, resulting in a corrected and strongly simplified proof of the existence of MAP estimators in the Hilbert space setting (theorem 2.4, proven in section 3);
  • 2.  
    Generalize the corresponding result from Hilbert spaces to sequence spaces $X = \ell^{p}$, $1 \leqslant p \lt \infty$, of pth-power summable sequences and diagonal, non-degenerate Gaussian prior measures (theorem 2.5, proven in section 4). For this purpose, we develop a novel and non-trivial convexification argument for the difference between the Cameron–Martin norm $|\boldsymbol{\cdot}|_E$ and the ambient space norm $\|\boldsymbol{\cdot}\|_X$ in proposition 4.6.

The shortcomings of previous work on the existence of MAP estimators include:

  • The crucial object in the proofs of (Dashti et al 2013), the δ-ball maximizer $z^{\delta} = \mathrm{argmax}_{z\in X}\, \mu^{y}(B_{\delta}(z))$, is defined without a proof of its existence. This is a highly non-trivial issue which was not fixed by the corrections in Kretschmann (2019). In (Lambley and Sullivan 2022, example 4.8), the authors construct a probability measure on a separable metric space without such δ-ball maximizers zδ , but prove in (Lambley and Sullivan 2022, corollary 4.10) that such maximizers exist for posteriors arising from non-degenerate Gaussian priors on $\ell^p$.
  • Specific Hilbert space properties are used in Banach spaces, in particular, the proof of (Dashti et al 2013, theorem 3.5) relies heavily on the existence of an orthogonal basis of the Cameron–Martin space which satisfies $\lVert x \rVert_{X}^{2} = \sum_{n\in{\mathbb{N}}} x_{n}^{2}$ for $x\in X$, where xn are the coordinates of x in that basis.
  • While the defining property of a MAP estimator $z\in X$ is given by $\lim_{\delta\searrow 0} \mathfrak{R}_{\mu^{y}}^{\delta}(z,\sup) = 1$, the proof of (Dashti et al 2013, theorem 3.5) considers this limit only for a specific null sequence $(\delta_{m})_{m\in{\mathbb{N}}}$. This is hidden in their notation, where, for simplicity, they adopt the notation $(z^{\delta})_{\delta \gt 0}$ for subsequences—a rather typical abuse of notation which is illegitimate in this specific case, since different null sequences $(\delta_{m})_{m \in {\mathbb{N}}}$ can yield different candidates for MAP estimators.
  • While Dashti et al (2013, lemma 3.9) is stated for $\bar{z} = 0$, it is later applied to more general $\bar{z}\in X$. In Banach spaces, the validity of this substitution is equivalent to tacitly assuming the Radon–Riesz property, which only holds for a strict subset of separable Banach spaces (and excludes the paradigmatic case $X = \ell^1$).
  • The proof of (Dashti et al 2013, corollary 3.10) relies on MAP estimators being limit points of $(z^{\delta})_{\delta \gt 0}$. However, only the reverse implication had been discussed, and, in fact, this implication is incorrect even when zδ , δ > 0, is guaranteed to exist, as can be easily seen from the following simple example of a bimodal distribution on $\mathbb{R}^{1}$: let $0 \lt \sigma \lt 1$ and µy have Lebesgue density $\rho^y(x) \propto \exp(-(x-1)^2/2)\chi_{\mathbb{R}^+} + \exp(-(x+1)^2/(2\cdot \sigma^2))\chi_{\mathbb{R}^-}$. Then $z^\delta = 1$ for all $\delta \lt \frac{1}{2}$, but both $x = \pm 1$ are true MAP estimators. To remedy this, we work with the AMFs introduced in definition 1.7, whose limit points we show to coincide with MAP estimators.
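The claims in this bimodal example can be checked in closed form via the normal CDF. The sketch below (taking the illustrative value $\sigma = 0.5$ and comparing the balls centered at the two peaks) confirms that for a fixed δ the ball around +1 carries strictly more mass than the ball around −1, while the mass ratio tends to 1 as $\delta \searrow 0$, so that both $\pm 1$ are MAP estimators:

```python
import math

SIGMA = 0.5  # width of the left peak, 0 < sigma < 1 (illustrative)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2)))

def mass_right(delta):
    """Unnormalized mass of B_delta(1) under exp(-(x-1)^2/2) * chi_{R+} (delta < 1/2)."""
    return math.sqrt(2 * math.pi) * (2 * std_normal_cdf(delta) - 1)

def mass_left(delta):
    """Unnormalized mass of B_delta(-1) under exp(-(x+1)^2/(2 sigma^2)) * chi_{R-}."""
    return SIGMA * math.sqrt(2 * math.pi) * (2 * std_normal_cdf(delta / SIGMA) - 1)

print(mass_right(0.4) > mass_left(0.4))             # the right peak wins for fixed delta
print(abs(mass_right(1e-6) / mass_left(1e-6) - 1))  # ratio -> 1 as delta -> 0
```

For small δ both ball masses behave like $2\delta$ times the (equal) peak heights, which is why the ratio approaches 1 in the limit.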

Conjecture 2.3 in general separable Banach spaces and general Gaussian measures remains unsolved and is an extremely intricate issue. The 'skeleton' of our proofs is provided in theorem 2.8, where the main steps are shown under suitable conditions (while proving those conditions in specific settings typically requires a lot of work). This establishes a framework for proving conjecture 2.3 in other Banach spaces, thereby paving the way for future research on this topic.

1.2. Related work

The definition of strong modes by Dashti et al (2013) has sparked a series of papers with variations on this concept, most notably generalized strong modes (Clason et al 2019) and weak modes (Helin and Burger 2015). Agapiou et al (2018) studied the MAP estimator for Bayesian inversion with sparsity-promoting Besov priors. The connection between weak and strong modes was further explored in Lie and Sullivan (2018), and Ayanbayev et al (2021a, 2021b) discussed stability and convergence of global weak modes using Γ-convergence. Recently, Lambley and Sullivan (2022) presented a perspective on modes via order theory.

1.3. Structure of this manuscript

Section 2 describes the common framework along which the well-definedness of MAP estimators can be proven in all cases considered (Hilbert space and $X = \ell^p$) and, possibly, further separable Banach spaces. Sections 3 and 4 apply this framework in order to prove well-definedness of the MAP estimator in the Hilbert space and $\ell^p$ case, respectively.

2. Existence of maximum a posteriori estimators

This section covers all the main results mentioned in the introduction. Throughout the paper, we will make the following general assumptions:

Assumption 2.1. Let $(X,\|\boldsymbol{\cdot}\|_X)$ be a separable Banach space, which we call the ambient space, and µ be a non-degenerate centered Gaussian (prior) probability measure on X. Let $(E,\lvert \boldsymbol{\cdot} \rvert_{E})$ denote the corresponding Cameron–Martin space and µy be the (posterior) probability measure on X given by (1.2), where the potential $\Phi \colon X \to \mathbb{R}$ satisfies the following conditions:

  • (a)  
    Φ is globally bounded from below, i.e. there exists $M\in \mathbb{R}$ such that for all $u\in X$,
    $\Phi(u) \geqslant M;$
  • (b)  
    Φ is locally bounded from above, i.e. for every r > 0 there exists $K(r) \gt 0$ such that for all $u\in X$ with $\|u\|_X \lt r$ we have
    $\Phi(u) \leqslant K(r);$
  • (c)  
    Φ is locally Lipschitz continuous, i.e. for every r > 0 there exists $L(r) \gt 0$ such that for all $u_1,u_2\in X$ with $\|u_1\|_X,\|u_2\|_X \leqslant r$ we have
    $\lvert \Phi(u_1) - \Phi(u_2) \rvert \leqslant L(r)\, \lVert u_1 - u_2 \rVert_X.$

Purely for convenience, we assume that $\Phi(0) = 0$. This can be easily achieved by subtracting $\Phi(0)$ from Φ and incorporating the resulting additional prefactor into the normalization constant Z in (1.2).

Remark 2.2. Conditions (a)–(c) are identical to (Dashti et al 2013, assumption 2.1), except that (a) is slightly stronger: (Dashti et al 2013) initially assume the weaker inequality $\Phi(u)\geqslant M - \varepsilon \|u\|_{X}^{2}$ for every ε > 0, but also make the additional assumption of global boundedness from below (in the sense of (a) in assumption 2.1) in their main theorem 3.5. This assumption is usually not too restrictive as our condition (a) still covers most practical Bayesian inverse problems, since Φ is typically even non-negative (cf introduction). Further, the non-degeneracy of µ together with the above conditions guarantees that the ratios $\mathfrak{R}_{\mu}^{\delta}(w,z)$ and $\mathfrak{R}_{\mu^{y}}^{\delta}(w,z)$ etc are always well-defined. Given the assumption $\Phi(0) = 0$, condition (b) is implied by (c), but we keep the conditions separated for didactic reasons and comparability to previous papers.

First, let us restate the result in (Dashti et al 2013, theorem 3.5) as a conjecture, since their proof is correct only in Hilbert spaces (and even there only partially, due to the unclear existence of δ-ball maximizing centers zδ ), while the Banach space version remains an open problem:

Conjecture 2.3. Let assumption 2.1 hold. Then:

  • (a)  
    The following statements are equivalent:
    • (i)  
      $\bar z$ is an X-strong limit point as δ → 0 of some AMF for µy .
    • (ii)  
      $\bar z\in E$ and $\bar z$ minimizes the OM functional.
    • (iii)  
      $\bar z$ is a MAP estimator.
  • (b)  
    There exists at least one MAP estimator.

The main goal of this paper is to provide proofs of conjecture 2.3 in the special cases where

  • X is a separable Hilbert space (theorem 2.4), where we correct and strongly simplify the proofs initially proposed by (Dashti et al 2013) and worked out in detail in the PhD Thesis of Kretschmann (2019), or
  • $X = \ell^{p}$ with $p\in [1,\infty)$ and $\mu = \otimes_{k\in{\mathbb{N}}} {\mathscr{N}}(0,\sigma_k^2)$ is a diagonal Gaussian measure on X (theorem 2.5), which is an entirely new result.

Theorem 2.4. Let assumption 2.1 hold. Then conjecture 2.3 holds for any separable Hilbert space $X = {\mathscr{H}}$.

Proof. See section 3.

Theorem 2.5. Let assumption 2.1 hold. Then conjecture 2.3 holds for $X = \ell^p$, $p\in[1,\infty)$, and any diagonal Gaussian (prior) measure $\mu = \otimes_k {\mathscr{N}}(0,\sigma_k^2)$ on X.

Proof. See section 4.

2.1. Proof strategy

In order to prove theorems 2.4 and 2.5, we proceed along the following seven steps, where $(\zeta^{\delta})_{\delta \gt 0}$ is an arbitrary AMF for µy and $(\delta_{m})_{m\in{\mathbb{N}}}$ denotes an arbitrary null sequence. This is a rather general approach and can be followed to prove conjecture 2.3 for further classes of Banach spaces.

  • (i)  
    Show that $(\zeta^{\delta_{m}})_{m\in{\mathbb{N}}}$ is bounded.
  • (ii)  
    Extract a weakly convergent subsequence of $(\zeta^{\delta_{m}})_{m\in{\mathbb{N}}}$, which, for simplicity, we denote by the same symbol, with weak limit $\bar{z} \in X$.
  • (iii)  
    Prove that $\bar{z}$ lies in the Cameron–Martin space E.
  • (iv)  
    Show that the convergence is, in fact, strong: $\lVert \zeta^{\delta_{m}} - \bar{z} \rVert_{X} \to 0$ as $m\to\infty$.
  • (v)  
    Infer that any limit point $\bar{z}$ of an AMF (not just the one obtained in (ii)–(iv)) is a MAP estimator, proving its existence.
  • (vi)  
    Prove that any MAP estimator minimizes the OM functional and is a limit point of some AMF.
  • (vii)  
    Show that any OM minimizer is also a MAP estimator.

An illustration of how this proof strategy fits within the context of conjecture 2.3 can be found in figure 1.


Figure 1. Strategy for proving the existence and equivalence of AMF limit points, MAP estimators and OM minimizers.


The proof of (i), (iii) and (iv) is highly non-trivial and relies on the following idea: First, we prove that, under assumption 2.1, the fraction $\mathfrak{R}_{\mu}^{\delta_{m}}(\zeta^{\delta_m},0)$ is bounded away from 0, meaning that the $\zeta^{\delta_m}$ do not carry negligible prior ball mass in the asymptotic limit. Second, we show for any sequence $(x_{m})_{m\in{\mathbb{N}}}$ in X that, if either

  • $(x_{m})_{m\in{\mathbb{N}}}$ is unbounded or
  • $x_{m} \rightharpoonup \bar{z}$ with $\bar{z}\notin E$ or
  • $x_{m} \rightharpoonup \bar{z} \in E$ but $\lVert x_{m} - \bar{z} \rVert_{X} \not\to 0$,

then $\liminf_{m \to \infty}\, \mathfrak{R}_{\mu}^{\delta_{m}}(x_{m},0) = 0,$

providing a contradiction for $x_{m} = \zeta^{\delta_{m}}$. The three properties described above, as well as (ii), are formulated in condition 2.7 (C1)–(C4) and stated as assumptions in theorem 2.8, which can therefore be seen as a 'shell theorem'. Note that steps (v), (vi) and (vii) then follow in any separable Banach space.

Finally, we prove condition 2.7 (C1)–(C4) and finalize the proof of conjecture 2.3 in the two mentioned cases—section 3 covers the case where X is a Hilbert space (theorem 2.4), while section 4 considers $X = \ell^{p}$, $1\leqslant p \lt \infty$, and diagonal Gaussian measures (theorem 2.5).

Remark 2.6. Apart from providing a 'skeleton' for the proof of conjecture 2.3, the strength of theorem 2.8 lies in its generality: It holds for any separable Banach space and thereby paves the way for future research. Further, remarkably, while condition 2.7 (C1)–(C4) are stated in terms of the prior measure µ, the conclusions are drawn for MAP estimators of µy , with assumption 2.1 providing the sufficient conditions for comparability between prior and posterior in order to make this possible.

2.2. A framework for proving existence of MAP estimators

While we use the proof strategy described above to prove theorems 2.4 and 2.5, it paves the way for further research. Note that theorem 2.8 is applicable to any separable Banach space, so this approach can be followed to prove conjecture 2.3 for other classes of Banach spaces.

Condition 2.7. Under assumption 2.1, we introduce the following four conditions:

  • (C1)  
    (vanishing condition for unbounded sequences)—For any null sequence $(\delta_{m})_{m \in {\mathbb{N}}}$ in $\mathbb{R}^{+}$ and unbounded sequence $(x_m)_{m\in{\mathbb{N}}}$ in X, $\liminf_{m \to \infty} \mathfrak{R}_{\mu}^{\delta_{m}}(x_m,0) = 0$.
  • (C2)  
    (weakly convergent subsequence condition)—If $(\delta_{m})_{m \in {\mathbb{N}}}$ is a null sequence in $\mathbb{R}^{+}$ and $(x_m)_{m\in{\mathbb{N}}}$ is a bounded sequence in X such that there exists K > 0 satisfying, for each $m\in{\mathbb{N}}$, $\mathfrak{R}_{\mu}^{\delta_{m}}(x_m,0) \geqslant K$, then $(x_m)_{m\in{\mathbb{N}}}$ has a weakly convergent subsequence.
  • (C3)  
    (vanishing condition for weak limits outside E)—For any null sequence $(\delta_{m})_{m \in {\mathbb{N}}}$ in $\mathbb{R}^{+}$ and weakly convergent sequence $(x_{m})_{m \in {\mathbb{N}}}$ in X with weak limit $\bar z \notin E$, $\liminf_{m \to \infty} \mathfrak{R}_{\mu}^{\delta_{m}}(x_m,0) = 0$.
  • (C4)  
    (vanishing condition for weakly, but not strongly convergent sequences)—For any null sequence $(\delta_{m})_{m \in {\mathbb{N}}}$ in $\mathbb{R}^{+}$ and weakly, but not strongly convergent sequence $(x_{m})_{m \in {\mathbb{N}}}$ in X with weak limit $\bar z \in E$, $\liminf_{m \to \infty} \mathfrak{R}_{\mu}^{\delta_{m}}(x_m,0) = 0$.

Theorem 2.8. Let assumption 2.1 hold and $(\zeta^{\delta})_{\delta \gt 0}$ be any asymptotic maximizing family (AMF) in X. Then there exist constants K > 0 and $\delta_0\gt 0$, such that, for any $0 \lt \delta \lt \delta_0$,

$\mathfrak{R}_{\mu}^{\delta}(\zeta^{\delta},0) \geqslant K. \qquad (2.1)$

It follows that:

  • (a)  
    If condition 2.7 (C1)–(C4) hold, $(\zeta^{\delta})_{\delta \gt 0}$ is an AMF in X and $(\delta_{m})_{m\in{\mathbb{N}}}$ is a null sequence, then $(\zeta^{\delta_m})_{m\in{\mathbb{N}}}$ has a subsequence which converges strongly (in X) to an element $\bar w\in E$ and any limit point $\bar{z}$ of $(\zeta^{\delta})_{\delta \gt 0}$ lies in E and is a MAP estimator for µy .
  • (b)  
    If condition 2.7 (C3) holds, then any MAP estimator for µy is an element of the Cameron–Martin space E, minimizes the OM functional and is a limit point of some AMF.
  • (c)  
    If condition 2.7 (C3) holds and µy has a MAP estimator $\bar{z}$, then any minimizer $\bar{x}\in E$ of the OM functional is also a MAP estimator.

In particular, if condition 2.7 (C1)–(C4) are satisfied, then conjecture 2.3 holds.

Proof. Due to assumption 2.1(c) and definition 1.7 there exists a family $(\varepsilon^\delta)_{\delta \gt 0}$ such that $\varepsilon^\delta \searrow 0$ for $\delta \searrow 0$, and, for any $0 \lt \delta \leqslant 1$,

Equation (2.2)

Furthermore, by assumption 2.1(a), for any $z\in X$ and δ > 0,

Equation (2.3)

Choosing $0 \lt \delta_{0} \leqslant 1$ such that $\varepsilon^\delta \lt 1/2$ for each $0 \lt \delta \lt \delta_{0}$, and denoting $K := e^{M-L(1)}/2$,

proving (2.1).

Proving (a). Consider the sequence $\zeta^{\delta_m}$ with $\delta_{m} \searrow 0$ as $m\to\infty$. Then

  • (i)  
    condition 2.7 (C1) implies boundedness of $(\zeta^{\delta_m})_{m\in{\mathbb{N}}}$ in X,
  • (ii)  
    condition 2.7 (C2) implies that $(\zeta^{\delta_m})_{m\in{\mathbb{N}}}$ has a weakly (in X) convergent subsequence with weak limit point $\bar w \in X$.
  • (iii)  
    condition 2.7 (C3) implies that any weak (in X) limit point $\bar z \in X$ of $(\zeta^{\delta_{m}})_{m\in{\mathbb{N}}}$ lies in the Cameron–Martin space E.
  • (iv)  
    condition 2.7 (C4) implies that any weak (in X) limit point $\bar z \in E$ of $(\zeta^{\delta_{m}})_{m\in{\mathbb{N}}}$ is also a strong (in X) limit point of $(\zeta^{\delta_{m}})_{m\in{\mathbb{N}}}$.

In particular, there exists a subsequence of $(\zeta^{\delta_{m}})_{m\in{\mathbb{N}}}$ which converges strongly (in X) to some $\bar w\in E$. This proves the first part of (a).

Now let $\bar z$ be any limit point of $(\zeta^{\delta})_{\delta \gt 0}$ and $(\delta_{m})_{m\in{\mathbb{N}}}$ be such that $(\zeta^{\delta_{m}})_{m\in{\mathbb{N}}}$ converges (strongly) to $\bar z$. Note that $\bar z\in E$ by (iii). We set

Using the local Lipschitz constant L(r) for Φ on $B_r(0)$ (see assumption 2.1(c)), we obtain, for any $m\in{\mathbb{N}}$,

Since $\zeta^{\delta_{m}} \to \bar z$ as $m\to\infty$, lemma A.2 and definition 1.7 of AMFs imply

Equation (2.4)

If we can show that $\limsup_{\delta\searrow 0}\mathfrak{R}_{\mu^{y}}^{\delta}(\sup,\bar z)\leqslant 1$ (i.e. for any null sequence, not just for $(\delta_{m})_{m\in{\mathbb{N}}}$), then, since $\mathfrak{R}_{\mu^{y}}^{\delta}(\sup,\bar z) \geqslant 1 $ for each δ > 0, this implies that in fact $\lim_{\delta\searrow 0}\mathfrak{R}_{\mu^{y}}^{\delta}(\sup,\bar z) = 1$, proving that $\bar z$ is a MAP estimator and finalizing the proof. For this purpose assume otherwise, i.e. there exists a null sequence $(\varepsilon_{m})_{m\in{\mathbb{N}}}$ such that $\limsup_{m\to\infty} \, \mathfrak{R}_{\mu^{y}}^{\varepsilon_m}(\sup,\bar z) \gt 1$.

By the same argument as in (i)–(iv), there exists a subsequence of $(\zeta^{\varepsilon_{m}})_{m\in{\mathbb{N}}}$, which, for simplicity, we denote by the same symbol, that converges strongly to some element $\bar{x} \in E$. Similarly to (2.4) we obtain

Equation (2.5)

Now, since $\bar{x},\bar{z} \in E$, the property of the OM functional, theorem 1.5, guarantees the existence of the limit $\lim_{\delta\searrow 0} \mathfrak{R}_{\mu^{y}}^{\delta}(\bar x,\bar z)$ and therefore (2.4) implies

Equation (2.6)

It follows from (2.5) and (2.6) that

which is a contradiction, finalizing the proof.

Proving (b). Now let $\bar z\in X$ be any MAP estimator (not necessarily the one obtained as the limit of $\zeta^{\delta_m}$). Assuming $\bar z \notin E$ and considering the constant sequence $(\bar z)_{m\in{\mathbb{N}}}$ (clearly converging to $\bar{z}$), the vanishing condition for weak limits outside E, condition 2.7 (C3), implies that

for any null sequence $(\delta_{m})_{m\in{\mathbb{N}}}$. Since the constant family $(\bar{z})_{\delta \gt 0}$ is an AMF for µy by lemma 1.8, (2.1) implies

This contradiction proves $\bar{z}\in E$. By definition of MAP estimators and theorem 1.5, it follows for any $z^\star \in E$ that

Hence, $I(z^\star) \geqslant I(\bar z)$ and $\bar{z}$ is a minimizer of the OM functional. Finally, by lemma 1.8, $\bar{z}$ is also a limit point of the constant AMF $(\bar{z})_{\delta \gt 0}$.

Proving (c). By (b), $\bar{z}\in E$ and minimizes the OM functional I, hence $I(\bar{z}) = I(\bar{x})$. It follows from theorem 1.5 that

proving (c).

In summary, we have shown that each AMF (the existence of some AMF follows from lemma 1.8) has a limit point $\bar{z}\in E$, which is a MAP estimator. Furthermore, each limit point of an AMF lies in E and is a MAP estimator. In addition, any MAP estimator minimizes the OM functional and is a limit point of some AMF. Finally, each minimizer of the OM functional is a MAP estimator. Together, this proves conjecture 2.3.

2.3. Some comments on the proof of condition 2.7 (C1)–(C4)

The main obstacle in proving theorems 2.4 and 2.5 is the verification of condition 2.7 (C1)–(C4). Let us briefly summarize one of the main ideas, demonstrated on the derivation of the vanishing condition for unbounded sequences (C1) in the finite-dimensional setting $X = \mathbb{R}^{k}$, $k\in{\mathbb{N}}$: our aim is to show that, for any δ > 0, the ratio $\mathfrak{R}_{\mu}^{\delta}(x,0)$ decays to zero as $\lVert x \rVert_{X} \to \infty$. For this purpose we extract a certain prefactor from the integrals by means of an auxiliary function $L \colon \mathbb{R}^{k} \to \mathbb{R}$.

If L satisfies the following conditions,

  • (i)  
    there exists α > 0 and $\kappa_{1},\kappa_{2} \geqslant 0$ such that, for each $v\in\mathbb{R}^{k}$, $\lVert v \rVert_{X}^{\alpha} - \kappa_{1} \leqslant L(v) \leqslant \lVert v \rVert_{X}^{\alpha} + \kappa_{2}$,
  • (ii)  
    $\lvert \boldsymbol{\cdot} \rvert_{E}^{2} - L$ is non-negative and convex,

then (ii) implies that, by Anderson's inequality, we can bound the remaining ratio of integrals from above by 1, while (i) implies that, for any fixed δ > 0, the first fraction vanishes as $\lVert x \rVert_{X} \to \infty$.

In separable Hilbert spaces $X = {\mathscr{H}}$ a function L satisfying (i)–(ii) is not hard to find (in both finite and infinite dimensions), since both $\lVert \boldsymbol{\cdot} \rVert_{{\mathscr{H}}}$ and $\lvert \boldsymbol{\cdot} \rvert_{E}$ are quadratic. In general separable Banach spaces the large discrepancy between the geometries induced by the norms $\lVert \boldsymbol{\cdot} \rVert_{X}$ and $\lvert \boldsymbol{\cdot} \rvert_{E}$ strongly complicates the search for such a function L, where convexity is particularly hard to ensure. For $X = \ell^{p}$, the technical proposition 4.6 guarantees the existence of such a function L. This result, together with proposition 4.8, can be seen as the crux of the results presented in this paper.
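The role of Anderson's inequality—a centered Gaussian measure assigns at least as much mass to a symmetric convex set as to any of its translates—can be checked numerically in finite dimensions. The sketch below uses axis-aligned squares instead of balls so that the Gaussian mass factorizes over coordinates (the diagonal covariance values are illustrative assumptions):

```python
import math

SIGMAS = [1.0, 0.5]  # standard deviations of a diagonal Gaussian on R^2 (assumed)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2)))

def square_mass(center, delta):
    """mu(center + [-delta, delta]^2) for mu = N(0, diag(SIGMAS[0]^2, SIGMAS[1]^2)).

    Squares are symmetric convex sets, so Anderson's inequality applies to them
    just as it does to balls, and their mass is a product of 1D masses."""
    mass = 1.0
    for c, s in zip(center, SIGMAS):
        mass *= std_normal_cdf((c + delta) / s) - std_normal_cdf((c - delta) / s)
    return mass

# The centered square carries the largest mass among all translates.
for shift in ([0.3, 0.0], [0.0, -0.7], [1.0, 1.0]):
    print(square_mass([0.0, 0.0], 0.2) >= square_mass(shift, 0.2))  # True
```

Each one-dimensional factor $t \mapsto \Phi((t+\delta)/\sigma) - \Phi((t-\delta)/\sigma)$ is maximized at $t = 0$, which is the product-form manifestation of Anderson's inequality used above.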

3. The Hilbert space case: proof of theorem 2.4

In this section we treat the case where $X = \mathscr{H}$ is a Hilbert space, i.e. we prove theorem 2.4. These results have already been presented by Dashti et al (2013), with some corrections by Kretschmann (2019). However, neither of these manuscripts proves the existence of the central object in their proofs, namely the δ-ball maximizing centers $z^\delta = \mathrm{argmax}_x \mu^y(B_\delta(x))$, which seems to be a highly non-trivial issue, see Lambley and Sullivan (2022). This section closes this theoretical gap by working with the AMFs ζδ introduced in definition 1.7 and serves two further purposes:

First, the Hilbert space case provides insight into the main ideas of the proof of conjecture 2.3 with fewer technicalities than the more general case $X = \ell^{p}$. Second, we use a helpful statement from (Da Prato and Zabczyk 2002), restated in proposition 3.2 below, which streamlines our proofs and simplifies them considerably in comparison to (Dashti et al 2013, Kretschmann 2019).

Notation 3.1. Let $\mathscr{H}$ be an infinite-dimensional separable Hilbert space and $\mu = \mathscr N(0,Q)$ a centered and non-degenerate Gaussian measure on $\mathscr{H}$. As the covariance operator Q of µ is a self-adjoint, positive, trace-class operator (Baker 1973), there exists an orthonormal eigenbasis $(e_k)_{k\in{\mathbb{N}}}$ of Q in which $\mu = \otimes_{k\in{\mathbb{N}}} \mathscr N(0,\sigma_k^2)$ is a product measure of one-dimensional Gaussian measures, where $Qe_k = \sigma_k^2 e_k$ and $\sigma_{k} \gt 0$ for each $k\in{\mathbb{N}}$ and $\sum_{k\in{\mathbb{N}}}\sigma_k^2 \lt \infty$. We assume the eigenvalues to be decreasing, i.e. $\sigma_{1} \geqslant \sigma_{2} \geqslant \ldots$. We write $D = \mathrm{diag}(d_1,d_2,\ldots) := \sum_{k\in{\mathbb{N}}} d_{k}\, e_{k}\otimes e_{k}$ for any operator that is diagonal in the basis $(e_k)_{k\in{\mathbb{N}}}$. Denoting $a_k := \sigma_k^{-2}$ for $k \in {\mathbb{N}}$, the Cameron–Martin space of µ is given by

Equation (3.1)

$E = \big\{ x\in{\mathscr{H}} \colon \lvert x \rvert_{E} \lt \infty \big\}, \qquad \lvert x \rvert_{E}^{2} := \sum_{k\in{\mathbb{N}}} a_{k}\, \langle x, e_{k} \rangle_{\mathscr{H}}^{2},$

see Da Prato and Zabczyk (2014, theorem 2.23). Finally, we define the orthogonal projection operators $\Pi^k,\Pi_k\colon {\mathscr{H}} \to {\mathscr{H}}$, $k\in{\mathbb{N}}\cup \{0 \}$, by
$\Pi^{k} x := \sum_{j = 1}^{k} \langle x, e_{j} \rangle_{\mathscr{H}}\, e_{j}, \qquad \Pi_{k} x := \sum_{j = k+1}^{\infty} \langle x, e_{j} \rangle_{\mathscr{H}}\, e_{j} = x - \Pi^{k} x.$

Note that $\Pi^{0} = 0$ and $\Pi_0 = \mathrm{Id}$.

We start by recalling the following result, which will allow us to 'extract an exponential rate' by integrating over a slightly wider Gaussian measure:

Proposition 3.2 (Da Prato and Zabczyk 2002, proposition 1.3.11). Let $\Gamma \colon {\mathscr{H}} \to {\mathscr{H}}$ be self-adjoint such that $Q^{1/2}\Gamma Q^{1/2}$ is trace class on ${\mathscr{H}}$ and $\langle x, Q^{1/2}\Gamma Q^{1/2}x \rangle_{\mathscr{H}} \lt \|x\|_{{\mathscr{H}}}^2$ for all $x\in{\mathscr{H}}\setminus\{0\}$. Then for $\mu = {\mathscr{N}}(0,Q)$ and $\nu = {\mathscr{N}}(0, (Q^{-1}-\Gamma)^{-1})$ we have
$\int_{A} \exp\Big(\tfrac{1}{2}\, \langle x, \Gamma x \rangle_{\mathscr{H}}\Big)\, \mu(\mathrm{d}x) = \Big[\det\big(\mathrm{Id} - Q^{1/2}\Gamma Q^{1/2}\big)\Big]^{-1/2}\, \nu(A) \qquad \text{for every Borel set } A \subseteq {\mathscr{H}}.$

Remark 3.3. In one dimension this boils down to the following: let σ > 0 and $\mu = {\mathscr{N}}(0,\sigma^2)$. Then, for any $\gamma^{2} \lt \sigma^{-2}$,

$\int_{A} \exp\Big(\tfrac{\gamma^{2}}{2}\, x^{2}\Big)\, \mu(\mathrm{d}x) = \big(1-\gamma^{2}\sigma^{2}\big)^{-1/2}\, \nu(A) \qquad \text{for every Borel set } A \subseteq \mathbb{R},$

where $\nu = {\mathscr{N}}(0,\frac{\sigma^2}{1-\gamma^2\sigma^2}) = {\mathscr{N}}(0,(\sigma^{-2}-\gamma^2)^{-1})$.
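As a quick numerical sanity check, the one-dimensional identity can be verified by direct quadrature. The sketch below assumes the standard reading of remark 3.3, namely $\int_A e^{\gamma^2 x^2/2}\,\mu(\mathrm{d}x) = (1-\gamma^2\sigma^2)^{-1/2}\,\nu(A)$; the parameter values are arbitrary illustrative choices:

```python
import math

def gaussian_cdf_sym(t, s):
    """P(|Z| < t) for Z ~ N(0, s^2), via the error function."""
    return math.erf(t / (s * math.sqrt(2.0)))

def lhs(t, sigma, gamma, n=200_000):
    """Midpoint-rule approximation of the integral of exp(gamma^2 x^2 / 2)
    over (-t, t) with respect to mu = N(0, sigma^2)."""
    h = 2.0 * t / n
    total = 0.0
    for i in range(n):
        x = -t + (i + 0.5) * h
        density = math.exp(-x * x / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
        total += math.exp(0.5 * gamma**2 * x * x) * density * h
    return total

sigma, gamma, t = 1.3, 0.5, 2.0
assert gamma**2 < sigma**-2                      # admissibility condition of remark 3.3
s2 = sigma**2 / (1.0 - gamma**2 * sigma**2)      # variance of the widened measure nu
rhs = (1.0 - gamma**2 * sigma**2) ** -0.5 * gaussian_cdf_sym(t, math.sqrt(s2))
print(lhs(t, sigma, gamma), rhs)                 # the two values agree
```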

With this at hand, we can re-prove the following lemma (already stated in (Dashti et al 2013) and (Kretschmann 2019)):

Lemma 3.4 (Dashti et al 2013, lemma 3.6).

Let assumption 2.1 hold and $X = {\mathscr{H}}$ be a separable Hilbert space. Then, using notation 3.1, for any δ > 0, $z\in{\mathscr{H}}$ and $n\in {\mathbb{N}}$,

Proof. Using notation 3.1, for arbitrary $n\in{\mathbb{N}}$, let $\Gamma = \mathrm{diag}(0,\ldots,0,r,r,\ldots)$ with constant entries $0 \lt r \lt a_n$ starting at position n, such that $Q^{-1}-\Gamma = \mathrm{diag}(a_1,\ldots, a_{n-1},a_n-r, a_{n+1}-r,\ldots)$ is a valid precision (i.e. inverse covariance) operator of a Gaussian measure on ${\mathscr{H}}$. This means that $\langle x, \Gamma x\rangle = r\|\Pi_{n-1} x\|_X^2$. This choice of Γ fulfills the conditions of proposition 3.2: first, $(Q^{-1}-\Gamma)^{-1}$ is a valid covariance operator:
$(Q^{-1}-\Gamma)^{-1} = \mathrm{diag}\big(a_1^{-1},\ldots,a_{n-1}^{-1},(a_n-r)^{-1},(a_{n+1}-r)^{-1},\ldots\big), \qquad \sum_{m\geqslant n} (a_m - r)^{-1} = \sum_{m\geqslant n} \frac{\sigma_m^2}{1-r\sigma_m^2} \leqslant \frac{1}{1-r\sigma_n^2} \sum_{m\geqslant n} \sigma_m^2 \lt \infty.$

Second, since Q is trace class (Baker 1973), so is
$Q^{1/2}\Gamma Q^{1/2} = \mathrm{diag}\big(0,\ldots,0,r\sigma_n^2, r\sigma_{n+1}^2,\ldots\big), \qquad \mathrm{tr}\big(Q^{1/2}\Gamma Q^{1/2}\big) = r \sum_{m\geqslant n} \sigma_m^2 \leqslant r\, \mathrm{tr}(Q) \lt \infty.$

Finally, as $r \lt a_n = \sigma_n^{-2}$ and $\sigma_m^2 \leqslant \sigma_n^2$ for m > n, we also have that $r\sigma_m^2 \lt 1$ for all $m\geqslant n$, hence $\langle x,Q^{1/2}\Gamma Q^{1/2} x\rangle \lt \|x\|_X^2$ for all $x \neq 0$.

Thus, with $\nu = {\mathscr{N}}(0, (Q^{-1}-\Gamma)^{-1})$, proposition 3.2 implies for any δ > 0:

due to Anderson's inequality (theorem A.4 with $\gamma = \nu$, $A = B_\delta(0)$ and a = z). Since the above inequality holds for any $0 \lt r \lt a_n$, it also holds for $r = a_{n}$ by continuity, and the claim follows.
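The role Anderson's inequality plays here can be illustrated in one dimension, where $\mu(B_\delta(z))$ is available in closed form via the error function. The following sketch (with arbitrary illustrative parameters) checks that the Gaussian measure of a δ-ball is maximal at the origin and decreases as the center moves away:

```python
import math

def ball_prob(z, delta, sigma):
    """mu(B_delta(z)) for mu = N(0, sigma^2) on R, in closed form."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return Phi((z + delta) / sigma) - Phi((z - delta) / sigma)

sigma, delta = 1.0, 0.3
probs = [ball_prob(z, delta, sigma) for z in (0.0, 0.5, 1.0, 2.0, 4.0)]
# Anderson's inequality: the centred ball has maximal Gaussian measure,
# and shifting the ball away from the origin only decreases its measure.
assert all(p1 >= p2 for p1, p2 in zip(probs, probs[1:]))
```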

Corollary 3.5. Let assumption 2.1 hold and $X = {\mathscr{H}}$ be a separable Hilbert space. Then the vanishing condition for unbounded sequences, condition 2.7 (C1), holds.

Proof. Let $(\delta_{m})_{m \in {\mathbb{N}}}$ be a null sequence in $\mathbb{R}^{+}$ and $(x_m)_{m\in{\mathbb{N}}}$ be an unbounded sequence in X. We have to prove that for any ε > 0 and any $m\in {\mathbb{N}}$ there exists an $m^\star \geqslant m$ such that
$\mathfrak{R}_{\mu}^{\delta_{m^{\star}}}(x_{m^{\star}},0) \leqslant \varepsilon.$

Indeed, for arbitrary ε > 0 and $m\in {\mathbb{N}}$ there exists M > 0 such that $\frac{a_1 M^2}{4} \geqslant \log \varepsilon^{-1}$. Since $(\delta_{m})$ is a null sequence, there exists $m_1 \geqslant m$ such that $\delta_n \lt M/4$ for all $n\geqslant m_1$. By unboundedness of $(x_m)_{m\in{\mathbb{N}}}$ we can find an $m^\star\geqslant m_1\geqslant m$ such that $\|x_{m^\star}\|_{\mathscr{H}} \geqslant M$. Then, by lemma 3.4,

Similarly, we can shorten the proof of the following result:

Corollary 3.6 (Dashti et al (2013, lemma 3.7), Kretschmann (2019, lemma 4.11)).

Let assumption 2.1 hold and $X = {\mathscr{H}}$ be a separable Hilbert space. Then the vanishing condition for weak limits outside E, condition 2.7 (C3), is satisfied.

Proof. We use notation 3.1 throughout the proof.

Let $(\delta_{m})_{m \in {\mathbb{N}}}$ be a null sequence in $\mathbb{R}^{+}$ and $(x_m)_{m\in{\mathbb{N}}}$ be a weakly convergent sequence with weak limit $\bar{z} \notin E$.

Let ε > 0. Since $\bar{z}\notin E$, $\lvert \Pi^{n}\bar{z} \rvert_{E} \to \infty$ as $n\to\infty$ by (3.1), hence there exists $n \in {\mathbb{N}}$ (which we fix from now on) such that

Equation (3.2)

Note that $\Gamma := \mathrm{diag} (a_1/2, a_2/2, \ldots, a_n/2, 0,0, \ldots)$ is a valid choice for the operator Γ in proposition 3.2 and observe that

Equation (3.3)

Since weak convergence $x_m \rightharpoonup \bar z$ implies componentwise convergence, there exists $m_1\in {\mathbb{N}}$ such that, for any $m \geqslant m_1$, $\lvert \Pi^{n}(\bar{z} - x_m) \rvert_{E} \leqslant 1$. Since $(\delta_{m})_{m\in{\mathbb{N}}}$ is a null sequence, there exists $m^\star\geqslant m_1$ such that, for each $m\geqslant m^\star$, $\delta_{m}^2\leqslant \sigma_n^2/n$. It follows from (3.2) for any $m\geqslant m^\star$, any $z \in B_{\delta_m}(x_m)$ and any $w \in B_{\delta_m}(0)$, denoting $x_{m,j},z_j,w_j$ for the jth component of $x_m,z,w$, that

  • (i)  
    $\lvert \Pi^{n}(x_m-z) \rvert_{E}^{2} = \sum_{j = 1}^{n} \sigma_{j}^{-2} \lvert x_{m,j} - z_{j} \rvert^{2} \leqslant \sum_{j = 1}^{n} \sigma_{n}^{-2} \delta_{m}^{2} \leqslant \sum_{j = 1}^{n} n^{-1} = 1$;
  • (ii)  
    $\lvert \Pi^{n} z \rvert_{E} \geqslant \tfrac{1}{2}\lvert \Pi^{n} \bar{z} \rvert_{E} + \underbrace{\tfrac{1}{2} \lvert \Pi^{n} \bar{z} \rvert_{E}}_{\geqslant 2} - \underbrace{\lvert \Pi^{n}(\bar{z} - x_m) \rvert_{E}}_{\leqslant 1} - \underbrace{\lvert \Pi^{n}(x_m-z) \rvert_{E}}_{\leqslant 1} \geqslant \tfrac{1}{2}\lvert \Pi^{n} \bar{z} \rvert_{E}$;
  • (iii)  
    $\lvert \Pi^{n} w \rvert_{E}^2 = \sum_{j = 1}^{n} \sigma_{j}^{-2} \lvert w_{j} \rvert^{2} \leqslant \sum_{j = 1}^{n} \sigma_{n}^{-2} \delta_{m}^{2} \leqslant \sum_{j = 1}^{n} n^{-1} = 1 \leqslant \tfrac{1}{16}\lvert \Pi^{n} \bar{z} \rvert_{E}^2$.

Using (3.3) and Anderson's inequality (theorem A.4) applied to the Gaussian measure ν on ${\mathscr{H}}$ as defined in proposition 3.2, this implies, for any $m\geqslant m^\star$,

proving the claim.

Corollary 3.7 (Dashti et al 2013, lemma 3.9 and Kretschmann 2019, lemma 4.13).

Let assumption 2.1 hold and $X = {\mathscr{H}}$ be a separable Hilbert space. Then the vanishing condition for weakly, but not strongly convergent sequences, condition 2.7 (C4), is satisfied.

Proof. We use notation 3.1 throughout the proof. Let $(\delta_{m})_{m \in {\mathbb{N}}}$ be a null sequence in $\mathbb{R}^{+}$ and $(x_m)_{m\in{\mathbb{N}}}$ converge weakly, but not strongly to $\bar{z} \in E$. We will show that, for any ε > 0 and $m_{1}\in{\mathbb{N}}$, there exists $m^\star \geqslant m_{1}$ such that
$\mathfrak{R}_{\mu}^{\delta_{m^{\star}}}(x_{m^{\star}},0) \leqslant \varepsilon.$

Now let ε > 0 and $m_{1}\in{\mathbb{N}}$. Since weak convergence $x_m\rightharpoonup \bar z$ implies $\|\bar z\|_{\mathscr{H}}\leqslant \liminf_{m\to\infty} \|x_m\|_{\mathscr{H}}$ and as the convergence is not strong by assumption, the Radon–Riesz property guarantees the existence of c > 0 such that

Equation (3.4)

$\limsup_{m\to\infty}\, \lVert x_{m} \rVert_{\mathscr{H}} \gt \lVert \bar z \rVert_{\mathscr{H}} + c.$

(Otherwise, $\lim_{m\to\infty} \|x_m\| = \|\bar z\|_{\mathscr{H}}$, in which case weak convergence implies strong convergence.) Since $a_{k} \to \infty$ as $k\to\infty$, there exists $n \in {\mathbb{N}}$ (which we fix from now on) such that $a_{n} \geqslant - 24 c^{-2}\log\varepsilon$. Since $(\delta_{m})_{m\in{\mathbb{N}}}$ is a null sequence and weak convergence $x_m \rightharpoonup \bar z$ implies componentwise convergence, (3.4) guarantees the existence of $m^{\star} \geqslant m_{1}$ such that $\delta_{m^{\star}}\leqslant c/6$, $\lVert \Pi^{n}(\bar{z} - x_{m^{\star}}) \rVert_{{\mathscr{H}}} \lt c/2$ and $\|x_{m^{\star}}\|_{{\mathscr{H}}} \gt \|\bar z\|_{{\mathscr{H}}} + c$. This implies

and lemma 3.4 yields

Proof of theorem 2.4. By corollaries 3.5, 3.6 and 3.7, conditions 2.7 (C1), (C3) and (C4) are fulfilled, while the weakly convergent subsequence condition (C2) follows from the reflexivity of ${\mathscr{H}}$. Hence, all statements follow from theorem 2.8.

4. The case $X = \ell^{p}$: proof of theorem 2.5

In this section we will extend the results in section 3 to the spaces $X = \ell^{p}$, $1\leqslant p \lt \infty$, i.e. we will prove theorem 2.5. Note that theorem 2.5 is a genuine generalization of theorem 2.4, since the covariance structure in a Hilbert space can always be 'diagonalized' by choosing an orthonormal eigenbasis of the covariance operator, which is a consequence of the Karhunen–Loève expansion (Sprungk 2017, theorem 2.21). In other words, the Hilbert space case $({\mathscr{H}}, \mu)$ with an arbitrary non-degenerate Gaussian measure µ is equivalent to the case $(\ell^2, \otimes_{k\in{\mathbb{N}}} {\mathscr{N}}(0, \sigma_k^2))$, where $\sigma_k^2$ are the corresponding eigenvalues (note that the Cameron–Martin space E respects this equivalence due to (3.1)), and the setting considered in this manuscript corresponds to the canonical generalization from $\ell^2$ to $\ell^p$, $1\leqslant p \lt \infty$.
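The diagonalization argument can be illustrated in finite dimensions: rotating into the orthonormal eigenbasis of the covariance operator turns a correlated Gaussian into a product of one-dimensional Gaussians. A minimal sketch (the matrix A below is an arbitrary illustrative choice):

```python
import numpy as np

# A non-diagonal symmetric positive definite covariance on R^3 (toy example).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.5]])
Q = A @ A.T

# Orthonormal eigenbasis of Q: in these coordinates N(0, Q) is a product measure.
eigvals, V = np.linalg.eigh(Q)
assert np.all(eigvals > 0)      # non-degeneracy

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(3), Q, size=200_000)
coords = samples @ V            # coordinates with respect to the eigenbasis
emp_cov = np.cov(coords.T)
# Off-diagonal empirical covariances vanish; the diagonal recovers the eigenvalues.
assert np.allclose(emp_cov, np.diag(eigvals), atol=0.05)
```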

While our proof strategy is quite similar to that of (Dashti et al 2013), the strong discrepancy between the geometries of the unit balls in E and $X = \ell^{p}$ for p ≠ 2 poses a serious obstacle when attempting to extract an exponential decay rate from the ratio $\mathfrak{R}_{\mu}^{\delta}(z,0)$ for fixed δ > 0, similar to the statement of lemma 3.4 in the Hilbert space case.

To see exactly why this is problematic, let us revisit the crucial step in the proof of lemma 3.4. We set n = 1 for simplicity and focus on the finite-dimensional case (or a finite-dimensional approximation of the infinite-dimensional case), which allows us to write the integrals with respect to the Lebesgue measure. Since the Hilbert space norm coincides with an (unweighted) $\ell^2$-norm, we can extract a multiple of the Hilbert space norm out of the integral, where δ > 0, $z\in{\mathscr{H}}$ and r > 0 is a sufficiently small constant:

where the second factor (the ratio of the remaining integrals) can be bounded by 1 due to Anderson's inequality (theorem A.3) under some prerequisites: first, the ambient space norm $\|\boldsymbol{\cdot}\|_{\mathscr{H}}$ needs to be dominated by (a multiple of) the Cameron–Martin norm such that the integrand is integrable—this also holds in the Banach space case, simply by the compact embedding of E in X. Second, the function $|\boldsymbol{\cdot}|_E^2 - r \|\boldsymbol{\cdot}\|_{\mathscr{H}}^2$ needs to be convex. This is trivially satisfied in the Hilbert space case, since this difference is a positive semi-definite quadratic form, but it does not generalize to the Banach space case. Indeed, $|\boldsymbol{\cdot}|_E^2 - \beta \|\boldsymbol{\cdot} \|_p^2$ is not convex for p = 1 and any β > 0. This issue is solved (in the general $\ell^p$ case) by proposition 4.6, which demonstrates how to find functions L such that $|\boldsymbol{\cdot}|_E^2 - \beta L(\boldsymbol{\cdot})$ is convex and L is a suitable surrogate of the ambient space norm $\| \boldsymbol{\cdot} \|_p$, see figure 2 for an illustration.

Figure 2.

Figure 2. Visualization of the 2d case, $X = \ell^1$ and $\mu = {\mathscr{N}}(0,1)\otimes {\mathscr{N}}(0,1)$. Left: Plot of the function $(x_{1},x_{2}) \mapsto x_1^2+x_2^2 - \beta (|x_1|+|x_2|)^2$ for a specific β > 0. The level sets show that this function is non-convex (this is indeed true for any β > 0). Right: Plot of the function $(x_{1},x_{2}) \mapsto x_1^2+x_2^2 - \beta L(x_1,x_2)$ for suitable β, which is seen to be convex.

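The non-convexity shown in the left panel of figure 2 can be confirmed by a direct midpoint test across the kink of the $\ell^1$-norm; the value β = 0.1 and the test points are arbitrary illustrative choices:

```python
# f(x1, x2) = x1^2 + x2^2 - beta * (|x1| + |x2|)^2, the left panel of figure 2.
def f(x1, x2, beta):
    return x1**2 + x2**2 - beta * (abs(x1) + abs(x2))**2

beta, eps = 0.1, 0.1
a, b = (1.0, -eps), (1.0, eps)            # endpoints straddling the kink x2 = 0
mid = (1.0, 0.0)
lhs = f(*mid, beta)                        # value at the midpoint
rhs = 0.5 * (f(*a, beta) + f(*b, beta))    # average of the endpoint values
# Midpoint convexity fails: the kink of |x2| at 0 enters with a factor -beta.
assert lhs > rhs
```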

Proposition 4.8 then leverages this result into a generalization of lemma 3.4 to the $\ell^p$ case, after which the proofs of condition 2.7 and subsequently of theorem 2.5 are more or less straightforward.

When working in sequence spaces $X\subseteq \mathbb{R}^{{\mathbb{N}}}$, such as $\ell^{p}$ spaces, one important technique (Dashti et al 2013, Agapiou et al 2018, Ayanbayev et al 2021b) is to consider finite-dimensional approximations of $\mu(B_{\delta}(x))$, $x\in X$. For this purpose, we introduce the following notation:

Assumption 4.1. We consider $X = \ell^{p} := \ell^{p}({\mathbb{N}})$ with $1\leqslant p \lt \infty$ together with $\mu = \otimes_{j\in{\mathbb{N}}} {\mathscr{N}}(0,\sigma_{j}^2)$, a non-degenerate centered Gaussian measure on X with diagonal covariance structure, where $\sigma_{1}\geqslant \sigma_{2} \geqslant \ldots \gt 0$ and $\sum_{j\in{\mathbb{N}}} \sigma_j^p \lt \infty$.

Remark 4.2. The condition $\sum_{j\in{\mathbb{N}}} \sigma_j^p \lt \infty$ is a necessary condition for $\mu(X) = 1$ (i.e. for samples $(x_i)_{i\in{\mathbb{N}}}$ to lie in $\ell^p$ almost surely), see (Ayanbayev et al 2021b, lemma B.3).
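The role of $\sum_{j} \sigma_j^p$ can be illustrated via the identity $\mathbb{E}\,\|x\|_p^p = c_p \sum_{j} \sigma_j^p$ with $c_p := \mathbb{E}\,\lvert{\mathscr{N}}(0,1)\rvert^p$ (a standard fact, not stated in the remark above). A Monte Carlo sketch with an illustrative truncated spectrum $\sigma_j = j^{-2}$ and p = 1:

```python
import math
import numpy as np

p = 1
c_p = math.sqrt(2.0 / math.pi)                         # E|N(0,1)| for p = 1
sigmas = np.array([1.0 / j**2 for j in range(1, 51)])  # truncated spectrum (illustrative)

rng = np.random.default_rng(1)
samples = rng.standard_normal((100_000, sigmas.size)) * sigmas
empirical = np.mean(np.sum(np.abs(samples) ** p, axis=1))
theoretical = c_p * np.sum(sigmas ** p)
# The mean p-th power of the l^p-norm is finite exactly when sum sigma_j^p is.
assert abs(empirical - theoretical) < 0.01 * theoretical
```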

Notation 4.3. Let assumption 4.1 hold. Define

Further, for $k,K\in{\mathbb{N}} \cup \{0\}$ with K > k define the projection operators $P^{k}\colon \mathbb{R}^{{\mathbb{N}}} \to \mathbb{R}^{k}$, $P_{k}\colon \mathbb{R}^{{\mathbb{N}}} \to \mathbb{R}^{{\mathbb{N}}}$, $P_{k}^{K}\colon \mathbb{R}^{{\mathbb{N}}} \to \mathbb{R}^{K-k}$ and $P^{-k}\colon \mathbb{R}^{k} \to \mathbb{R}^{{\mathbb{N}}}$ by

where $P^{k} := 0$ for k = 0. Accordingly, we define, for any $u\in\mathbb{R}^{k}$ and $v\in\mathbb{R}^{K}$,

  • $\displaystyle \lvert u \rvert_{E^{k}}^{2} := \mathop \sum\nolimits_{j = 1}^{k} \sigma_{j}^{-2} u_{j}^{2}, \qquad \lvert v \rvert_{E_{k}^{K}}^{2} := \mathop \sum\nolimits_{j = k+1}^{K} \sigma_{j}^{-2} v_{j}^{2}$,
  • $\displaystyle B_{\delta}^{k}(u) := \{w\in\mathbb{R}^{k} \mid \lVert w-u \rVert_{p} \lt \delta \}$,
  • $\displaystyle \mu_{k} = \otimes_{j = 1}^{k} {\mathscr{N}}(0,\sigma_{j}^2)$.

Note that $\frac{1}{2}|\boldsymbol{\cdot}|_{E^k}^2$ coincides, up to an additive constant, with the negative log-density of µk .
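The finite-dimensional approximation $\mu_{k}(B_{\delta}^{k}(P^{k}x)) \to \mu(B_{\delta}(x))$, used repeatedly below, can be illustrated numerically: if the same samples are reused for every k, the events $\{\lVert P^{k}(u-x)\rVert_{p} \lt \delta\}$ are nested in k, so the estimates decrease monotonically. A Monte Carlo sketch with an illustrative spectrum $\sigma_j = 1/j$:

```python
import numpy as np

p, delta = 2.0, 1.5
sigmas = np.array([1.0 / j for j in range(1, 201)])    # illustrative spectrum
x = np.zeros(200); x[0] = 0.5                          # centre of the ball

rng = np.random.default_rng(2)
samples = rng.standard_normal((200_000, 200)) * sigmas

# Shared samples make the events {||P^k(u - x)||_p < delta} nested in k,
# so the estimates of mu_k(B^k_delta(P^k x)) are nonincreasing in k.
est = []
for k in (5, 20, 80, 200):
    dist = np.sum(np.abs(samples[:, :k] - x[:k]) ** p, axis=1) ** (1.0 / p)
    est.append(np.mean(dist < delta))
assert all(a >= b for a, b in zip(est, est[1:]))
```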

Lemma 4.4. If assumption 4.1 holds, then the Cameron–Martin space of $(\ell^p, \mu)$ is given by $E = \{z\in\ell^p \colon |z|_E \lt \infty\}$, where $|z|_E^2 := \sum_{k = 1}^\infty\frac{z_k^2}{\sigma_k^2}$.

Proof. By (Bogachev 1998, lemma 3.2.2), we may consider µ as a Gaussian measure on a Hilbert space ${\mathscr{H}} \supseteq X$, into which X is continuously and linearly embedded, without changing the Cameron–Martin space or its norm. If $p\leqslant 2$, X is continuously embedded in ${\mathscr{H}} = \ell^{2} \supset X$, since $\lVert \boldsymbol{\cdot} \rVert_{2} \leqslant \lVert \boldsymbol{\cdot} \rVert_{p}$. For p > 2, this can be accomplished by choosing any positive sequence $b \in \ell^{\frac{p}{p-2}}$ and ${\mathscr{H}} := \{x\in\mathbb{R}^{\mathbb{N}} \colon \|x\|_{{\mathscr{H}}}^2 := \sum_{k\in{\mathbb{N}}} b_k x_k^2 \lt \infty\}$, since, by Hölder's inequality,
$\|x\|_{{\mathscr{H}}}^2 = \sum_{k\in{\mathbb{N}}} b_k x_k^2 \leqslant \Big(\sum_{k\in{\mathbb{N}}} b_k^{\frac{p}{p-2}}\Big)^{\frac{p-2}{p}} \Big(\sum_{k\in{\mathbb{N}}} |x_k|^{p}\Big)^{\frac{2}{p}} = \|b\|_{\frac{p}{p-2}}\, \|x\|_p^2.$

The Cameron–Martin space and its norm for both X and ${\mathscr{H}}$ are therefore given by the well-known formulas (3.1), see e.g. (Da Prato and Zabczyk 2014, theorem 2.23), proving the claim.

In order to prove theorem 2.5, we will again proceed by showing conditions 2.7 (C1)–(C4) and then applying theorem 2.8. We start by showing the vanishing condition for weak limits outside E (C3), while the vanishing condition for unbounded sequences (C1) and the vanishing condition for weakly, but not strongly convergent sequences (C4) will require some additional work (propositions 4.6 and 4.8).

Lemma 4.5. Under assumptions 2.1 and 4.1, for any family $(x^{\delta})_{0\lt \delta \lt 1}$ in X and for any $\bar z \in X \setminus E$, such that $x^\delta \rightharpoonup \bar z$ converges weakly as $\delta \searrow 0$, we have
$\mathfrak{R}_{\mu}^{\delta}(x^{\delta},0) \to 0 \qquad \text{as } \delta \searrow 0.$

In particular, the vanishing condition for weak limits outside E, condition 2.7 (C3), is satisfied.

Proof. We use notation 4.3 throughout the proof. Let $(x^{\delta})_{0\lt \delta \lt 1}$ be a family in X and $\bar z \in X \setminus E$ such that $x^\delta \rightharpoonup \bar z$ converges weakly as $\delta \searrow 0$. Let $0 \lt \varepsilon \lt 1$ be arbitrary and $A := \sqrt{8 \log(2/\varepsilon)}$. We proceed in four steps.

Step 1: There exist $K_{1}\in {\mathbb{N}} $ and $\delta_1 \gt 0$ such that, for each $u\in B_{\delta_{1}}^{K_{1}}({P^{K_1}\bar z})$, $\lvert u \rvert_{E^{K_{1}}} \geqslant A$.

In order to see this, we assume the contrary, i.e. for each $K_{1} \in {\mathbb{N}}$ and $\delta_{1} \gt 0$, there exists $u\in B_{\delta_{1}}^{K_{1}}(P^{K_1}\bar z)$ with $\lvert u \rvert_{E^{K_{1}}} \lt A$. Then, for each $m\in{\mathbb{N}}$ (choosing $K_{1} = m$ and $\delta_{1} = m^{-1}$), there exists $u^{(m)} \in B_{m^{-1}}^{m}(P^m\bar z)$ with $\lvert u^{(m)} \rvert_{E^{m}} \lt A$.

Since $(P^{-m} u^{(m)})_{m\in{\mathbb{N}}}$ is bounded in E by A, it has a weakly convergent (in E) subsequence, which, for simplicity, we also denote by $(P^{-m} u^{(m)})_{m\in{\mathbb{N}}}$, with weak limit $\bar u \in E$. Further, since $u^{(m)} \in B_{m^{-1}}^{m}(P^m\bar z)$ for each $m\in{\mathbb{N}}$, $P^{-m}u^{(m)} \to \bar z$ strongly in X as $m\to\infty$:

By considering each component $j\in{\mathbb{N}}$ separately, weak convergence in E and (strong) convergence in X imply

Hence, by the uniqueness of the limit (in $\mathbb{R}$), we obtain the contradiction $E \ni \bar u = \bar z \notin E$.

Step 2: There exists $0 \lt \delta_2 \lt \delta_{1}/2$ such that, for each $0 \lt \delta \lt \delta_{2}$ and each $u\in B_{\delta}^{K_1}(P^{K_1}x^{\delta})$, we have that $\lvert u \rvert_{E^{K_{1}}} \geqslant A$.

This can be seen as follows: since $x^\delta \rightharpoonup \bar z$ converges weakly (and therefore componentwise) in X, there exists $0 \lt \delta_2 \lt \delta_{1}/2$ such that, for each $0 \lt \delta \lt \delta_{2}$, we have that $\lVert P^{K_1}x^{\delta} - P^{K_1}\bar{z} \rVert_{p} \lt \delta_{1}/2$.

Hence, for each $0 \lt \delta \lt \delta_{2}$ and each $u\in B_{\delta}^{K_1}(P^{K_1}x^{\delta})$,

i.e. $B_{\delta}^{K_1}(P^{K_1}x^{\delta}) \subseteq B_{\delta_{1}}^{K_1}(P^{K_1}\bar z)$ for each $0 \lt \delta \lt \delta_{2}$, and the claim follows from Step 1.

Step 3: There exists $0 \lt \delta^\star \lt \delta_2$ such that, for each $\delta \lt \delta^\star$ and each $u\in B_{\delta}^{K_1}(0)$, we have $|u|_{E^{K_1}} \leqslant A/\sqrt 2$.

This is evident from the fact that $|\boldsymbol{\cdot}|_{E^{K_1}}$ and $\|P^{K_1}\boldsymbol{\cdot}\|_{p}$ are equivalent norms on the (finite-dimensional) vector space $P^{K_1}X$.

Step 4: For each $0 \lt \delta \lt \delta^\star$, $\mathfrak{R}_{\mu}^{\delta}(x^\delta,0) \leqslant \varepsilon$, finalizing the proof.

Let $0 \lt \delta \lt \delta^{\star}$. For any $x\in X$, since $B_{\delta}(x) = \bigcap_{k\in{\mathbb{N}}} B_{\delta}^{k} (P^kx) \times \mathbb{R}^{{\mathbb{N}} \setminus \{1,\ldots,k \} }$, the continuity of measures implies that $\mu_{k} (B_{\delta}^{k} (P^kx)) \to \mu(B_{\delta}(x))$. Hence, there exists $k \gt K_{1}$ such that

Since, for any $x\in X$, $\mathbb{R}^k\ni v\in B_{\delta}^{k}(P^kx) $ implies $ P^{K_1}v\in B_{\delta}^{K_{1}}(P^{K_1}x) $, it follows from Steps 2 and 3 that

where we bounded the last ratio of integrals by 1 using Anderson's inequality (theorem A.3).

As explained above, the following proposition implements a convexification of the function $|\boldsymbol{\cdot}|_E - \beta \|\boldsymbol{\cdot} \|_p$, which is necessary for the application of Anderson's inequality in the proof of proposition 4.8:

Proposition 4.6. Using notation 4.3, let $1\leqslant p \lt \infty$, let $k\in {\mathbb{N}}$ and $\rho\in\mathbb{R}^{k}$ with ${\rho_{1} \geqslant \ldots \geqslant} {\rho_{k} \gt 0}$. Further, let γ > 0, let $\beta_{\ast} := \frac{2\gamma^{2-\alpha}}{q\rho_{1}^{\alpha}}$ and let $0 \leqslant \beta \lt \beta_{\ast}$. Then the functions $L_{\rho,\gamma},\, f_{\rho,\beta,\gamma} \colon \mathbb{R}^{k} \to \mathbb{R}$ given by
$L_{\rho,\gamma}(x) := \begin{cases} \sum_{j = 1}^{k} \Big( \big(x_{j}^{2} + \gamma^{2}\rho_{j}^{2}\big)^{p/2} - \gamma^{p}\rho_{j}^{p} \Big) & \text{if } 1\leqslant p \leqslant 2,\\[1ex] \lVert x \rVert_{p}^{2} & \text{if } p \gt 2, \end{cases} \qquad f_{\rho,\beta,\gamma}(x) := \sum_{j = 1}^{k} \frac{x_{j}^{2}}{\rho_{j}^{2}} - \beta\, L_{\rho,\gamma}(x)$

satisfy

  • (a)  
    $\lVert x \rVert_{p}^{\alpha} - \gamma^{\alpha} \lVert \rho \rVert_{p}^{\alpha} \leqslant L_{\rho,\gamma}(x) \leqslant \lVert x \rVert_{p}^{\alpha}$ for any $x\in\mathbb{R}^{k}$;
  • (b)  
    $f_{\rho,\beta,\gamma}$ is non-negative;
  • (c)  
    $f_{\rho,\beta,\gamma}$ is convex.

Proof. Recall that, for $0\leqslant p_{1} \leqslant p_{2} \lt \infty$, and $v \in \mathbb{R}^{n}$, $n\in{\mathbb{N}}$,

Equation (4.1)

$\lVert v \rVert_{p_{2}} \leqslant \lVert v \rVert_{p_{1}}.$

While (a) is trivial for p > 2, it follows for $1\leqslant p \leqslant 2$ directly from the inequalities $a^{s} \leqslant (a+b)^{s} \leqslant a^{s} + b^{s}$ for any $a,b \geqslant 0$ and $0 \lt s\leqslant 1$, where the second inequality is a consequence of (4.1) for $v = (a,b)$:

For (b), note that, for any $\xi \in \mathbb{R}$, $1\leqslant p \leqslant 2$ and $r,\beta,\tau \gt 0$
$\frac{\xi^{2}}{r^{2}} - \beta \Big( \big(\xi^{2}+\tau^{2}\big)^{p/2} - \tau^{p} \Big) \geqslant 0,$

which holds true, using Bernoulli's inequality with exponent $p/2 \leqslant 1$, for any $0 \lt \beta \leqslant \frac{2\tau^{2-p}}{pr^{2}}$:
$\beta \Big( \big(\xi^{2}+\tau^{2}\big)^{p/2} - \tau^{p} \Big) = \beta\, \tau^{p} \Big( \big(1 + \tfrac{\xi^{2}}{\tau^{2}}\big)^{p/2} - 1 \Big) \leqslant \beta\, \frac{p}{2}\, \tau^{p-2}\, \xi^{2} \leqslant \frac{\xi^{2}}{r^{2}}.$

By applying this observation componentwise with $r = \rho_{j}$ and $\tau = \gamma\rho_{j}$, we see that $f_{\rho,\beta,\gamma}$ is (globally) non-negative for any $0 \lt \beta \leqslant \min_{j = 1,\ldots,k} \frac{2\gamma^{2-p}}{p\rho_{j}^{p}} = \frac{2\gamma^{2-p}}{p\rho_{1}^{p}}$, proving (b) for any $1\leqslant p \leqslant 2$ (for β = 0 the claim holds trivially). In the case p > 2, (b) follows from (4.1), since, for any $0 \leqslant \beta \leqslant \rho_{1}^{-2}$,
$f_{\rho,\beta,\gamma}(x) = \sum_{j = 1}^{k} \frac{x_{j}^{2}}{\rho_{j}^{2}} - \beta \lVert x \rVert_{p}^{2} \geqslant \rho_{1}^{-2} \lVert x \rVert_{2}^{2} - \beta \lVert x \rVert_{2}^{2} \geqslant 0.$

For (c), first consider the case $1\leqslant p \leqslant 2$, for which the Hessian of $f_{\rho,\beta,\gamma}$ is diagonal. Hence $f_{\rho,\beta,\gamma}$ is convex if and only if all those diagonal entries,

are non-negative functions. Since, for τ > 0, $\xi \in \mathbb{R}$ and $1\leqslant p \leqslant 2$,

Equation (4.2)

$\frac{\mathrm{d}^{2}}{\mathrm{d}\xi^{2}} \big(\xi^{2}+\tau^{2}\big)^{p/2} = p\,\big(\xi^{2}+\tau^{2}\big)^{\frac{p}{2}-1} + p(p-2)\,\xi^{2}\big(\xi^{2}+\tau^{2}\big)^{\frac{p}{2}-2} \leqslant p\,\tau^{p-2},$

$f_{\rho,\beta,\gamma}$ is convex for each $0 \leqslant \beta \lt \min_{j = 1,\ldots,k} \frac{2 \gamma^{2-p}}{p \rho_{j}^{p}} = \frac{2\gamma^{2-p}}{p \rho_{1}^{p}}$ (by applying (4.2) componentwise with $\tau = \gamma\rho_{j}$, $j = 1,\ldots,k$).
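The one-dimensional estimate applied componentwise in the proof of (b) can be spot-checked on a grid. The displayed inequality below is our reading of the componentwise bound with $r = \rho_j$ and $\tau = \gamma\rho_j$ (an assumption about the omitted display):

```python
# Grid check of the one-dimensional estimate behind proposition 4.6(b):
#   xi^2 / r^2  >=  beta * ((xi^2 + tau^2)^(p/2) - tau^p)
# for beta = 2 * tau^(2-p) / (p * r^2) and 1 <= p <= 2 (our reading of the proof).
def gap(xi, r, tau, p):
    beta = 2.0 * tau ** (2.0 - p) / (p * r**2)
    return xi**2 / r**2 - beta * ((xi**2 + tau**2) ** (p / 2.0) - tau**p)

for p in (1.0, 1.5, 2.0):
    for r in (0.5, 1.0, 2.0):
        for tau in (0.3, 1.0, 3.0):
            for i in range(-50, 51):
                xi = i / 5.0
                # Bernoulli's inequality guarantees a non-negative gap;
                # the small margin only absorbs floating-point rounding.
                assert gap(xi, r, tau, p) >= -1e-9
```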

Now consider the case $2\lt p\lt \infty$. The second-order partial derivatives of $L_{\rho,\gamma}$ for x ≠ 0 are given by

Hence, the Hessian of $f_{\rho,\beta,\gamma}$ for x ≠ 0 can be written in the form

where $\mathrm{diag}(d_{1},\ldots,d_{k})$ denotes the k×k diagonal matrix with diagonal entries $d_{1},\ldots,d_{k}$ and the functions $g_{j} \colon \mathbb{R}^{k}\setminus \{0\} \to \mathbb{R}$, $j = 1,\ldots,k$, and $h\colon \mathbb{R}^{k}\setminus \{0\} \to \mathbb{R}^{k}$ are given by

Since $\lvert g_{j} \rvert \leqslant 1$, $\nabla^{2} f_{\rho,\beta,\gamma}$ is symmetric and positive definite on the set $\mathbb{R}^k\setminus \{0\}$ for $0 \leqslant \beta \lt \frac{1}{(p-1)\rho_{1}^{2}}$. In order to prove convexity, we show that for any $x,y\in \mathbb{R}^{k}$ and $\lambda \in [0,1]$,

Equation (4.3)

by considering the following three cases:

  • Case 1: $x,y\neq 0$ and the line through x and y does not touch the origin $0\in \mathbb{R}^{k}$. In this case, we can restrict the function $f_{\rho,\beta,\gamma}$ to an open half-space containing x and y, but not containing $0\in\mathbb{R}^{k}$. On this convex set, $f_{\rho,\beta,\gamma}$ is twice continuously differentiable and positive definiteness of the Hessian $\nabla^2 f_{\rho,\beta,\gamma}$ proves convexity, in particular (4.3).
  • Case 2: $x,y\neq 0$ and the line through x and y contains the origin $0\in \mathbb{R}^{k}$. In this case, there exists $\lambda^\star \in (0,1)$ such that $\lambda^\star x + (1-\lambda^\star)y = 0$ and thereby $y = -\frac{\lambda^\star}{1-\lambda^\star}x$. It follows for each $\lambda \in [0,1]$ that
    Since $f_{\rho,\beta,\gamma}(tx) = t^{2} f_{\rho,\beta,\gamma}(x)$ for each $t\in\mathbb{R}$,
    which is a quadratic function in λ with non-negative leading coefficient $f_{\rho,\beta,\gamma}(x) \geqslant 0$ (by (b)) and thereby convex. Therefore, we obtain (4.3) from
  • Case 3: x ≠ 0 and y = 0. In this case, (4.3) follows from the previous cases by continuity:

Remark 4.7. Note that this bound on β is not optimal. For example, for k = 2, p = 4 and $\rho_1 = \rho_2 = 1$, we consider $f_{\rho,\beta,\gamma}(x,y) = x^2+y^2 - \beta \sqrt{x^4+y^4}$. The proposition above shows that this function is convex for $\beta \lt \frac{1}{3}$. In fact, it is convex even for $\beta \lt \sqrt2/3$, as can be shown by more elementary methods (exclusive to this low-dimensional setting). Note that in this specific case $f_{\rho,\beta,\gamma}(x,y)\geqslant 0$ for $\beta \leqslant 1$.
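The claimed thresholds can be checked numerically: a midpoint test near the point (1, 1) along the direction (1, −1), which attains the critical threshold $\sqrt{2}/3$, distinguishes β = 0.3 from β = 0.5:

```python
import math

def f(x, y, beta):
    """f_{rho,beta,gamma} of remark 4.7: x^2 + y^2 - beta * sqrt(x^4 + y^4)."""
    return x**2 + y**2 - beta * math.sqrt(x**4 + y**4)

def midpoint_gap(beta, s=0.1):
    """Average of f over the endpoints (1±s, 1∓s) minus f at their midpoint (1, 1).
    A negative value certifies a violation of midpoint convexity."""
    avg = 0.5 * (f(1 + s, 1 - s, beta) + f(1 - s, 1 + s, beta))
    return avg - f(1.0, 1.0, beta)

assert midpoint_gap(0.30) > 0   # consistent with convexity for beta < sqrt(2)/3
assert midpoint_gap(0.50) < 0   # midpoint convexity fails for beta > sqrt(2)/3
```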

Proposition 4.8. Under assumptions 2.1 and 4.1 and using notation 4.3, for each $0 \lt \delta \lt 1$, each $k \in {\mathbb{N}} \cup \{0 \}$, each γ > 0 and each $z\in X$,

Equation (4.4)

$\mathfrak{R}_{\mu}^{\delta}(z,0) \leqslant \exp\bigg({-}\frac{\beta}{2}\Big(\big(\lVert P_{k}z \rVert_{p} - \delta\big)_{+}^{\alpha} - \delta^{\alpha} - \gamma^{\alpha}\Big(\sum\nolimits_{j \gt k} \sigma_{j}^{p}\Big)^{\alpha/p}\Big)\bigg), \qquad \beta := \frac{\gamma^{2-\alpha}}{q\,\sigma_{k+1}^{\alpha}},$

where $(t)_{+} := \max\{t,0\}$.

Proof. Let $\beta := \frac{\gamma^{2-\alpha}}{q \sigma_{k+1}^{\alpha}}$. Let $K\in{\mathbb{N}}$ and $\rho = (\sigma_{k+1},\ldots,\sigma_{K})$.

Observe that the function $f: \mathbb{R}^{K}\to \mathbb{R}$ defined by

is positive, symmetric, integrable (since $f(u) \leqslant \exp ( - \tfrac14 |u|_{E^{K}}^2 )$ by proposition 4.6(b)) and log-concave (by proposition 4.6(c)). Hence, by proposition 4.6(a) and (c) and Anderson's inequality (theorem A.3),

For any $x\in X$, since $B_{\delta}(x) = \bigcap_{k\in{\mathbb{N}}} B_{\delta}^{k} (P^kx) \times \mathbb{R}^{{\mathbb{N}} \setminus \{1,\ldots,k \} }$, the continuity of measures implies that $\mu_{k} (B_{\delta}^{k} (P^kx)) \to \mu(B_{\delta}(x))$. Therefore, taking the limit $K\to\infty$ proves the claim.

Corollary 4.9. Under assumptions 2.1 and 4.1 the vanishing condition for unbounded sequences, condition 2.7 (C1), is satisfied.

Proof. Let $(\delta_{m})_{m \in {\mathbb{N}}}$ be a null sequence in $\mathbb{R}^{+}$ and $(x_{m})_{m \in {\mathbb{N}}}$ be an unbounded sequence, i.e. there exists a subsequence $(x_{m_{n}})_{n \in {\mathbb{N}}}$ such that $\lVert x_{m_{n}} \rVert_{p} \to \infty$ as $n\to \infty$. Using notation 4.3 and proposition 4.8 with γ = 1 and k = 0 we obtain

proving the claim.

Corollary 4.10. Under assumptions 2.1 and 4.1 the weakly convergent subsequence condition, condition 2.7 (C2), is satisfied.

Proof. We use notation 4.3 throughout the proof. If p > 1, the statement follows directly from the reflexivity of $X = \ell^{p}$. Now let p = 1, let $(\delta_{m})_{m \in {\mathbb{N}}}$ be a null sequence in $(0,1)$ and $(x_m)_{m\in{\mathbb{N}}}$ be a bounded sequence in X satisfying, for some K > 0 and each $m\in{\mathbb{N}}$, $\mathfrak{R}_{\mu}^{\delta_m}(x_m,0)\geqslant K$.

We first show that $(x_{m})_{m\in {\mathbb{N}}}$ is equismall at infinity, i.e. for every r > 0 there exists $k\in {\mathbb{N}}$ such that, for each $m \in {\mathbb{N}}$, $\|P_k x_{m}\|_1 \lt r$. Assuming the contrary, there exists r > 0 such that, for any $k\in{\mathbb{N}}$, there exists $m_k \in {\mathbb{N}}$ such that $\|P_k x_{m_k}\|_1 \geqslant r$.

If the sequence $(m_{k})_{k\in{\mathbb{N}}}$ were bounded by some $N\in{\mathbb{N}}$, then, using the fact that $\lim_{k\to\infty}\|P_k x\|_1 = 0$ for any (fixed) $x\in X$,

Since this is a contradiction, $(m_{k})_{k\in{\mathbb{N}}}$ is unbounded. Using $\sigma_{k} \searrow 0$ and $\delta_{k} \searrow 0$ as $k\to\infty$, this implies the existence of $k\in{\mathbb{N}}$ such that $\delta_{m_{k}} \leqslant r/8$ and

Using proposition 4.8 with $\gamma := \frac{r}{4\sum_{j\in{\mathbb{N}}} \sigma_j}$ we obtain

contradicting the assumption $\mathfrak{R}_{\mu}^{\delta_m}(x_m,0)\geqslant K$ for each $m\in{\mathbb{N}}$.

Hence, $(x_{m})_{m\in{\mathbb{N}}}$ is equismall at infinity and, combined with its boundedness, this implies the existence of a weakly convergent subsequence of $(x_{m})_{m\in{\mathbb{N}}}$ by (Trèves 1967, theorem 44.2).

Corollary 4.11. Under assumptions 2.1 and 4.1 the vanishing condition for weakly, but not strongly convergent sequences, condition 2.7 (C4), is satisfied.

Proof. We use notation 4.3 throughout the proof. Let $(\delta_{m})_{m \in {\mathbb{N}}}$ be a null sequence in $\mathbb{R}^{+}$ and $(x_{m})_{m \in {\mathbb{N}}}$ be a weakly, but not strongly convergent sequence in X with weak limit $\bar z \in E$.

Step 1: There exist c > 0 and $k_0\in{\mathbb{N}}$ such that, for any $k\geqslant k_0$,
$\limsup_{m \to \infty}\, \lVert P_{k} x_{m} \rVert_{X} \gt c.$

There exists A > 0 such that $\limsup_{m \to \infty} \lVert x_{m}-\bar{z} \rVert_{X} \gt A$ (otherwise the convergence would be strong). Let $c := \tfrac{A}{2}$. Since $\bar{z} \in E$, we have $\lvert P_{k}\bar{z} \rvert_{E} \to 0$ as $k\to\infty$ by lemma 4.4 and therefore $\lVert P_{k}\bar{z} \rVert_{X} \to 0$ as $k\to\infty$ by the continuous embedding $E\subset X$ (Bogachev 1998, proposition 2.4.6). Hence, there exists $k_{0} \in {\mathbb{N}}$ such that, for each $k\geqslant k_{0}$, $\lVert P_{k}\bar{z} \rVert_{X} \lt c$. Let $k\geqslant k_{0}$ and assume the contrary, i.e. $\limsup_{m \to \infty} \lVert P_{k}x_{m} \rVert_{X} \leqslant c$. But then, since weak convergence implies componentwise convergence,

which is a contradiction, proving the claim.

Step 2: For each $0 \lt \varepsilon \lt 1$, $\liminf_{m \to \infty} \mathfrak{R}_{\mu}^{\delta_m}(x_m,0) \lt \varepsilon$.

Let $0 \lt \varepsilon \lt 1$, $\delta_{0} := \tfrac{c}{4}$, $\gamma := \tfrac{c}{4S}$ and $k \geqslant k_{0}$ such that

Let $m_{0} \in {\mathbb{N}}$. Using Step 1, there exists $m\geqslant m_{0}$ such that $\delta_{m} \lt \delta_{0} = \tfrac{c}{4}$ and $\lVert P_{k} x_{m} \rVert_{X} \gt c$. Since $\frac{3^{\alpha}-2}{4^{\alpha}} \geqslant \frac{1}{4}$ for $1\leqslant \alpha \leqslant 2$, and by setting $\gamma = \tfrac{c}{4S}$, proposition 4.8 implies

Proof of theorem 2.5. By lemma 4.5 and corollaries 4.9–4.11, conditions 2.7 (C1)–(C4) are fulfilled and all statements follow from theorem 2.8.

5. Conclusion

We proved the existence of MAP estimators in the context of a Bayesian inverse problem for parameters in a separable Banach space X, where X is either a Hilbert space or $X = \ell^p$, $p\in [1,\infty)$, with a diagonal Gaussian prior. The Hilbert space case had been treated before by (Dashti et al 2013, Kretschmann 2019); however, these works did not show the existence of the central object in their proofs, namely the δ-ball maximizers $z^{\delta} = \mathrm{argmax}_{z\in X} \mu^{y}(B_{\delta}(z))$. We fixed this gap by working with an AMF $(\zeta^{\delta})_{\delta \gt 0} \subset X$ as defined in definition 1.7 and strongly simplified their proof by employing (Da Prato and Zabczyk 2002, proposition 1.3.11), restated in proposition 3.2. We decided to present this elegant and simple proof even though the Hilbert space case can be understood as a special case of $X = \ell^{p}$ for p = 2. The case p ≠ 2, on the other hand, turned out to require novel techniques. The crucial mathematical argument in this case relies on a convexification of the difference $|\boldsymbol{\cdot}|_E^2 - \beta \|\boldsymbol{\cdot}\|_X^2$ (proposition 4.6). This allows us to extract a suitable 'rate of contraction' such that the ratio $\mathfrak{R}_{\mu}^{\delta}(z,0)$ can be bounded for any fixed δ > 0 by a function decaying exponentially in $\lVert z \rVert_{X}$ (proposition 4.8).

We have also outlined in section 2 a general proof strategy by which similar results (i.e. conjecture 2.3) can be obtained for further separable Banach spaces. For this purpose, we isolated four crucial conditions, namely condition 2.7 (C1)–(C4), which need to be verified in the Banach space of interest; the corresponding result then follows almost immediately from theorem 2.8.

Note that our results rely strongly on the characteristics of the $\ell^{p}$ norm and the diagonal structure of the covariance matrix of the Gaussian measure. We suspect that the generalization to Gaussian measures on arbitrary separable Banach spaces requires deeper insight into the compatibility between the ambient space's geometry and the Cameron–Martin norm. We hope that our theorem 2.8 paves the way for future research in this direction.

Acknowledgments

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (EXC-2046/1, Project 390685689) through the Project EF1-10 (IK) and EF1-19 (PW) of the Berlin Mathematics Research Center MATH+. The authors would like to express their gratitude to Birzhan Ayanbayev, Martin Burger, Nate Eldredge, Remo Kretschmann, Hefin Lambley, Han Cheng Lie, Claudia Schillings, Björn Sprungk, and Tim Sullivan for fruitful discussions and pointing out both errors and solution strategies.

Data availability statement

No new data were created or analysed in this study.

Appendix: Gaussian measures in Banach spaces

In notation, we will mainly follow (Bogachev 1998). The continuous (or topological) dual space of X is denoted by $X^\star$, while Xʹ denotes its algebraic dual. In some cases, we will assume that X is a Hilbert space, in which case we write $X = {\mathscr{H}}$ for clarity. The object µ will always be a centered Gaussian measure on X (or ${\mathscr{H}}$). We denote the Cameron–Martin space by $(E,\langle \boldsymbol{\cdot} , \boldsymbol{\cdot} \rangle_{E})$, where we write the Cameron–Martin norm with single bars in order to differentiate it from the ambient space norm: $\lvert u \rvert_{E} := \sqrt{\langle u , u \rangle_{E}}$.
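To fix ideas, for the diagonal Gaussian priors considered in this paper, $\mu = \otimes_{k\in{\mathbb{N}}} {\mathscr{N}}(0,\sigma_k^2)$ on $X = \ell^{p}$, the Cameron–Martin space takes a fully explicit form. The following display records this standard special case for orientation only; it is not part of the general development:

```latex
% Cameron--Martin space of the diagonal Gaussian prior
% \mu = \bigotimes_{k\in\mathbb{N}} \mathscr{N}(0,\sigma_k^2) on X = \ell^p:
\[
  E \;=\; \Big\{ u \in X \;:\; \sum_{k\in\mathbb{N}} \frac{u_k^2}{\sigma_k^2} < \infty \Big\},
  \qquad
  \langle u , v \rangle_E \;=\; \sum_{k\in\mathbb{N}} \frac{u_k v_k}{\sigma_k^2},
  \qquad
  \lvert u \rvert_E \;=\; \Big( \sum_{k\in\mathbb{N}} \frac{u_k^2}{\sigma_k^2} \Big)^{1/2}.
\]
```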

It turns out that the extension of the covariance operator

$R_\mu\colon X^{\star} \to (X^{\star})^{^{\prime}}, \qquad (R_\mu f)(g) := \int_{X} f(x)\, g(x)\, \mu(\mathrm{d}x) \quad \text{for } f,g \in X^{\star},$

to the reproducing kernel Hilbert space (RKHS) $X_{\mu}^{\star} := \overline{X^\star}^{L^2(X,\mu)}$ of µ satisfies $R_\mu (X_{\mu}^\star) = E$ (Bogachev 1998, theorem 3.2.3), where E is viewed as a subspace of $(X^{\star})^{^{\prime}}$. In addition, $R_\mu\colon (X_\mu^\star, \langle \boldsymbol{\cdot},\boldsymbol{\cdot} \rangle_{L^{2}(\mu)}) \to (E,\langle \boldsymbol{\cdot},\boldsymbol{\cdot}\rangle_E)$ is an isometric isomorphism (Bogachev 1998, p 60) and satisfies the reproducing property

Equation (A.1): $h(f) = \langle h , R_{\mu} f \rangle_{E}$ for all $h \in E$ and $f \in X^{\star}$,

which follows from the above and from treating $h = R_{\mu}g$ (for some $g\in X_{\mu}^{\star}$) as an element of $(X^{\star})^{^{\prime}}$:

$h(f) = (R_{\mu}g)(f) = \int_{X} g(x)\, f(x)\, \mu(\mathrm{d}x) = \langle g , f \rangle_{L^{2}(\mu)} = \langle R_{\mu}g , R_{\mu}f \rangle_{E} = \langle h , R_{\mu}f \rangle_{E} \qquad \text{for all } f\in X^{\star}.$
Remark A.1. In the special case where the measure is defined on a Hilbert space ${\mathscr{H}}$, the covariance operator Rµ takes the form of a self-adjoint, non-negative trace-class operator: $R_\mu = Q$ where

$\langle Qu , v \rangle_{{\mathscr{H}}} = \int_{{\mathscr{H}}} \langle u , x \rangle_{{\mathscr{H}}}\, \langle v , x \rangle_{{\mathscr{H}}}\; \mu(\mathrm{d}x) \qquad \text{for all } u,v\in{\mathscr{H}}.$
In addition, the CM inner product and norm take the form

Equation (A.2): $\langle u , v \rangle_{E} = \langle Q^{-1/2}u , Q^{-1/2}v \rangle_{{\mathscr{H}}}$ and $\lvert u \rvert_{E} = \lVert Q^{-1/2}u \rVert_{{\mathscr{H}}}$ for $u,v \in E = Q^{1/2}({\mathscr{H}})$.

A result we are going to use in this context is the following technical lemma:

Lemma A.2. Let X be a separable Banach space and µ a centered Gaussian measure on X, $\bar z\in E$ and $x^\delta \rightharpoonup \bar z$ weakly in X. Then

$\limsup_{\delta \searrow 0}\; \frac{\mu(B_{\delta}(x^{\delta}))}{\mu(B_{\delta}(\bar z))} \leqslant 1.$

Proof. For any $\hat h\in X^\star$, the Cameron–Martin formula (Bogachev 1998, corollary 2.4.3) implies

Equation (A.3): $\mu(B_\delta(x^\delta)) = \mathrm{e}^{-\frac{1}{2}\lvert h \rvert_{E}^{2}} \int_{B_\delta(x^\delta - h)} \mathrm{e}^{-\hat h(x)}\, \mu(\mathrm{d}x) \leqslant \mathrm{e}^{\frac{1}{2}\lvert h \rvert_{E}^{2} - \hat h(x^\delta) + \delta \lVert \hat h \rVert_{X^\star}}\, \mu(B_\delta(0)), \qquad h := R_\mu \hat h,$

where we used Anderson's inequality (theorem A.4) in the last step. Since

$\int_{B_\delta(0)} \mathrm{e}^{g(x)}\, \mu(\mathrm{d}x) = \int_{B_\delta(0)} \frac{\mathrm{e}^{g(x)} + \mathrm{e}^{-g(x)}}{2}\, \mu(\mathrm{d}x), \qquad g := R_\mu^{-1}\bar z,$
due to symmetry of the set $B_\delta(0)$, another application of the Cameron–Martin theorem yields

Equation (A.4): $\mu(B_\delta(\bar z)) = \mathrm{e}^{-\frac{1}{2}\lvert \bar z \rvert_{E}^{2}} \int_{B_\delta(0)} \frac{\mathrm{e}^{g(x)} + \mathrm{e}^{-g(x)}}{2}\, \mu(\mathrm{d}x) \geqslant \mathrm{e}^{-\frac{1}{2}\lvert \bar z \rvert_{E}^{2}}\, \mu(B_\delta(0)),$

where we used the inequality $a+a^{-1} \geqslant 2$ for any a > 0 (alternatively, (A.4) can be proven via Jensen's inequality). Since $x^\delta \to \bar z$ weakly in X, it follows from (A.3) and (A.4) that, for any $\hat h\in X^\star$,

$\limsup_{\delta \searrow 0}\; \frac{\mu(B_\delta(x^\delta))}{\mu(B_\delta(\bar z))} \leqslant \exp\!\Big( \tfrac{1}{2}\lvert R_\mu \hat h \rvert_{E}^{2} - \hat h(\bar z) + \tfrac{1}{2}\lvert \bar z \rvert_{E}^{2} \Big) = \exp\!\Big( \tfrac{1}{2}\lvert R_\mu \hat h \rvert_{E}^{2} - \langle \bar z , R_\mu \hat h \rangle_{E} + \tfrac{1}{2}\lvert \bar z \rvert_{E}^{2} \Big),$
where we used the reproducing property (A.1). Choosing a sequence $(\hat h_n)_{n\in{\mathbb{N}}}$ in $X^\star$ such that $R_\mu\hat h_n \to \bar z$ strongly in E (this is possible by density of $X^\star$ in $R_\mu^{-1}E$), replacing $\hat h$ by $\hat h_n$ in the above inequality and taking the limit $n\to\infty$ proves the claim.
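The asymptotic ball-mass comparison underlying this proof can be illustrated numerically in the simplest case $X = E = \mathbb{R}$ with $\mu = {\mathscr{N}}(0,1)$, where $\mu(B_\delta(h))/\mu(B_\delta(0)) \to \mathrm{e}^{-h^2/2}$ as $\delta \searrow 0$. The following sketch (an illustration of the one-dimensional Cameron–Martin asymptotics only, not part of the proof) checks this limit:

```python
import math

def normal_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ball_ratio(h, delta):
    # mu(B_delta(h)) / mu(B_delta(0)) for mu = N(0, 1) on the real line
    num = normal_cdf(h + delta) - normal_cdf(h - delta)
    den = normal_cdf(delta) - normal_cdf(-delta)
    return num / den

h = 1.5
limit = math.exp(-0.5 * h**2)  # Cameron-Martin prediction e^{-|h|_E^2 / 2}
for delta in (1.0, 0.1, 0.01):
    print(delta, ball_ratio(h, delta))
print(limit)
```

As δ shrinks, the printed ratios approach the Cameron–Martin prediction, mirroring the role of the exponential factors in (A.3) and (A.4).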

Theorem A.3 (Anderson's inequality, version 1; Bogachev  2007, theorem 3.10.25).

Let A be a bounded centrally symmetric convex set in $\mathbb{R}^n$, $n\in{\mathbb{N}}$ and let $f\colon \mathbb{R}^{n} \to \mathbb{R}$ be

  • non-negative and locally integrable,
  • symmetrical, i.e. $f(-x) = f(x)$ for each $x\in\mathbb{R}^{n}$, and
  • unimodal, i.e. the sets $\{f\geqslant c\}$ are convex for all c > 0.

Then, for every $h\in \mathbb{R}^n$ and every $t\in [0,1]$, one has

$\int_{A + th} f(x)\, \mathrm{d}x \geqslant \int_{A + h} f(x)\, \mathrm{d}x.$
In particular, for every $z\in\mathbb{R}^n$, $\int_{z+A} f(x)\, \mathrm{d} x\leqslant \int_A f(x)\, \mathrm{d} x.$
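In dimension n = 1 with f the standard normal density and $A = [-r,r]$, the 'in particular' statement reads $\Phi(z+r)-\Phi(z-r) \leqslant \Phi(r)-\Phi(-r)$. A minimal numerical check of this special case (illustration only):

```python
import math

def normal_cdf(x):
    # standard normal CDF Phi via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mass(center, r):
    # integral of the standard normal density over [center - r, center + r]
    return normal_cdf(center + r) - normal_cdf(center - r)

# shifted symmetric intervals never carry more Gaussian mass than the centered one
r = 0.7
for z in (-2.0, -0.5, 0.0, 0.3, 1.0, 5.0):
    assert mass(z, r) <= mass(0.0, r) + 1e-15
```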

Theorem A.4 (Anderson's inequality, version 2; Bogachev  1998, corollary 4.2.3).

Let γ be a centered Gaussian measure on a Banach space X. Let A be a centrally symmetric convex set. Then for any $a\in X$, we have that $\gamma(A+a)\leqslant \gamma(A)$.

Footnotes

  • Note that (Dashti et al 2013, theorem 3.2), restated as theorem 1.5 below, only gives partial answers, since only pairwise comparisons of points lying in E are made, while (Ayanbayev et al 2021a, proposition 4.1) makes the connection between OM minimizers and weak modes (rather than strong modes, which correspond to MAP estimators) under different assumptions.

  • By 'diagonal' we mean that $\mu = \otimes_{k\in{\mathbb{N}}} {\mathscr{N}}(0,\sigma_k^2)$ has a diagonal covariance structure with respect to the canonical basis, while 'nondegenerate' refers to the fact that the eigenvalues of the covariance operator are strictly positive, $\sigma_{k}^{2} \gt 0$ for $k\in{\mathbb{N}}$. Note that Gaussian measures on separable Hilbert spaces can always be diagonalized in this sense by choosing an orthonormal eigenbasis of the covariance operator, see notation 3.1, hence our results constitute a genuine generalization of the Hilbert space case.

  • I.e., there exists a sequence $(\delta_n)_{n\in {\mathbb{N}}}$ with $\delta_n\searrow 0$ such that $\|\zeta^{\delta_n}-\bar z\|_X \to 0$ as $n\to \infty$.

  • This condition corresponds to (Dashti et al 2013, lemma 3.7) and (Kretschmann 2019, lemma 4.11). While this is sufficiently strong for our purposes, namely the proofs of the main theorems 2.4 and 2.5, we actually prove the stronger statement with $\limsup$ in place of $\liminf$ both for Hilbert spaces (corollary 3.6) as well as for $X = \ell^{p}$ (lemma 4.5).

  • This condition corresponds to (Dashti et al 2013, lemma 3.9) and (Kretschmann 2019, lemma 4.13).
