Strong maximum a posteriori estimation in Banach spaces with Gaussian priors

This article shows that a large class of posterior measures that are absolutely continuous with respect to a Gaussian prior have strong maximum a posteriori estimators in the sense of Dashti et al (2013 Inverse Problems 29 095017). This result holds in any separable Banach space and applies in particular to nonparametric Bayesian inverse problems with additive noise. When applied to Bayesian inverse problems, this significantly extends existing results on maximum a posteriori estimators by relaxing the conditions on the log-likelihood and on the space in which the inverse problem is set.


Introduction
Nonparametric Bayesian models, which have infinite-dimensional parameters such as functions, are increasingly popular in modern statistical practice. For inverse problems, the need for prior information to overcome ill-posedness motivates the use of a Bayesian approach, and the desire for algorithms consistent at every resolution makes the nonparametric approach (as advocated by Stuart, 2010) very appealing. One challenge in nonparametric Bayesian inference is that the posterior is a probability distribution on an infinite-dimensional space, making it difficult to analyse and interpret.
This article studies maximum a posteriori (MAP) estimation in nonparametric Bayesian inverse problems. A MAP estimator is a mode of the posterior: a summary by a "most likely" point under the measure. The usual definition of a MAP estimator as a maximiser of the Lebesgue density is not available when the posterior is a measure on an infinite-dimensional parameter space, so it is common to define modes as the centres of metric balls with asymptotically maximal probability, as proposed by Dashti et al. (2013).
This definition allows modes of probability measures to be studied in very general settings, but we restrict attention to posterior measures arising in nonparametric Bayesian inverse problems. In particular, we study the case that $\mu^y$ is a posterior measure on a separable Banach space $X$ which is absolutely continuous with respect to a Gaussian prior $\mu_0$ and has the form
$$\mu^y(\mathrm{d}x) = \exp(-\Phi(x)) \, \mu_0(\mathrm{d}x). \tag{1.1}$$
The potential $\Phi \colon X \to \mathbb{R}$ is determined by the structure of the problem of interest and is essentially the negative log-likelihood of the statistical model. Any map $\Phi$ satisfying mild regularity conditions (Theorem 2.8) yields a well-defined probability measure $\mu^y$.
A classical example giving rise to such a posterior is the nonlinear inverse problem of inferring a parameter $x \in X$, which is typically a function, from a noisy observation $y \in Y$ given a Gaussian prior $\mu_0$ for $x$, with
$$y = G(x) + \xi. \tag{1.2}$$
The observation operator $G \colon X \to Y$ is a measurable map relating the unknown $x \in X$ with the idealised observation $G(x) \in Y$, which is often assumed to have finite dimension. This observation is then corrupted by the additive random noise $\xi$ taking values in $Y$. Under appropriate regularity conditions on $G$ and $\xi$, the posterior $\mu^y$ for the conditional distribution $x \mid y$ is given by Bayes' rule and has the form (1.1) for some potential $\Phi(\,\cdot\,; y) \colon X \to \mathbb{R}$ (Theorem 2.7).
In the Bayesian inverse problems literature, Dashti et al. (2013) developed the notion of a strong mode (Definition 2.1) to define MAP estimators in the nonparametric setting and proved that strong modes exist when $X$ is a separable Hilbert space under mild assumptions on the potential $\Phi$. They also showed that strong modes coincide with minimisers of an Onsager-Machlup (OM) functional (Definition 2.4) for the posterior in this setting, connecting their approach with previous work on most-likely paths of diffusion processes. For Bayesian inverse problems with additive noise as in (1.2), the OM functional can be viewed as a Tikhonov-regularised misfit functional, so the variational solution to an inverse problem (that is, the minimiser of the Tikhonov functional) can be viewed as a MAP estimator for a fully Bayesian approach. This connection is a significant driver for the development of the nonparametric mode theory described here.
As pointed out by Klebanov and Wacker (2023), although Dashti et al. (2013) stated their results in the Banach setting, technical complications limit their proof strategy to Hilbert spaces, and some additional results are needed to complete the proof even in the Hilbert case, as described by Kretschmann (2019).
Recent work by Klebanov and Wacker extended the existence result to the sequence spaces $X = \ell^p(\mathbb{N}; \mathbb{R})$, $1 \leq p < \infty$, for Gaussian priors with diagonal covariance structure with respect to the canonical basis, i.e. $\mu_0 = \bigotimes_{n \in \mathbb{N}} N(0, \sigma_n^2)$ for some $(\sigma_n)_{n \in \mathbb{N}} \in \ell^p(\mathbb{N}; \mathbb{R})$. This article proves the existence of strong modes for posteriors of the form (1.1) defined on a separable Banach space with any Gaussian prior, as originally claimed by Dashti et al. (2013, Theorem 3.5), and shows that strong modes are equivalent in this setting to the other types of small-ball modes present in the literature: the weak mode (Definition 2.1) and the generalised strong mode (Definition 2.3).
Theorem 1.1. Let $X$ be a separable Banach space equipped with a centred nondegenerate Gaussian prior $\mu_0$. Let $\mu^y$ be the corresponding Bayesian posterior of the form (1.1) for some continuous potential $\Phi \colon X \to \mathbb{R}$, and suppose that, for each $\eta > 0$, there exists $K(\eta) \in \mathbb{R}$ such that
$$\Phi(x) \geq K(\eta) - \eta \|x\|_X^2 \quad \text{for all } x \in X. \tag{1.3}$$
Then:
(a) $\mu^y$ has a strong mode, i.e. a strong MAP estimator, and any strong mode lies in the Cameron-Martin space of $\mu_0$;
(b) strong modes, generalised strong modes, weak modes and minimisers of an OM functional for $\mu^y$ coincide.
The conditions imposed on the potential are weaker than those used by Dashti et al. (2013) and Klebanov and Wacker (2023), who assumed that the potential was globally bounded below and locally Lipschitz. As pointed out by Kretschmann (2019, 2023), a global lower bound on $\Phi$ excludes Bayesian inverse problems with observations corrupted by additive white noise or Laplacian noise, and the proof of Dashti et al. (2013) can be extended to handle these cases using a less restrictive lower bound. These cases can also be treated under the yet weaker conditions used here, which are similar to the assumptions used by Stuart (2010, Assumption 2.6) in developing a well-posedness theory for nonparametric Bayesian inverse problems.

Outline
Section 2 defines the small-ball modes used in this paper in the general setting of a metric space and states some essential results, including the strong-weak dichotomy (Lemma 2.2), which appears to be new to the literature. This section also recalls properties of Gaussian measures used throughout the article and briefly outlines the motivating application of Bayesian inverse problems.
Section 3 states the main estimate (Proposition 3.1) needed to prove Theorem 1.1, which can be viewed as an analogue of the explicit Anderson inequality of Dashti et al. (2013, Lemma 3.6). This is then used to establish the $M$-property for Gaussian measures on a separable Banach space (Corollary 3.3), which was until now known rigorously only for special cases such as separable Hilbert spaces and $\ell^p$ spaces equipped with diagonal Gaussian measures.
Section 4 uses the tools developed in the previous section to study MAP estimators for Bayesian posteriors of the form (1.1). First, it states a short proof for the existence of weak modes using the $M$-property. Then, the bound in Proposition 3.1 is used to show that any asymptotic maximising family (Definition 4.2) for the posterior has a limit point (Lemma 4.4). Lemma 4.5 shows that such a point must be a strong mode, extending a previous proof of Klebanov and Wacker (2023) to the Banach case, and this completes the proof of Theorem 1.1.
Section 5 studies consistency theory for MAP estimators of Bayesian inverse problems of the type (1.2). Using Theorem 1.1, the consistency results of Dashti et al. (2013) are extended to apply in any separable Banach space $X$ (Theorem 5.1, Theorem 5.2).
Section 6 gives some concluding remarks and suggests directions for future research.

Preliminaries and related work
For most of the paper, X will be a separable real Banach space, although some definitions and preliminary results in this section will be given in the more general case that X is a metric space.
In any metric space, the closed ball of radius $r$ centred at $x$ will be denoted $B_r(x)$. We consider only Borel measures and denote the set of Borel probability measures on $X$ by $\mathcal{P}(X)$. When $X$ is separable, the topological support is nonempty (Aliprantis and Border, 2006); this ensures that the quantity $M_r$ defined in (2.2) is strictly positive for all $r > 0$.

Mode theory
As mentioned in the introduction, the small-ball mode theory has been developed largely in the Bayesian inverse problems literature. Strong modes were proposed by Dashti et al. (2013), and weak modes were later suggested by Helin and Burger (2015) as a more convenient definition when connecting MAP estimators with variational solutions to inverse problems. Following Ayanbayev et al. (2022a), we consider only global weak modes in this article.
Definition 2.1. Let $X$ be a metric space and let $\mu \in \mathcal{P}(X)$. A weak mode of $\mu$ is any point $x^\star \in \operatorname{supp}(\mu)$ such that, for all $x \in X$,
$$\limsup_{r \to 0} \frac{\mu(B_r(x))}{\mu(B_r(x^\star))} \leq 1. \tag{2.1}$$
Suppose also that $X$ is separable. Then a strong mode of $\mu$ is any point $x^\star \in X$ such that
$$\lim_{r \to 0} \frac{\mu(B_r(x^\star))}{M_r} = 1, \quad \text{where } M_r := \sup_{x \in X} \mu(B_r(x)). \tag{2.2}$$
The modes of a posterior measure $\mu^y$ will also be called MAP estimators.

The difference between the two definitions (2.1) and (2.2) amounts to the order in which the supremum is taken: a weak mode must have asymptotically greater mass when compared to every other point individually, whereas a strong mode must asymptotically have the supremal ball mass. All strong modes are weak modes, because if $x^\star$ is a strong mode and $x \in X$, then
$$\limsup_{r \to 0} \frac{\mu(B_r(x))}{\mu(B_r(x^\star))} \leq \limsup_{r \to 0} \frac{M_r}{\mu(B_r(x^\star))} = 1.$$
Lie and Sullivan (2018) proved that the converse may be false: there exist measures which have only weak modes and no strong modes. While the literature on modes largely treats "strong" or "weak" as a property of the mode itself, one should really think of "strong" or "weak" as a global regularity condition on the measure, because either all modes of a measure are strong or none of them are strong, as the following result shows.
Lemma 2.2 (Strong-weak dichotomy for modes). Let $X$ be a separable metric space. If $\mu \in \mathcal{P}(X)$ has a strong mode, then all weak modes of $\mu$ are strong modes.
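Proof. Suppose that $x^\star$ is a strong mode and $y^\star$ is a weak mode. As both $x^\star$ and $y^\star$ are weak modes, the definitions imply that
$$\lim_{r \to 0} \frac{\mu(B_r(y^\star))}{\mu(B_r(x^\star))} = 1.$$
An application of the product rule for limits shows that $y^\star$ must also be a strong mode:
$$\lim_{r \to 0} \frac{\mu(B_r(y^\star))}{M_r} = \lim_{r \to 0} \frac{\mu(B_r(y^\star))}{\mu(B_r(x^\star))} \cdot \lim_{r \to 0} \frac{\mu(B_r(x^\star))}{M_r} = 1.$$

Clason et al. (2019) proposed the generalised strong mode, motivated by inverse problems with hard parameter constraints (in the spirit of Ivanov regularisation) which lead to a posterior assigning zero mass outside of some feasible set.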
Definition 2.3. Let $X$ be a separable metric space. A generalised strong mode of $\mu \in \mathcal{P}(X)$ is any point $x^\star \in X$ such that, for each sequence $(r_n)_{n \in \mathbb{N}} \to 0$, there exists $(x_n)_{n \in \mathbb{N}} \to x^\star$ such that
$$\lim_{n \to \infty} \frac{\mu(B_{r_n}(x_n))}{M_{r_n}} = 1.$$

Taking the constant sequence $x_n = x^\star$ in the definition shows that a strong mode $x^\star$ is also a generalised strong mode. Unlike strong and weak modes, generalised strong modes need not lie in the support of the measure. Furthermore, there is no strong-generalised strong dichotomy or weak-generalised strong dichotomy analogous to Lemma 2.2: for the measure on $\mathbb{R}$ with Lebesgue density $\rho(x) = \mathbb{1}\{x \in [0, 1]\}$, any $x \in (0, 1)$ is a strong mode (and hence a weak mode), but the points $x = 0$ and $x = 1$ are only generalised strong modes and are neither strong modes nor weak modes.
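The claims about the uniform measure can be checked by a direct computation. The following worked calculation is a sketch, assuming closed balls $B_r(x) = [x - r, x + r]$, radii $0 < r \leq 1/2$, and the supremal ball mass $M_r = \sup_{x} \mu(B_r(x))$ used in the definition of a strong mode.

```latex
% Uniform measure \mu on [0,1]; closed balls B_r(x) = [x-r, x+r], 0 < r <= 1/2.
\begin{align*}
  M_r &= \sup_{x \in \mathbb{R}} \mu(B_r(x)) = 2r,
      && \text{attained for every } x \in [r, 1-r]; \\
  \frac{\mu(B_r(x))}{M_r} &= \frac{2r}{2r} = 1
      && \text{for } x \in (0,1),\ r \leq \min\{x, 1-x\}
         \quad \text{(strong mode)}; \\
  \frac{\mu(B_r(0))}{M_r} &= \frac{r}{2r} = \frac{1}{2}
      && \text{(so $0$ is not a strong mode)}; \\
  \frac{\mu(B_r(1/2))}{\mu(B_r(0))} &= \frac{2r}{r} = 2 > 1
      && \text{(so $0$ is not a weak mode)}; \\
  \frac{\mu(B_{r_n}(x_n))}{M_{r_n}} &= \frac{\mu([0, 2r_n])}{2r_n} = 1
      && \text{for } x_n := r_n \to 0
         \quad \text{(generalised strong mode at $0$)}.
\end{align*}
```

The same computation applies at $x = 1$ by symmetry, taking $x_n := 1 - r_n$.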
An alternative approach to find "most likely" points is to minimise an OM functional associated with the measure of interest. This arises from the study of most-probable paths of diffusion processes (Dürr and Bach, 1978).
Definition 2.4. Let $X$ be a metric space and let $\mu \in \mathcal{P}(X)$. Suppose that $\emptyset \neq E \subseteq \operatorname{supp}(\mu)$. A function $I \colon E \to \mathbb{R}$ is called an Onsager-Machlup functional for $\mu$ if, for all $x, x' \in E$,
$$\lim_{r \to 0} \frac{\mu(B_r(x))}{\mu(B_r(x'))} = \exp\bigl( I(x') - I(x) \bigr).$$

OM functionals are unique up to additive constants and can be interpreted heuristically as the negative logarithm of the Lebesgue density, but this cannot be taken literally for measures on an infinite-dimensional space, where there is no Lebesgue measure. For example, an OM functional for a Gaussian measure on an infinite-dimensional Banach space can be defined only on a small subspace called the Cameron-Martin space (see (2.6)). As an OM functional need not be defined on the entire space $X$, it is not immediate that an OM minimiser is in any sense "most likely" under the measure $\mu$, and this is the motivation to study small-ball modes as in Definition 2.1 instead. A weak mode is always a minimiser of any OM functional for $\mu$, however, and the $M$-property of Ayanbayev et al. (2022a) gives a sufficient condition to ensure that an OM minimiser is a weak mode.
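Definition 2.5 ($M$-property). Let $X$ be a metric space and let $\mu \in \mathcal{P}(X)$. Property $M(\mu, E)$ holds for the set $\emptyset \neq E \subseteq \operatorname{supp}(\mu)$ if there exists $x^\star \in E$ such that
$$\lim_{r \to 0} \frac{\mu(B_r(x))}{\mu(B_r(x^\star))} = 0 \quad \text{for all } x \notin E.$$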
The next result states this equivalence between OM minimisers and weak modes under the $M$-property and shows that the $M$-property is inherited by a posterior of the form (1.1) from the prior. This generalises Proposition 4.1 and Lemma B.8 of Ayanbayev et al. (2022a) to potentials that are merely continuous rather than locally uniformly continuous. In the specific case that $\mu_0$ is a Gaussian measure on a separable Banach space $X$, the claim (a) generalises Theorem 3.2 of Dashti et al. (2013), which requires that the potential is locally bounded and Lipschitz.
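Proposition 2.6. Let $X$ be a metric space and suppose that $\mu_0 \in \mathcal{P}(X)$ has OM functional $I_0 \colon E \to \mathbb{R}$. Suppose that property $M(\mu_0, E)$ holds and that $\mu^y$ is a probability measure on $X$ of the form (1.1) for some continuous potential $\Phi \colon X \to \mathbb{R}$. Then:
(a) $\mu^y$ has OM functional $I^y \colon E \to \mathbb{R}$ given by $I^y(u) := I_0(u) + \Phi(u)$, and property $M(\mu^y, E)$ holds;
(b) weak modes of $\mu^y$ coincide with minimisers of $I^y$.

Proof. Let $x \in X$ and $x' \in \operatorname{supp}(\mu_0)$. As the density $\exp(-\Phi)$ is strictly positive, $x' \in \operatorname{supp}(\mu^y)$ and thus $\mu^y(B_r(x')) > 0$ for all $r > 0$. By the continuity of $\Phi$, for each $\varepsilon > 0$ there exists $\delta > 0$ such that $|\Phi(u) - \Phi(x)| < \varepsilon$ for all $u \in B_\delta(x)$ and $|\Phi(u) - \Phi(x')| < \varepsilon$ for all $u \in B_\delta(x')$. Hence, for $r < \delta$, it follows that
$$\frac{\mu^y(B_r(x))}{\mu^y(B_r(x'))} = \frac{\int_{B_r(x)} \exp(-\Phi(u)) \, \mu_0(\mathrm{d}u)}{\int_{B_r(x')} \exp(-\Phi(u)) \, \mu_0(\mathrm{d}u)} \leq \exp(2\varepsilon) \exp\bigl( \Phi(x') - \Phi(x) \bigr) \frac{\mu_0(B_r(x))}{\mu_0(B_r(x'))}. \tag{2.3}$$
Property $M(\mu^y, E)$ follows immediately from (2.3) by choosing $x \notin E$, $x' \in E$ and taking the lim sup as $r \to 0$. To obtain the OM functional $I^y$, suppose instead that $x, x' \in E$; then by (2.3) and using the OM functional $I_0$ for $\mu_0$,
$$\limsup_{r \to 0} \frac{\mu^y(B_r(x))}{\mu^y(B_r(x'))} \leq \exp(2\varepsilon) \exp\bigl( \Phi(x') - \Phi(x) + I_0(x') - I_0(x) \bigr).$$
By deriving a lower bound analogous to (2.3) using the continuity of $\Phi$ and taking the lim inf as $r \to 0$, we obtain the inequality
$$\liminf_{r \to 0} \frac{\mu^y(B_r(x))}{\mu^y(B_r(x'))} \geq \exp(-2\varepsilon) \exp\bigl( \Phi(x') - \Phi(x) + I_0(x') - I_0(x) \bigr).$$
As $\varepsilon > 0$ is arbitrary, this proves that $I^y(u) = I_0(u) + \Phi(u)$.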
The claim in (b) is an immediate consequence of Ayanbayev et al. (2022a, Proposition 4.1).
Thus, when property $M(\mu^y, E)$ holds, one can view $I^y$ as an extended-real-valued function with value $+\infty$ outside $E$. This interpretation is not valid if the $M$-property does not hold, and one can say very little about the behaviour of $\mu^y$ on balls centred outside of $E$ using an OM functional in this case.
While this article considers only Gaussian priors, MAP estimators have also been studied for Bayesian inverse problems with Besov and Cauchy priors (Agapiou et al., 2018; Ayanbayev et al., 2022b). Besov and Cauchy priors are typically constructed as product measures placing full mass on a Banach subspace of $\mathbb{R}^\infty$, and the product structure of $\mathbb{R}^\infty$ makes finite-dimensional approximation arguments possible. As an arbitrary Banach space need not have such product structure, we instead exploit the fact that a Gaussian measure is fully determined by its behaviour on a Hilbert subspace (the Cameron-Martin space) whose geometry is much more convenient to work with.

Gaussian measures
This section summarises the properties of Gaussian measures used in the article; see the monograph of Bogachev (1998) for a thorough introduction to Gaussian measures. If $X$ is a separable Banach space, a measure $\gamma \in \mathcal{P}(X)$ is Gaussian if the pushforward $\gamma \circ f^{-1}$ is a Gaussian measure on $\mathbb{R}$ for every $f$ lying in the topological dual $X^*$. The measure $\gamma$ is centred if it has mean zero and nondegenerate if it has full support, i.e. $\operatorname{supp}(\gamma) = X$; we assume that $\gamma$ is always centred and nondegenerate in the remainder of the article.
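The reproducing-kernel Hilbert space (RKHS) $X^*_\gamma$ of $\gamma$ is the $L^2(\gamma)$-closure of $X^*$, and the covariance operator $R_\gamma$ is defined on $X^*_\gamma$ by
$$R_\gamma(f)(g) := \int_X f(x) g(x) \, \gamma(\mathrm{d}x), \quad g \in X^*.$$
As $X$ is separable, the measure $\gamma$ is Radon and thus $R_\gamma(f)$ is representable by an element of $X$ for any $f \in X^*_\gamma$ (Bogachev, 1998, Theorem 3.2.3). The image of $R_\gamma$ in $X$ is called the Cameron-Martin space $E \subset X$. It is a separable Hilbert space under the Cameron-Martin inner product
$$\langle h, k \rangle_E := \langle R_\gamma^{-1} h, R_\gamma^{-1} k \rangle_{L^2(\gamma)}.$$
The Cameron-Martin space of a Radon Gaussian measure $\gamma$ is compactly embedded in $X$, i.e. there exists $C > 0$ such that
$$\|h\|_X \leq C \|h\|_E \quad \text{for all } h \in E, \tag{2.4}$$
and the inclusion $\iota \colon E \to X$ is a compact operator (Bogachev, 1998, Corollary 3.2.4). In particular, any $E$-weakly convergent sequence is mapped by $\iota$ to an $X$-strongly convergent sequence.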
The covariance operator $R_\gamma \colon X^*_\gamma \to E$ is a Hilbert isometric isomorphism between the RKHS (equipped with the $L^2(\gamma)$-inner product) and the Cameron-Martin space (equipped with the Cameron-Martin inner product).
The Cameron-Martin space for $\gamma$ is precisely the set of all directions $h \in X$ for which the shifted measure $\gamma_h(\,\cdot\,) := \gamma(\,\cdot\, - h)$ is absolutely continuous with respect to $\gamma$. The space $E$ has $\gamma$-measure zero, but if $\gamma$ is nondegenerate then $E$ is dense in $X$. When $h \in E$, the density of the shifted measure $\gamma_h$ with respect to $\gamma$ is given by the Cameron-Martin formula (Bogachev, 1998, Corollary 2.4.3),
$$\frac{\mathrm{d}\gamma_h}{\mathrm{d}\gamma}(x) = \exp\Bigl( (R_\gamma^{-1} h)(x) - \tfrac{1}{2} \|h\|_E^2 \Bigr). \tag{2.5}$$
If $h \notin E$, then the measures $\gamma$ and $\gamma_h$ are mutually singular by the Feldman-Hájek theorem (Bogachev, 1998, Theorem 2.7.2).
A centred Gaussian measure $\gamma$ has OM functional
$$I(u) := \tfrac{1}{2} \|u\|_E^2, \tag{2.6}$$
which is defined only on the Cameron-Martin space $E$. Property $M(\gamma, E)$ is known to hold when $X$ is a separable Hilbert space, as proven by Dashti et al. (2013, Corollary 3.8) and Ayanbayev et al. (2022a, Corollary 5.2), and when $X = \ell^p$, $1 \leq p < \infty$, provided that $\gamma$ has diagonal covariance structure (Klebanov and Wacker, 2023, Lemma 4.5). The measure $\gamma$ also satisfies Anderson's inequality (Bogachev, 1998, Theorem 2.8.10):
$$\gamma(B_r(x)) \leq \gamma(B_r(0)) \quad \text{for any } x \in X \text{ and } r > 0. \tag{2.7}$$
Gaussian measures do not charge the boundaries of metric balls, i.e. $\gamma(\partial B_r(x)) = 0$ (see e.g. Agapiou et al., 2018, Lemma 6.1), so it would be equivalent to use open balls in any of the results in this article.
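In one dimension these objects can be written down explicitly. As a sketch, assume $X = \mathbb{R}$ and $\gamma = N(0, \sigma^2)$ with $\sigma > 0$ (an illustrative choice, not taken from the article), so that $E = \mathbb{R}$ with $\|h\|_E = |h| / \sigma$, and assume the OM functional of a centred Gaussian measure is $I(u) = \frac{1}{2} \|u\|_E^2$.

```latex
% One-dimensional illustration: \gamma = N(0, \sigma^2) on X = \mathbb{R}.
\begin{align*}
  \rho(t) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl( -\frac{t^2}{2\sigma^2} \Bigr),
  \qquad
  \gamma(B_r(x)) = \int_{x-r}^{x+r} \rho(t) \, \mathrm{d}t, \\
  \lim_{r \to 0} \frac{\gamma(B_r(x))}{\gamma(B_r(x'))}
    &= \frac{\rho(x)}{\rho(x')}
     = \exp\Bigl( \frac{(x')^2}{2\sigma^2} - \frac{x^2}{2\sigma^2} \Bigr)
     = \exp\bigl( I(x') - I(x) \bigr),
  \qquad I(u) = \frac{u^2}{2\sigma^2} = \tfrac{1}{2} \|u\|_E^2, \\
  \gamma(B_r(x)) &\leq \gamma(B_r(0))
    \quad \text{since $\rho$ is symmetric and decreasing in $|t|$.}
\end{align*}
```

In this finite-dimensional case $E$ is the whole space, so the restriction of $I$ to a strict subspace is a purely infinite-dimensional phenomenon.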
The tail behaviour of a Gaussian measure is described by Fernique's theorem (Fernique, 1970), and this is the chief reason for the lower bound (1.3) on the potential needed in Theorem 1.1. Fernique's theorem states that for any Gaussian measure $\gamma$ on a separable Banach space $X$, there exists $\eta > 0$ such that
$$\int_X \exp\bigl( \eta \|x\|_X^2 \bigr) \, \gamma(\mathrm{d}x) < \infty.$$

In the rest of the article, $\gamma$ will denote a centred nondegenerate Gaussian measure and the prior measure $\mu_0$ will always be a centred nondegenerate Gaussian; in either case, $E$ will denote the corresponding Cameron-Martin space.

Bayesian inverse problems
Ill-posed inverse problems are challenging to solve and require the use of prior information about the solution $x$ to restore the well-posedness of the problem. The motivating example in this article is the nonlinear inverse problem of recovering an infinite-dimensional parameter (e.g. a function) $x \in X$ from a noisy observation of the finite-dimensional quantity $y = G(x)$, as discussed in the introduction.
Well-posedness is essential to allow for numerical solution of inverse problems, and the classical approach to restoring well-posedness uses regularisation (see e.g. Benning and Burger, 2018): a variational solution to the inverse problem (1.2) is a minimiser of a Tikhonov functional such as
$$x \mapsto \tfrac{1}{2} \|y - G(x)\|_Y^2 + \tfrac{1}{2} \|x\|'^2,$$
where $\|\cdot\|'$ is some norm penalising undesirable properties of the solution $x$, e.g. the total-variation norm of the function $x$ (Rudin et al., 1992).
In contrast, the Bayesian approach incorporates prior information using a prior measure on the solution space. As stated in the next theorem, under mild conditions on the prior $\mu_0$ and on the potential $\Phi$ arising from the observation operator $G$, an analogue of Bayes' rule gives an expression for the posterior for $x \mid y$ on the infinite-dimensional parameter space.
Theorem 2.7 (Dashti and Stuart, 2017, Theorem 14). Let $X$ and $Y$ be separable Banach spaces and suppose that $G \colon X \to Y$ is measurable. Suppose that $x$ has prior distribution $\mu_0 \in \mathcal{P}(X)$ and that
$$y = G(x) + \xi,$$
where $\xi$ is random noise with distribution $\tau_0 \in \mathcal{P}(Y)$, which is assumed to be independent of $x$. Suppose that the translated measure $\tau_{G(x)}(\,\cdot\,) := \tau_0(\,\cdot\, - G(x))$ is absolutely continuous with respect to $\tau_0$ for $\mu_0$-almost all $x \in X$ and define the potential
$$\Phi(x; y) := -\log \frac{\mathrm{d}\tau_{G(x)}}{\mathrm{d}\tau_0}(y).$$
Suppose further that $\Phi \colon X \times Y \to \mathbb{R}$ is measurable with respect to the product measure $\mu_0 \otimes \tau_0$, and that for $\tau_0$-almost all $y \in Y$,
$$Z(y) := \int_X \exp(-\Phi(x; y)) \, \mu_0(\mathrm{d}x) > 0. \tag{2.8}$$
Then the conditional distribution $\mu^y$ of $x \mid y$ exists, is absolutely continuous with respect to $\mu_0$, and
$$\frac{\mathrm{d}\mu^y}{\mathrm{d}\mu_0}(x) = \frac{1}{Z(y)} \exp(-\Phi(x; y)). \tag{2.9}$$

The data $y \in Y$ will be considered fixed and we suppress the explicit dependence on $y$; thus, the potential is a map $\Phi \colon X \to \mathbb{R}$. When $y$ has finite dimension and $\tau_0$ is absolutely continuous with respect to the Lebesgue measure, $\tau_{G(x)}$ is absolutely continuous with respect to $\tau_0$ and $\Phi(x)$ can typically be interpreted as a misfit functional: when $\xi$ has mean-zero Gaussian distribution $\xi \sim N(0, \Sigma)$ on $Y = \mathbb{R}^d$, for example, one can take
$$\Phi(x) = \tfrac{1}{2} \bigl\| \Sigma^{-1/2} (y - G(x)) \bigr\|_2^2. \tag{2.10}$$
By absorbing the normalisation factor $1/Z(y)$ into $\Phi$, the posterior (2.9) can be expressed in the form (1.1) discussed in the introduction.
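To see where a quadratic misfit comes from, the density ratio can be computed directly. This sketch assumes $Y = \mathbb{R}^d$, noise $\xi \sim N(0, \Sigma)$ with $\Sigma$ symmetric positive definite, and a potential defined as the negative logarithm of the density of the translated noise measure $\tau_{G(x)}$ with respect to $\tau_0$; $|\cdot|$ denotes the Euclidean norm.

```latex
% Gaussian noise on Y = \mathbb{R}^d: both Lebesgue densities share the same
% normalising constant, so it cancels in the ratio.
\begin{align*}
  \frac{\mathrm{d}\tau_0}{\mathrm{d}y}(y)
    &\propto \exp\Bigl( -\tfrac{1}{2} \bigl| \Sigma^{-1/2} y \bigr|^2 \Bigr),
  \qquad
  \frac{\mathrm{d}\tau_{G(x)}}{\mathrm{d}y}(y)
    \propto \exp\Bigl( -\tfrac{1}{2} \bigl| \Sigma^{-1/2} (y - G(x)) \bigr|^2 \Bigr), \\
  \Phi(x; y)
    &= -\log \frac{\mathrm{d}\tau_{G(x)}}{\mathrm{d}\tau_0}(y)
     = \tfrac{1}{2} \bigl| \Sigma^{-1/2} (y - G(x)) \bigr|^2
       - \tfrac{1}{2} \bigl| \Sigma^{-1/2} y \bigr|^2 .
\end{align*}
```

The second term does not depend on $x$, so for fixed data $y$ it can be absorbed into the normalisation constant, leaving the least-squares misfit $\frac{1}{2} |\Sigma^{-1/2}(y - G(x))|^2$.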
To ensure that the posterior measure is normalisable for a given potential Φ : X → R, i.e. is a probability measure, we impose mild conditions on the form of the potential Φ in Theorem 1.1.If the measure µ y does indeed arise from an inverse problem as in Theorem 2.7, the following result is merely a sufficient condition to ensure that Z(y) > 0 in (2.8).
Theorem 2.8 (Stuart, 2010, Theorem 4.1). Suppose that the potential $\Phi \colon X \to \mathbb{R}$ is continuous and that for each $\eta > 0$, there exists a constant $K(\eta) \in \mathbb{R}$ such that
$$\Phi(x) \geq K(\eta) - \eta \|x\|_X^2 \quad \text{for all } x \in X. \tag{2.11}$$
Then the posterior measure $\mu^y$ given by (1.1) can be normalised to yield a probability measure.
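Proof. Given the unnormalised density $\exp(-\Phi)$, one can normalise to obtain a probability measure with density $\exp(-\Phi')$ by setting $\Phi' = \Phi + \log Z$ with the finite normalisation constant
$$Z := \int_X \exp(-\Phi(x)) \, \mu_0(\mathrm{d}x) \leq \exp(-K(\eta)) \int_X \exp\bigl( \eta \|x\|_X^2 \bigr) \, \mu_0(\mathrm{d}x) < \infty,$$
where the upper bound follows by applying (2.11) with an appropriate $\eta > 0$ such that the integral is finite by Fernique's theorem.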
As discussed, a significant reason for studying MAP estimators is that they connect the Bayesian and variational approaches to inverse problems. When the $M$-property holds, the weak MAP estimators of a Bayesian inverse problem coincide with minimisers of an OM functional, and when a Gaussian prior is used, an OM functional for the posterior has the form of a Tikhonov functional (see e.g. Dashti et al., 2013). This correspondence depends on the $M$-property, which until now has been shown only for Gaussian measures on separable Hilbert spaces and for diagonal Gaussian measures on $X = \ell^p$, $1 \leq p < \infty$. This article therefore extends the connection between Bayesian and variational approaches to Banach spaces.

Small-ball probabilities for Gaussian measures in Banach spaces
The main technical result required for the proof of Theorem 1.1 is the bound on the ratio of the measures of small balls under a Gaussian measure stated in Proposition 3.1. This bound is similar in spirit to the explicit Anderson inequality (3.1) of Dashti et al. (2013, Lemma 3.6), which bounds the ratio $\gamma(B_r(x)) / \gamma(B_r(0))$ in terms of $\|x\|_X$, $r$ and a constant $a = a(\gamma) > 0$ when $\gamma$ is a centred nondegenerate Gaussian measure on the separable Banach space $X$, $x \in X$ and $r > 0$. Both (3.1) and the bound we prove in Proposition 3.1 may be thought of as quantitative analogues of the Anderson inequality (2.7). In contrast to the inequality (3.1), which is written in terms of the ambient norm of the Banach space, the result here is written in terms of the decentring function (Ghosal and van der Vaart, 2017) given by
$$\inf_{h \in E \cap B_r(x)} \tfrac{1}{2} \|h\|_E^2.$$
We will show in Proposition 3.1 that the infimum in the decentring function is attained by some point $h^\star \in E$, justifying the use of a minimum instead. When $X$ is a separable Hilbert space, the Cameron-Martin norm can be viewed as a reweighting of the norm of $X$ and the bound (3.1) in $X$-norm suffices to prove the desired results on MAP estimators. In a Banach space, however, this is no longer true; thus, writing the bound in terms of the Cameron-Martin norm is a natural generalisation, with the compact embedding (2.4) providing the means to relate the two norms.
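Proposition 3.1 (Explicit Anderson inequality in Cameron-Martin norm). Let $X$ be a separable Banach space equipped with a centred nondegenerate Gaussian measure $\gamma$. For any $x \in X$ and $r > 0$,
$$\frac{\gamma(B_r(x))}{\gamma(B_r(0))} \leq \exp\Bigl( -\min_{h \in E \cap B_r(x)} \tfrac{1}{2} \|h\|_E^2 \Bigr).$$

Proof. This is an immediate corollary of Ghosal and van der Vaart (2017, Proposition 11.19), and we give a version of the proof here. The set $E \cap B_r(x)$ is nonempty (as $\gamma$ is nondegenerate), $E$-closed (as it is the preimage of $B_r(x)$ under the continuous embedding $\iota \colon E \to X$) and convex. This implies that $E \cap B_r(x)$ is $E$-weakly closed. Hence, the $E$-weakly lower semicontinuous map $h \mapsto \|h\|_E^2$ defined on $E \cap B_r(x)$ attains its minimum at some $h^\star = R_\gamma g^\star \in E$. The Cameron-Martin formula (2.5) gives the equality
$$\gamma(B_r(x)) = \exp\bigl( -\tfrac{1}{2} \|h^\star\|_E^2 \bigr) \int_{B_r(x - h^\star)} \exp(-g^\star(u)) \, \gamma(\mathrm{d}u), \tag{3.2}$$
and we now show that $g^\star(u) \geq 0$ for $\gamma$-almost all $u \in B_r(x - h^\star)$. As $E \cap B_r(x)$ is convex and $h^\star$ minimises the $E$-norm on $E \cap B_r(x)$, it follows that
$$\|h^\star\|_E^2 \leq \|(1 - \lambda) h^\star + \lambda h\|_E^2 \quad \text{for all } h \in E \cap B_r(x) \text{ and } \lambda \in [0, 1].$$
Rearranging and taking limits as $\lambda \to 0$ shows that
$$\langle h - h^\star, h^\star \rangle_E \geq 0 \quad \text{for all } h \in E \cap B_r(x). \tag{3.3}$$
Hence, there is a subsequence $(g_{n_k})_{k \in \mathbb{N}}$ converging pointwise $\gamma$-almost everywhere to $g^\star$. By Bogachev (1998, Theorem 3.5.1), $\gamma$-almost all elements $u \in B_r(x - h^\star)$ may be written as where the convergence of the series is in the norm of $X$. Hence, for all $n$ sufficiently large and Using (3.3), we observe that so it immediately follows that $g_n(u) \geq 0$. As $g_{n_k}(u) \to g^\star(u)$ as $k \to \infty$ $\gamma$-almost everywhere, we obtain the claimed lower bound $g^\star(u) = \lim_{k \to \infty} g_{n_k}(u) \geq 0$ for $\gamma$-almost all $u \in B_r(x - h^\star)$. The result follows by bounding the integrand in (3.2) and using Anderson's inequality (2.7):
$$\gamma(B_r(x)) \leq \exp\bigl( -\tfrac{1}{2} \|h^\star\|_E^2 \bigr) \, \gamma(B_r(x - h^\star)) \leq \exp\bigl( -\tfrac{1}{2} \|h^\star\|_E^2 \bigr) \, \gamma(B_r(0)).$$

Though we shall not make use of (3.1), it can be proven easily from Proposition 3.1 by applying the compact embedding (2.4).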
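For intuition, the bound can be evaluated in one dimension and compared with a norm-based estimate. The sketch below assumes the bound of Proposition 3.1 in the form $\gamma(B_r(x)) / \gamma(B_r(0)) \leq \exp(-\min_{h \in E \cap B_r(x)} \frac{1}{2} \|h\|_E^2)$ and the embedding constant $C$ from (2.4); the one-dimensional measure $N(0, \sigma^2)$ and the notation $(t)_+ := \max\{t, 0\}$ are illustrative choices rather than the article's own.

```latex
% (i) One dimension: \gamma = N(0,\sigma^2), E = \mathbb{R}, \|h\|_E = |h|/\sigma,
%     B_r(x) = [x-r, x+r]; the minimiser is the point of the ball closest to 0.
\begin{align*}
  \min_{h \in [x-r,\, x+r]} \tfrac{1}{2} \|h\|_E^2
    &= \frac{(|x| - r)_+^2}{2\sigma^2},
  &
  \frac{\gamma(B_r(x))}{\gamma(B_r(0))}
    &\leq \exp\Bigl( -\frac{(|x| - r)_+^2}{2\sigma^2} \Bigr).
\end{align*}
% (ii) General separable Banach space: for h \in E \cap B_r(x), the embedding
%      \|h\|_X \le C \|h\|_E and the triangle inequality give
\begin{align*}
  \|h\|_E \geq \frac{\|h\|_X}{C} \geq \frac{(\|x\|_X - r)_+}{C}
  \quad \Longrightarrow \quad
  \frac{\gamma(B_r(x))}{\gamma(B_r(0))}
    \leq \exp\Bigl( -\frac{(\|x\|_X - r)_+^2}{2C^2} \Bigr).
\end{align*}
```

When $\|x\|_X \leq r$ the exponent vanishes and the estimate reduces to Anderson's inequality.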
The following corollary on the measure of balls with centres converging to some x ⋆ ∈ E is slightly weaker than the corresponding results of Kretschmann (2019, Lemma 4.14) and Klebanov and Wacker (2023, Lemma A.2), but it is sufficient for our purposes.The proof stated here takes advantage of the bound developed in Proposition 3.1.
Proof. Construct the sequence $(h_n)_{n \in \mathbb{N}} \subset E$ by selecting a minimiser (which exists as argued in the proof of Proposition 3.1) of $h \mapsto \|h\|_E^2$ from $E \cap B_{r_n}(x_n)$. Using the OM functional $I$ defined by (2.6) for $\gamma$, which satisfies $I(0) = 0$, and by applying the upper bound from Proposition 3.1, we may write
$$\frac{\gamma(B_{r_n}(x_n))}{\gamma(B_{r_n}(0))} \leq \exp\bigl( -\tfrac{1}{2} \|h_n\|_E^2 \bigr) = \exp\bigl( I(0) - I(h_n) \bigr).$$
If $(h_n)_{n \in \mathbb{N}}$ has no $E$-bounded subsequence, then the claim follows immediately as the limit on the right-hand side is zero. Otherwise, pass to an $E$-bounded subsequence and, by reflexivity of $E$, pass to a further $E$-weakly convergent subsequence which we do not relabel. Since $\|h_n - x_n\|_X \leq r_n$, the sequence $(h_n)_{n \in \mathbb{N}}$ converges strongly in $X$ to the limit $x^\star$ of the centres, and by the compact embedding of $E$ in $X$, the $E$-weak limit of $(h_n)_{n \in \mathbb{N}}$ must agree with the $X$-strong limit. Hence, $(h_n)_{n \in \mathbb{N}} \rightharpoonup x^\star$ weakly in $E$, and as the Cameron-Martin norm is $E$-weakly lower semicontinuous,
$$\limsup_{n \to \infty} \frac{\gamma(B_{r_n}(x_n))}{\gamma(B_{r_n}(0))} \leq \limsup_{n \to \infty} \exp\bigl( -\tfrac{1}{2} \|h_n\|_E^2 \bigr) \leq \exp\bigl( -\tfrac{1}{2} \|x^\star\|_E^2 \bigr).$$

The next result establishes a technical approximation condition for sequences in $X$ by elements of $E$, which is useful in combination with Proposition 3.1, and applies it to establish property $M(\gamma, E)$ for Gaussian measures on Banach spaces. As discussed in Section 2, this extends previous results which establish the $M$-property when $X$ is a separable Hilbert space or when $X = \ell^p$, $1 \leq p < \infty$, and $\gamma$ is a diagonal Gaussian measure. In particular, this is a natural analogue for Banach spaces of Corollary 3.8 of Dashti et al. (2013), which proves the $M$-property in separable Hilbert spaces.
Corollary 3.3.Let X be a separable Banach space equipped with a centred nondegenerate Gaussian measure γ.
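(a) Let $(r_n)_{n \in \mathbb{N}} \to 0$ and

Proof. (a) By hypothesis, there must exist a subsequence $(r_{n_k})_{k \in \mathbb{N}}$ and a sequence ( ∈ E. The constant sequence $(x)_{n \in \mathbb{N}}$ cannot have a limit point in $E$, so by (a), for any sequence $(r_n)_{n \in \mathbb{N}} \to 0$,
$$\lim_{n \to \infty} \min_{h \in E \cap B_{r_n}(x)} \|h\|_E = \infty.$$
Thus, by Proposition 3.1, the $M$-property holds because
$$\limsup_{r \to 0} \frac{\gamma(B_r(x))}{\gamma(B_r(0))} \leq \limsup_{r \to 0} \exp\Bigl( -\min_{h \in E \cap B_r(x)} \tfrac{1}{2} \|h\|_E^2 \Bigr) = 0.$$

Existence of MAP estimators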

Weak MAP estimators
With the $M$-property established for Gaussian measures on a separable Banach space, it is now possible to provide a short proof of the existence of weak MAP estimators for Bayesian posteriors of the form (1.1). One could prove the existence of strong MAP estimators directly, as in Dashti et al. (2013), and use the fact that all strong modes are weak modes, but it is instructive to prove the existence of weak MAP estimators separately. Though weak modes were not proposed until the work of Helin and Burger (2015), Corollary 3.8 of Dashti et al. (2013) already proved what is now called the $M$-property for Gaussian priors on Hilbert spaces, taking an important step towards showing the existence of weak modes. By Proposition 2.6, it is sufficient to minimise the posterior OM functional $I^y$, and it is well known that $I^y$ does indeed have a minimiser (see e.g. Stuart, 2010, Theorem 5.4) under mild conditions.
In particular, we require only coercivity of $I^y$ in $E$ to obtain weak modes, rather than the lower bound on $\Phi$ needed in Theorem 1.1. It is important to note that without the lower bound on $\Phi$, it may not be possible to normalise the measure defined in (1.1), as Theorem 2.8 need not apply; the following result considers only measures which can be normalised.
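Observe also that the hypotheses of Theorem 1.1 always imply $E$-coercivity of $I^y$: using the compact embedding (2.4) of $E$ in $X$ and the lower bound (1.3) gives
$$I^y(u) = \Phi(u) + \tfrac{1}{2} \|u\|_E^2 \geq K(\eta) - \eta \|u\|_X^2 + \tfrac{1}{2} \|u\|_E^2 \geq K(\eta) + \bigl( \tfrac{1}{2} - C^2 \eta \bigr) \|u\|_E^2,$$
and selecting $\eta > 0$ sufficiently small ensures that $\frac{1}{2} - C^2 \eta > 0$. It is not clear whether coercivity is sufficient to obtain a strong mode, and this question is left to future work.

Proposition 4.1 (Weak MAP estimators for Bayesian posteriors with Gaussian priors). Let $X$ be a separable Banach space and let $\mu_0$ be a centred nondegenerate Gaussian measure. Suppose that $\mu^y$ is a probability measure of the form (1.1) for some continuous potential $\Phi \colon X \to \mathbb{R}$. Suppose also that the posterior OM functional $I^y(u) := \Phi(u) + \frac{1}{2} \|u\|_E^2$ is $E$-coercive, i.e. there exist $A \in \mathbb{R}$ and $c > 0$ such that
$$I^y(u) \geq A + c \|u\|_E^2 \quad \text{for all } u \in E,$$
or equivalently, using the definition of $I^y$,
$$\Phi(u) \geq A + \bigl( c - \tfrac{1}{2} \bigr) \|u\|_E^2 \quad \text{for all } u \in E.$$
Then $\mu^y$ has a weak mode.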
Proof. The prior $\mu_0$ has OM functional $I_0(u) = \frac{1}{2} \|u\|_E^2$ as described in (2.6), and Corollary 3.3 proves that property $M(\mu_0, E)$ holds. Hence, by Proposition 2.6, the posterior has OM functional $I^y(u) = \frac{1}{2} \|u\|_E^2 + \Phi(u)$ and property $M(\mu^y, E)$ holds, and furthermore weak modes coincide with minimisers of $I^y$. It remains to show that $I^y$ does have a minimiser.
First, note that $\Phi$ is $E$-weakly continuous: if $u_n \rightharpoonup u$ weakly in $E$, then by the compact embedding $u_n \to u$ strongly in $X$ and thus $\Phi(u_n) \to \Phi(u)$ by strong continuity of $\Phi$ in $X$. As the $E$-norm is also weakly lower semicontinuous, the OM functional $I^y$ must be $E$-weakly lower semicontinuous. As $I^y$ is also coercive, it has a minimiser in $E$ by the direct method of the calculus of variations: take a sequence $(h_n)_{n \in \mathbb{N}} \subset E$ with $I^y(h_n) < \inf_{u \in E} I^y(u) + \frac{1}{n}$, and observe that it is $E$-bounded by coercivity; passing to an $E$-weakly convergent subsequence with limit $h^\star$ and using the weak lower semicontinuity of $I^y$ proves that $h^\star$ is a minimiser of $I^y$. This minimiser is a weak mode by Proposition 2.6.
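In finite dimensions with a linear forward map, the minimiser of the OM functional is available in closed form. As an illustration under assumptions not made in the article, take $X = \mathbb{R}^d$, prior $\mu_0 = N(0, C_0)$, observation operator $G(x) = Ax$ for a matrix $A$, and noise $\xi \sim N(0, \Sigma)$, with $C_0$ and $\Sigma$ symmetric positive definite, so that $\|u\|_E^2 = u^\top C_0^{-1} u$ and the potential is the least-squares misfit.

```latex
% Linear-Gaussian illustration: the OM functional is a strictly convex quadratic.
\begin{align*}
  I^y(u) &= \tfrac{1}{2} (y - Au)^\top \Sigma^{-1} (y - Au)
            + \tfrac{1}{2} u^\top C_0^{-1} u, \\
  \nabla I^y(u) &= -A^\top \Sigma^{-1} (y - Au) + C_0^{-1} u = 0
  \quad \Longleftrightarrow \quad
  u^\star = \bigl( A^\top \Sigma^{-1} A + C_0^{-1} \bigr)^{-1} A^\top \Sigma^{-1} y .
\end{align*}
% Coercivity holds with A = 0 and c = 1/2, since the misfit term is nonnegative:
\begin{align*}
  I^y(u) \geq \tfrac{1}{2} \|u\|_E^2 \quad \text{for all } u \in \mathbb{R}^d .
\end{align*}
```

Here $u^\star$ is the Tikhonov-regularised least-squares solution, and because the posterior is itself Gaussian in this setting, it also equals the posterior mean.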

Strong MAP estimators
We now prove the main theorem on the existence of strong MAP estimators. The strategy of the proof is similar in spirit to the prior work of Dashti et al. (2013), Kretschmann (2019, 2023) and Klebanov and Wacker (2023).
In the proof of Dashti et al. (2013), the explicit Anderson inequality (3.1) is first used to show that any family $(x^\star_r)_{r > 0}$ of maximisers of the posterior radius-$r$ ball mass $x \mapsto \mu^y(B_r(x))$ must be bounded in $X$ under some regularity assumptions on $\Phi$. Next, a weakly convergent subsequence is extracted, and Lemma 3.7 and Lemma 3.9 of Dashti et al. (2013) can be used to show that if the limit is not in $E$ or the convergence is not strong, then the ratio $\mu_0(B_r(x^\star_r)) / \mu_0(B_r(0))$ converges to zero along this subsequence. This yields a contradiction because the assumptions on $\Phi$ mean this ratio cannot converge to zero, showing that the limit point lies in $E$ and convergence is strong in $X$. Finally, this limit point is shown to be both a strong MAP estimator and an OM minimiser. Klebanov and Wacker (2023) point out that it is not obvious that the radius-$r$ maximisers exist and show that the proof can be adapted to use a family $(x_r)_{r > 0}$ of "approximate maximisers" nearly attaining the supremal radius-$r$ mass instead. Klebanov and Wacker (2023) call such a family an asymptotic maximising family.

Definition 4.2. Let $X$ be a metric space. An asymptotic maximising family (AMF) for $\mu \in \mathcal{P}(X)$ is a net $(x_r)_{r > 0} \subset X$ such that, for some increasing function $\varepsilon \colon [0, \infty) \to [0, 1)$ with $\varepsilon(r) \to 0$ as $r \to 0$,
$$\mu(B_r(x_r)) \geq (1 - \varepsilon(r)) M_r \quad \text{for all } r > 0.$$

Every measure has at least one AMF, though in general there may not exist any point $x^\star_r \in X$ such that $\mu(B_r(x^\star_r)) = M_r$. If $X$ is a Hilbert space, then the radius-$r$ maximisers $x^\star_r$ do always exist (Lambley and Sullivan, 2023, Corollary A.9), but we will use AMFs to avoid further discussion about these maximisers. The next result summarises the connection between AMFs and small-ball modes, which is explored in greater detail by Lambley and Sullivan (2023, Theorem 4.11).
Proposition 4.3. Let X be a separable Banach space and suppose that µ ∈ P(X).
(a) Suppose that x ⋆ is a generalised strong mode for µ. Then x ⋆ is a limit point of some AMF (x r ) r>0 ⊂ X.
(b) Suppose that (x r ) r>0 ⊂ X is an AMF for µ which converges to x ⋆ along every subsequence. Then x ⋆ is a generalised strong mode.
Proof. Pick any sequence (r n ) n∈N → 0 and choose a corresponding sequence (x rn ) n∈N → x ⋆ from the definition of a generalised strong mode (Definition 2.3). Selecting any AMF (x r ) r>0 with this subsequence (x rn ) n∈N proves the first claim. For the second claim, let ε : [0, ∞) → [0, 1) denote the function corresponding to the AMF (x r ) r>0 ; for any sequence (r n ) n∈N → 0, it follows by definition that proving that x ⋆ is a generalised strong mode.
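A one-dimensional example may help to fix ideas. The sketch below takes µ to be the standard normal distribution on R, which is a choice made here for illustration only, and reads "nearly attaining the supremal radius-r mass" as the ratio µ(B r (x r )) / M r tending to one as r → 0. The off-centre family x r = r never maximises the ball mass, yet it behaves like an AMF and converges to the mode 0.

```python
import math

def normal_cdf(t):
    """Distribution function of the standard normal on R."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def ball_mass(x, r):
    """mu(B_r(x)) for the interval (x - r, x + r)."""
    return normal_cdf(x + r) - normal_cdf(x - r)

def sup_ball_mass(r, grid_size=2001):
    """Approximate M_r = sup_x mu(B_r(x)) by a grid search over [-3, 3]."""
    grid = (-3.0 + 6.0 * i / (grid_size - 1) for i in range(grid_size))
    return max(ball_mass(x, r) for x in grid)

def amf_ratio(r):
    """Ratio mu(B_r(x_r)) / M_r for the illustrative family x_r = r."""
    return ball_mass(r, r) / sup_ball_mass(r)

radii = [1.0, 0.5, 0.1, 0.01]
ratios = [amf_ratio(r) for r in radii]
```

The entries of `ratios` increase towards one as the radius shrinks, so the shortfall 1 − ratio plays the role of the function ε in this example.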
Aside from the issues associated with radius-r maximisers, the proof of Dashti et al. (2013) omits some technical results which were later proved by Kretschmann (2019). Klebanov and Wacker (2023) argue that the proof also relies on several properties that do not hold in an arbitrary separable Banach space X. To give just one example, the step passing from a bounded sequence to a weakly convergent subsequence requires additional hypotheses, e.g. reflexivity of X.
To resolve this, Klebanov and Wacker (2023) first establish the proof when X is a separable Hilbert space. In this setting, any Gaussian measure γ is characterised by its mean m ∈ X and covariance operator C : X → X, so by working in an eigenbasis of C, one can reduce to the case X = ℓ 2 (N; R) with µ 0 = ⊗ n∈N N(0, σ n ²), with the Cameron-Martin norm given by a simple reweighting of the ℓ 2 -norm. Klebanov and Wacker (2023) then extend to the case X = ℓ p (N; R), 1 ≤ p < ∞, with µ 0 = ⊗ n∈N N(0, σ n ²); unlike in the Hilbert case, not all Gaussian measures on ℓ p can be expressed in this product form. Even this generalisation is nontrivial since the Cameron-Martin norm can no longer be expressed as a reweighting of the X-norm. This motivates a technical convexification argument to bridge the gap between the two norms, making use of the diagonal structure of the prior to write the E-norm in terms of the canonical sequence-space basis. It is challenging to generalise this approach further given the heavy dependence on the diagonal structure.
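The reweighting in the Hilbert case can be made concrete in a truncation. The following sketch assumes a product prior with the hypothetical choice σ n = 1/n, truncated to the first 10 000 coordinates, and takes the squared Cameron-Martin norm to be Σ x n ² / σ n ². It compares a draw from the prior with a point whose coefficients decay faster than σ n .

```python
import numpy as np

def cameron_martin_norm_sq(x, sigma):
    """Squared E-norm for a diagonal Gaussian: sum of x_n^2 / sigma_n^2."""
    return float(np.sum((x / sigma) ** 2))

n = 10_000
sigma = 1.0 / np.arange(1, n + 1)             # assumed decay sigma_n = 1/n
rng = np.random.default_rng(1)

prior_draw = sigma * rng.standard_normal(n)    # x_n = sigma_n * xi_n, xi_n ~ N(0, 1)
smooth_point = sigma / np.arange(1, n + 1)     # h_n = sigma_n / n, an element of E

draw_l2_sq = float(np.sum(prior_draw**2))                     # squared X-norm of the draw
draw_cm_sq = cameron_martin_norm_sq(prior_draw, sigma)        # grows like n
smooth_cm_sq = cameron_martin_norm_sq(smooth_point, sigma)    # stays below pi^2 / 6
```

The draw has a moderate ℓ 2 -norm but a reweighted norm of order n, consistent with the standard fact that the Cameron-Martin space has prior measure zero in infinite dimensions, whereas `smooth_point` has bounded E-norm at every truncation level.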
We overcome this difficulty by using the explicit Anderson inequality of Proposition 3.1. As discussed in Section 3, this is more natural than the bound (3.1) used in prior work because the behaviour of γ is fully determined by its Cameron-Martin space, and the Cameron-Martin space has more favourable topological properties. Proposition 3.1 first allows us to show that any AMF is bounded in X, and Corollary 3.3 shows that any AMF is closely approximated in X by a sequence bounded in E. This sequence has an E-weakly convergent subsequence regardless of the choice of X, and applying the compact embedding of E in X yields strong convergence of this subsequence in X.
This approach avoids the need to explicitly prove Lemma 3.7 and Lemma 3.9 of Dashti et al. (2013), since the necessary claims can be derived directly from Proposition 3.1 and Corollary 3.3.
We will later show in Lemma 4.5 that a limit point of an AMF is a strong mode for the Bayesian posterior µ y ; combining this result with the existence of limit points proven in the following result completes the proof of Theorem 1.1.
Lemma 4.4 (Limit points of AMFs for Bayesian posteriors).Under the assumptions of Theorem 1.1, if (x r ) r>0 ⊂ X is an AMF for µ y , then: (a) any limit point of (x r ) r>0 lies in E; (b) the net (x r ) r>0 has at least one limit point.
Proof.Fix any decreasing sequence (r n ) n∈N → 0. By Proposition 3.1 and the compact embedding (2.4), we have On the other hand, let ε be the function corresponding to the AMF (x r ) r>0 ; using the lower bound (1.3) on Φ and picking δ > 0 from the definition of continuity such that |Φ(x) − Φ(0)| < 1 for |x| < δ, we see that for all n such that r n < δ, the following lower bound holds: This inequality gives and combining this bound with (4.1) yields This implies that the sequence (x rn ) n∈N is bounded: if it were not, then setting η < 1 2C 2 would give the contradiction As (x rn ) n∈N is bounded, (4.2) implies that there is a constant L > 0 such that Thus, again using the upper bound provided by Proposition 3.1, we see that It then follows from Corollary 3.3 that (x rn ) n∈N has a further subsequence converging to some point x ⋆ ∈ E. In particular, if (x rn ) n∈N is a convergent sequence, then the limit must lie in E.
As discussed, Klebanov and Wacker (2023, Theorem 2.8) proved that a limit point of an AMF for the posterior is a strong mode in the sequence-space setting. The following lemma generalises this result to any separable Banach space and slightly weakens the hypotheses required on the potential to be merely continuous rather than locally Lipschitz.
Lemma 4.5 (Limit points of AMFs are strong modes). Under the assumptions of Theorem 1.1, any X-strong limit point x ⋆ ∈ E of an AMF (x r ) r>0 ⊂ X for µ y is a strong mode.
Proof.Let x ⋆ ∈ E be some limit point of (x r ) r>0 .To show that x ⋆ is a strong mode, it suffices to check that for any (r n ) n∈N → 0, Indeed, it would be enough to show that any (r n ) n∈N → 0 has a further subsequence such that (4.3) holds along that subsequence: this follows from the fact that if (u n ) n∈N is an arbitrary real sequence and any subsequence of (u n ) n∈N has a further subsequence converging to u, then u n → u.
Hence, take any sequence (r n ) n∈N → 0; by Lemma 4.4 we may pass to a subsequence of (r n ) n∈N , which will not be relabelled, such that (x rn ) n∈N converges to some y ⋆ ∈ E. As Φ is continuous, for any ε > 0 there exists δ > 0 such that |Φ(y ⋆ ) − Φ(x)| < ε for any x ∈ B δ (y ⋆ ). Since (r n ) n∈N → 0 and (x rn ) n∈N → y ⋆ , there exists N ∈ N such that r n < δ/2 and ‖x rn − y ⋆ ‖ X < δ/2 for n ≥ N. Hence for such n and any x ∈ B rn (x rn ), we have |Φ(x rn ) − Φ(x)| < 2ε. This implies that Since (x rn ) n∈N is a subsequence of an AMF, the previous equation implies that In particular, the point x ⋆ fixed at the start of the proof is a limit point of the AMF (x r ) r>0 , i.e. there exists (s n ) n∈N → 0 such that (x sn ) n∈N → x ⋆ , so the above argument implies the existence of a subsequence such that This does not yet hold for every sequence (r n ) n∈N → 0, only the specific sequence (s n ) n∈N . To complete the proof, fix an arbitrary (r n ) n∈N → 0 and y ⋆ ∈ E as above. As µ y has an OM functional I y defined on E, Hence, using (4.5) and (4.6), it follows that Hence, for any (r n ) n∈N → 0, there is a further subsequence for which (4.3) holds, and thus x ⋆ is a strong mode.
Proof of Theorem 1.1. (a) By Lemma 4.4, any AMF (x r ) r>0 has an X-strong limit point x ⋆ ∈ E, and this limit point is a strong mode by Lemma 4.5.
(b) Lemma 2.2 proves that strong and weak modes coincide as a strong mode exists, and Proposition 2.6 shows that weak modes coincide with minimisers of the OM functional.
As any generalised strong mode must be the limit point of an AMF (Proposition 4.3) and Lemma 4.4 implies that such a point lies in E, Lemma 4.5 implies that the generalised strong mode is also a strong mode.

Consistency of MAP estimators
We return to the additive-noise Bayesian inverse problem (1.2) discussed in the introduction. Suppose that X is a separable Banach space, Y := R d and G : X → Y . For simplicity, we restrict attention to the case of mean-zero Gaussian noise ξ and Gaussian prior µ 0 , giving the model Under the frequentist assumption that there is a fixed true parameter x † ∈ X, consistency theory studies the behaviour of the posterior and point estimators, which depend on the random observations y 1 , . . ., y N , in the infinite-data or small-noise limit.
Classically, a sequence of posterior measures is consistent at x † if, for any neighbourhood U of x † , the posterior measure of the complement of U converges to zero in probability (Ghosal and van der Vaart, 2017). For Bayesian inverse problems, this notion is often too restrictive: the parameter x † need not even be identifiable from the model, because there may exist x ≠ x † such that G(x) = G(x † ). Indeed, if G is a bounded linear operator, then G(x † ) = G(x † + z) for any z ∈ ker G, and ker G ≠ {0} whenever dim X > d, so x † is then not identifiable and one cannot expect posterior consistency to hold. Posterior consistency is often possible to show if G is known more explicitly, e.g. if it is the solution operator for a partial differential equation (see Agapiou et al., 2013; Knapik et al., 2011; Vollmer, 2013), but we focus on the general case of a possibly nonlinear operator G : X → Y .
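The non-identifiability for linear forward maps is easy to reproduce in finite dimensions. The sketch below uses R 8 as a stand-in for X and a random 3 × 8 matrix as a hypothetical forward map G; both are choices made purely for illustration. A kernel direction is read off from the singular value decomposition, and shifting the true parameter along it leaves the noise-free data unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 8, 3                          # parameter dimension exceeds data dimension
G = rng.standard_normal((d, n))      # hypothetical linear forward map
x_true = rng.standard_normal(n)      # stand-in for the true parameter

# With full_matrices=True (the default), the last n - d rows of vt span ker G
# when G has full row rank, which holds generically for a random matrix.
_, _, vt = np.linalg.svd(G)
z = vt[-1]                           # unit vector with G @ z numerically zero

x_alt = x_true + 5.0 * z             # a different parameter, far from x_true
same_data = np.allclose(G @ x_alt, G @ x_true)
```

Here `same_data` evaluates to true although `x_alt` and `x_true` are five units apart, so no estimator based on the data alone can distinguish them.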
In a similar vein, one cannot expect a sequence of MAP estimators to be consistent estimators of x † , i.e. the MAP estimators need not converge in probability to x † . It is instead typical to study a weaker notion of consistency for MAP estimators, where one identifies a limit point x ⋆ of any sequence of MAP estimators and shows that G(x † ) = G(x ⋆ ) (Agapiou et al., 2018; Dashti et al., 2013; Dunlop, 2019). The results of Dashti et al. (2013, Section 4) on the consistency of MAP estimators in the setting of (5.1) depend on the correspondence between strong MAP estimators and OM minimisers, and on the existence of strong MAP estimators. Thus, Theorem 1.1 can be used to extend the applicability of these consistency results from separable Hilbert spaces to arbitrary separable Banach spaces.
(a) there is a subsequence of (G(x N )) N ∈N converging to G(x † ) almost surely; (b) if x † ∈ E, there is a subsequence of (x N ) N ∈N converging to some x ⋆ ∈ E weakly in E almost surely, and G(x ⋆ ) = G(x † ).

Closing remarks
MAP estimators provide a simple summary of the posterior distribution, but in the nonparametric setting it is not straightforward even to verify that MAP estimators exist. This article has shown that Bayesian inverse problems defined on any separable Banach space with a Gaussian prior have well-defined strong MAP estimators under very mild conditions on the forward problem. The fact that MAP estimators correspond with minimisers of a Tikhonov functional is an important justification for the Bayesian approach, and this article has also extended the connection between MAP estimators and variational minimisers to the Banach setting. As a corollary of Theorem 1.1 on the existence of strong MAP estimators, this article also extends results on the consistency of MAP estimators to additive-noise Bayesian inverse problems set in any separable Banach space.
The strategy adopted here depends on two essential points: the statistical structure of the Bayesian inverse problem (1.2), which ensures the posterior is absolutely continuous with respect to the prior, and the topological structure provided by the Gaussian prior through the compactly embedded Cameron-Martin space.
In more general settings, such as those where the observed quantity has infinite dimension, it need not be the case that the posterior is absolutely continuous with respect to the prior (Stuart, 2010, Remark 3.8).While the small-ball theory for modes (Section 2.1) does not depend on this absolute continuity, new techniques are needed to translate statements from prior to posterior without a density to relate the two.
Though this article has restricted attention to the case that X is a separable Banach space, the results on Gaussian measures used in this article hold more generally for Radon Gaussian measures on a locally convex space X, and this would form a natural extension of this work.
Another possible extension is to other priors with similar structure used in nonparametric Bayesian inverse problems, such as the p-exponential priors of Agapiou et al. (2021). As pointed out by Agapiou et al. (2021, Remark 2.14), a bound analogous to Proposition 3.1 is more challenging for non-Gaussian p-exponential measures (i.e. p < 2) because the appropriate analogue of the Cameron-Martin space is not a Hilbert space; on the other hand, p-exponential measures are defined on subspaces of the countable product space R ∞ , which provides a useful topological structure not present in an arbitrary Banach space.
It would also be interesting to know whether the hypothesis of coercivity in Proposition 4.1, which was sufficient to prove the existence of weak modes, would also suffice for proving the existence of strong modes.