Regularization Graphs -- A unified framework for variational regularization of inverse problems

We introduce and study a mathematical framework for a broad class of regularization functionals for ill-posed inverse problems: Regularization Graphs. Regularization graphs allow one to construct functionals using as building blocks linear operators and convex functionals, assembled by means of operators that can be seen as generalizations of classical infimal convolution operators. This class of functionals exhaustively covers existing regularization approaches and is flexible enough to craft new ones in a simple and constructive way. We provide well-posedness and convergence results for the proposed class of functionals in a general setting. Further, we consider a bilevel optimization approach to learn optimal weights for such regularization graphs from training data. We demonstrate that this approach is capable of optimizing the structure and the complexity of a regularization graph, allowing, for example, to automatically select a combination of regularizers that is optimal for given training data.


Introduction
In the last decades, a significant part of inverse problems theory has revolved around constructing suitable regularization approaches that allow for a reliable solution of ill-posed inverse problems. Among those, energy-based methods such as Tikhonov regularization [50] have been successful both with respect to mathematical guarantees, e.g., on well-posedness and stability, and with respect to practical performance in applications. An important cornerstone of energy-based methods are regularization functionals, which are responsible for stabilizing the ill-posed inversion of the forward model and for incorporating prior knowledge, such as smoothness, on the sought solution. The latter is relevant in particular when dealing with highly structured data such as image data, where a suitable inclusion of prior knowledge makes a significant difference regarding the overall performance of the resulting method.

[Figure 1: A prototypical regularization graph with weights (α (1,2) , α (2,3) , α (2,4) ) = (1, 1, α). See Remark 2.5 for a detailed interpretation.]
In this context, non-smooth sparsity-based methods building on measures, measure-valued differential operators, basis transforms or frames have become very popular. Besides the celebrated total variation (TV) functional [46], those include methods building on higher-order derivatives such as second-order TV [31], infimal-convolution-based approaches [18] or the total generalized variation (TGV) functional [8], see [7] for a recent review. Transform-based methods include wavelet, curvelet or shearlet transforms [36,39,49] as well as learned dictionaries [23]. Also more specific approaches exist, tailored, for instance, to model certain oscillations [25,27,33,38,40,43,44] or texture [17], as well as different combinations of existing methods, such as TV and second-order TV [42], higher-order regularizers [12,19,42], TV-type functionals with curvelets or shearlets [26,30,28], a combination of different transform-based approaches [35] or the infimal convolution of TV with L p -norms [14,15]. We refer to [7,3,9,37,47] for a review of a subset of the plethora of existing methods. While all these approaches share the goal of providing a model-based regularization for inverse problems, the way and extent to which they are developed and analyzed is rather different and often application-specific. Moreover, the choice among such methods is mostly done manually. A systematic approach for the analysis and the automatic, data-based design of regularization functionals that covers a broad class of existing methods does not exist to date. By introducing the framework of regularization graphs, we aim to provide a step in this direction. A regularization graph can be described as a weighted, directed graph together with a collection of functionals and operators associated with the nodes and the edges of the graph, respectively.
Such a structure allows one to define regularization functionals via a rather arbitrary combination of linear operators and functionals, e.g., via variable splittings or summations. In particular, both the sum and a (generalized) infimal convolution of the functionals associated with two regularization graphs can be formulated as a regularization graph functional, where the underlying graph is obtained by properly combining the two original ones. This yields a flexible framework for designing new regularization functionals or combining existing ones, e.g., via infimal convolution. Moreover, by associating weights to the edges of such graphs, a learning of both the parameters associated with such functionals as well as the structure of the underlying graph is possible. The latter allows, in particular, the automatic selection of optimal regularization functionals from a set of possible choices within a bilevel approach.
A prototypical example of a regularization graph with nodes V = {1, 2, 3, 4}, directed edges E = {(1, 2), (2,3), (2,4)} and weights (α e ) e∈E is provided in Figure 1. Here, the functionals and operators associated with the nodes and the edges of the graph are defined as follows. For n ∈ V and e ∈ E, X n and X e are suitable Banach spaces, Φ e are bounded linear operators, Θ e are (possibly unbounded) closed range operators and Ψ n are convex functionals. The spaces X n and X e are called node spaces and edge spaces, respectively, while the Ψ n are called node functionals. Further, we call Θ e forward operators as they map from the edge space X e to the direct successor node space X n . Similarly, we call Φ e backward operators as they map from the edge space to the direct predecessor node space. Variables {w e 1 , w e 2 , w e 3 } associated with the edges of the graph, on which both the forward and backward operators are evaluated, are called edge variables. Notice that in our example the root node 1 and the splitting node 2 correspond to the functional I {0} , i.e., Ψ 1 = Ψ 2 = I {0} , and Φ e 2 = I denotes a continuous embedding of X e 2 into X 2 ; see Remark 2.5 below for details. Also note that the weights (α e ) e∈E associated with the graph are depicted in Figure 1 as scalar factors in front of the backward operators (Φ e ) e∈E , where we use the convention that fixed, trivial weights α e = 1 are not depicted explicitly. We also remark that, besides the notation α e for e ∈ E, for specific regularization graphs the non-trivial weights will often be numbered independently of the edge they are associated with; see Figure 2.
The structure defined in this example is a regularization graph under mild additional conditions, most importantly weak* lower semicontinuity and coercivity of Ψ 3 and Ψ 4 , and closedness of the range of each Θ e , which, for instance, still allows the Θ e to be densely defined differential operators and the Φ e to be synthesis operators for a given dictionary or frame. The non-trivial weight α allows one to adapt the structure of the graph by removing edges: with α = 0 and supposing, for example, that Ψ 4 vanishes at zero, we obtain R 0 (u) = inf {Φ e 1 w e 1 = u} Ψ 3 (Θ e 2 Θ e 1 w e 1 ). The general structure of a regularization graph is defined in Section 2 and examples of existing regularization approaches that are included in this setting are provided in Section 2.1 and listed in the Appendix. Here, the main conditions on the involved functionals and operators are that the forward operators Θ e have closed range (i.e., satisfy a Poincaré-type estimate), that the backward operators Φ e are continuous and that the involved node functionals Ψ n are coercive. Under these conditions, we prove well-posedness, stability and convergence results for the application of regularization graphs in a general inverse problem setting. Moreover, we develop a bilevel approach that allows one to learn the structure of an optimal graph for a given set of training data, and we show well-posedness of the resulting non-convex optimization problem.
Contribution of the paper in relation to the state of the art. In a rather abstract setting, general conditions on regularization functionals that guarantee well-posedness, stability and convergence are of course well-known, see for instance [29,32]. Those, however, are conditions on the overall functionals rather than their building blocks, and their verification is often at the same level of difficulty as the results themselves. Furthermore, they do not make it easy to combine different approaches without re-checking the underlying conditions. More specific results also exist, but deal with particular settings such as higher-order regularization [6,12]. More related to the aim of this paper are some works on bilevel optimization, see for instance [16] for a review. In the probably most closely related work [21], the authors consider a general bilevel framework that includes TV, the infimal convolution of first and second order TV functionals as well as the TGV functional as particular cases. In contrast to [21], however, where essentially well-posed linear inverse problems are considered, i.e., those with closed range forward operator, our work is generally applicable to any bounded forward operator.
In particular, we do not require closed range and allow for genuinely ill-posed inverse problems, a generalization that is the main source of difficulty for the analysis in this context. A second, closely related work is the preprint [20]. There, the authors consider a bilevel scheme for learning parameters and operators in a TGV-like functional. They provide conditions on the involved operators under which they show well-posedness for a bilevel approach in image denoising. As application they consider an interpolation between a symmetrized and a non-symmetrized differential operator in the second order TGV functional. Besides being applicable to inverse problems beyond denoising, our work differs from [20] in allowing a more flexible combination of linear operators and functionals, far beyond the cascadic structure of TGV. Further, our framework allows for an automatic selection from different choices of existing regularization functionals and also, for instance, for the selection of an optimal order in TGV regularization.
Organization of the paper. The paper is organized as follows. In Section 2 we give the precise definition of regularization graphs, clarifying the main assumptions on the linear operators, the functionals and the involved Banach spaces that yield the results of our work. Also, we provide several examples of existing regularization approaches that can be constructed using a suitable regularization graph. In Section 3 we provide basic algebraic properties of regularization graphs, in particular a recursive representation that will be quite useful later on. In Section 4 we provide the main analytic properties of functionals associated with regularization graphs that will be the basis for subsequent results on the regularization of inverse problems and bilevel optimization. In particular, we show that any such functional is weak* lower semi-continuous and coercive up to a finite dimensional space. In Section 5 we provide an equivalent predual formulation of regularization graphs. Also, the connection to well-known predual representations of existing regularization approaches is made. While the results of this section will not be needed in the subsequent theory, they are nevertheless of interest on their own, in particular in view of optimality conditions and duality-based algorithms. Section 6 then provides well-posedness and convergence results for the application of regularization graphs to the regularization of linear inverse problems. We focus on linear inverse problems since this allows for a compact presentation of the results without any additional assumptions on the forward model except for continuity. Nevertheless, the analytic results of Section 4 also allow one to show well-posedness for non-linear inverse problems under standard assumptions on the forward model such as in [32]. In Section 7, we develop and analyze a bilevel framework for learning the weights of regularization graphs.
In particular, we show well-posedness and an example for a bilevel approach that allows to select optimal regularizers from a set of possible choices by learning zero-weights in the graph. An appendix further provides a list that shows how a selection of existing regularization functionals can be represented by regularization graphs.

Notation and assumptions
In this section we define the underlying setting and assumptions used in the paper. The structure of a general regularization functional will be represented by a directed graph G = (V, E), where V is a non-empty finite set of nodes not containing 0 and E ⊂ (V × V ) \ {(n, n) : n ∈ V } are the edges. We assume that G has a tree structure and that a root node n̄ ∈ V exists, i.e., we assume that G contains no cycles and that for each n ∈ V there exist edges ((n i−1 , n i )) M i=1 in E such that n M = n and n 0 = n̄. We call a set F ⊂ E a chain (of length M > 0 with root n 0 ) if F = {(n i−1 , n i ) | i = 1, . . . , M, n i ≠ n j for i ≠ j}. Further, for n ∈ V , we denote by n − the node such that (n − , n) ∈ E if n is not the root node of the graph, and set n − = 0 otherwise, noting that n − is well defined due to the tree structure of G. To any graph G = (V, E) we associate a family of Banach spaces (X n ) n∈V with the nodes and a family of Banach spaces (X e ) e∈E with the edges. Further, we associate the following functionals and operators with G.
We suppose that each X n , n ∈ V and each X e , e ∈ E admits a predual space denoted by X # n and X # e , respectively, and make the following assumptions on (Ψ n ) n , (Θ e ) e and (Φ e ) e :
(H1) Ψ n is weak* lower-semicontinuous for every n ∈ V .
(H2) Ψ n is coercive, i.e., Ψ n (v) → ∞ as ‖v‖ X n → ∞, for every n ∈ V .
(H3) Ψ n (0) = 0 for every n ∈ V .
(H4) Θ e is weak* closed for every e ∈ E.
(H5) ker(Θ e ) is finite dimensional for every e ∈ E.
(H6) rg(Θ e ) is closed for every e ∈ E.
(H7) Φ e is weak* to weak* continuous for every e ∈ E.
(H8) Bounded sequences in X e and X n admit weak* convergent subsequences for every e ∈ E, n ∈ V .
Remark 2.1. We can observe the following details in the above assumptions:
• Hypothesis (H5) implies the existence of a linear and continuous projection on ker(Θ e ).
• Hypothesis (H7) implies the existence of a continuous predual operator for Φ e for each e ∈ E. Consequently, each Φ e is continuous as well (see for instance [10,Remark 3.2]).
• Hypothesis (H8) holds whenever X e and X n are reflexive or dual spaces of separable spaces. In case of reflexivity, the notion of weak* convergence can be replaced by weak convergence in all assumptions.
• Note that, since the Ψ n are convex, assumption (H3) implies that Ψ n (λv) ≤ λΨ n (v) for any v ∈ X n , λ ∈ (0, 1] and n ∈ V . This consequence of assumption (H3) will be needed in the context of varying the weights of a regularization graph. For well-posedness results such as existence and stability as presented in this paper, however, assumption (H3) is not necessary and could be dropped.
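For completeness, the sub-homogeneity used here follows in one line. Reading (H3) as the condition Ψ n (0) = 0 (as indicated above), convexity gives for every v ∈ X n and λ ∈ (0, 1]:

```latex
\Psi_n(\lambda v)
  = \Psi_n\bigl(\lambda v + (1-\lambda)\,0\bigr)
  \le \lambda\,\Psi_n(v) + (1-\lambda)\,\Psi_n(0)
  = \lambda\,\Psi_n(v).
```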
We also note that Hypothesis (H2) implies a coercivity estimate as follows.
Remark 2.2. Hypothesis (H2) holds if and only if there exist C > 0 and D ∈ R such that ‖v‖ X n ≤ CΨ n (v) + D for every v ∈ X n . A proof for this can be found, for example, in [4, Fact 4.4.8].
We are now in a position to define the main objects of interest in this paper: regularization graphs and associated regularization functionals. To this aim, we allow for weights of the form (α e ) e∈E with α e ∈ [0, ∞) for all e ∈ E.

Definition 2.3 (Regularization graph and associated regularization functional). Given G = (V, E) a directed graph with tree structure and root node n̄, and the associated spaces, functionals and operators as in Section 2 such that the hypotheses (H1) to (H8) hold, the structure of a regularization graph is defined as the tuple G = (G, (Ψ n ) n∈V , (Θ e ) e∈E , (Φ e ) e∈E ). Together with a family of weights α = (α e ) e∈E , a regularization graph is then defined as the tuple G α = (G, α).
Remark 2.4 (Weights). Generically, to each edge e within the graph structure of a regularization graph is associated a weight α e . In many cases, e.g., when node functionals only take values in {0, ∞}, this leads to an overparametrization of the associated regularization functional. To avoid this, we often fix a subset of weights to be equal to 1 already when defining a regularization graph. Such weights are called trivial weights, and the other, nontrivial weights that might still vary are often numbered independently of the edge they are associated with.
Remark 2.5 (Graphical representation of regularization graphs). Let us revisit the prototypical graphical representation of a regularization graph in Figure 1. There, the circles represent nodes, with the node space shown above the circle and the functional Ψ n inside. A splitting node is represented by a ⊕ and is associated with the functional I {0} . The rectangles denote the edges, with the edge space shown in the center, the forward operator Θ e shown at the top and the backward operator Φ e at the bottom. The weights (α e ) e∈E are depicted as scalar factors in front of the backward operators (Φ e ) e∈E (with arbitrary numbering independent of their position in the graph), and we use the convention that omitted weights at an edge e correspond to trivial weights α e = 1. The arrows connect the nodes. At each node n, the node functional Ψ n is evaluated at Θ e of the variable from the incoming edge e minus the sum of all Φ e applied to the variables w e from the outgoing edges {e = (n, m) ∈ E : m ∈ V }. The regularization graph functional is given by minimizing this construction over all edge variables in the domain of the corresponding operators (Θ e ) e∈E .
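As a worked instance of this prescription (our reading of the construction, consistent with the formula for R 0 given in the introduction), the graph of Figure 1 with Ψ 1 = Ψ 2 = I {0} yields

```latex
R_\alpha(u) = \inf\Bigl\{\, \Psi_3(\Theta_{e_2} w_{e_2}) + \Psi_4(\Theta_{e_3} w_{e_3})
\;\Big|\; u = \Phi_{e_1} w_{e_1},\ \ \Theta_{e_1} w_{e_1} = \Phi_{e_2} w_{e_2} + \alpha\,\Phi_{e_3} w_{e_3} \Bigr\},
```

where the infimum is taken over w e i ∈ dom(Θ e i ). For α = 0 and Φ e 2 = I, the second constraint gives w e 2 = Θ e 1 w e 1 , and one recovers R 0 as stated in the introduction.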
We can also obtain a more compact representation of R α as follows. Define the spaces X E = × e∈E X e and X V = × n∈V X n , equipped with the product norm, and the operator Λ α : dom(Λ α ) ⊂ X E → X V as

(Λ α w) n = Θ (n − ,n) w (n − ,n) − Σ e=(n,m)∈E α e Φ e w e for n ≠ n̄, and (Λ α w) n̄ = − Σ e=(n̄,m)∈E α e Φ e w e , (2.2)

for every n ∈ V and w = (w e ) e∈E ∈ dom(Λ α ), where dom(Λ α ) = × e∈E dom(Θ e ). For notational convenience we also define the functional Ψ u : X V → [0, +∞] as Ψ u (v) = Ψ n̄ (u + v n̄ ) + Σ n∈V \{n̄} Ψ n (v n ). Then we can write the functional R α associated with the regularization graph G α as

R α (u) = inf w∈dom(Λ α ) Ψ u (Λ α w). (2.4)

Proposition 2.6. Every regularization graph functional R α : X n̄ → [0, +∞] is convex, R α (0) = 0 and R α (λu) ≤ λR α (u) for all u ∈ X n̄ , λ ∈ (0, 1]. Further, in case each Ψ n for n ∈ V is positively one-homogeneous, also R α is positively one-homogeneous.

Proof. The statement follows easily from the representation in (2.4) together with Assumption (H3).
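To make this representation concrete in finite dimensions, the following Python sketch is a toy illustration under assumptions of ours, not the paper's setting: leaf node functionals are taken quadratic, Ψ(v) = ‖v‖², all operators are random invertible matrices, and the root and splitting nodes impose exact linear constraints, mimicking the graph of Figure 1. With these choices the infimum over the edge variables is an equality-constrained least-squares problem, solvable exactly via its KKT system:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Edge operators of a Figure-1-type graph (illustrative stand-ins: random matrices
# shifted by 3*I so that they are invertible; not operators from the paper).
Theta1, Theta2, Theta3 = (rng.standard_normal((n, n)) + 3 * np.eye(n) for _ in range(3))
Phi2, Phi3 = (rng.standard_normal((n, n)) + 3 * np.eye(n) for _ in range(2))
# Root edge: Phi1 = Id, so the root constraint u = Phi1 w1 fixes w1 = u.

def R(u, alpha):
    """R_alpha(u) = inf { |Theta2 w2|^2 + |Theta3 w3|^2 :
                          Phi2 w2 + alpha * Phi3 w3 = Theta1 u }."""
    if alpha == 0.0:
        # The alpha = 0 branch is removed: w2 is determined by the constraint.
        w2 = np.linalg.solve(Phi2, Theta1 @ u)
        return np.sum((Theta2 @ w2) ** 2)
    # Equality-constrained least squares via the KKT system
    # [ Q  A^T ] [x  ]   [0]
    # [ A   0  ] [lam] = [b],  Q = blkdiag(2 Th2^T Th2, 2 Th3^T Th3), A = [Phi2, alpha*Phi3].
    Q = np.block([[2 * Theta2.T @ Theta2, np.zeros((n, n))],
                  [np.zeros((n, n)), 2 * Theta3.T @ Theta3]])
    A = np.hstack([Phi2, alpha * Phi3])
    KKT = np.block([[Q, A.T], [A, np.zeros((n, n))]])
    rhs = np.concatenate([np.zeros(2 * n), Theta1 @ u])
    x = np.linalg.solve(KKT, rhs)[:2 * n]
    w2, w3 = x[:n], x[n:]
    return np.sum((Theta2 @ w2) ** 2) + np.sum((Theta3 @ w3) ** 2)

u = rng.standard_normal(n)
# Since w3 = 0 is always feasible with zero cost, enlarging the graph
# (alpha > 0) can only decrease the functional value.
print(R(u, 0.0) >= R(u, 1.0) - 1e-6)  # True
```

The final comparison mirrors the edge-removal mechanism of zero weights: the α = 0 value is always an upper bound for the α > 0 value.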

Examples
In this section, we provide some concrete examples of regularization graphs to which our general assumptions apply. Here, for d ∈ N, d ≥ 1, we always denote by Ω ⊂ R d a bounded Lipschitz domain. Moreover, we denote by I the embedding of a Banach space into another one. We remark that domain and codomain of the embeddings change for different examples. However, they can easily be deduced from the context.
Infimal convolution of TV k 1 − TV k 2 . Figure 2b shows the regularization graph corresponding to the infimal convolution of TV k 1 and TV k 2 with k 1 , k 2 ∈ N. Here the exponents for the Lebesgue spaces are chosen as in Figure 2b.
The domains of the linear operators ∇ k i are the spaces BV k i (Ω), where Sym k (R d ) denotes the space of symmetric tensors of order k, e.g., R d for k = 1 and the space of symmetric d × d matrices for k = 2. We refer to [7] for details and basic properties of BV k (Ω) and TV k . By arguments similar to those used in the previous example and the generalized Poincaré inequality for TV k i [7, Corollary 3.23], it follows that our general assumptions (H1)-(H8) are satisfied. The functional R α : L p (Ω) → [0, +∞] associated to the regularization graph depicted in Figure 2b is given as

R α (u) = inf u=u 1 +u 2 α 1 TV k 1 (u 1 ) + α 2 TV k 2 (u 2 ).

Total generalized variation. Figure 2c shows the regularization graph corresponding to TGV 2 α , the second order TGV functional as in [8].
It is given as

R α (u) = TGV 2 α (u) = min w∈BD(Ω) α 1 ‖∇u − w‖ M + α 0 ‖Ew‖ M

for u ∈ BV(Ω) and +∞ otherwise. Building on results in [6], also the TGV functional of arbitrary order k ∈ N can be realized via a regularization graph as in Figure 2f.

TGV-shearlet infimal convolution. Figure 2d shows the regularization graph that recovers a TGV 2 -shearlet infimal convolution model introduced in [28] (see also [30]). Here Ω ⊂ R 2 is a bounded Lipschitz domain. The exponent for the Lebesgue space L p (Ω) is chosen as 1 < p ≤ 2. The domain of the linear operator ∇ : L 2 (Ω) → M(Ω, R 2 ) is BV(Ω) and the domain of the symmetrized gradient E : L 2 (Ω, R 2 ) → M(Ω, Sym 2 (R 2 )) is BD(Ω), where again we take advantage of the embeddings BV(Ω) ֒→ L 2 (Ω) and BD(Ω) ֒→ L 2 (Ω, R 2 ). By arguments similar to those used in the previous examples and the generalized Poincaré inequality for w → ‖Ew‖ M [5, Corollary 4.20], it follows that in this setting our general assumptions (H1)-(H8) are satisfied for the edges and nodes realizing the total generalized variation.

In order to introduce the shearlet transform in L 2 (R 2 ) we start with several notations. First, for a > 0 and s ∈ R, let A a and S s be the dilation matrix and the shearing matrix defined respectively as

A a = ( a, 0 ; 0, a 1/2 ),  S s = ( 1, s ; 0, 1 ).

The discrete shearlet system of Ψ ∈ L 2 (R 2 ) is defined as

Ψ j,k,m = 2 (3/4)j Ψ(S k A 2 j · − m)

for k, j ∈ Z and m ∈ Z 2 [36, Definition 8]. This allows to define the discrete shearlet transform operator SH as

SH(f) = (⟨f, Ψ j,k,m ⟩) j,k,m

for f ∈ L 2 (R 2 ). By standard results in shearlet theory it holds that, if Ψ is a classical shearlet, then SH : L 2 (R 2 ) → ℓ 2 (Z 4 ) satisfies a frame inequality and, in particular, is bounded from below [36, Proposition 2]. In particular, this verifies (H6) for SH. Moreover, a simple computation using that ‖Ψ j,k,m ‖ L 2 (R 2 ) = ‖Ψ‖ L 2 (R 2 ) for every j, k, m together with Hölder's inequality shows that SH is weak*-to-weak* continuous, implying (H4). The backward operator I ∘ r Ω is the composition of the embedding I : L 2 (Ω) → L p (Ω) with the restriction r Ω : L 2 (R 2 ) → L 2 (Ω). It is immediate to check that I ∘ r Ω is weak*-to-weak* continuous, showing (H7). Finally, we remark that the functional ‖·‖ 1 : ℓ 2 (Z 4 ) → [0, +∞] is intended as the extension to +∞ of the ℓ 1 -norm on ℓ 2 . Such an extension is convex, coercive and weak* lower semicontinuous, showing (H1)-(H3). The functional R α : L p (Ω) → [0, +∞] associated to the regularization graph depicted in Figure 2d is then given as the infimal convolution of the TGV 2 functional with the ℓ 1 -norm of the shearlet coefficients.

Convex convolutional sparse coding. Figure 2e shows the regularization graph corresponding to a data-adaptive convolutional-sparse-coding-based method recently introduced in [17]. As such methods are in general non-convex, in [17] the authors proposed a convex relaxation of the convolutional LASSO problem in the tensor product of convolutional filter kernels and coefficient images. We refer to [17] for a more detailed description of the model. We denote by M ⊗ π L 2 the projective tensor product between M(Ω Σ ) and L 2 (Σ) [17, Appendix A], where Σ is a bounded Lipschitz domain and Ω Σ := Ω + Σ ⊂ R d is the Minkowski sum of Ω and Σ. The operator K̃ : M ⊗ π L 2 → L 2 (Ω) is the unique tensor lifting of the bilinear operator K : M(Ω Σ ) × L 2 (Σ) → L 2 (Ω), defined essentially as the convolution K(c, ψ) = (c ∗ ψ)| Ω . Thanks to [17, Lemma 2], the operator K̃ is weak* to weak* continuous. We also define the convex functional Ψ : M ⊗ π L 2 → [0, +∞] as Ψ(C) = ‖C‖ π + ν‖C‖ nuc for every C ∈ M ⊗ π L 2 , where ν > 0 is a parameter, ‖·‖ π is the projective norm of M ⊗ π L 2 and ‖C‖ nuc = Σ i σ i (T C ) is an extension of the nuclear norm, where σ i (T C ) are the singular values of C interpreted as a bounded linear map T C from L 2 (Ω Σ ) to L 2 (Σ). By Lemma 1 and Lemma 7 in [17] it follows that Ψ is weak* lower semicontinuous. Hence the general assumptions (H1)-(H8) for a regularization graph are satisfied, and the functional R α : L 2 (Ω) → [0, +∞] associated to the regularization graph depicted in Figure 2e is the resulting convex convolutional sparse coding functional of [17].

Remark 2.7. The regularization graph functionals for TV, the infimal convolution TV k 1 − TV k 2 and the total generalized variation can be extended to L 1 (Ω), even if L 1 (Ω) does not admit a predual.
Such extension is described for general regularization graphs in Proposition 4.8.
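Several of the examples above are built via infimal convolution. As a concrete, executable illustration (the discretization and all helper names are ours, not the paper's construction), the following Python sketch evaluates a one-dimensional discrete analogue of the TV k 1 −TV k 2 infimal convolution with k 1 = 1, k 2 = 2, i.e., R(u) = min v α 1 ‖D(u − v)‖ 1 + α 2 ‖D 2 v‖ 1 , via a standard linear-programming epigraph reformulation:

```python
import numpy as np
from scipy.optimize import linprog

def d1(n):
    # Forward difference matrix of shape (n-1, n): (D v)_i = v_{i+1} - v_i.
    return np.diff(np.eye(n), axis=0)

def infconv_tv1_tv2(u, a1=1.0, a2=1.0):
    """Value of min_v a1*||D(u-v)||_1 + a2*||D^2 v||_1 (discrete TV-TV^2 inf-convolution)."""
    n = len(u)
    D1 = d1(n)
    D2 = d1(n - 1) @ D1
    m1, m2 = D1.shape[0], D2.shape[0]
    # Variables x = [v, t, s]; minimize a1*sum(t) + a2*sum(s)
    # subject to |D1 (u - v)| <= t and |D2 v| <= s (epigraph form).
    c = np.concatenate([np.zeros(n), a1 * np.ones(m1), a2 * np.ones(m2)])
    Z12, Z21 = np.zeros((m1, m2)), np.zeros((m2, m1))
    A_ub = np.block([
        [-D1, -np.eye(m1), Z12],   #  D1 u - D1 v <= t
        [ D1, -np.eye(m1), Z12],   # -D1 u + D1 v <= t
        [ D2, Z21, -np.eye(m2)],   #  D2 v <= s
        [-D2, Z21, -np.eye(m2)],   # -D2 v <= s
    ])
    b_ub = np.concatenate([-D1 @ u, D1 @ u, np.zeros(2 * m2)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * len(c))
    return res.fun

ramp = np.arange(6.0)                      # affine signal: the TV^2 part absorbs it
step = np.array([0., 0., 0., 1., 1., 1.])  # jump of height 1
print(infconv_tv1_tv2(ramp) < 1e-6)        # True: value is numerically zero
print(infconv_tv1_tv2(step) <= 1.0 + 1e-6) # True: v = 0 already gives cost a1 * 1
```

For the ramp, the optimal splitting is v = u, so the second-order term vanishes and the value is zero; for the step, the splitting v = 0 leaves the jump entirely to the first-order TV term.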

Algebraic properties of regularization graphs
This section provides a recursive representation of regularization graphs and deals with estimates between different regularization graph functionals as well as their combination via addition or infimal convolution. First we need the definition of the height of a graph.
Definition 3.1 (Height of a regularization graph). Given a regularization graph G α with G = (V, E) the associated directed graph, we denote by H(G α ) its height, defined as the number of edges in the longest path of G connecting the root to one of the leaves. That is, with n 0 = n̄ the root node, we define

H(G α ) = max { M ∈ N | there exists a chain F ⊂ E of length M with root n̄ }

if this set is non-empty, and define H(G α ) = 0 otherwise, i.e., in case of a trivial graph.
Note that the height of a regularization graph does not depend on the particular choice of weights α. Next, we provide a recursion result that allows us to rewrite a regularization graph of height h in terms of regularization graphs of height h − 1.
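On the combinatorial level, the height is a straightforward recursion over subtrees. The following generic Python snippet (illustrative; the children-dictionary representation of the tree is our own convention) computes H by exactly this recursion:

```python
def height(children, node):
    """Number of edges on the longest path from `node` down to a leaf."""
    kids = children.get(node, [])
    return 0 if not kids else 1 + max(height(children, k) for k in kids)

# Tree of the prototypical graph: root 1, splitting node 2, leaves 3 and 4.
children = {1: [2], 2: [3, 4]}
print(height(children, 1))  # 2
```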

Lemma 3.2 (Recursive representation of regularization graphs).
For G α a regularization graph of height h ≥ 1, G = (V, E) the associated directed graph and n̄ the root node, let Ê ⊂ E be the set of all edges connected to the root node n̄, let n ê for ê ∈ Ê be their endpoints and let G ê = (V ê , E ê ) be the subtree of G = (V, E) with n ê as root node. Then, there exist regularization graphs G ê α ê with associated directed graphs G ê = (V ê , E ê ) of height at most h − 1 and weights (α ê e ) e∈E ê such that, with R α = R(G α ) and R ê α ê = R(G ê α ê ) the associated functionals, the following recursive representation holds:

R α (u) = inf { Ψ n̄ ( u − Σ ê∈Ê α ê Φ ê w ê ) + Σ ê∈Ê R ê α ê (Θ ê w ê ) | w ê ∈ dom(Θ ê ) for ê ∈ Ê }.

Proof. We explicitly construct the claimed recursive representation as visualized in Figure 3. First note that we can re-write R α by grouping the edge variables according to the subtrees G ê . Now define G ê α ê to be a regularization graph with graph structure G ê = (V ê , E ê ) and the associated operators, functionals and weights, where we note that here n ê is regarded as a node of G ê α ê , and thus Θ ((n ê ) − ,n ê ) = Id and w ((n ê ) − ,n ê ) = z. The recursive representation of R α then follows, which proves the assertion.
As a first consequence of this recursive representation, we obtain an estimate between two regularization graph functionals corresponding to regularization graphs with different weights.

Lemma 3.3. Let G α 1 and G α 2 be two regularization graphs with the same underlying graph structure G and directed graph G = (V, E) with root node n̄, and let α 1 , α 2 be weights such that α 1 e ≥ α 2 e for all e ∈ E. Then, with

C α 1 ,α 2 = max { Π e∈F α 2 e /α 1 e | F ⊂ E is either empty or a chain with n̄ as root },

where we use the conventions Π e∈∅ α 2 e /α 1 e = 1 and 0/0 = 0, it holds for the associated regularization graph functionals that R α 1 (u) ≤ C α 1 ,α 2 R α 2 (u) for all u ∈ X n̄ .

Proof. We prove the result by induction over the height h of the graphs. Assume the result holds true for any two regularization graphs with height less than h. Now note that, by assumption, (α 1 e = 0) implies (α 2 e = 0), so we can adapt the graph G = (V, E) by removing all edges e ∈ E with α 1 e = 0 and all subsequently disconnected nodes, without increasing its height, changing C α 1 ,α 2 or the values of the R α i . Hence, without loss of generality, assume that α 1 e > 0 for all e ∈ E. Now for h = 0 the result holds trivially, and for h ≥ 1 we can use the recursive representation of Lemma 3.2, where in the first line we substitute α 1 ê w ê for w ê and in the second line we substitute w ê for α 2 ê w ê ; additionally, in the first inequality we use that R ê (α 1 )ê (λu) ≤ λR ê (α 1 )ê (u) for λ ∈ [0, 1], see Proposition 2.6, and we obtain the last estimate from the induction hypothesis and the definition of C α 1 ,α 2 .
Remark 3.4. It is easy to see from the proof above that, whenever Ψ n for some n ∈ V is positively one-homogeneous, the assumption α 1 e ≥ α 2 e for e = (n − , n) can be replaced by (α 1 e = 0) implying (α 2 e = 0). In particular, if (α 1 e = 0) if and only if (α 2 e = 0) for all e = (n − , n) such that Ψ n is positively one-homogeneous, and all other α 1 e and α 2 e coincide, then R α 1 and R α 2 are equivalent, i.e., R α 1 can be estimated from above and below by a constant times R α 2 , dom(R α 1 ) = dom(R α 2 ) and also their zero-sets coincide.

Combining regularization graphs
Obviously, for R α = R(G α ) a regularization graph functional and λ > 0, also λR α is a regularization graph functional (corresponding to an adaption of the regularization graph G α where all node functionals Ψ n are replaced by λΨ n ). In this subsection we show that also the sum and the infimal convolution of two regularization graph functionals are again regularization graph functionals.
[Figure 4: Splitting and summation units used to combine regularization graphs.]
• Infimal convolution: For the additional nodes and edges in G, define the spaces, functionals and weights accordingly, and adopt the elements of G 1 α 1 and G 2 α 2 for all other nodes and edges. Then, the associated structure G α = (G, α) defines a regularization graph and, for R α = R(G α ), the functional R α is the infimal convolution of R α 1 and R α 2 .
• Summation: For the additional nodes and edges in G, define the spaces, the operators, the functionals and the weights accordingly, and adopt the elements of G 1 α 1 and G 2 α 2 for all other nodes and edges. Then, the associated structure G α = (G, α) defines a regularization graph and, for R α = R(G α ), the functional R α is the sum of R α 1 and R α 2 .

Proof. It is easy to see that all spaces, functionals, operators and weights involved in the definition of G α fulfill Assumptions (H1) to (H8), such that G α defines a regularization graph. Denote the edges e l = (n̄, n̄ l ) for l ∈ {0, 1, 2}. The claimed representations of R α in the case of the infimal convolution and in the case of the summation then follow by direct computation.

More generally, note that any regularization graph can be extended by appending another regularization graph to one of its leaves, and in particular by appending a regularization graph corresponding to the infimal convolution or the sum of two other regularization graph functionals. The latter can be achieved by appending a splitting or summation unit as in Figure 4 to a leaf node, where the I {0} and X in the left, green nodes in Figure 4 are replaced by the corresponding node functional and node space of the leaf node.
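As a quick numerical sanity check of the infimal convolution construction (a generic illustration of ours, not from the paper): for quadratic functionals R 1 (u) = λ 1 ‖u‖ 2 and R 2 (u) = λ 2 ‖u‖ 2 , the infimal convolution has the closed form (R 1 □ R 2 )(u) = λ 1 λ 2 /(λ 1 + λ 2 ) ‖u‖ 2 , attained at v = λ 1 /(λ 1 + λ 2 ) u, which a direct minimization over the splitting variable reproduces:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
u = rng.standard_normal(8)
lam1, lam2 = 2.0, 3.0

R1 = lambda x: lam1 * np.sum(x ** 2)
R2 = lambda x: lam2 * np.sum(x ** 2)

# Infimal convolution (R1 [] R2)(u) = inf_v R1(u - v) + R2(v),
# computed by minimizing over the splitting variable v.
res = minimize(lambda v: R1(u - v) + R2(v), x0=np.zeros_like(u))
closed_form = lam1 * lam2 / (lam1 + lam2) * np.sum(u ** 2)

print(abs(res.fun - closed_form) < 1e-6)  # True
```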
Remark 3.6 (Assumptions on the sum of two regularization graphs). The notion of regularization graphs was designed mainly for an infimal-convolution-type combination of functionals and operators, since we believe this situation is more interesting in practice. For infimal-convolution-type combinations, we believe that our assumptions on the underlying functionals and operators are rather minimal. Our framework also allows for the summation of two functionals, but in this situation our assumptions are suboptimal. Indeed, one would expect that in a summation only one of the two functionals needs to fulfill the assumptions of a regularization graph in order to provide a suitable regularization strategy. In fact, when using the sum of two (suitable) functionals for regularization, generically only one of them needs to fulfill coercivity properties (such as (H2) together with (H6)) in order to obtain well-posedness results for linear inverse problems. Nevertheless, we do not further generalize our framework towards weakening the assumptions for the sum of two functionals since i) we believe this situation is less relevant and ii) this would significantly complicate our basic assumptions and results, for instance on convergence for vanishing noise and bilevel optimization, thereby hindering our main goal of providing an easily applicable framework.

Analytic properties of regularization graphs
The goal of this section is to obtain analytic properties of regularization graph functionals that provide the basis for well-posedness results for the regularization of inverse problems.
To this aim, we first consider lower semi-continuity and coercivity properties, for which we need a general lemma that deals with projections. For the lemma, remember that for a Banach space X and a finite dimensional subspace L of X, there always exists a bounded linear projection P : X → L.
Lemma 4.1. Let X be a Banach space, F : X → [0, ∞] be a functional, L ⊂ X be a finite dimensional subspace and assume there is a function G : X → L such that ‖u − G(u)‖ X ≤ CF (u) + D for all u ∈ X and some C > 0, D ∈ R. Then, for a closed subspace K ⊂ X and a bounded, linear projection P K∩L : X → K ∩ L, there exist C̃ > 0 and D̃ ∈ R such that ‖u − P K∩L u‖ X ≤ C̃F (u) + D̃ for all u ∈ K, where D̃ = 0 in case D = 0. In particular, if K = X, this holds for any bounded linear projection onto L.
Proof. Assume that this does not hold true. Then we can pick a sequence (u k ) k in K such that, for each k, ‖u k − P K∩L u k ‖ X > k (CR(u k ) + D), with C, D being the constants of the original estimate. Setting ũ k := (u k − P K∩L u k )/‖u k − P K∩L u k ‖ X , this implies in particular that ‖u k − G(u k )‖ X /‖u k − P K∩L u k ‖ X < 1/k, so that ((G(u k ) − P K∩L u k )/‖u k − P K∩L u k ‖ X ) k is bounded and, by finite dimensionality of L, admits a (non-relabeled) subsequence strongly converging to some z ∈ L. Consequently, also (ũ k ) k converges strongly to z and from closedness of K we get that z ∈ K.
Since P K∩L ũ k = 0 for each k, also P K∩L z = 0, and as z ∈ K ∩ L this yields z = 0, so that ũ k → 0 strongly, which is a contradiction to ‖ũ k ‖ X = 1 for each k and concludes the proof of the coercivity estimate in the general form. Also, it can be seen from this argument that we can choose D̃ = 0 in case D = 0, which completes the proof.
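To make the role of such coercivity estimates concrete, here is a small finite-dimensional sanity check (our own toy illustration, not part of the paper's argument): discrete total variation on R^n is invariant on the constant vectors, and with P the mean-value projection onto that one-dimensional subspace, the estimate ‖u − Pu‖_∞ ≤ TV(u) holds, i.e., a coercivity estimate with C = 1 and D = 0.

```python
import random

def tv(u):
    # discrete total variation: sum of absolute forward differences
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))

def project_constants(u):
    # projection onto the invariant subspace L = span{(1, ..., 1)}
    m = sum(u) / len(u)
    return [m] * len(u)

random.seed(0)
for _ in range(1000):
    u = [random.uniform(-5.0, 5.0) for _ in range(8)]
    residual = max(abs(a - b) for a, b in zip(u, project_constants(u)))
    # coercivity up to the kernel: ||u - P_L u||_inf <= 1 * TV(u) + 0
    assert residual <= tv(u) + 1e-12
```

The estimate holds since u_i − mean(u) averages the differences u_i − u_j, each bounded by TV(u); Lemma 4.1 transfers exactly this type of bound to subspaces and modified projections in the infinite-dimensional setting.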
The following lemma provides a standard lower semi-continuity and compactness result. Since it will be used frequently in the paper, we provide its proof for the sake of completeness.
Lemma 4.2. Let X, W and Y be Banach spaces such that bounded sequences in X and W admit weak*-convergent subsequences. For Θ : dom(Θ) ⊂ X → W a weak* to weak* closed operator and F : W → [0, ∞) convex and weak* lower semi-continuous, suppose that: ii) The space ker(Θ) is finite dimensional and there exists a continuous projection P ker(Θ) : X → ker(Θ) and B > 0 such that Proof. At first note that, by Lemma 4.1 and using that rg(Θ) is closed, where Θ −1 is the inverse of Θ : ker(Θ) ⊥ → rg(Θ). Then, we observe that F • Θ is invariant on M and that M is a finite dimensional vector space. Further, for any w ∈ dom(Θ), We now prove that F • Θ is weak* lower semi-continuous. Take (w k ) k weak* converging to some w ∈ X. Without loss of generality, we can assume that lim inf k (F • Θ)(w k ) < ∞ and, up to extracting a subsequence, we can choose w k such that it realizes the lim inf. Then, by the estimate (4.4) above, both (‖w k − P w k ‖ X ) k and (‖Θ(w k − P w k )‖ W ) k are bounded, such that, by taking a non-relabeled subsequence, we can assume that Θ(w k − P w k ) weak* converges to some z ∈ W and w k − P w k weak* converges to some v ∈ X. Weak* closedness of Θ then implies that v ∈ dom(Θ) and Θv = z. Also, thanks to the finite dimensionality of M we have that w − v = w*-lim k w k − (w k − P w k ) ∈ M such that, by weak* lower semi-continuity of F , we conclude It follows that w̃ k − w k = (Id − P Z )P w k ∈ ker(K) ∩ M and that (w k − P w k ) k is bounded in X by the estimate (4.4) and the boundedness of ((F • Θ)(w k )) k . Now, since K is injective on the finite dimensional space Z and KP Z = K, we further get for A > 0 a generic constant that together with the estimate (4.4) further implies that (Θw k ) k is bounded.
Using the previous lemma, we now deal with the kernel and coercivity of regularization graph functionals.
Theorem 4.3. Let G α be a regularization graph with weights (α e ) e , underlying graph structure G = (V, E) and root noden ∈ V , and let R α = R(G α ) : Xn → [0, +∞] be the associated regularization graph functional. Then: i) The infimum in the recursive representation of R α (3.1) provided in Lemma 3.2 is attained for any u ∈ Xn.
iii) There exists a finite dimensional subspace L ⊂ Xn such that R α is invariant on L and for P L : X → L a bounded, linear projection there exist C > 0, D ≥ 0 such that, for u ∈ Xn, u − P L u Xn ≤ CR α (u) + D. Proof. We prove the result via induction over the height of the graph. Assume that the claimed assertions hold true for any regularization graph of height less than h and let G α be a regularization graph of height h with associated functional R α = R(G α ). If h = 0, the results hold trivially with L = {0} thanks to Assumptions (H1), (H2) and the definition of trivial regularization graphs. Otherwise, using Lemma 3.2 we write R α as whereÊ are the edges connected ton and for eachê ∈Ê, Rê αê : X nê → [0, ∞] is a functional associated to a regularization graph Gê αê with root nodenê. Also, remember that by (H5) and (H6) (see also Remark 2.1) each Θê : Xê → Xnê has closed range, finite dimensional kernel and satisfies for Bê > 0 and all w ∈ dom(Θê). Applying the induction hypothesis, each Rê αê is weak* lower-semicontinuous and there exists a finite dimensional subspace Lê where Rê αê is invariant such that for P Lê a bounded linear projection there exist constants Cê > 0 and Moreover, applying Lemma 4.1 with L = Lê, G = P Lê , R = Rê αê and K = rg(Θê) (that is closed thanks to (H5), (H6) and Remark 2.1) yields that for P rg(Θê)∩Lê a linear, continuous Now proceeding as in the proof of Lemma 4.2, we define ker(Θê) ⊥ := rg(Id −P ker(Θê) ) ∩ dom(Θê). It is easy to see that ker(Θê) ⊥ is a complement of ker(Θê) in dom(Θê) and that Θê is injective on ker(Θê) ⊥ . Hence, with we can define Pê : dom(Θê) → Mê as is the inverse of Θê : ker(Θê) ⊥ → rg(Θê). Then, we observe that Rê αê • Θê is invariant on Mê and that Mê is a finite dimensional vector space. Then, estimating as in (4.4), we obtain that for each w ∈ Xê that Now we first show weak* lower semi-continuity of R α on Xn. To this aim, take (u k ) k to be a sequence in Xn converging weakly* to some u ∈ Xn. 
Without loss of generality, we can assume that lim inf k R α (u k ) < ∞ and, up to extracting a subsequence, we can assume that (u k ) k realizes the lim inf. Next, take (w k ) k to be a sequence in X such that Together with assumption (H2), this implies boundedness of (Kw k ) k and of ((F • Θ)(w k )) k . We now want to apply Lemma 4.2, choosing ×ê∈Ê Lê for L and v ↦ (P Lê vnê)ê∈Ê , with v = (vnê)ê∈Ê ∈ ×ê∈Ê Xnê , for P L . Note that F is weak* lower semicontinuous, it is invariant on ×ê∈Ê Lê and the estimate (4.2) holds thanks to the inductive assumption (4.8) applied to each Rê αê . Moreover, it can be readily verified that the operator Θ is weak* to weak* closed, has finite dimensional kernel and the estimate (4.3) holds as a direct consequence of (4.7) applied to each Θê. So, applying Lemma 4.2 and using the weak* to weak* closedness of Θ we can select, up to a non-relabeled subsequence, (w k ) k such that w̃ k − w k ∈ ker(K) ∩ M , w̃ k converges weak* to some w ∈ Xn and Θw̃ k converges weak* to Θw. Weak* lower semi-continuity of Ψn and F • Θ, together with weak* continuity of K and the invariance of F • Θ on M , then implies that R α (u) ≤ lim inf k R α (u k ), and thus weak* lower semi-continuity of R α on Xn follows. In addition, given any u ∈ Xn, choosing u k = u for every k implies existence of minimizers in (4.6) as claimed. Now we note that R α is invariant on the finite dimensional space Next we show the coercivity estimate. To this aim, for any given u ∈ Xn, we select (wê)ê∈Ê to be minimizers in (4.6) and define v = v(u) := ∑ê∈Ê αê Φê Pê wê. Using (4.12), the coercivity of Ψn (see Remark 2.2) and the continuity of Φê we estimate where equality follows since the infimum in the recursive representation of Lemma 3.2 is attained. Further, a simple contradiction argument shows that the finite dimensional subspace L where R α is invariant and coercive in the sense of (4.5) is unique (and will henceforth be called the invariant subspace of R α ).
We also have the recursive representation with Lê the invariant subspace of Rê αê . Via induction, this implies in particular that L only depends on the support of (α e ) e , i.e., the set of edges where α e ≠ 0, but not on their values.
Attainment of the infimum over the edge variables associated with edges connected to the root node in the recursive representation of R α , as stated in Theorem 4.3, immediately implies, via induction, existence of infimizing edge variables in the definition of the regularization graph functional for all edges. This is stated in the following corollary.
Corollary 4.5 (Existence of infimizing edge variables). Let G α be a regularization graph with root noden and R α = R(G α ) be the associated functional. Then, for each u ∈ Xn, there exists (w e ) e∈E such that i.e., the infimum in the definition of the regularization graph functional is attained.
Remark 4.6 (Regularity). Let us observe how an infimal-convolution-based combination and an extension of regularization graphs affect the coercivity of regularization graph functionals as in Theorem 4.3.
• When combining two different regularization graph functionals defined on two different normed spaces via infimal convolution, the norm for the underlying joint space, and hence the norm for the coercivity estimate, needs to be the weaker of the two norms. In the construction of Proposition 3.5, this is reflected in the assumption that the embeddings I 1 X : Xn1 → X and I 2 X : Xn2 → X need to be weak* continuous. An example here is the infimal convolution of TV and TV 2 , where TV and TV 2 are coercive up to their kernels on L d/(d−1) and L d/(d−2) , respectively (here the exponents are set to ∞ for d = 1 and d ≤ 2, respectively). The infimal-convolution-based combination of the regularization graphs corresponding to TV and TV 2 , according to Proposition 3.5, is then coercive on the weaker space L d/(d−1) ; see [7, Section 4.2] for details.
• When extending a given regularization graph with a further edge, stronger norms can be chosen. A particular example is the composition of two gradient operators ∇ 1 , ∇ 2 to obtain TV 2 = ‖∇ 2 ∇ 1 · ‖ M . Given that ∇ 2 is coercive up to its kernel on L d/(d−1) , we can define ∇ 1 as an operator from L d/(d−2) to L d/(d−1) and again obtain coercivity up to constant functions between those spaces by standard Sobolev embeddings. In this case, the overall regularization graph functional corresponding to TV 2 is coercive up to affine functions with respect to the norm in L d/(d−2) , which reflects the improved regularity of TV 2 ; see [7, Section 3].
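A small discrete illustration of the infimal-convolution mechanism (a hypothetical 1-D discretization, our own sketch, not from the paper): for a signal that is the sum of a jump and a linear ramp, one feasible split in inf_w TV(u − w) + TV²(w) already has a strictly smaller value than either TV(u) or TV²(u), which is precisely why such combinations adapt to mixed structures.

```python
def diff(u):
    # forward differences
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]

def tv(u):
    # discrete total variation
    return sum(abs(d) for d in diff(u))

def tv2(u):
    # discrete second-order total variation
    return sum(abs(d) for d in diff(diff(u)))

step = [0.0] * 4 + [1.0] * 4        # piecewise constant: tv = 1, tv2 = 2
ramp = [0.5 * i for i in range(8)]  # affine ramp: tv = 3.5, tv2 = 0
u = [s + r for s, r in zip(step, ramp)]

# one feasible (not necessarily optimal) split w = ramp in
#   inf_w tv(u - w) + tv2(w)
split_value = tv([a - b for a, b in zip(u, ramp)]) + tv2(ramp)
assert split_value < min(tv(u), tv2(u))  # 1.0 < min(4.5, 2.0)
```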
The following proposition deals with the dependence of the coercivity estimate for a regularization graph functional R α on the weights α.
Proposition 4.7 (Dependence on the weights). Let G α be a regularization graph with weights α and root noden, let L be the invariant subspace of R α = R(G α ) (which only depends on the structure of the regularization graph G and the support of (α e ) e∈E ), let K ⊂ Xn be a closed subspace and let P K∩L : Xn → L be a bounded, linear projection. Then, for any A ≥ max{α e | e ∈ E} there exist C > 0, D ≥ 0 that only depend on A such that for any u ∈ Xn, where C α := max { ∏ e∈F α e | F ⊂ E is either empty or a chain with n̂ ∈ V as root } , using the same conventions as in Lemma 3.3.
Proof. This follows from first applying Theorem 4.3 and, subsequently, Lemma 4.1 to R ᾱ = R(G ᾱ ), where ᾱ e = A if α e > 0 and ᾱ e = 0 otherwise, and then using the estimate of Lemma 3.3.
The next proposition deals with extending a regularization graph functional R α : Xn → [0, ∞] by infinity to a Banach space X not satisfying hypothesis (H8), but with Xn ↪ X. The prototypical application of this result would be, e.g., to extend R α from L p (Ω) with p > 1 to L 1 (Ω), where Ω ⊂ R d is a bounded domain. Note that directly choosing Xn = L 1 (Ω) is not feasible since, in general, bounded sequences in L 1 (Ω) do not admit weak* convergent subsequences (or weakly convergent subsequences, since L 1 (Ω) is generally not a dual space).

Proposition 4.8 (Extended domain).
Let G α be a regularization graph with weights α and root noden. Let L be its invariant space. Suppose that X is a Banach space such that Xn ↪ X and Xn is reflexive. Then, with R α = R(G α ) extended to X via R α (x) = ∞ for x ∈ X \ Xn, R α is convex and lower semi-continuous w.r.t. weak convergence in X, and for any continuous, linear projection P̄ L : X → L, there exist C > 0, D ≥ 0 such that Proof. Convexity is immediate and the coercivity estimate follows directly from the continuous embedding Xn ↪ X and Theorem 4.3 by defining P L as the restriction of P̄ L to Xn.
Regarding weak lower semi-continuity, take (u k ) k to be a sequence in X converging weakly to some u ∈ X. Without loss of generality, we can assume that lim inf k R α (u k ) < ∞ and, up to extracting a subsequence, we can choose u k such that it realizes the lim inf and u k ∈ Xn for every k. With P L : Xn → L a continuous, linear projection, from the coercivity estimate of Theorem 4.3 applied to R α : Xn → [0, ∞] we obtain that v k := u k − P L u k is bounded in Xn, such that we may assume weak convergence of the latter to some v ∈ Xn. Also, by the embedding Xn ↪ X, P L u k = u k − v k is bounded in X and hence, by finite dimensionality of L, admits a subsequence converging to some z ∈ Xn ∩ L with respect to ‖ · ‖ Xn . Again by the embedding Xn ↪ X, weak convergence in Xn implies weak convergence in X such that, by uniqueness of the weak limit, u = v + z ∈ Xn. Lower semi-continuity of R α with respect to weak convergence in Xn finally implies implying the lower semi-continuity of R α with respect to weak convergence in X.
Remark 4.9. It can be observed that, in the above result, reflexivity of Xn (instead of just requiring that bounded sequences admit weak* convergent subsequences) is only needed to conclude from (weak) convergence of (v k + P L u k ) k to v + z in Xn and the weak convergence of (v k + P L u k ) k to u in X that, by uniqueness of limits, v + z = u follows. The same could be achieved for weak* convergence of (v k + P L u k ) k in Xn, thus not requiring reflexivity, if, for instance, Xn = L ∞ (Ω) and X = L 1 (Ω).

Predual formulation of regularization graphs
The goal of this section is to provide an equivalent, predual reformulation of regularization graphs. Remember that a regularization graph functional R α : Xn → [0, ∞] can be written in a vectorized form as with Λ α and Ψ u given in (2.2) and (2.3), respectively. With Λ # α and Ψ # u predual versions of Λ α and Ψ u , respectively, our goal is to show that every regularization graph functional R α can be written equivalently as To this aim, we need in particular that the functionals Ψ n and the operators Θ e and Φ e admit predual versions. By an application of the Fenchel-Moreau theorem [22,Proposition 4.1] it is easy to see that there exist convex, proper, lower semicontinuous functionals Ψ # n : X # n → [0, ∞] such that their convex conjugates are Ψ n . Proof. Consider the dual pair (V, V * ) for V = (X n , w*) and V * = (X # n , w). Note that Ψ n is convex, proper and lower semicontinuous on V . Therefore by the Fenchel-Moreau theorem [22,Proposition 4.1] there holds that Ψ * * n = Ψ n . In particular, defining Ψ # n = Ψ * n , we have that Ψ # n : X # n → [−∞, +∞] is proper, convex and strongly lower semicontinuous and its convex conjugate is Ψ n . The positivity of Ψ # n follows from Assumption (H3).
Moreover, Remark 2.1 ensures the existence of a bounded preadjoint of Φ (n,m) , which we denote by Φ # (n,m) : X # n → X # (n,m) . Finally, we suppose that the operators Θ e admit closed, densely defined preadjoints, as stated in the following additional assumption.
(H9) For each e = (n, m) ∈ E, Θ (n,m) is the adjoint of a closed, densely defined operator Θ # (n,m) : dom(Θ # (n,m) ) ⊂ X # m → X # (n,m) . Define the following predual spaces of X V and X E : Now we characterize the preadjoint of the linear operator Λ α : X E → X V from (2.2).
Our goal is now to show that R # α = R α . As first step, we obtain the following proposition.
Proposition 5.4. Assuming again that hypothesis (H9) holds, any predual regularization graph functional R # α = R # (G α ) according to Definition 5.3, where G α is a regularization graph with root noden, can be written as where Ψ u is defined as in (2.3).
In the next proposition we use Theorem 4.3 to prove that rg(Λ # α ) is weak*-closed, and hence obtain the desired duality formulation.
Proposition 5.5. Let Λ α : X E → X V be as in (2.2) corresponding to a regularization graph G α . Then, rg(Λ α ) is weak*-closed in X V . If in addition (H9) holds and R α = R(G α ) and R # α = R # (G α ) are the primal and predual regularization graph functionals according to Definitions 2.3 and 5.3, respectively, then Proof. We are only going to show weak* closedness of rg(Λ α ) since, under assumption (H9), the assertion R # α = R α then follows as immediate consequence of Proposition 5.4 and the definition of R α . Assume that the result holds true for all operators corresponding to a regularization graph of height less than h and let G α be a regularization graph of height h with corresponding operator Λ α and associated directed graph G = (V, E). The case h = 0 is immediate since rg(Λ α ) = {0}. Denote byn the root node of G α and letÊ be the edges connected to the root node andV their corresponding endpoints. Further, forê = (n,nê) ∈Ê, denote by Gê = (Vê, Eê) the subtree of G with root nodenê and by Λê αê the operator corresponding to the subtree Gê. Then, we define Gê αê to be the regularization graph with structure Gê = (Gê, ( · Xn ) n∈Vê , (Θ e ) e∈Eê , (Φ e ) e∈Eê ) and weights αê = (α e ) e∈Eê . It follows that Gê αê is indeed a regularization graph and that the associated functional Rê αê = R(Gê αê ) : X nê → [0, ∞) is given as to be a sequence in rg(Λ α ) weak*-converging to some y ∈ X V = × n∈V X n . Then, we note that by the definition of Rê αê and Λê αê we have that are bounded, where LÊ is the invariant subspace of F is given as LÊ = ×ê ∈Ê Lê with Lê the invariant subspace of Rê αê and the respective projection onto LÊ is given as P LÊ wÊ = (P Lê wê)ê ∈Ê , with P Lê the projection onto Lê. Hence, up to taking a further non-relabeled subsequence, we can assume that both (w k E ) k and (ΘÊw k E ) k are weak* converging, such that, by weak* closedness of ΘÊ, for wÊ := w*-lim kw k E we have that ΘÊwÊ = w*-lim k ΘÊw k E . 
Now since the infimum in the definition of Rê αê as in (5.8) is attained thanks to Corollary 4.5, and since Rê αê (Θê(w k e −w k e )) = 0, there exist minimizers z k Eê ∈ dom(Λê αê ) such that 0 = Θê(w k e − w k e ) − (Λê αê z k Eê )nê and 0 = (Λê αê z k Eê ) n for all n ∈ Vê \ {nê}. Defining, with where in the first equality we used the definition ofw k and in the second equality the fact that 0 = Θê(w k e − w k e ) − (Λê αê z k Eê )nê. Also, since 0 = (Λê αê z k Eê ) n for all n ∈ Vê \ {nê} we have This implies that also Λ αw k * ⇀ y in X V and, since (ΘÊw k E ) k is weakly* convergent, that also (Λê αêw k Eê ) k , withw k Eê := w k Eê − z k Eê , is weakly* convergent for eachê ∈Ê. By induction hypothesis, there hence exist w Eê ∈ dom(Λê αê ) such that w*-lim k Λê αêw k Eê = Λê αê w Eê . Defining w = wÊ, (w Eê )ê ∈Ê we finally see that Λ α w = y, since, from Λ αw k * ⇀ y it follows that

Examples of predual regularization graph functionals
Here we provide predual regularization graph functionals for several examples introduced in Section 2.1 by verifying the additional assumption (H9). We represent such predual regularization graphs as in Figure 5. In this context, we denote by I Bp the indicator function of the L p unit ball for p ∈ [1, ∞]. Note that, for the sake of clarity, in the root node of each predual regularization graph we write the corresponding functional vn ↦ Ψ # n (vn) − ⟨u, vn⟩ and not just Ψ # n . Moreover, nodes represented by an empty circle are associated with the zero functional.
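Before turning to the function-space examples, the predual representation can be sanity-checked in a finite-dimensional toy model (our own illustration, not from the paper): for discrete TV, the identity TV(u) = max{ Σ_i u_i (−div v)_i : ‖v‖_∞ ≤ 1 } holds, where −div is the adjoint of the forward-difference operator; since the objective is linear in v, the maximum over the box is attained at a vertex and can be found by brute force.

```python
import itertools

def forward_diff(u):
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]

def neg_div(v):
    # adjoint of forward differences with zero boundary conditions, so that
    # sum_i v_i * (forward_diff(u))_i == sum_i u_i * (neg_div(v))_i
    n = len(v) + 1
    out = []
    for i in range(n):
        left = v[i - 1] if i - 1 >= 0 else 0.0
        right = v[i] if i < len(v) else 0.0
        out.append(left - right)
    return out

u = [0.0, 2.0, 1.0, 1.0, 3.0]
tv = sum(abs(d) for d in forward_diff(u))

# brute-force the predual problem over the vertices of the unit box
best = max(
    sum(x * y for x, y in zip(u, neg_div(v)))
    for v in itertools.product([-1.0, 1.0], repeat=len(u) - 1)
)
assert abs(best - tv) < 1e-12
```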
Total variation. Figure 5a shows a predual regularization graph for TV. We refer to Section 2.1 for the construction of the regularization graph realizing TV. We remind also that X 1 = L p (Ω). Therefore, the predual Banach spaces associated with the nodes are X # 1 = L p′ (Ω) and X # (1,2) = C 0 (Ω, R d ), where p′ satisfies 1/p + 1/p′ = 1. Moreover, it is easy to see that the convex conjugate of I B∞ on C 0 (Ω, R d ) is ‖ · ‖ M , and the convex conjugate of the zero function is I {0} . To show (H9), we further claim that ∇ : Note that from (5.9), using a simple density argument, we obtain that −div is densely defined and closed as an operator on C 0 (Ω, R d ). To show that ∇ is the adjoint of −div according to the definitions above, it is enough to observe that implying that dom((−div) * ) = BV(Ω) and (−div) * = ∇. The predual regularization graph functional R # α : L p (Ω) → [0, +∞] is given as

Infimal convolution of TV k1 and TV k2 . Figure 5b shows a predual regularization graph corresponding to the infimal convolution of TV k1 and TV k2 with k 1 , k 2 ∈ N. We refer to Section 2.1 for the construction of the regularization graph realizing this infimal convolution. We remind also that p is chosen as in Section 2.1. Similarly to the TV predual regularization graph, the pre-adjoint of each linear operator ∇ ki can be seen to be the (possibly negative) higher-order divergence (−1) ki div ki , which is again closed and densely defined, showing (H9). The predual regularization graph functional R # α : L p (Ω) → [0, +∞] is given as

Total generalized variation of order 2. Figure 5c shows a predual regularization graph for TGV 2 α . We refer to Section 2.1 for the construction of the regularization graph realizing the total generalized variation of order 2. We remind also that p is chosen as in Section 2.1. The pre-adjoint operator, given in (5.13), is again densely defined and closed, showing (H9). The predual regularization graph functional R # α : L p (Ω) → [0, +∞] is for α > 0 given as

TGV 2 -shearlet infimal convolution.
Figure 5d shows a predual regularization graph for the TGV 2 -shearlet infimal convolution model. We refer to Section 2.1 for the construction of the regularization graph realizing the TGV 2 -shearlet infimal convolution. We also remind that the exponent p is chosen as 1 < p ≤ 2. Note that a predual of the extension to infinity of the ℓ 1 norm is the indicator function of the unit ball of c 0 , denoted by I c 0 . Thanks to the closedness of c 0 ∩ ℓ 2 in ℓ 2 , this indicator function is lower semicontinuous. Moreover, as SH : L 2 (R 2 ) → ℓ 2 (Z 4 ) defined according to (2.5) is a bounded operator between Hilbert spaces, its pre-adjoint exists and is bounded, showing (H9). Further, it can be easily characterized for v ∈ ℓ 2 (Z 4 ). Finally, noticing that the pre-adjoint of the restriction operator r Ω is the extension by zero outside Ω (denoted by r 0 Ω ), the predual regularization graph functional R # α : L p (Ω) → [0, +∞] is for α 0 , α 1 > 0 given as

Regularization of linear inverse problems

Setting and well-posedness
We now consider the application of regularization graphs to the regularization of linear inverse problems. That is, with K : Xn → Y a bounded linear operator (the forward model), S f : Y → [0, ∞) a discrepancy functional associated with the data f and β > 0 a regularization parameter, we consider the minimization problem with G α a regularization graph with root noden.
Remark 6.1 (Forward operator with general domain X). Note that considering only forward operators defined on Xn, where bounded sequences need to admit weak* convergent subsequences according to (H8), is not a restriction compared to considering general operators K̃ : X → Y with X a Banach space such that Xn ↪ X and R α being extended by ∞ to X as in Proposition 4.8, since one can always recover this setting by choosing K = K̃ • I Xn,X , with I Xn,X the continuous embedding of Xn into X.
In order to study convergence in the data space for general discrepancies S f , we introduce a suitable notion of convergence of the discrepancy functionals (S f k ) k to S f . This notion covers in particular the case that S f = ‖ · − f ‖ Y and f k → f in Y , but the more general assumptions allow us to capture, for instance, also the situation when S f is the Kullback-Leibler divergence [7, Example 2.16]. Now, under weak assumptions, the previously established properties of R α allow us to obtain a standard well-posedness result for (6.1).
Theorem 6.2. Let R α = R(G α ) with G α being a regularization graph with weights α and root noden such that Xn is reflexive, let β > 0, and let Y be a Banach space, K : Xn → Y be linear and continuous and S f : Y → [0, ∞] be a proper, convex, weakly lower semi-continuous and coercive discrepancy functional. Then, the Tikhonov minimization problem (6.1) is well-posed, i.e., there exists a solution, and the solution mapping is stable in the sense that, if S f k converges to S f and (S f k ) k is equi-coercive, then for each sequence of minimizers (u k ) k of (6.1) with discrepancy S f k , i) either S f k (Ku k ) + βR α (u k ) → ∞ as k → ∞ and (6.1) with discrepancy S f does not admit a finite solution, ii) or S f k (Ku k ) + βR α (u k ) → min u∈Xn S f (Ku) + βR α (u) as k → ∞ and there exists, possibly up to shifts by functions in ker(K) ∩ L, with L the invariant subspace of R α , a weak accumulation point u ∈ Xn of (u k ) k that minimizes (6.1) with discrepancy S f .
Further, in case (6.1) with discrepancy S f admits a finite solution, for each subsequence (u k i ) i weakly converging to some u ∈ Xn, it holds that R α (u k i ) → R α (u) as i → ∞. Also, if S f is strictly convex and K is injective, finite solutions u of (6.1) are unique and u k ⇀ u in Xn.

Proof. Existence follows by the application of the direct method of calculus of variations in
Xn. More precisely, given a minimizing sequence (u k ) k for (6.1), we can apply Lemma 4.2 with W = X = Xn, F = R α , Θ = Id and L being the finite dimensional invariant space of R α provided by Theorem 4.3, to obtain the existence of another minimizing sequence (ũ k ) k for (6.1) that is bounded in Xn. Note that the assumptions of Lemma 4.2 are fulfilled since the weak lower semi-continuity of R α (which is equivalent to weak* lower semi-continuity of R α by reflexivity of Xn) and Assumption i) of Lemma 4.2 hold as a consequence of Theorem 4.3, and the boundedness of (Ku k ) k follows from the coercivity of S f . Therefore, thanks to the weak lower semi-continuity of R α and the boundedness of K, we can apply the direct method to the sequence (ũ k ) k and conclude existence of minimizers for (6.1). The claimed stability follows with standard arguments; for instance, it can be proven by a straightforward adaptation of [7, Theorem 2.14].
Remark 6.3. Note that the results of Theorem 6.2 can also be modified to hold without assuming reflexivity of Xn but assuming, for instance, that K is weak*-to-weak continuous. Indeed, in this setting, Lemma 4.2 applies in the same way and existence follows from the coercivity statement of Lemma 4.2 using weak*-to-weak continuity of K and weak lower semi-continuity of S f . Likewise, also the claimed stability can be shown by straightforward adaptations.
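The role of shifts in ker(K) ∩ L in Theorem 6.2 can be made concrete in a toy discretization (our own sketch, with K a forward-difference operator, S_f a squared distance and R a discrete TV): constants lie both in ker(K) and in the invariant subspace of R, so the Tikhonov energy cannot distinguish minimizers that differ by a constant.

```python
def forward_diff(u):
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]

def tv(u):
    # discrete total variation, invariant on constants
    return sum(abs(d) for d in forward_diff(u))

def energy(u, f, beta):
    # S_f(Ku) + beta * R(u) with K = forward differences (constants in ker K)
    Ku = forward_diff(u)
    return sum((a - b) ** 2 for a, b in zip(Ku, f)) + beta * tv(u)

f = [1.0, -0.5, 0.0]
u = [0.0, 0.7, 0.2, 0.4]
shifted = [x + 3.14 for x in u]  # shift by an element of ker(K), on which R is invariant
assert abs(energy(u, f, 0.1) - energy(shifted, f, 0.1)) < 1e-12
```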

Convergence and stability for varying parameters
In this section we study the stability of solutions of (6.1) for varying parameters α and for vanishing noise. To this aim, we first define a variant of regularization graphs for vanishing weights.
This definition is required to deal with lower semi-continuity with respect to weights (α e ) e converging to zero. An example of R̂ α arises in the case R α (u) = R(G α )(u) = min w∈BD(Ω,R d ) ‖∇u − α 0 w‖ M + ‖Ew‖ M with α 0 = 0. It is easy to see that for any regularization graph G α and any choice of weights α, Ĝ α is again a regularization graph such that all previous results apply. Moreover, the following lemma holds.
We now prove a weak* lower semi-continuity result for regularization graph functionals with respect to the parameters.
Theorem 6.6. Let G α be a regularization graph with root noden and weights α ∈ [0, ∞) E , let R α = R(G α ), and let (α k ) k be a sequence of weights in (0, ∞) E such that (α k ) k → α. Then, for every sequence (u k ) k in Xn such that u k *⇀ u ∈ Xn it holds that Moreover, for u ∈ Xn and

γ k := min { ∏ e∈F α k e /α e | F ⊂ E is either empty or a chain with root n̂ ∈ V and α e > 0 ∀e ∈ F } , (6.6)

using again the convention that ∏ e∈∅ α k e /α e = 1, it holds that (6.7)

Remark 6.7. Note that, in case each node functional Ψ n is positively one-homogeneous (such that R α k is positively one-homogeneous according to Proposition 2.6), the convergence of (6.8) implies that lim k→∞ R α k (u) = R̂ α (u). Also, in case α e > 0 for each e ∈ E, R̂ α can be replaced by R α .
Proof of Theorem 6.6. We argue again by induction and, supposing that the claimed assertions hold for any regularization graph of height less than h, assume that the height of G α is h. Again, for h = 0, the result holds trivially, so we assume h ≥ 1. We first deal with lower semi-continuity of R α = R(G α ), for which, up to taking a nonrelabeled subsequence, we assume that lim inf k→∞ R α k (u k ) = lim k→∞ R α k (u k ) < +∞. Using the recursive representation of R α k and the notation from Lemma 3.2, and the result of Theorem 4.3 we can select a sequence (w k ) k in ×ê ∈Ê dom(Θê) such that with Rê (α k )ê = R(Gê (α k )ê ) and Gê (α k )ê being regularization graphs of height at most h − 1 with graph structure Gê = (Vê, Eê) and root nodenê. Thanks to Proposition 4.7 and the weights α k e being positive, the invariant subspace Lê of Rê (α k )ê does not depend on k and for Cê ,α k := max e∈F α k e | F ⊂ Eê is either empty or a chain with rootnê , (6.10) it follows that (Cê ,α k ) k is bounded and that u − P rg(Θê)∩Lê u Xnê ≤ Cê ,α k CêRê (α k )ê (u) + Dê ∀u ∈ Xnê (6.11) with Cê, Dê independent of k. Further, remember that each Θê satisfies with Bê > 0, thanks to Assumption (H6). Now defining Mê ⊂ dom(Θê) and Pê : dom(Θê) → Mê as in (4.10) and (4.11), respectively, one sees that they also do not depend on k and, estimating as in (4.12), we obtain . Note that P Z is a projection onto Z. Then, defining αÊw := (αêwê)ê ∈Ê for w ∈ W and αÊ ∈ [0, ∞)Ê , we can observe that, with α k E = (α k e )ê ∈Ê , also realizes the minimum in (6.9), since with (Id −P Z )α k E P M w k ∈ ker(K)∩M , such that ê∈Ê α k e Φê(w k −w k ) = K(Id −P Z )α k E P M w k = 0 and Θê(w k −w k )ê = (1/α k e )Θê[(Id −P Z )α kÊ P M w k ]ê = 0. By the estimate (6.12) we obtain for some constants C, D,D > 0 that where the constantD does not depend on k thanks to the boundedness of R α k (u k ), the recursive formula (6.9), and the boundedness of (Cê ,α k ) k . 
Since K is injective and bounded (see Remark 2.1) on the finite dimensional space Z, there exists C > 0 independent from k such that z W ≤ C Kz Xn for all z ∈ Z. Thus we can estimate by coercivity of Ψn and using P Z α kÊ P M w k = α k E (w k − (w k − P M w k )) that for generic constantsC, D (andD as in (6.15)) we have where also used (6.15), the fact that (u k ) k is uniformly bounded as it is weak* converging and thatw k realizes the minimum in (6.9). Now forê ∈Ê with αê > 0 this together with (6.14) and (6.15) implies that (w k e ) k is bounded, hence admits a (non-relabeled) subsequence weak* converging to somewê ∈ Xê by (H8). Moreover, using (6.9), (6.11), (6.16), the boundedness of R α k (u k ) and the finite dimensionality of Z we have forê ∈Ê that where the constantC is independent of k and we use the definition of Pê in (4.11). Hence, by weak* sequential compactness of the Xnê and weak*-closedness of Θê we obtainwê ∈ dom(Θê) and, up to taking a further non-relabeled subsequence, w*-lim k→+∞ Θw k e = Θwê. Further, forê ∈Ê with αê = 0, we see from (6.14), (6.15) and (6.16) that (α k ew k e ) k is bounded. Hence, up to taking a further subsequence, we can assume that where, forê ∈Ê,Rêαê = R(Ḡễ αê ) andḠễ αê = (Ḡê,αê), withḠê = (Gê, (Ψ n ) n∈Vê , (Θ e ) e∈Eê , (Φ e ) e∈Eê ) andαê = (α e ) e∈Eê according to Definition 6.4, is a regularization graph of height at most h − 1 with graph structure Gê = (Vê, Eê) and root nodenê. Note that forê ∈Ê such that αê > 0 we haveRêαê =Rê αê withRê αê = R(Ĝê αê ) andĜê αê being the modification of the regularization graph Gê αê according to Definition 6.4. Therefore, weak* lower semi-continuity of Ψn, the induction hypothesis and (6.17), leading toRê αê (Θêzê) = 0 for αê = 0, then yieldŝ Now take u ∈ Xn and observe that, since the convergence γ k → 1 as k → +∞ is immediate, the second assertion of (6.8) follows directly from what we just showed, provided that R α k (γ k u) ≤R α (u) for every k ∈ N holds. 
In order to show the latter, we first select w ∈ W to attain the minimum in the recursive representation of R̂ α (u) according to Lemma 3.2 (which is possible by Theorem 4.3), noting that we can choose wê = 0 for ê ∈ Ê with αê = 0, and that α k e wê → αêwê for all ê ∈ Ê. In particular, Rê (α k )ê (Θêwê) = R̂ê αê (Θêwê) = 0 for ê ∈ Ê with αê = 0. Also, define γê k := min { ∏ e∈F α k e /α e | F ⊂ Eê is either empty or a chain with root n̂ê and α e > 0 ∀e ∈ F } , using again the convention that ∏ e∈∅ α k e /α e = 1. Therefore, using the induction hypothesis together with Remark 2.1 and Proposition 2.6 we obtain where we used that γ k ≤ 1 as well as the definition of γ k .

We are now ready to prove a result that will in particular imply stability for varying parameter α and convergence for vanishing noise for (6.1).
Theorem 6.8. Let R_α = R(G_α), with G_α a regularization graph with weight α and root node n̂ such that X_n̂ is reflexive, let Y be a Banach space, let K : X_n̂ → Y be linear and continuous, and let S_f, S_{f_k} : Y → [0, ∞] for k ∈ N be proper, convex, lower semi-continuous and coercive discrepancy functionals with S_f(v) = 0 if and only if v = f. Further, assume that S_{f_k} converges to S_f and that (S_{f_k})_k is equi-coercive, and choose δ_k, (β_k)_k and (α_k)_k such that assumption i) holds. In case β = 0, assume additionally that ii) there exists u_0 ∈ X_n̂ with R̃_α(u_0) < +∞ such that Ku_0 = f.
Then, for (u_k)_k a sequence of minimizers of (6.1) with parameters (β_k)_k and (α_k)_k, up to shifts in ker(K) ∩ L, with L the invariant subspace of R_{α_k} (which does not depend on k), (u_k)_k has a subsequence weakly converging in X_n̂. Further, any limit û of a subsequence (u_{k_i})_i converging weakly in X_n̂ solves min_{u∈X_n̂} S_f(Ku) + β R̃_α(u) (6.18) in case β > 0, and min_{u∈X_n̂} R̃_α(u) s.t. Ku = f (6.19) in case β = 0. Also, in both cases, lim_i R_{α_{k_i}}(u_{k_i}) = R̃_α(û). Proof. Given the properties we have obtained for R_α and the assumptions on S_{f_k}, S_f, the proof is now rather direct and we only provide a sketch for the sake of completeness. At first note that, in case β = 0, existence of a solution û to (6.19) follows using Theorem 6.2 with S_f = I_{{f}}, and assumption ii) ensures a finite minimum. Further, since α_k^e ≥ α_e for all e ∈ E, which yields γ_k = 1 for γ_k according to (6.6), Theorem 6.6 implies that R_{α_k}(û) → R̃_α(û), and we obtain the corresponding energy estimate using assumption ii). Consequently, using hypothesis i), the energies converge, which implies in particular boundedness of S_{f_k}(Ku_k) and R_{α_k}(u_k).
In case β > 0, we can select û to be a solution to (6.18) and, by Theorem 6.6 and convergence of S_{f_k} to S_f, estimate accordingly. In particular, also in this case, both S_{f_k}(Ku_k) and R_{α_k}(u_k) are bounded. Choosing Z as a complement of ker(K) ∩ L in L, such that the projection P_Z : L → Z satisfies rg(Id − P_Z) = ker(K) ∩ L, and setting ũ_k := u_k − P_L u_k + P_Z P_L u_k, we observe that ũ_k − u_k ∈ ker(K) ∩ L and, using equi-coercivity of (S_{f_k})_k and Proposition 4.7, we can obtain, as in the proof of Lemma 4.2, that (ũ_k)_k is bounded and hence admits a subsequence weakly converging in X_n̂. Now take u ∈ X_n̂ to be the limit of a subsequence (u_{k_i})_i of (u_k)_k weakly converging in X_n̂. In case β = 0, using weak lower semi-continuity, convergence of S_{f_k} to S_f, and that S_f(v) = 0 only if v = f, it follows from (6.20) and (6.21) that Ku = f and R̃_α(u) ≤ R̃_α(û) as claimed in (6.19), and consequently also that lim_i R_{α_{k_i}}(u_{k_i}) = R̃_α(u). In case β > 0, again using weak lower semi-continuity and convergence of S_{f_k} to S_f, it follows from (6.22) that u solves (6.18); a strict inequality in the limit of the regularization terms would yield a contradiction, hence also lim_i R_{α_{k_i}}(u_{k_i}) = R̃_α(u) and the proof is complete.
Remark 6.9. Theorem 6.8 covers several particular cases that are worth mentioning: • If α > 0 component-wise, then the above results hold with R̃_α = R_α.
• If we fix α k = α and have β = 0, this is a classical convergence-for-vanishing-noise result for a fixed regularization functional.
• Regarding both β and α as regularization parameters, this is a rather general convergence result for multi-parameter regularization and we refer to [6,41,34] for related work.
• If we fix f k = f , this is a stability result for varying the parameters α, β, which is in particular relevant in the context of bilevel optimization, see Section 7 below.
• Note that α_k^e ≥ α_e was only used in combination with Theorem 6.6 to ensure that lim_{k→∞} R_{α_k}(u) = R̃_α(u). In case R_α is positively one-homogeneous, following Remark 6.7, this assumption can be dropped. Also, in case f_k = f and β > 0, the assumption can be dropped if S_f is continuous in the sense that lim_{λ→1} S_f(λv) = S_f(v) for all v ∈ dom(S_f).
• Again, as described in Remark 6.3, the result can be modified to hold without assuming reflexivity of Xn.
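To give the flavor of the convergence-for-vanishing-noise statement in the simplest conceivable special case, the following minimal numerical sketch uses quadratic Tikhonov regularization with R(u) = ‖u‖² (a trivial, single-node regularization graph). The forward operator, ground truth and the a priori parameter choice β_k = δ_k are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal numerical sketch of convergence for vanishing noise with the
# simplest quadratic regularizer R(u) = ||u||^2 (a single-node graph).
# Forward operator, ground truth and parameter choice are illustrative.
rng = np.random.default_rng(0)
n = 20
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
K = U @ np.diag(np.logspace(0, -6, n)) @ V.T   # ill-conditioned forward operator
u_true = V[:, 0] + 0.3 * V[:, 1]               # ground truth
f_clean = K @ u_true

errors = []
for delta in [1e-2, 1e-4, 1e-6, 1e-8]:
    noise = rng.standard_normal(n)
    f_delta = f_clean + delta * noise / np.linalg.norm(noise)  # noise level delta
    beta = delta                    # a priori rule: beta -> 0, delta**2/beta -> 0
    # closed-form minimizer of ||K u - f_delta||^2 + beta * ||u||^2
    u_beta = np.linalg.solve(K.T @ K + beta * np.eye(n), K.T @ f_delta)
    errors.append(np.linalg.norm(u_beta - u_true))
print(errors)   # reconstruction error decreases with the noise level
```

The closed-form solve replaces a general variational solver only because the regularizer is quadratic; for a genuine regularization graph functional the lower-level minimization is nonsmooth.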

Bilevel optimization
The goal of this section is to show well-posedness of a bilevel optimization problem for learning the weights α in a regularization graph. In order to allow for an arbitrary removal of different subtrees of the graph by setting α_e = 0, we need to include an additional penalty on the edge variables (w_e)_{e∈E}. To formulate this, we use the notation below, where again w_{(n̂^-,n̂)} = u and Θ_{(n̂^-,n̂)} = Id. Also, we need an assumption based on the invariant subspaces of regularization graph functionals. To formulate this, first recall from Lemma 3.2 the recursive representation of a regularization graph functional R_α = R(G_α) in terms of the summands R̂_{ê,α_ê}(Θ_ê w_ê), with w_ê ∈ dom(Θ_ê) for all ê ∈ Ê. (7.1) Based on this, for e ∈ E, we henceforth denote M_e := Θ_e^{-1}(L^e), where Θ_e^{-1} is the inverse of Θ_e : ker(Θ_e)^⊥ → rg(Θ_e) (recall that ker(Θ_e)^⊥ := rg(Id − P_{ker(Θ_e)}) ∩ dom(Θ_e) with P_{ker(Θ_e)} according to Assumption (H6)) and L^e is the invariant subspace of the regularization graph functional R^e_{α^e} = R(G^e_{α^e}), with G^e_{α^e} the regularization graph corresponding to the subtree of G starting at edge e ∈ E, with functionals, spaces, operators and weights inherited from G_α. Note that M_e is finite dimensional by finite dimensionality of L^e and of ker(Θ_e) for every e ∈ E. Finally, we define the projection P_e : dom(Θ_e) → M_e as P_e w := Θ_e^{-1} P_{rg(Θ_e)∩L^e} Θ_e w + P^e_{ker(Θ_e)} w, (7.2) where P^e_{ker(Θ_e)} is a projection onto ker(Θ_e), noting that P_e is indeed a projection. Using these notations, we now provide a lower semi-continuity result that includes vanishing weights as follows.
Lemma 7.1. Let (α_k)_k be a sequence of weights converging to some α ∈ [0, ∞)^E. Then, with (u_k)_k weak* converging to some u ∈ X_n̂ and ((w_k^e)_{e∈E})_k a sequence realizing the minimum in (2.2) with u_k in place of u and α_k in place of α, such that (P_e w_k^e)_k and (R_{α_k}(u_k, ((w_k^e)_e)_k))_k are bounded, the sequence ((w_k^e)_{e∈E})_k is bounded and admits a subsequence converging weak* to some (w_e)_{e∈E} for which the corresponding lower semi-continuity estimate holds. Note that, in addition to explicitly including the variables (w_e)_e, this lower semi-continuity result differs from the one of Theorem 6.6 in that, in the limit, only the weights change (possibly to zero), but not the original regularization graph. This can be achieved thanks to the boundedness assumption on the sequences (P_e w_k^e)_k, which does not always hold true, as clarified in the following remark.
Remark 7.2. Consider the regularization graph functional for TGV² (see Section 2.1) according to (7.4), where α_k → 0 and u_k = u for every k, with ∇u ∈ ker(E) \ {0}. Then, the sequence (w_1^k, w_2^k, w_3^k)_k = (u, 0, ∇u/α_k)_k realizes the minimum in (7.4) with R_{α_k}(u_k) = 0 for every k. However, for edge 3 we have M_3 = ker(E), and it holds that ‖P_3 w_3^k‖ = ‖∇u‖/α_k → ∞ as k → ∞, showing that in this case the assumptions of Lemma 7.1 do not hold.
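The blow-up phenomenon of Remark 7.2 can be reproduced numerically in a simplified one-dimensional analogue. The two-term functional below with forward-difference operators D1, D2 stands in for the three-edge TGV² graph (7.4); the operators and the scaling are illustrative assumptions, not the exact functional from the remark.

```python
import numpy as np

# Simplified 1D analogue of Remark 7.2 (not the exact three-edge functional
# (7.4)): R_alpha(u) = min_w ||D1 u - alpha*w||_1 + ||D2 w||_1 with forward
# differences D1, D2, so that ker(D2) consists of the constant vectors.
n = 50
D1 = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)          # derivative acting on u
D2 = np.eye(n - 2, n - 1, k=1) - np.eye(n - 2, n - 1)  # derivative acting on w
u = np.arange(n, dtype=float)      # D1 u is constant, i.e. in ker(D2) \ {0}

norms = []
for alpha in [1.0, 1e-2, 1e-4]:
    w = (D1 @ u) / alpha           # this choice makes the first term vanish
    value = np.abs(D1 @ u - alpha * w).sum() + np.abs(D2 @ w).sum()
    assert value == 0.0            # the minimum is attained at zero energy
    norms.append(np.linalg.norm(w))  # but ||P_ker w|| = ||w|| blows up
print(norms)
```

As in the remark, the functional value stays at zero for every α, while the minimizing auxiliary variable escapes to infinity inside the kernel of the second operator, which is exactly the situation excluded by the boundedness assumption of Lemma 7.1.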
Proof of Lemma 7.1. Again we proceed by induction: we assume the result holds true for all regularization graphs of height less than h and that the height of G_α is h. The case h = 0 is again immediate, so we assume h ≥ 1. Writing R_{α_k} recursively, with Ê ⊂ E the set of all edges connected to the root node n̂, R̂_{ê,(α_k)_ê} = R(G_{ê,(α_k)_ê}), and (G_{ê,(α_k)_ê}) regularization graphs of height less than h (see Lemma 3.2) with root nodes n̂_ê, we observe, estimating as in (6.12) and using boundedness of the (α_k^e)_k, for generic constants C, D, C̃ > 0 independent of k, that boundedness of (P_e w_k^e)_k for every e ∈ E implies boundedness of (w_k^ê)_k for ê ∈ Ê. Further, again using the coercivity estimates for R̂_{ê,(α_k)_ê}, the definition of P_ê and the estimate in (6.11), we obtain a corresponding bound, where again C, D, C̃ > 0 denote generic constants independent of k. Hence, by weak* compactness and weak* closedness of the Θ_ê, we obtain that w_k^ê ⇀* w_ê ∈ dom(Θ_ê) as well as Θ_ê w_k^ê ⇀* Θ_ê w_ê. The induction hypothesis, together with the weak* lower semi-continuity of Ψ_n̂ and the weak*-to-weak* continuity of Φ_ê, then implies the result.
Consider now a regularization graph G_α with root node n̂ and let R_α = R(G_α) : X_n̂ → [0, ∞] be the associated regularization functional. Let Z be a Banach space such that Z ↪ X_n̂ and let H_1, H_2 be two functionals penalizing the weights α and the auxiliary variables (w_e)_{e∈E}, respectively. We consider the bilevel optimization problem of minimizing ‖u_{α,β} − û‖_Z + H_1(α) + H_2((w_e^{α,β})_{e∈E}) over α and β, s.t. (u_{α,β}, (w_e^{α,β})_{e∈E}) ∈ arg min_{u∈X_n̂, (w_e)_{e∈E}} S_f(Ku) + β R_α(u, (w_e)_{e∈E}), (7.5) where û is some ground truth datum and f ≈ Kû a corrupted measurement. Remark 7.3. Note that this single-datum bilevel setting is a generic model problem for learning parameters from a larger training set (û_m, f_m)_m. Indeed, the single-datum setting can be extended to a larger training set simply by vectorizing all involved quantities, for instance. We now provide an existence result for the bilevel problem, where we use the convention that, for β = ∞, β R_α(u, (w_e)_e) = 0 if R_α(u, (w_e)_e) = 0 and β R_α(u, (w_e)_e) = ∞ otherwise; a concrete example and the assumptions of this result are discussed after its proof below.
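To illustrate the shape of such a bilevel problem, the following toy sketch considers a quadratic denoising setting (K = Id) where the lower-level minimizer has a closed form; the regularizer ‖Lu‖², the fixed β = 1, and the grid search replacing a genuine bilevel solver are all illustrative assumptions.

```python
import numpy as np

# Toy instance of the bilevel problem (7.5) for denoising (K = Id) with a
# quadratic regularizer R_alpha(u) = alpha * ||L u||^2, so the lower-level
# minimizer is available in closed form. beta is fixed to 1 and the upper
# level is solved by grid search; all of these choices are illustrative.
rng = np.random.default_rng(1)
n = 30
Lop = np.diff(np.eye(n), axis=0)              # finite-difference operator
u_hat = np.sin(np.linspace(0, np.pi, n))      # ground truth datum
f = u_hat + 0.05 * rng.standard_normal(n)     # corrupted measurement f ~ K u_hat

def lower_level(alpha, beta=1.0):
    # closed-form minimizer of ||u - f||^2 + beta * alpha * ||L u||^2
    return np.linalg.solve(np.eye(n) + beta * alpha * Lop.T @ Lop, f)

alphas = [0.0, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
scores = [np.linalg.norm(lower_level(a) - u_hat) for a in alphas]  # upper level
best_alpha = alphas[int(np.argmin(scores))]
print(best_alpha, min(scores))
```

The upper level here is exactly the distance of the lower-level reconstruction to the ground truth; the penalties H_1, H_2 of (7.5) are omitted in this sketch for brevity.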
Regarding the existence of an optimal parameter β in this result, it is important to note that in (7.5) the parameter β is taken in the open interval (0, ∞). This is necessary since otherwise existence of a solution to the lower level problem cannot be guaranteed. The following theorem takes this into account by allowing the optimal parameter to also attain the value 0, in which case it states that existence for the lower level problem with β = 0 also holds; see Remark 7.5 for details.

(7.6)
Proof. In case the infimum in the bilevel problem (7.5) is infinite, any parameter combination together with a corresponding solution of the lower level problem is a solution, so we assume from now on that the infimum in (7.5) is finite. Take (α_k, β_k)_k to be a minimizing sequence in [0, ∞)^E × (0, ∞) for (7.5), with (u_k)_k = (u_{α_k,β_k})_k and ((w_k^e)_e)_k = ((w_e^{α_k,β_k})_e)_k corresponding sequences of solutions to the lower level problem. Then, obviously, (u_k)_k is bounded in Z and, by ‖·‖_{X_n̂} ≤ C‖·‖_Z, we obtain a (non-relabeled) subsequence weakly converging to some u in X_n̂. By the coercivity of H_1 (hypothesis iv)) we can also assume that, up to a subsequence, α_k → ᾱ ∈ dom(H_1). By possibly considering another (non-relabeled) subsequence, we can further achieve that, for each e ∈ E, either α_k^e > 0 for all k or α_k^e = 0 for all k. Noting that in the latter case we can remove the subgraphs of G = (V, E) after e ∈ E with α_k^e = 0 for all k without changing the value of R_ᾱ or any of the R_{α_k}, we can further assume that α_k^e > 0 for all k and e ∈ E. At first assume that there exists a subsequence of (β_k)_k converging to zero. Then, moving to this subsequence, we obtain for any z ∈ dom(R_ᾱ) ⊂ dom(R̃_ᾱ) (where the inclusion follows from Lemma 3.3 and the definition of ᾱ) with z ∈ dom(S_f ∘ K) the corresponding estimate, where we have used that R_{α_k}(γ_k z) ≤ R̃_ᾱ(z) ≤ R_ᾱ(z) for γ_k according to (6.6) by Theorem 6.6 and Lemma 6.5, and that S_f is continuous on dom(S_f) with dom(S_f) open (hypothesis ii)). Density of dom(R_ᾱ) and continuity of S_f then imply that (u, (0)_{e∈E}) is a solution to the lower level problem in (7.6) for β̄ = 0. Lower semi-continuity of ‖·‖_Z and H_1, together with the fact that 0 = H_2((0)_{e∈E}) ≤ lim inf_k H_2((w_k^e)_e), then yields the claimed optimality of (ᾱ, β̄) with β̄ = 0. Assume now that (β_k)_k is unbounded, such that, again moving to a non-relabeled subsequence, we can assume that β_k → ∞.
Optimality and the estimate (6.8) then give, for any z ∈ dom(S_f ∘ K) with R_ᾱ(z) = 0 (such a z exists by hypothesis iii) since L_ᾱ ⊂ L̃_ᾱ, with L_ᾱ and L̃_ᾱ the invariant subspaces of R_ᾱ and R̃_ᾱ, respectively), a uniform energy bound. This implies in particular that (R_{α_k}(u_k, (w_k^e)_e))_k is bounded, such that, using that (P_e w_k^e)_k is bounded for each e ∈ E due to the coercivity of H_2 as in assumption v), by Lemma 7.1 the sequence ((w_k^e)_e)_k admits a subsequence weak* converging to some (w_e)_e. Weak* lower semi-continuity then yields the corresponding estimate. Also, from weak lower semi-continuity of S_f ∘ K, we obtain that S_f(Ku) ≤ S_f(Kz) and, consequently, that (u, (w_e)_e) solves the lower level problem in (7.6) for (ᾱ, β̄) with β̄ = ∞.
Lower semi-continuity of ‖·‖_Z and H_1, and weak* lower semi-continuity of H_2, finally imply that (ᾱ, β̄) is optimal as claimed. At last assume that, again up to a non-relabeled subsequence, β_k → β̄ ∈ (0, ∞). Then, we get for any z ∈ dom(R_ᾱ) ∩ dom(S_f ∘ K) (which again exists by hypothesis iii) since L_ᾱ ⊂ dom(R_ᾱ) ⊂ dom(R̃_ᾱ)) a uniform energy bound, such that again (R_{α_k}(u_k, (w_k^e)_e))_k and (P_e w_k^e)_k are bounded and, by Lemma 7.1, we can assume that ((w_k^e)_e)_k admits a subsequence weak* converging to some (w_e)_e. Lower semi-continuity then yields an estimate showing that (u, (w_e)_e) solves the lower level problem in (7.6). Finally, again lower semi-continuity of ‖·‖_Z and H_1 and weak* lower semi-continuity of H_2 imply optimality of (ᾱ, β̄) as claimed.
• If ᾱ_1 > 0 and ᾱ_0 > 0, then R_ᾱ = TGV²_{(1,1/ᾱ_1)} △ (‖·‖_1 ∘ SH). Thus, the model is able to learn different functionals by modifying the graph accordingly. This extends directly, e.g., to learning the order of TGV or the infimal convolution of TGV with other regularization functionals. The term I_{[0,d]}(‖P_{ker(E)} w_1^{α,β}‖_{L²(Ω)}) puts a constraint on the norm of the projection of the auxiliary variable w_1^{α,β} onto ker(E). Avoiding such a term is also possible, but would lead to different limit functionals in case of vanishing α: without a bound on the elements of ker(E), the limit graph in case ᾱ_0 = ᾱ_1 = 0 would in this example be R̃_ᾱ(u) = inf_{w∈ker(E)} ‖∇u − w‖_M instead of R̃_ᾱ(u) = ‖∇u‖_M. Hence, when using the infimal convolution of functionals with a non-trivial invariant subspace, the limit functional would still allow to subtract an arbitrary element of this subspace.
Remark 7.5. We now discuss necessity of the additional density and continuity assumptions of the theorem and the obtained result in more detail.
• If β̄ = 0, the theorem states that u_{ᾱ,β̄} is a solution to min_{u∈X_n̂} S_f(Ku), and in particular that a best approximation of the noisy data exists. Note that this is not true in general: in a classical Hilbert space setting with S_f(v) = ‖v − f‖²_2, for instance, existence of a best approximation for every f ∈ Y is in fact equivalent to K having closed range [24].
Here, it can be shown as in Theorem 6.2 that a solution always exists, and we could alternatively have used β ∈ (0, ∞] in the bilevel problem (7.5).
• Density of dom(R_ᾱ) is only required in case β̄ = 0 to ensure optimality over the entire space instead of over dom(R̃_ᾱ). In particular, this assumption can be dropped by bounding the admissible β away from zero.
• The assumption that S_f is continuous with dom(S_f) open is always fulfilled if, for instance, S_f(u) = ‖u − f‖^q_Y. It can be replaced by the weaker assumption that S_f(γ_k Kz) → S_f(Kz) for all z ∈ dom(S_f ∘ K) and γ_k ∈ (0, 1] converging to 1, by either bounding β away from zero or reducing the optimality of u_{ᾱ,β̄} in case β̄ = 0 to optimality with respect to all functions in dom(R̃_ᾱ) instead of the entire space.
• The assumption dom(S_f ∘ K) ∩ L_ᾱ ≠ ∅ is always fulfilled if, for instance, 0 ∈ dom(S_f). It can be weakened to dom(S_f ∘ K) ∩ dom(R_ᾱ) ≠ ∅ if the set of admissible β is bounded from above.
• Typical examples of H_1 fulfilling the assumptions of Theorem 7.4 are functionals that constrain α_e ∈ [0, c] for all e ∈ E_0, or that penalize Σ_{e∈E_0} |α_e|, for some E_0 ⊂ E, and fix α_e = 1 for all remaining e ∈ E \ E_0. Here, a penalization of Σ_{e∈E_0} |α_e| is expected to promote sparsity of α and hence a reduced complexity of the optimal regularization graph. The purpose of the constraint α_e = 1 for e ∈ E \ E_0 is to avoid overparametrization, i.e., the usage of unnecessary parameters. This happens, for instance, in the case of splitting nodes, i.e., if Ψ_n = I_{{0}} for some n. Further, the constraint α_e = 1 for e ∈ E \ E_0 can be used to avoid R_ᾱ = I_{{0}}, which is the case if all weights are set to zero and Ψ_n̂ = I_{{0}} (see Definition 2.3).
• The coercivity of H_2 is only required on the finite dimensional spaces M_e for e ∈ E, and is used to allow the bilevel framework to cut edges of the graph by setting weights to zero. Without this assumption, a similar existence result with R_α replaced by R̃_α can be obtained.
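The sparsity-promoting effect of an ℓ¹ penalty on the weights, as discussed in Remark 7.5, can be made tangible with a toy experiment: weights for two candidate quadratic regularizers are learned by grid search on a denoising problem, and the weight of the unhelpful regularizer is driven to zero, pruning the corresponding edge. The candidate penalties, the parameter lam and the grid are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Toy illustration of learning sparse graph weights: two candidate quadratic
# regularizers (finite differences vs. the identity) with an l1 penalty on
# the weights. The data, penalty parameter lam, and grid are illustrative.
rng = np.random.default_rng(2)
n = 30
u_hat = 1.0 + np.sin(np.linspace(0, np.pi, n))   # smooth, nonzero-mean truth
f = u_hat + 0.1 * rng.standard_normal(n)         # noisy data (K = Id)
L1 = np.diff(np.eye(n), axis=0)                  # smoothness penalty (helpful)
L2 = np.eye(n)                                   # norm penalty (unhelpful here)

lam = 0.05
grid = [0.0, 0.1, 0.5, 1.0, 5.0]
best = None
for a1, a2 in product(grid, grid):
    u = np.linalg.solve(np.eye(n) + a1 * L1.T @ L1 + a2 * L2.T @ L2, f)
    score = np.linalg.norm(u - u_hat) + lam * (a1 + a2)   # upper level + l1
    if best is None or score < best[0]:
        best = (score, a1, a2)
print(best)   # the weight of the unhelpful regularizer ends up at zero
```

Since the ground truth has a large mean, shrinking toward zero via the norm penalty only increases the reconstruction error while incurring an ℓ¹ cost, so the grid search selects a2 = 0 and a positive smoothness weight a1.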

Conclusions
In this work, we have introduced regularization graphs as a flexible framework for designing regularization functionals for the variational regularization of inverse problems. The proposed framework covers existing regularization approaches comprehensively and allows one to define new ones in a simple and constructive way, essentially by drawing corresponding regularization graphs. We have provided a comprehensive analysis of the class of functionals derived from regularization graphs, which in particular includes well-posedness and convergence results for applying this class of functionals in a general inverse problems setting. Furthermore, we have developed and analyzed a bilevel optimization approach that allows one to learn an optimal structure and complexity of a regularization graph, and hence of the corresponding regularization functional, from training data. Future goals are to develop an equally flexible numerical framework for the application of regularization graphs to general inverse problems, as well as the numerical realization of the proposed bilevel approach.

A Appendix
Here we provide a list, extending the examples of Section 2.1, that outlines the representation of different existing regularization functionals as regularization graphs. Note that, as discussed in Section 3, any finite combination of these functionals via summation or infimal convolution can again be represented as a regularization graph.
General second-order model [12,19]. Here, 1 < p ≤ d′, m ∈ N, and the linear operator A : R^{d×d} → R^m is applied pointwise to ∇w such that suitable lower semi-continuity and coercivity assumptions hold.
R_α(u) = inf_{w_1,w_2 ∈ L²(Ω)} ..., where the Φ_i^* are associated with tight frames such as curvelets or Gabor frames [35], and ‖·‖_1 is the extension by +∞ of the ℓ¹-norm to ℓ². For the sake of completeness, we also provide a proof of the equivalence of a coercivity and a closed-range assertion for the operators considered in this paper.
Lemma A.1. Let Θ : dom(Θ) ⊂ X_e → X_m be a linear operator between Banach spaces X_e and X_m that both admit a predual space and such that bounded sequences in X_e admit weak* convergent subsequences. Further, assume that Θ is weak* closed and has finite dimensional kernel. Then, there exist C > 0 and a linear, continuous projection P_{ker(Θ)} : X_e → ker(Θ) such that ‖w − P_{ker(Θ)} w‖_{X_e} ≤ C‖Θw‖_{X_m} for all w ∈ dom(Θ) if and only if Θ has closed range.
Proof. Assuming that the coercivity assertion holds, the closedness of rg(Θ) can be proven directly using the weak* closedness of Θ. On the other hand, if rg(Θ) is closed, then, arguing as in the cited result and arbitrarily extending G outside dom(Θ) to a function G : X_e → ker(Θ), we obtain that ‖w − G(w)‖_{X_e} ≤ C̃ R(w) for all w ∈ X_e. (A.4) Finally, applying Lemma 4.1 with D = 0 (which yields D̃ = 0) and K = X_e, we obtain the existence of a bounded, linear projection P_{ker(Θ)} and a constant C > 0 such that the claimed coercivity holds.
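In finite dimensions, where every range is closed, Lemma A.1 can be verified directly: the optimal coercivity constant is C = 1/σ_min with σ_min the smallest nonzero singular value, and the kernel projection can be taken orthogonal. The following sketch checks the estimate for a random matrix; the matrix and the tolerance are illustrative assumptions.

```python
import numpy as np

# Finite-dimensional illustration of Lemma A.1 (where closed range is
# automatic): for a matrix Theta with nontrivial kernel, the coercivity
# estimate ||w - P_ker w|| <= C ||Theta w|| holds with C = 1/sigma_min,
# the smallest nonzero singular value. Matrix and tolerance are illustrative.
rng = np.random.default_rng(3)
Theta = rng.standard_normal((4, 6))           # ker(Theta) has dimension 2
U, s, Vt = np.linalg.svd(Theta)
V_ker = Vt[4:].T                              # orthonormal basis of ker(Theta)
P_ker = V_ker @ V_ker.T                       # orthogonal projection onto kernel
C = 1.0 / s[s > 1e-12].min()                  # optimal coercivity constant

for _ in range(100):
    w = rng.standard_normal(6)
    assert np.linalg.norm(w - P_ker @ w) <= C * np.linalg.norm(Theta @ w) * (1 + 1e-10)
print("coercivity constant C =", C)
```

The estimate holds because w − P_ker w lies in the row space of Theta, on which the smallest singular value bounds ‖Theta x‖ from below by σ_min‖x‖.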