Sparse optimization problems in fractional order Sobolev spaces

We consider optimization problems in the fractional order Sobolev spaces $H^s(\Omega)$, $s\in (0,1)$, with sparsity promoting objective functionals containing $L^p$-pseudonorms, $p\in (0,1)$. Existence of solutions is proven. By means of a smoothing scheme, we obtain first-order optimality conditions. An algorithm based on this smoothing scheme is developed. Weak limit points of iterates are shown to satisfy a stationarity system that is slightly weaker than that given by the necessary condition.


Introduction
We are interested in the following optimization problem:
$$\min_{u\in H^s(\Omega)} F(u) + \frac{\alpha}{2}\|u\|_{H^s(\Omega)}^2 + \beta\,\|u\|_p^p, \qquad (1.1)$$
where $F\colon H^s(\Omega)\to\mathbb R$ is assumed to be smooth, and where
$$\|u\|_p^p := \int_\Omega |u(x)|^p \,\mathrm dx \qquad (1.2)$$
with $p\in(0,1)$ is the $L^p$-pseudo-norm of $u$, which is a concave and nonsmooth functional. Here, $\Omega\subseteq\mathbb R^d$ is a Lipschitz domain, and $\alpha>0$, $\beta>0$. Moreover, we will work with $s\in(0,1)$, where $s$ is chosen small to facilitate discontinuous solutions.
The main motivation of this work comes from sparse control problems: we want to find controls with small support. This is useful in actuator placement problems, where one wants to identify small subsets of $\Omega$ on which actuators need to be placed. Functionals of type (1.2) are known to be sparsity promoting: solutions tend to be zero on certain parts of the domain. Another motivation is the study of sparse source or coefficient identification problems, see, e.g., [6,15,24]. In order to avoid over-smoothing, we study the regularization in $H^s(\Omega)$ for small $s\in(0,1)$.
If problem (1.1) were posed in $L^2(\Omega)$ instead of $H^s(\Omega)$, $s>0$, then it would be impossible to prove existence of solutions [22]. In fact, following the construction in [33], one can construct problems of type (1.1) that do not attain their infimum on $L^2(\Omega)$. The situation changes on $H^s(\Omega)$, $s>0$: due to the compactness of the embedding $H^s(\Omega)\hookrightarrow L^2(\Omega)$, the map $u\mapsto\|u\|_p^p$ is weakly continuous from $H^s(\Omega)$ to $\mathbb R$, which enables the standard existence proof, see Theorem 3.1.
Since its introduction in [1] for image denoising and phase-field models, the use of the $H^s(\Omega)$-norm as a regularizer has grown significantly, especially in the imaging community. The use of functionals of type (1.2) in imaging problems posed on sequence spaces has produced a rich literature; we refer only to [23,27,28]. There it was proven that solutions are sparse: based on optimality conditions, one can show that the entries $u_i$ of a solution $u\in\ell^2$ are either zero or have absolute value larger than some given positive number.
In the context of optimal control, research on problems with functionals of type (1.2) was initiated in [22]. There, for problems posed on $L^2(\Omega)$, an optimality condition in the form of the Pontryagin maximum principle was proven, which can be used to establish sparsity of solutions. In addition, the regularization of these problems in $H^1(\Omega)$ was suggested.
In order to solve problems of type (1.1), a monotone algorithm for a smoothing of (1.2) was introduced in [22,23]. For $\varepsilon>0$, one defines a smooth approximation $\psi_\varepsilon$ of $t\mapsto t^{p/2}$, see Section 4 below. Then a smooth version of (1.2) is given by $u\mapsto\int_\Omega\psi_\varepsilon(u(x)^2)\,\mathrm dx$, see also [30]. It is proven in [22,23] that for quadratic $F$ and fixed $\varepsilon>0$, the weak limits of the iterates of the algorithm solve the necessary condition of the smoothed problem. In this paper, we extend this idea to allow for a decreasing sequence of smoothing parameters $\varepsilon_k\searrow0$. Still we can prove that weak limit points of iterates satisfy a certain optimality system.
In addition, we prove first-order necessary conditions for (1.1). Here, we use a standard penalization technique together with the smoothing procedure outlined above. Doing so, we are able to prove that for every local solution $\bar u\in H^s(\Omega)$ there is $\bar\lambda\in(H^s(\Omega))^*$ such that a certain optimality system is satisfied. For the precise statement of the result, we refer to Theorem 5.7. Let us mention that this result is new, even in the case $s=1$.
Weak limit points of iterates of our algorithm, see Algorithm 1 in Section 7, are proven to satisfy a slightly weaker system for some $\bar\lambda\in(H^s(\Omega))^*$.

The plan of the paper is as follows. We will work in an abstract framework with some Hilbert space $V$, where we have in mind to use $V=H^s(\Omega)$. The proof of existence of solutions of (1.1) is given in Section 3. The smoothing approach is described in Section 4, which is then used in Section 5 to obtain the first-order necessary optimality condition. We comment on the use of the fractional Hilbert spaces $H^s(\Omega)$ and possible realizations of fractional Laplace operators $(-\Delta)^s$ in Section 6. The optimization method is analyzed in Section 7.

Notation and assumptions
We will consider the problem in a more general framework that includes $V=H^s(\Omega)$ and $V=H^1(\Omega)$.

Assumption 1 (Standing assumption).
(1) $V$ is a real Hilbert space, $V\subseteq L^2(\Omega)$ with compact and dense embedding $V\hookrightarrow L^2(\Omega)$. The inner product of $V$ is denoted by $\langle\cdot,\cdot\rangle_V$. The duality pairing between $V^*$ and $V$ is denoted by $\langle\cdot,\cdot\rangle_{V^*,V}$.
(2) $F\colon V\to\mathbb R$ is weakly lower semicontinuous and bounded below by an affine function, i.e., there are $g\in V^*$ and $c\in\mathbb R$ such that $F(u)\ge\langle g,u\rangle_{V^*,V}+c$ for all $u\in V$.

Application to a sparse source identification problem
Before investigating the abstract problem in depth, let us introduce one possible application of the sparse optimization problem. Here, we will look into the identification of sparse perturbations of the initial condition in a parabolic partial differential equation. This is motivated by the question of detecting the source of, e.g., environmental pollution [9]. The identification problem reads: given some measurement $z$ of the state $y(T)$ at the terminal time $T>0$, determine the perturbation $u$ in the initial condition $y(0)=y_0+u$. We formulate this as the following minimization problem:
$$\min_{u\in H^s(\Omega)} \frac12\|y(T)-z\|_{L^2(\Omega)}^2+\frac\alpha2\|u\|_{H^s(\Omega)}^2+\beta\|u\|_p^p$$
subject to
$$\partial_t y-\operatorname{div}(a\nabla y)+f(y)=0 \text{ in } (0,T)\times\Omega,\qquad y|_{(0,T)\times\partial\Omega}=0,\qquad y(0)=y_0+u.$$
Here, $\Omega\subseteq\mathbb R^d$, $d=2,3$, is a bounded domain, $a\in L^\infty(\Omega)$ is a positive diffusivity coefficient, and $f\colon\mathbb R\to\mathbb R$ is a smooth function such that $f$ is bounded from below. Here, $a$ and $f$ are assumed to be known. Let us denote $F(u):=\frac12\|y(T)-z\|_{L^2(\Omega)}^2$. Then one can show that $F$ satisfies Assumption 1 for $V=H^s(\Omega)$ with $s\ge0$, see, e.g., [32] and the recent contribution [12]. Similar problems were investigated in [11,25], where the unknown $u$ is a measure. Following the considerations in [18,19], one can consider noisy measurements $z_\delta$ with $\|z-z_\delta\|_{L^2(\Omega)}\le\delta$, where $z$ denotes the unavailable exact data, and prove convergence of solutions of the problem above for $(\alpha,\beta,\delta)\to(0,0,0)$.

Existence of solutions

Theorem 3.1. Problem (3.1) possesses a global solution.

Proof. The proof follows by standard arguments. Let $(u_n)$ be a minimizing sequence. Due to Assumption 1, $(u_n)$ is bounded in $V$. Hence, we have (after extracting a subsequence if necessary) $u_n\rightharpoonup\bar u$ in $V$ and $u_n\to\bar u$ in $L^2(\Omega)$. Passing to the limit (limit inferior) in the functional shows that $\bar u$ realizes the minimum.

Remark 3.2. The origin $u_0=0$ is a minimum of (3.1) along lines. In fact, let $u\in V$. Then for $t>0$ small we have
$$F(tu)+\frac\alpha2 t^2\|u\|_V^2+\beta t^p\|u\|_p^p\ge F(0),$$
since the term $\beta t^p\|u\|_p^p$ dominates the remaining terms, which are of order $t$. Hence, the function $t\mapsto F(tu)+\frac\alpha2 t^2\|u\|_V^2+\beta t^p\|u\|_p^p$ has a local minimum at $t=0$. A stronger claim holds true for minimization problems on $\mathbb R^n$ involving the $\ell^p$-pseudo-norm of vectors: there the origin is a local solution, which is due to the inequality $\sum_{i=1}^n|x_i|^p\ge\left(\sum_{i=1}^n|x_i|\right)^p$ for $p\in(0,1)$ and $x\in\mathbb R^n$.
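The inequality at the end of Remark 3.2 is the subadditivity of $t\mapsto t^p$ for $p\in(0,1)$. The following is a quick numerical sanity check (illustrative only; the function name is ours):

```python
import numpy as np

def lp_pseudonorm(x, p):
    """sum_i |x_i|^p, i.e. the l^p pseudo-norm of x raised to the power p, p in (0,1)."""
    return float(np.sum(np.abs(x) ** p))

# Check sum_i |x_i|^p >= (sum_i |x_i|)^p on random vectors and exponents.
rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.uniform(0.01, 0.99)
    x = rng.normal(size=int(rng.integers(1, 20)))
    assert lp_pseudonorm(x, p) >= np.sum(np.abs(x)) ** p - 1e-12
```

Equality holds exactly when at most one entry of $x$ is nonzero; otherwise the inequality is strict.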

Smoothing scheme
In order to prove necessary optimality conditions and to devise an optimization algorithm, we will use a smoothing scheme, which was already employed in [22,30]. Let $\varepsilon\ge0$. Then we will work with the following smooth approximation of $t\mapsto t^{p/2}$, defined by
$$\psi_\varepsilon(t):=\begin{cases}t^{p/2}&\text{if }t\ge\varepsilon^2,\\[2pt] \frac p2\varepsilon^{p-2}\,t+\left(1-\frac p2\right)\varepsilon^p&\text{if }t<\varepsilon^2,\end{cases}$$
with derivative given by
$$\psi_\varepsilon'(t)=\begin{cases}\frac p2\,t^{p/2-1}&\text{if }t\ge\varepsilon^2,\\[2pt] \frac p2\,\varepsilon^{p-2}&\text{if }t<\varepsilon^2.\end{cases}$$
Note that $\psi_0(u^2)=|u|^p$. In addition, we have the following properties of $\psi_\varepsilon$.

Lemma 4.1. Let $0\le\varepsilon_1\le\varepsilon_2$. Then $\psi_{\varepsilon_1}\le\psi_{\varepsilon_2}$ on $[0,\infty)$.

Proof. On the interval $(\varepsilon_1^2,\varepsilon_2^2)$, the function $\psi_{\varepsilon_2}$ is affine linear and tangent to $t\mapsto t^{p/2}=\psi_{\varepsilon_1}(t)$; hence the claim follows there by concavity of $t\mapsto t^{p/2}$. On the interval $(0,\varepsilon_1^2)$, both functions are affine linear with $\psi_{\varepsilon_1}<\psi_{\varepsilon_2}$.

Let us define the following integral functional associated with $\psi_\varepsilon$, which serves as an approximation of $\int_\Omega|u|^p\,\mathrm dx$:
$$G_\varepsilon(u):=\int_\Omega\psi_\varepsilon(u(x)^2)\,\mathrm dx.$$
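As a small numerical sketch, the following implements one such smoothing, assuming the tangent-line continuation of $t\mapsto t^{p/2}$ below $t=\varepsilon^2$; it checks that $\psi_\varepsilon$ lies above $t^{p/2}$ and decreases monotonically toward it as $\varepsilon\searrow0$:

```python
import numpy as np

def psi(t, eps, p):
    """Smooth approximation of t -> t^(p/2), p in (0, 1):
    equal to t^(p/2) for t >= eps^2, tangent-line continuation below."""
    t = np.asarray(t, dtype=float)
    if eps == 0.0:
        return t ** (p / 2)
    affine = (p / 2) * eps ** (p - 2) * t + (1 - p / 2) * eps ** p
    return np.where(t >= eps ** 2, t ** (p / 2), affine)

p = 0.5
t = np.linspace(0.0, 1.0, 2001)
# psi_eps lies above t^(p/2) and is monotone in eps (psi_{eps1} <= psi_{eps2}).
assert np.all(psi(t, 0.10, p) >= psi(t, 0.0, p) - 1e-12)
assert np.all(psi(t, 0.05, p) <= psi(t, 0.10, p) + 1e-12)
```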

Optimality conditions
Here, we will now prove optimality conditions for the nonsmooth problem (3.1).
To this end, we will introduce an auxiliary smooth optimization problem. For a general exposition of this method, we refer to [7, Section 3].
For $\varepsilon>0$ define the auxiliary problem
$$\min_{u\in V,\ \|u-\bar u\|_V\le\rho}\ F(u)+\frac\alpha2\|u\|_V^2+\beta G_\varepsilon(u)+\frac12\|u-\bar u\|_{L^2(\Omega)}^2,\qquad(5.2)$$
where $\bar u$ is a fixed local solution of (3.1) and $\rho>0$ is such that $\bar u$ is a global minimum of (3.1) on the closed ball $\{u\in V:\|u-\bar u\|_V\le\rho\}$. Arguing as in the proof of Theorem 3.1, one can prove that (5.2) is solvable.
We will now show that solutions of this problem converge for $\varepsilon\searrow0$ to solutions of (3.1).
Lemma 5.1. Let $(\varepsilon_k)$ be a sequence of positive numbers with $\varepsilon_k\to0$. Let $(u_k)$ be a family of global solutions of (5.2) with smoothing parameter $\varepsilon_k$. Then $u_k\to\bar u$ in $V$.
Proof. The first part of the proof is similar to that of [22, Proposition 5.3].
where the latter inequality follows from the local optimality of $\bar u$. This proves $u^*=\bar u$. From elementary properties of the limit inferior and superior, we obtain convergence of the norms $\|u_k\|_V\to\|\bar u\|_V$, and hence strong convergence in $V$. Since every subsequence of $(u_k)$ contains a subsequence converging strongly to $\bar u$, the convergence of the whole sequence follows.
A particular implication of the previous result is that the constraint $\|u-\bar u\|_V\le\rho$ in (5.2) is satisfied as a strict inequality if $\varepsilon$ is sufficiently small.
The objective functional of (5.2) is differentiable according to Assumption 1 and Lemma 4.3. Since the constraint is inactive for small $\varepsilon$, a solution $u_\varepsilon$ of (5.2) satisfies the first-order condition
$$\langle F'(u_\varepsilon),v\rangle_{V^*,V}+\alpha\langle u_\varepsilon,v\rangle_V+\beta\langle G_\varepsilon'(u_\varepsilon),v\rangle_{V^*,V}+\langle u_\varepsilon-\bar u,v\rangle_{L^2(\Omega)}=0\quad\text{for all }v\in V.\qquad(5.3)$$
We will now pass to the limit $\varepsilon\searrow0$ in (5.3). The term $\langle u_\varepsilon-\bar u,v\rangle_{L^2(\Omega)}$ will disappear according to Lemma 5.1. Thus, the next step is to investigate the behavior of $G_\varepsilon'(u_\varepsilon)$ for $\varepsilon\searrow0$.
Lemma 5.3. Let $(\varepsilon_k)$ be a sequence of positive numbers with $\varepsilon_k\to0$, and let $(u_k)$ be a sequence of global solutions of (5.2). Define $\lambda_k:=G_{\varepsilon_k}'(u_k)$. Then $\lambda_k\to\bar\lambda$ in $V^*$, where $\bar\lambda\in V^*$ satisfies $F'(\bar u)+\alpha\langle\bar u,\cdot\rangle_V+\beta\bar\lambda=0$ in $V^*$.

Proof. This is a consequence of the convergence $u_k\to\bar u$ by Lemma 5.1, the continuity of $F'$, and (5.3).

Lemma 5.4. Let $u\in V$ and $\varepsilon>0$ be given. Then
$$p\int_{\{|u|\ge\varepsilon\}}|u|^p\,\mathrm dx\ \le\ \langle G_\varepsilon'(u),u\rangle_{V^*,V}=2\int_\Omega\psi_\varepsilon'(u^2)u^2\,\mathrm dx\ \le\ p\int_\Omega|u|^p\,\mathrm dx.$$

Proof. The integrand in the expression $2\int_\Omega\psi_\varepsilon'(u^2)u^2\,\mathrm dx$ satisfies, by the definition of $\psi_\varepsilon$,
$$2\psi_\varepsilon'(t)t=p\,t^{p/2}\ \text{for }t\ge\varepsilon^2\qquad\text{and}\qquad 2\psi_\varepsilon'(t)t=p\,\varepsilon^{p-2}t\le p\,t^{p/2}\ \text{for }t<\varepsilon^2,$$
which immediately implies the pointwise bounds, and the claim follows by integration.
Lemma 5.5. Let $(\varepsilon_k)$, $(u_k)$, $(\lambda_k)$ be as in Lemma 5.3 with $u_k\to\bar u$ and $\lambda_k\to\bar\lambda$ in $V$ and $V^*$, respectively. Then
$$\langle\bar\lambda,\bar u\rangle_{V^*,V}=p\int_\Omega|\bar u|^p\,\mathrm dx.$$

Proof. Let us recall the definition $\lambda_k=G_{\varepsilon_k}'(u_k)$, so that Lemma 5.4 gives an upper and a lower bound of $\langle\lambda_k,u_k\rangle_{V^*,V}$. We will show that both bounds converge to the same value. Since $u_k\to\bar u$ in $V$, we can assume (after possibly extracting a subsequence) that $u_k\to\bar u$ pointwise a.e., with $|u_k|\le w$ pointwise a.e. for some $w\in L^2(\Omega)$, see [10, Theorem 4.9]. Hence, $pw^p\in L^1(\Omega)$ is an integrable pointwise upper bound of the integrands in both integrals of Lemma 5.4 evaluated at $u_k$. Clearly, $\int_\Omega|u_k|^p\,\mathrm dx\to\int_\Omega|\bar u|^p\,\mathrm dx$ by dominated convergence. For the second integral, we have
$$\chi_{\{|u_k|\ge\varepsilon_k\}}(x)\,|u_k(x)|^p\to|\bar u(x)|^p\quad\text{for a.e. }x\in\Omega,$$
since $\varepsilon_k\to0$. And we can use the dominated convergence theorem to pass to the limit $k\to\infty$ as follows:
$$\int_{\{|u_k|\ge\varepsilon_k\}}|u_k|^p\,\mathrm dx\to\int_\Omega|\bar u|^p\,\mathrm dx.$$
The claim now follows with Lemma 5.4.
In this sense, $\bar\lambda$ can be interpreted as the derivative of $u\mapsto\|u\|_p^p$ at $\bar u$.
Remark 5.6. Let us assume that $u\zeta\in V$ for all $u\in V$ and $\zeta\in C_c^\infty(\Omega)$. Then one can prove with similar arguments as in Lemma 5.5 that
$$\langle\bar\lambda,\bar u\zeta\rangle_{V^*,V}=p\int_\Omega|\bar u|^p\zeta\,\mathrm dx\quad\text{for all }\zeta\in C_c^\infty(\Omega).$$
Hence, $\bar\lambda\bar u$ can be interpreted as a non-negative distribution.
Now we have everything at hand to be able to prove optimality conditions for the original problem (3.1).
Theorem 5.7. Let $\bar u$ be a local solution of the original problem (3.1). Then there is $\bar\lambda\in V^*$ such that
$$F'(\bar u)+\alpha\langle\bar u,\cdot\rangle_V+\beta\bar\lambda=0\ \text{in }V^*\qquad\text{and}\qquad\langle\bar\lambda,\bar u\rangle_{V^*,V}=p\int_\Omega|\bar u|^p\,\mathrm dx.$$

Proof. This is a consequence of the results above.

Under stronger assumptions on $F$ and on the underlying space $V$, we can prove additional regularity of $\bar u$ and $\bar\lambda$.
Lemma 5.9. Suppose Assumption 2 is satisfied. Let $u_\varepsilon$ be a solution of (5.3) such that $F'(u_\varepsilon)\in L^1(\Omega)$. Define $\lambda_\varepsilon:=G_\varepsilon'(u_\varepsilon)$. Then
$$\beta\|\lambda_\varepsilon\|_{L^1(\Omega)}\le\|F'(u_\varepsilon)\|_{L^1(\Omega)}+\|\bar u\|_{L^1(\Omega)}.$$

Proof. We follow an idea of [21, Theorem 5.1]. Let us test (5.3) with $v_n:=\max(-1,\min(n\cdot u_\varepsilon,+1))\approx\operatorname{sign}(u_\varepsilon)$, where $n\in\mathbb N$. Using Assumption 2, we get $v_n\in V$ and $\langle u_\varepsilon,v_n\rangle_V\ge n^{-1}\|v_n\|_V^2\ge0$. In addition, $\langle u_\varepsilon,v_n\rangle_{L^2(\Omega)}\ge0$ and $|\langle\bar u,v_n\rangle_{L^2(\Omega)}|\le\|\bar u\|_{L^1(\Omega)}$. With $v_n$ as test function in (5.3), we get
$$\beta\int_\Omega\lambda_\varepsilon v_n\,\mathrm dx\le\|F'(u_\varepsilon)\|_{L^1(\Omega)}+\|\bar u\|_{L^1(\Omega)}.$$
The integral involving $\lambda_\varepsilon$ can be written as $\int_\Omega 2\psi_\varepsilon'(u_\varepsilon^2)u_\varepsilon v_n\,\mathrm dx$ with a non-negative integrand, see Lemma 4.3. We can pass to the limit $n\to\infty$ by dominated convergence to obtain the claim.

Proof. Due to the assumptions, $(F'(u_k))$ is bounded in $L^1(\Omega)$. By Lemma 5.9, the sequence $(\lambda_k)$ is bounded in $L^1(\Omega)$. We can identify $L^1(\Omega)$ with a subspace of $C_0(\Omega)^*$ using the natural embedding. Hence (after extracting a subsequence if necessary), we have $\lambda_k\rightharpoonup^*\tilde\lambda$ in $C_0(\Omega)^*$. Due to the density assumption in Assumption 2, it follows $\tilde\lambda=\bar\lambda$.
In order to obtain $L^\infty$-regularity, we need the following embedding assumption.

Assumption 3 ($L^q$-embedding). There is $q>2$ such that $V$ is continuously embedded in $L^q(\Omega)$.

The following lemma mimics one key step in the proof of the celebrated $L^\infty$-regularity result for weak solutions of elliptic partial differential equations of [31].
Proof. Let us set $v_n:=\bar u-\max(-n,\min(\bar u,+n))$. Let $(\varepsilon_k)$, $(u_k)$, $(\lambda_k)$ be as in Lemma 5.3 with $u_k\to\bar u$ and $\lambda_k\to\bar\lambda$ in $V$ and $V^*$, respectively. Then, arguing as in the proof of Lemma 5.5, one can pass to the limit $k\to\infty$ with the help of dominated convergence. Testing (5.3) for $(u_k,\varepsilon_k)$ instead of $(u_\varepsilon,\varepsilon)$ with $v_n$ and passing to the limit $k\to\infty$ gives an inequality to which Lemma 5.11 applies. Now the claim follows from Lemma 5.11. Remark 5.13. If $F'\colon V\to L^s(\Omega)$ is continuous, then one can prove the boundedness of $(u_k)$ in $L^\infty(\Omega)$ with similar arguments as in the proof of Theorem 5.12 above.
Discussion of assumptions

Assumption 1, Assumption 2, and Assumption 3 are satisfied when $V=H^1(\Omega)$ or when $V$ is a fractional order Sobolev space. The former is straightforward. Next, we elaborate on the fractional order Sobolev space setting.

Let us introduce in addition
and where the latter identity is due to [16, Theorem 6]. Now, we can prove that parts of the assumptions are satisfied for fractional Sobolev spaces. Note that these parts of the assumptions only depend on the properties of the Hilbert spaces, but not on the concrete choice of the inner product. This follows from the compactness of the embedding [26, Theorem 11.7], and from the density of $C_c^\infty(\Omega)$ in $H_0^1(\Omega)=H_0^1(\Omega)\cap L^2(\Omega)$, see [8, Theorem 4.2.2]. This proves (b). In addition, there is $q>2$ such that the embedding $H^s(\Omega)\hookrightarrow L^q(\Omega)$ is continuous [14, Theorem 6.7], which is (c).
It remains to check Assumption 2-(1), which is an assumption not only on the space V but also on its inner product. Here, we want to work with inner products induced by fractional Laplacians.
We consider two well-known definitions of the fractional Laplacian [2,3,14]. We start with the integral fractional Laplacian. To define the integral fractional Laplace operator, we consider the weighted Lebesgue space
$$L_s^1(\mathbb R^d):=\Big\{u\colon\mathbb R^d\to\mathbb R\ \text{measurable}:\ \int_{\mathbb R^d}\frac{|u(x)|}{(1+|x|)^{d+2s}}\,\mathrm dx<\infty\Big\}.$$
For $u\in L_s^1(\mathbb R^d)$, $\varepsilon>0$, and $x\in\mathbb R^d$ we set
$$(-\Delta)_\varepsilon^s u(x):=C_{d,s}\int_{\{y\in\mathbb R^d:\,|x-y|>\varepsilon\}}\frac{u(x)-u(y)}{|x-y|^{d+2s}}\,\mathrm dy,$$
where $C_{d,s}$ is a normalization constant. Then the integral fractional Laplacian is defined for $s\in(0,1)$ by taking the limit $\varepsilon\to0$, i.e.,
$$(-\Delta)^s u(x):=C_{d,s}\,\mathrm{P.V.}\int_{\mathbb R^d}\frac{u(x)-u(y)}{|x-y|^{d+2s}}\,\mathrm dy,$$
where P.V. denotes the Cauchy principal value. Due to [16, Proposition 3.6], an equivalent norm on $H^s(\Omega)$ is given by $u\mapsto\|(-\Delta)^{s/2}u\|_{L^2(\mathbb R^d)}$, which motivates the following choice of the inner product:
$$\langle u,v\rangle:=\frac{C_{d,s}}2\int_{\mathbb R^d}\int_{\mathbb R^d}\frac{(u(x)-u(y))(v(x)-v(y))}{|x-y|^{d+2s}}\,\mathrm dy\,\mathrm dx.$$
Next, we discuss the spectral definition. Let $-\Delta_\Omega$ be the realization in $L^2(\Omega)$ of the Laplace operator with zero Dirichlet boundary conditions. By classical results, $-\Delta_\Omega$ has a compact resolvent, and its eigenvalues form a non-decreasing sequence $0<\mu_1\le\mu_2\le\cdots\le\mu_k\le\cdots$ with $\lim_{k\to\infty}\mu_k=\infty$. Let $\psi_k\in H_0^1(\Omega)$ be the orthonormal eigenfunctions associated with $\mu_k$; these eigenfunctions form an orthonormal basis of $L^2(\Omega)$. Then for any $u\in C_c^\infty(\Omega)$, the fractional powers of $-\Delta_\Omega$ can be defined as
$$(-\Delta_\Omega)^s u:=\sum_{k=1}^\infty\mu_k^s\,\langle u,\psi_k\rangle_{L^2(\Omega)}\,\psi_k.$$
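To illustrate the spectral definition, here is a minimal one-dimensional sketch on $\Omega=(0,\pi)$, where the Dirichlet eigenpairs are known in closed form ($\mu_k=k^2$, $\psi_k(x)=\sqrt{2/\pi}\sin(kx)$); the truncation level `K` and the quadrature are our choices:

```python
import numpy as np

# Spectral fractional Laplacian on Omega = (0, pi), zero Dirichlet data:
# eigenpairs of -Delta_Omega are mu_k = k^2, psi_k(x) = sqrt(2/pi) sin(k x).

def spectral_frac_laplacian(u_vals, x, s, K=50):
    """Apply (-Delta_Omega)^s to samples u_vals on the uniform grid x,
    truncating the eigenfunction expansion after K terms."""
    h = x[1] - x[0]
    out = np.zeros_like(u_vals)
    for k in range(1, K + 1):
        psi_k = np.sqrt(2.0 / np.pi) * np.sin(k * x)
        coeff = h * np.sum(u_vals * psi_k)        # (u, psi_k)_{L^2(0,pi)}
        out += (k ** 2) ** s * coeff * psi_k
    return out

x = np.linspace(0.0, np.pi, 2001)
u = np.sin(3 * x)                                  # eigenfunction with mu = 9
v = spectral_frac_laplacian(u, x, s=0.5)
# On an eigenfunction, (-Delta_Omega)^s acts as multiplication by mu^s = 3.
assert np.max(np.abs(v - 3.0 * u)) < 1e-3
```

The quadrature is exact here up to rounding because of the discrete orthogonality of the sine functions on a uniform grid.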

Iterative scheme
Throughout this section, we suppose that F fulfills the following condition.
Assumption 4. Let Assumption 1 be satisfied. In addition, we require:
(1) $F'$ is completely continuous, i.e., $u_n\rightharpoonup u$ in $V$ implies $F'(u_n)\to F'(u)$ in $V^*$ for all sequences $(u_n)$.
(2) $F'\colon V\to V^*$ is Lipschitz continuous on bounded sets, i.e., for all $R>0$ there is $L_R>0$ such that $\|F'(u)-F'(v)\|_{V^*}\le L_R\|u-v\|_V$ for all $u,v\in V$ with $\|u\|_V,\|v\|_V\le R$.

We will use the following algorithm to compute candidates of solutions for the optimization problem. Similar methods were used in [22], where $F$ was assumed to be quadratic, and in [17], where a more abstract but finite-dimensional problem was analyzed.
The optimization problem in (7.1) is strongly convex since $\alpha>0$ and $\psi_{\varepsilon_k}'\ge0$. Hence, (7.1) admits a unique solution for each $u_k$ and $L_k>0$. In the following, we want to prove that the sequence $(u_k)$ is bounded in $V$. In addition, we are interested in proving that weak limit points satisfy conditions similar to those derived in Theorem 5.7.
First, let us argue that we can find $L_k$ and $u_{k+1}$ satisfying the descent condition (7.2). This is a consequence of the local Lipschitz continuity of $F'$. If $F'$ is globally Lipschitz continuous, then (7.2) is fulfilled as soon as $L_k$ is larger than the Lipschitz modulus of $F'$; this is implied by the so-called descent lemma.
Proof. This is a consequence of the mean-value theorem for Gâteaux differentiable functions, see, e.g., [29,Proposition 3.3.4].
Proof. For $n\in\mathbb N$, let $w_{n,k}$ be the solution of (7.1) with $L_k$ replaced by $n$. By optimality of $w_{n,k}$, we have the estimate (7.3). Due to the concavity and non-negativity of $\psi_{\varepsilon_k}$, we obtain $\psi_{\varepsilon_k}(u_k^2)+\psi_{\varepsilon_k}'(u_k^2)(w_{n,k}^2-u_k^2)\ge0$. Consequently, $w_{n,k}\to u_k$ for $n\to\infty$. By local Lipschitz continuity of $F'$ and Lemma 7.1, there are $M>0$ and $N>0$ such that the corresponding estimate holds for all $n>N$, and condition (7.2) is satisfied for $L_k>\max(N,M^2)$. Let us now assume that the sequence $(u_k)$ is bounded, i.e., $\|u_k\|_V\le R$ for all $k$. From (7.3) and the properties of $F'$, we find that there is $K>0$ such that $\|w_{n,k}-u_k\|_V^2\le K/n$ for all $k$ and $n$. Then $\|w_{n,k}\|_V\le R+1$ for all $n>K$ and all $k$. Let $M$ be the Lipschitz modulus of $F'$ on $B_{R+1}(0)$. Then the descent condition (7.2) is satisfied whenever $L_k\ge\max(K,M^2)$. Due to the selection strategy of $L_k$ in Algorithm 1, it follows $L_k\le\max(K,M^2)\beta$.

Using arguments as in the proof of Lemma 5.2, the iterate $u_{k+1}$ satisfies the following optimality condition:
$$\langle F'(u_k),v\rangle_{V^*,V}+L_k\langle u_{k+1}-u_k,v\rangle_{L^2(\Omega)}+\alpha\langle u_{k+1},v\rangle_V+2\beta\int_\Omega\psi_{\varepsilon_k}'(u_k^2)u_{k+1}v\,\mathrm dx=0\quad\text{for all }v\in V.\qquad(7.4)$$

Let us prove a first basic estimate, which gives us monotonicity of the function values. A similar result (for convex and quadratic $F$) can be found in [22, Theorem 5.4]. Recall the definition of $\Phi_\varepsilon$ in (5.1), i.e.,
$$\Phi_\varepsilon(u):=F(u)+\frac\alpha2\|u\|_V^2+\beta\int_\Omega\psi_\varepsilon(u^2)\,\mathrm dx.$$

Lemma 7.3. Let $(L_k,u_k)$ be a sequence generated by Algorithm 1 with $(\varepsilon_k)$ monotonically decreasing. Then we have the following inequality:
$$\Phi_{\varepsilon_{k+1}}(u_{k+1})+\frac\alpha2\|u_{k+1}-u_k\|_V^2+\frac{L_k}2\|u_{k+1}-u_k\|_{L^2(\Omega)}^2\le\Phi_{\varepsilon_k}(u_k).$$

Proof. Testing (7.4) with $u_{k+1}-u_k$ and completing squares (using $2\langle u_{k+1},u_{k+1}-u_k\rangle_V=\|u_{k+1}\|_V^2-\|u_k\|_V^2+\|u_{k+1}-u_k\|_V^2$), we obtain an identity that can be rearranged. Using condition (7.2), concavity of $t\mapsto\psi_{\varepsilon_k}(t)$, and the monotonicity of $(\varepsilon_k)$ then implies the claim.
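To make the structure of the iteration concrete, the following is a minimal finite-dimensional sketch, not the algorithm of this paper: we take $V=\mathbb R^n$ with the Euclidean inner product, a quadratic misfit $F(u)=\tfrac12|Bu-b|^2$, a fixed stepsize `L` in place of the $L_k$ line search, and a tangent-line smoothing of $t\mapsto t^{p/2}$. Each step linearizes the smoothed penalty at $u_k^2$ and solves the resulting strongly convex quadratic subproblem in closed form. All names and parameter values are illustrative.

```python
import numpy as np

def psi_prime(t, eps, p):
    """Derivative of a smoothed t -> t^(p/2) (tangent-line version)."""
    return np.where(t >= eps ** 2,
                    (p / 2) * np.maximum(t, eps ** 2) ** (p / 2 - 1),
                    (p / 2) * eps ** (p - 2))

def step(u, B, b, alpha, beta, eps, p, L):
    """One majorization step: linearize psi_eps at u^2, then minimize
    <F'(u), v-u> + L/2 |v-u|^2 + alpha/2 |v|^2 + beta * sum_i w_i v_i^2."""
    grad_F = B.T @ (B @ u - b)
    w = psi_prime(u ** 2, eps, p)                 # weights of the concave majorant
    H = (L + alpha) * np.eye(len(u)) + 2 * beta * np.diag(w)
    return np.linalg.solve(H, L * u - grad_F)

rng = np.random.default_rng(1)
n, m = 40, 20
B = rng.normal(size=(m, n)) / np.sqrt(m)
u_true = np.zeros(n); u_true[[3, 17, 29]] = [2.0, -1.5, 1.0]
b = B @ u_true
u = np.zeros(n)
p, alpha, beta, L = 0.5, 1e-3, 1e-3, 10.0
for k in range(300):
    eps = max(1e-6, 0.5 ** k)                     # decreasing smoothing parameters
    u = step(u, B, b, alpha, beta, eps, p, L)
assert np.linalg.norm(B @ u - b) < np.linalg.norm(b)
```

This mirrors the majorize-minimize structure discussed above: the concave integrand is replaced by its tangent at $u_k^2$, which turns each subproblem into a weighted quadratic (IRLS-type) problem; entries with small $u_i$ receive large weights and are driven toward zero.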
Lemma 7.4. Let $(L_k,u_k)$ be a sequence generated by Algorithm 1. Then $(u_k)$ and $(F'(u_k))$ are bounded in $V$ and $V^*$, respectively.

Proof. By Lemma 7.3, $(\Phi_{\varepsilon_k}(u_k))$ is monotonically decreasing. Due to $\alpha>0$ and Assumption 1, $(u_k)$ is bounded in $V$. Due to the complete continuity of $F'$, cf. Assumption 4, $(F'(u_k))$ is bounded in $V^*$.
Corollary 7.5. Let (L k , u k ) be a sequence generated by Algorithm 1. Then (L k ) is bounded.
Proof. Follows directly from Lemma 7.4 and Lemma 7.2.
Corollary 7.6. Let $(L_k,u_k)$ be a sequence generated by Algorithm 1. Then $\sum_k\|u_{k+1}-u_k\|_V^2<\infty$; in particular, $u_{k+1}-u_k\to0$ in $V$.

Proof. By Assumption 1, $\Phi_\varepsilon$ is bounded from below uniformly in $\varepsilon$. Summation of the inequality of Lemma 7.3 implies the claim.
In order to be able to pass to the limit in (7.4), we need the following result.
Lemma 7.7. Let $(L_k,u_k)$ be a sequence generated by Algorithm 1. Let $\bar u$ be the weak limit of a subsequence $(u_{k_n})$ in $V$. Then
$$2\int_\Omega\psi_{\varepsilon_{k_n}}'(u_{k_n}^2)u_{k_n}u_{k_n+1}\,\mathrm dx\to p\int_\Omega|\bar u|^p\,\mathrm dx.$$

Proof. Due to Corollary 7.6, $u_{k_n+1}\rightharpoonup\bar u$ in $V$. After possibly extracting a subsequence if necessary, we can assume that $u_{k_n}$ and $u_{k_n+1}$ converge to $\bar u$ almost everywhere, and there is $w\in L^2(\Omega)$ such that $|u_{k_n}|,|u_{k_n+1}|\le w$ almost everywhere. Arguing as in the proof of Lemma 5.5, we have
$$2\int_\Omega\psi_{\varepsilon_{k_n}}'(u_{k_n}^2)u_{k_n}^2\,\mathrm dx=\langle G_{\varepsilon_{k_n}}'(u_{k_n}),u_{k_n}\rangle_{V^*,V}\to p\int_\Omega|\bar u|^p\,\mathrm dx,$$
and the mixed term is handled analogously.
Theorem 7.8. Let $(L_k,u_k)$ be a sequence generated by Algorithm 1, and let $\bar u$ be the weak limit of a subsequence $(u_{k_n})$ in $V$. Then there is $\bar\lambda\in V^*$ such that (7.5) holds.

Proof. Let us define $\lambda_k\in V^*$ by
$$\langle\lambda_k,v\rangle_{V^*,V}:=2\int_\Omega\psi_{\varepsilon_k}'(u_k^2)u_{k+1}v\,\mathrm dx\quad\text{for }v\in V.$$
Due to the boundedness properties of Lemma 7.4 and Corollary 7.5, it follows that $(\lambda_k)$ is bounded in $V^*$. Let $u_{k_n}\rightharpoonup\bar u$ in $V$. After extraction of a subsequence if necessary, we can assume $\lambda_{k_n}\rightharpoonup\bar\lambda$ in $V^*$. Then we can pass to the limit along the subsequence in (7.4) to obtain (7.5). Here, we used the complete continuity of $F'$ and Corollary 7.6.
The system satisfied by limits of the iteration in Theorem 7.8 is clearly weaker than the system provided by Theorem 5.7. This is due to the fact that we cannot expect strong convergence of the iterates of the method, which would be necessary to pass to the limit $\langle\lambda_{k_n},u_{k_n+1}\rangle_{V^*,V}\to\langle\bar\lambda,\bar u\rangle_{V^*,V}$.