On an inverse potential problem for a fractional reaction–diffusion equation

This paper considers an inverse problem for a reaction–diffusion equation from overposed final-time data. Specifically, we assume that the reaction term f(u) is known but modified by a space-dependent coefficient q(x) to obtain q(x)f(u); thus the strength of the reaction can vary with location. The inverse problem is to recover this coefficient. Our technique is to use iterative Newton-type methods, although we also use and analyse higher-order schemes of Halley type. We show that such schemes are well defined and prove convergence results. Our assumption about the diffusion process is also more general in that we extend the traditional parabolic equation paradigm to include the subdiffusion case based on non-local fractional-order operators in time. The final section of the paper shows numerical reconstructions based on the above methods and compares our methodology to previous work based on the linear model with f(u) = u as well as to the nonlinear case. We also show the interdependence between effective reconstruction of q and the coupling between the value of the final time of measurement and the subdiffusion parameter.


Introduction
Undetermined coefficient problems for diffusion processes have a long history due to their inherent applicability in a wide range of settings. In the simplest setting of a one-dimensional parabolic equation c u_t − (a u_x)_x + q u = 0, where we take the language of heat conduction, c is the product of specific heat and mass density, a is the thermal conductivity, and qu a linear forcing function, q being referred to as a potential in this situation. Each of the coefficients a, c, q can depend on the spatial as well as the temporal variable. Transformations also exist for the one-dimensional case that essentially allow one to convert the above equation into the canonical form u_t − u_xx + q(x, t)u = 0. However, the boundary conditions are by necessity also modified by this change of variables, [2], and this severely lessens its utility.
In many applications these coefficients cannot be considered to be known. Indeed, one of the first undetermined coefficient problems for partial differential equations was to determine the time-dependent conductivity a(t) from additional boundary measurements, [8]. The bibliography in the 1984 book on the one dimensional heat equation, [1], shows that already an extensive literature had developed.
Our interest in this paper will be in the determination of the potential term q = q(x). The setting is a bounded, simply connected domain Ω ⊂ R^d with smooth (C^2) boundary ∂Ω. However, for reasons to be explained, the analysis will be restricted to dimensions d ∈ {1, 2, 3}. L is a uniformly elliptic second-order partial differential operator defined in Ω with sufficiently smooth coefficients and subject to boundary values on ∂Ω that, for simplicity, we take to be of impedance type. Since we are assuming that only the potential term is to be determined, we can take L = −Δ + q, although the more general case when the other coefficients are known can be treated identically.
We are also interested in the more general problem of allowing an anomalous diffusion mechanism that would lead to a non-local, time-fractional equation D_t^α u − Δu + q(x)u = 0. In addition, we may also incorporate a nonlinear reaction term of the form q(x)f(u), where f is known, so that our model becomes D_t^α u − Δu + q(x)f(u) = 0. Of course this requires assumptions on the nonlinear function f(u) even for existence of a bounded solution up to a given time T. This can be guaranteed under various conditions such as f > 0 or f uniformly Lipschitz; see [7] for the setting without a spatially varying potential q(x). Such reaction–diffusion equations are ubiquitous throughout the sciences, occurring as basic components of chemical reactions, of biological systems where they lead to pattern formation, and of physical processes such as combustion. However, it is not the purpose or scope of this paper to pursue these application avenues beyond what is essential for the inverse problem of recovering the spatial dependence term q(x), with the nonlinearity appearing as a carrier for the underlying physical problem. In short, we assume a very standard physical model except that the strength of the nonlinear reaction term will vary throughout the region Ω, and it is this spatial dependence that we seek to recover.
In the linear case, f (u) = u, the inverse problem for the recovery of q(x) in (4) has a long history in the literature. Of course one must give additional, 'over-posed conditions' in order to obtain the recovery of q; the notation referring to the fact that if q were in fact known, these conditions would be unnecessary for the direct solution-and in fact would be almost surely incompatible. We mention two such types of data.
In the case of one space variable and the interval [0, 1], one measures, say, the flux g(t) = u_x(0, t) on one lateral boundary when Dirichlet conditions are imposed there, or the Dirichlet values u(0, t) when the boundary condition at x = 0 is of impedance type. A Fourier series argument shows that the solution to (1) can be written in the form u(x, t) = Σ_k ⟨u_0, φ_k⟩ e^{−λ_k t} φ_k(x), where {λ_k, φ_k} are the eigenvalues and L^2-normalized eigenfunctions of the operator L_q u := −u_xx + q(x)u on [0, 1] subject to the prescribed boundary conditions. The overposed data now leads to an equation of the form g(t) = Σ_k ⟨u_0, φ_k⟩ φ_k(0) e^{−λ_k t} (in the impedance case). From this Dirichlet series one can take the Laplace transform in t and obtain a meromorphic function whose poles are at z = −λ_k and whose residues there determine φ_k(0). If there were two different q functions, then these would have to give rise to operators L_{q_1} and L_{q_2} with identical {λ_k, φ_k(0)} pairs; that is, the same spectrum and end-point constants. It then follows from standard results in inverse Sturm–Liouville theory that the potentials must agree. Uniqueness is therefore a relatively straightforward argument, but the severe ill-conditioning of the problem by this means is immediately apparent from the above analytic-continuation argument.
Once the spectral values are obtained, the recovery of q is in theory straightforward in the sense that recovery of the potential from these amounts to an effective one-derivative loss; that is, we have the estimate ‖q_1 − q_2‖_{L^2} ≤ C (Σ_k (λ_{k,1} − λ_{k,2})^2)^{1/2}. However, this obscures the fact that there is more to the story. On the interval [0, 1] and with Dirichlet boundary conditions we have λ_n = n^2 π^2 + ∫_0^1 q dx + ε_n, where {ε_n} ∈ ℓ^2 and with more rapid decay the smoother the potential q; see [18]. Thus the information about q is contained in a small remainder term ε_n that is masked by the rapidly growing leading term of the asymptotics. In a practical sense this effect far outweighs the mild ill-conditioning noted above. We can easily see that for n = 10 and a smooth q the masking term is of order 10^3 and can be at least one or two orders of magnitude greater still than the signal contained in ε_n. Another way to view this is that the contribution of the term qu against the term −u_xx to the measured data can be extremely small.
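To make the masking effect concrete, the following sketch (a hypothetical finite-difference experiment, not taken from the paper) computes the first Dirichlet eigenvalues of −u″ + qu on [0, 1] for a smooth test potential and for q = 0. The difference λ_n(q) − λ_n(0) hovers near ∫_0^1 q dx while the leading term n²π² grows rapidly, so the q-dependent signal is a tiny fraction of each measured eigenvalue.

```python
import numpy as np

def dirichlet_eigs(q_func, N=1000, n_eigs=12):
    """Eigenvalues of -u'' + q u on [0,1] with Dirichlet BCs, second-order FD."""
    h = 1.0 / N
    x = np.linspace(h, 1 - h, N - 1)              # interior nodes
    A = (np.diag(2.0 / h**2 + q_func(x))
         + np.diag(-np.ones(N - 2) / h**2, 1)
         + np.diag(-np.ones(N - 2) / h**2, -1))
    return np.linalg.eigvalsh(A)[:n_eigs], x

q = lambda s: 1.0 + np.sin(np.pi * s)             # hypothetical smooth potential
lam_q, x = dirichlet_eigs(q)
lam_0, _ = dirichlet_eigs(lambda s: np.zeros_like(s))

qbar = 1.0 + 2.0 / np.pi                          # exact value of int_0^1 q dx
for n in (1, 5, 10):
    leading = n**2 * np.pi**2                     # masking term n^2 pi^2
    signal = lam_q[n - 1] - lam_0[n - 1]          # ~ int_0^1 q + eps_n
    print(n, round(leading, 1), round(signal, 4))
```

For n = 10 the leading term is already near 10^3 while the signal stays of order one, matching the discussion above.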
In [19], uniqueness for the fractional derivative case, 0 < α ≤ 1, was shown, and the reconstruction question was tackled by a more direct approach for which some of the difficulties noted above were minimized. An alternative overposed condition is to prescribe later-time data at some fixed T > 0 instead of data on the lateral boundary: u(x, T) = g(x) for x ∈ Ω. There is again a long history for this version of the problem, see for example [5, 17], which gave conditions for a unique recovery of q as well as reconstruction algorithms in the parabolic case. The differential operator was evaluated on the overposed data to obtain the fixed point equation q(x) = (Δg(x) − u_t(x, T; q))/g(x). This was extended to the fractional case in [25]. Under given assumptions this scheme can be shown to converge to a unique fixed point. However, convergence is often slow in comparison to the alternative iterative methods that we shall develop and analyze in this paper.
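A minimal numerical sketch of this classical fixed-point iteration in the parabolic case follows. It uses a one-dimensional toy problem with homogeneous Neumann boundary conditions (so that u, and hence the denominator g, stays bounded away from zero); the grid, initial state, and potential are illustrative assumptions, and the synthetic data g are generated with the same discretization used for inversion.

```python
import numpy as np

N, M, T = 50, 200, 0.5                    # grid points, time steps, final time
h, dt = 1.0 / (N - 1), T / M
x = np.linspace(0.0, 1.0, N)

def neumann_laplacian():
    L = np.zeros((N, N))
    for i in range(1, N - 1):
        L[i, i - 1:i + 2] = [1.0, -2.0, 1.0]
    L[0, 0], L[0, 1] = -2.0, 2.0          # ghost-point Neumann rows
    L[-1, -1], L[-1, -2] = -2.0, 2.0
    return L / h**2

Lap = neumann_laplacian()

def solve_forward(q):
    """Implicit Euler for u_t - u_xx + q u = 0; returns u(.,T) and u_t(.,T)."""
    u = 1.0 + 0.5 * np.cos(np.pi * x)     # positive initial state
    A = np.eye(N) - dt * (Lap - np.diag(q))
    for _ in range(M):
        u_prev, u = u, np.linalg.solve(A, u)
    return u, (u - u_prev) / dt

q_true = 1.0 + 0.5 * np.sin(np.pi * x)
g, _ = solve_forward(q_true)              # synthetic final-time data

q = np.ones(N)                            # constant reference guess
for _ in range(40):
    _, ut = solve_forward(q)
    q = (Lap @ g - ut) / g                # q(x) = (Delta g - u_t(x,T;q)) / g
```

With these mild parameters the iteration contracts toward q_true, but the lowest spatial mode converges only at a rate roughly 1 − q̄T per sweep, illustrating the slow convergence noted above.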
With the overposed data (7) the problem appears at first glance much more tractable than in the case of flux data on the lateral boundary. For q ∈ C^α(Ω) the solution u(x, T) lies in C^{2+α}, and so the forward problem exhibits only a two-derivative smoothing. In this case the inverse problem would only have ill-conditioning equivalent to a two-derivative loss. In addition, the term q(x)u(x, t; q) is only mildly nonlinear, and so these two factors might indicate a relatively straightforward inversion. However, this overlooks the value of the constant coupling the respective norms of q and u(x, T) and, as we shall see, this plays a considerable role.
As in the lateral flux data case there are subtle difficulties, and a main contribution of the current paper is to overcome them as much as the measurement data allows. One difficulty remains the same under both lateral and final data: the effect of the operator −Δ is far greater than that of the potential term (unless there is a significant contribution from a nonlinearity such as in equation (4)). One way to view this, in analogy to the spectral case, is to compare the spectrum for −Δ + q with that for −Δ + q_0 for some known reference potential q_0. Likewise, here we may assume that under certain experimental conditions it is possible to view the data error as a deviation from the data that would have been obtained for such a reference potential. We will say more on this topic in the numerical reconstructions section.
We will take a different approach to the problem by computing the linearized derivative of the map q → u(x, T; q) and seeking to reconstruct q by a Newton-type iterative method. As we will show, this approach is flexible enough to handle the more complex case of weaker assumptions on the potential q (in fact q ∈ L^2(Ω), as opposed to the Hölder continuity required in previous works) and also the nonlinear term in (4). In fact, we will also consider and analyze an accelerated version using a higher-order Halley scheme, [4]. We shall show that these schemes are well defined by showing that the associated derivative maps are injective, and we provide a convergence analysis for both Newton and Halley methods.
The outline of the paper is as follows. The next section introduces the requisite background material for the subdiffusion operator, including the definition of the operator type as well as representation theorems for the solutions of the direct problems needed in the inversion formulation. Section 3 shows the injectivity of the linearized derivative, which is important for justifying the use of the Newton-type iteration schemes that form the core of this paper. Section 4 then proves convergence results for both the Newton and Halley methods. The final section 5 shows the array of iteration schemes in practice and brings out, from a numerical standpoint, the topics discussed throughout the paper.

Background results
Classical (Brownian) diffusion can be viewed in the random walk paradigm where at each time step of length δt a random direction is chosen and the particle moves a distance δx. Fixed values of both δt and δx lead to the parabolic equation u_t − KΔu = 0, where K is the coupling constant between the space and time variables (specifically, it arises in the limit as both δx → 0 and δt → 0 with the ratio (δx)^2/δt held constant, K being proportional to this ratio). The associated fundamental solution is a Gaussian. More generally, if the time interval is chosen from a probability density λ(t) and the jump length from another probability density µ(x), then provided the former has a finite mean and the latter finite first and second moments, the long-term behaviour again converges to a Gaussian fundamental solution (this is merely a restatement of the central limit theorem). Thus to obtain a fundamentally different diffusion paradigm one has to weaken the assumptions on one of λ(t) or µ(x). If we retain the finite first and second moments of µ(x) but drop the assumption of λ(t) having a finite mean, say by taking λ(t) ∼ 1/t^{1+α} for 0 < α < 1, then we obtain the so-called subdiffusion model. Unlike Brownian motion, for which the expected value of the variance obeys ⟨x^2⟩ ∝ t, we now obtain ⟨x^2⟩ ∝ t^α. We no longer have an integer-order derivative in time but one of fractional order.
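The subdiffusive scaling can be checked by direct simulation. The sketch below (an illustrative Monte Carlo experiment, not from the paper) runs a continuous-time random walk with Pareto-type waiting times of tail exponent α = 1/2 (infinite mean) and unit-variance jumps, and fits the growth exponent of the mean squared displacement, which should be near α rather than 1.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.5                                # heavy-tail exponent, 0 < alpha < 1

def msd_at(T, n_walkers=20000):
    """Mean squared displacement of a CTRW observed at time T."""
    pos = np.zeros(n_walkers)
    t = np.zeros(n_walkers)
    alive = np.arange(n_walkers)
    while alive.size:
        # Pareto-type waiting time w >= 1 with density ~ w^{-(1+alpha)}
        w = rng.random(alive.size) ** (-1.0 / alpha)
        t[alive] += w
        movers = alive[t[alive] <= T]      # jump only if the wait ends by T
        pos[movers] += rng.normal(size=movers.size)
        alive = movers
    return np.mean(pos**2)

times = np.array([10.0, 100.0, 1000.0])
msd = np.array([msd_at(T) for T in times])
slope = np.log(msd[-1] / msd[0]) / np.log(times[-1] / times[0])
print(msd, slope)
```

The fitted slope lands well below 1, consistent with ⟨x²⟩ ∝ t^α for α = 1/2, up to Monte Carlo and finite-time corrections.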
These arise from the Abel fractional integral operator I_a^α f(x) = (1/Γ(α)) ∫_a^x f(s)/(x − s)^{1−α} ds and come in two types depending on the order of integration and differentiation: the Riemann–Liouville version, (d/dx) I_a^{1−α} f, and the Djrbashian–Caputo version, I_a^{1−α} f′. Since we wish to include an initial condition, the latter is a better model, as the Riemann–Liouville fractional derivative of a constant is singular at the origin. Thus the Djrbashian–Caputo derivative of order α is defined to be D_t^α u(t) = (1/Γ(1 − α)) ∫_a^t (t − s)^{−α} u′(s) ds. Since this is a non-local operator, the designation of the starting point t = a is essential; in our case we shall take a = 0. The exponential is the key function for ordinary differential equations in that the solution of u_t + λu = 0 with u(0) = 1 is given by u(t) = e^{−λt}. The corresponding function for the fractional-order derivative is the Mittag-Leffler function. We will need several critical properties of this function, which we state in the next subsection.

The Mittag-Leffler function
The Mittag-Leffler function, [13], is defined by E_{α,β}(z) = Σ_{k=0}^∞ z^k/Γ(αk + β). It is an entire function of order 1/α and type 1. For the general properties of this function we suggest the references [12, 22]. The initial value problem for the fractional ordinary differential equation D_t^α u + λu = 0 for t > 0 and 0 < α < 1, with u(0) = 1, has solution u(t) = E_{α,1}(−λt^α).
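For moderate arguments the defining series can be evaluated directly; the following minimal sketch (not a production evaluator, and numerically unreliable for large negative arguments where the alternating series cancels catastrophically) uses lgamma to avoid overflow in the Gamma factors.

```python
import math

def mittag_leffler(z, alpha, beta=1.0, tol=1e-15, kmax=200):
    """Truncated series for E_{alpha,beta}(z); adequate for moderate |z| only."""
    s = 0.0
    for k in range(kmax):
        # z^k / Gamma(alpha*k + beta), via lgamma to avoid overflow
        term = z**k * math.exp(-math.lgamma(alpha * k + beta))
        s += term
        if abs(term) < tol * max(1.0, abs(s)):
            break
    return s

print(mittag_leffler(1.0, 1.0))        # E_{1,1}(1) = e
print(mittag_leffler(-2.0, 0.5))       # much larger than e^{-t} at t^alpha = 2
```

Sanity checks: E_{1,1}(z) = e^z, E_{2,1}(−x) = cos(√x), and E_{1/2,1}(−x) = e^{x²} erfc(x); the last identity also exhibits the slow (non-exponential) decay on the negative axis discussed below.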
Many of the results for parabolic equations carry over to fractional-order operators, although there are significant exceptions. The solution of the non-homogeneous parabolic equation u_t + Lu = Q on Ω × [0, ∞) with u(x, 0) = u_0 can be expanded in the normalized eigenfunctions {φ_j} of the elliptic operator L on Ω. The fractional counterpart, when u_t is replaced by D_t^α, is, see [6], u(x, t) = Σ_j [E_{α,1}(−λ_j t^α) ⟨u_0, φ_j⟩ + ∫_0^t (t − s)^{α−1} E_{α,α}(−λ_j (t − s)^α) ⟨Q(·, s), φ_j⟩ ds] φ_j(x), where the evolution operator E(t), given by E(t)u_0 = Σ_j E_{α,1}(−λ_j t^α) ⟨u_0, φ_j⟩ φ_j, is the sequel to the semigroup operator e^{−tL} u_0 of the parabolic case.
One of the most dramatic differences between the Mittag-Leffler function E_{α,β}(z) and the exponential function E_{1,1}(z) = e^z, and for our purposes the most interesting and important property, is their asymptotic behaviour on the real axis. For z real and positive the growth of E_{α,β}(z) as z → ∞ is of order e^{z^{1/α}}, that is, super-exponential. However, for z real and negative we have E_{α,β}(z) ∼ 1/(Γ(β − α)|z|) as z → −∞, so that there is only a linear decay for large argument.
The effect of this is to change dramatically the decay rate of the solution to equation (3) as opposed to (1). This will manifest itself in the effective stability of inversion for q between the parabolic and fractional diffusion cases. This effect was used to advantage in [9] to design a regularization scheme based on fractional operators for the classical, extremely ill-posed backwards heat problem. However, as was also demonstrated, there are in fact two sides to this story. While the asymptotic behaviour of E α,β (−t α ) for large t is one aspect of the difference between the fractional and classical diffusion operators, the existence of the time argument in the form t α also has consequences for the conditioning differences for small values of t.
Another key fact is the complete monotonicity of E_{α,β}(−x) for x > 0. This was first shown by Pollard [15] for β = 1 and later extended by Schneider to 0 < α ≤ 1 and β ≥ α.
Lemma 2.1. For any α ∈ (0, 1) and β ≥ α, the function x → E_{α,β}(−x) is completely monotone on the positive real axis.
We will also need standard special function recursion and derivative formulae, and in particular the following identity (lemma 2.2), which can be proven directly from the power series representation and holds on the positive real axis: (d/dt) E_{α,1}(−λt^α) = −λ t^{α−1} E_{α,α}(−λt^α).
The following lemma quantifies these facts by means of some essential estimates.
Lemma 2.3. For any α ∈ (0, 1) and λ > 0, the estimates (13)–(15) hold; in particular, 0 ≤ E_{α,1}(−x) ≤ 1/(1 + x/Γ(1 + α)) for all x ≥ 0. Moreover, for any α ∈ (1/2, 1] and any θ ∈ (0, 2 − 1/α), there exists C_α such that the corresponding bound holds for all λ > 0.
B Kaltenbacher and W Rundell, Inverse Problems 35 (2019) 065004
Proof. The estimate (13) is easily shown for fixed α and x < 1/Γ(1 − α) by the fact that the series representation in this range is an alternating series with the absolute values of the terms tending monotonically to zero; hence the sum E_{α,1}(−x) is dominated by its first term. The uniform estimate valid for all positive x is more complex, but a proof based on probability arguments and using the complete monotonicity of the Mittag-Leffler function can be found in [23, theorem 4].
To obtain (15), we use the abbreviation e(t) := E_{α,1}(−λt^α) together with the derivative identity of lemma 2.2.
An important consequence of (13), and of a corresponding estimate from below, is the Sakamoto–Yamamoto stability estimate [21] for the solution of the fractional diffusion equation, i.e. of (3) with q ≡ 0, which was the basis for the regularization approach developed in [9]. More generally, the stability estimate on the solution operator E(t) defined in (11) is ‖E(t)u_0‖_{H^p(Ω)} ≤ c t^{−α(p−q)/2} ‖u_0‖_{H^q(Ω)} for u_0 ∈ H^q(Ω), where 0 ≤ q ≤ 2 and q ≤ p ≤ q + 2, and the constant c depends only on α and p − q. A corresponding estimate holds, in addition, for any source term Q ∈ H^q(Ω).

Filtering the data
As we will prove in section 4, the inverse problem under consideration is well-posed when considered with L^2(Ω) as parameter space and H^2(Ω) as data space. Since a realistic setting requires a norm on the data space that does not contain derivatives, in order to make use of this well-posedness we need to smooth the practically given data g^δ, which will typically lie only in L^2(Ω), so that the filtered version g̃^δ is in H^2(Ω).
For this purpose, we use Tikhonov, Landweber or TSVD with the discrepancy principle for choosing the regularization parameter.
Here δ denotes the L^2 noise level in the observed data; see (62). The operator L is supposed to be an isomorphism between L^2(Ω) and H^2(Ω), e.g. (based on the results in section 4 below) Lu = L_{q_0} u = −Δu + q_0 u, or L = F′(q_0) for some known reference potential q_0.
Referring to the problem of imbalance between the Δu and the qu terms in the equation, we actually also apply the filtering to difference data, i.e. to g^δ − u_0(·, T), where u_0 is the solution corresponding to a reference potential q_0.
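A minimal sketch of such an H^2 filter is Tikhonov regularization with a second-derivative penalty and the discrepancy principle, i.e. minimizing ‖g̃ − g^δ‖² + γ‖g̃″‖² and shrinking γ until the residual matches the noise level. The grid, test data, and noise model below are hypothetical; the paper's own implementation may use Landweber or TSVD instead.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
x = np.linspace(0.0, 1.0, N)
h = x[1] - x[0]
g_true = np.sin(np.pi * x) + 0.3 * np.sin(3 * np.pi * x)
delta = 0.01
g_noisy = g_true + delta * rng.standard_normal(N)

# second-difference matrix acting on interior points
D2 = np.zeros((N - 2, N))
for i in range(N - 2):
    D2[i, i:i + 3] = [1.0, -2.0, 1.0]
D2 /= h**2

def filter_tikhonov(g, gamma):
    """Minimizer of ||g_f - g||^2 + gamma ||D2 g_f||^2."""
    return np.linalg.solve(np.eye(N) + gamma * (D2.T @ D2), g)

# discrepancy principle: decrease gamma until the residual ~ noise level
noise_norm = delta * np.sqrt(N)            # discrete stand-in for the L2 level
gamma = 1.0
while gamma > 1e-12:
    g_f = filter_tikhonov(g_noisy, gamma)
    if np.linalg.norm(g_f - g_noisy) <= 1.1 * noise_norm:
        break
    gamma *= 0.5
```

The filtered g_f has a discrete second derivative of moderate size (a proxy for H² membership) while staying within the noise level of the measured data.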

Injectivity of the linearized derivative
Suppose we have u_0 = Σ_j a_j φ_j(x), where {φ_j} are the normalized eigenfunctions of −Δ + q, so that the direct solution in the sequel to (5) is u(x, t) = Σ_j a_j E_{α,1}(−λ_j t^α) φ_j(x). The Fréchet derivative û in the direction δq is given by the equation D_t^α û − Δû + q û = −δq u, with homogeneous initial and boundary conditions, and its solution is evaluated at t = T. The solution to (17) is easily computed from the representation (11) as û(x, T) = −Σ_j [∫_0^T (T − s)^{α−1} E_{α,α}(−λ_j (T − s)^α) ⟨δq u(·, s), φ_j⟩ ds] φ_j(x). Now we suppose that û(x, T) = 0 and we wish to show that this implies δq = 0.
In the case α = 1, suppose that u_0 ≥ 0 and u_0 ≢ 0. Then from the maximum principle for the parabolic equation we see that u(x, τ) > 0 for all 0 < τ ≤ T, and so trivially h_n(x) > 0 in this case. If α < 1, then such a strong maximum principle is unknown in this sharp form, but from [11] it follows that for each x, u(x, τ) ≥ 0 and the number of possible zeros of u(x, ·) is finite. Hence, again the integral average must be positive, and so h_n(x) is also positive in the subdiffusion case. In particular, for a given u_0(x) there is a δ > 0 such that h_n ≥ δ > 0 for each n. From the standard estimates of the Mittag-Leffler function it also follows that {h_n(x)} is uniformly bounded in n.
It then follows that ⟨δq, ϕ_n⟩ = 0 for each n, and we will have our conclusion if we can show that {ϕ_n} is complete in L^2(Ω). Note that since û satisfies homogeneous initial and boundary conditions, we can re-scale (17) by multiplying through by any constant and in particular absorb such a constant into the term h_n. Thus without loss of generality we may assume that 0 < δ ≤ h_n ≤ µ < 1 for all n.
We have now shown the following.
Theorem 3.1. Suppose that the initial condition u_0 in (1) is nonnegative and nontrivial, and u_0, q ∈ L^2(Ω). Then the only solution to (17) with û(·, T) = 0 is δq = 0.
The term involving f(u) introduces several complications. First, of course, it requires greater regularity of the reaction term f, but the sign of f is also important, as without such a condition the solution of (22) may blow up in finite time even in the parabolic case. However, if we restrict the derivative map to be taken around q = 0, as is done in the so-called 'frozen Newton' iterations, then we see that equations (17) and (22) differ only in that the right-hand side is Q = f(u)δq in the latter case. Under appropriate assumptions on f it is possible to use the maximum principle to show that the solution u of the direct problem is of one sign in Ω × [0, T], and this then also holds for f(u). Under these conditions the proof of theorem 3.1 can be extended to show the following.

Theorem 3.2. Given the conditions of theorem 3.1 and appropriate sign and smoothness conditions on f, the only solution to (22) with û(·, T) = 0 is δq = 0.
Proof. We apply the maximum principle [11, theorem 1.1], together with remark 1.2 there, to the solution u of the direct problem. Note that the maximum principle just needs −q f′(u) ≤ 0; however, f(u) should not vanish, since it appears as a multiplier of δq in (22). The smoothness imposed on f makes sure that f′(u) on the left-hand side of (22) is in L^∞. □

Convergence of Newton's and Halley's method for the inverse potential problem
The plan of this section is first of all to formulate the inverse problem with L^2(Ω) as parameter space and H^2(Ω) as data space, to prove that it is well posed in this setting, and from this to conclude convergence of Newton's and Halley's method for its solution.
We first of all do so for the linear model (23). In section 4.6 we will discuss the extension of the results to the semilinear equation (4) under appropriate assumptions on the (given) function f. To this end, we show in section 4.1 that the forward operator F, mapping the sought-for q to the given data g = u(·, T), where u solves the linear (sub)diffusion equation (23), is well defined; i.e. for any q in an L^2 neighborhood of some initial guess, the linear (sub)diffusion equation has a unique solution u that is regular enough to allow for final values u(·, T) ∈ H^2(Ω).
Based on the injectivity result of theorem 3.1, we next establish local well-posedness of the linearized inverse problem in this setting. On one hand this implies local well-posedness of the nonlinear inverse problem F(q) = g, due to the inverse function theorem. On the other hand, together with some further smoothness properties of F and F′, it will allow us to invoke known results on Newton's and Halley's method to conclude their convergence, including rates, which we will do in section 4.3.
Since a realistic setting requires a norm on the data space that does not contain derivatives, we refer to our discussion in section 2.2 of ways of filtering the given data g^δ ∈ L^2(Ω) such that the filtered version g̃^δ is in H^2(Ω). Based on this, we will also consider inversion from noisy data and investigate the propagation of noise through the algorithms in section 4.5.
Throughout this section we will assume that the space dimension satisfies d ∈ {1, 2, 3}, since we will repeatedly make use of the fact that the product of an L^2(Ω) function and an H^2(Ω) function is again in L^2(Ω), which is not valid in space dimensions four and higher. Moreover, for simplicity of notation, we will drop the identifier Ω in the norms on L^p(Ω) and H^s(Ω), and abbreviate the norm in the Bochner space L^2(0, T; L^p(Ω)) by ‖·‖_{L^2(L^p)}.

The forward operator
The inverse problem under consideration can be written as an operator equation F(q) = g. We would like to use the space X = L^2(Ω). As the domain of F we therefore use the set D(F) defined in (25), which involves the H^2(Ω) → L^∞(Ω) embedding constants; see [10].
Note that D(F) has nonempty interior in L 2 (Ω).
We start with a preliminary result on the elliptic operator L_q := −Δ + q· induced by an L^2(Ω) potential q ∈ D(F).
Besides selfadjointness and H^1 ellipticity, for q in D(F) as in (25) we also have, by a fixed point argument, H^2 regularity in the sense that (L_q)^{−1} : L^2(Ω) → H^2(Ω) is bounded. This can be seen as follows.
For arbitrary q ∈ L^∞(Ω; R^+) and f ∈ L^2(Ω), we have L_q u = f if and only if u satisfies a fixed point equation relative to the resolvent of a constant-coefficient operator; the contraction condition for the latter is satisfied if q ∈ D(F), defined as in (25) with µ > 0 sufficiently small, uniformly with respect to any constant function q̄ ≡ const ∈ L^∞(Ω; R^+) and any f ∈ L^2(Ω). As a consequence, the inverse operator (L_q)^{−1} : L^2(Ω) → L^2(Ω) is selfadjoint and compact, and therefore L_q has a complete orthonormal system of eigenfunctions φ_j^q and a corresponding sequence of eigenvalues λ_j^q tending to infinity. Moreover, for the spaces H^s_q defined by means of these eigenpairs, we have that H^s_q and H^s coincide with equivalent norms as long as s ≤ 2, where the constants in the equivalence estimates are uniform for q in an L^2 neighborhood B^{L^2}_µ(q̄) of some positive constant potential q̄. Indeed this is actually true for any q ∈ L^2(Ω) and s ≤ 2. As soon as s > 2, higher smoothness of q is needed, as can be seen from an example which makes sense only for q ∈ H^2(Ω). By means of separation of variables, lemma 4.1 allows us to prove well-posedness of the forward problem (23), and in particular well-definedness of the forward operator F as a mapping from L^2(Ω) to H^2(Ω).
Proposition 4.1. For any q ∈ D(F) according to (25) and any u_0 ∈ L^2(Ω), the initial boundary value problem (23) has a unique solution u ∈ C^∞((0, T]; H^2(Ω)) ∩ C([0, T]; L^2(Ω)), and for any t > 0 the estimate ‖u(t)‖_{H^2(Ω)} ≤ C t^{−α} ‖u_0‖_{L^2(Ω)} holds for some constant C depending only on q (but not on t, α, or u_0), which can be chosen uniformly for all q in an L^2(Ω) neighborhood of some constant q̄.
As a consequence, the operator F defined by (24) is well defined as a mapping from D(F) ⊆ L 2 (Ω) into H 2 (Ω), for any u 0 ∈ L 2 (Ω).
Proof. Due to lemma 4.1, the solution u of (23) can be written by means of the separation of variables formula u(x, t) = Σ_j E_{α,1}(−λ_j^q t^α) ⟨u_0, φ_j^q⟩ φ_j^q(x) in terms of the Fourier coefficients with respect to the basis φ_j^q; hence u ∈ C^∞((0, T]; H^2(Ω)), and using the estimate (13) we obtain the asserted bound for any t ∈ (0, T]. □
The Fréchet derivative of the forward operator defined by (24) is given by F′(q)p = v(·, T), where v solves the sensitivity equation (28), D_t^α v − Δv + q v = −p u, with homogeneous initial and impedance boundary conditions, and u is the solution to (23).
Proof. The linearized inverse problem of recovering p in (28) from the data v(·, T) can be rewritten as a fixed point equation (29) in L^2(Ω) with an operator A, where v solves (28).
We will prove that A is compact, due to the fact that p → D_t^α v(·, T) even maps into H^{2ε}(Ω) for some ε > 0, together with compactness of the embedding H^{2ε}(Ω) → L^2(Ω) (note that we have assumed boundedness of Ω). This can be seen as follows.
Using the identity [20, equation (11)], where we have used integration by parts and the fact that E_{α,1}(0) = 1, as well as the derivative identity of lemma 2.2, we get, for an arbitrary ε ∈ (0, 1/4), an estimate in which (13) and (14) have been used. The two terms on the right-hand side of (32) can be estimated with exponents r** = 2r*/(r* − 2) and r*, where r* = ∞ in case d = 1, r* < ∞ in case d = 2, and r* ≤ 2d/(d − 2 + 4ε) in case d ≥ 3, so that we can continue (32) accordingly. For establishing the required L^2(0, T; L^{r**}(Ω)) regularity of ∂_t u, we just use the solution formula, which we differentiate with respect to t (see (31)), where, according to the requirements, ε > 0, θ < 2 − 1/α and s depends on d − 2 + 4ε. Boundedness away from zero of u(·, T) implies boundedness of the multiplication operator w → w/u(·, T) as an operator from L^2(Ω) into itself. Thus altogether we get compactness of the operator A which, by means of the injectivity result theorem 3.1 and Fredholm's alternative applied to the fixed point equation (29), yields invertibility of F′(q) : L^2(Ω) → H^2(Ω). Since the second Riesz theorem implies closedness of the range of F′(q), its inverse is bounded. □ Note that the sign of q is of no influence here, as can be seen from the fact that the change of variables v(x, t) = e^{ct} u(x, t) leaves the initial data unchanged, while transforming the equation to one with potential q(x) + c. One can argue similarly in the subdiffusion case. However, q ∈ L^∞(Ω) appears to be essential.
At this point, dependence of the stability constant on T or α comes into play: Larger T or α closer to one implies that the values of u at final time, by which we divide, are smaller. This is clearly visible in the numerical results of section 5, see figure 1 there.

Convergence of Newton's and Halley's method
We now consider the following derivative based iterative reconstruction methods.
Newton's method: q_{k+1} = q_k − F′(q_k)^{−1}(F(q_k) − g); frozen Newton: q_{k+1} = q_k − F′(q_0)^{−1}(F(q_k) − g); Halley's method: the predictor–corrector pair q_{k+1/2} = q_k − F′(q_k)^{−1}(F(q_k) − g), q_{k+1} = q_k − (F′(q_k) + ½F″(q_k)(q_{k+1/2} − q_k))^{−1}(F(q_k) − g). Well-posedness of the linearized problem (a) has already been established in the previous section, so it remains to prove (b) and (c). Lipschitz continuity of F″ could in principle be proven as well. However, it would need higher than L^2 regularity of q, so it would not be applicable in a full Halley scheme, where the iterates q_k can only be expected to be in L^2(Ω); and in a frozen Halley scheme, where q_0 might be chosen to have higher regularity, Lipschitz continuity of F″ cannot be shown to enhance the convergence speed (see section 4.4 below), so proving it would not pay off.
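A scalar toy illustration of the three iterations may be useful before the operator-theoretic analysis. The equation F(q) = g below, with F(q) = q e^{−q}, is a hypothetical stand-in (loosely mimicking the damping of the reaction strength by the decaying state), not the paper's forward operator.

```python
import math

def F(q):   return q * math.exp(-q)
def dF(q):  return (1.0 - q) * math.exp(-q)     # F'
def d2F(q): return (q - 2.0) * math.exp(-q)     # F''

q_true = 0.4
g = F(q_true)
q0 = 0.1                                        # initial guess

def newton(q, steps):
    for _ in range(steps):
        q = q - (F(q) - g) / dF(q)
    return q

def frozen_newton(q, steps):
    d = dF(q0)                                  # derivative frozen at q0
    for _ in range(steps):
        q = q - (F(q) - g) / d
    return q

def halley(q, steps):
    for _ in range(steps):
        r = F(q) - g
        p = -r / dF(q)                          # Newton predictor step
        q = q - r / (dF(q) + 0.5 * d2F(q) * p)  # Halley corrector
    return q
```

Newton and Halley converge superlinearly (quadratically and cubically, respectively), while frozen Newton converges only linearly, with factor roughly |1 − F′(q†)/F′(q_0)|, mirroring the trade-offs discussed in this section.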

Lemma 4.2. For any q ∈ D(F), the bound ‖F′(q)p‖_{H^2(Ω)} ≤ C_1 ‖p‖_{L^2(Ω)} holds for a constant C_1 > 0 and any p ∈ L^2(Ω).
Proof. We use the solution formula for (28), the Cauchy–Schwarz inequality, and an estimate similar to (33) with r* = r/(r − 1) and r** = 2r/(2 − r) = 2r*/(r* − 2); here ‖∂_t u‖_{L^2(L^{r**})} can be estimated as in (37), using (35) and (15). This is exactly the same situation as in the proof of (37) (setting ε = 0 there) and hence allows us to choose θ ≤ s in space dimensions three and less, which by equivalence of the H^2_q and the H^2 norms yields ‖∂_t u‖_{L^2(L^{r**})} ≤ C ‖u_0‖_{H^2}. (45) Moreover, ‖u(·, T)‖_{L^∞} ≤ C_{H^2_q → H^2} C^Ω_{H^2 → L^∞} ‖L_q u(·, T)‖_{L^2}, which using (35) we can further estimate. □
In the following, v and ṽ solve (28) and (46), respectively, both with homogeneous initial and impedance boundary conditions.
Proof. Employing the solution formula for (46) we get, analogously to (43), and using the estimate (14), a bound whose last term on the right-hand side can be estimated according to (42). For the first term on the right-hand side of (48), observe that, with an estimate similar to (33), and using ‖a * b‖_{L^2(0,T)} ≤ ‖a‖_{L^1(0,T)} ‖b‖_{L^2(0,T)} for the second term and (15) for the first term, we can compute the convolution kernel explicitly and arrive, analogously to (50), at a bound in which, with ρ** = 2ρ*/(ρ* − 2), we can further estimate the ∂_t u term analogously to (44). The embedding requirements above lead to conditions which can be satisfied by setting σ = 0 in case d = 2 and σ > 0 (but arbitrarily small) in case d = 3. Moreover, we can set σ = 2, ρ* = 1 in both cases d = 2, 3. Thus, choosing θ ∈ (0, 2 − 1/α) for α ∈ (1/2, 1) allows us to obtain σ − θ = 0 in both cases d = 2 and d = 3. Altogether we have the assertion. □
Based on these results we can state the following on convergence of Newton's and Halley's method as well as their respective frozen versions. The requirement that q_0 or q_k be in L^∞(Ω) results from the use of a maximum principle in order to guarantee that u(x, t; q_0) (or u(x, t; q_k)) takes its minimum at time t = 0. See remark 4.1.
In the case of the frozen versions of Newton or Halley, this is only a matter of choosing the starting value; in the respective full versions it can be achieved by projecting q k into an interval pointwise after each step. Obviously this projection will not increase the iteration error if the values of the exact solution q are in this interval as well, so the convergence results remain valid.

Convergence of frozen Halley
Consider a nonlinear operator equation F(q) = g (52), which we assume to have a (not necessarily unique) solution q†, and where F : D(F)(⊆ X) → Y maps from a subset of a Banach space X to a normed linear space Y and satisfies, for some initial guess q_0 sufficiently close to q†, the smoothness conditions (53), of which the last one may be skipped, as we will see later on. The norms will here be used without subscripts denoting the spaces, since it should be clear from the context whether we refer to the norm on X or on Y. Sufficient for the first and the second of these conditions are, e.g., bounded invertibility of F′(q_0) and boundedness of F″ on B_ρ(q_0). The last one would additionally require Lipschitz continuity of F″ at q_0, but we will not go into further detail about this, since it does not yield an improvement of the convergence rate, as we will see later on.
One step of Halley's method, frozen at q_0, is defined by the predictor–corrector pair q_{k+1/2} = q_k − F′(q_0)^{−1}(F(q_k) − g), q_{k+1} = q_k − (F′(q_0) + ½F″(q_0)(q_{k+1/2} − q_k))^{−1}(F(q_k) − g).

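The display defining the step is missing from this copy. For reference, one common Banach-space form of a Halley step with both derivatives frozen at q 0 , written via the frozen Newton predictor p k , reads as follows; this is a generic reconstruction, and the paper's precise variant (involving the operator (I + BD k ) −1 used in the proof below) may differ in detail:

```latex
p_k = -F'(q_0)^{-1} F(q_k), \qquad
q_{k+1} = q_k + \Bigl(I - \tfrac{1}{2}\, F'(q_0)^{-1} F''(q_0)[p_k,\cdot\,]\Bigr)^{-1} p_k .
```

Here F''(q_0)[p_k, ·] denotes the bilinear second derivative with one slot filled, so the bracketed expression is a linear operator on X.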
Using the abbreviations
we can write the recursion for the error e k = q k − q † as where for any f ∈ X * (with the dual pairing f , · X * ,X ) and the fundamental theorem of calculus, applied to the real function φ f : (we have used the substitution t = 1 − r and the simple identities 1 = where the terms in braces can be estimated by Choosing f := e * k+1 as a normalizing functional (according to the Hahn-Banach theorem) of e * k+1 ∈ X * such that e * k+1 , e k+1 X * ,X = e k+1 and e * k+1 = 1, and using the triangle inequality and the substitution σ = 1 − τ , we get (57) Here we have estimated the norm of (I + BD k ) −1 by using the fact that similarly to the above, (Note that replacing q 0 by q k in (57) would give the cubic error estimate of Halley's method.) This estimate on one hand shows that convergence can only be shown to be linear, even if F is Lipschitz. On the other hand, by the usual reasoning, assuming that q 0 − q † < ρ with ρ sufficiently small and applying an inductive argument, we can conclude that there exists η ∈ (0, 1) such that for all k ∈ N, we have q k − q † < ρ hence q k − q 0 < 2ρ and , which is smaller than one for ρ sufficiently small. Linear convergence remains valid when dropping the Lipschitz continuity assumption on F and just assuming for all q ∈ B ρ (q 0 ), p,p ∈ X, since then we can still estimate As compared to the error estimate for the (frozen Newton) predictor, see (56), it seems that the frozen Halley therefore cannot be shown to give an improvement (not even with respect to the size of the convergence factor), although the numerical results consistently exhibit better convergence of Halley as compared to Newton. Altogether we have proven the following result (which in the above sense is sharp). (53) and (58) whose domain D(F) has nonempty interior in X, and any solution q † of (52) there exists ρ > 0 such that for any starting value q 0 with q 0 − q † ρ the iterates defined by (39) or (41) converge linearly to q † .

Proposition 4.3. For F satisfying
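The linear (rather than cubic) convergence of the frozen schemes, and the empirical edge of Halley over Newton, can already be seen in a scalar toy problem. The following sketch is purely illustrative; the test function f(x) = x² − 2 and the starting point are our own choices, not from the paper:

```python
def frozen_newton(f, df0, x0, iters):
    # x_{k+1} = x_k - f(x_k)/f'(x_0): the derivative is frozen at the initial guess
    x, xs = x0, [x0]
    for _ in range(iters):
        x = x - f(x) / df0
        xs.append(x)
    return xs

def frozen_halley(f, df0, d2f0, x0, iters):
    # scalar Halley step with both derivatives frozen at x_0
    x, xs = x0, [x0]
    for _ in range(iters):
        fx = f(x)
        x = x - 2 * fx * df0 / (2 * df0**2 - fx * d2f0)
        xs.append(x)
    return xs
```

Running both on f(x) = x² − 2 from x0 = 1.5 (so f'(x0) = 3, f''(x0) = 2), each iteration contracts the error by a fixed factor, and the frozen Halley iterates are consistently closer to √2 at every step, mirroring the numerical behaviour reported for the operator setting.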
When dealing with noisy data g δ = g + δg, ‖δg‖ ≤ δ, in place of g, the error recursion changes accordingly, which modifies the estimate as follows. By induction this yields that, provided ‖q 0 − q †‖ < ρ, we get ‖q k − q †‖ < ρ for all k ∈ N. Here s is chosen such that, for some r ∈ [1, ∞] and r * = r/(r − 1), r ** = 2r/(2 − r) = 2r * /(r * − 2), the embeddings H 1 (Ω) → L r * (Ω) and H s (Ω) → L r ** (Ω) are continuous, i.e. s ≥ d/r * ≥ (d − 2)/2. Here ū solves the homogeneous linear initial value problem; well-posedness of this problem follows from proposition 4.1.
To prove that T is a contraction, we estimate the X norm of the difference û := T v − T ṽ of the images in terms of the difference v̂ := v − ṽ of the arguments, using the fact that û satisfies an equation of the same form with D α t . This allows us to represent û (analogously to (43)) as follows, where we have used the fact that v̂(·, 0) = 0. Thus we get, analogously to the estimate after (43), the first part of the bound. For the second part of the X norm of û we estimate ∂ t û analogously to (51) (setting ρ = r).
The expressions for the derivatives of F change as follows.
with homogeneous initial and impedance boundary conditions, where u is a solution to (63) and (68), and v, ṽ solve (67) and its counterpart, respectively, both with homogeneous initial and impedance boundary conditions. Also here we work with fixed point formulations that replace qf (u) on the left hand side by cqu and add corresponding terms q(c − f'(u))v, q(c − f'(u))w on the respective right hand sides. The ‖·‖ C(0,T;H 2 (Ω)) and ‖∂ t ·‖ L 2 (0,T;H s (Ω)) norms of these terms can be estimated analogously to the above; it only remains to estimate the additional term qf''(u)v ṽ, see (49) and (50), which we do as follows, where the terms on the right hand side can be further estimated according to lemma 4.2 as well as (50). So again (64) with c 1 and c 2 sufficiently small allows us to prove contractivity of the respective fixed point operators on B X R (0) and thus to carry over all results from section 4.3 to the semilinear setting.

Reconstructions
In this section we show some reconstructions of the potential function q(x) and illustrate many of the features discussed in connection with the iteration schemes presented. In particular we will bring out the dependence on α (and, by necessity, the coupling of this to the value of T). Previous work, for example [9, 19], has shown a definite pattern in the ill-conditioning and hence in the ability to reconstruct effectively from final time data. In terms of the condition number of the inverse problem there is an interplay between the time T and the fractional exponent, and we shall see this effect quite clearly in our ability to recover q.
For numerical reconstruction we require solvers for the direct problem (3) and its nonlinear counterpart (4). We used two different types to be described below.
The first of these was based on the spectral decomposition for the equation D α t u − ∆u = Q in Ω × (0, T) subject to the initial condition u(x, 0) = u 0 (x), using the representation from equation (11). The eigenfunctions φ j for −∆ on Ω have to be computed, and this is straightforward in R or for special geometries in R d . The quadrature schemes for the integrals in (11) have to take into account the known order of the singularities. To allow for a term q(x)u one can instead use the eigenfunctions for the operator −∆ + q on Ω. Again, in R these are straightforward computations using eigenvalue/eigenfunction solvers from Sturm–Liouville theory. For the case of the term q(x) f (u) one must use iteration, as now Q = Q(x, t, u), and this proceeds in a standard way. Convergence tends to be quite rapid as the time interval over which the integral has to be evaluated, (0, T], is small. A second type is based on a finite difference scheme. In the case α = 1 this was taken to be the Crank–Nicolson algorithm, but for α < 1 and Ω ⊂ R, a time stepping method based on the Abel integral for the Djrbashian–Caputo derivative was employed. This is unconditionally stable, second order accurate in space but first order in time, with the constant depending on α. As α → 1 the scheme recovers the usual fully implicit method for the parabolic equation. For the case of Ω ⊂ R d with d > 1 and nontrivial geometry, a more sophisticated finite element solver should be used in conjunction with a fractional time stepping method. For more information on such a solver for the nonlinear qf (u) case, see [7].
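As an illustration of the second solver type, here is a minimal sketch of a time stepper for D α t u − u xx + q(x)u = 0 based on the standard L1 approximation of the Djrbashian–Caputo derivative (one common realization of such a scheme, first order in time as stated above; not the authors' code). For simplicity it uses homogeneous Dirichlet rather than impedance boundary conditions; for α = 1 it reduces exactly to the fully implicit (backward Euler) method:

```python
import numpy as np
from math import gamma

def l1_subdiffusion(u0, q, alpha, T, nt):
    """March D^alpha_t u - u_xx + q(x) u = 0 on (0,1) with zero Dirichlet data,
    using the L1 discretization of the Caputo derivative; returns u(., T)."""
    N = len(u0)
    h = 1.0 / (N + 1)
    dt = T / nt
    # stiffness matrix for -u_xx on the interior grid points
    A = (np.diag(2 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
         - np.diag(np.ones(N - 1), -1)) / h**2
    c0 = 1.0 / (gamma(2 - alpha) * dt**alpha)
    # L1 weights b_j = (j+1)^{1-alpha} - j^{1-alpha}
    b = np.arange(1, nt + 1)**(1 - alpha) - np.arange(nt)**(1 - alpha)
    M = c0 * np.eye(N) + A + np.diag(q)
    us = [u0.copy()]
    for n in range(1, nt + 1):
        # history (memory) term of the non-local fractional derivative
        hist = sum(b[j] * (us[n - j] - us[n - j - 1]) for j in range(1, n))
        rhs = c0 * (us[-1] - hist)
        us.append(np.linalg.solve(M, rhs))
    return us[-1]
```

The implicit treatment of the diffusion and potential terms makes the step unconditionally stable, while the growing convolution sum reflects the memory of the non-local operator; the cost per step therefore grows linearly in the step index.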
In our reconstructions we tended to use one solver for computing the data g given q(x) and the other as the direct solver in the Newton/Halley iteration scheme used to recover q. In the cases where the same solver was used, different grid parameters were chosen (although by adding random noise to the final data g we automatically avoided an inverse crime).
In figure 1 we show the reconstructions of q(x) at effective numerical convergence for different T and α values for the linear model (3) using the frozen Newton scheme. The actual potential taken was q act (x) = sin(πx) + 2e −2x sin(3πx) and the initial data was u 0 (x) = 20x(1 − x)e −2x . Throughout q 0 = 0 was used as an initial guess and as a reference potential.
Since the equation here was the linear model, effective numerical convergence was achieved within two or three iterations for T = 0.1 but slightly more were required with increased T and noise level δ for the fractional case and considerably more for the parabolic equation (1). The reconstruction shown for T = 1 and δ = 1% was for the fourth iteration in the case of α = 0.9 but the 30th for α = 1, although in the latter case the tenth iteration was already close to the one shown. Of course this would have changed considerably if we had instead used a Halley method.
In the top row plots the noise level was taken to be 0.1% and in the bottom row it was increased to 1%. In each case we show the outcome for T values of 0.1, 0.5 and 1. Two values of α are shown: the fractional case with α = 0.9 and the parabolic with α = 1. The reconstruction for α = 0.9 is in dashed lines while that for α = 1 is in dotted lines.
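A stripped-down version of the frozen Newton reconstruction described above can be sketched as follows. This is illustrative only: we take α = 1 with a plain implicit Euler stepper and Dirichlet boundary conditions, form the Jacobian by finite differences at the reference potential q 0 = 0, and the regularization parameter lam and iteration counts are ad hoc choices; q act and u 0 are the functions quoted in the text:

```python
import numpy as np

def forward(q, u0, T, nt, A):
    """Implicit Euler solve of u_t - u_xx + q(x) u = 0; returns u(., T; q)."""
    dt = T / nt
    M = np.eye(len(u0)) / dt + A + np.diag(q)
    u = u0.copy()
    for _ in range(nt):
        u = np.linalg.solve(M, u / dt)
    return u

N = 15
h = 1.0 / (N + 1)
x = np.linspace(h, 1 - h, N)
A = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2
u0 = 20 * x * (1 - x) * np.exp(-2 * x)                 # initial data from the text
q_act = np.sin(np.pi * x) + 2 * np.exp(-2 * x) * np.sin(3 * np.pi * x)
T, nt = 0.1, 20
g = forward(q_act, u0, T, nt, A)                       # synthetic final-time data

# Jacobian of q -> u(., T; q), frozen at q0 = 0, column by column
q0 = np.zeros(N)
F0 = forward(q0, u0, T, nt, A)
eps = 1e-6
J = np.zeros((N, N))
for j in range(N):
    qp = q0.copy(); qp[j] += eps
    J[:, j] = (forward(qp, u0, T, nt, A) - F0) / eps

# frozen Newton steps with mild Tikhonov regularization of the normal
# equations, to cope with the smoothing (ill-conditioning) of the map
q = q0.copy()
lam = 1e-6 * np.trace(J.T @ J)
for _ in range(5):
    r = g - forward(q, u0, T, nt, A)
    q = q + np.linalg.solve(J.T @ J + lam * np.eye(N), J.T @ r)
```

With noisy data one would stop the iteration early according to the noise level δ rather than running it to numerical convergence.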
It is worth noting here the comparison with the scheme in (8). For sufficiently small T this converges quickly; for example, with T = 0.1 in about twice the number of iterations of the frozen Newton scheme. As T increases the associated operator is no longer a contraction and the space has to be re-normed by a weight factor; this correspondingly increases the number of iterations required, so that the scheme quickly becomes non-competitive with those based on the Fréchet derivative.
These figures clearly show the expected degradation in resolution that is inevitable for the parabolic case with increasing values of T. On the other hand, for the fractional case there is very little loss, and the reconstructions hold for much larger T than shown. The situation is not at all strongly dependent on α, provided α is taken to be less than unity.
When the noise level in the measured data is increased from 0.1% to 1% we see the expected further loss of resolution in the reconstructions. It is noteworthy that in the fractional case the reconstructions, while poorer at the higher noise level, hold almost independently of T in the ranges considered. For the parabolic case the reconstructions are again poorer at the higher noise level, and by T = 1 the reconstruction is quite poor.
The above is exactly in line with the recovery of initial data in the backwards diffusion problem in [6,9] and indicates that a feasible regularizer for the parabolic problem is to use instead the fractional operator with α near to, but less than, one. In fact, a more sophisticated version is possible here along the lines discussed in [9].
We now show some numerical results for the nonlinear reaction–diffusion model, equation (4). The nonlinearity was taken to be f (u) = 10u 3 . In figure 2 we show the relative decrease in the L 2 norms of each iterate for all four methods. Since the equation, and the consequent inversion, now has an additional degree of nonlinearity, we should expect a slower convergence rate, and this is indeed the case. As expected, the full implementations converge faster than the ones frozen at the initial approximation (here q 0 = 0), and Halley outperforms Newton in terms of the number of iterations. Of course, the frozen schemes offer a considerable saving by not having to recompute the derivative map at each iteration, while the Halley scheme requires computing a corrector step in addition to the predictor used in Newton. Thus the actual running time of the algorithms is a considerably more complex question than mere iteration count and will depend strongly on the nonlinear term f (u), among other things.
In figure 3 we show the reconstructions after one, three and ten iterations of the frozen Newton scheme (leftmost graphic) and the full Halley scheme (rightmost graphic) for recovering q in (4) with f (u) = 10u 3 . As shown in figure 2, the steady state was not yet reached by the frozen Newton scheme (this took approximately 30 iterations), but the Halley method had essentially achieved steady state by iteration 8.
As a final remark, the issue of adding noise to the actual data g = u(x, T; q act ) or to g − u(x, T; q 0 ) = u(x, T; q act ) − u(x, T; q 0 ) for some reference potential q 0 , as noted in the introduction and section 2, deserves further comment. In all the reconstructions shown in this section we used the former approach. However, if the latter were taken, then even if the reference potential were quite far from the actual one, for example q 0 = 0, the imbalance between the u and the qu terms in the differential equation becomes much smaller. Under these conditions comparable reconstructions with noise levels of even five percent and higher are feasible.