A Low-Rank Approach to Off-The-Grid Sparse Deconvolution

We propose a new solver for the sparse spikes deconvolution problem over the space of Radon measures. A common approach to off-the-grid deconvolution considers semidefinite (SDP) relaxations of the total variation (the total mass of the absolute value of the measure) minimization problem. The direct resolution of this SDP is however intractable for large scale settings, since the problem size grows as $f_c^{2d}$ where $f_c$ is the cutoff frequency of the filter and $d$ the ambient dimension. Our first contribution introduces a penalized formulation of this semidefinite lifting, which has low-rank solutions. Our second contribution is a conditional gradient optimization scheme with non-convex updates. This algorithm leverages both the low-rank and the convolutive structure of the problem, resulting in an $O(f_c^d \log f_c)$ complexity per iteration. Numerical simulations are promising and show that the algorithm converges in exactly $r$ steps, $r$ being the number of Diracs composing the solution.


Sparse spikes deconvolution
Sparse super-resolution problems consist in recovering pointwise sources from low-resolution and possibly noisy measurements. Such issues arise naturally in fields like astronomical imaging [28], fluorescence microscopy [19,30] or seismic imaging [21], where it may be crucial to correct the physical blur introduced by sensing devices, due to diffraction or photonic noise for instance.
Formally, let $\delta_x$ be the Dirac measure at position $x \in \mathbb{T}^d$, where $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ is the torus. We aim at retrieving a $d$-dimensional discrete Radon measure $\mu_0 = \sum_{k=1}^r a_{0,k}\,\delta_{x_k}$ (with $a_0 \in \mathbb{C}^r$, $x_k \in \mathbb{T}^d$), given the measurements $y = \Phi\mu_0 + w = y_0 + w$, where $\Phi$ is some known convolution operator and $w$ some unknown noise.
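As a concrete illustration, this measurement model is easy to simulate; the following is a hypothetical one-dimensional sketch (positions, amplitudes and noise level are arbitrary choices of ours, not values from the paper):

```python
import numpy as np

# Hypothetical sketch: simulate y = Phi(mu_0) + w in dimension d = 1,
# where Phi collects the trigonometric moments up to the cutoff frequency f_c.
rng = np.random.default_rng(0)
fc, r = 10, 3                       # cutoff frequency, number of spikes
x0 = np.array([0.1, 0.45, 0.8])     # spike positions on the torus T = R/Z
a0 = np.array([1.0, -0.5, 0.7])     # spike amplitudes (real here for simplicity)
k = np.arange(-fc, fc + 1)          # observed frequencies, n = 2*fc + 1 of them

# (Phi mu_0)_k = sum_j a_j exp(-2i*pi*k*x_j)   (ideal low-pass filter, c_k = 1)
y0 = np.exp(-2j * np.pi * np.outer(k, x0)) @ a0
w = 1e-3 * (rng.standard_normal(y0.shape) + 1j * rng.standard_normal(y0.shape))
y = y0 + w
```

The zero-frequency moment is simply the total mass of the measure, which gives a quick sanity check on the simulation.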

Beurling LASSO
Although this problem is severely ill-posed, sparse estimates can be found by solving the following optimization program, known as the BLASSO [6],
$$(P_\lambda)\qquad \operatorname{argmin}_{\mu \in \mathcal{M}(\mathbb{T}^d)} \ \frac{1}{2}\|y - \Phi\mu\|^2 + \lambda\,|\mu|(\mathbb{T}^d),$$
where the total variation norm $|\mu|(\mathbb{T}^d)$ (defined below) naturally extends the $\ell^1$-norm of finite-dimensional vectors to the continuous setting of measures, favoring in particular the emergence of spikes in the solution. This problem might have non-unique solutions.
The main asset here is that no assumption is made about the support of the measure, contrarily to early superresolution models where the latter was assumed to live on some discrete grid [9,10]. This grid-free approach has been at the core of several recent works [2,4,7], and has offered beneficial mathematical insight on the problem, leading to a better understanding of the impact of minimal separation distance [33] or of the signs of the spikes [4,6], lacking in first superresolution works, and to sharp criteria for stable spikes recovery [8,13].
However, the infinite-dimensionality of (P λ ) poses a major numerical challenge, and off-the-grid superresolution, although analytically better behaved, remains far more difficult to perform in practice. This work introduces a new method to numerically solve this optimization problem in a scalable way.

Related works
Conditional gradient approach In the case of a generic operator Φ, [2] propose to iteratively estimate the support of the initial measure using a conditional gradient (also known as Frank-Wolfe) scheme, which basically consists in the greedy stepwise addition of new spikes. To remedy the slow convergence rate of the method, non-convex corrective updates are added as a final step [2,1]. This method achieves state-of-the-art results in several applications, among which fluorescence microscopy [1].
Semidefinite approaches Another important class of methods consists in introducing convex semidefinite relaxations of (P λ ), which enables the use of tools from semidefinite programming. SDP approaches for the total variation minimization problem were originally introduced in [4,34], for unidimensional measures (i.e. d = 1) only however. The multivariate case on the other hand raises a far more challenging theoretical problem, which is solved using the so-called Lasserre's hierarchy [24], consisting in a sequence of increasingly better semidefinite approximations of (P λ ) or its dual [7,11].
The numerical interest of semidefinite programs lies in the fact that they are well studied and benefit from efficient solvers; however, these quickly become intractable for large matrices. This is a major obstacle for the method, since the matrices involved in the semidefinite relaxation of (P λ ) are typically of size $f_c^d \times f_c^d$, where $f_c$ is the cutoff frequency of the low-pass filter Φ. Up to now, semidefinite relaxations have essentially been successful for combinatorial problems like the MaxCut problem [16], in which the number of variables is high and the degree of the polynomials involved in the constraints is low (it is equal to 2 in the MaxCut case). Imaging problems on the contrary typically involve polynomials with a low number of variables (essentially the same as the dimension d), but high degrees ($f_c$), and to our knowledge, no efficient way of solving the hierarchy in those settings has been proposed yet.
Non-variational approaches Although we focus here on $\ell^1$-regularization techniques, there is also a vast literature on non-convex or non-variational superresolution schemes. Many of these methods derive from the idea proposed by Prony to encode the positions of the spikes as zeros of some polynomial, see [22] for a review. This is the case for MUSIC [32], ESPRIT [29], or finite rate of innovation [37], among others. They remain difficult to extend to multivariate settings, see for instance [23].

Contributions
We propose a novel algorithm for solving the BLASSO, able to cope with large problem sizes, i.e. with large cutoff frequency $f_c$. We target cases where Φ is a convolution operator.
Our first contribution introduces a semidefinite lifting of (P λ ), which admits solutions whose rank is bounded by the number of spikes in the solution of (P λ ). We prove this result for d = 1, and give numerical evidence that it also holds when d > 1.
Our second and main contribution is a conditional gradient algorithm for this lifting, which requires only $O(f_c^d \log f_c)$ elementary computations per iteration, making it scalable for superresolution problems in dimension greater than one. Numerical simulations suggest an $r$-step convergence, $r$ being the number of spikes in the solution of the BLASSO.
Roadmap We introduce our semidefinite approximation in Sections 3 and 4. The low-rank property is proved in Section 3. Sensitivity analysis is presented in Section 4. Our algorithm is detailed in Section 5. Finally, Section 6 presents our numerical results.

Notations
Spaces We consider measures defined over the torus T d = (R/Z) d . Note that in practice, in order to avoid periodization artifacts, one can use zero-padding or symmetrization at the boundary of the signal and still consider a periodic convolution setting. We denote by M(T d ) the space of bounded Radon measures on T d , endowed with its weak-* topology. It is the topological dual of the space C(T d ) of continuous functions on T d .

Linear operator
We focus on the case where the measurements $y$ consist in a low-pass filtering of the initial measure. In our periodic framework, measures are uniquely determined by their Fourier moments, and convolution with some kernel boils down to the product of these moments with the Fourier coefficients of the kernel. For a filter of cutoff frequency $f_c$, the measurements are thus simply a finite collection of weighted trigonometric moments of the initial measure. We define the linear operator $\Phi : \mathcal{M}(\mathbb{T}^d) \to \mathbb{C}^{n^d}$ by
$$(\Phi\mu)_k = c_k \int_{\mathbb{T}^d} e^{-2i\pi \langle k, x\rangle}\,\mathrm{d}\mu(x), \qquad k \in \llbracket -f_c, f_c \rrbracket^d,$$
where $f_c$ is the cutoff frequency of the filter and $(c_k)$ its transfer function. Its adjoint $\Phi^* : \mathbb{C}^{n^d} \to \mathcal{C}(\mathbb{T}^d)$ is given by $\Phi^* p = \sum_k \bar c_k\, p_k\, e^{2i\pi\langle k, \cdot\rangle}$. In the following, we call $n = 2f_c + 1$ the number of observations in each dimension. This model encompasses filters which have compact support in the Fourier domain. Note that, without loss of generality, since one can always renormalize the observations by setting $\tilde y_k = y_k / c_k$ for all $k$, we restrict our attention in the rest of the paper to the ideal low-pass filter, i.e. $c_k = 1$ for all $k$; $\Phi$ is then simply the Fourier operator.
Note that while using a periodic boundary condition might seem an issue, one can always reduce to this setting provided that the filtering kernel has bounded support in the Fourier domain, at the price of possibly extending the domain size to avoid periodization issues (e.g. with padding).
Trigonometric polynomials We call multivariate Laurent polynomial of degree $l$ a function of the form $P(z) = \sum_{k \in \llbracket -l, l\rrbracket^d} p_k z^k$, defined for $z \in \mathbb{C}^d$ (with the multi-index notation $z^k = z_1^{k_1}\cdots z_d^{k_d}$). Note that the exponents may be negative. When $p_{-k} = \bar p_k$, $P$ is said to be Hermitian; in particular, it is real-valued on the unit circle, and satisfies $P(z) = \overline{P(1/\bar z)}$. When $\Phi$ is a low-pass filter, $\Phi^* p$ is a trigonometric polynomial defined on $\mathbb{T}^d$. In the following, we call $P_p$ the Laurent polynomial associated to $\Phi^* p$, simply defined as $P_p(z) = \sum_{k \in \llbracket -f_c, f_c\rrbracket^d} p_k z^k$ for $z \in \mathbb{C}^d$, so that $P_p(e^{2i\pi x}) = \Phi^* p(x)$ for all $x \in \mathbb{T}^d$. Remark 1. Throughout the paper, we write $\bar z$ to designate the conjugate of $z$, $V^h$ the conjugate transpose of a vector or a matrix $V$, and $\Phi^*$ the Hermitian adjoint of an operator $\Phi$.
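The correspondence between Hermitian coefficients and real-valued trigonometric polynomials is easy to check numerically; here is a small sketch (with arbitrary coefficients of our choosing, d = 1):

```python
import numpy as np

# Hypothetical sketch (d = 1): a Hermitian coefficient vector p
# (p_{-k} = conj(p_k)) yields a real-valued polynomial Phi^* p on the torus.
fc = 5
k = np.arange(-fc, fc + 1)
rng = np.random.default_rng(1)
half = rng.standard_normal(fc) + 1j * rng.standard_normal(fc)
p = np.concatenate([np.conj(half[::-1]), [0.3], half])   # Hermitian symmetry

def eta(x, p=p, k=k):
    """Evaluate Phi^* p at x in T, i.e. P_p(exp(2i*pi*x))."""
    return np.exp(2j * np.pi * k * x) @ p

val = eta(0.37)
# terms pair up as p_k e^{2i*pi*k*x} + conj(p_k) e^{-2i*pi*k*x}, hence real
```

This pairing of conjugate terms is exactly why the dual certificates below can be treated as real-valued functions on the torus.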

Total variation norm
The total variation norm of a measure is defined as
$$|\mu|(\mathbb{T}^d) = \sup\left\{ \operatorname{Re}\int_{\mathbb{T}^d} \bar\eta\,\mathrm{d}\mu \;;\; \eta \in \mathcal{C}(\mathbb{T}^d),\ \|\eta\|_\infty \le 1 \right\}.$$
It is the natural extension of the discrete $\ell^1$-norm to the space of Radon measures: indeed, if for instance $\mu$ is some discrete measure $\mu = \mu_{a,x} \stackrel{\text{def.}}{=} \sum_{k=1}^r a_k \delta_{x_k}$, then $|\mu_{a,x}|(\mathbb{T}^d) = \|a\|_1$.
It can be proved that the subdifferential of the total variation norm reads
$$\partial|\mu|(\mathbb{T}^d) = \left\{ \eta \in \mathcal{C}(\mathbb{T}^d) \;;\; \|\eta\|_\infty \le 1 \ \text{and}\ \operatorname{Re}\int_{\mathbb{T}^d} \bar\eta\,\mathrm{d}\mu = |\mu|(\mathbb{T}^d) \right\}.$$
In particular, for a discrete measure $\mu = \mu_{a,x}$, any $\eta \in \partial|\mu|(\mathbb{T}^d)$ satisfies $\eta(x_k) = \frac{a_k}{|a_k|}$ for $k = 1, \ldots, r$.

Primal and Dual problems
Dual problem The dual of the BLASSO reads [35,12]
$$(D_\lambda)\qquad \operatorname{argmax}_{\|\Phi^* p\|_\infty \le 1} \ \langle y, p\rangle - \frac{\lambda}{2}\|p\|^2.$$
Here the computational challenge comes from the continuum of constraints, which requires a trigonometric polynomial to be bounded by one in magnitude.
Dual certificate Strong duality holds between $(P_\lambda)$ and $(D_\lambda)$ [2, Proposition 2], and the optimality relations between any pair of solutions $(\mu_\lambda, p_\lambda)$ read
$$\Phi^* p_\lambda \in \partial|\mu_\lambda|(\mathbb{T}^d) \quad\text{and}\quad p_\lambda = \frac{1}{\lambda}\,(y - \Phi\mu_\lambda).$$
The trigonometric polynomial $\eta_\lambda = \Phi^* p_\lambda$ is called the dual certificate [3]. The associated complex polynomial $P_{p_\lambda}$ is Hermitian. We now recall some important basic results about the primal and dual solutions, with brief proofs so that the paper remains self-contained.
The dual polynomial, bounded by one in magnitude, thus interpolates the signs of the spikes on the support of $\mu_\lambda$, see Fig. 1. One can then recover the measure $\mu_\lambda$ using the following result: Proposition 2. Let $\Sigma_\lambda = \{x \in \mathbb{T}^d \,;\, |\eta_\lambda(x)| = 1\}$, and suppose that $\Phi_{\Sigma_\lambda}$, the restriction of $\Phi$ to measures supported on $\Sigma_\lambda$, is injective. Then $(P_\lambda)$ has a unique solution $\mu_\lambda$, whose amplitudes are given by the linear system
$$\Phi_{\Sigma_\lambda} a = y - \lambda p_\lambda. \qquad (2)$$
Proof. Let $\mu_\lambda$ and $\mu'_\lambda$ be solutions of $(P_\lambda)$. Using Proposition 1, we may write $\mu_\lambda = \sum_{s \in \Sigma_\lambda} a_s \delta_s$ and $\mu'_\lambda = \sum_{s \in \Sigma_\lambda} a'_s \delta_s$. Thus $\Phi_{\Sigma_\lambda} a = \Phi\mu_\lambda = y - \lambda p_\lambda = \Phi\mu'_\lambda = \Phi_{\Sigma_\lambda} a'$, which implies $a = a'$, and therefore $\mu_\lambda = \mu'_\lambda$. Moreover, the previous relation immediately yields (2).

Support recovery
In order to determine the support of a solution $\mu_\lambda$ of $(P_\lambda)$ from a dual certificate, one can therefore solve a polynomial equation: Proposition 3. Suppose that the hypotheses of Proposition 2 hold, and let $\tilde P_\lambda = 1 - |P_{p_\lambda}|^2$. Then
$$\operatorname{Supp}\mu_\lambda \subset \frac{1}{2\pi}\arg\left\{ z \in \mathbb{C}^d \;;\; \tilde P_\lambda(z) = 0 \ \text{and}\ |z_i| = 1,\ i = 1, \ldots, d \right\}.$$
Proof. From Proposition 2, we know that $\operatorname{Supp}\mu_\lambda \subset \Sigma_\lambda$. Then $\Sigma_\lambda = \frac{1}{2\pi}\arg\{ z \in \mathbb{C}^d \,;\, \tilde P_\lambda(z) = 0 \ \text{and}\ |z_i| = 1,\ i = 1, \ldots, d\}$, which yields the result.
Remark 2. Should the inclusion above ever be strict, solving the linear system (2) giving the amplitudes leads to null amplitudes for the points located in $\Sigma_\lambda \setminus \operatorname{Supp}\mu_\lambda$, provided $\Phi_{\Sigma_\lambda}$ is injective.
Remark 3. Finding the support of $\mu_\lambda$ thus amounts to finding the roots of $\tilde P_\lambda$ that are located on the unit circle. For multivariate polynomials, this root-finding problem is very difficult. We discuss our numerical approach further on, see Section 6.
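For d = 1, the root-finding strategy of Proposition 3 can be sketched as follows. The certificate below is a toy normalized Dirichlet kernel (an assumption for illustration, not an actual dual solution of $(D_\lambda)$), chosen so that $|\Phi^* p|$ saturates exactly at one known location:

```python
import numpy as np

fc = 6
x_true = 0.25
k = np.arange(-fc, fc + 1)
# Toy certificate: normalized Dirichlet kernel centered at x_true,
# so that |Phi^* p| <= 1 with equality exactly at x = x_true.
p = np.exp(-2j * np.pi * k * x_true) / (2 * fc + 1)

# Laurent coefficients of |P_p(z)|^2 = P_p(z) * conj(P_p)(1/z) on the circle:
q = np.convolve(p, np.conj(p)[::-1])       # degrees -2*fc .. 2*fc
t = -q
t[2 * fc] += 1.0                           # P_tilde = 1 - |P_p|^2 (degree-0 term)

roots = np.roots(t[::-1])                  # roots of z^{2*fc} * P_tilde(z)
on_circle = roots[np.abs(np.abs(roots) - 1) < 1e-4]
xs = np.mod(np.angle(on_circle) / (2 * np.pi), 1.0)
```

The unit-modulus roots occur (with even multiplicity) exactly at the saturation points, here at `x_true`.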

Semidefinite hierarchies
An important feature of (P λ ) is that although it is defined over the space of Radon measures, only moments are actually involved. It would thus seem natural to reduce it to a problem over sequences of moments. But this raises the difficult question of finding conditions ensuring that a sequence coincides with the moments of some measure (such a sequence is said to admit a representing measure). This problem, known as the moment problem, has been extensively studied, and it is well known that such conditions exist for measures defined on semi-algebraic sets, i.e. sets described by polynomial inequalities.
The numerical assets of this theory have only recently been investigated by Lasserre, with Parrilo [27] making similar contributions independently. His pioneering work [24] introduces a generic method based on semidefinite programming, now known as Lasserre's hierarchy, for solving a large class of problems referred to as generalized moment problems [25], of which the BLASSO is an instance. The core idea of this method is that for measures defined on a semi-algebraic set K, one can derive a hierarchy of SDP-representable necessary conditions for any sequence to admit a representing measure on K, such that at each order of the hierarchy, the size of the semidefinite program increases as the corresponding condition sharpens. In other words, one can approximate to arbitrary precision the cone of sequences admitting a representing measure by either larger semidefinite cones (thus building semidefinite relaxations of the initial problem), or thinner ones (semidefinite strengthenings). We refer to [25] for a detailed presentation of Lasserre's hierarchy.
Semidefinite hierarchies as studied by Lasserre are concerned with measures defined on real algebraic sets. The case of trigonometric algebraic sets (i.e. described by trigonometric polynomial inequalities), of greater interest to us, is studied by Dumitrescu [11]. It can actually be cast into the real framework, since any trigonometric polynomial in $z = e^{2i\pi x}$ may be rewritten as a real polynomial in the variables $(c_j, s_j) = (\cos 2\pi x_j, \sin 2\pi x_j)$, $j = 1, \ldots, d$. We thus obtain an equivalent real description by adding the constraints $c_j^2 + s_j^2 = 1$, which indeed defines a real semialgebraic set (see [11] for further details).

Moment encoding
The semidefinite relaxation at order $m$ ($m \ge n$, with $n = 2f_c + 1$) of $(P_\lambda)$ reads:
$$(P^{(m)}_\lambda)\qquad \min_{z,\,Q,\,t} \ \frac{1}{2}\|y - z\|^2 + \frac{\lambda}{2}\Big(t + \frac{1}{m^d}\operatorname{tr} Q\Big) \quad\text{s.t.}\quad (a)\ R \stackrel{\text{def.}}{=} \begin{pmatrix} Q & z \\ z^h & t \end{pmatrix} \succeq 0, \qquad (b)\ Q = \sum_k q_k \Theta_k \ \text{for some sequence } (q_k),$$
where $\Theta_k = \theta_{k_1} \otimes \cdots \otimes \theta_{k_d}$. Here $\theta_{k_j}$ denotes the $m \times m$ Toeplitz matrix with ones on its $k_j$-th diagonal and zeros everywhere else, and $\otimes$ stands for the Kronecker product. For instance, with $2 \times 2$ matrices, for $d = 2$ and $k = (-1, 0)$, one has
$$\Theta_k = \theta_{-1} \otimes \theta_0 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$
The constraints (a) and (b) above encode a necessary condition for the sequence $z$ to be the moments of some measure on $\mathbb{T}^d$. We often refer to $R$ as a moment matrix. This particular formulation, introducing the block-structured matrix $R$, can be deduced from the bounded real lemma [11].
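The elementary matrices $\theta_k$ and their Kronecker products are straightforward to form; a minimal sketch (the helper `theta` is our own naming):

```python
import numpy as np

# Hypothetical sketch: elementary Toeplitz matrices theta_k (ones on the
# k-th diagonal) and their Kronecker products Theta_k = theta_{k_1} x theta_{k_2},
# which express the multilevel Toeplitz structure of the moment matrix.
def theta(k, m):
    """m x m matrix with ones on the k-th diagonal, zeros elsewhere."""
    return np.eye(m, k=k)

m = 2
Theta = np.kron(theta(-1, m), theta(0, m))   # d = 2, k = (-1, 0)
# Theta selects the entries whose row/column multi-indices differ by k
```

With `m = 2` this reproduces the 4 x 4 example above, with ones at positions (2, 0) and (3, 1).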
Remark 4. The dual point of view of this moment formulation consists in replacing the positivity constraint in $(D_\lambda)$ by a sufficient sum-of-squares condition, which is also SDP-representable. Thus, to the above primal relaxation corresponds a dual strengthening, which reads [11,38]
$$(D^{(m)}_\lambda)\qquad \max_{p,\, Q \succeq 0} \ \langle y, p\rangle - \frac{\lambda}{2}\|p\|^2 \quad\text{s.t.}\quad \begin{pmatrix} Q & p \\ p^h & 1 \end{pmatrix} \succeq 0, \qquad \langle Q, \Theta_k \rangle = \delta_{0,k} \ \text{for all } k.$$
We refer to [11] for further details on this problem.
Collapsing In [25], it is shown that the optimal value of the semidefinite program at level $m$ of the hierarchy converges to that of the initial problem as $m$ goes to infinity. However, in order to recover the correct support in our case, we obviously need the hierarchy to have finite convergence, in which case it is said to collapse. In dimension one, the semidefinite approximations of $(P_\lambda)$ and $(D_\lambda)$ are known to be tight for $m \ge 2f_c + 1$. This results from the Carathéodory-Toeplitz theorem [34] or the Fejér-Riesz theorem [11]. In dimension $d > 1$ however, there is no result on whether the relaxations converge finitely or not in general, except in the case $d = 2$, for which the hierarchy is known to collapse, but with no a priori information on the order $m$ at which this occurs, as mentioned in [7, Section 4]: more specifically, one may find instances where the order at which the hierarchy collapses is arbitrarily large. Still, although the theory remains rather pessimistic, numerical simulations tend to show that finite convergence almost always happens at low orders of relaxation. Remark 5. The hierarchy collapses at some order $m_0$ when $(P^{(m_0)}_\lambda)$ and $(P_\lambda)$ have the same optimal value and $(P_\lambda)$ admits an $r$-atomic solution that can be retrieved from the moment matrix $R^{(m_0)}_\lambda$; we refer to [7,5] for further details about how to detect this property.
From now on, we omit the superscript $(m)$ when referring to solutions of $(P^{(m)}_\lambda)$ and $(D^{(m)}_\lambda)$. Dual certificate As before, we define the dual certificate as $\Phi^* p_\lambda$, where $p_\lambda$ is the solution of $(D^{(m)}_\lambda)$. One actually does not need to solve the dual problem to recover this vector, thanks to the following result: Lemma 1. The vector $p_\lambda$ of coefficients of the dual polynomial is given by $p_\lambda = \frac{1}{\lambda}(y - z_\lambda)$, where $z_\lambda$ is a solution of $(P^{(m)}_\lambda)$. Proof. This is a direct consequence of the primal-dual relations at optimality.
The corresponding support can then be recovered via the aforementioned root-finding techniques, described in Section 6.
As it appears, the size of the above semidefinite program is $m^{2d}$, with $m \ge 2f_c + 1$. Therefore, usual interior point methods are limited when $d > 1$, or even when $d = 1$ for large values of $f_c$. In the rest of this paper, we introduce a method which scales well with the dimension $d$.

Low-rank structure
The unidimensional case We now state a fundamental low-rank property of the moment matrix involved in $(P^{(m)}_\lambda)$, which we then fully exploit in our algorithm, both in terms of memory and of computational efficiency: $(P^{(m)}_\lambda)$ admits a solution $R_\lambda$ whose rank is bounded by the number of spikes $r$ in a solution $\mu_\lambda$ of $(P_\lambda)$.
Proof. This proof is entirely based on the proof of Proposition 2.1 in [34]. Suppose a solution of $(P_\lambda)$ is given by $\mu_\lambda = \sum_{k=1}^r a_k \delta_{x_k}$, and write $a_k = |a_k| e^{i\varphi_k}$. Let $e(x, \varphi) \stackrel{\text{def.}}{=} (e^{2i\pi j x + i\varphi})_{-f_c \le j \le f_c}$. Then let $z = \sum_k |a_k|\, e(x_k, \varphi_k)$, $u = \sum_k |a_k|\, e(x_k, 0)$, $\tau = \sum_k |a_k|$, and consider the matrix
$$R = \sum_{k=1}^r |a_k| \begin{pmatrix} e(x_k, \varphi_k) \\ 1 \end{pmatrix}\begin{pmatrix} e(x_k, \varphi_k) \\ 1 \end{pmatrix}^h = \begin{pmatrix} Q & z \\ z^h & \tau \end{pmatrix},$$
whose upper-left block $Q$ is the Toeplitz matrix with diagonals given by $u$, since the phases $\varphi_k$ cancel in $e(x_k,\varphi_k)\,e(x_k,\varphi_k)^h$. Therefore $R$ is indeed a solution of $(P^{(m)}_\lambda)$ since $\mu_\lambda$ is a solution of $(P_\lambda)$, and one has $\operatorname{rank} R \le r$.
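The construction in this proof is easy to check numerically; the following sketch (with arbitrary toy spikes of our choosing, d = 1) verifies the rank bound, the Toeplitz structure of the upper-left block, and the value of the corner entry:

```python
import numpy as np

# Hypothetical check (d = 1): R = sum_k |a_k| v_k v_k^h, v_k = (e(x_k, phi_k), 1),
# has rank <= r, a Toeplitz upper-left block, and corner entry tau = sum |a_k|.
fc = 4
n = 2 * fc + 1
x = np.array([0.15, 0.6])                                 # r = 2 toy spikes
a = np.array([1.0 * np.exp(1j * 0.3), 0.5 * np.exp(-1j * 1.1)])
j = np.arange(-fc, fc + 1)

R = np.zeros((n + 1, n + 1), dtype=complex)
for xk, ak in zip(x, a):
    e = np.exp(2j * np.pi * j * xk + 1j * np.angle(ak))   # e(x_k, phi_k)
    v = np.concatenate([e, [1.0]])
    R += np.abs(ak) * np.outer(v, np.conj(v))

Q = R[:n, :n]
# the phases cancel in e e^h, so every diagonal of Q is constant (Toeplitz)
```

The rank-2 structure and the corner entry $\tau = \sum_k |a_k|$ follow directly from the sum of two rank-one terms.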
Higher dimensions For d > 1, although we have no theoretical proof of this result, numerical experiments indicate that the same rank property holds. Figure 2 shows the singular values of both a solution of (P (m) λ ) and a solution of its dual [4], computed using a standard interior point solver interfaced with the CVX software [17].
The search space of (P (m) λ ) may thus be restricted to rank-deficient matrices. Such geometry is well exploited by conditional gradient algorithms. However, these methods are not able to handle the SDP constraint (a) together with the affine constraint (b). Furthermore, the intersection between manifolds of fixed-rank matrices and the affine space defined by (b) quickly becomes intractable to analyze for matrices larger than 2 × 2, and non-convex optimization schemes on this search space would likely be difficult to implement. Instead, we propose to smooth the geometry of the problem.

Figure 2: Singular values (relative to the maximal singular value) of primal (blue) and dual (black) matrices. The rank of the primal matrices appears to be bounded by r, and that of the dual matrices to be bounded below by n − r. Not all dual matrices satisfy this property: when d = 1 for instance, there always exists a dual solution of rank at most 3; however, such low-rank matrices are unlikely to be found by interior point methods, since these tend to maximize the rank of their solutions.

Toeplitz penalization
To overcome the difficulty induced by constraint (b) in $(P^{(m)}_\lambda)$, we introduce a penalized version of $(P^{(m)}_\lambda)$, which can also be seen as a perturbation of the atomic norm regularizer used in [34,35]. We call $\mathcal{T}$ the Toeplitz projector, defined as
$$\mathcal{T}(Q) = \sum_k \frac{\langle Q, \Theta_k\rangle}{\|\Theta_k\|^2}\, \Theta_k,$$
where the matrices $\Theta_k$ are introduced in Section 3.1. In dimension one, the operator $\mathcal{T}$ replaces each entry of $Q$ by the mean of the corresponding diagonal. We consider the following program:
$$(P^{(m)}_{\lambda,\rho})\qquad \min_{R \succeq 0} \ \frac{1}{2}\|y - z\|^2 + \frac{\lambda}{2}\Big(t + \frac{1}{m^d}\operatorname{tr} Q\Big) + \frac{1}{2\rho}\|Q - \mathcal{T}(Q)\|^2, \qquad\text{where } R = \begin{pmatrix} Q & z \\ z^h & t \end{pmatrix},$$
where the parameter $\rho$ controls the penalization of the Toeplitz constraint. We study numerically the validity of this approach. For simplicity, the numerical experiments of this section are all in dimension one.
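In dimension one, the action of $\mathcal{T}$, replacing each entry by the mean of its diagonal, can be sketched as follows (the function name is our own):

```python
import numpy as np

# Hypothetical sketch (d = 1): project a matrix onto Toeplitz matrices by
# replacing each entry with the mean of the diagonal it belongs to.
def toeplitz_project(Q):
    m = Q.shape[0]
    P = np.zeros_like(Q)
    for k in range(-(m - 1), m):
        diag_mean = np.diagonal(Q, offset=k).mean()
        P[np.eye(m, k=k, dtype=bool)] = diag_mean
    return P

Q = np.arange(9.0).reshape(3, 3)
Qt = toeplitz_project(Q)
# e.g. the main diagonal (0, 4, 8) is replaced everywhere by its mean, 4
```

This direct loop is only for clarity; as discussed in Section 5, the same projection can be applied much faster through FFTs.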

Sensitivity analysis
Obviously, the dual polynomial $\eta_{\lambda,\rho}$ associated with $(P^{(m)}_{\lambda,\rho})$ (defined from a solution $z_{\lambda,\rho}$ by analogy with Lemma 1) may differ from $\eta_\lambda$, in particular with regard to their saturation points, thus compromising the support recovery. However, following an approach similar to [36], under some mild non-degeneracy hypothesis on $\eta_{\lambda,\rho}$, it is possible to show that, for small enough values of $\rho$, $R_{\lambda,\rho}$ is sufficiently close to $R_\lambda$ to allow accurate support reconstruction; in particular, they have the same rank. Numerical observations confirm that this regime exists: Figure 3 shows the stability of the rank for low values of $\rho$. We lose some stability as the minimal distance between spikes decreases.

Figure 3: Evolution of the rank of $R_{\lambda,\rho}$ with respect to $\rho$, averaged over 50 randomized trials (with positive amplitudes) with minimal separation distance $\Delta_{\min}$. Here $\operatorname{rank}(R_\lambda) = 5$ (resulting from $r = 5$). For low values of $\rho$, $\operatorname{rank}(R_{\lambda,\rho}) = \operatorname{rank}(R_\lambda)$.

Support recovery
The rank stability result however does not provide explicit information on the behavior of the roots of the Laurent polynomial $\tilde P_{\lambda,\rho}$ (see Section 2.2), which we use to recover the support. Although such a link may exist, we have no theoretical result. Nonetheless, we observe experimentally that for reasonable values of $\rho$, we are still able to recover the roots of $\tilde P_{\lambda,\rho}$ which derive from unit-modulus roots of $\tilde P_\lambda$, see Fig. 4. Moreover, in such regimes, the arguments of these perturbed roots remain sufficiently close to the original ones to still allow for accurate support recovery.

Figure 4: When $\rho = 0$, the support is recovered from double roots of $\tilde P_\lambda$, which are of unit modulus. When $\rho > 0$, these may split but remain identifiable (in the sense that they are close to the unit circle) when $\rho$ is small. We use a manually tuned tolerance around the unit circle to detect them.

Fast Fourier Transform-based Conditional Gradient Solver
In this section, we take advantage of the low-rank property of the solutions as well as of the convolutive structure of the Toeplitz constraint to build a robust and efficient numerical scheme for solving $(P^{(m)}_{\lambda,\rho})$. In the following, we denote by $f$ the objective function of this program.

Conditional gradient
The conditional gradient algorithm, also known as the Frank-Wolfe algorithm [15], aims at minimizing a convex, continuously differentiable function $f$ over a compact convex subset $K$ of a Hilbert space. The essence of the method is as follows: linearize $f$ at the current position $x_r$, solve the auxiliary linear problem of minimizing $s \mapsto \langle \nabla f(x_r), s\rangle$ over $K$, and move toward the minimizer to obtain the next position. This scheme ensures the sparsity of its iterates, since the solution after $k$ iterations is a convex combination of at most $k$ atoms. We refer to [20] for a detailed overview of the method.
A remarkable asset of the method is the simplicity of the linear minimization step, which amounts to extracting an extremal point of the set K. When the latter consists in the convex hull of an atomic domain, this subproblem may thus be solved efficiently. In particular, the positive semidefinite cone being linearly spanned by unit-rank matrices, the linear minimization in this case results in computing a leading eigenvector of ∇f , which can be done using power iterations.
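A minimal sketch of this linear minimization step via power iterations follows (the routine below is a generic illustration of ours, not our full matrix-free implementation):

```python
import numpy as np

# Hypothetical sketch: power iterations estimate the eigenvalue of largest
# magnitude of a symmetric operator, using matrix-vector products only.
# On the PSD cone with a trace bound, the linear minimization oracle returns
# a (scaled) rank-one matrix v v^h built from such an extreme eigenvector.
def power_iteration(matvec, dim, n_iter=500, tol=1e-12, seed=0):
    v = np.random.default_rng(seed).standard_normal(dim)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(n_iter):
        w = matvec(v)
        lam_prev, lam = lam, v @ w        # Rayleigh quotient (||v|| = 1)
        v = w / np.linalg.norm(w)
        if abs(lam - lam_prev) < tol:
            break
    return lam, v

A = np.diag([3.0, -5.0, 1.0])
lam, v = power_iteration(lambda x: A @ x, 3)   # dominant eigenvalue is -5
```

Only `matvec` is needed, which is what makes the FFT-based products of Section 5.3 pay off.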

Super-resolution algorithm
We propose a Frank-Wolfe scheme to minimize $f$ over the cone of positive semidefinite matrices. In order to boil the problem down to optimization over a compact set, we impose an upper bound $D_0$ on the trace of the iterates. This does not change the solution of the problem provided that $\operatorname{tr} R_{\lambda,\rho} \le D_0$. In practice, one may choose $D_0 = f(0)$.
It is known [20] that if $s$ is a solution of the linear minimization oracle at $x$, then it satisfies the inequality $\langle x - s, \nabla f(x)\rangle \ge f(x) - f(x^\star)$, where $x^\star$ is a minimizer of $f$. We use this property as a stopping criterion, ceasing the iterations once $\langle x - s, \nabla f(x)\rangle / f(x_0)$ goes below some tolerance $\varepsilon$.
To take advantage of the low-rank structure of the solutions, we store our iterates in factorized form $R = UU^h$. Consequently, at each step of the algorithm, the update consists in appending a column (namely a leading eigenvector of $\nabla f$, as mentioned above) to the matrix $U$, thus increasing the rank of $R$ by one each time. We further add a non-convex step similar to [1], consisting in a gradient descent on $F : s \mapsto f(ss^h)$, which can only improve the convergence speed of the algorithm. We use a BFGS descent in our implementation, see Algorithm 1.

Algorithm 1

Recovering the dual polynomial
Both the positions and the amplitudes of the solution may then be recovered from the coefficients $p_\lambda$. Algorithm 2 recalls the main steps of the reconstruction process. As explained in Section 4.2, in order to recover the correct roots of $\tilde P_{\lambda,\rho}$, we fix a tolerance $\delta$ around the set of points whose components are all of unit modulus, in which the roots should be located for small enough values of $\rho$.

Fast-Fourier-Transform-based computations
Step 1 in Algorithm 1 simply amounts to computing a leading eigenvector of $\nabla f(U_r U_r^h)$ using power iterations, which relies on matrix-vector products only. Given our objective $f$ defined in (5), the gradient is the sum of an inexpensive term coming from the data-fidelity and trace parts, and of the penalization term $\frac{1}{\rho}(Q - \mathcal{T}(Q))$, since $\mathcal{T}$ is an orthogonal projector. The sole costly operations of the power iteration steps thus consist in (i) computing $\mathcal{T}(U_r U_r^h)$, the projection of $U_r U_r^h$ onto Toeplitz tensors, and (ii) multiplying the latter with a vector.
Both steps can be done efficiently using only Fast Fourier Transforms: indeed, one can show that the projector in (i) has a simple factorization involving only padding, un-padding, FFT and inverse FFT operators; on the other hand, the Toeplitz-vector product (ii) may also be computed using FFT and padding operators, since multiplication by a Toeplitz matrix is simply a circular convolution combined with zero-padding.
Proof. The periodic convolution $c \star x$ can be written as the product between the circulant matrix $C$ generated by $c$ (i.e. whose first column is $c$) and the vector $x$. In that case, one has $C x = \mathcal{F}^{-1}(\mathcal{F}(c) \odot \mathcal{F}(x))$. The product $\operatorname{Toep}(t) \cdot x$ on the other hand corresponds to an aperiodic convolution, and as such can be equivalently reformulated as a periodic convolution between $t$ and a zero-padded version of $x$, hence the result.
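This circulant embedding can be sketched as follows for d = 1 (the helper below is illustrative; `c` and `r` denote the first column and first row of the Toeplitz matrix):

```python
import numpy as np

# Hypothetical sketch (d = 1): multiply by an m x m Toeplitz matrix in
# O(m log m) by embedding it in a 2m x 2m circulant matrix, so the product
# becomes a circular convolution of zero-padded vectors (computed by FFT).
def toeplitz_matvec(c, r, x):
    """Product Toep(c, r) @ x, with first column c and first row r (r[0] = c[0])."""
    m = len(x)
    # first column of the circulant embedding: [c, 0, r_{m-1}, ..., r_1]
    circ = np.concatenate([c, [0.0], r[1:][::-1]])
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(np.concatenate([x, np.zeros(m)])))
    return y[:m]

rng = np.random.default_rng(2)
m = 8
c, r, x = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(m)
r[0] = c[0]
# dense reference matrix, for checking only
Tmat = np.array([[c[i - j] if i >= j else r[j - i] for j in range(m)]
                 for i in range(m)])
```

Replacing the dense product `Tmat @ x` by `toeplitz_matvec` is what brings the per-iteration cost down to $O(f_c^d \log f_c)$ in our setting.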
Remark 7. Let $U$ be an $(n^d + 1) \times r$ matrix and $v$ an $(n^d + 1)$-vector. Regarding the computation of $(\alpha_r, \beta_r)$ in the linesearch step 2 of Algorithm 1, we note that $f(\alpha UU^h + \beta vv^h)$ is a quadratic function of $(\alpha, \beta)$. We thus need to solve $(\alpha_r, \beta_r) = \operatorname{argmin}_{\alpha,\beta \in \Delta} f(\alpha U_r U_r^h + \beta s_r s_r^h)$. This problem admits a closed-form solution.

Complexity
At step $r$ in Algorithm 1, the factor $U_r$ has size $(n^d + 1) \times r$, with $n = 2f_c + 1$.
Step 1 is done with power iterations, using the FFT scheme described in Section 5.3, and costs $O(n^d(r + \log n))$ per inner iteration, hence $O(C_{\mathrm{PI}}(n) \cdot n^d(r + \log n))$ overall, where $C_{\mathrm{PI}}(n)$ is the number of power iterations. The BFGS iterations in Step 3 cost $O(C_{\mathrm{BFGS}}(n) \cdot n^d(r + \log n))$. Hence, for $r$ outer iterations, the overall cost is $O((C_{\mathrm{PI}}(n) + C_{\mathrm{BFGS}}(n)) \cdot r n^d(r + \log n))$. The dependency of the factors $C_{\mathrm{PI}}$ and $C_{\mathrm{BFGS}}$ on $n$ remains unclear; $C_{\mathrm{BFGS}}$ should at least decrease as $n$ increases, since the problem becomes better conditioned for larger values of $f_c$.

Numerics
Power Iteration step The tolerance for the power iteration step is set to $10^{-4}$, with a maximum of 500 iterations. In order to determine the lowest eigenvalue of $\nabla f$ (simultaneously with a corresponding eigenvector), one may need to run the algorithm twice: if the first run returns a value $\lambda_1 < 0$, this is indeed the lowest one (since power iterations retrieve the eigenvalue whose magnitude is the greatest); if not, we re-run the algorithm on $\nabla f - \lambda_1 I$, and add $\lambda_1$ to the resulting value to obtain the lowest eigenvalue of $\nabla f$.
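This two-pass strategy can be sketched as follows (a simplified illustration on a small diagonal matrix; our implementation works matrix-free on $\nabla f$):

```python
import numpy as np

# Hypothetical sketch: power iterations return the eigenvalue lam1 of largest
# magnitude; if lam1 >= 0, a second run on A - lam1*I (whose spectrum is
# shifted to be <= 0) yields the smallest eigenvalue after shifting back.
def smallest_eigenvalue(A, n_iter=1000, seed=0):
    def dominant(M):
        v = np.random.default_rng(seed).standard_normal(M.shape[0])
        for _ in range(n_iter):
            w = M @ v
            v = w / np.linalg.norm(w)
        return v @ (M @ v)                  # Rayleigh quotient, ||v|| = 1
    lam1 = dominant(A)
    if lam1 < 0:
        return lam1                         # already the smallest eigenvalue
    return dominant(A - lam1 * np.eye(A.shape[0])) + lam1

A = np.diag([4.0, 1.0, -2.0])
lam_min = smallest_eigenvalue(A)            # here the first pass returns 4.0
```

On this example the first pass returns the dominant eigenvalue 4, so the shifted second pass is needed to reach the smallest one.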

BFGS step
We use Mark Schmidt's code for the BFGS solver [31]. The tolerance for this step is set to $10^{-8}$ (in terms of function or parameter changes), with 500 as the maximum number of iterations (except in Figure 5). This step is crucial to ensure the finite convergence of the algorithm, see Fig. 6. When $\rho$ tends to zero, plain Frank-Wolfe steps become significantly insufficient, and the number of BFGS iterations needed to converge at each step increases dramatically, see Fig. 5.
Root-finding Because of the Toeplitz relaxation described in Section 4, the quality of our reconstruction depends on our ability to reliably detect the roots of $\tilde P_{\lambda,\rho}$ that are truly associated with a Dirac. In our simulations, we manually fix a tolerance $\delta$ around the unit circle in which we look for those roots, as already mentioned. In practice, we take $\delta$ of the order of $10^{-2}$. In future work, it will be important to find a sharper criterion to separate the $r$-th root from the $(r + 1)$-th one.
In dimension 1, the root-finding step may be done using a standard polynomial root-finding algorithm, whose complexity is of the order of $O(f_c^3)$. However, for multivariate polynomials, root-finding becomes numerically challenging, and could be a bottleneck for our approach. To our knowledge, two methods can determine the support, of which we implemented only the first: • we locate the extrema of the dual polynomial using a Newton algorithm. As an initialization, we select points on a chosen grid such that the modulus of $\eta_{\lambda,\rho}$ at those points is no farther than $\delta$ from 1; in our experiments, we use a grid of size 200 × 200. Finally, we average the different clusters of points returned by the Newton steps to get the locations. An example of reconstruction is given in Fig. 7.
• it is also possible to determine the support of the initial measure at a lower cost directly from the moment matrix $R_\lambda$, using tools from algebra. Schematically, this is done by solving a generalized eigenvalue problem whose associated matrices are essentially of size $r \times r$, $r$ being the sparsity [25,26]. Such methods are already implemented in the Gloptipoly software [18] for real measures, and can be readily extended to measures on the torus. We have not investigated these methods further yet, but they are likely to provide a viable alternative to root-finding techniques. However, it is still unclear what the effect of the Toeplitz relaxation on this recovery algorithm would be.

Figure 5: $r = 5$, $f_c = 13$, $d = 1$, $\lambda = 0.05$, $\|w\| = 0$; mean over 50 trials, $\Delta_{\min} = 1/f_c$. Left: error between $\mu_{\lambda,\rho}$ and $\mu_\lambda$ with respect to $\rho$, measured with the dual bounded Lipschitz norm (defined as $\sup\{|\int_{\mathbb{T}} f\,\mathrm{d}\mu| \,;\, \operatorname{lip} f + \|f\|_\infty \le 1\}$, and also known as the flat norm [14]); $\mu_\lambda$ is computed using CVX. Right: total number of BFGS iterations with respect to $\rho$; here the maximum number of BFGS iterations per Frank-Wolfe iteration is set to $10^5$. The tests are performed on randomized positive measures with minimal separation distance $1/f_c$. There is clearly a sweet spot to find between reasonable computational cost and sufficient reconstruction quality.

Figure 6: $r = 5$, $f_c = 8$, $d = 2$, $m = 2f_c + 1$, $\lambda = 5 \cdot 10^{-1}$, $\|w\| = 5 \cdot 10^{-4}\|y_0\|$, $\rho = 50$. We plot the relative error between the energy associated with $(P_{\lambda,\rho})$ and the energy at the optimal point of $(P_\lambda)$, for ten randomized trials with the same signal-to-noise ratio and the same initial measure, with positive amplitudes. In almost all cases, finite convergence appears to occur after $r$ iterations of Frank-Wolfe. In the noisy case, positions are accurately recovered, but not amplitudes, and one spike is missed (its amplitude is too low).
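The first, Newton-based strategy above can be sketched in one dimension as follows (the certificate is a toy Dirichlet kernel peaking at a known location, an assumption of ours for illustration; in our experiments the screening runs on a 2-D grid):

```python
import numpy as np

# Hypothetical sketch (d = 1): screen grid points where |eta| is within
# delta of 1, then refine each by Newton steps on g(x) = (d/dx)|eta(x)|^2 / 2.
fc = 8
k = np.arange(-fc, fc + 1)
x_true = 0.3
p = np.exp(-2j * np.pi * k * x_true) / (2 * fc + 1)   # toy certificate, |eta| <= 1

eta   = lambda x: np.exp(2j * np.pi * k * x) @ p
deta  = lambda x: np.exp(2j * np.pi * k * x) @ (2j * np.pi * k * p)
d2eta = lambda x: np.exp(2j * np.pi * k * x) @ ((2j * np.pi * k) ** 2 * p)

grid = np.linspace(0.0, 1.0, 200, endpoint=False)
seeds = grid[np.abs(np.abs([eta(x) for x in grid]) - 1) < 0.05]

locations = []
for x in seeds:
    for _ in range(20):
        g  = np.real(np.conj(eta(x)) * deta(x))       # half-derivative of |eta|^2
        dg = np.real(np.abs(deta(x)) ** 2 + np.conj(eta(x)) * d2eta(x))
        x = x - g / dg                                # Newton step
    locations.append(x % 1.0)
```

The refined points cluster around the true saturation point, and averaging each cluster gives the recovered location.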