Masked unbiased principles for parameter selection in variational image restoration under Poisson noise

In this paper we address the problem of automatically selecting the regularization parameter in variational models for the restoration of images corrupted by Poisson noise. More specifically, we first review relevant existing unmasked selection criteria which fully exploit the acquired data by considering all pixels in the selection procedure. Then, based on an idea originally proposed by Carlavan and Blanc-Feraud to effectively deal with dark backgrounds and/or low photon-counting regimes, we introduce and discuss the masked versions - some of them already existing - of the considered unmasked selection principles, formulated by simply discarding the pixels measuring zero photons. However, we prove that such a blind masking strategy yields a bias in the resulting principles that can be overcome by introducing a novel positive Poisson distribution correctly modeling the statistical properties of the undiscarded noisy data. Such a distribution is at the core of the newly proposed masked unbiased counterparts of the discussed strategies. All the unmasked, masked biased and masked unbiased principles are extensively compared on the restoration of different images in a wide range of photon-counting regimes. Our tests allow us to conclude that the novel masked unbiased selection strategies, on average, compare favorably with their unmasked and masked biased counterparts.


Introduction
Poisson noise is a pervasive noise model in imaging problems, as it arises whenever the acquired data is formed by counting the number of photons irradiated by a source and hitting the image domain [3]. Typical applications where Poisson noise removal is a particularly relevant problem are astronomical and microscopy imaging; both scenarios are characterized by a low-light condition, which is intrinsically related to the acquisition set-up in the former case [22], while in the latter it is often preferable, so as to preserve the specimen of interest [18]. However, the weaker the light intensity, the stronger the degradation in the acquired images.
The image formation model under Poisson noise corruption can be formulated in vectorized form as

y = poiss(H x̄ + b), (1)

where y ∈ N^n, x̄ ∈ R^n_+ and b ∈ R^n_+ - with N and R_+ denoting the sets of natural numbers including zero and of non-negative real numbers, respectively - are vectorized forms of the n_1 × n_2 observed degraded image, unknown uncorrupted image and (typically) known background emission image, respectively, with n := n_1 n_2. The blurring matrix H ∈ R^{n×n} models the action of a known linear blurring operator, which typically arises in the context of astronomical and microscopy image processing. Finally, poiss(λ) := (poiss(λ_1), . . . , poiss(λ_n))^T, with poiss(λ_i) indicating the realization of a Poisson-distributed random variable with parameter (mean) λ_i. Hence λ̄ := H x̄ + b ∈ R^n_+ - as all entries of H are non-negative - is the vectorized form of the n_1 × n_2 noise-free degraded image.
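The acquisition model (1) is straightforward to simulate. The sketch below (Python with NumPy; image size, pixel intensities and the identity "blur" are illustrative assumptions, not taken from the paper) draws one Poisson observation y from the noise-free degraded image Hx̄ + b.

```python
import numpy as np

rng = np.random.default_rng(0)

n1, n2 = 8, 8
n = n1 * n2
x_bar = rng.uniform(1.0, 20.0, size=n)   # vectorized true image (hypothetical values)
b = 0.5 * np.ones(n)                     # constant background emission
H = np.eye(n)                            # identity "blur" keeps the sketch minimal

lam_bar = H @ x_bar + b                  # noise-free degraded image
y = rng.poisson(lam_bar)                 # Poisson-corrupted integer observation
```

For an actual blur, H would be the (block-circulant, under periodic boundary conditions) matrix of a convolution kernel; the identity is used here only to keep the example short.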
The problem of recovering x̄ starting from the knowledge of the observation y and of the acquisition model in (1) is typically ill-posed, so that one rather seeks an estimate x of the target image x̄ which solves a well-posed problem as close as possible to the original one. In the variational framework, the estimate x is obtained as a global minimizer of a cost (or energy) functional J : R^n → R, typically taking the form

x(µ) ∈ arg min_{x ∈ R^n_+} { J(x; µ) := R(x) + µ F(Hx + b; y) }, (2)

where R and F are the so-called regularization and data fidelity term, respectively, while µ > 0 is the regularization parameter balancing the action of F and R in the overall functional. The data fidelity term F(λ; y) measures the distance between λ and y with a metric induced by the noise statistics. In presence of Poisson noise, based on the maximum likelihood estimation approach (see, e.g. [5]), the fidelity term is typically set as the generalized Kullback-Leibler (KL) divergence between λ and y, that is

F(λ; y) = KL(λ; y) := Σ_{i=1}^{n} F(λ_i; y_i), (3)

where

F(λ_i; y_i) = λ_i − y_i ln λ_i + y_i ln y_i − y_i   if i ∈ I_+ := {i ∈ I : y_i > 0},
F(λ_i; y_i) = λ_i                                   if i ∈ I_0 := I \ I_+, (4)

with I := {1, . . . , n}. On the other hand, the regularization term R(x) encodes information or beliefs that may be available a priori on the target image x, such as smoothness or sparsity properties. One of the most popular regularizers in image processing is the total variation (TV) semi-norm [19], which reads

TV(x) := Σ_{i=1}^{n} ∥(∇x)_i∥_2, (5)

where (∇x)_i ∈ R^2 represents the discrete gradient of image x computed at pixel i. The TV term, which induces sparsity of gradient magnitudes, is known to be particularly effective for the restoration of piecewise constant images; however, it is also well-established that the TV regularizer tends to promote edges, thus producing the so-called staircasing effect on the smooth parts of the image. As a way to overcome the classical drawbacks of TV, one can employ the TV^2 regularizer [17] defined by

TV^2(x) := Σ_{i=1}^{n} ∥(∇^2 x)_i∥_F, (6)

with (∇^2 x)_i ∈ R^{2×2} indicating the discrete Hessian of image x at pixel i and ∥ · ∥_F denoting the Frobenius norm.
The TV^2 regularizer promotes piecewise-affine structures in the image; however, its ability to recover sharp edges is inferior to that of TV. A particularly flexible regularization term that couples the benefits of the TV and TV^2 regularizers while mitigating their shortcomings is the popular total generalized variation (TGV) [14], in particular its second-order version, which reads

TGV^2(x) := min_w { α_1 Σ_{i=1}^{n} ∥(∇x)_i − w_i∥_2 + α_0 Σ_{i=1}^{n} ∥(Ew)_i∥_F }, (7)

where w = (w_1; w_2) with w_1, w_2 ∈ R^n, w_i := (w_{1,i}; w_{2,i}) ∈ R^2 and (Ew)_i ∈ R^{2×2} denotes the discrete symmetric Jacobian of the vector field w at pixel i, with α_0, α_1 being positive parameters.
Besides the choice of suitable data and regularization terms, the selection of the regularization parameter µ can highly influence the quality of the output restoration. Generally speaking, a criterion for the selection of µ in variational models of the form (2) can be formulated as follows:

Select µ = µ̂ such that C(x(µ̂)) is satisfied,

where x(µ) : R_++ → R^n, with R_++ := R_+ \ {0}, is the image restoration function introduced in (2) and C(·) is some selection criterion or principle.
Historically, the criteria proposed for the µ-selection under Poisson noise corruption have been designed so as to extend the particularly wide literature for the parameter selection in presence of additive white Gaussian noise. In this perspective, some attempts have been made in order to modify the generalized cross validation function, originally proposed in [10], so as to be applied to the case of Poisson noise [13,24]. Nonetheless, such strategies have never been significantly employed in imaging problems.
A particularly successful strategy, proposed by Bertero and co-authors in the seminal works [4,26,27], extends the popular discrepancy principle (DP) to Poisson noise scenarios. Ultimately, one requires that the KL fidelity term, evaluated at x(µ) and regarded as a function of µ, equals a given discrepancy value. In [4,26,27] this value is obtained by approximating the expected value of the KL term, now interpreted as a function of the Poisson-distributed random vector Y, of which the observed image y has to be considered a realization. This instance of the DP is easy to implement and relies on appealing theoretical guarantees - see, e.g. [21,23,28]. However, it is also well established that it does not yield a good selection in low photon-counting situations. This has been ascribed to the approximation of the original underlying expected value, which several strategies have tried to overcome [5,7].
Different methods rely on transforming the noise statistics. More specifically, in [2] the authors consider a Gaussian approximation of the noise and then employ the popular Morozov DP. More recently, a criterion based on the whiteness property of a standardized Poisson distribution has been proposed [6].
In [9], the authors suggest that the sub-optimal results obtained via the DPs in [2,26] may be explained by the fact that the pixels corresponding to zero entries in the data should not be considered when solving the corresponding non-linear equations. Nonetheless, such modified principles still rely on modeling the corrupting noise by a Poisson distribution that does not explicitly encode the discarding of the zero-pixels, thus introducing a bias in the final formulation of the strategies.

Contribution
In this paper, we review, discuss and compare experimentally all the existing state-of-the-art principles and five new ones for the automatic selection of the regularization parameter µ in the class of so-called R-KL variational models for image restoration, with R being a general closed, proper and convex regularization term such as, e.g. (5)-(7). More specifically, our contribution can be summarized as follows.
(a) We provide a detailed and organic review of the most popular and effective existing unmasked principles (including the two very recent ones proposed in [5,6]), which fully exploit the information in the data without discarding any pixel.
(b) Inspired by the idea originally proposed in [9] to deal with low photon-counting scenarios, we introduce and discuss the masked biased versions of the previously reviewed unmasked principles, some of them already proposed in [9] and others new. These approaches come from simply discarding the zero-pixels in the acquired image when applying the principles while, at the same time, keeping the (Poisson) distribution of the undiscarded data unchanged.
(c) As the main contribution of this work, we propose a whole new class of masked unbiased selection criteria based on the introduction of a novel positive Poisson distribution which suitably models the data statistics after discarding the zero-photon pixels. A theoretical analysis of the biases eliminated by using the new unbiased principles is also carried out.
(d) As a final contribution, we extensively and reliably test the three categories of principles - i.e. the unmasked, masked biased and novel masked unbiased strategies - on the restoration of different images, using different regularizers and in a wide range of photon-counting scenarios, from very low- to high-counting situations.
The computed examples strongly indicate that, on average, the proposed masked unbiased criteria outperform both their unmasked and their masked biased counterparts, especially in the low-counting regime.
The paper is organized as follows. In section 2 we review and discuss the most relevant existing unmasked principles, while the associated masked biased criteria are outlined in section 3. The novel positive Poisson distribution as well as the resulting masked unbiased approaches are illustrated in detail in section 4. Numerical solution of the class of R-KL variational models by the alternating direction method of multipliers (ADMM) is dealt with in section 5. Extensive numerical tests assessing the performance of the considered principles are carried out in section 6. Section 7 concludes our work with some final considerations and outlook for future research.

Unmasked principles
In this section, we recall and outline the four most relevant unmasked parameter selection principles for Poisson noise proposed in the literature so far, namely those principles which fully exploit the information encoded in the observed image y without discarding the zero-photon pixels. To this aim, we first introduce the µ-dependent image

λ(µ) := H x(µ) + b, (9)

which represents, for each selected µ value, an estimate of the unknown true noise-free image λ̄ defined in (1), obtained by solving the R−KL variational model in (8). We also introduce the true and estimated standardized images

z := (y − λ̄) / √λ̄,   z(µ) := (y − λ(µ)) / √λ(µ), (10)

where all operations in the above definitions have to be intended component-wise.
It can be proved-see Proposition 1 in [6]-that the original matrix (or image) form of vector z above is the realization of a 2D white (i.e. uncorrelated) random field. In particular, each entry z i of z is the realization of a scalar random variable with zero mean and unitary variance.
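The standardization property is easy to verify empirically. The following minimal sketch (the λ value is a hypothetical pixel intensity) checks that z = (y − λ)/√λ has approximately zero mean and unit variance under Poisson sampling:

```python
import numpy as np

rng = np.random.default_rng(1)

lam = 5.0                                 # hypothetical noise-free pixel intensity
y = rng.poisson(lam, size=200_000).astype(float)
z = (y - lam) / np.sqrt(lam)              # component-wise standardization, as in (10)

mean_z, var_z = z.mean(), z.var()         # should be close to 0 and 1
```

Note that this only checks the per-pixel moments; the whiteness (decorrelation) of the full 2D field is the stronger property exploited later by the whiteness principle.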

The approximate DP
The abstract form of the DP applied to selecting the regularization parameter µ in the class of R−KL variational models defined in (8) is as follows:

Select µ = µ̂ such that D(µ̂; y) = ∆, (11)

where the last equality and the scalar ∆ ∈ R_+ in (11) are commonly referred to as the discrepancy equation and the discrepancy value, respectively, while the discrepancy function D( · ; y) : R_++ → R_+ is defined by

D(µ; y) := F(λ(µ); y) = Σ_{i=1}^{n} F(λ_i(µ); y_i), (12)

with the function F and the estimated noise-free image λ(µ) defined in (4) and (9), respectively.
The DP formalizes a simple idea: choose a value µ of the regularization parameter µ in the R−KL variational model in (8) such that the value of the KL data fidelity term associated with the solution x( µ) is equal to a prescribed discrepancy value ∆.
The direct extension of the Morozov DP - originally designed for additive white Gaussian noise - to the case of Poisson corruption consists in selecting ∆ as the expected value of the KL fidelity term in (3), regarded as a function of the n-variate Poisson-distributed random vector Y (of which the deterministic measure y ∈ N^n is a realization). This version of the DP for Poisson noise, which we refer to as the exact or expected value DP (EDP), can be formalized as follows:

Select µ = µ̂ such that D(µ̂; y) = ∆^(E)(µ̂) := Σ_{i=1}^{n} E[F(λ_i(µ̂); Y_i)], (13)

where E[F(λ_i(µ); Y_i)] denotes the expected value of F(λ_i(µ); Y_i) regarded as a function of the Poisson-distributed scalar random variable Y_i. Nonetheless, unlike the Gaussian case, the discrepancy value is not a constant but a function ∆^(E)(µ) of the regularization parameter µ, and deriving its analytic expression is a very hard task. A popular and widespread strategy, originally proposed in [26] for denoising purposes and extended in [4] to the image restoration task, replaces the exact expected value function ∆^(E)(µ) with a constant approximation coming from truncating its Taylor series expansion. We refer to this DP version as the approximate DP (ADP). It reads:

Select µ = µ̂ such that D(µ̂; y) = n δ^(A), with δ^(A) = 1/2. (14)

The popularity of the ADP mainly relies on the strong theoretical guarantees that it brings along: in fact, existence and uniqueness of the solution of the discrepancy equation in (14) can be proven under very mild conditions. However, it is well-established (see, e.g. [5]) that the ADP tends to return either over-smoothed or under-smoothed restorations in low photon-counting scenarios.
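The quantity driving all the DP variants is the generalized KL fidelity (3)-(4). A minimal Python rendering (function name hypothetical) is:

```python
import numpy as np

def kl_fidelity(lam, y):
    """Generalized KL divergence KL(lam; y) of (3)-(4):
    lam_i - y_i*ln(lam_i) + y_i*ln(y_i) - y_i for y_i > 0,
    and just lam_i for the zero-photon pixels."""
    lam = np.asarray(lam, dtype=float)
    y = np.asarray(y, dtype=float)
    out = lam - y                       # y_i = 0 pixels contribute lam_i
    pos = y > 0
    out[pos] += y[pos] * np.log(y[pos] / lam[pos])
    return out.sum()

# The ADP then amounts to tuning mu until kl_fidelity(lam(mu), y)
# equals the approximate discrepancy value n * 1/2.
```

The fidelity vanishes exactly when λ = y on the positive pixels and is non-negative otherwise, which is why it can serve as a discrepancy measure.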

The quadratic DP
In the volume where the ADP was originally proposed, a different selection criterion also inspired by the Morozov DP was published. Instead of approximating the expected value of the KL fidelity term, in [2] the authors propose to directly approximate (quadratically) the KL term in such a way that the expected value of the approximate term admits a simple closed-form expression. The approximation reads

F(λ_i; y_i) ≈ F^(Q)(λ_i; y_i) := (y_i − λ_i)^2 / (2 λ_i), (15)

with the introduced function F^(Q) approximating the function F in (4). The quadratically approximated version D^(Q) of the exact discrepancy function D defined in (12) and used in the ADP in (14) thus reads

D^(Q)(µ; y) := Σ_{i=1}^{n} F^(Q)(λ_i(µ); y_i). (16)

By regarding F^(Q) in (15) as a function of the Poisson-distributed random variable Y_i with mean λ_i, it is immediate to prove that [12]

E[F^(Q)(λ_i; Y_i)] = Var[Y_i] / (2 λ_i) = 1/2.

Hence, the DP version proposed in [2], referred to as the quadratic DP (QDP), reads

Select µ = µ̂ such that D^(Q)(µ̂; y) = n/2. (17)

The nearly exact DP
The two aforementioned strategies perform an approximation either on the discrepancy function or on its expected value. As anticipated, the latter scenario is known to return poor-quality results, especially in low-counting regimes where the considered approximation becomes particularly rough. This issue was first commented on in [4], where the authors state (in remark 3) that the choice of the constant value δ^(A) = 1/2 in the ADP in (14) may not be 'optimal' and suggest replacing it with 1/2 + ε, where ε is a generic small positive or negative real number. Later, in [7] the authors proposed to introduce a non-constant ε, which is a function ε(µ) of µ and is set as the sum of the second to tenth terms of the same Taylor expansion used in [26]. However, such expansion converges only for λ approaching +∞ - see [5] - and cannot aspire to improve the performance of the ADP in low-count regimes. Recently, in [5] the authors proposed a novel and more accurate approximation of the discrepancy value ∆^(E) in the EDP in (13), based on Monte Carlo simulation and nonlinear fitting. More specifically, on a fine grid of selected λ_i, a very large number of pseudo-random realizations of the random quantity F(λ_i; Y_i) are generated. By computing the sample mean of such realizations for each λ_i, one gets back estimates for E[F(λ_i; Y_i)]. A weighted least squares fitting is then employed to infer a nearly exact estimate δ^(NE)(λ) of the exact expected value function δ^(E)(λ) used in (13), namely

δ^(NE)(λ) := 1/2 + ε(λ). (18)

Based on (18), the novel DP version proposed in [5], referred to by the authors as the nearly exact DP (NEDP), takes the form

Select µ = µ̂ such that D(µ̂; y) = Σ_{i=1}^{n} δ^(NE)(λ_i(µ̂)). (19)
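The Monte Carlo step underlying the NEDP can be sketched as follows (grid values and sample sizes are illustrative): for a given λ, sample many Poisson realizations, evaluate the per-pixel KL term F of (4), and average. The estimate deviates markedly from the ADP constant 1/2 for small λ and approaches it for large λ.

```python
import numpy as np

rng = np.random.default_rng(2)

def F(lam, y):
    """Per-pixel KL term (4); zero-photon samples contribute lam."""
    out = np.full(y.shape, lam, dtype=float)
    pos = y > 0
    out[pos] = lam - y[pos] + y[pos] * np.log(y[pos] / lam)
    return out

def mc_expected_F(lam, n_samples=400_000):
    """Monte Carlo estimate of E[F(lam; Y)], Y ~ Poisson(lam)."""
    samples = rng.poisson(lam, size=n_samples)
    return F(lam, samples).mean()

e_small = mc_expected_F(0.2)    # noticeably different from 1/2
e_large = mc_expected_F(50.0)   # close to the ADP constant 1/2
```

In [5] such sample means on a grid of λ values are then fitted (by weighted least squares) to obtain the smooth function δ^(NE)(λ); the fitted parametric form is not reproduced here.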

The whiteness principle
The popularity of the DP when the underlying noise is Gaussian has motivated the introduction of the three unmasked selection principles described above for Poisson noise. Besides the DP, a successful selection strategy in the Gaussian noise case consists in choosing µ so as to maximize the whiteness of the residual image or, equivalently, minimize the cross-correlation between its entries [1,15].
In the recent paper [6], the authors extended the above principle, referred to as the whiteness principle (WP), to the case of Poisson noise. In this scenario, the WP can be applied thanks to the introduction of the standardized images z and z(µ) defined in (10). In fact, it can be proved [6] that z is the realization of a white random field and, hence, it makes sense to seek for µ maximizing the whiteness of the estimate z(µ).
The WP for Poisson noise thus reads:

Select µ = µ̂ ∈ arg min_{µ ∈ R_++} W(z(µ)), with W(z) := ∥S(z)∥_2^2, (20)

where S(z) denotes the 2D normalized auto-correlation of image z (see [6] for details).
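A whiteness measure of this kind can be sketched via the FFT-based sample autocorrelation; the snippet below is a hedged stand-in (the exact normalization of S(z) in [6] may differ). For a white field the normalized autocorrelation is concentrated at lag zero, so the measure is smaller than for a correlated field.

```python
import numpy as np

rng = np.random.default_rng(3)

def whiteness(z):
    """Squared l2-norm of the 2D normalized sample autocorrelation of z
    (illustrative stand-in for the W(z) of the whiteness principle)."""
    Z = np.fft.fft2(z)
    s = np.real(np.fft.ifft2(np.abs(Z) ** 2)) / np.sum(z ** 2)  # s[0, 0] = 1
    return np.sum(s ** 2)

z_white = rng.standard_normal((64, 64))
# correlate horizontally adjacent pixels: lag-1 correlation about 0.5
z_corr = (z_white + np.roll(z_white, 1, axis=1)) / np.sqrt(2.0)
```

Minimizing such a measure over µ favors residuals that look like realizations of white noise, which is the rationale behind (20).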

Masked biased principles
After noting that the ADP and QDP principles defined in (14) and (17) can yield sub-optimal results in case of many zero-photon pixels, in [9] the authors proposed masked versions of those principles based on simply discarding all pixels measuring zero photons, i.e. pixels for which y_i = 0. We refer to these masked principles as biased since they do not account for the fact that selecting pixels based on the value of the noise realization requires changing the statistics of the selected pixels, as will be illustrated in section 4. The masked versions of the exact discrepancy function D in (12) used in the ADP (14) and of the quadratically approximated discrepancy function D^(Q) in (16) used in the QDP (17) - indicated by D_+ and D^(Q)_+, respectively - clearly take the following forms:

D_+(µ; y) := Σ_{i ∈ I_+} F(λ_i(µ); y_i),   D^(Q)_+(µ; y) := Σ_{i ∈ I_+} F^(Q)(λ_i(µ); y_i), (21)

with the functions F and F^(Q) defined in (4) and (15), respectively. Hence, based on their unmasked versions in (14) and (17), the ADP and QDP masked biased principles proposed in [9] - which we shortly refer to as ADP-MB and QDP-MB, respectively - can be formulated as follows:

ADP-MB:  Select µ = µ̂ such that D_+(µ̂; y) = n_+/2, (22)
QDP-MB:  Select µ = µ̂ such that D^(Q)_+(µ̂; y) = n_+/2, (23)

where n_+ indicates the cardinality of the set I_+, namely the number of non-zero pixels. Also the NEDP in (19), which was proposed after [9], admits a masked biased version (NEDP-MB), which clearly reads:

NEDP-MB:  Select µ = µ̂ such that D_+(µ̂; y) = Σ_{i ∈ I_+} δ^(NE)(λ_i(µ̂)). (24)

Finally, by introducing the masked version of the standardized image z(µ) in (10), namely

z_+(µ) := ( (y_i − λ_i(µ)) / √λ_i(µ) )_{i ∈ I_+}, (25)

the masked biased version of the WP (WP-MB) can be formulated as follows:

WP-MB:  Select µ = µ̂ ∈ arg min_{µ ∈ R_++} W(z_+(µ)), (26)

with the function W(z) defined as in (20).

The proposed masked unbiased principles
In the next subsection we introduce and analyze a novel scalar discrete distribution, called positive Poisson distribution, which correctly models the statistics of non-zero pixels considered in the masked selection principles. Based on such distribution, in subsections 4.2-4.5 we introduce the novel masked unbiased principles.

Positive Poisson distribution
In the following definition 4.1 we recall the Poisson distribution and introduce the positive Poisson distribution, then in proposition 4.2 we outline some important properties of positive Poisson-distributed random variables.

Definition 4.1 (Poisson and positive Poisson random variables).
Let λ ∈ R_++. A discrete random variable Y is said to be Poisson-distributed with parameter λ, denoted by Y ∼ P(λ), if its probability mass function reads

P_Y(y) = e^(−λ) λ^y / y!,  y ∈ N. (27)

The expected value, variance and second-order raw moment of Y are given by

E[Y] = Var[Y] = λ,  E[Y^2] = λ + λ^2.

The discrete random variable Y_+ defined by

Y_+ := (Y | Y > 0),  i.e.  P_{Y_+}(y) = P(Y = y | Y > 0),  y ∈ N_+ := N \ {0}, (28)

is said to be positive Poisson-distributed with parameter λ and denoted by Y_+ ∼ P_+(λ).

Proposition 4.2.
Let Y ∼ P(λ) and Y_+ ∼ P_+(λ), with λ ∈ R_++, and let T, V : R_++ → R be the functions defined by

T(λ) := 1 / (1 − e^(−λ)),   V(λ) := 1 + λ − λ T(λ). (29)

Then, the probability mass function, expected value, second-order raw moment and variance of the positive Poisson-distributed random variable Y_+ read

P_{Y_+}(y) = T(λ) e^(−λ) λ^y / y!,  y ∈ N_+, (30)
E[Y_+] = λ T(λ), (31)
E[Y_+^2] = (λ + λ^2) T(λ), (32)
Var[Y_+] = λ T(λ) V(λ). (33)

Moreover,

E[Y_+] > E[Y],   Var[Y_+] < Var[Y], (34)

and, for λ tending to 0 and λ tending to +∞, we have

lim_{λ→0} E[Y_+] = 1,   lim_{λ→0} Var[Y_+] = 0,   lim_{λ→+∞} E[Y_+]/E[Y] = 1,   lim_{λ→+∞} Var[Y_+]/Var[Y] = 1. (35)-(38)

Proof. It easily follows from definition (28) that, for any λ ∈ R_++, the probability mass function P_{Y_+}(y) of Y_+ ∼ P_+(λ) is a (positively) scaled version of the probability mass function P_Y(y) of Y ∼ P(λ) restricted to N_+, namely

P_{Y_+}(y) = α(λ) P_Y(y),  y ∈ N_+. (39)

By imposing that the probability mass function P_{Y_+} sums to one, it is easy to prove that α(λ) in (39) coincides with the function T(λ) in (29), thus demonstrating (30):

α(λ) Σ_{y ∈ N_+} P_Y(y) = α(λ) (1 − P_Y(0)) = α(λ) (1 − e^(−λ)) = 1  ⟹  α(λ) = T(λ). (40)

Then, it easily follows from (30) that the mth-order raw moments of Y_+ are given by

E[Y_+^m] = T(λ) E[Y^m]

for any m ∈ N_+. By specifying the above formula for m = 1 and m = 2, one gets (31) and (32), respectively. It follows from (31)-(32) that

Var[Y_+] = E[Y_+^2] − (E[Y_+])^2 = (λ + λ^2) T(λ) − λ^2 T(λ)^2 = λ T(λ) (1 + λ − λ T(λ)) = λ T(λ) V(λ), (41)

where the last equality in (41) comes from the definition of the functions T and V in (29). Then, the inequalities in (34) and the four asymptotic properties in (35)-(38) come from (30)-(33) and the following easily provable - see the plots in figure 1 - properties of the functions T and V defined in (29): T is monotonically decreasing with lim_{λ→0} T(λ) = +∞, lim_{λ→0} λ T(λ) = 1 and lim_{λ→+∞} T(λ) = 1, whereas V is monotonically increasing with lim_{λ→0} V(λ) = 0 and lim_{λ→+∞} V(λ) = 1, and T(λ) V(λ) ∈ (0, 1) for all λ ∈ R_++.

It is now clear that the Poisson and the positive Poisson random variables are characterized by significantly different statistical properties, especially for small values of the parameter λ. In this perspective, proposition 4.2 already warns about the approximations that the masked formulations of the principles given in section 3 bring along. An analysis of the introduced biases will be carried out in subsection 4.6.
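The moment formulas of proposition 4.2 admit a quick numerical sanity check. The sketch below (assuming T(λ) = 1/(1 − e^(−λ)) and V(λ) = 1 + λ − λT(λ), as in (29)) draws positive Poisson samples by simple rejection (keep only strictly positive Poisson draws) and compares the sample moments with λT(λ) and λT(λ)V(λ), i.e. (31) and (33):

```python
import numpy as np

rng = np.random.default_rng(4)

def T(lam):
    return 1.0 / (1.0 - np.exp(-lam))

def V(lam):
    return 1.0 + lam - lam * T(lam)

lam = 0.8
# rejection sampling of Y+ ~ P+(lam): keep only the strictly positive draws
y = rng.poisson(lam, size=600_000)
y_plus = y[y > 0].astype(float)

mean_mc, var_mc = y_plus.mean(), y_plus.var()      # Monte Carlo moments
mean_th = lam * T(lam)                             # (31)
var_th = lam * T(lam) * V(lam)                     # (33)
```

For λ = 0.8 the theoretical mean is about 1.45 (well above λ) and the variance about 0.50 (well below λ), illustrating how different Y_+ is from Y at low counts.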
In the next subsections, we are going to show how the newly introduced positive Poisson distribution can be adopted so as to formulate masked unbiased versions of the original principles reviewed here.

Masked unbiased approximate DP
As previously outlined, the approximate discrepancy value δ (A) = 1/2 used in the ADP relies on truncating at the first order the Taylor expansion of E [F (λ; Y)], with Y a Poisson-distributed random variable with mean λ. It can be proved that, in the masked unbiased case (where Y is replaced by Y + ), the expected value E [F (λ; Y + )] admits a Taylor expansion which also coincides with 1/2 when truncated at the first order. Hence, masked biased and masked unbiased versions of the ADP coincide; in what follows, they will be indistinctly referred to as ADP-M.

Masked unbiased quadratic DP
In light of statements (31) and (33) in proposition 4.2, we introduce the function

F^(Q)_U(λ_i; y_i) := (y_i − λ_i T(λ_i))^2 / (2 λ_i T(λ_i) V(λ_i)).

Relying on the properties of the novel positive Poisson distribution, it is easy to observe that

E[F^(Q)_U(λ_i; Y_{+,i})] = Var[Y_{+,i}] / (2 λ_i T(λ_i) V(λ_i)) = 1/2.

After introducing the unbiased version of the masked quadratically approximated discrepancy function D^(Q)_+ defined in (21) and used in the QDP-MB (23), namely

D^(Q)_{+,U}(µ; y) := Σ_{i ∈ I_+} F^(Q)_U(λ_i(µ); y_i),

we get the following formulation for the masked unbiased QDP, referred to as QDP-MU:

QDP-MU:  Select µ = µ̂ such that D^(Q)_{+,U}(µ̂; y) = n_+/2.

Masked unbiased nearly exact DP
The masked unbiased version of the NEDP (NEDP-MU) is obtained - analogously to the unmasked NEDP illustrated in [5] - by applying the weighted least squares fitting method to approximate the behavior of the sample means of large numbers of realizations of the random quantity F(λ_i; Y_{+,i}), with F defined in (4). We thus get:

NEDP-MU:  Select µ = µ̂ such that D_+(µ̂; y) = Σ_{i ∈ I_+} δ^(NE)_U(λ_i(µ̂)),

with

δ^(NE)_U(λ) := 1/2 + ε_U(λ). (44)

Masked unbiased whiteness principle
We start by noticing that the standardized image z_+(µ) in (25), which comes from a blind masking of the original z(µ), cannot be considered a realization of a white random process. Therefore, we introduce the novel standardized image

z_{+,U}(µ) := ( (y_i − λ_i(µ) T(λ_i(µ))) / √(λ_i(µ) T(λ_i(µ)) V(λ_i(µ))) )_{i ∈ I_+}.

One can clearly observe that, in light of the results summarized in proposition 4.2, z_{+,U} is a realization of a white random process, thus suggesting the following formulation for the masked unbiased version of the WP (WP-MU):

WP-MU:  Select µ = µ̂ ∈ arg min_{µ ∈ R_++} W(z_{+,U}(µ)),

with the function W(z) defined as in (20).
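The unbiased masked standardization can be checked numerically. The sketch below (assuming, per proposition 4.2, that the positive Poisson mean and variance are λT(λ) and λT(λ)V(λ)) verifies that standardizing the undiscarded samples with these moments yields approximately zero mean and unit variance, as required for whiteness-based selection:

```python
import numpy as np

rng = np.random.default_rng(5)

def T(lam):
    return 1.0 / (1.0 - np.exp(-lam))

def V(lam):
    return 1.0 + lam - lam * T(lam)

lam = 1.2
y = rng.poisson(lam, size=600_000)
y_plus = y[y > 0].astype(float)           # undiscarded (non-zero) samples

# unbiased standardization with the positive Poisson mean and std
z_plus = (y_plus - lam * T(lam)) / np.sqrt(lam * T(lam) * V(lam))

mean_z, var_z = z_plus.mean(), z_plus.var()
```

By contrast, the blindly masked standardization (y_plus − λ)/√λ of (25) would exhibit a strictly positive mean and a variance below one at this λ.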

Analysis of bias
In light of the introduced unbiased masked principles, in this section we carry out an analysis of the pixel-wise biases of the masked biased principles illustrated in section 3. To this aim, we first give the following result.
Proposition. Let Y ∼ P(λ) and Y_+ ∼ P_+(λ), with λ ∈ R_++, and let Z^(B)_+, Z^(U)_+ be the standardized random variables defined by

Z^(B)_+ := (Y_+ − λ) / √λ,   Z^(U)_+ := (Y_+ − E[Y_+]) / √Var[Y_+].

Then, it holds true that

E[Z^(U)_+] = 0,  Var[Z^(U)_+] = 1,   while   E[Z^(B)_+] = √λ (T(λ) − 1),  Var[Z^(B)_+] = T(λ) V(λ). (47)

Proof. First, the fact that Z^(U)_+ has zero mean and unitary second-order raw moment and variance comes in a straightforward manner from its definition. Then, for what concerns Z^(B)_+, we have

E[Z^(B)_+] = (E[Y_+] − λ) / √λ = (λ T(λ) − λ) / √λ = √λ (T(λ) − 1), (49)

where the last equality in (49) comes from replacing the expression of E[Y_+] given in (31). Then, by recalling also the expression of E[Y_+^2] in (32), we have that

E[(Z^(B)_+)^2] = E[(Y_+ − λ)^2] / λ = (E[Y_+^2] − 2λ E[Y_+] + λ^2) / λ = T(λ) − λ (T(λ) − 1). (50)

Finally, based on (49) and (50), the variance in (47) can be computed as follows:

Var[Z^(B)_+] = E[(Z^(B)_+)^2] − (E[Z^(B)_+])^2 = T(λ) − λ(T(λ) − 1) − λ(T(λ) − 1)^2 = T(λ)(1 + λ − λ T(λ)) = T(λ) V(λ). (51)

In what follows, we compare the masked biased and masked unbiased versions of the QDP, NEDP and WP in terms of some pixel-based bias functions. We recall that for the ADP the bias has to be considered constantly zero, as the masked biased and the masked unbiased versions coincide.
As the definition of the QDP, in its unmasked, masked biased and masked unbiased versions, is related to the behavior of the sample variance of the noise realization vector approximated by the residual image, we introduce the following bias function β_QDP : R_++ → R to measure the inaccuracy introduced by the QDP-MB at each pixel:

β_QDP(λ) := Var[Z^(B)_+] − Var[Z^(U)_+] = T(λ) V(λ) − 1.

In the case of the NEDP, the bias can be measured in terms of the difference between the building functions used to approximate the behavior of the exact expected value δ^(E)(λ) arising in the EDP (13). We thus introduce the pixel-based bias function β_NEDP : R_++ → R defined as

β_NEDP(λ) := ε(λ) − ε_U(λ),

with ε(λ), ε_U(λ) given in (18), (44), respectively.
Finally, for what concerns the WP, we point out that measuring the bias in terms of the auto-correlation of the standardized random variables Z^(B)_+, Z^(U)_+ - which would be the most natural choice in this scenario - is unfeasible; hence, we rather measure how far Z^(B)_+ is from being a zero-mean random variable with constant (unitary) standard deviation. In other words, we introduce the two pixel-based bias functions β_WP,η, β_WP,σ : R_++ → R defined as

β_WP,η(λ) := E[Z^(B)_+] = √λ (T(λ) − 1),   β_WP,σ(λ) := √Var[Z^(B)_+] − 1 = √(T(λ) V(λ)) − 1.
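Using the moment formulas of proposition 4.2, the mean and standard-deviation biases of the blindly masked standardization can be evaluated in closed form; the sketch below (helper names are illustrative) shows that both biases vanish as λ grows, while the mean bias blows up as λ → 0:

```python
import numpy as np

def T(lam):
    return 1.0 / (1.0 - np.exp(-lam))

def V(lam):
    return 1.0 + lam - lam * T(lam)

def bias_mean(lam):
    """E[Z_B+] = (E[Y+] - lam)/sqrt(lam) = sqrt(lam)*(T(lam) - 1)."""
    return np.sqrt(lam) * (T(lam) - 1.0)

def bias_std(lam):
    """sqrt(Var[Z_B+]) - 1 = sqrt(T(lam)*V(lam)) - 1 (negative: under-dispersion)."""
    return np.sqrt(T(lam) * V(lam)) - 1.0

lams = np.array([0.1, 1.0, 10.0])
b_mean = bias_mean(lams)   # decreasing toward 0 as lam grows
b_std = bias_std(lams)     # negative, vanishing as lam grows
```

At λ = 0.1 the mean bias already exceeds 3 and the standard deviation is roughly 28% too small, which is exactly the regime where masked biased principles misbehave.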

Numerical methods for application of the selection principles
The application of all the previously reviewed and newly proposed selection criteria relies on numerically solving the R−KL variational model in (8) which, we remark, is convex under the assumption that the regularization term R is convex (this is the case of the regularizers (5)-(7)). Efficient solvers based on the ADMM [8] for the R-KL model when R is the TV, TV^2 or TGV^2 regularizer - that is, for the so-called TV-KL, TV^2-KL and TGV^2-KL models - have been proposed, e.g. in [6,16,20], respectively. However, for the sake of completeness of the presentation as well as of reproducibility of the results presented in the next section, in this section we recall the main concepts and computational steps of the ADMM applied to the three models. In particular, we are going to show how the TV-KL, TV^2-KL and TGV^2-KL models can all be equivalently reformulated as two-block separable optimization problems with linear constraints, which can be solved by standard two-block ADMM schemes with guaranteed convergence.

Recalling the definitions of the R−KL model in (8) and of the TV, TV^2 and TGV^2 regularizers in (5)-(7), and introducing the matrices

D_1 := (D_h; D_v) ∈ R^{2n×n},   D_2 := (D_hh; D_vv; D_hv; D_vh) ∈ R^{4n×n},

with D_h, D_v, D_hh, D_vv, D_hv, D_vh ∈ R^{n×n} finite difference matrices discretizing the first-order partial derivatives of the vectorized n_1 × n_2 image x in the horizontal and vertical directions and the second-order partial derivatives of x in the horizontal, vertical, mixed horizontal-vertical and mixed vertical-horizontal directions (with D_vh = D_hv), respectively, the TV-KL, TV^2-KL and TGV^2-KL models can be equivalently written in the forms (55)-(57), respectively, where ι_{R^n_+}(x) denotes the indicator function of the non-negative orthant R^n_+ and where, with a little abuse of notation, we indicate by (D_1 x)_i ∈ R^2 and (D_2 x)_i ∈ R^4 the discrete gradient and the vectorized discrete Hessian of image x at pixel i, respectively.
Moreover, we denote by (D_S w)_i ∈ R^4 the vectorized discrete symmetric Jacobian of the vector field w at pixel i. Then, by introducing for the models in (55)-(57) the auxiliary variable u - whose blocks collect, in the three cases, the KL argument Hx + b, the image x itself (for the non-negativity constraint) and the relevant derivative terms D_1 x, D_2 x or (D_1 x − w; D_S w) - and setting t = x for TV-KL and TV^2-KL, t = (x; w) for TGV^2-KL, it is easy to verify that all three models can be equivalently reformulated as the following standard two-block (additively) separable minimization problem with linear constraints:

t(µ), u(µ) ∈ arg min_{t,u} { C_1(t) + C_2(u; µ) }  subject to:  M_1 t + M_2 u = 0. (58)

In particular, for the TV-KL, TV^2-KL and TGV^2-KL models, the functions C_1, C_2 read as in (59)-(61) and the matrices M_1, M_2 take the forms in (62), respectively. The Lagrangian function L and the augmented Lagrangian function L_γ associated with problem (58) read

L(t, u; ρ; µ) = C_1(t) + C_2(u; µ) − ⟨ρ, M_1 t + M_2 u⟩, (63)
L_γ(t, u; ρ; µ) = L(t, u; ρ; µ) + (γ/2) ∥M_1 t + M_2 u∥_2^2, (64)

where ρ is the vector of Lagrange multipliers associated with the system of linear constraints in (58) and γ ∈ R_++ is a penalty parameter. Solving problem (58) amounts to seeking the saddle point(s) {t*(µ), u*(µ), ρ*(µ)} of the augmented Lagrangian L_γ in (64), which, according to the standard two-block ADMM [8], can be computed as the limit point of the following iterative procedure:

t^(k+1) ∈ arg min_t L_γ(t, u^(k); ρ^(k); µ), (65)
u^(k+1) ∈ arg min_u L_γ(t^(k+1), u; ρ^(k); µ), (66)
ρ^(k+1) = ρ^(k) − γ (M_1 t^(k+1) + M_2 u^(k+1)). (67)

After recalling from (59)-(62) that C_1(t) = 0 and M_2 = −I, it is immediate to prove that for the three models the t-update step in (65) takes the same following form:

M_1^T M_1 t^(k+1) = M_1^T ( u^(k) + ρ^(k)/γ ),

that is, t^(k+1) is obtained by solving a linear system with coefficient matrix M_1^T M_1. This matrix is symmetric and positive definite - hence, non-singular - in all three cases and, by assuming periodic boundary conditions for all the involved finite difference matrices, the linear system can be solved very efficiently based on the 2D discrete Fourier transform, implemented by the 2D fast Fourier transform (see, e.g. [6,16,20]). We note that t^(k+1) = x^(k+1) for TV-KL and TV^2-KL, t^(k+1) = (x^(k+1); w^(k+1)) for TGV^2-KL.
For what regards the u-update in (66), it is easy to verify that it splits into independent subproblems for the blocks of u; for TGV^2-KL, the multiplier vector ρ^(k) is partitioned accordingly. For all three models, the two subproblems for the variables u_1, u_2 ∈ R^n admit the same pixel-wise closed-form solutions which, after introducing the two vectors of proximal arguments and setting τ = µ/γ, read as in (68) and (69), respectively, for i = 1, . . . , n. Also the third subproblem for the variable u_3 - with u_{3,i} ∈ R^2 for the TV-KL and TGV^2-KL models, u_{3,i} ∈ R^4 for TV^2-KL - admits the pixel-wise closed-form solution (70), given by the proximal map of the ℓ_2 (or Frobenius) norm. Finally, for the TGV^2-KL model, the fourth subproblem for the variable u_4 ∈ R^{4n}, with u_{4,i} ∈ R^4, can also be solved in pixel-wise closed form based on the ℓ_2-norm proximal map; in formula, (71).

To conclude, in the following proposition 5.1 we apply to the three ADMM schemes outlined above (solving the TV-KL, TV^2-KL and TGV^2-KL models) a general and classical convergence result for the two-block ADMM given in the seminal paper by Eckstein and Bertsekas [11].

Proposition 5.1. For any µ, γ ∈ R_++, the sequences of iterates generated by the ADMM schemes above converge to solutions of the TV-KL, TV^2-KL and TGV^2-KL models, respectively.

Proof. For the TV-KL, TV^2-KL and TGV^2-KL variational models considered, for any µ ∈ R_++ the functions C_1, C_2 defined in (59)-(61) are all clearly proper, closed and convex, and the matrices M_1, M_2 defined in (62) are such that M_2 is the negative identity and M_1 has full (column) rank. Moreover, the two ADMM minimization subproblems for the primal variables t, u in (65) and (66) are solved exactly by the formulas in (68)-(71). Hence, by applying the classical convergence result for the two-block ADMM given in [11] (Theorem 8), the proof of the statement follows easily.
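Two of the pixel-wise closed forms arising in ADMM schemes of this kind can be sketched as follows. This is an illustrative Python rendering (not the authors' code): assuming the KL block enters the model as µ·KL and the penalty parameter is γ, the KL proximal step with τ = µ/γ reduces to the positive root of a per-pixel quadratic, while the TV-type blocks are handled by group (ℓ_2) soft-thresholding. The proximal argument q depends on the splitting and is left generic here.

```python
import numpy as np

def kl_prox(q, y, tau):
    """Pixel-wise solution of  min_u  tau*(u - y*log(u)) + 0.5*(u - q)^2.
    Optimality gives u^2 + (tau - q)*u - tau*y = 0, whose positive root is
    returned. For y = 0 it reduces to max(q - tau, 0)."""
    q = np.asarray(q, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * ((q - tau) + np.sqrt((q - tau) ** 2 + 4.0 * tau * y))

def group_shrink(g, tau):
    """Row-wise l2 soft-thresholding: closed-form solution of the TV-type
    subproblems  min_u  tau*||u_i||_2 + 0.5*||u_i - g_i||_2^2  per pixel."""
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-15), 0.0)
    return scale * g
```

Both maps act independently on each pixel, which is what makes the u-update embarrassingly parallel and O(n) per iteration.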
Numerical experiments

In order to make the results of the comparison as solid as possible, we act in three directions. First, we consider the three test images satellite, stem and cells shown in figure 3 - with the associated experiments reported in sections 6.3, 6.4 and 6.5, respectively - which are characterized by different properties and thus allow us to test the eleven selection criteria on the three image restoration models (55)-(57). Second, for each test image/model we simulate different photon-counting scenarios, ranging from very low- to high-counting ones. Third, for each image/model and each photon-counting level, we consider a number of different (independent) Poisson noise realizations and collect statistics (minimums, maximums and averages) of the quantitative accuracy results achieved by the principles. In particular, we measure the quality of the restored images x(µ) (with respect to the target uncorrupted image x̄) obtained by applying the different criteria by means of two accuracy metrics, namely the structural similarity index measure (SSIM) defined in [25] and computed by the Matlab routine ssim, and the improved signal-to-noise ratio (ISNR) defined by

ISNR(x(µ), x̄, y) = SNR(x(µ), x̄) − SNR(y, x̄),   SNR(z, x̄) = 20 log_10 ( ∥x̄ − m_x̄∥_2 / ∥x̄ − z∥_2 ),

measured in decibels and with m_x̄ denoting the mean value of the target image x̄. All tests have been performed in Matlab R2022b, on a Windows 10 platform. The code is available at https://github.com/MonicaPragliola/MU-principles.
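The SNR/ISNR definitions above translate directly into code; the sketch below (Python stand-in for the Matlab routines, with hypothetical data) also illustrates the sign convention: a restoration closer to x̄ than the observation y yields a positive ISNR.

```python
import numpy as np

rng = np.random.default_rng(6)

def snr(z, x_bar):
    """SNR(z, x_bar) = 20*log10(||x_bar - mean(x_bar)|| / ||x_bar - z||), in dB."""
    num = np.linalg.norm(x_bar - x_bar.mean())
    return 20.0 * np.log10(num / np.linalg.norm(x_bar - z))

def isnr(x_hat, x_bar, y):
    """Improvement (in dB) of the restoration x_hat over the observation y."""
    return snr(x_hat, x_bar) - snr(y, x_bar)

x_bar = rng.uniform(0.0, 10.0, size=1000)   # hypothetical target image
noise = rng.standard_normal(1000)
y = x_bar + noise                           # hypothetical degraded observation
x_hat = x_bar + 0.1 * noise                 # hypothetical (better) restoration
```

Since x_hat has ten times smaller error than y, its ISNR is exactly 20 log_10(10) = 20 dB in this toy setting.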
The photon-scaled images x̄_κ are then corrupted by space-invariant Gaussian blur, with blur kernel generated by the Matlab routine fspecial, which is characterized by two parameters: band, representing the side length (in pixels) of the square support of the kernel, and sigma, the standard deviation (in pixels) of the isotropic bivariate Gaussian distribution defining the kernel in the continuous setting. In all our tests, we set band = 5 and sigma = 1. Then, a constant background emission image b is added to the blurred images, so as to get the nine noise-free degraded images λ̄_κ = H x̄_κ + b. Finally, for each noise-free image λ̄_κ, ten different noisy observations are generated by sampling as many independent realizations from an n-variate Poisson random process with mean λ̄_κ, using the Matlab routine poissrnd.
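As a rough Python analogue of this simulation pipeline (the paper uses Matlab's fspecial and poissrnd), one may write the following sketch. With sigma = 1 and truncate = 2, scipy's gaussian_filter uses a kernel of half-width 2 pixels, comparable to fspecial('gaussian', 5, 1), although the boundary handling may differ from the paper's blur operator H.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_observation(x_bar, kappa, sigma=1.0, truncate=2.0, b=2e-3, rng=None):
    """Simulate one noisy observation y ~ Poisson(H(kappa * x_bar) + b).
    x_bar: clean image scaled to [0, 1]; kappa: photon-count factor;
    b: constant background emission. Returns (y, lam) with lam the
    noise-free degraded image (the pixel-wise Poisson means)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = gaussian_filter(kappa * np.asarray(x_bar, dtype=float),
                          sigma=sigma, truncate=truncate, mode='nearest') + b
    return rng.poisson(lam), lam
```

Calling the function repeatedly with independent generators yields the ten independent noise realizations per (image, κ) pair used in the experiments.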
We remark that the factor κ in (72) represents the maximum number of photons that, on average, can hit any pixel of the image domain if no blur degradation (H = I) and a null emission background (b = 0) are considered. In fact, in this case the noise-free image λ̄_κ (which, we notice, contains the mean values of the Poisson noise distributions at all pixels) is given by λ̄_κ = H x̄_κ + b = x̄_κ, hence max{λ̄_κ} = max{x̄_κ} = κ. In general, for any given blur corruption and emission background, the factor κ is positively related to the photon range of the experiment and, recalling that for a scalar Poisson random variable with parameter λ_κ = κλ̄ the ratio between its mean (true signal) and its standard deviation (noise level) is equal to √(κλ̄), also to the signal-to-noise ratio of the observed degraded image y_κ to restore. To clearly highlight the effect of κ on the noise level in the observation y_κ and, hence, on the difficulty of the restoration process, in figure 4 we show the test image satellite corrupted by Gaussian blur and by a realization of Poisson noise for four different values of κ as well as, on the right, the graph of the SNR value of the observation y_κ as a function of the factor κ. This graph justifies the non-uniform grid of κ-values considered in (72) (the grid is finer for small κ-values, where the SNR changes more rapidly) as well as the maximum value κ = 1000 considered (the SNR curve stabilizes, hence taking κ > 1000 would not change significantly the results of the criteria comparison).
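The stated relation between κ and the pointwise signal-to-noise ratio can be checked empirically: for a Poisson variable with mean κλ̄, the mean/standard-deviation ratio equals κλ̄/√(κλ̄) = √(κλ̄). A small Monte Carlo sketch:

```python
import numpy as np

def empirical_poisson_snr(kappa, lam_bar, n=200_000, seed=1):
    """Estimate the mean/std ratio (pointwise SNR) of a Poisson variable
    with mean kappa*lam_bar; theory predicts sqrt(kappa*lam_bar)."""
    rng = np.random.default_rng(seed)
    s = rng.poisson(kappa * lam_bar, size=n).astype(float)
    return s.mean() / s.std()
```

This makes the monotone growth of the SNR curve in figure 4 with respect to κ unsurprising.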

Analysis
For each of the three test images and each of the nine photon-level factors κ in (72), the ten generated degraded images y_κ(j) are processed as follows. For each y_κ(j), we compute the solution of the R-KL variational model for a very fine grid of different µ-values and then, based on the obtained µ-dependent restorations, we apply the eleven different criteria to get the selected regularization parameter values and the corresponding restored images x̂_C^κ(j). We then compute and record the associated ISNR and SSIM values, denoted by ι_C^κ(j) and σ_C^κ(j), respectively, as well as the optimal (i.e. maximum) ISNR and SSIM values achieved on the fine grid of µ-values considered, denoted by ι_OPT^κ(j) and σ_OPT^κ(j), respectively. After processing the ten degraded observations y_κ(j), we thus get the following sets of quantitative results:

I_C^κ = {ι_C^κ(1), . . ., ι_C^κ(10)},  S_C^κ = {σ_C^κ(1), . . ., σ_C^κ(10)},
I_OPT^κ = {ι_OPT^κ(1), . . ., ι_OPT^κ(10)},  S_OPT^κ = {σ_OPT^κ(1), . . ., σ_OPT^κ(10)}.

Then, for each ι_C^κ(j) ∈ I_C^κ and each σ_C^κ(j) ∈ S_C^κ we compute the percentage difference with respect to the corresponding optimal values ι_OPT^κ(j) and σ_OPT^κ(j), respectively.
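The grid-search evaluation described above can be sketched generically as follows; the restore, criterion_score and metric callables are hypothetical placeholders for the paper's R-KL solver, one of the selection principles (lower score meaning "preferred"), and an accuracy metric (ISNR or SSIM against the ground truth), respectively.

```python
import numpy as np

def percentage_error_on_grid(mu_grid, restore, criterion_score, metric):
    """Run a grid search over mu: select the restoration preferred by the
    criterion, compare its quality against the best achievable on the grid,
    and return (selected_metric, optimal_metric, percentage_difference).
    All three callables are placeholders, not the paper's implementations."""
    restorations = [restore(mu) for mu in mu_grid]
    scores = [criterion_score(x, mu) for x, mu in zip(restorations, mu_grid)]
    metrics = [metric(x) for x in restorations]
    selected = metrics[int(np.argmin(scores))]   # criterion's choice
    optimal = max(metrics)                       # oracle choice on the grid
    return selected, optimal, 100.0 * (optimal - selected) / optimal
```

A percentage difference of 0 means the criterion selected a µ-value whose restoration quality matches the oracle on the grid.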
The behavior of each selection criterion for a given photon-counting level is thus synthesized by the expected values (or, better, sample means) η_{I_C^κ}, η_{S_C^κ} of the ISNR and SSIM percentage errors achieved for the ten different noise realizations. Moreover, to monitor the variability of the performance of each criterion with respect to different noise realizations, we also compute the limiting (minimum and maximum) values of the percentage errors over the ten realizations. We remark that, for all the experiments, restoration is performed by means of the R-KL variational model (in particular, the TV-KL model for the satellite image, TV²-KL for stem and TGV²-KL for cells), solved numerically by the iterative ADMM schemes outlined in section 5. In all the tests, the ADMM iterations are stopped as soon as the relative change between two subsequent x-iterates drops below a fixed tolerance, while the ADMM penalty parameter γ is set manually so as to speed up the convergence of the alternating scheme. More specifically, the numerical tests indicated the range γ ∈ [1, 10] as a good choice.
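The relative-change stopping rule can be written as a one-liner; the tolerance against which it is compared is left unspecified here, as in the text.

```python
import numpy as np

def relative_change(x_new, x_old):
    """Relative change between consecutive ADMM x-iterates,
    ||x_new - x_old|| / ||x_old||, guarded against a zero denominator;
    iterations stop once this drops below a chosen tolerance."""
    denom = max(np.linalg.norm(x_old), np.finfo(float).eps)
    return np.linalg.norm(x_new - x_old) / denom
```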

Test image satellite: parameter selection in the TV-KL model (55)
We consider the restoration of the test image satellite. In this first example, we set b ≡ 2 × 10⁻³ and, in light of the dominant piece-wise constant features present in the image, we employ the TV regularization term in (5).
We start by analyzing the behavior of the expected and limiting values defined in (74) and (75), respectively, within the four classes of ADP-, QDP-, NEDP- and WP-based criteria. In figure 5, for each class we plot the sample means η_{I_C^κ} corresponding to the different photon-counting regimes κ expressed in log₁₀-scale, together with the confidence intervals determined by the limiting values. For what concerns the ADP-based strategies, we recall that the MB and MU versions of the principle coincide. Notice that the masked criterion achieves significantly better results, as the red band stays below 30% regardless of the counting regime, while the unmasked method stays above 80%. In the case of QDP-based approaches, one can immediately observe that the percentage differences achieved by the unmasked principle are particularly large for every κ. In the lower counting regimes, i.e. κ ⩽ 5, the MB version returns very poor results, while its performance improves and stays below 20% for κ ⩾ 20. On the other hand, we highlight that the MU principle presents a very robust behavior along the whole range of counting factors, as the corresponding percentage differences are always below 30% and approach 0 for the smaller values of κ.
When analyzing the NEDP-based approaches, one can notice that the MB method returns the poorest results in the low-counting regimes while, in expectation, it outperforms the unmasked version in the mid- and high-counting regimes. On the other hand, the MU principle achieves the largest ISNR values for all the considered regimes.
Finally, for the WP-based principles we notice that the MB criterion performs poorly in the low- and mid-counting regimes, while the green band stays below 30% for κ ⩾ 50. One can also observe that the unmasked and the masked unbiased principles present a robust behavior with respect to the considered κ-values, with the unmasked approaching 0 for κ ⩽ 3 and the masked unbiased outperforming the competitors for κ > 3. Figure 6 shows, for each class of methods, the sample means η_{S_C^κ} and the confidence intervals related to the computed limiting values, i.e. the performances of the considered principles in terms of the SSIM. One can easily notice that the SSIM bands present the same behavior as the ISNR bands shown in figure 5, so that similar conclusions on the performances of the unmasked, MB and MU criteria can be drawn.
To analyze the results from a different point of view, in figures 7(a)-(c) we show the sample means η_{I_C^κ} of the unmasked, masked biased and masked unbiased principles, respectively, in the range [0%, 25%], so as to visualize the best performances. In the unmasked category, NEDP and WP are the only principles staying below 25% for κ ⩾ 20, with WP obtaining better results than the others in the low- and mid-counting regimes. Among the MB principles, QDP-MB and WP-MB stay in the interval of interest only in the high-counting regimes, while ADP-M and NEDP-MB stay between 10% and 25% for each κ. Finally, in the MU class, all methods are in the visible range, with WP-MU being the best for κ > 3, followed by QDP-MU, which reaches the highest ISNR for κ ⩽ 3.
Figures 7(d)-(f) show the sample means η_{S_C^κ} of the unmasked, MB and MU principles, respectively, in the range [0%, 5%]. The three plots confirm the considerations made for the ISNR about the best method within each group.

Test image stem: parameter selection in the TV²-KL model (56)
For the second example, we consider the restoration of the test image stem, with constant background emission b ≡ 2 × 10⁻³, this time using the TV² regularization term defined in (6), since the target image resembles a piece-wise linear function more than a piece-wise constant one. In figure 8 we plot the sample means η_{I_C^κ} and the confidence intervals corresponding to the different counting regimes κ for the four classes of ADP-, QDP-, NEDP- and WP-based criteria.
For the ADP-based approaches, one can notice that the masked criterion reaches worse results than the unmasked one, which stays below 40% for every counting regime.
For what concerns the QDP-based strategies, we can immediately notice that, as for the previous image, the masked biased version returns poor results in the lower counting regimes, but improves for κ ⩾ 50. On the other hand, both the unmasked and the masked unbiased versions achieve good and similar results, as they return percentage differences that are always below 20% for all factors κ.
In the case of the NEDP-based approaches, one can notice a behavior similar to that of the QDP-based strategies: the unmasked and masked unbiased versions achieve analogous results (although slightly more variable across the different realizations for the unmasked one), staying below 20% for all κ, while the masked biased strategy works poorly for κ ⩽ 50. Finally, for the WP-based principles, all versions return percentage differences of less than 25%, with the masked biased working better in the mid- and high-count range, but worse for κ ⩽ 5. On the other hand, the masked unbiased version reaches good results in the low- and mid-count range, with the exception of κ = 1, where the sample mean and the variability across the noise realizations are higher than those obtained with the unmasked strategy.
The same observations can be made by analyzing the SSIM results plotted in figure 9.
In figure 10 we show the expected values η_{I_C^κ} (top row) and η_{S_C^κ} (bottom row), in the ranges [0%, 25%] and [0%, 5%] respectively, grouped by type of principle: unmasked, masked biased and masked unbiased. Looking at both the ISNR and SSIM graphs, we can note that, in the unmasked category, the best result is achieved by the WP over the whole range of κ. Among the masked biased strategies, WP-MB seems the best, as it achieves the highest value of ISNR for κ = 10, 20, although its behavior for κ ⩽ 5 is poorer than that of the methods in the other plots (while still being the best in its category). Finally, for the masked unbiased principles, QDP-MU obtains the best result in terms of ISNR in the low-count regime (even among all the methods), while WP-MU works well in the mid- and high-count range.

Test image cells: parameter selection in the TGV²-KL model (57)
In this third example, we consider the restoration of the test image cells by employing the TGV² regularization term to effectively deal with the composite nature of the specimen. For the TGV² regularizer, we set α₀ = 0.8 and α₁ = 0.3, so as to maximize the ISNR for the highest counting regime considered here, i.e. κ = 1000. Moreover, we set a constant background emission b ≡ 10⁻¹.
In figure 11 we show the ISNR bands achieved by the ADP-, QDP-, NEDP- and WP-based principles. As for the test image stem, the ADP-M strategy performs poorly in the lowest counting regimes, while it outperforms the unmasked version for κ ⩾ 20. The remaining three classes of methods present the same behavior: the MB versions of the principles achieve very low ISNR values in the low- and mid-counting regimes, whereas the unmasked and MU principles present a very robust behavior, with the latter achieving the best results.
Similar considerations can be drawn by looking at the SSIM bands reported in figure 12. Finally, in figure 13 we show a close-up of the expected values η_{I_C^κ}, η_{S_C^κ} in the range [0%, 25%], dividing the principles into unmasked, MB and MU. It is easy to conclude that the MU versions of the QDP, NEDP and WP return the best results both in terms of robustness and of the quality measures achieved.

Discussion
The detailed analysis carried out so far allows us to conclude that neglecting the zero pixels in the acquired images, as proposed in [9], can lead to particularly robust and successful parameter selection strategies, provided that the proposed positive Poisson distribution is employed to model the undiscarded data. Generally speaking, in terms of quality measures the QDP-MU achieves the best results for κ ⩽ 3, while the WP-MU returns higher-quality restorations when κ > 3. Moreover, in accordance with the theoretical results given in proposition 4.2, in the higher counting regimes the compared criteria show similar behaviors. We highlight that the improvements yielded by employing the proposed MU principles with respect to their unmasked versions are particularly relevant in the first example, that is for the test image satellite. To highlight a possible reason behind this phenomenon, in table 1 we report the average percentages of zero pixels in the acquired images for the counting regimes considered. Such values are clearly influenced by the gray-level statistics of the underlying test images as well as by the selected background emissions. It is immediate to observe that for the test image satellite the number of zero pixels is very large for all the κ-values. As a consequence, masking the data in this scenario is particularly effective.
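The zero-pixel percentages reported in table 1 correspond to the quantity computed by the following trivial sketch, i.e. the fraction of pixels that recorded zero photons and are therefore discarded by the masking strategies.

```python
import numpy as np

def zero_pixel_fraction(y):
    """Fraction of pixels in the observed image y that recorded
    zero photons (the pixels discarded by the masked principles)."""
    y = np.asarray(y)
    return np.count_nonzero(y == 0) / y.size
```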

Conclusions
In the present work, we addressed the parameter selection problem in the general R-KL image restoration variational model under Poisson noise corruption. After recalling the classical unmasked principles typically used for setting the parameter, we discussed the idea of masking the observed data, as originally proposed in [9] to effectively deal with low photon-counting regimes. Nonetheless, we proved that neglecting the zero pixels in the observed data without modifying the distribution of the undiscarded pixels introduces a bias in the resulting principles. Hence, after defining the novel positive Poisson distribution, we proposed the masked unbiased versions of the original criteria. The unmasked, masked biased and masked unbiased strategies have been extensively tested on different images, with different regularization terms and for a wide range of counting regimes. The masked unbiased principles have been shown to outperform, on average, the corresponding unmasked and masked biased counterparts. As future work, we plan to employ the proposed masked unbiased strategies to deal with different imaging problems such as computed tomography reconstruction and image super-resolution. Moreover, inspired by [28], we believe that the design of procedures faster than the grid search employed here for the selection of µ based on the proposed principles is a matter that deserves further investigation.

Data availability statement
The data that support the findings of this study are available upon reasonable request to the authors.