First AI for Deep Super-resolution Wide-field Imaging in Radio Astronomy: Unveiling Structure in ESO 137-006

We introduce the first AI-based framework for deep, super-resolution, wide-field radio interferometric imaging and demonstrate it on observations of the ESO 137-006 radio galaxy. The algorithmic framework to solve the inverse problem for image reconstruction builds on a recent “plug-and-play” scheme whereby a denoising operator is injected as an image regularizer in an optimization algorithm, which alternates until convergence between denoising steps and gradient-descent data fidelity steps. We investigate handcrafted and learned variants of high-resolution, high dynamic range denoisers. We propose a parallel algorithm implementation relying on automated decompositions of the image into facets and the measurement operator into sparse low-dimensional blocks, enabling scalability to large data and image dimensions. We validate our framework for image formation at a wide field of view containing ESO 137-006 from 19 GB of MeerKAT data at 1053 and 1399 MHz. The recovered maps exhibit significantly more resolution and dynamic range than CLEAN, revealing collimated synchrotron threads close to the galactic core.


INTRODUCTION
Image formation in aperture synthesis by radio interferometry (RI) has never been more challenging.On the one hand, extreme data sampling rates, produced by modern radio arrays, raise the urgent need for scalable algorithms.On the other hand, the ill-posedness of the underlying inverse problem calls for tailored regularisation models to be injected in the image formation process in order to deliver the expected precision and robustness of the reconstruction.Over the last decade, regularisation approaches leveraging advanced sparsity-based image models embedded in optimisation algorithms were proposed (e.g.Li et al. 2011;Carrillo et al. 2012;Dabbech et al. 2015;Garsden et al. 2015).In particular, the SARA family of algorithms (Onose et al. 2017;Repetti et al. 2017;Dabbech et al. 2018; Abdulaziz are the projections of each antenna pair baseline on the plane perpendicular to the line of sight.Under these assumptions, the visibility vector y ∈ C M can be modelled as y = Φx + n, with Φ = GFZ, where x ∈ R N is the unknown radio map, whose pixel resolution is often set between 1.5 to 2.5 times below the angular resolution of the observations, to reduce the limitations of pixel-based image restoration.
n ∈ C M is a realisation of a complex random Gaussian noise of mean 0 and standard deviation τ > 0.
Φ ∈ C M ×N denotes the measurement operator, which encodes the incomplete Fourier sampling.More precisely, G ∈ C M × C N denotes a sparse de-gridding matrix whose rows are non-uniform Fourier transform interpolation kernels, F ∈ C N × C N stands for the Discrete Fourier transform, and Z ∈ C N × R N is a zeropadding operator, allowing for a fine grid in the spatial Fourier domain, which also involves a correction for approximations in the convolution kernels of G (Fessler & Sutton 2003).Given the remarkable sensitivity of the modern arrays, the RI measurement equation is further complicated by the so-called direction-dependent effects (DDEs).Some of these are unknown and of either atmospheric or instrumental origin and should be calibrated (Smirnov 2011).In contrast, the DDEs originating from the w component of the antenna pair baselines on the line of sight are known, and induce the so-called w-effect (Cornwell & Perley 1992).DDEs can be encapsulated as additional baseline-specific convolution kernels on each row of G (Dabbech et al. 2017;Repetti et al. 2017).In this work, we propose a new parallelised and automated framework for wide-field high-resolution high-dynamic range monochromatic intensity imaging, which we use to revisit observations of the radio galaxy ESO 137-006, the loudest radio galaxy in the Norma cluster.The remainder of this letter is structured as follows.In section 2, we provide a summary of the proposed framework, from the underpinning algorithmic structure and the two specific incarnations respectively propelled by sparsity-based and AI-based regularisation, to parallelisation and automation functionalities critical to scalability.A description of the utilised RI data of ESO 137 from the MeerKAT telescope is given in section 3. Imaging settings as well as a description of the utilised computational resources are provided in section 4. Imaging results are presented in comparison with a CLEANbased benchmark method in section 5, followed by a discussion on the unveiled structure in ESO 137-006.Finally, conclusions are drawn in section 6.

METHODS
At the algorithmic level, the proposed framework is underpinned by the versatile Forward-Backward (FB) convex optimisation iterative structure (Bauschke & Combettes 2017).At each iteration, FB simply alternates until convergence between a (forward) gradientdescent step promoting fidelity to data and a (backward) step enforcing a prior image model, critical to the regularisation of the inverse problem and the resulting imaging precision (see appendix A).We investigate two incarnations of a recent plug-and-play (PnP) scheme (Venkatakrishnan et al. 2013;Romano et al. 2017), whereby dedicated denoising operators can be plugged into FB as an image regulariser.
The unconstrained SARA (uSARA) algorithm is a pure optimisation variant leveraging a so-called "proximal" denoiser, handcrafted to enforce an advanced sparsity-based image regularisation (Carrillo et al. 2012;Repetti & Wiaux 2021;Terris et al. 2022).The sophistication of the underlying prior image model is precisely introduced to deliver the best possible resolution and dynamic range from the data.The resulting denoiser itself is implemented as an iterative algorithm, leading to an overall sub-iterative FB structure (see appendix B).
The AIRI (standing for "AI for Regularisation in radio-interferometric Imaging") algorithm (Terris et al. 2022) is an AI-based variant leveraging a learned denoiser in the form of a deep neural network (DNN) trained on a rich database to clean Gaussian random noise from high dynamic range images, with a noise level commensurate with the target sensitivity of observation (see appendix C).By design, AIRI inherits the robustness and interpretability of optimisation algorithms and the learning power and speed of DNNs.
Importantly, the degree of refinement with which the uSARA and AIRI image models are enforced is adjusted to the measurement noise τ , more precisely to the corresponding estimate of the noise level in the image domain, τ / √ 2L, which results from a normalisation by the norm L of the measurement operator.In other words, uSARA and AIRI automatically adapt to the input signal-tonoise ratio, or equivalently, the target dynamic range of reconstruction.Last but not least, we emphasise that, by construction, PnP denoisers are completely blind to the measurement conditions underpinning the data to be imaged.As a consequence, the learned variants can be trained once and for all at an appropriate dynamic range, significantly alleviating the associated computation cost.They do not suffer from generalisability challenges with respect to measurement conditions either.This stands in stark contrast with the more traditional end-to-end approaches, where a DNN would be trained to reconstruct an image directly from data (Connor et al. 2022;Terris et al. 2022).
At the high-resolution and high-dynamic range regime of interest, parallelisation and automation functionalities are critical to the scalability of the algorithmic framework.In this context, the image denoisers of uS-ARA and AIRI are decomposed on small image facets with no loss of precision thanks to their convolutional nature and the compactness of the associated kernels (see appendix D).Relying on a hybrid approach to efficiently correct for the wide-field w-effect in both image and data spaces (Cornwell et al. 2005;Wiaux et al. 2009;Offringa et al. 2014;Dabbech et al. 2017), the measurement operator is decomposed into sparse and low-dimensional building blocks (see appendix E).These decompositions are fully automated, enabling a parallel image facet and data block processing, seamlessly adapting to the architecture of the high performance computing (HPC) system where the reconstruction is run.

DATA DESCRIPTION
Both uSARA and AIRI are used to revisit MeerKAT L-band observations of a wide FoV containing the radio galaxy ESO 137-006.MeerKAT (Jonas & MeerKAT Team 2016), located in the Karoo desert of South Africa, is a precursor to the SKA.Its 64 antennas with cryogenic receivers are arranged in a close-packed core and baselines of up to 8 kilometres, resulting in superb sensitivity and imaging quality.The array is particularly suited to study faint extended emission and objects with complex morphology, of which ESO 137-006 represents a "flagship" case.Previous analysis of these observations by Ramatsoku et al. (2020) revealed multiple collimated synchrotron threads (CSTs) connecting the lobes of the radio galaxy, whose origin is yet to be unravelled.
Technical details of the observations and the initial calibration (i.e., reference calibration or 1GC) is reported by Ramatsoku et al. (2020).1GC was performed using the CARACAL pipeline (Makhathini 2018;Józsa et al. 2020).The 1GC-calibrated data are averaged down from 4096 to 1024 channels of 0.84 MHz each, spanning the frequency range 856-1712 MHz.We utilise about 7 hours of on-target time and select two subbands relatively free from radio frequency interference, referred to as the "low" band (961-1145 MHz, centred at 1053 MHz) and the "high" band (1295-1503 MHz, centred at 1399 MHz), to form two continuum images.The respective data sizes after flagging are 8.2 gigabytes (∼532 million data points, double precision) and 10.76 gigabytes (∼673 million data points).The data were then self-calibrated for phase using a combination of the WSClean imager (Offringa & Smirnov 2017) and the CubiCal calibration suite (Kenyon et al. 2018).Attempts to calibrate for the amplitude and the DDEs (Repetti et al. 2017;Dabbech et al. 2021) did not bring a substantial improvement.Therefore, no further data pre-processing was performed.The resulting WSClean images, obtained with the multiscale variant of CLEAN (Cornwell 2008), are presented for comparison purposes.uSARA and AIRI are then used for image reconstruction on these self-calibrated data.

IMAGING SETTINGS & COMPUTATIONAL RESOURCES
The images formed are of size 4096×4096 pixels, spanning the FoV 1.91 × 1.91 square degrees with a cell size of 1.68 arcseconds, for super-resolution factors beyond the angular resolution of observation of about 2 and 1.6 at low and high bands respectively.Data were weighted using the Briggs weighting scheme (robust parameter 0) to mitigate at best the complicated lobes of the dirty beam, i.e. the point spread function arising from the Fourier sampling pattern.Specifically to AIRI, a single denoiser with appropriate dynamic range was trained and used as AIRI regulariser at both bands.Imaging parameter selection for both uSARA and AIRI is automated (see appendix F).Finally, WSClean parameters are set similarly to Ramatsoku et al. (2020).
With regards to computing resources, the MATLAB implementation of uSARA and AIRI and the C++ WS-Clean imager were run on Cirrus1 , a UK Tier2 HPC system.uSARA and CLEAN are deployed on CPUs, while AIRI is deployed mainly on CPUs, with AIRI's denoiser utilising a GPU.More precisely, for uSARA and AIRI, the computation of the measurement operator (see appendix E for details), decomposed into sparse low-dimensional building blocks, utilised 240 and 280 CPUs at low and high bands respectively.For the imaging process itself, forward steps utilised 99 and 180 CPUs at low and high bands respectively.uSARA's denoiser, distributed over 64 image facets, utilised 64 of the CPUs already allocated for the forward steps.AIRI's denoiser relied on a decomposition of the image into 4 facets, lowering the memory requirements per facet and enabling each facet to be processed on a single GPU.Given the relatively negligible GPU computation cost, a single GPU was used, with facets denoised sequentially rather than in parallel.Finally, WSClean used 72 CPUs, associated with the considered number of w-stacks.
Reconstruction results are provided in Figures 1-3, focusing on the ESO 137-006 region of the imaged FoV, and displayed in log 10 scale to enable the joint visualisation of high intensity and faint emission.Specifically to CLEAN reconstructions, we display the outcome of the convolution of the associated model image with the so-called restoring beam, for a more physical representation of the radio sky.By construction, uSARA and AIRI images are in units of Jy/pixel.In order to compare intensities in the same units, CLEAN images are normalised by the flux of the restoring beam.Zooms on selected regions of the imaged FoV are provided on each figure.Firstly, a zoom on the central region of ESO 137-006, including the active galactic nucleus (AGN) at its core, is provided in panels (b) and (e).Secondly, a zoom on some background compact sources at high band are shown in panel (c).Thirdly, a zoom on the neighbouring radio galaxy ESO 137-007 North of ESO 137-006, at low band, is displayed in panel (f).Images of the full FoV are provided as supplementary material (Dabbech et al. 2022).
Where the residual images contain additional information, CLEAN's restored image, consisting of the sum of the convolved model and the residual image, is considered in our analysis.Both zooms on the selected background compact sources at high band and the neighbouring radio galaxy ESO 137-007 at low band from CLEAN restored images are included in the respective panels (c') and (f') of Figure 3.No such consideration is necessary for uSARA and AIRI, where the algorithm solution itself, without further processing, is considered to be the final image reconstruction.This advantage was already highlighted for the previous algorithms of the SARA family, which rely on more advanced and physical regularisation models than CLEAN (Dabbech et al. 2018;Abdulaziz et al. 2019;Dabbech et al. 2021;Thouvenin et al. 2022).Finally, the model image of CLEAN is considered to support our analysis, particularly through zooms of the central region of ESO 137-006 at high and low bands shown in the respective panels (b') and (e') of Figure 3.
Generally speaking, on both bands, one can observe the high level of detail achieved by uSARA and AIRI in comparison with CLEAN, particularly noticeable within the lobes of the radio galaxy.As opposed to the maps produced by CLEAN, whose resolution is, by design, restricted due to the convolution with the restoring beam, uSARA and AIRI maps show a wealth of filamentary detail within the radio lobes of ESO 137-006.These improvements also come with some pixelation effects, more noticeable in AIRI's reconstructions, in super-resolved structures around the galactic core/jets.The ability of both uSARA and AIRI to capture complex structure is further showcased when looking at the zoom on ESO 137-007 at low band (a similar observation is made at high band), in contrast with CLEAN (panel (f) of each figure).In fact, the filamentary detail of its jet is not recovered in the CLEAN convolved model image.Instead, it is left in the residual image and therefore only seen in the CLEAN restored image (panel (f') of Figure 3).uSARA and AIRI also deliver further reconstruction depth, recovering lower signal intensities than CLEAN.Additional examination of the bright superresolved point-like sources South of the jet showcases the improvement in resolution brought by both AIRI and uSARA.Last but not least, one can notice that AIRI further improves the reconstruction dynamic range over uSARA, as is apparent from the recovered faint compact sources at high band in panel (c) of each figure (a similar observation is made at low band).These sources are also visible in the CLEAN restored image (panel (c')), even though not directly recovered in the CLEAN model image (panel (c)).This supports the fact that they are real and not hallucination artefacts of the DNN.AIRI also seems to be less sensitive to calibration errors which take the form of extended ringing structure around ESO 137-006 (panels (a) and (d) of Figures 1-2).
Since the low and high band images are produced completely independently, flux measurements of unresolved sources provide an important cross-check of results.Table 1 summarises flux measurements of the AGN.We can see that all three methods recover almost the same flux, and that the spectrum is relatively flat, as expected from an AGN.Furthermore, spectral index maps of ESO 137-006 obtained from uSARA and AIRI images (Figures 1-2, panel (g)) are in broad agreement with the findings of Ramatsoku et al. (2020) based on the CLEAN images of ESO 137-006 at close frequencies, where the lobes exhibit steep spectra, with a spectral index close to 5 at the tails.
The central region zooms (panels (b) and (e) of each figure) highlight both the super-resolution potential of uSARA and AIRI and the difficulty of interpreting features in RI images.At low band, both AIRI and uSARA recover what appear to be additional sets of filaments, or CSTs: T 1 and T 2 , respectively North and South of the core/jets structure, T 3 further South , and T 0 , a filament already detected by Ramatsoku et al. (2020).Pixel intensity values of these filaments are within the range [0.1, 0.4] mJy, not only well above the image domain noise level of about 0.0014 mJy, but also at least 5 times higher than the level of the imaging artefacts, induced by the lack of amplitude calibration.The filaments are roughly parallel to the linear core/jets structure.This is a cause for both excitement and wariness.On the one hand, one explanation for the origin of CSTs is shearing of relativistic electrons off the jets, which then follow the ambient magnetic field with possibly stretched field lines (see also Condon et al. 2021).From that point of view, additional inner filaments parallel to the jets make physical sense.On the other hand, the core/jets structure is one of the brightest features in the image, and one must always be wary of secondary image features that appear to trace bright features too closely, since calibration and deconvolution artefacts could easily take this form (being modulated by sidelobes of the dirty beam).One must therefore carefully compare reconstructions made with different methods and at different frequencies, since imaging artefacts tend to scale spectrally following the geometry of the dirty beam.
The innermost filaments T 1 and T 2 appear in both the low and high band images made by AIRI and uSARA.T 2 is also confirmed by the CLEAN high band image and is not inconsistent with the CLEAN low band image, where there is emission blending with the core/jets.T 1 is not inconsistent with the CLEAN images either, where there is also emission, however not resolved at all.This is also backed up by the associated model images (Figure 3, panels (b') and (e')), where large scale components are recovered around the same region.The first sidelobe of the dirty beam (indicated by dashed circles in panels (b) and (e) of each figure) is in any case only slightly smaller than the separation between the jets and T 1 and T 2 , explaining why the filaments are not well resolved by CLEAN.Finally, the core/jets structure recovered by AIRI and uSARA has a clear discontinuity between the core and the jets, while the innermost filaments show no such thing.For CLEAN, although the core and the jets are fully connected on the model images (Figure 3, panels (b') and (e')), the connection seems weaker after convolution with the restoring beam (panels (b) and (e) of the same figure).The position of the filaments shifts slightly (by about a pixel) between the low and high band images, but in the opposite direction than that which would be expected from a dirty beam-modulated artefact.This is possibly due to pixelation effects in the reconstructions.The recovered spectral indices are inconclusive.On balance, we must conclude that the innermost filaments are likely to be physical CSTs and not imaging artefacts.
Although reconstructed by both uSARA and AIRI, the nature of T 3 is less conclusive, as it only appears at low band.It may be that, at high band, the reconstruction confuses it with the more complex structure of T 0 , just to the South of it.We note that T 0 is reconstructed by all algorithms, with an impressively clear and finely resolved East-West connection at low band by uSARA and AIRI, while the structure is very blurred and interrupted in the CLEAN image.
Finally, examples of what are almost certainly artefacts are the fainter extended structures labelled S. They are roughly parallel to the radio galaxy structure, and scale inwardly with the dirty beam in the high band images.Since they are reconstructed by all three methods, they are likely to be residual amplitude calibration errors.
Computational cost.-Table 2 summarises the computational cost of the imaging algorithms.Specific to uS-ARA and AIRI, the computation cost associated with the decomposition of the measurement operator is re-ported alongside the cost to run the imaging algorithm.As expected, with AIRI leveraging a fast denoiser on GPU and uSARA relying on a sub-iterative denoiser on CPU, the former brings a significant reduction of the imaging cost over the latter: about 2.3 times less CPU hours at both bands, with a negligible amount of GPU hours.AIRI was only 4 times more expensive than WSClean in the imaging process, and 7 when including the computation cost of the measurement operator.As AIRI denoisers are trained completely independently of the data to be imaged, the training cost associated with the single denoiser used for both bands is not considered part of the computational cost.

CONCLUSIONS
We have introduced the first AI-based framework for deep, super-resolution, wide-field RI imaging, based on a plug-and-play scheme whereby a dedicated denoising operator is injected as an image regulariser in an optimisation algorithm.We have demonstrated two image reconstruction algorithms, uSARA and AIRI, respectively propelled by powerful handcrafted and learned denoisers, aiming at delivering a high level of imaging precision.Both algorithms are highly parallelised for scalability, via automated image faceting and decomposition of the RI measurement operator into sparse low-dimensional building blocks.An in-depth study of practical scalability to the extreme data and image dimensions expected in the SKA context, in particular for wideband imaging, is warranted.uSARA and AIRI were used to revisit MeerKAT L-band observations of a wide FoV containing ESO 137-006, from 19 gigabytes of vis-ibility data.Our results confirm the ability of uSARA and AIRI to access a new regime of imaging resolution and dynamic range with respect to CLEAN.We have studied in particular the wealth of filamentary structure revealed within ESO 137-006's radio lobes, some of which are likely CSTs.Our results also demonstrate further improvement brought by AIRI over uSARA in both dynamic range and speed, underpinned by the hybrid approach at the interface of optimisation and AI.In this work, the inverse RI imaging problem is approached as an optimisation problem.In the context of optimisation theory, an "objective function" is defined, typically as the sum of a data fidelity term f and a regularisation term r injecting a prior image model to compensate for data incompleteness.The image estimate is defined as the minimiser of this objective and is reached via provably convergent algorithms (Bauschke & Combettes 2017).The obtained solution can also be understood in a Bayesian framework as a maximum a posteriori (MAP) estimate with respect to a posterior distribution, the negative logarithm of which is the objective.More specifically to our setting, we aim at solving minimise In this objective, f is a convex Lipschitz-differentiable function of a variable x ∈ R N representing the image variable, also a function of the data vector y ∈ C M , whose role is to enforce fidelity to data.The function r is a convex and possibly non-differentiable function of x, encoding the prior image model.The regularisation parameter λ > 0 acts as a trade-off between the two terms.Problems of the form (A1) can be solved via the iterative FB algorithm, alternating between a forward step in the negative direction of the gradient of f , and a backward step involving a simple denoising operator, known as the proximal operator of the regularisation function r.The proximal operator is itself defined as the solution of a (simpler) minimisation problem: prox λr (z) = argmin u∈R N λr(u) + z − u 2 2 /2, for any z ∈ R N and λ > 0. The proximal operator of simple functions r often benefits from a closed-form solution (e.g. the proximal operator of the 1 norm is a simple component-wise soft-thresholding operator).However, proximal denoisers of sophisticated regularisations must usually be computed iteratively, as solutions of the minimisation task by which they are defined.
The FB iterative structure reads where the step-size γ > 0 is strictly upper-bounded by 2/L to ensure convergence, with L being the Lipschitz constant of ∇f .Interestingly, the recently emerged PnP scheme has established that proximal optimisation algorithms such as FB, not only enable the use of proximal operators of handcrafted regularisation functions, but also the injection of learned DNN denoisers defining the regularisation term implicitly (Venkatakrishnan et al. 2013;Romano et al. 2017).We note that, in order to preserve algorithm convergence, and interpretability of its solution, the PnP denoiser must typically satisfy a "firm nonexpansiveness" constraint, ensuring that it contracts distances (Pesquet et al. 2021;Hurault et al. 2022).
Our RI imaging framework relies on a data fidelity term which reflects the Gaussian nature of the noise, and is given by f (x; y) = 1/2 Φx − y 2 2 , where • 2 denotes the standard 2 norm.Its gradient reads ∇f (x) = Re{Φ † Φ}x − Re{Φ † y}, where (•) † denotes the adjoint of its argument.The Lipschitz constant of ∇f is given by L = Re{Φ † Φ} S , where • S denotes the spectral norm.As for the image regularisation, we leverage the versatility of the PnP framework and investigate both advanced handcrafted and learned prior image models, respectively propelling the uSARA and AIRI imaging algorithms.

B. USARA'S HANDCRAFTED DENOISER
uSARA's image model, originally proposed in Carrillo et al. (2012), promotes the non-negativity of the intensity image and its sparsity in an overcomplete dictionary Ψ ∈ R N ×B , which consists of a normalised concatenation of orthogonal wavelet bases.The sparsity model is encoded via a non-differentiable log-sum regularisation function r, generalising the 1 norm.The resulting multi-term regularisation thus reads (Repetti & Wiaux 2021;Thouvenin et al. 2020) where (.) n denotes the n th coefficient of its argument vector, and ι R N + denotes the indicator function of the real positive orthant, imposing the non-negativity constraint: + and 0 otherwise.The log-sum regularisation being non-convex, the minimisation task is approached via a re-weighting procedure where a series of convex surrogate minimisation tasks, composed of weighted-1 regularisation with nonnegativity constraint, are solved using FB (Repetti & Wiaux 2020, 2021;Terris et al. 2022).The denoising proximal operator of the resulting multi-term regularisation does not have a closed-form solution and is solved iteratively.In other words, uSARA relies on FB with sub-iterative regularisation denoisers.
We note that the parameter λ controls a softthresholding operation acting on the wavelet coefficients.As proposed by Terris et al. (2022), the exact thresholding parameter γλ is set equal to the measurement noise transferred to the image domain τ / √ 2L2 : with the step-size γ typically set to γ = 1.98/L.The parameter ρ > 0 represents a floor level on the wavelet coefficients and is set naturally to the noise level ρ = γλ (Thouvenin et al. 2020).

C. AIRI'S LEARNED DENOISER
Following Terris et al. (2022), we trained a convolutional DNN, with a simple DnCNN architecture (Zhang et al. 2017), on a rich, synthetic database U of normalised images with adaptive dynamic range.The training loss is a classical 1 loss, enhanced with a firm nonexpansiveness constraint on the denoiser: where u ∈ R N are samples of the training database U, n ∼ N (0, 1) is an additive Gaussian random noise, σ > 0 is the training noise level, E denotes the expectation taken over u and n, I denotes the identity operator, and • 1 denotes the 1 norm.We emphasise that training under the firm nonexpansiveness constraint is a highly challenging task.As proposed in Pesquet et al. (2021), in practice, we relax the constraint and introduce a variant of the regularisation in (C5), which penalises softly non firmly nonexpansive networks.We further note that, while Terris et al. (2022) demonstrated in simulation that this leads to a robust way to ensure convergence of the resulting PnP algorithms, we have witnessed that, when used for real data and at large image sizes and dynamic ranges such as those of interest here, some denoisers lead to algorithm instability, requiring further training.
The performance of the learned image regularisation is highly dependent on the training noise level σ, the impact of which mirrors that of the regularisation parameter λ in uSARA's denoiser.Terris et al. (2022) proposed a heuristic according to which σ should be set equal to the measurement noise transferred to the image domain τ / √ 2L.However, the training database is normalised, with peak image values upper-bounded by 1.To avoid any generalisability issues, the trained denoisers should therefore be used on similarly normalised images, which was the case for the test images in the simulation framework of Terris et al. (2022).In general, this constraint can be accommodated by rescaling the inverse problem (1), effectively dividing it by an upper bound on the peak intensity of the sought image, α ≥ max j {x j }, which can be inferred from the peak of the dirty image.The rescaled inverse problem, now targets the recovery of x/α, with a peak value upper-bounded by 1.As a result, the heuristic generalises to setting σ equal to the inverse input imagedomain peak signal-to-noise ratio, which can be understood as the target reconstruction dynamic range, rather than an absolute noise level: Interestingly, if a pre-defined denoiser, trained at some high dynamic range, is available, any RI dataset with signal-to-noise ratio a priori pointing to a lower dynamic range denoiser, can be further rescaled (with a larger α) to match the existing denoiser, according to (C7).We adopt here this single denoiser approach, where different data are matched to the denoiser rather than the contrary.Naturally, PnP solutions are multiplied by α after reconstruction.
D. DENOISER FACETING Specific to the algorithm scalability requirement, arising from the large image dimensions of interest, we propose an automated parallelisation of the studied denoisers, enabled by image faceting.Firstly, uSARA's proximal denoiser takes advantage of a faceted implementation of the sparsity dictionary Ψ, enabled by its convolutional nature and the compact support of the wavelet kernels (Pruša 2012).The number of facets is set to optimise the parallelisation of the processing across the available CPUs under communication constraints.Secondly, AIRI's learned denoiser is decomposed and applied independently across facets of the image.This procedure is enabled by the convolutional nature of the DNNs, which rely on kernels with compact support, in turn yielding a small receptive field (Luo et al. 2016).The number of facets is set to optimise the parallel processing across the available GPUs for scalability, under memory constraints.

E. PARALLEL WIDE-FIELD MEASUREMENT OPERATOR
Firstly, on large FoV such as the one of interest here, the w component of the baselines induces a nonnegligible baseline-dependent chirp-like phase modulation on the radio sky (Cornwell et al. 2005;Wiaux et al. 2009).This w-effect can be formulated in closed form and needs to be accounted for in the model of the measurement operator.Its modelling as a simple phase modulation in the image domain for each baseline is however impractical when used in combination with the Fast Fourier Transform (FFT) underpinning the fast implementation of F ∈ C N ×N in (1), which computes all the discrete coefficients of the Fourier plane at once rather than a selected (u, v) point.For accurate and computationally efficient modelling, we consider a hybrid approach combining the w-stacking (Offringa et al. 2014) and the w-projection approaches (Cornwell et al. 2005), whereby the measurements are grouped into P w-stacks composed of M p data points each, with 1 ≤ p ≤ P , resulting from binning the visibilities in the w dimension.The w-modulation of each visibility is decomposed into two components: (i) a large phase modulation associated with the central w value of the w-stack to which it belongs, incorporated in the measurement operator through phase modulation in the image domain, and (ii) an offset phase modulation injected through convolution with a small w-kernel (Dabbech et al. 2017) in the Fourier plane.The resulting measurement operator Φ is decomposed into a series of sparse operators as where the operator Φ p = G p FZ p ∈ C Mp×N is the measurement operator associated with the p th w-stack.More specifically, Z p ∈ C N ×N denotes the zero-padding operator which encompasses the w-modulation of the associated w-stack, in addition to the correction for the convolution with the approximate non-uniform Fourier transform interpolation kernels, and G p ∈ C M ×N is the sparse de-gridding matrix, encoding row-based convolutions between these kernels and the small w-kernels implementing the phase modulation of the w-offsets in the Fourier plane.Combining w-stacking and w-projection results in a memory-efficient, and accurate measurement operator.We also emphasise that any DDE calibration solutions modelled as Fourier kernels can be easily injected in the measurement operator model via further row-based convolution (Dabbech et al. 2021).
Secondly, the operator of interest in the gradient step of the FB iterative structure (A2) is not Φ ∈ C M ×N but rather Φ † Φ ∈ C N ×N , which now reads , and where the holographic matrices H p = G † p G p ∈ C N ×N encode both the de-gridding and gridding steps.The scalability of Φ † Φ to large data acquisition regimes is promoted by enabling three key features.
Firstly, a dimensionality reduction feature to reduce the memory requirements of Φ † Φ is supported.The functionality consists in gridding the data and encoding the de-gridding and gridding steps as a single operation with the holographic matrices H p , directly implemented as sparse operators.By doing so, the operator Φ † Φ becomes effectively blind to the data dimension M .
Secondly, a planning strategy to automate the choice of the number of the w-stacks and the decision to enable the dimensionality reduction from a subset of the data is devised.In the first instance, estimates of the computational complexity of the application of Φ † Φ (derived from the number of FFTs and the sparsity of the degridding matrices) and of the memory required to host the de-gridding matrices are obtained for a wide range of values of the w-stacks number.The retained value is the one presenting the best trade-off between the computational cost and the memory requirements, under constraints set by the computing architecture on which the imaging algorithm is deployed (number of compute nodes, number of CPUs per node, available memory per CPU, etc).Data dimensionality reduction via visibility gridding is enabled when the memory requirements exceed the available resources.
Thirdly, a fully automated parallelisation of Φ † Φ is achieved through memory-based partitioning of its underlying de-gridding/holographic matrices and data vectors.The sparse matrices are computed as part of the initialisation of the imaging algorithm.A dataclustering step is first performed in parallel for each wstack to further distribute its de-gridding/holographic matrix into blocks.The clusters are made of visibilities belonging to the same radial slice of the Fourier plane, minimising the amount of underpinning discrete Fourier coefficients and subsequent communication requirements.The angular opening of each radial slice is determined by the memory needed to compute and host the resulting blocks.From the identified number of clusters, the CPUs dedicated to the forward step are allocated.The blocks of the de-gridding/holographic matrices are then computed only once and hosted directly on the compute nodes, to be applied in parallel at each FB iteration.

F. AUTOMATED PARAMETER SELECTION
The estimated image noise levels at the low and high bands are respectively τ / √ 2L 0.0014 mJy and τ / √ 2L 0.0017 mJy.The peak intensity values, as estimated from the normalised3 dirty images, are 0.69 Jy at low band and 0.37 Jy at high band.For uSARA, γλ and ρ are set equal to the estimated noise levels, as per (B4).For AIRI, the estimated noise and peak values suggest target dynamic ranges of 5 × 10 5 and 2.2 × 10 5 at low and high bands, respectively.Owing to the chosen normalisation of the dirty image for peak estimation, the dirty peak value consistently overestimates the true peak value, so that the real target dynamic ranges are below these values4 .In this context, we have used a single denoiser trained for target dynamic range 4 × 10 5 , rescaling the inverse problems by the appropriate α at each band independently as in (C6).In other words, after rescaling, and as per (C7), the inverse problem at each band is affected by a noise of standard deviation τ /α √ 2L equal to the training noise level of the chosen denoiser, i.e. σ = 2.5 × 10 −6 Jy.
uSARA and AIRI denoisers were respectively applied on 8 × 8 and 2 × 2 facet decompositions of the images.The low and high bands data were decomposed into 12 and 14 w-stacks in (E8), with Φ † Φ encoded via the holographic matrices H p .At low and high bands respectively, this enabled to lower memory requirements from 470 and 645 gigabytes needed to host the de-gridding matrices G p , down to 81 and 159 gigabytes.Finally, in WSClean, multiscale CLEAN utilised 72 w-stacks for both bands.

Figure 1 .
Figure 1.ESO 137-006: uSARA reconstructions (flip pages to visualise differences at a glance with AIRI in Figure 2 and CLEAN in Figure 3).First and second rows: recovered model images (Jy/pixel, displayed in log 10 scale) at high and low bands (panels (a) and (d)), respectively, overlaid with zooms on the core of ESO 137-006 (panels (b) and (e)), a region with compact sources from the imaged FoV (panel (c)), and a zoom on ESO 137-007, a radio galaxy North of ESO 137-006 (panel (f)).Third row: spectral index map of ESO 137-006 (displayed in linear scale, panel (g)), overlaid with a zoom on its core (panel (h), same region as in panels (b) and (e)).Focusing on the central region (panels (b) and (e)), the first sidelobe of the dirty beam is highlighted with a dashed circle.One can see three filaments emerging: T 1 and T 2 , located North and South of the inner core, seen at both bands, and T 3 , located further South, recovered only at low band.A fourth filament T 0 , detected previously, is also recovered.The filamentary structure S is an example of what mostly likely is a calibration residual artefact, as it moves with the geometry of the dirty beam.The spectral index values of the newly formed filaments (panel (h)) are inconclusive.

Figure 2 .
Figure 2. ESO 137-006: AIRI reconstructions (flip pages to visualise differences at a glance with uSARA in Figure 1 and CLEAN in Figure 3).First and second rows: recovered model images (Jy/pixel, displayed in log 10 scale) at high and low bands (panels (a) and (d)), respectively, overlaid with zooms on the core of ESO 137-006 (panels (b) and (e)), a region with compact sources from the imaged FoV (panel (c)), and a zoom on ESO 137-007, a radio galaxy North of ESO 137-006 (panel (f)).Third row: spectral index map of ESO 137-006 (displayed in linear scale, panel (g)), overlaid with a zoom on its core (panel (h), same region as in panels (b) and (e)).Focusing on the central region (panels (b) and (e)), the first sidelobe of the dirty beam is highlighted with a dashed circle.Similarly to uSARA reconstructions, one can see three filaments emerging: T 1 and T 2 , located North and South of the inner core, seen at both bands, and T 3 , located further South, recovered only at low band.A fourth filament T 0 , detected previously, is also recovered.The filamentary structure S is an example of what mostly likely is a calibration residual artefact, as it moves with the geometry of the dirty beam.The spectral index values of the newly formed filaments (panel (h)) are inconclusive.

Figure 3 .
Figure 3. ESO 137-006: CLEAN reconstructions (flip pages to visualise differences at a glance with uSARA in Figure 1 and AIRI in Figure 2).First and second rows: recovered convolved model images (Jy/pixel, displayed in log 10 scale) at high and low bands (panels (a) and (d)), respectively, overlaid with zooms on the core (panels (b) and (e)), a region with compact sources from the imaged FoV (panels (c) and (c')), and a zoom on ESO 137-007, a radio galaxy North of ESO 137-006 (panels (f) and (f')).Third row: estimated spectral index map of ESO 137-006 (displayed in linear scale, panel (g)), overlaid with a zoom on its core (panel (h), same region as in panels (b), (b') and (e)).Firstly, regarding the central region, the first sidelobe of the dirty beam is highlighted with a dashed circle.Unlike uSARA and AIRI reconstructions, only one new filament T 2 has clearly emerged South of the core at high band.The previously detected filament T 0 is recovered at both bands.The filamentary structure S is an example of what mostly likely is a calibration residual artefact, as it moves with the geometry of the dirty beam.Inspection of the zooms on the core from CLEAN model images (panels (c') and (e')) confirms these findings.Secondly, looking at the zooms on the compact sources, one can see that some faint sources are not recovered in the CLEAN model image (panel (c)).Instead, they are left in the residual, and so only visible in the CLEAN restored image (panel (c')).Finally, a close look at ESO 137-007 in the convolved model image (panel (f)), indicates that much of the filamentary structure is not captured, and is only visible on the CLEAN restored image (panel (f')).

Table 1 .
Flux measurements of the AGN in ESO 137-006 at both bands and its spectral index values, recovered by uSARA, AIRI, and CLEAN.

Table 2 .
Computational cost of uSARA, AIRI, and CLEAN in CPU hours (h).Specifically to AIRI, GPU h used by the DNN denoiser are also reported.