Task-based detectability in anatomical background in digital mammography, digital breast tomosynthesis and synthetic mammography

Objective. Determining the detectability of targets for the different imaging modalities in mammography in the presence of anatomical background noise is challenging. This work proposes a method to compare the image quality and detectability of targets in digital mammography (DM), digital breast tomosynthesis (DBT) and synthetic mammography. Approach. The low-frequency structured noise produced by a water phantom with acrylic spheres was used to simulate anatomical background noise for the different types of images. A method was developed to apply the non-prewhitening observer model with eye filter (NPWE) in these conditions. A homogeneous poly(methyl methacrylate) phantom with a 0.2 mm thick aluminium disc was used to calculate 2D in-plane modulation transfer function (MTF), noise power spectrum (NPS), noise equivalent quanta, and system detective quantum efficiency for 30, 50 and 70 mm thicknesses. The in-depth MTFs of DBT volumes were determined using a thin tungsten wire. The MTF, system NPS and anatomical NPS were used in the NPWE model to calculate the threshold gold thickness of the gold discs contained in the CDMAM phantom, which was taken as reference. Main results. The correspondence between the NPWE model and the CDMAM phantom (linear Pearson correlation 0.980) yielded a threshold detectability index that was used to determine the threshold diameter of spherical microcalcifications and masses. DBT imaging improved the detection of masses, which depended mostly on the reduction of anatomical background noise. Conversely, DM images yielded the best detection of microcalcifications. Significance. The method presented in this study was able to quantify image quality and object detectability for the different imaging modalities and levels of anatomical background noise.


Introduction
The development of digital breast tomosynthesis (DBT) has paved the way to diverse implementations of pseudo-volumetric imaging in mammography (Sechopoulos 2013a, 2013b). This has also led to the development of synthetic mammography (SM) imaging, in which a 2D image is generated from DBT projections and/or reconstructed planes, with the aim of substituting digital mammography (DM) images (Durand 2018). The development of breast phantoms that include anatomical background structures to assess the imaging performance of DM, DBT and SM devices aims to provide a direct and relevant evaluation of the detectability of structures such as microcalcifications and mass-like lesions (Hadjipanteli et al 2017, 2019, Tanguay et al 2019, Marshall and Bosmans 2022). Breast phantoms which mimic anatomical breast noise and some low-frequency and high-frequency structures for task-based performance evaluation have been recently developed. The phantoms used in these studies attempt to match breast anatomical noise in terms of magnitude and correlation of structures (Kiarashi et al 2015, Cockmartin et al 2017, Ikejimba et al 2017, Glick and Ikejimba 2018).
The use of Fourier-based metrics such as presampling modulation transfer function (MTF), noise power spectrum (NPS), noise equivalent quanta (NEQ) and detective quantum efficiency (DQE) (ICRU Report 54 1996, Cunningham 2000) to quantify detector or system imaging properties is well-established (Zhao et al 2009, 2017, Marshall et al 2011, Marshall and Bosmans 2012). The NEQ metric and model observers in the spatial frequency domain have been generalized to include the influence of anatomical noise and scattered radiation for CBCT and tomosynthesis (Tward and Siewerdsen 2008, Gang et al 2010, Reiser and Nishikawa 2010, Prakash et al 2011). The detectability of low-frequency and high-frequency objects calculated from Fourier-based metrics has been shown to correlate well with observer performance for a wide range of imaging parameters in DBT (Gang et al 2011). In previous work, we used cascaded linear system theory to include the scatter fraction and the anti-scatter device in a global system DQE, and in this way described the SNR transfer through the complete imaging system (Monnin et al 2017). Recent work has applied this method to the evaluation of global system performance of several DBT systems with different image reconstruction algorithms (Monnin et al 2020), yielding a methodology to assess Fourier metrics and detection performance of DBT systems using a non-prewhitening with eye filter (NPWE) model observer.
This work extends the method described in Monnin et al (2020) to calculate a detectability index which includes a structured anatomical noise term. Fourier-based image quality metrics were calculated for the DM, DBT and SM imaging modes of three mammography systems at three breast equivalent thicknesses. The threshold thicknesses of the CDMAM phantom discs were compared to those given by the NPWE model. The good agreement obtained between the observer model and the contrast-detail analysis established a threshold detectability index that applied to all the imaging modalities and conditions considered in this study. This threshold index was then used to estimate the threshold diameter of spherical microcalcifications and masses for different levels of system noise (detector air kerma) and anatomical background noise.

Mammography systems and imaging setup
Three flat panel-based mammography systems were involved in this study: a GEHC Senographe Pristina (GE HealthCare, Chicago, USA), a Hologic Selenia Dimensions (Hologic, Massachusetts, USA) and a Siemens Revelation (Siemens Healthineers, Germany). See table 1 for their technical characteristics. For each system, a phantom made of 180 mm × 240 mm plates of poly(methyl methacrylate) (PMMA) was imaged in three thicknesses: 30 mm, 50 mm and 70 mm. The tube voltage, anode/filter (A/F), tube current-time product (mAs) and grid configuration of these three PMMA thicknesses imaged under automatic exposure control (AEC) are shown in table 2. Spacers were used when establishing these technique factors so that the total thickness (PMMA + spacers) was equal to the breast equivalent thickness for a given PMMA thickness (Dance et al 2000). For DM mode, 'For Processing' and 'For Presentation' images were acquired, referred to respectively as 'raw' and 'processed' DM images. DBT scans were performed and DBT planes and SM images were generated using the standard clinical image reconstruction and processing (table 1). A 1 mm reconstructed DBT plane spacing was used for all systems.
As described in Monnin et al (2020), a phantom made of PMMA plates (the 'NPWE phantom') was imaged in three thickness configurations (30, 50 and 70 mm) to measure the 2D MTF and the 2D NPS required in the NPWE observer model to calculate the detectability index. The NPWE phantom contained a 0.2 mm thick aluminium disc of 50 mm diameter placed on top of 20 mm of PMMA at the reference point (van Engen 2013), 6 cm from the chest wall side and laterally centred. Four DM and DBT acquisitions were made for each of the three NPWE phantom thicknesses, with the phantom slightly moved between each acquisition to produce four different samples of the disc. An additional acquisition of the 50 mm NPWE phantom placed side by side with a structured phantom (the 'L1 phantom') was made. The L1 phantom is a 48 mm thick semi-cylindrical PMMA box filled with water and acrylic beads of different diameters that generates images containing structured noise similar to DM and DBT images of breasts (Cockmartin et al 2017, Vancoillie et al 2021).
Images of the CDMAM phantom were acquired at the same thicknesses and conditions used for the NPWE phantom for the DM and DBT imaging modes. Attenuation of the CDMAM phantom is approximately equivalent to that of 10 mm PMMA (CDMAM manual, Artinis), and therefore a 10 mm PMMA plate was replaced by CDMAM. The CDMAM phantom was always positioned on top of 20 mm of PMMA, ensuring a fixed height above the detector for all acquisitions. The phantom was slightly moved between each of eight similar acquisitions.
We consider imaging systems composed of an anti-scatter grid (if used), a detector, image reconstruction (DBT and SM) and image processing stages (processed DM, DBT and SM). The values of all quantities are expressed in the detector (image) plane considered as the reference plane. The air kerma, the mean photon fluence (photons/mm²) and the scatter fraction are denoted as K, q̄ and SF, respectively. When necessary, the subscripts in or out specify whether these quantities correspond to values at the grid input or output, respectively. For the modalities which do not use an anti-scatter grid, the quantities with the subscripts in and out are therefore equal.
For air kerma measurements, the relevant PMMA phantom was held in the compression plate at a height of 2 cm above the breast support table (see figure 2 in Monnin et al 2020). A calibrated ionization chamber was positioned at the reference point (van Engen 2013) above the system (grid) entrance plane, 1 cm below the PMMA/compression plate. The measured air kerma includes primary and scattered radiation at the system entrance (grid entrance if used for imaging). An inverse square distance correction was applied to express the input air kerma at the detector (image) plane, noted K_in in table 2. The detector air kerma (DAK) is the product of K_in and the total grid transmission. DAK is equal to K_in for the case without grid. The product of K_in with the photon fluence per air kerma unit (Boone 1998), noted j in table 2, gave the input photon fluence q̄_in. The photon fluence per air kerma unit j depends only on the x-ray spectrum and is the same for the configurations grid in or grid out.
The NPWE phantom contains a thin 0.2 mm Al disc used to calculate the impulse response function (IRF) of the imaging system. This disc produces a radiant contrast Δq̄_IRF (zero-frequency signal), which is the difference in photon fluence measured with and without an additional 0.2 mm Al plate fixed at the tube output. We assume the relative disc contrast Δq̄_IRF/q̄ depends on the disc thickness (T_Al) and on the linear attenuation coefficient of aluminium (μ_Al) as given in equation (1), neglecting the effect of the thin 0.2 mm Al disc on the SF. For a given disc, Δq̄_IRF/q̄ varies with the x-ray beam energy, determined by the tube voltage, the anode/filter (A/F) combination and the phantom thickness, but neither with the SF produced in the phantom nor with the total grid transmission.
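As an illustration of how the relative disc contrast depends on T_Al and μ_Al, the following minimal sketch assumes simple exponential (Beer-Lambert) attenuation with a single effective attenuation coefficient. Both the functional form and the coefficient values are assumptions made here for illustration; they are not taken from equation (1) or from table 2.

```python
import math


def relative_disc_contrast(mu_al_per_mm: float, t_al_mm: float = 0.2) -> float:
    """Relative radiant contrast of a thin absorber.

    Assumes Beer-Lambert attenuation with one effective coefficient
    (an assumption of this sketch, not the paper's stated equation).
    """
    return 1.0 - math.exp(-mu_al_per_mm * t_al_mm)


if __name__ == "__main__":
    # Hypothetical effective attenuation coefficients of aluminium (mm^-1).
    for mu in (0.35, 0.80, 1.20):
        contrast = relative_disc_contrast(mu)
        print(f"mu_Al = {mu:.2f} mm^-1 -> relative contrast = {contrast:.3f}")
```

Under this assumption the contrast falls as the beam hardens (lower effective μ_Al), which is the qualitative dependence on beam energy described above.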
SFs were measured for 30, 50 and 70 mm PMMA using the beam stop method described in Monnin et al (2017). Lead discs with radii between 1.5 mm and 6 mm were positioned on the PMMA thickness at the reference point and imaged under the same conditions as the NPWE and CDMAM phantoms (table 2). The linear response function between 'For Processing' pixel values and DAK was established from the raw projection images for all the DM and DBT beams (van Engen 2013). Pixel values measured within the disc in the 'For Processing' projection images were converted to DAK values using the linear response functions. The SFs measured with grid in and out gave SF_out and SF_in, respectively. The total grid transmission (T_g) was calculated from the ratio of DAK measured using a 5 × 5 mm² ROI at the reference point in the linearized projection images, acquired with grid in and grid out. The primary grid transmission T_p, scatter grid transmission T_s and grid DQE (DQE_grid) were determined from T_g, SF_in and SF_out using equations (2), (3) and (4), respectively (Monnin et al 2017). The SF increases with the x-ray path length in the phantom, and thus with the projection angle for DBT. The x-ray path length increases with the inverse of the cosine of the projection angle, leading to a 10% increase for the maximum projection angle of 25° (Siemens Revelation). The difference in SF between the central and the outer projections in DBT therefore remains below a few percent and was not considered in this study.
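The grid quantities above follow from a balance of primary and scattered fluence across the grid. The sketch below is one way to write that balance; it assumes the common definitions T_p = T_g(1 − SF_out)/(1 − SF_in), T_s = T_g·SF_out/SF_in and DQE_grid = T_p²/T_g, which may differ in form from equations (2) to (4). The function names and the numerical inputs are hypothetical.

```python
import math


def grid_metrics(t_g: float, sf_in: float, sf_out: float):
    """Return (T_p, T_s, DQE_grid) from total transmission and scatter fractions.

    Assumed definitions (not copied from the paper):
      primary balance:  T_p * (1 - SF_in) = T_g * (1 - SF_out)
      scatter balance:  T_s * SF_in       = T_g * SF_out
      grid DQE:         T_p**2 / T_g
    """
    t_p = t_g * (1.0 - sf_out) / (1.0 - sf_in)
    t_s = t_g * sf_out / sf_in
    dqe_grid = t_p ** 2 / t_g
    return t_p, t_s, dqe_grid


def path_length_factor(angle_deg: float) -> float:
    """Relative increase of the x-ray path length at a given projection angle."""
    return 1.0 / math.cos(math.radians(angle_deg))


if __name__ == "__main__":
    # Hypothetical grid: T_g = 0.45, SF_in = 0.50, SF_out = 1/6.
    print(grid_metrics(0.45, 0.50, 1.0 / 6.0))
    print(f"path length factor at 25 deg: {path_length_factor(25.0):.3f}")
```

With these definitions the grid DQE can exceed 1 when the scatter fraction at the grid input is high, because the reference is the total (primary plus scatter) input fluence.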

In-plane Fourier metrics: IRF, MTF, NPS, NEQ and system DQE
The use of Fourier-based metrics to characterize properties of medical imaging devices requires the approximation of a stationary and spatially invariant imaging system (Metz and Doi 1979, Cunningham 2000). We consider that pixel values d are spatially invariant and depend linearly on the photon fluence per pixel q, with a gain ∂d/∂q. This local linear relationship was considered for all image types: raw and processed DM images, reconstructed DBT planes and SM images. Although image reconstruction and image processing algorithms use logarithmic transforms that are nonlinear processes, processed DM images, reconstructed DBT planes and SM images were considered linear in a small-signal approximation for a small range of signal variations (contrast) in the images (Tward and Siewerdsen 2008, Zhao and Zhao 2008). The 0.2 mm aluminium disc in the NPWE phantom was used to produce a signal (radiant contrast) sufficiently small (between 6% and 22%, table 2) so that the system gain could be considered constant, and the log-normalization performed in the DBT (and by implication in the SM reconstructions) could be considered linear in a small-signal approximation. The impulse response function (IRF), NPS, NEQ and system detective quantum efficiency (system DQE) were therefore calculated in the Fourier space to characterize the image quality parameters of DM, DBT and SM images, following the calculation methodology detailed in previous work (Monnin et al 2020), and summarized below.
In this study, the following coordinate system was used: x for the left-right direction, y for the front-back direction and z for the vertical direction. DBT produces a 3D IRF, which was expressed as the product of two separable functions governed by different physical parameters in the in-plane xy- and z-components according to equation (5). The case for 2D (DM and SM) images is obtained by setting f_z = 0 in equations (5) and (6), where the in-plane IRF reduces to the xy-component of the IRF. The in-plane IRF was measured in the image of the 0.2 mm Al disc of the NPWE phantom with a radial version of the angled edge method (Samei, Flynn and Reimann 1998).
Extension of the method to radial coordinates is detailed in Monnin et al (2016). Radial edge spread functions (ESF) originating from the disc centre were plotted every 2°, in a square ROI of size 100 × 100 mm², each of them covering a 4° angular aperture. Each radial ESF produced a radial IRF. The in-plane 2D IRF was calculated from the 180 radial IRFs using the angular interpolation method described in Monnin et al (2016). The in-plane system MTF was defined as the in-plane system IRF normalized to 1.0 at its maximum value. Unlike the IRF, the MTF does not vary with q̄, system gain or disc contrast, and was used to compare the signal frequency transfer between the different systems and acquisition modes.
The 3D NPS of DBT stacks were measured from two homogeneous volumes of interest (VOIs) of 60 × 60 mm² placed in the NPWE phantom, on both sides of the aluminium disc, at 60 mm from the disc centre and 60 mm from the chest wall side (figure 1 in Monnin et al 2020). The first and last images of the stack were excluded from the VOIs. Computation was made without detrending correction, leading to low-frequency peaks in the NPS due to signal trends present in the VOIs. The in-plane 2D NPS, denoted NPS_d,ip, was obtained by integrating the 3D NPS over the z-frequency bandwidth (Siewerdsen et al 2002, Zhao and Zhao 2008).
The in-plane 2D NPS for DM and SM images were calculated from 2D regions of interest (ROIs) instead of VOIs.
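A minimal sketch of a 2D NPS estimate from a set of ROIs is given below, using a standard periodogram average. It assumes square-pixel ROIs of equal size and removes only the mean of each ROI (no polynomial detrending); the function name and the normalization convention are choices made for this illustration rather than the exact procedure of Monnin et al (2020).

```python
import numpy as np


def nps_2d(rois, pixel_pitch_mm):
    """Average periodogram NPS (pixel value^2 mm^2) of a stack of ROIs.

    rois: array of shape (n_roi, ny, nx); only the ROI mean is removed.
    The zero frequency is shifted to the centre of the returned array.
    """
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    fluct = rois - rois.mean(axis=(1, 2), keepdims=True)
    spectra = np.abs(np.fft.fft2(fluct)) ** 2
    nps = spectra.mean(axis=0) * pixel_pitch_mm ** 2 / (nx * ny)
    return np.fft.fftshift(nps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical white-noise ROIs, for illustration only.
    rois = rng.normal(100.0, 3.0, size=(8, 64, 64))
    nps = nps_2d(rois, pixel_pitch_mm=0.1)
    # With this normalization the NPS integrates to the pixel variance.
    print(nps.sum() / (64 * 64 * 0.1 ** 2), rois.var(axis=(1, 2)).mean())
```

The normalization is chosen so that integrating the NPS over the frequency plane returns the pixel variance, which gives a simple consistency check on any implementation.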
The NEQ and the system DQE (DQE_sys) have been shown to quantify image quality and the detective efficiency of the imaging system including the scattered radiation, the anti-scatter grid, the reconstruction stages and the image processing (Monnin et al 2017, 2020). These metrics were calculated from the in-plane IRF and NPS according to equations (8) and (9), respectively.
The mean radial MTF, NEQ and DQE_sys curves are radial averages of the corresponding 2D metrics, excluding the 0° and 90° axial values.
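The sketch below illustrates how NEQ, system DQE and their radial averages could be computed from a measured in-plane IRF and NPS. It assumes that the IRF is expressed in pixel value units, so that the system gain cancels and NEQ = IRF² / [(Δq̄_IRF/q̄)² · NPS], and that DQE_sys = NEQ / q̄_in. These forms are assumptions of this illustration and may differ in detail from equations (8) and (9); all function names are hypothetical.

```python
import numpy as np


def neq_and_system_dqe(irf, nps, rel_contrast, q_in):
    """NEQ (mm^-2) and system DQE under the assumptions stated in the text."""
    irf = np.asarray(irf, dtype=float)
    nps = np.asarray(nps, dtype=float)
    neq = irf ** 2 / (rel_contrast ** 2 * nps)
    return neq, neq / q_in


def radial_mean_off_axis(metric, fx, fy, n_bins=16):
    """Radial average of a 2D metric, excluding the f_x = 0 and f_y = 0 axes."""
    gx, gy = np.meshgrid(fx, fy)
    fr = np.hypot(gx, gy)
    keep = (gx != 0) & (gy != 0)
    edges = np.linspace(0.0, fr[keep].max(), n_bins + 1)
    idx = np.clip(np.digitize(fr[keep], edges) - 1, 0, n_bins - 1)
    values = np.asarray(metric, dtype=float)[keep]
    sums = np.bincount(idx, weights=values, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    filled = counts > 0
    return centres[filled], sums[filled] / counts[filled]


if __name__ == "__main__":
    fx = fy = np.fft.fftfreq(32, d=0.1)
    # Hypothetical flat IRF and NPS, for illustration only.
    neq, dqe = neq_and_system_dqe(np.full((32, 32), 2.0),
                                  np.full((32, 32), 1.0), 0.1, 800.0)
    print(radial_mean_off_axis(dqe, fx, fy)[1][:3])
```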

In-depth resolution (MTF_z)
In-depth resolution of DBT reconstructions was characterized using MTF_z and measured using the thin wire method (Li et al 2007, Hu et al 2008, Marshall and Bosmans 2012). A thin tungsten wire of diameter 20 μm was stretched vertically in air from the compression plate to the breast table, tilted by 20° in the front-back y-direction and by 2-3° in the left-right x-direction. The wire was imaged with the same kV and A/F used for the phantoms (table 2), but with a low mAs value to avoid saturation of pixel values. For each image in the stack (z-position), the voxel with the maximum intensity gave the (x, y) coordinates of the wire position in the DBT planes. A linear least squares regression of x-coordinates to y-coordinates of the wire gave the azimuthal angle of the wire relative to the image matrix. A further linear curve fit of the z-coordinates to azimuthal positions of the wire gave the polar angle (inclination) of the wire in a spherical coordinate system. The intensities of all the voxels of the reconstructed volume having the same azimuthal coordinates as the wire were plotted as a function of their vertical distance (z-position) to the wire to obtain the oversampled PSF_z. The amplitude of the Fourier transform of the PSF_z was normalized to 1.0 at the zero frequency to give MTF_z.
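The last step of the wire method, going from an oversampled PSF_z to MTF_z, can be sketched as follows. The sketch assumes the PSF_z has already been resampled onto a uniform z grid and background-subtracted; the function name and the Gaussian test profile are hypothetical.

```python
import numpy as np


def mtf_from_psf(psf, dz_mm):
    """Return (frequencies in mm^-1, MTF) from a uniformly sampled PSF.

    The MTF is the modulus of the Fourier transform of the PSF,
    normalized to 1.0 at zero frequency.
    """
    psf = np.asarray(psf, dtype=float)
    spectrum = np.abs(np.fft.rfft(psf))
    freqs = np.fft.rfftfreq(psf.size, d=dz_mm)
    return freqs, spectrum / spectrum[0]


if __name__ == "__main__":
    dz = 0.05                                   # oversampled z step (mm), assumed
    z = (np.arange(2048) - 1024) * dz
    sigma = 0.8                                 # hypothetical PSF width (mm)
    freqs, mtf = mtf_from_psf(np.exp(-z ** 2 / (2 * sigma ** 2)), dz)
    # For a Gaussian PSF the MTF is exp(-2 pi^2 sigma^2 f^2).
    print(np.max(np.abs(mtf[:20] - np.exp(-2 * np.pi ** 2 * sigma ** 2 * freqs[:20] ** 2))))
```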

Anatomical breast noise
In order to obtain the anatomical background NPS, denoted NPS_d,a, the L1 and the 50 mm NPWE phantoms were rotated by 90°, positioned adjacent to each other at the chest wall edge and imaged together. These two phantoms yielded equivalent attenuation and hence the same mean pixel value and system noise level (quantum, fixed pattern and electronic noises). Acquisition factors relevant to each imaging system for 50 mm phantom thickness were set and three images acquired for each system. The in-plane NPS of system noise was measured in the NPWE phantom, and was subtracted from the in-plane NPS measured in the L1 phantom to give the anatomical in-plane NPS (equation (10)). The measured anatomical NPS was then fitted to the empirical power law relationship used to describe the power spectrum of anatomical structures (Gang et al 2010, Reiser and Nishikawa 2010, Cockmartin et al 2013) given in equation (11), NPS_d,a(f) = κ/f^β, with the coefficients κ and β quantifying the magnitude and correlation of the structured noise. We are therefore assuming that the structures in the L1 phantom model the structures in a typical breast to some acceptable degree (Cockmartin et al 2013). Images of anatomical background noise were generated and added to CDMAM images to quantify its deleterious effect on object detectability. For this purpose, a homogeneous image with a white Gaussian noise of random phase was generated, Fourier-transformed and filtered to follow the power law of equation (11) (Bochud et al 1995, Båth et al 2005a, 2005b, Reiser and Nishikawa 2010). The NPS magnitude of the generated noise was then adjusted to correspond to that measured on the images of the L1 phantom. An inverse Fourier transform yielded a simulated structure that was statistically equivalent, in terms of NPS, to the image of anatomical noise. An anatomical noise image with a different random noise texture was added to each of the 8 CDMAM images of equivalent PMMA thickness 50 mm, as shown in figure 1. For the two other thicknesses of 30 and 70 mm, the magnitude of the anatomical NPS measured on the 50 mm thick phantom was converted to another thickness T using equation (12). The effective slab thickness Δz is the integral of the in-depth PSF_z over the breast thickness T and is equal to T for DM and SM images, for which PSF_z = 1.0.
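The noise synthesis described above (white Gaussian noise, Fourier transform, power-law filtering, inverse transform) can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the amplitude filter is taken as the square root of κ/f^β, the zero-frequency term is set to zero, and the NPS normalization is the periodogram convention (pixel pitch² divided by the number of pixels). The function name and parameter values are hypothetical.

```python
import numpy as np


def power_law_noise(shape, pixel_pitch_mm, kappa, beta, rng):
    """Zero-mean noise image whose expected NPS is kappa / f**beta."""
    ny, nx = shape
    fx = np.fft.fftfreq(nx, d=pixel_pitch_mm)
    fy = np.fft.fftfreq(ny, d=pixel_pitch_mm)
    fr = np.hypot(*np.meshgrid(fx, fy))
    fr[0, 0] = np.inf                 # removes the undefined zero-frequency term
    target_nps = kappa / fr ** beta
    white = rng.standard_normal(shape)
    # Unit-variance white noise has NPS = pitch^2, hence the division by pitch.
    spectrum = np.fft.fft2(white) * np.sqrt(target_nps) / pixel_pitch_mm
    return np.fft.ifft2(spectrum).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical kappa and beta, for illustration only.
    img = power_law_noise((128, 128), 0.1, kappa=1e-3, beta=3.0, rng=rng)
    f = np.fft.fftfreq(128, d=0.1)
    fr = np.hypot(*np.meshgrid(f, f))
    nps = np.abs(np.fft.fft2(img)) ** 2 * 0.1 ** 2 / 128 ** 2
    slope = np.polyfit(np.log(fr[fr > 0]), np.log(nps[fr > 0]), 1)[0]
    print(f"fitted beta = {-slope:.2f}")
```

A log-log linear fit recovers β from a single realization, but the fitted magnitude from one periodogram is biased low, so κ would normally be estimated from an average over several ROIs.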

NPWE observer model
We consider the detectability of a spherical object of radius R embedded in a homogeneous thickness, with a 3D shape spectrum S_obj(f_x, f_y, f_z). The NPWE detectability index d′_3D for a 3D object uses the 3D IRF_d and the 3D NPS_d, and is written in terms of k(R), which represents the NPWE detectability index value for an object that produces a radiant contrast equal to that of the 0.2 mm Al disc of the NPWE phantom.
Assuming separability of the xy- and z-components of the 3D IRF, equation (16) transforms to equation (17). Using equation (6), equation (18) gives a practical form of equation (17) that involves the in-plane IRF, in-plane NPS, and MTF_z measured in the images.
For DM and SM images, the slice projection theorem gives the projection obtained by setting f_z = 0 in equation (18). The 3D NPWE model for projection images reverts to the 2D form given in equation (19). All the CDMAM discs are much thinner than the DBT plane thickness T_z, leading to a sinc function that is approximately 1.0 below the Nyquist frequency f_z = 0.5/T_z. As a result, the discs can be treated as 2D objects. The term Δq̄_obj/q̄ in equation (15) represents the relative radiant contrast of the CDMAM gold discs and is proportional to the disc thickness T when Δμ·T < 0.1, which is the case for all the CDMAM discs. The term Δμ is the difference in mean linear attenuation coefficients between the gold disc and the PMMA + Al base of the CDMAM phantom for a given x-ray spectrum. Equations (15), (19) and (23) were used to determine the threshold gold disc thickness (T_T) that corresponds to a threshold detectability index d′_T.

Threshold diameter of a spherical object
We consider spherical objects of radius R embedded in a homogeneous thickness (equation (26)). The object shape spectrum corresponds to equation (27) (Bateman and Erdélyi 1954). The maximum radiant contrast (x-ray attenuation) Δq̄_obj occurs along the diameter of the sphere and depends on Δμ, the difference between the mean linear attenuation coefficients of the spherical object and the background mammography tissue.
Microcalcifications and masses were considered as spheres made of calcium oxalate (CaC₂O₄) and glandular tissue, respectively. The background mammographic tissue was considered as a mixture of adipose and glandular tissues whose glandular fraction (by weight) depended on the thickness: 67%, 20% and 4% glandularity for the PMMA thicknesses 30, 50 and 70 mm, which are equivalent in attenuation to breast thicknesses 32, 60 and 90 mm, respectively (Dance et al 2000). The threshold object diameter of a spherical object is twice the threshold radius R_T that corresponds to a threshold value of the detectability index in the NPWE model observer, noted d′_T. R_T was calculated numerically using equation (30). R_T for the masses was calculated as a function of the level of the anatomical NPS by adjusting the magnitude coefficient in equation (11) from 0 to 2κ, i.e. from the absence of anatomical noise up to twice the level measured in the L1 phantom. R_T for the microcalcifications was calculated as a function of K_in, between 100 and 1000 μGy.
For this calculation, the impulse response for a linear system was considered proportional to the air kerma at the system input K_in (equation (31)). Considering quantum limited systems, the system NPS (NPS_d,s) is proportional to K_in to a good approximation, and the anatomical NPS (NPS_d,a) is proportional to K_in² (Mainprize and Yaffe 2010). The variation of the in-plane NPS in the image was therefore calculated as a function of K_in according to equation (32).
The CDMAM images were scored using the CDCOM module and contrast-detail (c-d) curves were generated using the standard processing method described by Young et al (2006). In cases where the in-focus DBT plane was not obvious, planes around the plane considered to have the best focus were scored and the plane with the lowest contrast thresholds was taken to be the in-focus plane. Uncertainty on the threshold gold thickness was estimated using a bootstrap method.
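The air kerma scaling of the signal and noise terms described above (IRF proportional to K_in, system NPS proportional to K_in, anatomical NPS proportional to K_in²) can be sketched as follows. The reference air kerma k_ref, at which the spectra are assumed to have been measured, and all function names are assumptions of this illustration.

```python
def scale_irf(irf, k_in, k_ref):
    """IRF at air kerma k_in, given the IRF measured at k_ref."""
    return irf * (k_in / k_ref)


def scale_nps(nps_system, nps_anatomical, k_in, k_ref):
    """Total in-plane NPS at k_in: system term ~ K, anatomical term ~ K^2."""
    ratio = k_in / k_ref
    return nps_system * ratio + nps_anatomical * ratio ** 2


def snr2_gain(nps_system, nps_anatomical, k_in, k_ref):
    """Ratio IRF^2/NPS at k_in relative to its value at k_ref."""
    ratio = k_in / k_ref
    total_ref = nps_system + nps_anatomical
    return ratio ** 2 * total_ref / scale_nps(nps_system, nps_anatomical, k_in, k_ref)


if __name__ == "__main__":
    # Hypothetical NPS values at one spatial frequency, doubling the air kerma.
    print(snr2_gain(1.0, 0.0, 200.0, 100.0))  # system noise only
    print(snr2_gain(0.0, 1.0, 200.0, 100.0))  # anatomical noise only
```

Under these proportionalities, raising K_in improves IRF²/NPS only at frequencies where system noise dominates; where anatomical noise dominates, the ratio is unchanged.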
The predictive value of the CDMAM results has not yet been established against human observers in the presence of anatomical background noise. The agreement between CDCOM scoring, the NPWE model and human performance was therefore evaluated. Six human observers read ten series of images of the CDMAM phantom, five with system noise only and five with anatomical background noise. Software developed in-house was used to crop and rotate each CDMAM square to prevent the observers learning the disc positions, which would bias the results. Each observer read two images in every series, giving 20 images in total for each reader. A Barco 10-megapixel radiology display (model MDMC-12133) (Barco, Kortrijk, Belgium) was used to display the images. During reading, the room lighting was subdued, giving an ambient light level of approximately 6 lux. A typical viewing distance of approximately 65 cm was suggested, although the readers were free to zoom the image content while scoring. The number of reading sessions ranged from 2 to 4 (median value of 3), spread over a period of approximately 3 weeks, depending on the availability of the observer at the hospital. The reading time was approximately 19 min per image, averaged over all images and observers. The series included homogeneous background (raw and processed DM, DBT and SM images for the Siemens and DBT images for the Hologic) and anatomical background (processed DM, DBT and SM images for the Siemens, and DBT and SM images for the Hologic). The average of the 12 readings from each series gave the threshold thickness of human reading for each disc diameter. The uncertainty in the threshold thickness was quantified using two times the standard error of the mean (SEM).
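The averaging of the human readings and the 2 × SEM uncertainty can be sketched as below. The array layout (one row per reading, one column per disc diameter), the use of the sample standard deviation and the function name are assumptions of this illustration.

```python
import numpy as np


def threshold_with_uncertainty(readings):
    """Mean threshold thickness and 2 x SEM for each disc diameter.

    readings: array of shape (n_readings, n_diameters), e.g. 12 readings
    per series. The sample standard deviation (ddof=1) is assumed.
    """
    readings = np.asarray(readings, dtype=float)
    mean = readings.mean(axis=0)
    sem = readings.std(axis=0, ddof=1) / np.sqrt(readings.shape[0])
    return mean, 2.0 * sem


if __name__ == "__main__":
    # Hypothetical threshold gold thicknesses (um): 2 readings x 2 diameters.
    mean, unc = threshold_with_uncertainty([[1.0, 2.0], [3.0, 4.0]])
    print(mean, unc)
```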

Acquisition parameters and x-ray beam characteristics
The AEC settings obtained for the three PMMA thicknesses on the three mammography units, along with the measured values of the photon fluence per unit air kerma, the relative radiant contrast for the 0.2 mm aluminium thickness and SF_in are given in table 2. The 0.2 mm aluminium disc in the NPWE phantom produced a signal (radiant contrast) between 6.4% and 21.7% (table 2). We assume that this is sufficiently small for the assumptions of constant system gain and small signal linearity of the log-normalization stage in the DBT and SM reconstructions to be valid. The air kerma at the system input (K_in) obtained under AEC for the homogeneous phantom varied widely among the systems and imaging modes, from 163 μGy for the Siemens Revelation (DM, 70 mm) to 822 μGy for the Selenia Dimensions (DBT, 70 mm). SF_in varied between 0.291 and 0.360 for the thickness 30 mm, between 0.460 and 0.504 for 50 mm and between 0.555 and 0.589 for 70 mm. This amounts to absolute variations in SF_in of less than 0.07 (7 percentage points) between the different imaging modes and systems at a given thickness, which can be explained by differences in x-ray spectra, geometry and compression plate size and composition.

In-plane Fourier metrics: MTF, NPS, NEQ and system DQE
The mean radial system MTF of raw DM images decreased to zero around 11 mm⁻¹ on the Selenia Dimensions and Revelation, and around 6 mm⁻¹ on the Pristina (figures 2(a)-(c)). The useful signal bandpass in the digital images is however limited by the Nyquist frequency determined at the sampling stage by the pixel pitch, resulting in spatial frequencies of 7.14, 5.88 and 5.00 mm⁻¹ for the Selenia Dimensions, Revelation, and Pristina, respectively. For all three systems, the image processing or reconstruction algorithm gave the system MTF a similar peak shape that contributed to edge enhancement on processed DM, DBT and SM images. The system MTFs of processed images peaked between 0.7 and 1.3 mm⁻¹ for the three systems. These results are consistent with in-plane MTFs previously measured in DM and DBT images (Zhao et al 2009, 2017, Marshall and Bosmans 2022). Unlike the other systems, the image processing applied on DM and DBT images increased the MTF compared to raw DM images on the Pristina system.
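The Nyquist frequencies quoted above follow from f_N = 1/(2p), where p is the pixel pitch. As a worked check, the pixel pitches used below (0.070, 0.085 and 0.100 mm) are inferred here from the quoted Nyquist frequencies rather than read from table 1, and should be treated as assumptions.

```python
def nyquist_frequency(pixel_pitch_mm: float) -> float:
    """Nyquist frequency (mm^-1) for a given sampling pitch (mm)."""
    return 0.5 / pixel_pitch_mm


if __name__ == "__main__":
    # Pixel pitches inferred from the quoted Nyquist frequencies (assumed).
    pitches_mm = {"Selenia Dimensions": 0.070, "Revelation": 0.085, "Pristina": 0.100}
    for name, pitch in pitches_mm.items():
        print(f"{name}: {nyquist_frequency(pitch):.2f} mm^-1")
```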
The mean radial NEQ curves of raw DM images describe the ability of the system to transfer signal and noise, and show that the imaging systems act as low-pass filters (figures 3(a)-(c)). The NEQ of raw DM images increased with the number of detected primary photons, and therefore increased with the DAK chosen by the AEC for a given system, the primary fraction at the detector and the detector DQE. For processed DM, DBT and SM images, image processing or reconstruction causes the NEQ to differ from the original low-pass shape determined by the detector MTF and NPS. The processing increased the NEQ of DM images acquired on the Selenia Dimensions and the Revelation systems, but not on the Pristina. This result shows that the image processing modifies the signal and noise transfers differently, amplifying the signal content and decreasing the noise to some extent. The Selenia Dimensions gave the highest NEQ of the studied systems for the DBT planes for the thickness 70 mm. The high exposure delivered by the AEC (table 2) contributes to this result. The NEQ frequency bandpass of DBT planes was reduced compared to that of DM images on the Selenia Dimensions and Revelation systems but was the same on the Pristina. The Revelation system gave the lowest NEQ of the study for the Insight 2D images for the thickness 70 mm. The NEQ of the Insight 2D images peaked at frequencies between 2 and 3 mm⁻¹ for the thicknesses 30 and 50 mm but was poor at lower and higher frequencies. This behaviour arises from strong noise filtration that dramatically reduces the high-frequency NPS. The NEQ of the C-View images was systematically lower than that of the corresponding DM images and DBT planes, especially at high spatial frequency.
Mean radial system DQE results are shown in figures 4(a)-(c). The system DQE measured on raw DM images is the product of the detector and the grid DQE. The raw DM images had a maximum system DQE between 0.50 and 0.65 for the three systems and three thicknesses. The grid DQE, and hence the system DQE, increased with SF_in and with the phantom thickness. Image reconstruction and processing algorithms can strongly modify the system DQE, confirming that signal and noise can be handled differently in these processes. The Pristina system gave similar DQE while the Selenia Dimensions and Revelation gave very different DQE for the different image types. The processed DM images gave the highest system DQE over the widest frequency range for the Selenia Dimensions and the Revelation systems. While image processing increased the system DQE of DM images to approximately 0.8 on the Selenia Dimensions and Revelation systems, the low-frequency DQE of processed DM images was decreased for the Pristina system. The DBT planes gave a lower DQE over a reduced bandwidth, compared to DM images. The DQE of DBT planes extended up to 5 mm⁻¹ on the Pristina and Selenia Dimensions but fell to zero at 3 mm⁻¹ on the Revelation. The SM images had a reduced frequency bandwidth that differed strongly for the three thicknesses on the Revelation and Selenia Dimensions systems. The comparison of DQE between the processed and the raw images shows the importance of the image processing and reconstruction algorithm for overall system efficiency.
The grid DQE ranged between 0.96 and 1.10 for the thickness 30 mm, between 1.23 and 1.35 for the thickness 50 mm and between 1.37 and 1.58 for the thickness 70 mm (table 3). The grid DQE curves plotted as a function of SF_in show that the cellular grid of the Selenia Dimensions system had a DQE between 7% and 15% higher than the linear grids of the Pristina and Revelation systems (figure 5).

In-depth resolution (MTF_z)
Although the DBT plane spacing was 1 mm for the three systems, the different systems and reconstructions gave large differences in in-depth resolution (figure 6). The MTF_z of the Siemens system reached zero at the z-frequency of 0.49 mm⁻¹ for an angular span of 50°. This cut-off frequency corresponds to an effective plane thickness of 2.04 mm, comparable to the FWHM of the measured PSF_z (2.05 mm). The Pristina gave the narrowest effective plane thickness of 1.56 mm, while the narrowest angular acquisition of 15° of the Selenia Dimensions gave a thicker effective plane of 2.95 mm.
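The effective plane thickness quoted above is consistent with taking the inverse of the MTF_z cut-off frequency (1/0.49 mm⁻¹ ≈ 2.04 mm). The sketch below reproduces this arithmetic and adds a simple FWHM estimate for a sampled, single-peaked PSF_z. The function names are hypothetical, and the FWHM routine uses linear interpolation of the half-maximum crossings, which is an assumption about how such a value might be obtained.

```python
import numpy as np


def effective_thickness_from_cutoff(f_cutoff_per_mm: float) -> float:
    """Effective plane thickness (mm) as the inverse of the MTF_z cut-off."""
    return 1.0 / f_cutoff_per_mm


def fwhm(z, psf):
    """FWHM of a single-peaked profile that falls below half maximum at both ends."""
    z = np.asarray(z, dtype=float)
    psf = np.asarray(psf, dtype=float)
    half = psf.max() / 2.0
    above = np.where(psf >= half)[0]
    i0, i1 = above[0], above[-1]
    # Linear interpolation of the rising and falling half-maximum crossings.
    left = np.interp(half, [psf[i0 - 1], psf[i0]], [z[i0 - 1], z[i0]])
    right = np.interp(half, [psf[i1 + 1], psf[i1]], [z[i1 + 1], z[i1]])
    return right - left


if __name__ == "__main__":
    print(f"{effective_thickness_from_cutoff(0.49):.2f} mm")
    z = np.linspace(-10.0, 10.0, 2001)
    print(f"{fwhm(z, np.exp(-z ** 2 / 2.0)):.3f} mm")  # Gaussian with sigma = 1 mm
```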

Background anatomical breast noise
The large differences in NPS magnitudes measured in the L1 phantom are mostly due to differences in the signal gains used for the different image types (figures 7(a)-(c)). The power-law noise parameters κ and β fitted to the anatomical NPS measured in the L1 phantom are shown in table 4. The power exponent β varied between 2.8 and 3.6 for the different systems, imaging modes and image types, in agreement with β values around 3.0 measured in mammograms (Chen et al 2012, Mainprize et al 2012, Cockmartin et al 2013, Hill et al 2013). The power-law parameters κ and β were the same for the raw and processed DM images. This result shows that the image processing applied to DM images did not significantly modify the frequency composition of the low-frequency anatomical structured noise. DBT planes gave a lower anatomical noise magnitude κ and power exponent β than DM images, in agreement with theoretical prediction (Metheany et al 2008) and previous measurements (Engstrom et al 2009). This result confirmed that DBT reduces the amount of structured noise in the reconstructed planes compared to DM projection images. The κ and β values of SM images lay between those of DM and DBT images, with an anatomical NPS magnitude higher than DBT but lower than DM. The increased angular span of the DBT acquisitions on the Siemens system reduced the noise magnitude as quantified by κ, showing that a wider angular DBT scan improves the out-of-plane clutter rejection and reduces the superposition of structured noise on the reconstructed planes (Yoon et al 2009).
The simulated anatomical NPS was able to match the NPS measured in the L1 phantom for all the systems and imaging modalities (figures 7(a)-(c)). This simulated anatomical noise was added to the CDMAM phantom images to study the effect of this noise on detectability. It is important to note that the aim of this study was not a precise determination of κ and β, but an investigation of their effect on the detectability given by the CDMAM and the NPWE model. Only a rough similarity between the simulated background NPS and the anatomical NPS magnitude measured in the L1 phantom was therefore considered in this study.
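How κ and β can be extracted from a measured spectrum may be illustrated with a short example. This is a sketch only, assuming the usual power-law form NPS(f) = κ/f^β and an ordinary least-squares fit in log-log space; the frequency range, the synthetic data and the function name are hypothetical and do not reproduce the fitting procedure of the study.

```python
import math


def fit_power_law(freqs, nps):
    """Fit NPS(f) = kappa / f**beta by linear least squares on log-log data.

    log(NPS) = log(kappa) - beta * log(f), so the slope is -beta and the
    intercept is log(kappa).
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(p) for p in nps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free spectrum (hypothetical values, low frequencies in mm^-1)
true_kappa, true_beta = 1e-5, 3.0
freqs = [0.1 * k for k in range(1, 11)]
nps = [true_kappa / f ** true_beta for f in freqs]

kappa, beta = fit_power_law(freqs, nps)
print(f"kappa = {kappa:.3g}, beta = {beta:.2f}")
```

On real data the fit would normally be restricted to the low-frequency range where the structured noise dominates the system noise.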

Validity of the NPWE model compared against the CDMAM phantom
The three free parameters of the VTF used in the NPWE model were adjusted to find the best correlation between the threshold gold thickness ($T_T$) of the CDMAM discs given by the CDCOM software (van Engen 2013) and by the NPWE model. The threshold gold thickness for CDMAM discs of different diameters corresponds to the smallest disc thickness visible in 62.5% of cases by a human observer (Young et al 2006). The parameters n = 0.65, c = 0.0012 cycle/° and a = 2.0 were used in equation (20), with a viewing distance of 400 mm without magnification. This VTF covered a wide frequency bandwidth up to 10 mm⁻¹ and gave a maximum response at 2.36 mm⁻¹ (figure 8).
The $T_T$ values obtained from the CDMAM images were plotted against the ratios $T/d'$ calculated with the NPWE model (figures 9(a)-(c)). A log-log linear correlation was found for all the systems and imaging conditions and for the three thicknesses. This correlation held both for images containing anatomical background noise and for those with a homogeneous noisy background. The least squares linear regression to all the CDMAM discs between 0.1 and 2.0 mm gave a Pearson linear correlation coefficient (PCC) of 0.980 for a slope set to 1.0 (black line in figures 9(a)-(c)). The NPWE observer model accurately predicted $T_T$ values over a broad range of conditions, suggesting that this model is suitable for image quality characterization over the range of thicknesses, background noise and imaging modalities considered, i.e. for DM raw and processed images, DBT planes and reconstructed SM images. The best numerical correspondence in threshold gold disc thickness ($T_T$) between the CDMAM and the NPWE model given by the linear log-log correlation was obtained for a threshold detectability index $d'_T$ = 2.92 ± 0.11 (uncertainty 2 sigma).
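The quoted peak position can be checked numerically. The sketch below assumes that equation (20) has the commonly used form VTF(ρ) = ρ^n exp(−c ρ^a), with ρ the angular frequency in cycles per degree, and that image-plane frequencies are converted with the small-angle relation ρ = f · D · π/180 for a viewing distance D. Both the functional form and the function names are assumptions made for illustration; the VTF is left unnormalized.

```python
import math

# VTF parameters and viewing distance reported in the text
N, C, A = 0.65, 0.0012, 2.0
VIEWING_DISTANCE_MM = 400.0


def cycles_per_degree(f_mm_inv, distance_mm=VIEWING_DISTANCE_MM):
    """Convert an image-plane frequency (mm^-1) to cycles/degree at the eye."""
    return f_mm_inv * distance_mm * math.pi / 180.0


def vtf(f_mm_inv):
    """Assumed eye filter rho**n * exp(-c * rho**a), unnormalized."""
    rho = cycles_per_degree(f_mm_inv)
    return rho ** N * math.exp(-C * rho ** A)


def vtf_peak_frequency(distance_mm=VIEWING_DISTANCE_MM):
    """Frequency (mm^-1) of maximum response: d(VTF)/d(rho) = 0 gives
    rho_peak = (n / (a * c)) ** (1 / a)."""
    rho_peak = (N / (A * C)) ** (1.0 / A)
    return rho_peak / (distance_mm * math.pi / 180.0)


peak = vtf_peak_frequency()
print(round(peak, 2))  # 2.36
```

Under these assumptions the maximum falls at about 16.5 cycles/°, i.e. 2.36 mm⁻¹ at 400 mm, which agrees with the value stated above.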
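The way a threshold detectability index follows from a log-log regression with unit slope can be sketched as follows. The example assumes that the NPWE abscissa is the ratio of disc thickness to detectability index, T/d', so that log T_T = log(T/d') + log d'_T and the least-squares intercept is simply the mean log difference. The data are synthetic and the function names are hypothetical; this is not the regression code used in the study.

```python
import math


def threshold_detectability(tt_cdmam, t_over_d):
    """Least-squares intercept of log10(T_T) versus log10(T/d') with the
    slope fixed to 1; returns the corresponding threshold index d'_T."""
    diffs = [math.log10(y) - math.log10(x) for x, y in zip(t_over_d, tt_cdmam)]
    return 10 ** (sum(diffs) / len(diffs))


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic, noise-free data built with d'_T = 2.92 (arbitrary thickness units)
t_over_d = [0.02, 0.05, 0.1, 0.3, 0.7]
tt_cdmam = [2.92 * x for x in t_over_d]

d_t = threshold_detectability(tt_cdmam, t_over_d)
pcc = pearson([math.log10(x) for x in t_over_d],
              [math.log10(y) for y in tt_cdmam])
print(f"d'_T = {d_t:.2f}, PCC = {pcc:.3f}")
```

With measured data the points scatter around the unit-slope line, and the PCC quantifies how well the single offset d'_T describes all disc diameters.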

Validity of the CDMAM phantom and NPWE model compared against human observers
The human reading results of the CDMAM phantom were in good agreement with CDCOM scoring for the plain background images (figure 10(a)). The averaged ratio of human to CDCOM threshold thicknesses for all disc diameters was 0.961. The human and CDCOM threshold thicknesses were less consistent in the anatomical background, where human observers performed slightly better, with a ratio of human to CDCOM results equal to 0.819. The error bars encompass the identity line except for the Insight 2D images, for which the human readers performed significantly better than CDCOM.
The human observers gave results in good agreement with those obtained with the NPWE model (figure 10(b)). As expected, the anatomical background resulted in a greater relative increase of threshold gold thickness for the large diameter discs compared to the smaller discs. The averaged ratios of human to NPWE threshold thicknesses for all disc diameters were 0.992 for the plain background images and 1.070 in the anatomical background. Human observers therefore performed slightly worse than the NPWE model in the anatomical background. All the error bars cut the line fitted between NPWE and CDCOM values, except for the five largest discs of the Insight 2D images with plain background.

Threshold diameter of microcalcifications (high-frequency tasks)
The threshold diameter $D_T$ of spherical microcalcifications (MCs) is the diameter of a sphere made of calcium oxalate (CaC₂O₄) that corresponds to the threshold detectability index $d'_T$ = 2.92 calculated using the 3D NPWE model observer. The detectability index of small objects, which are considered to be high-frequency tasks, depends mostly on the amount of system noise and has little dependence on low-frequency anatomical background noise (Ruschin et al 2007, Gang et al 2010, Vancoillie et al 2021). The system noise is largely governed by the DAK. $D_T$ was therefore determined without anatomical noise as a function of $K_{in}$, for $K_{in}$ values between 100 and
$D_T$ values obtained under the AEC conditions varied for the different imaging modalities within the range 1.7-3.4 mm for κ = 0, 2.7-19.6 mm for κ × 1 and 3.2-25.9 mm for κ × 2 (table 4). The smallest $D_T$ values were obtained on DM processed images of the Selenia Dimensions for κ = 0 and on DBT planes of the Selenia Dimensions for κ × 1 and κ × 2. The threshold diameters of masses calculated in our study for the phantom thickness of 50 mm are consistent with those found in Ikejimba et al (2021) and in Vancoillie et al (2021). Masses of diameter 5 mm in Ikejimba et al (2021) gave a PC between 0.6 and 0.8. Human observers in Vancoillie et al (2021) found threshold diameters between 2 and 6 mm (PC 62.5%). For the same PC of 62.5%, Hadjipanteli et al (2019) obtained $D_T$ values around 10 mm for DM and 6 mm for DBT on simulated images with anatomical background, slightly larger than our results for the thickness of 50 mm.

Discussion
This study has established a link between the CDMAM phantom, human reading and the NPWE model to study detectability in DM, DBT and SM. The NPWE model gave coherent and robust results compared to the CDMAM phantom and human reading for a range of discs between 0.1 and 2.0 mm in diameter, for PMMA thicknesses between 30 and 70 mm, and for DM, DBT and SM images with and without anatomical background noise. These results, valid for a wide range of x-ray beam energies, SF levels and noise characteristics, extend those published in previous work for DBT planes (Monnin et al 2020) and for raw DM images in a homogeneous noisy background (Monnin et al 2011).
The discs in the CDMAM phantom are considerably thinner than the reconstructed DBT plane spacing, allowing the detection of MCs to be treated as a 2D problem. In this study, we further considered masses whose threshold diameter exceeded the size of the CDMAM gold discs and was not negligible with regard to the effective DBT plane thickness, leading to the use of a 3D NPWE model. The 3D NPWE model may thus be less accurate or diverge for objects larger than 2.0 mm that cover a lower frequency range. Thick objects can also give a high signal that is not compatible with the assumption of small signal linearity required when applying this model. In addition, nonlinear iterative image processing can lead to sharpness and noise levels that depend on the contrast of the imaged structures. In this study, the assessment of detectability performance using transfer functions in the 3D NPWE model is a linear approximation to nonlinear systems and should ideally be used under task-based conditions. The MTF and NPS used in the NPWE model were calculated under the small contrast condition provided by the 0.2 mm Al disc. For thick objects with high contrast, the results could potentially differ from those obtained in this study, which only hold for low-contrast objects. Nevertheless, the extension of the NPWE model to a 3D formulation for tomosynthesis has been shown to have a reasonable correspondence with human observer performance over a broad range of imaging tasks and conditions in the presence of low-frequency background noise (Gang et al 2011). The 3D NPWE model has therefore been successfully used to explore the imaging performance and optimize the imaging parameters of CBCT and tomosynthesis for different imaging tasks like the detection of punctual (small), spherical and Gaussian objects (Gang et al 2010, Hu and Zhao 2011). The threshold diameters of masses for human observers in recent studies follow similar trends and are compatible with our results (Ikejimba et al 2021, Vancoillie et al 2021). The accuracy of our model must nevertheless be studied for large objects, and this is a limitation of the present study.
The hierarchy of detection performance for large (low-frequency) tasks like masses depended strongly on the amount of anatomical noise added to the images. We conclude that the use of detectability indices or threshold thicknesses calculated in homogeneous background is, as might be expected, not representative of clinical situations for comparing the performance of DM and DBT for low-frequency tasks. The anatomical noise reduced the detectability indices of masses, and the magnitude of this reduction depended on the in-depth resolution, which governs the ability of the system to reduce or restrict propagation of out-of-plane anatomical texture to the image plane containing the targets of interest. In our study, the system with a narrow angular span (Selenia Dimensions) could have similar or even superior detectability for masses to the wide-angle systems (Pristina and Revelation). This implies that other factors such as the reconstruction algorithm, in addition to
Conversely, detection of small (high-frequency) tasks like microcalcifications depends little on low-frequency anatomical noise, and mostly on the system (quantum and electronic) noise and hence on the DAK and primary x-ray detector efficiency. Our study showed that DBT did not improve the detection performance of microcalcifications, consistent with data in the literature (Gang et
characterized with homogeneous test-objects like the CDMAM phantom, if used with caution. The test-object must remain parallel to the reconstructed planes and the details must remain fully in the plane that is evaluated. The effective thickness of DBT planes measured in our study (between 2.05 and 2.68 mm) was much larger than the thickness of the CDMAM discs and ensured that the discs were fully included in a single DBT plane. A further assumption is that CDCOM is functioning as expected on the image type assessed; if this is in doubt, it must be checked via manual scoring. It is important to note that current test-objects dedicated to measuring physical image quality parameters were not designed for the evaluation of SM images. Phantoms with improved anthropomorphic structures might be necessary for this evaluation. SM images are the result of computational algorithms that have evolved over time and differ between manufacturers. Various methods are used, involving, for example, processing steps that identify slices and regions containing mass-like and MC-like features and that may adapt to some degree to breast tissue patterns. Adaptive enhancing filters can therefore result in SM images that look quite different and may give different results when applied to test-objects. The 3D-structured L1 phantom used in this study produces background structure whose power spectrum is close to that of breast images (Cockmartin et al 2017), but the structure is not anthropomorphic. Thus, the conclusions of this study cannot be generalized beyond the L1 test-object and SM algorithms used in our study. However, the detection results for SM images obtained in this study using the L1 phantom are consistent with those found by Ikejimba et al (2017), using a phantom that more closely resembles breast images. Further research using phantoms with improved anthropomorphic structures and objects resembling MCs or masses is needed to confirm the extent to which our results for SM images are representative of patient imaging.
In this work, the VTF parameters were optimized to predict the CDMAM performance for a wide range of disc sizes, imaging modalities and background noise types. VTF parameters were adjusted to improve the agreement between human and NPWE performance in the presence of background noise, as described in other studies (Bouwman et al 2016). The resulting VTF used in this work covered a wide range of spatial frequencies, between 0 and 10 mm⁻¹. The predictive value of the CDMAM results was established against human observers in the presence of anatomical background noise. Human reader performance was generally in good agreement with CDCOM scoring and with the NPWE model, with better agreement for plain images and slightly lower agreement for the images with a background simulating anatomical structure. Only the SM images gave inconsistent results for some disc diameters. The NPWE model gave reasonably close agreement with human readings (averaged ratio 0.992 and 1.070 for images without and with structured anatomical background, respectively). This is consistent with earlier studies comparing contrast-detail detectability in plain images (Marshall 2006, Segui and Zhao 2006, Monnin et al 2011). The extensive application of the NPWE model in this work shows that this model can predict target detectability over a wide range of imaging conditions. Further research is however necessary to validate the conditions for which the NPWE model can be used to match the CDMAM results in the presence of low-frequency, structured background noise.

Conclusion
This study compared the quantitative image quality metrics and detectability of DM, DBT and SM images based on the measurement of image quality indices (MTF, NPS and NEQ) and system efficiency (system DQE) for the situation of cluttered backgrounds. Used in the NPWE observer model, these metrics correctly predicted the detectability of the thin CDMAM discs for a wide range of beam energies, scatter fractions, image processing and reconstruction algorithms of three mammography device vendors. Furthermore, these predictions held for images with and without low-frequency structured background noise. The quantitative data on threshold diameters are potentially valuable technical performance indicators and it may be possible to explain or predict these results from the different components making up the NPWE model. The analysis method used in this work is proposed as a means of including the influence of anatomical structured noise on image quality and detectability in mammography, for a range of imaging tasks and imaging conditions.

The magnitude coefficient κ was considered proportional to the square of the mean pixel value $\bar{d}$, of the effective slab thickness $\Delta z$, and of the difference between the linear attenuation coefficients of the breast glandular and adipose tissues, $\Delta\mu = \mu_{glandular} - \mu_{adipose}$, as reported in previous studies (Mainprize and Yaffe 2010, Gang et al 2012, Hill et al 2013).

Figure 1. Examples of simulated anatomical noise added to CDMAM phantom images. (a) DM raw image (b) DM processed image (c) DBT plane (d) SM image.

Figure 5. Grid DQE as a function of $SF_{in}$. The points show the AEC settings for the three PMMA thicknesses 30, 50 and 70 mm.

Figure 7. In-plane NPS measured in the L1 phantom and NPS with the simulated anatomical background added to the CDMAM images.

Figure 9. Logarithm of the threshold gold thickness of the CDMAM discs. (a) for the different systems (b) for DM, DBT and SM (c) with and without anatomical background noise.

Figure 10. Logarithm of the threshold gold thickness of the CDMAM discs with uniform (U) and anatomical (A) backgrounds. (a) human observer against CDCOM values (b) human observer against NPWE model values for $d'$ = 1.0.

Figure 11. Threshold diameter of spherical microcalcifications as a function of $K_{in}$ for three PMMA thicknesses. (a) 30 mm (b) 50 mm (c) 70 mm.

Figure 12. Threshold diameter of spherical masses as a function of κ for three PMMA thicknesses. (a) 30 mm (b) 50 mm (c) 70 mm.

Table 1. Characteristics of the mammography systems.

Table 2. Acquisition parameters and x-ray beam characteristics.
$IRF_d$ denotes the system IRF measured with scatter, whose magnitude is expressed in pixel values $d$. $IRF_{d,xy}$ is the IRF component in the xy plane and $MTF_z$ the in-depth MTF of DBT reconstructions characterized using the thin wire. The 2D 'in-plane IRF' was measured in the in-focus DBT plane of the thin 0.2 mm Al disc of the NPWE phantom. It is denoted $IRF_{d,ip}$ and is equal to the 3D $IRF_d$ integrated over the z-frequency bandwidth (Zhao and Zhao 2008).
The VTF is an approximation of the Barten contrast sensitivity function of the human eye (Barten 1990) with a general form given in equation (20) (Burgess 1994, Segui and Zhao 2006, Gang et al 2011, Prakash et al 2011). The detectability index $d'$ of the discs of different diameters and contrasts of the CDMAM phantom was calculated using equations (15), (18) and (19) of the NPWE model. The Fourier spectrum $S_{obj}$ of a disc of radius $R$ and thickness $T$ is given in equation (21), where $J_1$ is the Bessel function of the first kind

Table 4. … phantom, and threshold diameter of masses and microcalcifications (in mm) obtained under the AEC (in bold the best result for each system).
The attenuation coefficient of the calcium hydroxyapatite component used in Ikejimba et al (2021) is approximately twice as large as that of the calcium oxalate used in our study. The calcium carbonate used in the L1 phantom-based study of Vancoillie et al (2021) has an attenuation coefficient that is 40% to 60% higher than that of calcium oxalate.

Threshold diameter of masses (low-frequency tasks)
Masses were modelled as glandular spheres. As for MCs, the threshold diameter $D_T$ corresponded to the threshold detectability index $d'_T$ = 2.92 obtained from the 3D NPWE model observer. The detection of large objects (low-frequency tasks) depends mostly on the amount of low-frequency anatomical background noise, and has little dependence on exposure level (Ruschin et al 2007, Gang et al 2010, Vancoillie et al 2021). $D_T$ was therefore calculated as a function of added anatomical noise, for κ values between 0 and 2κ, with κ being the level of background noise measured on the images of the L1 phantom (figures 12(a)-(c)). $D_T$ increased with κ for all the imaging conditions. A smaller thickness was associated with a higher glandularity and a lower contrast, resulting in higher $D_T$ values. For the three systems, DM images gave the smallest $D_T$ values in the absence of anatomical background noise (κ = 0), whereas DBT planes gave better detectability when anatomical noise was present (table 4). Except for the Insight 2D images, which gave particularly poor detectability for the thickness of 30 mm, SM images gave $D_T$ values intermediate between DM and DBT. These results are consistent with recent technical image quality studies (Hadjipanteli et al 2019, Ikejimba et al 2021, Vancoillie et al 2021). however