Near-field coded-mask technique and its potential for proton therapy monitoring

,


Introduction
There is a consensus in the proton therapy community that the implementation of methods that enable real-time monitoring of proton therapy would allow to better exploit the potential of this treatment modality and thus offer better and safer therapy to patients [1]. What requires verification is the spatial distribution of the dose delivered during therapy and its compliance with the one resulting from the treatment plan, preferably in a continuous manner during irradiation, on a single-spot (or at most a few-spot) basis. It is also of interest for the development of an online adaptive proton therapy, which is already used for photon irradiations [2]. Many of the proposed approaches rely on the detection of prompt-gamma (PG) radiation, where the spectral and spatial characteristics are correlated with the beam range [3][4][5][6]. An overview of PG-based monitoring methods for proton therapy can be found in [7].
Several groups have been developing imaging setups featuring gamma cameras with passive collimation. The most mature projects involve the use of a camera combined with a collimator with a knife-edge-shape slit. Such a setup has been tested in clinical conditions of pencil-beam scanning and passivelyscattered beam and proven to provide control of inter-fractional range changes with a precision of about 2 mm [8,9]. However, in view of the fact that the number of registered prompt-gamma quanta is one of the main limiting factors in prompt-gamma imaging (PGI), multi-slit systems have been investigated too. Not only do they register more gamma quanta than single-slit cameras, but they offer also a larger field of view. The performance of multi-slit setups was studied extensively via Geant4 simulations for different geometrical configurations in [10], and experimentally in [11]. However, no superiority with respect to the knife-edge-shaped camera could be shown. An extension of the latter was proposed in [12,13], where a collimator with many knife-edge-shaped slits was studied. Although the obtained results were quite impressive (2σ = 1 mm range retrieval precision), the studies were conducted at a beam energy of 50 MeV, i.e., below the lower limit of the clinically applied energy range. Unfortunately, the studies of that group were discontinued. The concept, however, was picked up and extended to a dual-head setup, enabling 3D-imaging [14]. Simulation results indicate the feasibility of using the setup for online range monitoring in proton therapy, with a position resolution better than 2 mm across the whole field of view. The group is currently developing a prototype setup to verify their simulation results. Yet a different approach has been presented in [15], where a setup consisting of a pixelated detector and a coded-mask collimator with a modified uniformly redundant array (MURA) pattern [16] has been studied via simulations. This kind of gamma collimation is widely used in astronomy and proven to work well also in reconstructing the positions of gamma sources (see, e.g., [17,18]), though one needs to stress that those applications deal with far-field imaging and mainly point-like sources, which is in general a less demanding imaging scenario. A MURA collimator is undoubtedly easier to manufacture than the one with multiple knife-edge shaped slits. The setup like the one proposed in [15] offers 2D-imaging with a much larger detection efficiency than the solutions discussed above, since half of the collimator pixels are open. The authors report an accuracy of range determination better than 0.8 mm, but this number holds for 10 10 impinging protons, which exceeds by 2 orders of magnitude the number typically applied in a single spot.
Coded-mask (CM) imaging is an extension of the well-known pinhole camera concept, widely used in various areas: from photography to space imaging [19]. In a pinhole camera, the detector is fully covered with an impenetrable material except for a small hole that decodes a source while projecting it onto the detector. Although it may provide a good image resolution when there are no constraints on the irradiation time, it is not applicable in the PGI in clinical conditions, when the fixed number of impinged protons per irradiated spot specified by the prescribed dose to the patient limits the number of emitted gammas. An almost completely opaque collimator further reduces the number of registered gamma quanta, leading to a situation in which the reconstructed image is strongly affected by statistical fluctuations.
In the CM approach, a detector is covered not with a single-hole shield, but with a mask consisting of a number of such holes forming a specific pattern. Usually, the number of open pixels is similar to the number of filled ones, so approximately 50% of the mask is opaque. In comparison to the single-slit camera, such a setup registers more photons which allows to increase the detector efficiency. The optimisation of mask patterns is an interesting problem that has been widely explored in the last few decades [16,20]. In this work, we are using a MURA pattern [16] as it is beneficial in performance metrics such as signal-to-noise ratio (SNR) and image resolution. A specific MURA mask is characterized by a prime number (called the rank or order of the mask) which defines the number of pixels per dimension and the construction of the pattern of opaque and empty pixels.
In this work, we present the results of our experimental studies with two small-scale prototypes of detection setups exploiting the coded-mask technique, conducted using point-like radioactive sources. The two setups featured different detectors and mask patterns, allowing 1D-and 2D-imaging. The images are reconstructed using the maximum-likelihood expectation maximization (MLEM) algorithm [21]. We report the obtained point-spread function (PSF) for different source positions within the field of view and the system detection efficiency. Using the experimental results to benchmark the simulation application, we furthermore simulate the performance of a full-scale prototype, currently under development as a scatterer module within the Silicon Photomultiplier and Scintillating Fibre-based Compton Camera (SiFi-CC) project [22], including its capability to reconstruct a continuous linear gamma source present during proton therapy.

Scintillators
The measurements are conducted with two different scintillation detectors. In both cases, the Ce:LYSO scintillator was used as a sensitive material because of its large effective atomic number and large density, resulting in high efficiency  of gamma detection. Good availability and moderate price were the additional arguments supporting this choice [24]. We use the small-scale prototype (SSP) of a SiFi-CC module which consists of 64 Ce:LYSO fibers (see Fig. 1(a)). The fibers have a squared cross section of 1 mm × 1 mm and are 100 mm long. For this measurement, they are arranged in two layers of 32 fibers each. Every fiber is wrapped individually in aluminum foil and they are held together in an aluminum frame. The pitch between both two fibers and the two layers is 1.36 mm.
The second scintillator used is a three-layered Ce:LYSO array developed for positron emission tomography (PET) imaging simultaneously to magnetic resonance imaging (MRI). The array has a base area of 45 mm × 48 mm, a total height of 15 mm and three layers (see Fig. 1(b)). Each layer consists of individual needles with a pitch of 1.33 mm resulting in 3425 needles in total. The needles within a layer are optically separated by BaSO 4 . Each layer is shifted by half a needle pitch with respect to the layer below to enable the identification of the layer and thus the depth-of-interaction in the array by the footprint of the detected light. The height of the individual layers is optimized for uniform absorption of 511 keV-photons.

Readout platform
As a readout system we use the Hyperion III platform [23,25,26]. It is developed by Hyperion Hybrid Imaging Systems as MRI-compatible detector platform for PET systems and includes hardware, firmware and software for data acquisition, processing and analysis. We use sensor tiles equipped with digital silicon photomultipliers by PDPC (Philips Digital Photon Counting). The dimensions of one sensor tile are (48 × 48) mm 2 and the whole tile holds 6×6 DPC-3200-22 [27,28]. Each digital photon counter (DPC) has four readout channels for the detected number of photons and delivers one common timestamp. Each of the four readout channels contains 3200 single-photon avalanche diodes (SPADs). The DPCs are self-triggering if one of the four readout channels passes a given trigger and validation threshold based on sub-regions on the sensor. During these measurements a DPC is triggered if on average 3.0 ± 1.4 photons are registered in one channel and the average validation threshold for an event to be recorded to disk is 53 ± 15 photons [29]. We use a validation time  to accept a trigger of 40 ns and an integration time of 325 ns to collect photons on the sensor tile. The overvoltage of the silicon photomultipliers (SiPMs) is set to 3 V. As this is a digital tile it is possible to disable the SPADs which produce a high number of dark counts. This inhibit fraction is set to 10 %. The surface of each sensor tile is covered with a glass plate of 1.1 mm. The sensor tiles are connected to a singles processing unit (SPU) which manages their voltage supply and feeds their data to the data acquisition and processing server (DAPS). During the measurement, the tiles are cooled by a 15 • C liquid cooling system.

Masks
We perform measurements in one and two dimensions, i.e., we reconstruct an image along one axis or on a plane. For these two tasks, we use a oneand a two-dimensional versions of a MURA mask of rank 476, clipped to 31 × 31 central pixels (see Fig. 2). The mask rank as well as the setup geometry have been optimised via Monte Carlo simulations before the experiment. To construct the physical masks we use tungsten rods of (2.26 × 2.26 × 20) mm 3 which are inserted into 3D printed rasters made from Pro Grey Resin. The rod manufacturing reaches a precision of 0.1 mm. The resulting masks have a dimension of (73.6×73.6) mm 2 . The rasters have a total thickness of 13 mm and the holes to insert the rods are 10 mm deep. To prevent the rods from falling out, the assembled masks are wrapped in cling film.

Radioactive sources
For image reconstruction, the experimental data were obtained with a radioactive 22 Na source with an activity of 2.89 MBq. The active material in that source covers an area of 1 mm × 1 mm. As a β + -emitter, 22 Na provides two photons of 511 keV emitted back-to-back, which can be used for electronic collimation. For calibration of the detectors we additionally used the 1275 keV gamma line of 22 Na and two more radioactive sources: a 137 Cs source with a gamma line at 662 keV with an activity of 1.73 MBq and a 133 Ba source with  a prominent line at 356 keV with an activity of 1.52 MBq. During the measurement the sources were placed in front of the detector on a grid allowing for easy repositioning of the source for different measurement configurations in 1 cm-steps in vertical and horizontal direction.

Experiment: Setup 1D 2.3.1. Experimental setup for imaging
In the 1D setup of our experiment we aimed for reconstruction of a source position along one axis, so only in one dimension. We used the small-scale prototype as scintillation detector and the 1D coded mask (see Fig. 3(a)). In the current experiment, the distance between the source plane and front part of the detector is 236.5 mm while the distance from the centre of mask to the front part of the detector is 66.5 mm. Just like the mask patterns, the distances were optimised via simulations. The orientation of the bars forming the mask pattern is vertical, the same as of the fibers. Both ends of the fibers are coupled to one SiPM tile each with an optical silicon pad of 0.4 mm thickness made of Elastosil RT 604. The pad has a size of (8 × 48) mm 2 so it covers one row of DPCs and light-sharing between different readout channels is enabled. The other five DPC rows of the tile are not directly exposed to light from the fibers.

Setup for energy and y-position calibration
As the light yield on the SiPMs heavily depends on the position of the interaction along the fiber, a position-dependent energy calibration is needed. For this, we use a fan beam collimator which is described in detail in [30][31][32]. The collimator consists of lead blocks with adjustable slits to two sides so that particles emitted by a radioactive source placed in the centre leave the collimator in thin elongated beams. To calibrate the small scale prototype with this setup, we use a 22 Na source and employ the three-layered PET crystal as coincidence detector on the other side of the collimator to enable electronic collimation. The used slit width of 1.9 mm leads to a beam width on the fibers of 2.5 mm (FWHM). The coincidence window is set to 10 ns. The fibers of the small scale prototype are irradiated at nine different positions in 10 mm steps.

Experiment: Setup 2D 2.4.1. Experimental setup for imaging
In the two-dimensional setup we used the three-layered PET crystal as a detector with the 2D coded mask (see Fig. 2(b)) placed in front of it. The dis-tance between the source plane and the front plane of the detector was 220 mm and the distance from the centre of the mask to the front plane of the detector was 50 mm (see Fig. 3(b)). The scintillator was coupled to the sensor tile over a 1.1 mm thick light guide glued to it with SCIONIX RTV 481.

Setup for energy calibration
For calibration of the detector in this configuration, we use frontal irradiations of the scintillator with radioactive 22 Na, 137 Cs and 133 Ba sources.

Preprocessing
As all DPCs are triggered independently, a clustering algorithm is needed to form events. We use a cluster window of 40 ns to combine triggered DPCs on one tile. These combined signals, we refer to as one hit in the following. The first timestamp recorded in one of the DPCs is used as the timestamp of the hit. If we use several tiles in one measurement, we apply a coincidence window of 10 ns to combine hits from the different tiles. This we call an event.

Fiber identification in 1D setup
With its base area of (1 × 1) mm 2 , one fiber is smaller than the area covered by one readout channel which is approximately (4 × 4) mm 2 . To identify single fibers in which impinging gammas deposited energy, we used the light-sharing through the optical pad. Depending on the position of the fiber, one or two horizontally neighbouring DPCs are triggered, so there are four or eight signals which can be used to calculate a center of gravity (CoG). Plotting all CoG in one histogram yields a so-called floodmap. When separating events by the number of triggered DPCs, one can identify light accumulation points on the floodmaps corresponding to interactions in individual fibers. These peaks are determined in projections of the floodmaps. To tag the hit fibers, the floodmaps are first divided into layers by horizontal lines and afterwards the fibers are separated within the layers by the centreline between two peaks so that each fiber corresponds to a rectangular region on the floodmap. To consider an event valid, we demand that the same hit fiber is identified on the two sensor boards. This is the case for 84.8 % of all recorded events. More details on the fiber identification are presented in [33].

Energy and y-position calibration in 1D setup
The calibration of the energy and the interaction position within a fiber is performed individually for each fiber. For this purpose, we used the nine measurements with a 22 Na source obtained with the fan beam collimator and employed the ELAR model described in detail in [24]. This provides a positiondependent energy calibration based on the measurement of the 511 keV-peak of 22 Na on both ends of the fibers. The description of the calibration of this data set can be found in [33].

Needle identification in 2D setup
To identify the hit needle in the three-layered PET array, again floodmaps of the CoG positions are used. Due to the shift of the three layers with respect to each other, all needles yield different light accumulation points on the floodmap and are thus distinguishable. Here, separate floodmaps for four different regions of interest (ROIs) dependent on the main channel, which is the channel with the highest photon count in this event, are employed: 1. the main DPC containing the main channel, 2. the main DPC and the vertically neighbouring DPC to the main channel triggered, 3. the main DPC and the horizontally neighbouring DPC to the main channel triggered, 4. the main DPC, the vertically neighbouring DPC, the horizontally neighboring DPC and the diagonally neighbouring DPC triggered. If more than one ROI is valid for one event, all are evaluated. For each of the four floodmaps the light accumulation points are identified on a two-dimensional grid and assigned to a needle ID. A predecessor of this algorithm is described in [34]. During the needle identification process, the event is assigned to the needle ID of the closest light accumulation point on the floodmap. If different ROIs yield different needle IDs, for each ROI a quality factor QF = d 1 /(d 1 + d 2 ) is calculated, where d 1 is the distance to the closest light accumulation point and d 2 the distance to the second closest light accumulation point on the respective floodmap. The final needle ID is then taken from the ROI with the smallest quality factor. The three top rows and three bottom rows of needles, i.e., with the highest and lowest y-coordinates, could not be resolved with this approach and were therefore not taken into account in the further steps of the analysis. An in-depth description of the procedure can be found in [35].

Energy calibration in 2D setup
The energy calibration is performed separately for each needle and for each of the four ROIs explained in Section 2.5.4. For each of these cases, we take the energy spectra of the calibration measurements with radioactive sources and fit the 511 keV peak in the 22 Na spectrum, the 662 keV peak in the 137 Cs spectrum and the 356 keV peak in the 133 Ba spectrum with a Gaussian function plus a linear function as background approximation. The three peak positions were used to obtain a first linear calibration. After the needles responses were individually calibrated, the energy spectra for all needles were added up. Then the statistics were sufficient to also take the 1274.5 keV peak in the 22 Na spectrum into account. Its position was used to apply a global saturation correction to all data: where Q represents the pre-calibrated energy and fitted p i are the parameters of the saturation correction. All the intermediate steps are reported in [35].

Implementation of MLEM for image reconstruction
In order to reconstruct the image decoded with CM, we are using the MLEM algorithm [21] while different approaches (like FISTA [15] or OSEM [36]) are also possible. MLEM utilises prior information about the probability of a photon being emitted in a particular position of the source plane to be registered in each detector pixel.
MLEM is a widely used iterative algorithm, examples of its application in PET can be found e.g. in [37,38]. It serves for the reconstruction of Poissonian data: , H is a system matrix and S a normalisation term (or sensitivity map).
Eq. 2 is written in vector form and all multiplications indicated by dots represent the vector (matrix) multiplications, while all other operations are element-wise. An element of a system matrix H ij is a probability that a particle originated from the j-th voxel of the source plane will be detected in a i-th voxel of the detector: where V i is an event that particle was detected in the detector voxel i, O j is an event in which a particle originated from the source plane voxel j.
In order to obtain such a set of probabilities, we used a Monte Carlo simulation. The chosen field of view is divided into pixels and simulations with a single point-like source placed in the centre of each pixel were performed. The detector response from each of these simulations gives a column in a system matrix, since the number of registered particles in each detector pixel is proportional to the probability from Eq. 3 (given that the number of simulated events in each vertex j is the same). The detailed explanation of the system matrix generation procedure is in Sec. 2.8.1.
After normalising each column of a system matrix by the sum of its elements, the elements H ij can be interpreted as a conditional probability to register the particle in the detector voxel i given that the particle originated from the voxel j and was detected somewhere.
Note that both the image I and the reconstructed object f are one-dimensional vectors, even when we want to reconstruct 2D (or even 3D) objects. It allows us to utilize one-and two-dimensional setups with the same reconstruction algorithm.

Efficiency and background corrections
Having a calibrated detector response, before proceeding with image reconstruction, we perform a data correction step. In order to do that, we make use of two auxiliary elements: a background measurement I bg performed without a radioactive source and without a mask, and a detection efficiency map. For the latter, we register the detector response I no−mask when irradiating the detector without a mask, and with a source placed far enough to consider the gamma flux uniform over the surface of the detector. Having those data we can construct an efficiency map which will reflect the relative detection efficiency of individual detector elements (fibers or needles) ϵ: where the index i runs over all detector elements. This step is performed once for each detection setup, the resulting efficiency map and I bg are used to correct each registered data set that serves as input for image reconstruction. In the correction, we subtract the background from each raw data vector I raw and divide it by the efficiency map: The resulting vector I is ready to be used with the MLEM algorithm according to Eq. 2 together with the system matrix prepared beforehand (see Sec. 2.8.1).
The preparation and application of the efficiency correction we demonstrate in Fig. 4 taking the second layer of the three-layered PET array as an example. The background data I bg is presented in histogram Fig. 4(a), where each bin value is a number of counts per second in the corresponding crystal. Fig. 4(b) presents results of measurement with a single point-like source but without a mask (I no−mask in Eq. 4) with the same units as the I bg histogram. By evaluating Eq. 4 for these two histograms, we obtain an efficiency map presented in Fig. 4(c). One can notice that those three histograms have some similarities in their patterns, representing the DPC structure of the sensor tile. With other algorithms, a more homogeneous efficiency distribution can be achieved. In our approach, this is cancelled out when the image is corrected with the efficiency map and so does not compromise our image reconstruction. In order to avoid singularities resulting from a division by zero, we add a small value (10 −6 ) to all elements of the efficiency map.
In Fig. 5 we demonstrate all intermediate steps from Eq. 5, applied to the same second layer of the three-layered PET array: the raw detector response I raw in panel (a) is followed by the background-free histogram I raw − I bg and finally the background-subtracted and efficiency-corrected detector response I is shown in panel (c) which serves as an input for the MLEM algorithm. This data is taken in one of the full-fledged measurements, with a coded mask and a point-like source placed at (−20, 0) mm. Fig. 5 shows how this procedure transforms the seemingly messy raw detector response into the histogram with a recognizable mask pattern (see Fig. 2(b)) projected onto it.  Figure 5: Histograms demonstrating preprocessing steps applied to each experimental data set before image reconstruction: (a) raw data normalized by the total measurement time, (b) the same data after background subtraction, (c) background-free histogram divided by the efficiency map (Fig. 4(c)) -the processed data which are the input of the reconstruction. All three histograms show data with the coded mask and a point-like source placed at (−20, 0) mm for the second layer of the three-layered PET array only, the same procedure was applied for all detector layers.

Simulations
2.8.1. Generation of system matrix MLEM requires a system matrix (SM) which, in this work, is calculated prior to the reconstruction. For both 1D and 2D setups we utilize Monte Carlo simulations in order to obtain corresponding probabilities which are the elements of the system matrix. The FOV was chosen to be of the same size as the mask and it is 70 mm (in each dimension for 2D) and is divided into 100 pixels (in each dimension). By performing simulations of 10 6 γ-particles with a point source placed subsequently in the centre of each pixel of the FOV we obtain the system matrix elements column by column. Such simulated statistics result in the statistical uncertainty of matrix elements being below 1.5 %. All simulated particles have the same energy 4.4 MeV as our investigations revealed that our approach to SM calculation is not sensitive to energy. During the simulation, each element of the system matrix is incremented by the energy deposit obtained from the corresponding detector bin. As a result, we are working with accumulated energy values that are then transformed into probabilities through a normalization process.
A system matrix for a one-dimensional coded-mask setup (described in Sec. 2.3) has a size of 32×100 elements and is shown in Fig. 6. We see that its elements reflect the pattern (or its central part) of the coded mask (see Fig. 2(a)). With shifting the source position (horizontal axis in Fig. 6), the pattern moves vertically in the histogram, which corresponds to the translation of the mask shadow along the detector plane.
For our simulations, we use Geant4 v10.6 with PhysicsList QGSP_-BIC_HP and electromagnetic option 4. The generation of each matrix column (one simulation) takes around 25 sec on Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz and utilizing 6 CPU cores in parallel, which means a full matrix for the 1D setup (100 simulations) can be generated within 10 min. In the case of a 2D system matrix with similar conditions, the number of auxiliary simulations will be 10 000 (100 in each direction). Consequently, the total time will increase proportionally to about 16 h.

Setup for 1D full-scale prototype
In this paper, we are showing experimental results for point-like sources only. Those laboratory experiments allowed us to stimate the performance of the prototype detectors (energy-and spatial resolutions, capabilities, limitations, etc) and conclude about the near-field imaging capabilities of the setups based on them. However, the ultimate goal of the whole project is to come up with an approach applicable in proton therapy. This requires imaging of continuous source distributions, with a particular focus on the distal part of the gamma production depth profile with its falloff occurring close to the Bragg peak position [3]. This scenario is clearly more demanding than the case of point-like sources. In fact, the small-scale prototypes presented in this paper ( Fig. 3(a)) are not suitable for that task; their small surface areas limit the spatial information being used as an input for image reconstruction (i.e. limited views, incomplete sampling). However, this does not necessarily imply that the technology is not well-suited for range monitoring. Encouraged by the results presented in this study, we started the construction of a larger detector within the SiFi-CC project, with its design being very similar to the small-scale prototype used in the 1D setup. It consists of 7 layers with 55 fibers each; the fiber size is (1.94 × 1.94 × 100) mm 3 , and the pitch is 2.01 mm. We tested via Monte Carlo simulations configured similarly as in Sec. 2.8.1 how the detector -called henceforth the full-scale prototypeperforms in a coded mask setup, i.e. combined with a structured collimator. In Geant4, we coupled that detector with a larger section of the 467-th order MURA mask, with the same pixel size as the one presented in Fig. 2(a), but dimensions extended to 51 central bins from the full array (instead of 31) in the horizontal direction and 45 pixels in vertical. The detector-to-mask and mask-to-source distances remained the same as those used for the 1D setup of the small-scale prototype.
In the Monte Carlo simulations, a poly(methyl methacrylate) (PMMA) phantom of the dimensions (60 × 60 × 90) mm 3 and density 1.19 g/cm 3 was irradiated with a proton beam. The phantom was centered with respect to our FOV and the beam impinged from the positive direction of the x-axis, along the longest axis of the phantom. A schematic of the setup geometry is presented in Fig. 7. In the simulations, we recorded energy deposits in individual fibers. The processes of production and detection of scintillation photons are not simulated, but their effect is taken into account by smearing the obtained energy deposits with a resolution function, found via more realistic simulations and benchmarked with laboratory tests with single fibers [22,24]: where a = 0.0322, b = 0.6730 and c = −0.0013. The simulations were conducted for the following beam energies: 85.9, 90.7, 95.1 and 107.9 MeV, corresponding to the Bragg peak depth in the phantom of 5.0, 5.5, 6.0 and 7.5 cm, respectively, where the latter were determined using SRIM [39]. For each beam energy, a set of 1000 simulations, each with 10 7 protons, was performed. It is worth noting, that in proton therapy the typical spot strengths range from a few times 10 7 for proximal spots up to about 2×10 8 for distal ones [40]. Having separate simulations with 10 7 protons, we were able to investigate the effect of statistics in image reconstruction, the resulting resolutions and uncertainties, and conclude about the feasibility of beam range shift detection on the basis of data from a single spot or an iso-energy layer. For each file simulated for a beam with a kinetic energy of T p = 85.9 MeV, we have around 840 000 PGs generated, out of which our detector registers about 15 000 gammas which is about 1.8 %. The overall detection efficiency (including geometrical acceptance) is thus 1.5 × 10 −3 . Those numbers remain similar for other studied beam energies.
A system matrix for this setup was generated assuming one-dimensional reconstruction. A FOV of 130 mm along the x-direction was assumed and divided into 200 pixels. In this case, unlike in the studies with radioactive sources, the gamma source is not monoenergetic. Nevertheless, when generating the system matrix, we shot 10 6 4.4 MeV gamma particles from each FOV bin, since our earlier studies showed that the resulting system matrix is not very strongly energy-dependent. To test that,we in a preliminary work we investigated if the reconstruction of a point source is influenced by the energy used to generate the system matrix. Namely, we generated one-dimensional system matrices with various energies ranging from 0.5 to 5 MeV, in increments of 0.5 MeV. We then reconstructed the same simulation data obtained from a point-like source with a sample energy value of 4.4 MeV and a position (−10, 0) mm. The results showed that the reconstructed source position varied by only 0.03 mm, and the width of the fitted Gaussian curve exhibited a variation of just 3 %. These findings indicate that the reconstruction process is nearly independent of the energy used for the system matrix generation. The observed differences can be attributed to statistical fluctuations.
We evaluated the image reconstruction performance by inspecting the uncertainty on the distal fall-off position determination (DFPD), which is calculated in a similar way as defined in [41] and the procedure is demonstrated in Fig. 8. Namely, we take the reconstructed profile (blue dots in the figure), and interpolate it with a cubic spline, obtaining a smooth function (green line). Subsequently, we subtract from the function its minimum value and then normalize it by the resulting function maximum. Like this, we obtain a normalized reconstructed profile described by a smooth function of the values between 0 and 1. We find all the points where this function equals 0.5 (red cross and orange stars) and take the left-most one (red cross) as our estimation of the distal falloff position. The same is being done with the MC source distribution so that we can compare these two values.

Performance evaluation of MLEM
In order to examine the performance of MLEM, we have prepared sample reconstruction results based on experimental data from the 2D coded-mask setup with the three-layered PET detector. The very same data as shown in Fig. 4(c) (but for all three detector layers) were loaded into a single vector and used as an input for the image reconstruction. Having a system matrix of size 10000 × 3425 and an image vector with size 3425, 100 iterations take 6.51 sec on a Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz to be performed.
Results after 100 iterations are shown in Fig. 9 as a 2D histogram (a) and its projections on the xand yaxes (panels (b) and (c), respectively). In the 2D histogram, we observe a clear image without artefacts that exhibits a single peak. The peak is close to the designed source position (marked as a green cross) with a small shift to the upper left. Numerical evaluation of the reconstruction quality was performed via a Gaussian fit -its results are visible in Fig. 9(b) and (c) as a green solid line. The peak width in the reconstructed image (here 1.7 mm and 1.5 mm for xand y-direction) depends on the number of MLEM iterations and will further decrease when using a higher number of iterations.
Evaluation of the reconstruction performance is straightforward in the case of point-like sources: comparing the reconstructed peak position with the designed one and, using Gaussian fitting, determining the peak width and position. In the case of continuous source distributions, we are using a universal image quality index (UQI) [42] that allows comparing two vectors X and Y, representing in our case the reconstructed image and the expected source distribution. Its application in the near-field coded-aperture imaging was demonstrated in [43]. The index is defined as follows: where cov and var are covariance and variance functions, andX,Ȳ are the means of the vector components. UQI takes values in the range [−1; 1] where 1 corresponds to identical images and -1 to inverted ones. VectorsX andȲ are standardised before calculating UQI, namely we apply: to each vector, where n loops over all vector elements. UQI is beneficial in the case of simulations because one knows the true source distribution, unlike in the case of experimental data. When investigating how the UQI depends on the number of performed iterations, we observe first a steep rise up to about 800 iterations; in that range, subsequent iterations significantly improve the image quality. For larger numbers of iterations, a slow decrease is observed due to the amplification of statistical fluctuations inherent to MLEM. The maximum corresponds to the optimal number of iterations at which the reconstructed image is as close to the true one as possible, given UQI is used as a metric. However, the maximum is rather broad, for the presented example the UQI does not drop below 0.95 of its maximum value for the numbers of iterations from the range 361-1628. In this range, the reconstructed picture does not change significantly, thus it is not critical to run MLEM iterations exactly the number of times corresponding to the maximum UQI, but close to it. This has important implications for image reconstruction using experimental data, where we have no reference image -even if the number of iterations deduced based on the simulations will not be identical with the a priori unknown optimal number of iterations for experimental data, the latter is likely to be close enough, i.e. the reconstructed image will be close to the best possible reconstruction.

Energy and y-position calibration in 1D setup
A detailed description of the calibration results including the intermediate steps is presented in [33]. The average energy resolution of the fibers is σ E /E = 7.7(4)% and the average position resolution along the fibers is 34±3 mm (FWHM) at 511 keV.

Energy calibration in 2D setup
The intermediate results of the calibration procedure of the 2D setup can be found in [35]. The overall energy resolution obtained at 511 keV is σ E /E = 5.35%. For 14 additional needles, the calibration process failed because their spectra could not be fitted satisfactorily. Events with hits in those needles were excluded from the further analysis.

Results from imaging experiments
We performed a series of five measurements with the three-layered PET array, the 2D coded mask and a point-like source placed in different positions. In all cases, the measurement time was 1201.2 ± 0.2 sec and the number of registered hits was about 2 × 10 8 . Combined results of all image reconstructions, based on the collected data, after 100 iterations, are presented in Fig. 10(a) as contour plots. Each contour plot consists of ellipses bounding 38, 68, 86 and 95 % confidence regions for a particular reconstructed peak. In addition, the green point inside the smallest ellipse (for each reconstruction) points to the reconstructed peak position, while the red cross is the designed source position. Similarly to the results from Fig. 9(a)   groups of ellipses is rotated and the rotation is symmetrical with respect to the x-axis. Nevertheless, the offset of the reconstructed peak position is not symmetrical with reference to the origin but always into the same direction. This hints towards a systematic effect rather than an artefact of the reconstruction. The latter conclusion is further supported by Fig. 10(b) where we present the residuals of the reconstructed source positions as a function of the designed source position for xand y-axes (blue circles and red crosses, respectively). Error bars represent standard deviations of the fitted Gaussian functions, peak positions were determined in the same fit. For x = −20 mm and for y = 0 mm we have three measurements so in order to demonstrate all results in one plot, the points have been slightly shifted around the design values. In the figure it is clearly visible that indeed all residuals for the x position have the same sign and very similar values; the same applies to the residuals of the reconstructed y coordinate: the average residual value for x-coordinate is −1.23 ± 0.08 mm and for y-coordinate it is 0.73 ± 0.45 mm (stated uncertainties are standard deviations). We assume that this constant offset originates from a small misalignment in our setup, e.g. that the source holder was not placed perfectly centrally with respect to the detector.
We performed also nine measurements with the one-dimensional setup and the point-like sources. First, data for the 22  very similar, negative values with an average of −1.03 ± 0.14 mm. The observed offset of the reconstructed source position is independent of that position and is constant throughout FOV. The offset is consistent with the 2D measurement within 1 σ, again showing the small misalignment of the source. We present the whole set of reconstructed images in Fig. 11(c). The mean standard deviation of the Gaussian fits for all source positions is 1.14 ± 0.18 mm.

Results of simulations
In this section we discuss the results of simulations with the full-scale prototype for 1D imaging. Fig. 12(a) shows a reconstructed image (blue filled line) of the gamma vertex distribution resulting from the simulation of 10 8 protons of 90.7 MeV interacting with the PMMA phantom. The image was smeared with a Gaussian filter with a kernel of 2 pixels. The shown reconstructed image is a result of 795 MLEM iterations which corresponds to a maximum of UQI. The image is smooth, without significant noise artefacts. There are small peaks in the tail, behind the main one, but each of them is less than half as high as the main peak so their presence does not affect the DFPD. The orange line represents a depth profile constructed from Monte Carlo true information, in which entries are weighted by gamma initial energies, and the profile is subsequently interpolated to create a smooth line. In general, the reconstructed main peak position is very close to the true one and the shape of the gamma depth profile is reflected properly. The DFP, determined for this particular sample reconstructed image, is −7.64 mm and for the MC truth it is −8.20 mm, which means that the offset is below 0.6 mm.
Offsets of the determined DFP for all regarded energies are shown in Fig.12(b) as a function of the initial number of protons impinged on the target. Presented means and resolutions of the DFP were obtained based on 50 bootstrap samples taken out of 1000 samples corresponding to 10 7 protons each. We see that starting from 50 × 10 7 protons, the DFP mean values are very stable and only error bars are being reduced. The same data are presented numerically in Table 1. Starting from the statistics 50 × 10 7 , the DFP resolutions, calculated as a standard deviation of 50 bootstrap samples, are less than 0.6 mm for all beam energies.

Discussion
We performed a set of measurements varying the source position for two setups: one-and two-dimensional. In the two-dimensional setup we observe that after 100 iterations the reconstructed image has an evident single peak without significant noise artifacts (see Fig. 9). The peak position and its σ-value were calculated via a Gaussian fit to both xand yprojections of the image. The obtained reconstructed peak position is slightly shifted with respect to the designed position in both horizontal and vertical directions. Having reconstructed the images for all measurements at different source positions ( Fig. 10(a)), we see that all images have similar offsets in the same direction. The small standard deviation of those offsets (∆x = −1.23 ± 0.08 mm and ∆y = 0.73 ± 0.45 mm) hints towards their systematic origin rather than a method artefact. They are likely to be caused by a misalignment of the setup elements. In fact, our experiment showed that a set of measurements with radioactive sources like the one performed can be used to detect setup misalignments and correct for them. Furthermore, for determining a proton range in a clinical setup, one would rely on detecting relative positions determining different proton ranges, so a constant shift does not hinder a precise range shift determination.
Results from a single-dimensional setup show similar outcomes as the 2D case, having only one dimension instead of taking xand yprojections. Also here, after 100 iterations ( Fig. 11(a)), a very clear peak without large noise artefacts is visible. In this particular case, we have a peak position shifted on average by −1.03 ± 0.14 mm with respect to the designed position which is still within one standard deviation (σ = 1.2 mm) (see Fig. 11(b)). It is consistent with the shift in the x-direction determined from 2D measurements, reinforcing the assumption of a misalignment of the setup elements.
While the three-layered PET array coupled with a structured collimator provides good-quality two-dimensional images of a radioactive source, we show that we also achieve a good reconstruction in one dimension with our 1D setup. A 1D gamma depth profile is already sufficient for the determination of a range shift in a clinical scenario and a 1D setup requires less readout channels (and therefore is more cost-effective) for its operation than a full 2D setup, if the system is scaled up to extend its FOV.
All reconstructions performed for experimental data were stopped after 100 iterations. For a point-like source, the number of iterations (after a certain point) does not change the reconstructed picture much in the sense that the peak position remains the same and only its width is being reduced with each subsequent iteration. In Fig. 13 we show a dependence of the peak width on the number of iterations for a sample 2D reconstruction (solid blue and dashed red lines) and 1D reconstruction (dashed-dotted green line). It is clearly visible that the peak widths are constantly decreasing with the number of iterations. In the 1D reconstruction, the large number of iterations reduces the peak to a few bins, which hinders a Gaussian fit and causes the steps in the figure resulting from fit instabilities rather than the real change of the peak width.
In order to test the method with source distributions relevant for proton therapy monitoring via PGI, we investigated a setup featuring a larger detector: the full-scale prototype and its performance via simulations of its response to PGs originating from a PMMA phantom irradiated with a proton beam. The simulations were performed for four proton energies. We evaluate our results comparing the DFP values obtained from the reconstructed images with the true ones (obtained from Monte Carlo truth distribution). Among other possibilities like fitting a sigmoid function to the distal edge of the profile, we choose to use the 50 % location to define DFP. Due to the smoothness and steepness of our reconstructed profiles, this is an on the one hand very simple and on the other hand very robust method to assign a range value to the profile. The dependence of the precision of the reconstructed DFP on the number of impinged protons is presented graphically in Fig. 12(b) and numerically in Table 1. For 10 8 protons, the error of the distal fall-off estimation is within 0.6 mm for all regarded energies and an average precision of the DFP estimation is 0.72 mm. This value is confronted with precision values reported by other groups working with different setups (both simulation and experimental) in Table 2. In the comparison, we include besides our results also those obtained with a twodimensional coded mask setup [15], multi-parallel slit (MPS) simulation [10] and experimental [11] results as well as clinical results obtained with knife-edge slit (KES) design [8,9]. Although precision-wise our results outperform most of the works under comparison, we admit that our simulation model does not take into account certain effects which could deteriorate the result, e.g. the neutron background. However, the authors of [15] showed that this contribution can be efficiently eliminated by employing a Discrete Cosine Transform. We also do not take into account other effects, e.g. fully realistic resolutions, the time structure of a clinical proton beam leading to high prompt gamma rates, as well as statistics reduction due to the dead time of the detector and finite throughput of the data acquisition system, which are obviously included in the experimental results of [8,9,11]. We suppose that these effects will not deteriorate our results a lot as the energy resolution is validated against experiments and we could show in [22] that our system should be able to deal with clinical rates.  9 10 8 0.72 CM simulation [15] 122.7 10 8 2.1 MPS experiment [11] 95.09 3.8 × 10 8 1.2 MPS simulation [10] 160 10 8 1.30 -1.66 KES clinical [9] 100 -160 ? 0.7 -1.3 KES clinical [8] ? ? 2.0

Conclusions
So far, the use of coded mask systems for proton therapy monitoring via PGI has been considered only theoretically, i.e., via Monte Carlo simulations. In this paper, we show the experimental results of its practical implementation and evaluation of performance using the MLEM algorithm for image reconstruction. In the first step, we use small-scale prototypes of the detector and test the image reconstruction framework with point-like sources. Experimental results confirmed that the near-field coded-mask imaging is feasible with gamma sources, and our implemented image reconstruction framework is able to reconstruct source positions quite well with both 1D and 2D approaches: a clear peak is always visible, and the reconstructed images are free from artefacts. The offsets between the designed and reconstructed source positions are close to each other for each particular setup which indicates its systematic nature.
The second step comprised Monte Carlo simulations with a realistic source distribution, i.e. obtained from a PMMA phantom irradiated with a proton beam. Here, the simulated detector had a larger size of 110.6 × 100 mm 2 and was coupled with a larger mask compared to the small-scale prototypes. This not only lead to a larger FOV, but also increased the setup sensitivity to the details of the imaged source distribution. Our investigation conducted for different beam energies from the range 85.9-107.9 MeV and varying statistics shows promising results. The reconstructed images resembled the Monte Carlo truth PG depth profiles. At the statistics of 1 × 10 8 impinging protons, the mean precision of beam range estimation in the investigated beam energy range was 0.72 mm (1σ), which makes the setup competitive to other PGI approaches with passive collimation, such as KES or MPS investigated by other groups. Due to the promising results of the simulation of the full-scale prototype, we proceed with the construction of a detector with exactly this design.
Systems, Institute for Experimental Molecular Imaging, RWTH Aachen University. They were developed within the European Union's Horizon 2020 research and innovation programme under grant agreement No 667211. When working on the project, Ronja Hetzel was supported from the German Research Foundation (DFG) project number 288267690, and Andreas Bolke from the DFG Grant COMMA, project number 383681334.
The authors are responsible for the content of this publication.