End-to-End QA in Radiation Therapy Quality Management

End-to-End (E2E) testing is a method originating from computer science that is designed to determine whether an application communicates as required with hardware, networks, databases, and other applications. This paper advocates that the quality management (QM) of modern radiation therapy (RT) would benefit from more regular use of E2E-based quality assurance (QA) in the local clinic. The argument is that modern RT delivery is performed through a process linked by a chain of interdependent stages and actions mediated by complex interchanges during the patient’s treatment. The actions along the chain are often modified by decisions of clinical staff who are interpreting information acquired along the process. While physics QA can validate that each of these steps is technically achievable (e.g., through machine QA), such conventional QA does not guarantee that the overall process is being carried out as planned, even when it has been described by a well-defined protocol and delivered by well-trained staff. The paper briefly reviews the changes in programmatic design as RT has become more complex, the associated changes in RT QM, and some past examples of E2E testing in RT clinics, usually performed during the implementation of some new RT technique or during external audits of the clinic’s practice. The paper then makes the case for increased E2E QA based on the lessons learned from this experience and ends with some suggestions for implementing effective and sustainable E2E testing in a clinic’s QM program.


Introduction
Since the initial IC3Ddose meeting in 1999, contemporary radiation therapy (RT) has moved fully to the conformal radiation therapy techniques foreseen in the 1960s [1]. Advances in treatment simulation imaging, computerised treatment planning, and radiation delivery enabled Intensity Modulated Radiation Therapy (IMRT) treatment tailored to each patient. The resulting spatially modulated delivered doses (typically called the 'dose distributions') fit more tightly around the specific target volumes so that tumour dose could be increased appropriately while dose to adjacent normal tissue was decreased. The progress in conformal RT came with a challenge: an increased potential of missing the target because of patient set-up errors, organ motion, or machine malfunction during treatment delivery. This risk motivated the development of on-line image guidance capabilities through the addition of imaging systems on the treatment unit [2][3][4] to enable patient set-up confirmation before dose delivery. Experience with image guidance over many patients' treatments also made the oncology community aware that the tumour region and surroundings could change between planning imaging and the first session at the treatment unit or afterwards during the course of the multiple fractions of radiotherapy [5][6][7][8]. Therefore, strategies were developed to account for and correct these changes, for example, by reassessing a patient's treatment plan and delivery at some dose level or fraction interval [9][10][11] to confirm that the initial plan is still appropriate or, alternatively, to adapt the further treatment to be given through replanning. More advanced Image Guided Adaptive Radiation Therapy (IGART) approaches envisage use of daily online imaging (cone beam computed tomography (CBCT) [2] or magnetic resonance imaging (MRI) [12,13] at the treatment unit) to monitor anatomy changes and provide for replanning without going back to the CT simulator [14]. Other sophisticated approaches for inter-fraction IGART extend the adaptation beyond corrections for only anatomical changes by incorporating dose delivery monitoring during each treatment (by electronic portal imaging devices (EPIDs) or some other in-vivo dosimetry) to enable dose distribution correction as fractions continue through the treatment course [14][15][16] with next treatment fractions using multileaf collimator (MLC) trajectories modified from the originally planned delivery.
All these IGART approaches involve a patient's treatment being modified during their radiation therapy based on new information gained during the treatment course in a process that involves data transfer between sophisticated advanced equipment, often with human evaluation [15,17,18] to assess and choose between different avenues of action at the multiple stages of the care (see figure 1). Clearly, well-executed IGART brings with it a commensurate challenge for the accurate implementation, and the associated validation, of very complex treatment activities. This challenge has been recognized and addressed by the RT community in recent decades (as outlined below). The objective of this paper is to review the evolution of practices to address these challenges by examining some pertinent literature and reflecting on the issues this literature raises. The author then proposes that the quality management of the systems required for modern RT would benefit from the inclusion of more regular testing to validate that the delivery provided by the cancer centre fulfils the intended treatment objectives, that is, by adopting end-to-end (E2E) testing in the clinic as an added tool [19][20][21][22] to ensure quality radiation delivery (the term Kron et al. use in their excellent 2022 review of radiotherapy quality management [17] to designate the RT that will deliver the intended patient treatment).

Evolution of Radiation Therapy Quality Management
The new technologies and the processes adopted during the uptake of IGART have improved the quality of RT and reduced errors. Modern treatment units are better controlled, have more stable radiation output [23], and provide greater functionality than the units of the 1980s and early 1990s, and treatment planning systems and the imaging for treatment planning have improved considerably. Evidence has shown that the quality dose delivery enabled by IGART has provided new treatment options for patients and resulted in higher clinical control rates and reduced toxicity [7,24]. But IGART involves complex processes with many steps, some requiring human intervention, which introduces a potential for new error pathways [24][25][26]. The levels of complexity, integration, and implementation variability inherent to IGART processes with their image-based planning and computer-controlled delivery are such that device-centered quality assurance (QA) protocols alone can no longer ensure quality radiation delivery [27]. Therefore, modern radiation treatment techniques have compelled an associated reassessment of the quality management (QM) necessary for safe beneficial patient treatment: a rethinking of both the QM at a programmatic level of the complete radiation delivery service and the QA of the various components used within the IGART process. One should note that the characterization of radiotherapy as IGART should not be limited to external beam modalities alone; remote-afterloader high-dose-rate brachytherapy (HDR) techniques are increasingly image guided and involve processes of equivalent complexity, and they are therefore considered to fall under the IGART umbrella for the purpose of this discussion.

Process Quality Management
While safe and appropriate RT delivery has always imposed strict requirements on a cancer centre's radiation program [17,27-31], the development of modern IGART has generated further guidelines from professional, national, and international bodies outlining how RT programs should function [32][33][34][35][36]. The various documents have many common elements dealing with the full scope of care, from the initial patient evaluation, through treatment and into subsequent follow-up [32][33][34]37,38]. Of particular interest for this IC3Ddose proceedings paper, these guidelines set standards for the programmatic environment for radiation treatment preparation (simulation and planning) and for delivery, with recommendations on the health care team staffing, on the roles and responsibilities of team members, on their training and on the maintenance of their qualifications. The guidance documents universally maintain that radiation services must develop and implement clear-cut protocols and procedures describing how patient care is to be delivered, and that they must also set up well-defined QM programs to ensure that the RT process is followed as required and that the radiation environment (equipment, hardware and software, and the radiation care team) is performing as needed. The evolving suite of guidance documents underscores that quality RT is not determined only by the equipment used but also by the complete environment in which the treatment process is provided. And RT programs have moved to programmatic analysis and design to improve this environment.
Figure 1 (partial caption). [...] specify QA measurements and tests that should be performed prior to and during technical implementation to assure the systems and equipment are able to perform to the accuracy and precision required for safe use. The coloured circles indicate some typical measurement points for the relevant QA performed by the medical physicists (MP), radiation oncologists (RO) and radiation therapists (RT). One could draw a similar figure to illustrate stages where clinical staff must make decisions to proceed with treatment based on observations and evaluations made at earlier points in the process. The figure would look similar to b) but with different players (for example, RTs rather than MPs) at different stages.
Two of the approaches to improve RT QM at the process level merit discussion at this point. They both specifically address the fact that the RT process is influenced by human factors which can affect treatment quality. The first approach has involved instituting systems to track and analyse incidents (defined as unwanted or unexpected deviations from the RT process that could potentially adversely affect patients, staff, or equipment). Specific local, national, and international incident reporting systems [26,39-41] have been created to report incidents to QM programs, to analyse the impact of these incidents, and then to report the results of investigations back to staff in order to raise awareness and improve process adherence. Incident reporting provides reactive tools to identify incidents that have occurred in the radiation treatment process, and analysis of incident reporting data can help guide the development of improved QM systems to avoid recurrence. The second approach has been to analyse how human factors affect RT processes [24,42], specifically addressing how human behaviour, abilities, and limitations influence the programmatic systems adopted under the RT program guidelines described in the previous paragraph. Field observations and workflow analysis are then used to improve the design of the technologies, steps, and workflows of the work environment so as to mitigate the treatment delivery variability introduced by these human factors. Results indicate that processes can be improved to account for human factors; however, some influences that weaken procedural compliance remain present [24,42]. This further motivates this article's promotion of more regular adoption of E2E testing to provide recurrent monitoring of select RT processes.

Implication for Quality Assurance testing
The technical advances in the imaging, planning and radiation delivery equipment and the networks connecting their interactions have also prompted an ongoing review of the quality assurance (QA) testing required in a well-developed QM program [27,29,43-48]. Select QA tests have come in and out of fashion during the transition from the simpler 2D treatments of the 1980s, through the implementation of IMRT techniques in the 1990s, to the comprehensive modern IGART. The QA has evolved because testing must reflect the particular technical environment in the clinic at a given time, and because professional opinions on requirements will change as this environment changes [23,46,49]. For example, the technical advances in the 1990s motivated the development of three-dimensional (3D) dosimetry to verify that the dose distributions from the novel IMRT techniques delivered the correct dose to the right location. At that time, linear accelerators were just being modified to include MLCs, computerized treatment planning systems (TPS) were just being improved to include CT based inhomogeneity corrections, TPS algorithms had not yet been fully tested in the small-field conditions characterizing dynamic MLC delivery, and the control systems and interfaces between the treatment planning systems and the treatment units were under constant revision [27,48,50]. The uncertainty in the tools available for early IMRT techniques motivated proponents to contend that it was vital to do full 3D dose assessment of the deliveries during a clinic's IMRT commissioning and, perhaps, even for every patient before treatment [46,51] (and encouraged the development of the IC3Ddose community). With time the treatment planning algorithms were better validated, the treatment unit control systems improved, and more efficient alternate patient specific QA was developed using detector arrays [52,53] or Electronic Portal Imaging Devices (EPIDs) [54][55][56][57][58] so that the requirement for regularly performed full 3D dosimetry diminished.
Currently such patient specific delivery validation is considered part of a tiered RT QA environment often illustrated as a multi-layered pyramid (see figure 2) [59,60]. The various QA tests at each level are to ensure that all individual components required for the RT process work as intended individually and in their interactions. And external agency guidance documents, national technical control guidelines, and internal treatment and QM procedures have been developed to set requirements for appropriate unit testing [61][62][63][64][65][66][67][68] of the equipment (level 1 testing in figure 2), the validation of the interaction of select adjacent stages [68][69][70][71][72][73][74] in the IGART process (level 2 and 3) and finally to guide the more comprehensive QM of the complete IGART process [17,32,33,37,38,75-77]. In this schema, the machine-specific QA unit testing is specified to be implemented at specific stages of the IGART process (illustrated in figure 1b) to ensure that the equipment functions as required within specific tolerances [17] (for example, validating the radiation output of the linear accelerator, the couch motion on the treatment unit, the positioning of the EPID, etc.). The further level 2 and 3 type testing is stipulated to ensure that the interaction of units, and data transfer between units, function as expected. A common clinical implementation at these levels is the patient-specific QA practiced in cancer centres to validate that the treatment prescribed for a particular patient can be delivered as planned on the treatment unit. Since the clinical scenarios specific to the individual patients are used, such delivery QA is a type of end-to-end testing (see below), albeit of only one select component of the IGART process (encompassing dose calculation in the treatment planning system, through determination of the treatment unit's control parameters with transfer to the delivery unit, through to dose delivery). Level 2 and 3 testing is also directed at other components of IGART, particularly addressing data transfer and interconnection of different systems in the IGART chain. The functioning together of the CT or MR imaging components with the treatment planning systems, of the planning systems and the radiation delivery units, and of the delivery units and on-board imaging modules adds QA requirements with appropriate specifications to ensure that the systems function appropriately together and that their interaction does not add failure modes not apparent when the components are tested alone [17,27]. The guidelines at these levels may include additional unit tests with slightly different specifications and tighter tolerances to validate that adjacent components function well together [27].
Figure 2 (partial caption). [...] The tests progress from unit tests of individual components at level 1 to higher level tests that validate the operation as units work together with data exchange between those units, to eventual quality management of the whole process. The different levels build on the conceptual pyramid proposed by De Wagter [59]. The details of the tests at each level might be debated and one could move elements between the levels depending on definitions set (for example the Australian Clinical Dosimetry Service (ACDS) describes the hierarchy using 3 levels [60]); the figure is intended to illustrate that RT is a complicated process with many elements that need to be tested individually and also as they work together. End-to-end testing for some treatment schema acts at the fourth level (ACDS level 3). Some sample tools and reference guidelines proposed by various bodies in the radiation oncology and medical physics community are listed to illustrate the complexity of the QA testing needed to ensure quality radiation delivery [see text for references].
The main tactics for QA directed to the highest level in the pyramid fall into two approaches: firstly, risk-based analyses of the radiotherapy processes within a radiation service's particular setting to identify efficient, effective, and safe treatment delivery; and secondly, end-to-end testing of the complete process. Risk based approaches to radiotherapy process QM have evolved in the last 15 years [78], reaching fruition in the report of the American Association of Physicists in Medicine's Task Group 100 (TG100) [76]. The TG100 report represents a departure from previous TG reports in that it does not provide guidelines for specific testing (with set tolerance levels and frequencies) but rather presents a framework for designing QM activities that work through the full RT planning and delivery process. The analysis uses engineering tools such as process mapping, failure modes and effects analysis (FMEA), and fault tree analysis (FTA) to set and prioritize a program's QM activities based on estimates of the probability of identified failures in the process with measures of the impact of these failures on clinical outcome [78]. Examples of specific RT process analysis in TG100 (restricted to IMRT) and reports of clinical applications of FMEA RT analysis all conclude that human factors have a considerable impact on the quality of the radiation delivery. The analysis of the complex interaction of equipment, RT processes and people on RT QM has led a number of guidelines to suggest that end-to-end testing should be more regularly practiced to achieve quality radiation delivery [17,77,79-81].
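The FMEA-based prioritisation mentioned above can be illustrated with a short sketch. It assumes the common FMEA convention of scoring each failure mode for occurrence (O), severity (S), and lack of detectability (D) on 1 to 10 scales and ranking by the risk priority number RPN = O x S x D. The class name, function name, failure modes, and scores are all hypothetical and chosen only for illustration; they are not taken from TG100 or from any clinic's analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One process-step failure mode with FMEA scores on a 1-10 scale."""
    step: str
    description: str
    occurrence: int     # O: how likely the failure is to occur
    severity: int       # S: how severe the effect is if it goes uncorrected
    detectability: int  # D: how likely the failure is to escape detection

    def rpn(self) -> int:
        """Risk priority number, the product O * S * D."""
        for score in (self.occurrence, self.severity, self.detectability):
            if not 1 <= score <= 10:
                raise ValueError("FMEA scores must lie between 1 and 10")
        return self.occurrence * self.severity * self.detectability


def rank_by_risk(modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical failure modes and scores, for illustration only.
modes = [
    FailureMode("planning", "wrong MLC parameter entered in the TPS", 3, 8, 7),
    FailureMode("delivery", "daily output drift beyond tolerance", 4, 5, 2),
    FailureMode("set-up", "image-guidance shift not applied", 2, 9, 6),
]

for mode in rank_by_risk(modes):
    print(f"{mode.rpn():4d}  {mode.step}: {mode.description}")
```

In this made-up example the planning-stage failure ranks first, largely because it is scored as hard to detect with component-level tests; a high D score of this kind is one argument for directing whole-chain testing at that part of the process.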

End-to-end testing
End-to-end testing (E2E) originated as a computer science methodology to assess if a software application, built up from linked distinct software routines, performed as required in the field: the application's performance would be tested from beginning to end under real-world working scenarios. In computing, E2E tests are devised to validate that the software routines forming the application work well together, that the application communicates and interacts with hardware, networks, databases, and other applications as required for proper functionality, and that the application delivers the output expected by users. E2E testing for radiation therapy has the same intent, with the testing serving to verify equipment and processes (for example as shown in figure 1) rather than software applications.

E2E testing in radiation therapy
In the RT context, E2E testing is intended to confirm that some planning/delivery/adaptation process works correctly from start to finish so that the desired radiation dose is delivered accurately to the intended spatial location in the irradiated volume [81]. As such, E2E testing operates at the higher levels of the testing pyramid from figure 2. Since the intent is to verify dose delivery, RT E2E testing is always performed using phantoms containing some dose measuring system(s). In effect, the testing uses the phantom as a patient surrogate, and the application of the RT process being tested is evaluated by having staff apply the specific process protocol to this surrogate (see figure 3). The ability to verify the delivered dose provides a quantifiable metric for the evaluation. Thus, E2E testing enables a quantitative indication that i) the RT process is performing as required and that ii) the team applying the process is doing so correctly. This builds confidence in the procedure within the clinic.

Application of E2E testing in standard radiation therapy practice
To date E2E QA has been practiced intermittently in RT programs, most commonly in three settings: Setting i) during the creation of new radiotherapy devices and techniques in some development environment (in industry or a clinical research setting); Setting ii) when a clinic is implementing a new technique, especially if the technique is more advanced than the clinic's standard practice to date; or Setting iii) during external auditing under some multi-institutional program [81][82][83][84][85][86][87][88][89][90]. This latter application of E2E testing through some external audit is usually performed to help benchmark a clinic's execution of a specific IMRT or IGART delivery process against performance in the wider community or to credential a cancer centre's ability to perform a specific delivery protocol when applying to join a cooperative multi-centre clinical trial using that technique. A universal feature of the external E2E audits is that the service is designed to probe the particular cancer centre's ability to deliver the specific RT technique. For audits performed for clinical trial credentialling, the intent is to verify that the cancer centre can deliver the RT protocol required in some arm of a cooperative multicentre clinical trial. This serves to improve the RT influenced data in the trial by confirming quality radiation delivery from single centre(s) and better ensuring consistent RT delivery across the centres participating in the trial [81,84,97].
Figure 3. An example of end-to-end testing as applied to radiation therapy. a) The workflow for the E2E test that was run to validate the specific IGART process that was being implemented in Kingston during the transition from 3D conformal delivery to full IMRT treatment [19,21]. b) The E2E test used an in-house purpose-built head phantom containing a 3D xylenol-orange Fricke gel dosimeter [21] (bottom photograph). The phantom was irradiated using the workflow under the clinic's head-and-neck (H&N) IMRT treatment protocol. The resulting performance evaluation shows planned and measured dose distributions in two orthogonal views through the treatment volume. The top distributions are from the Varian-Eclipse generated treatment plan; the middle dose distributions are from the optical-CT measurements of the irradiated gel dosimeter. The lower images show the gamma evaluation maps (3%/3mm) indicating the spatial dose agreement between intent and delivery. Although this was an E2E test on a phantom, target and organ at risk contouring was performed by a radiation oncologist (labels at bottom right) as per the H&N treatment protocol being implemented, and all dose distributions and the resulting gamma comparisons were mapped on the anatomical contours to flag if any delivery failures indicated by the gamma evaluations were clinically relevant (rather than just rely on gamma pass rates which can mask discrepancies [47,74,91-93]). Providing contours also enabled comparison of measured versus planned dose volume histograms for regions of interest in the irradiated volume [19,94-96]. The E2E validation during IMRT implementation before the first patient was treated provided considerable confidence to the implementation team that our IMRT protocol was within our program's capabilities.
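The gamma evaluation (3%/3mm) used to compare planned and measured dose in figure 3, and the caution that a pass rate alone can mask discrepancies, can be made concrete with a small sketch. It is a simplified illustration, not the analysis used for the figure: it assumes 1D dose profiles sampled on a shared grid, global normalisation to the maximum planned dose, the planned profile taken as the reference, and a brute-force search over all points. The function names and the dose values are hypothetical.

```python
import math


def gamma_1d(positions_mm, reference, evaluated, dose_tol=0.03, dta_mm=3.0):
    """Global 1D gamma index for each reference point.

    Both profiles are sampled at the same positions. The dose criterion is
    dose_tol times the maximum reference dose (global normalisation).
    """
    dose_crit = dose_tol * max(reference)
    gammas = []
    for x_ref, d_ref in zip(positions_mm, reference):
        best = math.inf
        # Search every evaluated point for the best combined agreement
        # in distance and in dose.
        for x_ev, d_ev in zip(positions_mm, evaluated):
            dist_term = ((x_ev - x_ref) / dta_mm) ** 2
            dose_term = ((d_ev - d_ref) / dose_crit) ** 2
            best = min(best, math.sqrt(dist_term + dose_term))
        gammas.append(best)
    return gammas


def pass_rate(gammas):
    """Fraction of points with gamma <= 1."""
    return sum(g <= 1.0 for g in gammas) / len(gammas)


# Hypothetical profiles: a uniform 2 Gy planned dose over 50 mm, and a
# measurement that is 10% low across a contiguous 10 mm region.
positions = [float(x) for x in range(50)]
planned = [2.0] * 50
measured = [1.8 if 20 <= i <= 29 else 2.0 for i in range(50)]

gammas = gamma_1d(positions, planned, measured)
failing = [x for x, g in zip(positions, gammas) if g > 1.0]
print(f"pass rate: {pass_rate(gammas):.0%}")
print(f"failing positions (mm): {failing}")
```

With these made-up numbers the pass rate is 92%, which would satisfy a 90% acceptance threshold, yet all failing points sit together in the middle of the 10 mm cold region. Looking at where the failures fall, as the gamma maps overlaid on contours do in figure 3, exposes what the single summary number hides.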

Lessons learned from past radiation therapy E2E testing
The objective of the E2E testing in the developmental Setting i) above is to validate that the technical components (equipment, hardware and software) required for some subset of the system under development are operating individually and together as intended [27] (thus complementing level 1 and 2 testing). Such E2E testing for the novel treatment system being put together does not necessarily cover the full RT process; rather, it concentrates on a select limited segment of the system to ensure that the segment is not perturbed by unforeseen complications introduced as it is assembled. The testing in this setting is most often performed by specialists, or by select development team members, rather than the individuals in the clinic who will eventually execute the process during treatment delivery. Such work is not widely reported in the open literature; it is mentioned in this review for completeness but is not considered further.
There have only been a few reports of RT E2E testing performed when novel treatment techniques are first introduced in the clinic (Setting ii above). A barrier to widespread reporting of such testing has been that such work is often interpreted as regular clinical practice and not innovative publishable research. The literature that is available usually presents developments of novel dosimeters or phantoms for particular E2E clinical applications or describes particular E2E protocols for specific RT techniques. Some of these Setting ii) applications have been published in past IC3Ddose conferences, often to illustrate the application and possible advantages for 3D dosimeters in clinical E2E QA (well-reviewed in ref. [98]). Some specific developments and applications for E2E QA to evaluate specific clinical treatment protocols are referenced in Table 1 below (which presents a list of vendors that supply various commercial tools and services to aid in-house E2E testing) and in the most recent IC3Ddose proceedings from the virtual meeting in 2021 [99][100][101][102][103]. These and other [21,53,96,104-112] clinical reports span a range of investigations into the feasibility of RT E2E testing; nonetheless, they do make some common observations. The papers indicate that clinically relevant E2E QA can be performed with a wide range of phantom designs that may contain various point, 2D or 3D dosimetry systems (see discussion below). A common feature in this work is that the phantoms and dosimeters must be well characterized and tailored to the RT delivery technique being tested. Select studies acknowledge that there are extra demands in the data processing for E2E RT QA, starting with a requirement for tools that can manage the TPS planned dose distributions and the measured delivered dose data sets well and that can ensure that they are well registered before evaluating agreement [96,113]. Of relevance to this review, the literature investigating clinical E2E testing positively reports that establishing in-house E2E QA is feasible and that the tools (phantoms, dosimeters, analysis systems) for a cancer centre's E2E testing program can be prepared locally with reasonable effort or be purchased from existing service providers.
The most comprehensive literature describing the benefits of RT E2E testing comes from the experience from external audits (Setting iii above). Basic historical reviews [81,83,84] summarize E2E and dosimetry audits in general, giving an overview of E2E auditing by an external agency. Other reports describe the E2E activities of particular bodies such as the Australian Clinical Dosimetry Service (ACDS) [86], the Institute of Physics and Engineering in Medicine in the United Kingdom [90], the International Atomic Energy Agency (IAEA) [88,89,138], the Imaging and Radiation Oncology Core QA Center in Houston (IROC-H) in North America [83,84,139,140], or in a multicentre collaboration such as in the United Kingdom [130].
These reviews outline well how external audit RT E2E testing proceeds: the auditing body prepares a phantom containing dose measuring tools appropriate to the specific technique under audit. The objectives of the test (e.g., required given doses to the specific targets) are set and communicated to the centre performing the run, as are the criteria that determine if the observed E2E delivery passes or fails expectations. The phantom is delivered to the cancer centre and the RT clinic staff is directed to run the phantom through all steps and manipulations as would be used on a patient undergoing the RT process under investigation (for example with a workflow similar to that shown in figure 3). Once the radiation delivery has been completed, the phantom is returned to the auditing centre and the dosimetry systems are analysed to determine whether the delivery was as intended.
There are some common observations from the experience gained over the large number of external E2E audits cited above. While most of the centres coming into the external audits had well-established processes for the IMRT and IGART techniques being audited, and had well-formed in-house physics QA programs, a significant fraction of centres did not pass the initial audit runs. This result is reported consistently across all auditing bodies. One encouraging trend observed by IROC-H, which has the longest reported follow-up for audits on multiple phantom sets designed for different RT delivery tests (e.g., head and neck (H&N), lung, liver, etc.), is that the passing rate for E2E testing on the various phantoms has improved with time. For example, Ibbott [141] reported that ~30% of institutions had failed the 250 IROC-H H&N IMRT E2E audits performed before 2008; by 2013 only ~19% failed (over 1139 tests for ~760 institutions) [140], and by 2016 the failure rate had dropped further to ~10% [142]. Unfortunately, the pass rates for external E2E audits have not yet approached 100%. Also, a not insignificant number of centres fail the E2E audit again on subsequent runs on the same RT protocol, an indication perhaps that the failure was not the result of random error but rather was related to some problem with the radiation delivery process in use by that centre [143].
An instructive literature has grown which reports on investigations of the patterns of failures observed over the IROC-H experience with external E2E audits [52,93,140,142-148]. Throughout the IROC-H audits, program failures have consistently been identified with issues in the physics domain; even daily output variations have contributed to unsuccessful E2E phantom irradiation [144]. About 1 in 5 failures in IROC-H H&N phantom tests could be attributed to TPS errors [149]. The errors included incorrect data entered into the TPS (e.g., output factors, percent depth doses, multi-leaf collimator (MLC) defining parameters, etc.) and inaccurate beam modelling within the TPS (which led to pencil beam algorithm models being disallowed in National Cancer Institute sponsored trials [144]). Some of the failures were clearly identified with known modelling issues at the time (MLC leaf gap modelling in a particular TPS [150]) and were corrected by the medical physics community; but similar problems then reappeared more recently, resulting in unsuccessful E2E tests in centres using another TPS [145]. With lung and spine E2E audit phantom irradiations [146], the failures resulting from dose calculation errors were more evident with the highly modulated spine phantom irradiations. These results indicate clearly that E2E testing (in which the whole treatment chain is tested from planning to delivery) can reveal problems with basic physics modelling data and suboptimal TPS calculation algorithms that are missed during the unit specific evaluations in conventional physics QA. This observation has led to the recommendation that E2E testing be a part of the TPS commissioning process [77,145,151].
One observation from the IROC-H H&N phantom audits [93] worth noting is that audit failure does not correlate well with the results of the patient specific IMRT QA used by the institution performing the phantom irradiation. The discordance may be attributed in part to the fact that the patient specific IMRT QA test might use some surrogate measure of the delivery (with fluence measurement, or dose to a detector array positioned outside of the phantom compared against a TPS prediction). The in-house IMRT QA might also measure the delivery from a single gantry angle rather than going through the full gantry motion in the actual radiation delivery. Furthermore, the pass criteria in the patient specific QA might use too simple a metric (e.g., gamma pass rate rather than gamma maps or other comparator) to score success, and this can mask failed delivery [47,74,91–93]. The poor correlation between patient specific and E2E QA validation of treatment delivery suggests that E2E testing should also be part of any commissioning of internal E2E patient specific QA techniques. This would be a type of Setting ii) E2E testing.
Some E2E audit failures have been attributed to incorrect positioning of the phantom during irradiation, specifically because the RT clinic's delivery protocols, which specify how patients are to be set up, have their position verified, and corrected if needed, were not followed. In this case the E2E testing is catching noncompliance with the clinic's RT delivery protocol. As noted above, the auditing bodies instruct centres to have staff handle the phantom as if treating a patient. But often external audit testing is performed solely by medical physicists and may not involve the radiation therapists who treat the patient. This can lead to suboptimal performance during the phantom irradiation, especially when the test requires careful phantom set-up; E2E test failures have been attributed directly to the physicists not following set-up procedures as per centre policy [140,147]. Such errors have been detected because the IROC-H phantoms contain multiple dosimetry systems, in particular point detectors and film dosimeters for localization of the delivered dose distribution [152][153][154]. In many of the E2E runs the auditors have been able to use the point and film detectors to distinguish between systematic failures, in which the given dose (the dose value at points in the irradiated volume [155]) falls outside the 7% acceptance criterion, and localization errors, in which the correct given dose was delivered but to a shifted position, perhaps because of mistaken phantom set-up or incorrect accommodation of motion (in IGART E2E testing of radiation delivery under motion). The advantages of dual detectors for the breakdown analysis of E2E test failures have been reported for head and neck [140,142], lung and spine [143] and liver phantom [147] audits.
The auditing E2E QA literature clearly establishes that E2E QA can identify deficiencies in the implementation of RT techniques that are neither detected in the cancer clinic's conventional physics and radiation therapy QA nor apparent in the design of the protocols that set the delivery processes for that technique. In an IMRT uptake audit [88], the IAEA identified that clinics with more IMRT delivery experience did have higher pass rates in their audit. But no similar significant difference was observed in the pass rates for small and large cancer centres in IROC-H H&N IMRT audits [140]. Audit success has also been reported to be unaffected by the number of treatment linear accelerators (linacs) in a centre or the number of physicists per linac [140]. More broadly, the full IROC-H clinical trial credentialling experience shows that even well-established and experienced cancer centres fail phantom irradiation tests. It should be noted that the passing criteria for the IROC-H tests are generous (for a variety of historical and practical reasons, delivered given dose must be within 7% of plan and distance to agreement within 4 mm); if the criteria were tightened the failure rates would increase [142].
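The way dual detectors separate dose failures from localization failures can be illustrated with a minimal sketch. It assumes a single point-detector reading and a single displacement value derived from film, and it uses only the 7% and 4 mm limits quoted above; the class and function names are hypothetical, and the actual IROC-H analysis is more involved than this.

```python
from dataclasses import dataclass

# Acceptance limits quoted in the text for the IROC-H phantom audits.
DOSE_TOLERANCE = 0.07     # allowed fractional deviation of delivered from planned dose
DTA_TOLERANCE_MM = 4.0    # allowed distance to agreement, in mm


@dataclass
class PhantomResult:
    """Hypothetical container for one phantom irradiation."""
    planned_dose_gy: float    # TPS dose at the point detector position
    measured_dose_gy: float   # point detector reading
    displacement_mm: float    # shift of the film dose distribution relative to plan


def classify(result: PhantomResult) -> str:
    """Label the irradiation as a pass, a dose failure, or a localization failure."""
    deviation = (result.measured_dose_gy - result.planned_dose_gy) / result.planned_dose_gy
    dose_ok = abs(deviation) <= DOSE_TOLERANCE
    position_ok = abs(result.displacement_mm) <= DTA_TOLERANCE_MM
    if dose_ok and position_ok:
        return "pass"
    if dose_ok:
        return "localization failure"   # right dose, wrong place
    if position_ok:
        return "dose failure"           # right place, wrong dose
    return "dose and localization failure"


if __name__ == "__main__":
    # Illustrative numbers only: 5% high in dose but shifted by 6 mm.
    print(classify(PhantomResult(6.0, 6.3, 6.0)))
```

A point detector alone could not distinguish the second and third cases, which is the practical argument for carrying both detector types in the phantom.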
In summary, external audit experience shows that E2E testing adds value to a cancer centre's QM program. Deficiencies in treatment delivery can be identified which are otherwise missed in conventional clinic QA testing. Success in a particular RT technique's E2E test adds confidence to the centre that the radiation program can deliver the technique as intended. Running the test provides a learning experience for the team involved and can help fine-tune the policies and procedures defining the technique's delivery process. The results from the test can provide benchmark data for future reference, useful to establish continued compliance in the delivery. These benefits to RT QM do not require that the E2E testing be moderated by an external auditing centre; they would apply equally to in-house E2E implementation.

Additional Observations on RT QM
While QM has been a constant concern in RT practice from its very beginning, QM programs have evolved as RT has become more complex, in part by adopting tools and approaches from engineering and industry [78]. Incident reporting, FMEA risk analysis, and human factors analysis have all been employed in recent years to advance safe clinical practice. The lessons learned from these developments have become as important a part of RT practice as are fundamental medical physics and dosimetry knowledge and practice. The idea that human factors (i.e., factors related to human behaviour, abilities, limitations, and relationships to the physical and organizational work environment) influence RT processes is now well established. And it is well understood that protocols and procedures for specific RT treatments must be well defined and laid out with appropriate training to minimize these human factors perturbing safe practice. The results of incident reporting programs have established that regular review of clinical performance with reports of these reviews to cancer centre staff can improve protocol compliance and reduce future incidents. FMEA can be used to scrutinize a program so that riskier steps in the RT process come under more scrutiny. Analysis may also enable QM programs to better allocate resources so that a program's efforts are devoted to the most beneficial QA by moving from tests that give little value to tests that better limit failure.
But the literature also indicates that these interventions and analyses cannot eliminate all policy non-compliance. For example, accounts from incident reporting [39], FMEA [78], and human factors analyses [18,24,42] have indicated that the patient set-up immediately prior to RT dose delivery often remains subject to operating procedure non-compliance because of human behaviour, even after a process has had steps and workflow improved. The protocol deviations in patient set-up have been explained as being the result of staff not being suitably trained or not fully understanding the situation they are encountering (because of limited information or communication, or because they are working under stress). Various corrective recommendations follow, for example the adoption of checklists [156] or time-outs, to provide more stable operating conditions. But it could be that the variation of performance is not only the result of poor training or judgement or imperfect process design, but rather is a feature of how individuals manage tasks and make decisions in a process.
For one thing, individuals have variable risk tolerance. This is observed in our day-to-day lives and regularly in the clinic, for example, during linac output calibration where one physicist might adjust an output that is within the QA tolerance while others will leave it be. This example suggests an illustration of the influence of variable risk tolerance on a QA test (see figure 4). The figure is a modification of a classic representation of how action and tolerance levels [17,157] are used to define the outcome of some QA test and how measurement uncertainty may influence the course of action. The modification is the addition of another row at the bottom to illustrate that decisions made during the QA are also influenced by the risk tolerance of the person evaluating the measurement. This added factor determining action introduces variability to the QA outcome; different risk tolerance would similarly influence the completion of any RT process since the processes comprise multiple decision points that define subsequent action. Investigation of the influence of risk tolerance on decision making in medicine is in early stages [158][159][160][161], and multiple factors such as training, past clinical experience, personality traits, and tolerance for uncertainty have been identified as affecting an individual's decision making. Some models for decision making [161] propose that a person's reasoning is not exclusively based on analysis of quantifiable facts and measures; rather their information processing also involves simplified categorization of situations into a more qualitative and intuitive framework (termed "gist") influenced by past experience and past encounters of similar situations. And as individuals become more experienced in their roles, their decision making becomes more gist weighted [161]. This is all to say that variations of performance in an RT process cannot be completely controlled by external factors since they may be basic consequences of human behaviour.
[Figure 4 caption; opening words missing in the source] ... within an acceptable range defined by some tolerance specified by investigation and action levels used to trigger a specific action after testing [17,157]. A QA policy and procedure will use the investigation and action levels to mark when the system might only require examination after the test (for example, a repeat measurement later in the day in daily linac output measurement) or perhaps generate a more critical action (taking the treatment unit out of service until recalibrated). QA protocols accommodate measurement uncertainty (∆) by setting these levels appropriately [17,157]. Row HF is intended to show that human factors such as individual observer risk tolerance may blur the decision to undertake a specific action after some observation. This may lead to variation of practice even when a process protocol has been well defined and laid out. For example, not all staff might reposition a patient whose set-up images do not align with the planning image within limits of the treatment protocol even though they are working to the same treatment policy.
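The interplay of investigation level, action level, and measurement uncertainty can be made concrete with a short sketch. It is an illustrative assumption rather than a published protocol: the function name, the level values, and the "borderline" label are hypothetical. The borderline outcome marks the cases where the measurement plus or minus its uncertainty straddles a level, which is where an individual's risk tolerance is most likely to decide the action.

```python
def classify_qa(deviation_pct: float,
                investigation_level_pct: float,
                action_level_pct: float,
                uncertainty_pct: float = 0.0) -> str:
    """Classify a QA measurement against investigation and action levels.

    The measurement is treated as an interval |deviation| +/- uncertainty.
    If the whole interval sits in one zone the outcome is unambiguous;
    if it straddles a level the outcome is reported as 'borderline'.
    """
    low = abs(deviation_pct) - uncertainty_pct
    high = abs(deviation_pct) + uncertainty_pct
    if high <= investigation_level_pct:
        return "within tolerance"
    if low > action_level_pct:
        return "action"            # e.g. take the unit out of service
    if low > investigation_level_pct and high <= action_level_pct:
        return "investigate"       # e.g. repeat the measurement later
    return "borderline"            # interval straddles a level


if __name__ == "__main__":
    # Hypothetical daily output check: levels of 2% and 3%, uncertainty 0.5%.
    for deviation in (1.0, 1.8, 2.5, 4.0):
        print(deviation, classify_qa(deviation, 2.0, 3.0, 0.5))
```

In this sketch a 1.8% deviation with 0.5% uncertainty is borderline: one physicist might adjust the output while another leaves it alone, even though both follow the same written levels.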
In a seminal paper on risk management [162] Rasmussen suggests that top-down process design using traditional approaches may not work in a dynamic environment with changing technology and a high degree of integration and coupling of systems (as encountered in RT programs). Traditional design might attempt to account for the whole process being more than the sum of its uncoupled parts by adding task instructions (i.e., protocols or performance criteria) to better ensure the whole process proceeds as intended. But such an approach misses that the tasks may not just proceed in sequence but may be active simultaneously. And the individuals doing the task (the actors) may not follow the task instructions to the letter but rather adopt practical modifications to better achieve the outcome intended. The paper points out that this is the reason "working to rule" is an effective labour action, since work often slows down when actors do things exactly as written in policy. Therefore, modeling of human behavior in terms of a stream of acts with task instructions to control processes may not be a reliable standard for judging actual behavior in work.
A recent analysis of the variability in practice in medical physics RT treatment plan review (i.e., plan or chart checking) across 15 cancer centres in Ontario, Canada, provides some interesting observations [163] on the points above. The investigators observed that standardized checklist use within a centre did not automatically reduce checking variability. They note that individual physicists may not have been following checklists to the letter and suggest this indicates the importance of checklist development guidelines [156] to avoid ineffective checklists. The paper also highlights a regular aspect of plan checking: when to initiate a replan. It notes that there was consensus in all centres that plans that deviated from the centre protocols (e.g., did not meet planning dose-volume histogram (DVH) objectives) would be sent for replanning. However, sub-optimal plans (which met the program's DVH criteria but could still be improved) generated variable action, not only because of workload or a desire to get the patient on treatment as soon as possible, but also because individual medical physicists differed in their tolerance of sub-optimality and in their view of the need for replanning. The authors comment that variation in practice is not inherently a bad thing but that in principle uniformity of practices should be desired and that identifying factors that cause variation in practice can lead to improved processes and ultimately higher quality RT.
Because RT delivery is a process with human decision making throughout, there will be variation of delivery within a centre and between institutions. External audit E2E QA has been established to evaluate and control the variation between institutions. It seems advisable for cancer centre RT programs to use similar E2E methodologies to monitor RT delivery in-house.

In-house E2E Testing in the Radiotherapy Clinic
The discussion above provides the foundation for the main argument of this review: that more regular in-house E2E QA is doable and provides an important component for every comprehensive cancer treatment facility's QM program. But adopting E2E testing into an RT QM program is challenging, especially in the Setting iii) type testing, a setting which the author advocates needs wider application in local clinics. E2E testing is resource intensive (in both material and staff) and must be implemented prudently so as not to interfere with a clinic's main purpose, patient treatment. Therefore, establishing E2E testing requires forward planning to allocate the time and resources that best advance the clinic's QM needs. Some personal insights on how one might best fit E2E into a clinical QM program are presented in the next sections.

Implementing in-house E2E testing into a clinical QM program
It might be practical, in the spirit of TG-100 [76], to initiate an E2E program by first applying a test to a simple commissioning or delivery protocol, and then to expand the program more widely through the institution's QM. One entry point for clinics, especially those new to E2E testing, is suggested by a universal observation from the external audits in the IROC-H, the IAEA, the ADLS and UK experience: many phantom irradiation tests fail because of issues at the physics level of the IGART process (issues with incorrect beam modelling, MLC parameter entry, TPS algorithms, etc.). This has motivated calls for E2E testing to complete TPS commissioning [77,145,151]. The E2E test would involve generating a planned delivery, for example, for some more complex dynamic IMRT technique that challenges the algorithms and radiation delivery, and irradiating the phantom under that plan. This is a form of the Setting i) E2E QA described earlier. Such testing might be limited to the cancer centre's physics team, as most of the set-up tasks would be simple, although guidance from the radiation therapists might be needed at some stage if the IMRT application involved shifts of the phantom or
An alternate initial E2E test, or a next step in further implementation of an E2E program, would be in a Setting ii) application, for example, during the roll-out of a new RT technique within the centre. In this case the team involved with the testing would have to expand beyond just physicists (adding radiation therapists and radiation oncologists) to ensure that design, implementation, and evaluation of the test is appropriate to the RT technique. This would involve everyone on the test team understanding the policies and procedures required to deliver the technique so that the E2E testing runs exactly as the technique's implementation team has laid out. The people performing the test would likely still be a subset of the full radiation clinic staff, and the E2E test might not be as broad as in a full Setting iii) test (perhaps leaving out the planning stages by using a standard special test plan to a standard CT dataset for the phantom). Adopting Setting ii) type tests in the clinic would have the benefit of ensuring that a new technique has been well commissioned before irradiating the first patient, and it would help the staff performing the tests to gain a full understanding of the treatment process with all its steps and interactions. It would also expand the clinic staff involved with the E2E testing program beyond the medical physics department.
The approach to adopting E2E testing for the clinic in the two settings above is guided in large part by a specific development in the clinic (a new TPS, a different treatment unit, or novel IGART technique). Adding Setting iii) type E2E testing is more complicated since in any given centre E2E testing could likely be applied to multiple RT deliveries. One could consider piloting Setting iii) E2E by adopting the testing for a simple IMRT process (starting simple, as TG-100 [76] suggests). On the other hand, one might direct the initial E2E adoption by using information from the centre's incident reporting program or using risk-based FMEA/FTA methods to determine which of the IGART techniques practiced in the clinic would benefit most from the increased scrutiny the E2E methodology provides. Such analysis will be cancer centre specific and there are no reports in the literature the author is aware of that provide guidance (although the TG-100 practical guides appendices provide a possible starting point [76]). Some guiding principles from computing science would recommend selecting RT deliveries whose failure would cause most harm and avoiding exception testing (testing of seldom encountered scenarios) by focussing on deliveries regularly used in the clinic.
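One way a clinic might turn an FMEA into a ranked list of candidate techniques is sketched below. It assumes the risk priority number convention commonly used in TG-100-style FMEA (occurrence × severity × lack of detectability, each scored from 1 to 10). The technique names and all scores are hypothetical placeholders, not data from any clinic or from the literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    technique: str
    description: str
    occurrence: int      # 1 (rare) to 10 (frequent)
    severity: int        # 1 (negligible) to 10 (catastrophic)
    detectability: int   # 1 (almost always caught) to 10 (almost never caught)

    def rpn(self) -> int:
        """Risk priority number: O x S x D."""
        return self.occurrence * self.severity * self.detectability


def rank_techniques(modes):
    """Rank techniques by their highest-scoring failure mode, highest first."""
    worst = {}
    for mode in modes:
        worst[mode.technique] = max(worst.get(mode.technique, 0), mode.rpn())
    return sorted(worst.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for illustration only.
EXAMPLE_MODES = [
    FailureMode("FSRT brain", "image registration error at set-up", 4, 9, 6),
    FailureMode("FSRT brain", "small-field dose calculation error", 2, 8, 3),
    FailureMode("standard IMRT", "MLC parameter entry error", 3, 6, 3),
    FailureMode("motion-managed lung", "motion management failure", 5, 8, 7),
]

if __name__ == "__main__":
    for technique, score in rank_techniques(EXAMPLE_MODES):
        print(f"{technique}: highest RPN {score}")
```

Ranking by the single worst failure mode is only one possible design choice; a clinic might equally sum the scores or weight them by how often the technique is delivered, in line with the advice to focus on deliveries in regular use.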
Initial implementation of Setting iii) E2E QA may also benefit from an FMEA to assess the specific technical requirements for the E2E test on the specific RT delivery to be useful. Such an analysis might show that a certain E2E test is not yet feasible in some clinic given the tools and resources available in the clinic at that time, while another test could be more readily started. This can be illustrated using two examples: i) E2E testing for treatment of multiple metastatic brain lesions in a Fractionated Stereotactic Radiation Therapy (FSRT) program, and ii) E2E testing of an adaptive RT treatment protocol involving target motion guidance or the correction for tissue deformation over a multi-fraction course of treatment.
The FMEA for the FSRT treatment for multiple brain metastases might establish that the main motivation for the E2E testing is to validate the spatial integrity of the delivery (confirming that the high dose is being delivered only to the multiple small targets that need treatment) [78]. Analysis may determine that there is only a small risk that incorrect doses would be given to a target since there was rigorous and complete commissioning of the TPS calculation for the small radiation fields that will be used and because the brain is a homogeneous medium well handled by the TPS. The risk analysis may also determine that the regular physics and radiation therapist QA performed on a particular treatment unit, along with patient specific portal imaging fluence based verification prior to treatment, are sufficient to ensure that the radiation delivery will be as predicted by the TPS. Therefore, the main risk established from the FMEA might be a targeting miss due to errors in image registration and set-up correction prior to irradiation, a step that involves human intervention. Such analysis might conclude that an effective FSRT E2E QA could have relaxed requirements for dose determination but increased requirements for high resolution spatial localization of the delivered dose. These objectives would release some constraints on the tools required for FSRT E2E testing, particularly relaxing calibration concerns in determining the delivered given dose when using film or 3D gel or radiochromic dosimeters for the spatial assessment. Efficient E2E testing could be performed using a phantom containing radiochromic film with transmission scanner readout, or a 3D NIPAM polymer gel dosimeter with CBCT readout [112,164,165], soon after the test irradiation and before all radiation-induced reactions have finished; or with polymer gels read out in an optical CT system where scatter perturbation might affect given dose measurement but not localization [126,166,167]; or using MR measurements based on relaxation time weighted imaging sequences [168][169][170] rather than more lengthy quantitative relaxation rate determining sequences [171]. Alternatively, the QA could incorporate a vendor supplied 3D gel dosimeter which is prepared remotely and sent to the clinic through regular shipping and delivery (that is, in an uncontrolled environment) with in-house or remote readout. The FMEA would also inform the design of the phantom required for the E2E QA. Since the risk mitigation motivating the FSRT E2E QA is associated with geographic misalignment of the patient during treatment, it might be beneficial to use an opaque anthropomorphic head phantom that can be accommodated in the patient immobilization system. The phantom would have to appear patient-like on CT imaging to mimic the registration task during pre-irradiation set-up and so fully align the E2E test with the FSRT patient treatment. In the end, the FMEA of the requirements for valid FSRT E2E QA may present options that make implementing the QA more tenable and hence a good starting point for a Setting iii) application.
FMEA of radiation deliveries under motion guidance, or of future adaptive processes with multi-fraction deliveries with dose correction schemas to correct for target deformation, would likely flag additional requirements for the E2E testing since the validation would require improved dosimetry with more exact calibration to verify that the delivered given dose is measured accurately. This is in part because these specific adaptive applications incorporate complex IGART processes with segments of the process that may still be under some development. For example, a particular test may be to validate adaptations incorporating on-line dose recalculation using online CBCT image sets that are less well characterised than simulation CT scans [172], or using dose calculations with novel deformation algorithms that have yet to be fully characterized against measurement. The FMEA for IGART techniques involving motion management or tumour tracking adaptation will likely lead to recommendations that the E2E testing be designed specifically to detect any failure of the motion management strategies [173]. In all these cases, the dosimeters in the test phantoms will need to be able to measure the delivered given dose and the spatial dose distribution in the irradiated volume precisely and accurately, which may require multiple detectors (as used in many of the external audit E2E QA phantoms).
Risk-based analysis could also be used to determine the frequency for any Setting iii) E2E testing within the cancer centre's QM program. The test frequency would depend on the technical aspects of the RT delivery, on the complexity of the interchanges within the delivery process, and perhaps on staff dependent human factors; and it would not necessarily be the same for all RT deliveries in the E2E program. It may be that a particular well-established IMRT treatment need only be tested annually to provide a regular assessment of performance. The E2E testing for the FSRT delivery (described above) may be scheduled as a regular quarterly test on each FSRT treatment unit because it is more easily performed, it confirms that dose delivery is maintained for a technique demanding accurate targeting, and it maintains a regular application of E2E methodology in the clinic QM culture. In another possible situation, analysis of incident reports for an RT technique involving complicated set-up verification with patient shifts during the treatment may show that incident reports are submitted regularly after new staff rotate onto the treatment unit delivering that technique. The RT program may decide that it would benefit by performing more regular E2E QA for this technique a week or so after staff rotation. Such decisions can be informed by the risk-based approaches now being recommended for QM design. With time, E2E QA frequency might also be determined through observed pass/failure data, with added scrutiny given to the RT techniques showing higher failure rates. The scheduling will need to balance the benefit of testing with the demands on clinic resources, but as a clinic establishes its E2E QA program the testing should become more efficient, and more testing could become manageable.
An E2E program will require additional software and equipment infrastructure to proceed effectively. As indicated by experience, many different phantom configurations can be used for effective E2E testing [153,154,174]. Regular shaped physics phantoms [88,108–110,130,175–177] with well-defined geometry and appropriate locations for dosimeters are sufficient for much of the Setting i) and Setting ii) testing required to validate, for example, that the TPS has been commissioned correctly and that the TPS dose calculation algorithms model well the conditions of the RT delivery (e.g., accounting for highly modulated deliveries under dynamic MLC and gantry motion). Similar simple phantoms would work well also for some Setting ii) E2E testing of new delivery techniques being introduced into the clinic. There may be additional requirements to commission some treatments in a Setting ii) application or as a clinic introduces Setting iii) testing. If the delivery under evaluation involves motion management, then the phantom will have to be able to be integrated into some commercial or in-house built motion testing system [129,130,178,179]. For some RT deliveries the E2E QA phantoms may need to be more anthropomorphic in external appearance (e.g., if the delivery requires phantom set-up in some patient immobilization system with set-up validation and repositioning) or to simulate internal structure heterogeneity with organ mimicking structures (e.g., if the E2E testing is being applied to RT delivery to specific anatomy and the assessment includes target conformity) [89,102,104,106,111,180,181]. The material for the phantom need not be exactly tissue or water equivalent although it should be similar to any organs of interest and should be well characterised by CT, and perhaps MR imaging, before use [174]. The phantom must be able to accommodate the required dosimetry systems (3D dosimeters, radiochromic film, detector arrays and point detectors) to enable measurement of delivered doses. Experience with E2E testing in Kingston and published reports from external audit agencies indicate that the use of both point and 2D or 3D detectors improves the dose measurements required to assess delivery [102,127]. The ability to establish delivered given dose with well positioned point detectors can relax the dosimetry requirements on the spatial detectors (2D radiochromic film or gel or plastic 3D dosimeters), which typically provide accurate relative dose data but which may experience preparation history dependent perturbations of dose sensitivity that limit precision when measuring given dose.
Finally, the E2E testing system needs specialized software that can import planning image sets, the TPS predicted dose distributions, and measured spatial dose distributions for the analysis [113,118,137,182]. For some analysis it can be helpful for the E2E software to be able to import the set-up imaging data from the treatment unit. The software must be able to register these image data with dose data sets easily; this can be facilitated by incorporating fiducial markers into the phantom during preparation [96,102,113]. If the delivered dose distribution measurements are being determined in-house, the software must be able to convert data from the dosimeter readout system (e.g., optical CT, X-ray CBCT or MR data) into dose data via some calibration data. And the software must be able to calculate, and display, planned and delivered dose distribution comparison metrics such as gamma maps and conformity indices. These requirements have been reported in multiple IC3Ddose proceedings in this journal over the years. Such E2E capable analysis software is available commercially, including from most vendors listed in table 1.
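To make the gamma comparison metric concrete, the following is a minimal one-dimensional sketch of a global gamma calculation: for each measured point, gamma is the minimum over planned points of the combined distance and dose difference, each normalised by its criterion. It is a simplification that assumes both profiles are sampled on the same grid and does no interpolation between grid points; the function names and the 3%/3 mm criteria in the usage lines are illustrative assumptions, and clinical software works on 2D or 3D data.

```python
import math


def gamma_1d(positions_mm, planned, measured, dose_criterion, dta_mm):
    """Return the gamma value at each measured point of a 1D dose profile.

    dose_criterion is an absolute dose difference (same units as the doses),
    for example 3% of the maximum planned dose; dta_mm is the distance
    to agreement criterion in mm.
    """
    gammas = []
    for x_meas, d_meas in zip(positions_mm, measured):
        # Search every planned point for the best combined agreement.
        best = min(
            math.sqrt(((x_plan - x_meas) / dta_mm) ** 2
                      + ((d_plan - d_meas) / dose_criterion) ** 2)
            for x_plan, d_plan in zip(positions_mm, planned)
        )
        gammas.append(best)
    return gammas


def pass_rate(gammas):
    """Fraction of points with gamma <= 1."""
    return sum(1 for g in gammas if g <= 1.0) / len(gammas)


if __name__ == "__main__":
    positions = [0.0, 1.0, 2.0, 3.0, 4.0]   # mm
    planned = [0.0, 1.0, 2.0, 1.0, 0.0]     # Gy
    shifted = [0.0, 0.0, 1.0, 2.0, 1.0]     # planned profile moved by 1 mm
    criterion = 0.03 * max(planned)         # 3% of maximum planned dose
    gammas = gamma_1d(positions, planned, shifted, criterion, 3.0)
    print([round(g, 3) for g in gammas], pass_rate(gammas))
```

Returning the per-point values as well as the pass rate reflects the point made earlier in the review that a single pass rate can mask a failed delivery that a gamma map would reveal.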

In-house implementation of Setting iii) E2E testing
Since Setting iii) E2E QA requires that a phantom be processed through all steps and manipulations that would be used on a patient undergoing the RT treatment, the testing should not be limited solely to medical physicists, but rather it should involve radiation therapists and radiation oncologists as they would be during patient treatment (see figure 5) [21,183]. The various staff would contribute to the running of the E2E test as it proceeds through the delivery's stages:
i) The phantom would be prepared by a member of the medical physics department. This would involve selecting the appropriate phantom from the clinic's inventory and inserting the appropriate dose measurement systems (either prepared in-house or purchased from a vendor) into the phantom. At the end of this stage the phantom should be able to act as a patient to undergo the next stages of the test.
ii) The phantom would be delivered to the planning team and imaged according to departmental treatment simulation policy, and the images would then be contoured by the planning CT/MR therapists and, ideally, a radiation oncologist. The contoured images would then be sent for treatment planning.
iii) Treatment planning would be completed by dosimetrists/planning therapists using the planning protocol for the RT technique under test. Once approved, the treatment plan and phantom images would be sent to the treatment unit as in clinical practice. The TPS dose distributions would also be sent to the physics team so that the planned dose distribution data could be prepared in the E2E software system for subsequent comparison with measured dose data at the end of the irradiation.
iv) The RT delivery would be executed as closely to treatment protocol as possible at the treatment unit by the treating radiation therapists. (For an E2E test of FSRT delivery the phantom set-up would include CBCT imaging at the unit with registration, comparison with planning images, and shifts initiated as required by the RT technique's positioning policy, as shown in figure 5.) Once the phantom is deemed to be in the correct position the phantom would be irradiated.
v) The phantom would be returned to the medical physics team who would remove the test dosimeters and read out the dose data (although the centre may have established E2E testing protocols to image dose directly on the treatment unit using on-board CBCT or MR imaging; in this case the dose images would need to be transferred to the E2E data analysis software). Alternatively, if using commercial services, the irradiated dosimeters might be shipped back to the vendor for measurement with the dose data subsequently sent back to the centre for analysis. The physics team would compare the measured dose data to the planned dose delivery and evaluate whether the E2E test has been passed.
vi) Once the E2E evaluation of the delivery is complete, the results would be reviewed, initially by the medical physicists to confirm the integrity of the analysis, and then by all members of the E2E test team.Regular test review sessions would be a requisite feature of a well-established E2E program.And the E2E program would send reports summarizing test results periodically to the cancer centre's QM program for its review.
Adopting a framework for Setting iii) E2E testing along the lines outlined above would ensure that the full radiation program staff would be involved in the testing and be aware of the test outcomes. It would lead to confidence within the RT program that it provides quality radiation delivery, thereby benefiting the QM culture within the cancer centre.
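The comparison of measured and planned dose in stage v) can be illustrated with a minimal sketch. It assumes the simplest possible case, a handful of point dose measurements checked against a single percentage tolerance; the names `PointDose`, `percent_difference` and `e2e_passes`, the dose values and the tolerances are all hypothetical and are not taken from any protocol or from the E2E software discussed in this review. A clinical analysis would typically also compare spatial dose distributions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PointDose:
    """One dosimeter position in the phantom (hypothetical structure)."""
    label: str
    planned_gy: float   # dose predicted by the treatment planning system
    measured_gy: float  # dose read out from the dosimeter


def percent_difference(point: PointDose) -> float:
    """Measured minus planned dose, as a percentage of the planned dose."""
    return 100.0 * (point.measured_gy - point.planned_gy) / point.planned_gy


def e2e_passes(points: Iterable[PointDose], tolerance_percent: float) -> bool:
    """Pass only if every point agrees with the plan within the tolerance.

    The tolerance is an illustrative assumption, not a recommended value.
    """
    return all(abs(percent_difference(p)) <= tolerance_percent for p in points)


# Illustrative values only; these are not measured data.
points = [
    PointDose("target centre", planned_gy=6.00, measured_gy=6.09),
    PointDose("organ at risk", planned_gy=2.00, measured_gy=1.95),
]

for p in points:
    print(f"{p.label}: {percent_difference(p):+.1f}%")
print("pass at 3%:", e2e_passes(points, 3.0))
print("pass at 2%:", e2e_passes(points, 2.0))
```

Requiring every point to pass keeps the outcome easy to discuss at the team review in stage vi); a centre could equally choose a criterion based on the fraction of points passing.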

Benefits and Caveats
The benefits of end-to-end QA have been described throughout this review and they need not all be repeated here. One point that can be reiterated is that performing an E2E QA phantom irradiation does require that all the personnel involved in the test review and become familiar with the delivery protocol and have the necessary understanding to complete the process as intended. Therefore, E2E QA not only tests the technical components of the RT delivery, but it also provides an assessment of the staff preparation. E2E testing can also help identify the segments of the process that need greater vigilance during performance [143]. It has been suggested that such identification of failure points can help inform risk-based FTA/FMEA review [130].
There are limitations to E2E methodology that should be flagged as implementation is being brought into the local clinic. E2E QA cannot and should not replace all the conventional QA undertaken in the RT program. When an E2E test does fail, the cause is usually not immediately obvious. Troubleshooting and tracking where the failure originated requires that investigators understand how individual segments are functioning, and this is informed in part by the unit testing QA history. Furthermore, adding a new level of resource intensive testing to the RT QM program may be unsustainable. Therefore, cancer centres will also have to undertake FMEA review of their QA programs. There has been debate for some time [23,184,185] over whether all conventional QA performed by medical physicists adds value to the centre's QM program. This does need attention, and some of the procedures long held sacrosanct may need to be reassessed (for example, spending weekends on annual water tank measurements of multiple linacs that have shown insignificant beam data variation over the years) to enable adoption of QA that better influences quality control.
Establishing an in-house E2E testing program with wider involvement of clinic staff in the testing will require a change in clinic QM culture. There may be initial challenges to get individuals on board who have had limited experience performing QA. Therefore, there will have to be a considerable effort on education and change management. Ensuring that the outcomes of testing are regularly reviewed in an open QM forum will help this transition.
There is a caveat associated with moving E2E QA from external audits by an independent body into a local setting. One of the main advantages of external audits is that the testing performance and analysis is directed by a single body that provides a homogeneous testing framework, which has been a major benefit of external audit QA to the radiation therapy community. Moving E2E QA in-house removes this benefit. Therefore, such a move will have to be supported by the various bodies in the United Kingdom, at the IAEA, at IROC-H and the ACDS, and within the various professional bodies that help guide practice management to standardize how in-house E2E QA should proceed. Technical quality control documents and medical physics practice guidelines will need to be developed to standardize approaches.

Future directions
The observations from the RT literature and recommendations from medical physics practice guidelines motivate increased adoption of in-house E2E testing in RT clinics. This indicates that the community should continue to develop improved tools for E2E testing. There is room to improve dose measurement, for example by reducing the variation of dosimeter response due to post-preparation history, or by developing novel point dose dosimeters that do not perturb measurement when inserted into 3D gel dosimeters. There are also still opportunities to advance phantom systems for the RT environment. For example, developing a fleet of phantoms able to accommodate interchangeable dosimeter inserts between phantoms would improve phantom assembly and disassembly and ease of use. Other work could be devoted to developing inserts to provide the image contrast to enable structure contouring during planning and to enable testing of on-board imaging on the treatment unit. Research to better integrate treatment planning systems and E2E analysis software might enable more efficient planned versus delivered dose comparisons and improve the E2E software environment. This is work well suited to the IC3Ddose community. It will also require consultation and collaboration with commercial vendors to develop phantoms and dosimeter systems that are more exchangeable and transferable between suppliers and within clinics.
The literature reviewed in this article also suggests that there are two additional applications for E2E testing methodologies in the clinic. As has been noted, the success of a phantom irradiation test during an E2E audit has not always been well predicted by the conventional QA or patient specific IMRT QA performed in-house. Perhaps E2E testing should become an integral part of the commissioning of internal QA techniques, in particular patient specific IMRT validation that uses surrogate tests (such as fluence measurements on detector arrays) to validate dose delivery. This would be a type of Setting ii) E2E testing. The second application would be in a role using full Setting iii) approaches. Experience has shown that E2E testing requires the individuals performing the test to fully understand all the components and steps that define the given RT delivery. Performing the E2E QA on a phantom also involves going through all the steps of the process that would be used on a patient; in effect it serves as a simulation run of the patient delivery. E2E methodologies therefore provide a learning tool to the clinic staff. It may be beneficial to a cancer centre's QM program to consider how E2E methodologies could be used as simulation training in an educational setting. Simulation training has been shown to add benefit in RT because it provides hands-on application of learned concepts in real clinical scenarios involving team members from different professions working together [186,187]. Care would have to be taken to ensure testing and education objectives are clearly defined and kept separate. But as E2E testing becomes more tenable, an expanded role for staff education seems well indicated.

Conclusion
The quality management considered appropriate for good radiation therapy delivery has changed considerably over the last three decades as radiotherapy processes have become more complex. Guidelines for quality assurance testing have moved beyond protocols for unit testing of equipment to recommendations for more comprehensive testing of interacting links in each clinic's delivery system, with risk-based analysis of the RT environment to establish points in the process that may require particular attention for testing or improvement. Incident reporting and human factor studies have shown that health care personnel can influence the quality of RT delivery, and protocol review and revision strategies to make processes more robust and less perturbed by persons have become commonplace. But such strategies cannot eliminate all the variables that might cause an RT protocol to deviate from plan. Therefore, testing strategies that evaluate the complete process are required. End-to-end testing provides an appropriate methodology to achieve this goal. E2E testing has been shown to detect deficiencies in the delivery process not detected by the conventional QA performed by medical physicists and radiation therapists in the clinic, thereby improving delivery accuracy. E2E QA can also give the RT team confidence that the entire RT treatment delivery process being evaluated can deliver the intended dose to the intended location. If there are faults detected during testing on phantoms, these can be corrected and re-verified before treating patients. As the E2E testing is performed, it also provides opportunities for learning by increasing team members' awareness of the details and requirements at various stages of the protocols they are using. The tools enabling in-house E2E testing have become available in recent years. More regular E2E testing would provide an important tool for any cancer clinic's RT QM so that it can confidently provide its patients quality radiation delivery.

Acknowledgments
The ideas presented in this address have been formed through research funded through Canadian CIHR project MOP-115101, CHRP projects CIHR CPG 151964 and NSERC CHRP 508528-17, and through years of interaction with many colleagues in the IC3Ddose community. I would like to thank those who provided me with examples of IGART protocols from their clinic for my presentation on this topic and colleagues from IROC-H Houston, especially Andrea Molineu, who shared ideas as I was forming my thoughts for this review, especially in the discussions in sections 3 and 5.

References
A version of this reference list with titles is available from the author on request.

Figure 1. a) A sketch of the process steps for the simplest and most common implementation of image guided radiation therapy: patient set-up correction by on-line image registration to planning images and treatment plan followed by radiation delivery. Even this simple example involves multiple steps with data transfer, status assessment and decision making by various members of the treatment team. b) Multiple treatment protocols, task group reports, and technical quality control documents (see text)

Figure 2. A representation of the quality assurance testing environment associated with radiation therapy. The leftmost pyramid gives a build-up of the QA program required for safe patient treatment. The tests progress from unit tests of individual components at level 1 to higher level tests that validate the operation as units work together with data exchange between those units, to eventual quality management of the whole process. The different levels build on the conceptual pyramid proposed by De Wagter [59]. The details of the tests at each level might be debated and one could move elements between the levels depending on definitions set (for example the Australian Clinical Dosimetry Service (ACDS) describes the hierarchy using 3 levels [60]); the figure is intended to illustrate that RT is a complicated process with many elements that need to be tested individually and also as they work together. End-to-end testing for some treatment schema acts at the fourth level (ACDS level 3). Some sample tools and reference guidelines proposed by various bodies in the radiation oncology and medical physics community are listed to illustrate the complexity of the QA testing needed to ensure quality radiation delivery [see text for references].

Figure 4. A common illustration (rows P&P and Δ) of a QA test in which a measurement result must fall within an acceptable range defined by some tolerance specified by investigation and action levels used to trigger a specific action after testing [17,157]. A QA policy and procedure will use the investigation and action levels to mark when the system might only require examination after the test (for example, a repeat measurement later in the day in daily linac output measurement) or perhaps generate a more critical action (taking the treatment unit out of service until recalibrated). QA protocols accommodate measurement uncertainty (Δ) by setting these levels appropriately [17,157]. Row HF is intended to show that human factors such as individual observer risk tolerance may blur the decision to undertake a specific action after some observation. This may lead to variation of practice even when a process protocol has been well defined and laid out. For example, not all staff might reposition a patient whose set-up images do not align with the planning image within limits of the treatment protocol even though they are working to the same treatment policy.
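The investigation and action level logic described in the Figure 4 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `classify_deviation` and the 2% and 3% levels are hypothetical and are not taken from the cited protocols, and each centre would set its own levels to accommodate its measurement uncertainty.

```python
def classify_deviation(deviation_percent: float,
                       investigation_level: float,
                       action_level: float) -> str:
    """Map a measured deviation onto the response named in a QA policy.

    Levels are given as positive percentages; the action level must not be
    tighter than the investigation level.
    """
    if not 0 <= investigation_level <= action_level:
        raise ValueError("expected 0 <= investigation_level <= action_level")
    magnitude = abs(deviation_percent)
    if magnitude > action_level:
        return "action"         # e.g. take the unit out of service
    if magnitude > investigation_level:
        return "investigate"    # e.g. repeat the measurement later in the day
    return "within tolerance"


# Hypothetical daily output deviations checked against assumed 2% / 3% levels.
for deviation in (0.8, -2.4, 3.6):
    print(f"{deviation:+.1f}% ->", classify_deviation(deviation, 2.0, 3.0))
```

The function returns the same answer for every observer, which is the situation in rows P&P and Δ; the human factors in row HF are exactly the part such a rule cannot capture.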

Figure 5. Clinical implementation of Setting iii) E2E QA validation of an FSRT treatment protocol [102]. The E2E test is performed by the whole treatment team (medical physicists, radiation therapists and radiation oncologists) with staff responsible for specific steps in the patient treatment process performing the same tasks during the E2E validation. The QA is designed to replicate the treatment protocol as faithfully as possible given the tools available (e.g., in this figure the phantom cannot replicate patient motion after set-up so that variable cannot be tested). The final assessment should be reviewed by the whole team so that process validation is comprehensive, provides opportunity for process review, and offers learning opportunities to staff.

Table 1. A list of commercial (and open source) service providers offering tools enabling in-house E2E QA including phantoms, 2D and 3D dosimeters, optical-computed tomography (opt-CT) scanners for volumetric dose readout, software tools for dose calibration and mapping (including modules for registration and comparison of measured and planned dose distribution data). The table includes the MDADL laboratory which provides a commercial IROC-H phantom service for select IMRT and dynamic RT technique E2E clinical testing outside of trial credentialling.
12th International Conference on 3D and Advanced Dosimetry, Journal of Physics: Conference Series 2630 (2023) 012007
motion. The E2E testing could be performed on simple phantom sets that are already in the centre's inventory, although one requirement for such an E2E test might be that the phantom contain multiple dosimetry systems that could measure both the delivered given dose and the spatial dose distribution (depending on the IMRT technique). A similar Setting i) E2E testing scenario might occur with the commissioning of some established RT treatment on a treatment unit new to the clinic or on which that technique has not yet been used. Implementing such Setting i) E2E QA would add an important component to the cancer centre's QM program since it would close the path for full physics commissioning of the involved hardware, software, or equipment. And if the centre is new to E2E methodologies it would enable staff to bring on the skill sets required to implement E2E method through a well-controlled scenario.