A machine learning approach for correcting glow curve anomalies in CaSO4:Dy-based TLD dosimeters used in personnel monitoring

The study presents a novel approach to analysing the thermoluminescence (TL) glow curves (GCs) of CaSO4:Dy-based personnel monitoring dosimeters using machine learning (ML). This study demonstrates the qualitative and quantitative impact of different types of anomalies on the TL signal and trains ML algorithms to estimate correction factors (CFs) to account for these anomalies. The results show a good degree of agreement between the predicted and actual CFs, with a coefficient of determination greater than 0.95, a root mean square error less than 0.025, and a mean absolute error less than 0.015. The use of ML algorithms leads to a significant two-fold reduction in the coefficient of variation of TL counts from anomalous GCs. This study proposes a promising approach to address anomalies caused by dosimeter, reader, and handling-related factors. Furthermore, it accounts for non-radiation-induced TL at low dose levels towards improving the dosimetric accuracy in personnel monitoring.


Introduction
Thermoluminescent dosimeters (TLDs), optically stimulated luminescence dosimeters, and radiophotoluminescent glass are widely used for the measurement of doses from ionising radiation. The advantage of these dosimeters lies in their ability to emit a luminescence signal proportional to the dose received, making them ideal for monitoring the radiation exposure of occupational workers. Out of the aforementioned luminescence dosimeters, thermoluminescence (TL)-based dosimeters are one of the most commonly used dosimeters in personnel monitoring of radiation workers. One of the key features of any TL dosimeter is its glow curve (GC), which is used to estimate the dose. The area under the curve or peak height of the GC is interpreted as a measure of the dose. The accuracy of the estimated dose is directly dependent on the correctness of the GC. To ensure the accuracy of GC in routine personnel monitoring, the TLDs are read using TLD readers, which have highly reproducible temperature profiles and stable response. Despite this, several factors can affect the profile of the TL intensity emitted by the dosimeters, leading to anomalies in the GC.
The sources of anomalies may be related to the dosimeter, the reader system, the handling of the TLD during field use and readout, and so on [1-3]. At lower dose levels, deviations in the shape of the GC are commonly attributable to non-radiation-induced TL signals (NRI-TL) originating from black body radiation from the TL element and the heated components of the TLD reader, and from the dark current of the photomultiplier tube [4].
Other factors, such as dirt, dust, oil, corrosion of the TLD card and aberrations such as scratches or stress on the TL element may also affect the shape of GC [2]. The occurrence of NRI-TL and its proportion, along with scattering data, are stochastic in nature and mostly difficult to control in routine personnel monitoring, where the majority of doses are of low level. Hence, it is important to perform an assessment of the GCs before dose computation in personnel monitoring.
A machine learning (ML)-based algorithm has been developed [3] for probabilistic GC analysis of the CaSO4:Dy-based TLD badge used for countrywide individual monitoring in India. When the GC of a TL element is determined to be anomalous, with significant distortion in its shape, its TL counts cannot be used for dose calculation. In such scenarios, the TL counts from the remaining TL elements having normal GCs can be used for the estimation of the dose. However, such an estimation for the TLD badge requires certain assumptions about the type and energy of radiation, compromising the accuracy of dose evaluation [5,6]. Further, in the case of TLD reader malfunctions, such as a variation in the heating profile or an increase in the reader background signal, all of the TL elements of the TLD card may be affected, and the above approach cannot be used for the estimation of dose. Therefore, it is important to estimate the correct TL counts from an anomalous GC so that the occupational dose can be calculated. Some distortions in the shape of the GC, such as those caused by variations in the heating profile or the NRI-TL signal, follow a predictable pattern [7]. These patterns can be utilized to determine corrected TL counts from anomalous GCs. With this objective, the present study attempts to develop a method to predict TL counts from anomalous GCs as if their shape were normal.
Recently, researchers have explored the feasibility of using ML algorithms for identifying anomalous GCs, studying the characteristics of TL emission and estimating the elapsed time after exposure [3, 8-15]. As mentioned earlier, we demonstrated the effectiveness of ML algorithms in identifying abnormal GCs and classifying them based on the associated abnormalities [3]. In the present study, we investigate the TL intensity pattern to estimate corrected TL counts from anomalous GCs with the utilisation of ML algorithms. Therefore, regression models capable of estimating correction factors (CFs) based on the shape of the GCs were developed, and subsequently their performance was evaluated. The results demonstrate impressive accuracy in the estimation of CFs, and thereby in the estimated dose, from GCs that would otherwise be considered lost due to anomalies.

CaSO4:Dy-based TLD personnel monitoring system
The TLD badge comprises a TLD card and cassette [6,16,17]. The TLD card comprises three TL elements (discs) clipped to a nickel-plated aluminium TL element holder. The TL element is a pellet of CaSO4:Dy TL phosphor mixed with polytetrafluoroethylene in a 1:3 weight proportion; it has a diameter of 13.3 mm and a thickness of 0.8 mm. The TL phosphor CaSO4:Dy exhibits an energy-dependent response [6,18]; hence, the TLD card is loaded into the cassette with filters to compensate for the energy dependence, and the ratios of TL counts from different elements are utilized for the gross estimation of the energy/type of radiation. The first disc, commonly referred to as D1, is provided with a metal filter consisting of 1 mm thick Cu and 0.6 mm thick Al. The second disc, D2, is sandwiched between polystyrene filters of thickness 1.6 mm (180 mg cm−2). The third disc, D3, does not have any filters. The TLD card is sealed into a polythene pouch with a wrapper on which information about the user of the TLD badge is printed. The pouch and wrapper together have a density thickness of ∼13-14 mg cm−2. A schematic diagram of the TLD cassette and filters is shown in figures 1(a) and (b) respectively. Note that the polythene pouch, wrapper and clip for wearing the TLD badge are not shown in figure 1(a).
The TLD cards are read using a semi-automatic TLD badge reader that employs a hot N2 gas-based heating system with a clamping temperature of 285 °C for 30 s [19]. A flow rate of 5 l min−1 and a pressure of 2 kg cm−2 are maintained during the readout to control the amount of heat delivered to the TL element. The intensity of the TL emitted during the heating cycle is recorded every second and stored as TL counts.
Several quality control checks are implemented in the TLD badge-based monitoring programme to ensure the quality of the dosimeters. These include dosimetric testing of CaSO4:Dy phosphor batches; physical examination of new discs for coloration, inclusion of foreign particles and voids; and individual physical and dosimetric testing of TLD cards before introduction into service. During service, before selection for reuse, each dosimeter is physically examined for the quality of the discs as well as the nickel plating on the aluminium cards. In addition, the background of freshly annealed dosimeters is checked on a sample basis: it is verified that 3.29 times the standard deviation of the annealed background counts from a random sample does not exceed 100 counts (0.1 mSv equivalent). To avoid the possibility of a higher residual signal due to excessive exposure, cards indicating counts of 50 000 µSv equivalent or more in the previous cycle are segregated. In spite of the above measures, during field use, reuse and processing, TLD cards undergo various forms of handling and treatment which, combined with the stochastic nature of the NRI-TL, may lead to variability in the shape of the GC and thereby in the dose estimates. These effects are more dominant at lower dose levels.

Characteristics of GC and dataset preparation
A normal GC has a peak position between 9 and 11 s and a tail-to-peak height ratio between 0.25 and 0.30. At low doses (less than about 0.5 mSv), the tail-to-peak ratio may exceed 0.3, but generally remains less than 0.5 [5]. When the TL element is overheated, due to a higher clamping temperature, a higher flow rate of N2 gas or a thinner disc, the GC peak arrives early and the tail-to-peak height ratio decreases. On the other hand, if the heat delivered to the TL element is insufficient, which may be due to a lower clamping temperature, lower purity of the N2 gas or a lower flow rate of the N2 gas, then the GC peak arrives late and the tail-to-peak height ratio increases, as shown in figure 2. In addition, it is important to note that the presence of an excessive NRI-TL signal has either no or minimal impact on the peak position of the GC, but it leads to an increase in the tail-to-peak height ratio due to its sigmoidal shape. The elevated contribution of NRI-TL may be caused by factors such as residual TL, black body radiation, TLD card corrosion, contamination of the disc with oil, dust, dirt, etc, and possible unknown effects [2]. GCs with such anomalies are referred to as spurious GCs. Figure 2 illustrates the various types of experimental GCs observed during routine monitoring.
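The two screening quantities described above can be computed directly from a GC sampled once per second. The sketch below is illustrative only; in particular, taking the final sample as the "tail" height is our assumption, since the exact tail point used in practice is not specified here.

```python
def gc_features(gc):
    """Peak position (in seconds, assuming one sample per second starting
    at t = 1 s) and tail-to-peak height ratio of a glow curve."""
    peak = max(gc)
    position_s = gc.index(peak) + 1   # 1-indexed seconds
    tail_to_peak = gc[-1] / peak      # assumed tail sample: the last point
    return position_s, tail_to_peak
```

A normal GC would then satisfy 9 <= position_s <= 11 and a tail-to-peak ratio of roughly 0.25-0.30 (up to about 0.5 at low doses).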
Furthermore, GCs with anomalies such as early or delayed peaks were generated experimentally by recording GCs at varying flow rates of nitrogen gas. About 1000 experimental GCs, comprising ∼200 normal GCs, ∼150 early-peak GCs, ∼350 delayed-peak GCs and ∼300 GCs at lower doses, were collected. Since the robustness of ML algorithms depends on the volume of the dataset, an additional 2000 GCs were simulated: about 1000 GCs with varying heating profiles and 1000 GCs with a varying NRI-TL component. Therefore, a total of about 3000 GCs were used for training and validation of the ML models.

Simulation of GC of CaSO4:Dy for clamped heating
For the simulation of GCs, the heat transferred between the hot N2 gas and the TL element during the heating process was first simulated. The hot nitrogen gas strikes the surface of the TL element and transfers heat to it through forced convection. During the heating process, the TL element loses some heat to its surroundings due to the finite temperature difference. The heat exchange between the hot gas, the TL element and the surroundings is given by the following rate equations [20]:

dH_d/dt = P_d [T_g(t) − T_d(t)]    (1)

dH_s/dt = P_s [T_d(t) − T_s]    (2)

C_d (dT_d/dt) = dH_d/dt − dH_s/dt    (3)

where dT_d/dt is the rate of change of the disc temperature, dH_d/dt is the rate of heat transferred from the hot gas to the disc (J s−1), dH_s/dt is the rate of heat transferred from the heated disc to the surroundings (J s−1), C_d is the heat capacity of the disc (J K−1), T_g(t) is the temperature of the gas at time t (K), P_d is the thermal conductance of the heated gas-disc interface (J K−1 s−1), P_s is the thermal conductance of the disc-surroundings interface (J K−1 s−1), T_d(t) is the temperature of the TL element (K) and T_s is the temperature of the surroundings (K).
Using the above equations (1)-(3), the time-temperature profile of the disc is governed by:

dT_d/dt = α [T_g(t) − T_d(t)] − α′ [T_d(t) − T_s]    (4)

where α = P_d/C_d represents the heat gained from the hot gas and α′ = P_s/C_d represents the heat loss to the surroundings. Further, equation (4) is modified to account for radiative heat loss and gain via black body radiation from the TL element to the surroundings and from the surroundings to the TL element respectively [21]:

dT_d/dt = α [T_g(t) − T_d(t)] − α′ [T_d(t) − T_s] − λ1 T_d(t)^4 + λ2 T_sr^4    (5)

where T_sr is the temperature of the surroundings (K), and λ1 and λ2 are constants with dimensions K−3 s−1. The rate equation (5) is solved numerically by the finite difference method to obtain the temperature profile of the TL element. The rate of change of the charge-carrier density in the trap levels is given by [22]:

dn/dt = −n^b f exp(−E/(k T_d)),   with the recorded TL intensity I(t) = −c dn/dt    (6)

where n is the number density of trapped carriers (m−3), b is the order of kinetics, E is the activation energy (J), f is the frequency factor/pre-exponential factor (s−1), k is the Boltzmann constant (J K−1) and c is the TLD reader response factor (counts per unit TL intensity). A ten-trap model proposed by Souza et al [23] was used to simulate the GC of calcium sulphate doped with dysprosium (CaSO4:Dy). As per Souza's model, each trap follows first-order kinetics, and elementary peaks 4, 5 and 6 contribute mainly to the dosimetric peak of the composite GC. The simulation was carried out by solving equation (6) numerically using Euler's method for each pair of E and f. A convoluted/composite GC is obtained by optimising the relative contribution of each elementary peak to fit the experimental GC. Figure 3 shows the simulated and experimental GCs. The agreement between the two curves was found to be satisfactory, with a figure of merit of less than 3%. As a result, the same set of simulation parameters was used for the simulations of the effects of changes in the heating profile and NRI-TL on GCs.
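As a rough Python sketch of this simulation pipeline (the gas temperature, conductance values, E and f below are illustrative assumptions, not the published parameters), the disc temperature can be stepped with an Euler/finite-difference scheme and fed into a single first-order trap; the ten-trap model is a weighted sum of such elementary peaks.

```python
import math

def disc_temperature(alpha=0.35, alpha_p=0.02, t_gas=558.0, t_sur=300.0,
                     duration=30.0, dt=0.01):
    """Euler integration of dT_d/dt = alpha*(T_g - T_d) - alpha'*(T_d - T_s),
    with the radiative terms omitted for brevity. Temperatures in K."""
    temps, T = [], t_sur
    for _ in range(int(duration / dt)):
        T += dt * (alpha * (t_gas - T) - alpha_p * (T - t_sur))
        temps.append(T)
    return temps

def first_order_glow_curve(temps, dt=0.01, E=1.1, f=1e12, n0=1.0):
    """Single elementary peak: dn/dt = -n f exp(-E/kT), I(t) = -dn/dt.
    E in eV and f in 1/s are illustrative values only."""
    k = 8.617e-5  # Boltzmann constant (eV/K)
    n, curve = n0, []
    for T in temps:
        rate = f * math.exp(-E / (k * T))
        dn = -n * rate * dt
        curve.append(-dn / dt)
        n = max(n + dn, 0.0)
    return curve
```

Summing several such elementary peaks with fitted weights reproduces a composite GC; the area under each curve equals the charge released from that trap.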
Note that in the dose range relevant to personnel monitoring, the shape of the GC has not been found to change with dose.

Simulation of the effect of NRI-TL on GC
It has been reported in the literature that the major component of the NRI-TL is temperature dependent and arises from the black body radiation contribution [4], which is simulated using equation (7):

I_b(t) = β T_d(t)^4    (7)

where I_b(t) represents the TL counts pertaining to black body radiation and β is a proportionality constant. The GCs with an NRI-TL signal were therefore simulated by adding the TL signal and the NRI-TL signal in random proportions. The effect of the NRI-TL signal on the GC is demonstrated in figure 4. The GCs in figure 4 show no or minimal shift in the peak, but a significant variation in the tail-to-peak height ratio. Note that figure 4 intentionally does not depict the impact of scattering data, as the aim is to demonstrate exclusively the effect of NRI-TL. Scattering data refer to the sharply discontinuous points that typically accompany GCs, particularly noticeable at low dose levels. However, during the generation of the training dataset using simulations, scattering data were introduced to the GCs by randomly adding values drawn from a uniform distribution ranging from −2 to 2. All negative TL counts obtained after summation were set to zero, as a negative TL signal is not physically possible.
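The mixing, scatter and clipping steps can be sketched as follows. The T^4 shape assumed for the NRI-TL term is our guess at the black-body contribution, and all sample values are invented for illustration:

```python
import random

def nri_component(temps):
    """Hypothetical black-body-like NRI-TL shape, proportional to T(t)**4
    and normalised to unit area (the T**4 form is our assumption)."""
    raw = [T ** 4 for T in temps]
    s = sum(raw)
    return [v / s for v in raw]

def make_anomalous_gc(clean_gc, temps, nri_fraction,
                      noise_lo=-2.0, noise_hi=2.0, rng=random):
    """Mix the radiation-induced GC with an NRI-TL term in a given
    proportion, add uniform scatter in [noise_lo, noise_hi], and clip
    negative counts to zero (a negative TL signal is unphysical)."""
    total = sum(clean_gc)
    nri = [v * nri_fraction * total for v in nri_component(temps)]
    return [max(s + b + rng.uniform(noise_lo, noise_hi), 0.0)
            for s, b in zip(clean_gc, nri)]
```

Sweeping `nri_fraction` over random values reproduces the varying tail-to-peak ratios seen in figure 4 without shifting the peak position.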

Simulation of the effect of variation in the heating profile on GCs
Even though the TLD readers used in routine personnel monitoring generate highly reproducible heating profiles, the possibility of variation in heat delivery to the TL element, due to variation in the pressure of the nitrogen gas, the flow rate of the nitrogen gas, the current to the heating coils, the feedback from the thermocouple or ageing of components, cannot be ruled out. In addition to experimentally collecting GCs representing such malfunctions, the variations in the shape of GCs were simulated by altering the temperature profile of the TL element. For this, the value of the parameter α from equation (5) was randomly sampled from a normal distribution with a mean of 0.35 s−1 and a standard deviation of 0.05 s−1. An ideal GC is characterised by a peak position at 10 s and a tail-to-peak height ratio of 0.25, which is simulated by a value of α equal to 0.35 s−1. The majority of GCs observed during routine monitoring belong to the acceptable variation category, which can be simulated by taking the value of α within the range of 0.30-0.40 s−1 (i.e., within one standard deviation of the mean). However, values of α beyond this range simulate GCs that require attention, although such occurrences are infrequent. Figure 5 depicts the effect of changes in the heating profile on the shape of GCs. As mentioned in section 2.2, the shift in the position of the TL peak and the variation in the tail-to-peak height ratio can be clearly observed in the simulated GCs as well.
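The sampling scheme is straightforward to reproduce; a short sketch (the seed and function names are ours) draws α values and separates the acceptable band, which by construction captures roughly the ±1 standard deviation (about 68%) of draws:

```python
import random

def sample_heating_alphas(n, mean=0.35, sd=0.05, seed=42):
    """Draw n values of the heating parameter alpha (s^-1) from the
    normal distribution used for the simulated heating-profile variation,
    and split off those in the acceptable 0.30-0.40 s^-1 band."""
    rng = random.Random(seed)
    alphas = [rng.gauss(mean, sd) for _ in range(n)]
    acceptable = [a for a in alphas if 0.30 <= a <= 0.40]
    return alphas, acceptable
```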

ML algorithms
In the present work, three competing supervised learning algorithms, namely artificial neural network (ANN), random forest (RF) and support vector regression (SVR) [24][25][26], are studied for the estimation of correct TL counts from anomalous GCs. These algorithms are commonly used for solving classification and regression problems by training a model to map the input to the output based on known expected outputs. The ML model improves iteratively by comparing the estimated output to the desired one and adjusting the parameters of the model accordingly. The ANN is an ML algorithm that mimics the interconnections between neurons in the human brain. In the present study, the ANN was used to predict the CFs from the normalized TL counts recorded every second. In order to optimise the performance of the ANN, different activation functions, such as sigmoid, rectified linear unit and hyperbolic tangent, were tried out, along with various combinations of nodes and hidden layers. During the training process, the mean squared error (MSE) was utilised as the error function, while the Adam (adaptive moment estimation) optimiser was employed to minimise the error.
The RF algorithm uses an ensemble-based learning approach: it constructs a large number of decision trees and uses the mean (for regression) or mode (for classification) of the predictions of the individual trees as the output. A large number of uncorrelated decision trees work as a group to perform the regression or classification task. An erroneous prediction from one tree does not alter the result, as there is no relationship between the individual trees in an RF, and only the aggregate of the trees decides the outcome.
The SVR algorithm uses an ε-insensitive loss function, which penalises only those data points lying more than a distance ε away from the actual values [22]. SVR provides a non-linear mapping function to map the response variable as a function of the features provided in the training dataset. Given a dataset (x_i, y_i), where x_i is the feature vector of dimensions m × 1 corresponding to the ith input data point with m features, and y_i is the actual output (response variable) of the ith data point, the non-linear function between input and output is formulated as:

f(x_i) = ω^T φ(x_i) + b

where φ is a non-linear mapping function, ω is the weight vector of dimensions m × 1, and b is a bias.
The optimum values of ω and b are obtained by solving the following optimisation problem:

minimise (1/2) ∥ω∥^2 + P Σ_{i=1}^{n} (ϑ_i + ϑ′_i)

such that

y_i − ω^T φ(x_i) − b ⩽ ε + ϑ_i
ω^T φ(x_i) + b − y_i ⩽ ε + ϑ′_i
ϑ_i ⩾ 0, ϑ′_i ⩾ 0

where P is a penalty parameter, ϑ_i and ϑ′_i are slack variables representing the upper and lower deviations respectively, and n is the number of data points. During the training of the SVR, the values of P and ε are tuned to obtain the optimum accuracy from the SVR model.

Training and tuning of the ML algorithms
In the present study, the CFs are computed by taking the ratio of the TL counts from a GC exhibiting a normal shape to those of the GC with anomalies; hence the CF is a continuous variable whose numerical value depends on the nature of the anomaly associated with the GC. Mapping the CFs to the pattern of TL intensity is a regression-type problem in which the CF is the response variable and the TL intensity is the feature vector. Therefore, the input of the ML model is a 30 × 1 vector of normalized TL intensities, and the output is a scalar value of the CF.
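In code, the target and the input described above reduce to two small helpers (a Python sketch; the function and variable names are ours):

```python
def correction_factor(normal_gc, anomalous_gc):
    """CF = total TL counts of a normal-shape GC divided by the total
    counts of the anomalous GC recorded for the same exposure."""
    return sum(normal_gc) / sum(anomalous_gc)

def to_feature_vector(gc):
    """Normalise a 30-point GC to its maximum, giving the 30 x 1 input."""
    peak = max(gc)
    return [v / peak for v in gc]
```

At prediction time, the corrected counts are simply the anomalous total multiplied by the predicted CF.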
In order to fit the ML models for the prediction of the CF, the dataset of GCs was divided into two parts: 70% for training and 30% for validation. The tuning of the ANN involves the selection of an appropriate activation function, number of neurons, learning rate, error function, etc, to achieve the optimum accuracy. The tuning of the RF model involves optimisation of the number of features to be sampled to build each decision tree and the number of trees to be grown in the forest. Similarly, the penalty parameter and insensitivity width were optimised to obtain an accurate SVR model. For a systematic search for the optimum hyperparameters, a mesh grid search approach was utilized: a search grid is created over the hyperparameters, and every combination is evaluated to find the one that yields the best performance of the ML model.
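The mesh grid search amounts to exhaustively scoring every hyperparameter combination. A minimal generic version, shown here in Python for illustration (the study's own tooling was in R), takes a caller-supplied validation score where lower is better:

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every combination in the hyperparameter grid
    and return the best (lowest-scoring) combination."""
    best_params, best_score = None, float("inf")
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For an RF, for example, `param_grid` could be `{"ntree": [50, 100, 500], "mtry": [4, 6, 8]}` with `score_fn` returning the validation RMSE of a model trained with those settings.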
In order to analyse the performance of the ML models, we utilized three widely used metrics: RMSE (root mean square error), MAE (mean absolute error) and R² (coefficient of determination). RMSE provides a measure of the average difference between the predicted and actual values of the target variable; a lower RMSE indicates higher predictive accuracy, as it signifies smaller deviations between the predicted and actual values. RMSE is calculated as the square root of the average of the squared differences between the predicted values (ŷ) and the actual values (y) of the target variable:

RMSE = sqrt[ (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² ]

where n denotes the number of data points. MAE measures the average magnitude of the errors between the predicted and actual values and provides a more interpretable representation of the error magnitude than RMSE. Like RMSE, a lower MAE suggests better predictive accuracy, as it represents smaller absolute errors. MAE is determined by averaging the absolute differences between the predicted values (ŷ) and the actual values (y):

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|

R² quantifies the proportion of the variance in the dependent variable (y) that is captured by the regression model. Ranging from 0 to 1, a higher R² value indicates a better fit of the model to the data. R² is computed as:

R² = 1 − [ Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)² ]

where ȳ represents the mean of the actual values. These metrics collectively provide insights into the performance of an ML model: RMSE and MAE indicate the magnitude of the errors made by the model, while R² assesses the goodness of fit.
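The three metrics follow the standard definitions and can be written as a short Python sketch (the study's own implementation was in R):

```python
import math

def rmse(y, y_hat):
    """Root mean square error between actual y and predicted y_hat."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```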
After tuning, the ANN model with four hidden layers, containing 15, 10, 9 and 3 nodes respectively, and the sigmoid activation function was found to have the best accuracy. The SVR model with a radial basis function kernel for data transformation, a gamma value of 0.01, a cost parameter (P) of 1000 and an ε value of 0.05 showed the highest accuracy. Similarly, the RF model with 100 decision trees and 6 randomly sampled features per tree was found to have the highest accuracy in predicting CFs. Further, the performance of these models was inter-compared for the selection of the best ML model.
These ML models were developed using the R programming language [27][28][29]. The R packages 'neuralnet', 'randomForest' and 'e1071' [30][31][32] from the Comprehensive R Archive Network were used for the development of the regression models using ANN, RF and SVR respectively.

Nature of the experimental and simulated variations in GCs
In order to simulate the experimentally observed variation in GC shapes, the values of the parameter α were sampled from a normal distribution about the mean of 0.35 s−1. A value of α less than 0.35 s−1 simulates underheating of the TL element, and a value greater than 0.35 s−1 simulates overheating. The nature of the variation in the experimental and simulated GCs plotted in figure 6 shows a matching pattern, which validates the choice of the distribution of α. The spread of the simulated GCs is higher because a large range of α was selected to cover extreme variations in the heating profile.

Evaluation of ML models
The optimised ML models were validated by the estimation of CFs for unseen data. The predicted and actual CFs were compared, and the performance of the ML models was evaluated using the RMSE, MAE and R² values. The performance of all three optimised ML models in predicting the CFs for the validation dataset is shown in table 1. It can be seen that for all ML models, R² is greater than 0.95, indicating an excellent correlation between the predicted and actual CFs. The values of RMSE and MAE are also small, indicating negligible error and bias. Further, the algorithms trained on the mixture of experimental and simulated datasets were tested on the experimental dataset.

Application of CF in the case of heating-related anomalies
To assess the efficacy of the ML-based corrections in improving the dosimetric accuracy, the total TL counts of the experimental GCs generated by deliberately varying the heating profile were corrected by applying the CFs estimated by the ML algorithms. Note that the TLD cards used for testing were exposed to 3 mSv (H_ref) of 137Cs gamma radiation and read on a TLD reader with a calibration factor of 1 µSv/count. The 3 mSv dose level was selected to minimise the impact of NRI-TL on the shape of the GC, as NRI-TL typically contributes between 0.03 and 0.1 mSv; at 3 mSv, a signal-to-noise ratio greater than 30 can be achieved. As a result, the ML models' ability to account for anomalies caused only by heating issues was demonstrated.
The experimental GCs are shown in figure 7: the GCs with a peak TL intensity of 200 counts and a peak position to the left of 10 s were recorded at higher heating rates (higher N2 gas flow rate), whilst the GCs with a peak intensity of less than 175 counts and a peak position to the right of 10 s were recorded at lower heating rates (lower N2 gas flow rate). Out of 81 experimental GCs, 24 were recorded at a higher gas flow rate and the remaining 57 at a lower gas flow rate. It is important to note that the ML models were trained using the extensive dataset of experimental as well as simulated GCs, encompassing a wide range of variations in the heating profile. The metric values in table 1 indicate the ML algorithms' proficiency in accurately characterising these variations in the shape of GCs. Nevertheless, for the purpose of highlighting the ML models' performance, only two extreme experimental cases were deliberately selected and analysed. Table 2 provides a comparison of the coefficient of variation (CoV) and the estimated dose (G) with respect to H_ref, computed from the uncorrected and corrected TL counts. The majority of the experimental GCs were recorded at lower heating rates, resulting in a mean TL count of less than 3000 and, consequently, an estimated dose of less than 3 mSv [5]. However, after applying the CFs, a significant enhancement in TL counts can be observed: the mean of the corrected TL counts becomes almost equal to the expected value of 3000 counts, which represents the expected dose of 3 mSv. Furthermore, figure 8, along with the mean and %CoV values of the dose ratio (G/H_ref) presented in table 2, illustrates a notable improvement in the accuracy of G after the application of the CF.
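The kind of improvement reported in table 2 can be checked with two short helpers; the sample counts below are invented for illustration and are not the study's data:

```python
import math

def apply_cfs(raw_counts, cfs):
    """Correct each GC's total TL counts with its predicted CF."""
    return [c * f for c, f in zip(raw_counts, cfs)]

def percent_cov(values):
    """Percent coefficient of variation: 100 * sample std dev / mean."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return 100.0 * math.sqrt(var) / m
```

With accurate CFs, the corrected counts cluster around the expected 3000 counts and the CoV drops accordingly.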
Considering the performance of the RF model, and the fact that the RF employs ensemble-based learning and is insensitive to outliers, the RF model is deemed suitable for estimating CFs. It is worth noting that the GCs were initially classified using ML algorithms, and only those with a normal-class probability below 0.70 and not belonging to the annealed background (i.e., TL from a freshly annealed unexposed sample) or spikes category [3] were selected for correction. The rationale behind excluding GCs with spikes is that during the pre-processing of GCs, where the TL intensities are normalised to their maximum values, the presence of a spike leads to very small normalised TL intensities for the remainder of the GC. As a result, the distinctive features of the GC become masked, making it difficult to compute CFs effectively. However, by removing the spike from the GC and replacing it with a moving average, the GC can be considered for correction, provided that the GC is still anomalous.
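A simple version of the spike repair mentioned above replaces any point far above the mean of its neighbours; the threshold factor here is an illustrative assumption, not a value from the study:

```python
def despike(gc, factor=3.0):
    """Replace any interior point greater than `factor` times the mean of
    its two neighbours with that neighbour mean (a moving-average repair).
    The factor of 3 is an illustrative threshold."""
    out = list(gc)
    for i in range(1, len(gc) - 1):
        local = 0.5 * (gc[i - 1] + gc[i + 1])
        if local > 0 and gc[i] > factor * local:
            out[i] = local
    return out
```

After this repair, normalising to the maximum no longer collapses the rest of the curve, so the GC can be passed to the CF models if it remains anomalous.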

Application of CFs at lower dose levels
As the dose level increases, the impact of NRI-TL on both the accuracy of the estimated dose and the shape of the GC diminishes. However, in routine monitoring, most doses fall within the lower range (i.e., 0.1-0.6 mSv). At these dose levels, the presence of scattering data [33,34] (i.e., random noise) and NRI-TL plays a crucial role in determining the accuracy of the estimated doses. In order to simulate such scenarios, the TL counts of the radiation-induced signal were randomly sampled from a uniform distribution between 150 µSv and 600 µSv equivalent, while the TL counts of the NRI-TL were randomly sampled from a uniform distribution between 35 µSv and 80 µSv equivalent, which represents the experimentally observed range of the annealed background TL signal. Thus, around 5000 GCs were simulated with the relative contribution of NRI-TL varying from 6% to 50%, and a separate set of ML models was developed. Some of the simulated GCs from the validation dataset are illustrated in figure 9. This separate set of algorithms was developed with the aim of improving the accuracy of CF estimation: generally, as the dose increases, the impact of scattering data and NRI-TL decreases, resulting in smoother GCs, whereas GCs at lower doses display less smoothness. It was anticipated that training a single ML model would compromise accuracy in both the higher and lower dose ranges. To address this concern, two separate ML algorithms were employed, allowing their training to be tailored to the respective dose ranges and thereby enhancing accuracy in both scenarios. The selection of the appropriate ML algorithm for a particular GC is based on the TL counts obtained from that GC. Table 3 presents the evaluation of the ML models in terms of R², RMSE and MAE for the validation dataset. The values of these parameters suggest that the RF model outperforms the ANN and SVR models.
The accuracy of the ML models was evaluated by comparing the predicted CFs with the actual CFs, as shown in figure 10. The results indicate that the majority of the predicted CFs are within 5% of the actual CFs. In addition, figure 11 displays a boxplot of the TL counts normalised to dose at four different dose levels. The median value at each dose level is close to unity, indicating that the ML models did not introduce any bias while achieving a significant reduction in the spread of the TL counts. It is important to note that inherent sensitivity variations attributable to the manufacturing process can cause variations in TL counts without a significant change in the shape of the GC, and such GCs were not considered for correction. Therefore, GCs are first screened for the presence of any anomalies and only then subjected to CFs.
In many personnel monitoring laboratories across India, the GCs are screened manually, which introduces subjectivity in determining their normal shape. The decision becomes particularly challenging when TL counts are low, which is a common occurrence. Additionally, anomalous GCs are typically discarded, resulting in a loss of valuable dose information. However, our previous and current studies offer a promising approach to address both of these challenges. By implementing objective GC screening methods and correcting anomalous GCs based on the detected anomalies, the loss of dose information can be avoided while minimising subjectivity. The proposed approach significantly improves the dosimetric accuracy in CaSO4:Dy-based TLD personnel monitoring. To integrate these predictive models into regular personnel monitoring practice, future efforts will focus on developing a graphical user interface that leverages these models. The selection of the ML model for CF estimation depends on the TL counts for a TL element: for TL counts below 700 (representing 0.7 mSv), the model designed for low doses is used, while for TL counts above 700, the model for higher doses is employed. Moreover, we plan to enhance the reliability of these models by incorporating more experimental data.
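The dose-range routing described above is a one-line dispatch; the model objects here are placeholders, and the threshold mirrors the 700-count (~0.7 mSv) boundary stated in the text:

```python
def select_cf_model(tl_counts, low_dose_model, high_dose_model,
                    threshold=700):
    """Route a GC to the low- or high-dose CF model using its total TL
    counts (700 counts corresponds to ~0.7 mSv at 1 uSv per count)."""
    return low_dose_model if tl_counts < threshold else high_dose_model
```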

Conclusion
In routine personnel monitoring, thousands of cards are processed every month, necessitating stringent attention to the maintenance of dosimetric accuracy. The precision of the dosimetric measurements is primarily dependent on the accuracy of the TL signal. As the number of occupational workers subjected to monitoring continues to grow, there is a pressing need for automation to increase the throughput of TLD personnel monitoring laboratories without sacrificing dosimetric accuracy. The use of ML algorithms for the screening and correction of TL GCs, as demonstrated in this study, will improve laboratory throughput as well as dosimetric accuracy.

Data availability statement
No new data were created or analysed in this study.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or non-profit sectors.