A Simple Method for Predicting N H Variability in Active Galactic Nuclei

The unified model of active galactic nuclei (AGNs) includes a geometrically thick obscuring medium to explain the differences between type I and type II AGNs as an effect of inclination angle. This medium is often referred to as the torus and is thought to be “clumpy,” as the line-of-sight column density, N H, has been observed to vary in time for many sources. We present a method that uses variations in the hardness ratio to predict whether an AGN has experienced N H variability across different observations. We define two sets of hard and soft bands that are chosen to be sensitive to the energies most affected by changes in N H. We calculate hardness ratios for Chandra and XMM-Newton observations of a sample of 12 sources with multiple observations, and compare the predictions of this method to the N H values obtained from spectral fitting with physically motivated torus models (borus02, MYTorus, and UXCLUMPY). We also provide a calibrated correction factor that allows comparison between Chandra and XMM-Newton observations, which is otherwise not possible due to differences in the instrument response functions. The sensitivity of this method can be easily adjusted. As we decrease the sensitivity, we find that the false positive rate becomes small while the true positive rate remains above 0.5. We also test the method on simulated data and show that it remains reliable for observations with as few as 100 counts. Therefore, we conclude that the method proposed in this work is effective in preselecting sources for variability studies.


Introduction
Active Galactic Nuclei (AGN) are powered by accretion of gas onto supermassive black holes (SMBH) and are among the most luminous sources in the Universe, emitting across the entire electromagnetic spectrum. The unified model for AGN includes an obscuring torus surrounding the accretion disk (Antonucci 1993; Urry & Padovani 1995). Depending on the structure and orientation of the torus, the broad line region (BLR) near the accretion disk may be obscured, resulting in a type II AGN (see e.g. Hickox & Alexander 2018 for a recent review). It was originally thought that this obscuring medium is uniform; however, Krolik & Begelman (1988) suggested that this is unlikely. Recent studies of the line-of-sight column density, N H,los (hereafter simply N H), show variability in AGN over timescales ranging from hours (e.g., Elvis et al. 2004) to years (e.g., Markowitz et al. 2014). These studies, along with IR SED fitting models (e.g., Nenkova et al. 2008), support the idea of a 'clumpy' obscuring medium, perhaps made of individual clouds.
Studying the variability in N H allows us to constrain properties of the obscuring torus structure, such as the density, shape, size, and radial distance of the clouds from the SMBH (Risaliti et al. 2005; Maiolino et al. 2010; Markowitz et al. 2014; Pizzetti et al. 2022; Marchesi et al. 2022). For example, variability on timescales of ≤ 1 day originates at ≤ 10^−3 pc (i.e. within the BLR), while monthly and yearly variability likely originates at parsec scales (i.e. in the torus). On the other hand, Laha et al. (2020) looked at a sample of 20 type II AGN and found that 13/20 showed no significant variability in N H at all, suggesting that the obscuration may come from even larger distances associated with the host galaxy. Thus, these studies can provide information about the location of the absorber and the cloud distribution within it. Furthermore, Maiolino et al. (2010) were able to show that the geometry of BLR clouds in NGC 1365 is unlikely to be spherical, as is often assumed. However, at present, most properties of these clouds remain poorly understood, in large part due to the paucity of sources with known N H variability available to study. Typically, the way to study N H variability for AGN with multiple observations is to fit the X-ray spectrum with some variation of an absorbed power-law model (e.g. Laha et al. 2020). An even better approach is to use a physically motivated torus model (e.g., Murphy & Yaqoob 2009; Baloković et al. 2018; Buchner et al. 2019), as done in Pizzetti et al. (2022) and Marchesi et al. (2022), for example. However, these methods are time consuming when applied to sources with multiple observations, and are thus not practical for a very large sample of blindly selected sources. For this reason, very few studies have been performed to date. In fact, the most complete sample of cloud occultation events to date contains only 12 individual events (Markowitz et al. 2014), and is still used to calibrate clumpy torus models (Buchner et al. 2019).
X-ray data are becoming much more abundant than in the past and could become even more so with future missions such as AXIS (Mushotzky et al. 2019), Athena (Nandra et al. 2013), and Star-X (Saha et al. 2017; Saha & Zhang 2022). Presently, data are being released from the eROSITA instrument (Predehl et al. 2021), which is expected to detect millions of X-ray point sources, each observed over timescales ranging from months to years (e.g. Salvato et al. 2022; Brunner et al. 2022). Marchesi et al. (2020) showed that 90% of the sources detected by AXIS and Athena would be first-time detections in the X-rays. Therefore, it is imperative to develop methods to sift through this vast amount of data and pick out observations that are likely to show N H variability. Once these sources are found, they can be studied in depth with standard spectral modeling techniques.
The hardness ratio, HR, is often interpreted as the X-ray 'color' of a source, since it indicates the amount of high-energy (hard) photon counts relative to the low-energy (soft) counts. Because photoelectric absorption is strongly energy dependent, soft X-rays are more likely to be absorbed than hard X-rays. Consequently, large HR values typically indicate high N H values. However, this is not a simple 1:1 relation, due to reprocessing effects not related to line-of-sight obscuration.
Previously, hardness ratios have been used on AGN as an indicator of Compton-thickness (e.g. Iwasawa et al. 2011; Torres-Albà et al. 2018). Variability in HR has also been used to classify AGN (e.g. Peretz & Behar 2018), as well as to indicate variability in their spectral shape (e.g. Connolly et al. 2016). However, depending on the choice of 'hard' and 'soft' bands, it can be difficult to disentangle intrinsic variability in coronal emission from line-of-sight obscuration (Caballero-Garcia et al. 2012). By restricting the bands of interest to the energies most affected by N H variability, we ensure that variability in HR is more likely to be due to obscuration effects.
In this paper, we present a new method for predicting the variability of N H between two observations and provide the results as applied to a small sample of carefully analyzed sources. The layout is as follows: In Section 2 we describe the sample of sources and the modeled N H values used. In Section 3, we describe our method of predicting variation in the modeled N H values using hardness ratios. In Section 4 we discuss various ways to interpret the reliability of our method and present the results. We summarize our findings in Section 5.

Sample
The sample used to test these methods consists of 12 sources with multiple observations across Chandra, XMM-Newton, and NuSTAR. These sources were studied extensively by Torres-Albà et al. (2023, hereafter TA23) using the AGN torus models borus02 (Baloković et al. 2018), MYTorus (Murphy & Yaqoob 2009), and UXCLUMPY (Buchner et al. 2019) to obtain accurate values of N H. The sources are shown in Table 1, along with the best-fit N H values found with each of the three models for the Chandra and XMM-Newton observations. Three sources had multiple NuSTAR observations, and their information is shown in Table 2. Several sources were found to have observations that vary significantly in N H, while others showed no variability. Therefore, this sample has the diversity required to test the predictive power of our hardness ratio method (see Section 3.1).

Data
This analysis uses observations from XMM-Newton, Chandra, and NuSTAR. For the XMM-Newton observations, only the data from the EPIC pn camera (Strüder et al. 2001) are considered, due to its higher effective area. All the Chandra observations were obtained using the ACIS-S camera (Garmire et al. 2003) with no grating. Chandra observations range from cycle 1 to cycle 20. However, the degradation in sensitivity with time does not affect our analysis (see Figure 1), because we ignore energies below 2 keV, where the sensitivity is most significantly reduced.
Chandra observations may also be affected by vignetting when the source is observed off-axis. In particular, for sources farther than 5' from the center, there may be a significant softening of the spectrum due to stronger vignetting at higher energies. All of the observations used in this work have the sources of interest within 5' of the aim point. Figure 1 also shows simulated data for an off-axis source (2.8' from center), and the relative sensitivity is not significantly reduced until >8 keV, where Chandra is already dominated by background counts. We conclude that the effects of effective area degradation or off-axis sources should not impact this method significantly.
We use data from the FPMA detector for the three sources with multiple NuSTAR observations. We note that there is no substantial difference between the counts observed with FPMA and FPMB, so we choose to consider only FPMA to avoid the slightly higher background rates in the FPMB detector.

Hardness Ratio
We define the hardness ratio to be

HR = (H − S) / (H + S) ,    (1)

where H and S are the net counts (see Eq. 2) in the hard and soft bands, respectively. We use two different definitions:

HR 1 : Soft (2-4 keV), Hard (4-10 keV)
HR 2 : Soft (4-6 keV), Hard (6-10 keV)

These bands were chosen to maximize the sensitivity to changes in N H. A second hardness ratio, HR 2, is needed to break a degeneracy caused by the increased importance of the reflection component in sources with high obscuration (see Figure 2). Above a certain N H, all of the primary soft counts are absorbed, leaving only the reflected counts visible. Since the reflection component does not depend on the line-of-sight N H, these highly obscured sources show a softer HR 1 as the N H increases, which decreases the sensitivity of HR 1 in this N H region and ultimately strips it of its predictive power entirely. According to our simulations using the borus02 model, this occurs at N H ∼ 3 × 10^23 cm^−2 for an AGN with photon index Γ = 1.8, average torus column density N H,tor = 10^24 cm^−2, and covering factor c_f = 0.5 (values based on Zhao et al. 2021). Since HR 2 is shifted to higher energies, it remains sensitive to N H variability at and beyond this limit, as seen in Figure 2. It is important to note that these specific quantities should only be taken as indicative, since they are meant to represent an 'average' AGN, and most individual sources will differ from these simulated data. However, the trends in Figure 2 should apply to any given source, because the average torus properties are not expected to change on the same timescales as the line-of-sight N H.
The net counts in each band are obtained by loading the data from TA23 into XSPEC. The total counts, n_tot, along with the fraction of the total count rate contributed by the net count rate, f, are recorded. From this information, the net counts (n_net) and background counts (n_bkg) are calculated as follows:

n_net = f · n_tot ,  n_bkg = (1 − f) · n_tot .    (2)

Confidence intervals for HR are found by following the methods for Poisson statistics in Gehrels (1986). The approximate upper and lower single-sided limits for a measured number of counts n are given by the Gehrels (1986) equations (9) and (14),

n_u = n + S √(n + 3/4) + (S² + 3)/4 ,
n_l = n [1 − 1/(9n) − S/(3√n) + β n^γ]³ ,

where S = 1.645, β = 0.031, and γ = −2.5 for a 95% single-sided confidence level. This corresponds to a 90% confidence level for the double-sided interval n_l to n_u. These limits are calculated for n_tot and n_bkg, and the error δn is taken to be the average difference between the measured count and the upper and lower bounds (the asymmetry is very small or nonexistent in every case),

δn = (n_u − n_l) / 2 .

The total error on the net counts is then

δn_net = √(δn_tot² + δn_bkg²) .

This net count error is propagated through the hardness ratio to obtain the 90% confidence error on HR,

δHR = [2 / (H + S)²] √(S² (δH)² + H² (δS)²) ,
where H and S are the net counts, n_net, in the 'hard' and 'soft' bands, respectively.
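The derivation above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code; the function names are ours, and it assumes only the Gehrels (1986) approximations quoted in the text (S = 1.645, β = 0.031, γ = −2.5).

```python
import math

# Gehrels (1986) parameters for a 95% single-sided confidence level,
# i.e. a 90% double-sided interval.
S_GEH, BETA, GAMMA = 1.645, 0.031, -2.5

def gehrels_limits(n):
    """Approximate Poisson lower/upper limits (Gehrels 1986, eqs. 9 and 14)."""
    upper = n + S_GEH * math.sqrt(n + 0.75) + (S_GEH**2 + 3.0) / 4.0
    if n > 0:
        lower = n * (1.0 - 1.0 / (9.0 * n)
                     - S_GEH / (3.0 * math.sqrt(n))
                     + BETA * n**GAMMA) ** 3
    else:
        lower = 0.0
    return lower, upper

def count_error(n):
    """Symmetric error: half the width of the confidence interval."""
    lo, hi = gehrels_limits(n)
    return 0.5 * (hi - lo)

def net_counts_and_error(n_tot, f):
    """Net counts from total counts and the net count-rate fraction f."""
    n_net = f * n_tot
    n_bkg = (1.0 - f) * n_tot
    d_net = math.sqrt(count_error(n_tot)**2 + count_error(n_bkg)**2)
    return n_net, d_net

def hardness_ratio(h, s, dh, ds):
    """HR = (H - S)/(H + S) with the propagated 90% confidence error."""
    hr = (h - s) / (h + s)
    dhr = 2.0 / (h + s)**2 * math.sqrt(s**2 * dh**2 + h**2 * ds**2)
    return hr, dhr
```

For example, `hardness_ratio(150.0, 50.0, 13.0, 8.0)` returns an HR of 0.5 with its propagated error.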

Cross-instrument Comparison
The ability to compare observations across multiple instruments is important to maximize the opportunities for variability detection. It is clear that this method should work when comparing Chandra observations with different Chandra observations, but it is not as simple when comparing Chandra with XMM-Newton.
In this case, the differences in the instrument response functions make it impossible to meaningfully compare raw hardness ratios between instruments (Park et al. 2006). Therefore, a method must be developed to correct for these differences.
To overcome the difficulty in comparing Chandra to XMM-Newton observations, we must account for the differences in the instrument responses; in particular, the steep decline of the ACIS-S response with respect to the EPIC pn response beyond ∼ 4 keV, and the complete lack of ACIS-S response beyond ∼ 7 keV. This is shown in Figure 3. To correct for this difference, we multiply the counts in each band of the Chandra observations by a correction factor: C_H for the hard band and C_S for the soft band. This factor is the ratio of the integrated effective area of the pn detector on XMM-Newton to that of ACIS-S with no grating on Chandra. We find these correction factors to be

C_H = 3.759 (7.185) ,
C_S = 1.958 (2.368) ,    (7)

for HR 1 (HR 2).
These factors are applied to the counts n and the count errors δn after the upper and lower limits have been found. Ultimately, the corrected hardness ratio is

HR = (C_H H − C_S S) / (C_H H + C_S S) ,

and the error is

δHR = [2 C_H C_S / (C_H H + C_S S)²] √(S² (δH)² + H² (δS)²) .

After these corrections, we can more reasonably compare a Chandra observation to an XMM-Newton observation for a given source.
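A sketch of how the band-by-band correction might be applied follows. The dictionary layout and function name are ours; only the C_H and C_S values come from Eq. (7) of the text.

```python
import math

# Ratio of integrated XMM-Newton pn to Chandra ACIS-S effective area
# in each band (values from Eq. 7 of the text).
CORR = {
    "HR1": (3.759, 1.958),  # (C_H, C_S) for hard 4-10 keV / soft 2-4 keV
    "HR2": (7.185, 2.368),  # (C_H, C_S) for hard 6-10 keV / soft 4-6 keV
}

def corrected_hr(h, s, dh, ds, which="HR1"):
    """Rescale Chandra band counts onto the pn scale, then form HR."""
    c_h, c_s = CORR[which]
    h, s, dh, ds = c_h * h, c_s * s, c_h * dh, c_s * ds
    hr = (h - s) / (h + s)
    dhr = 2.0 / (h + s)**2 * math.sqrt(s**2 * dh**2 + h**2 * ds**2)
    return hr, dhr
```

Note that equal raw counts in both bands then yield a positive corrected HR, reflecting the weaker ACIS-S hard-band response being compensated for.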

NuSTAR
There are three sources in our sample with multiple NuSTAR observations, and we applied a modified version of our method to these. The energy bands used to define the hardness ratio for NuSTAR observations are HR nu : Soft (3-8 keV), Hard (8-24 keV).
Since the soft band in this definition covers most of the photons typically absorbed by even highly-obscured AGN (< 10 keV; Koss et al. 2016), there is no need to introduce a second hardness ratio to break degeneracies.

Prediction of N H Variability
It is clear from Figure 2 that HR should depend on N H. In this analysis, we use two different criteria to flag a pair of observations as variable: (1) We take a significant variation in the 90% confidence interval of HR 1 or HR 2 between the two observations to indicate a significant variation in the 90% confidence interval of N H for those same two observations. (2) We calculate the χ² of each pair of HR values assuming no variability. That is,

χ²_HR = Σ_{i=a,b} (HR_i − µ)² / (δHR_i)² ,

where µ is the mean HR of the two observations a and b. The source is flagged as variable if χ²_HR > 2.706 for either HR 1 or HR 2. This value corresponds to a significance level of α = 0.1; thereby, we say that the observations are not consistent with each other at the 90% confidence level.
We compare these flagged observations to the 'true' variable observations in TA23. Variability in N H between two observations is defined similarly for Criterion 1: TA23 obtained the 90% confidence intervals for N H, and these are considered variable if there is no overlap between the values obtained in the two observations. In order to use Criterion 2, we use the method in Barlow (2003) to account for the asymmetry in the N H errors. Here, we take the χ² contribution of an observation to be

(N H,i − µ)² / σ_i² ,

where N H,i is the best-fit N H value for the observation, µ is the mean best-fit N H value for the two observations, and σ_i is either the upper (σ_i^+) or lower (σ_i^−) error on the best-fit N H, chosen to be in the direction of the mean. For example, if N H,1 < µ then we must have N H,2 > µ, and we would calculate

χ²_NH = (N H,1 − µ)² / (σ_1^+)² + (N H,2 − µ)² / (σ_2^−)² .

As before, we consider two observations to be variable at 90% confidence if χ²_NH > 2.706.
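Both flagging criteria reduce to a few lines of arithmetic. The sketch below is ours (the function names are hypothetical); it implements the no-variability χ² of Criterion 2 for the hardness ratios, and the Barlow (2003) asymmetric-error version for N H.

```python
CHI2_90 = 2.706  # chi-square critical value at 90% confidence, 1 d.o.f.

def chi2_hr(hr_a, err_a, hr_b, err_b):
    """Chi-square of two HR values against their mean (no-variability model)."""
    mu = 0.5 * (hr_a + hr_b)
    return (hr_a - mu)**2 / err_a**2 + (hr_b - mu)**2 / err_b**2

def chi2_nh(nh_a, sig_lo_a, sig_hi_a, nh_b, sig_lo_b, sig_hi_b):
    """Asymmetric-error chi-square (Barlow 2003): for each observation,
    use the error bar that points toward the mean."""
    mu = 0.5 * (nh_a + nh_b)
    sig_a = sig_hi_a if nh_a < mu else sig_lo_a
    sig_b = sig_hi_b if nh_b < mu else sig_lo_b
    return (nh_a - mu)**2 / sig_a**2 + (nh_b - mu)**2 / sig_b**2

def is_variable(chi2, threshold=CHI2_90):
    """Flag a pair of observations as variable at the given threshold."""
    return chi2 > threshold
```

For two HR values of 0.2 and 0.6, each with a 0.1 error, `chi2_hr` gives 8.0, well above the 90% threshold, so the pair would be flagged.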

XMM-Newton and Chandra Results
In total, we had 76 pairs of observations on which to test our method, and each observation has an N H value from each of the three models. Since this is a binary classification (variable or not variable), a confusion matrix is one of the best ways to analyze the reliability of the method (Stehman 1997). The confusion matrices are shown in Appendix B. We consider a true positive, TP, to be when our method predicts variability and the N H values show variability. A false positive, FP, is when our method predicts variability but the N H values are consistent with each other. A true negative, TN, and a false negative, FN, are defined similarly.

Accuracy
The simplest measure of reliability is the accuracy, defined as the total number of correct predictions divided by the total number of predictions. In terms of confusion matrix values,

Accuracy = (TP + TN) / (TP + TN + FP + FN) .

The accuracies using each criterion for each of the three models are shown in Table 3, considering only HR 1, only HR 2, and both. The accuracy can be a useful first approximation to the reliability of a method; however, it can hide particular behaviors that are important to note before applying this method to a larger sample. Note that for all three models and for both criteria, using HR 1 only seems to make better predictions than considering both ratios, especially when using the N H values derived with the MYTorus model. Looking at the confusion matrices in Appendix B, we can see why that is. Only considering HR 1 is less likely to result in a positive prediction, and MYTorus shows more actual negatives (44) than actual positives (32). So, a method that is less sensitive to variability would be expected to do better than a more sensitive method. Considering borus02, which has the same number of actual positives and negatives (38), the accuracies of HR 1 only and of both HR 1 and HR 2 are more in agreement. To quantify this, the "prevalence" is defined as the fraction of actual positives,

Prevalence = (TP + FN) / (TP + TN + FP + FN) .

The prevalence is also shown in Table 3. The accuracy of HR 1 only vs. HR 1 & HR 2 seems to depend on the prevalence, with lower prevalence favoring HR 1 only, as expected. In the real-world application of this method, the prevalence will not be known, so it would be hasty to conclude that we should only consider HR 1 simply because the accuracies are higher for this sample.
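Both quantities follow directly from the confusion-matrix counts. A minimal sketch (our naming; the example counts are hypothetical, chosen only to match the MYTorus totals of 32 actual positives out of 76 pairs):

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of all predictions that are correct.
    return (tp + tn) / (tp + tn + fp + fn)

def prevalence(tp, tn, fp, fn):
    # Fraction of all pairs that are actually variable (TP + FN).
    return (tp + fn) / (tp + tn + fp + fn)
```

Note that the prevalence depends only on the sample, not on the classifier: any split with TP + FN = 32 out of 76 pairs gives 32/76 ≈ 0.42.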

Precision and Recall

Precision is a measure of how trustworthy a positive prediction is, and is defined as

Precision = TP / (TP + FP) .

Recall is a measure of how good the classifier is at finding true positives, and is defined as

Recall = TP / (TP + FN) .

Ideally, both of these values would be as close to 1 as possible. However, realistically, this is not achievable, and one might want to prioritize one metric over the other. For example, if studying N H variability in a large sample of sources is the primary goal, precision might be valued over recall, to avoid carefully fitting the X-ray spectra of observations that are not variable. On the other hand, if working from a smaller sample, false positives might not be as inconvenient. In this case, one would want to prioritize recall, to make sure most of the variable sources are actually flagged. Furthermore, if the hardness ratios are changing, this means that the spectral shape is changing, which could indicate something interesting even if it does not happen to be a changing N H. For example, changes in photon index are typically associated with variability of the AGN Eddington ratio, with higher accretion rates corresponding to a softer X-ray spectrum (Lu & Yu 1999; Shemmer et al. 2008; Risaliti et al. 2009).
A single value that accounts for both precision and recall is the F_β-measure (Van Rijsbergen 1979). It is defined as

F_β = (1 + β²) · Precision · Recall / (β² · Precision + Recall) ,

where β > 1 considers recall more important and β < 1 values precision higher. The regular F-measure has β = 1 and weights precision and recall equally. We show the results using β = 2, which values finding truly variable sources over avoiding non-variable sources. The F_2 measures are shown in Table 4. Here, we see a different interpretation of the results from the standard accuracy shown in Table 3: in this case, considering variability in either HR 1 or HR 2 provides a better score than HR 1 alone. This is not surprising, as this method is more likely to make a variable prediction, and with β = 2 we are artificially rewarding the ability to detect variability.
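The F_β formula above can be sketched directly from the confusion-matrix counts (our naming; β = 2 reproduces the weighting used in Table 4):

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F_beta-measure from confusion-matrix counts.
    beta > 1 weights recall more heavily; beta = 1 is the regular F-measure."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta**2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

When precision equals recall, F_β reduces to that common value for any β; when recall exceeds precision, β = 2 scores higher than β = 0.5, as intended.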

Receiver Operating Characteristic
Similar to precision and recall, one can define the false positive rate, FPR, and the true positive rate, TPR. The FPR is the ratio of false positives to the total number of actual negatives,

FPR = FP / (FP + TN) .

The TPR is the ratio of true positives to total actual positives and is equivalent to recall (Eq. 16). The receiver operating characteristic (ROC) plots the TPR against the FPR and therefore again provides a measure of how sensitive we are to true positives and how resistant we are against false positives (Fawcett 2006). A perfect classifier would sit at the point (0, 1), while a random classifier would lie along the line TPR = FPR. An ROC curve can be obtained by varying the decision threshold. Figure 4 shows the ROC curves for each of the three models, considering only variability in HR 1, as well as variability in both HR 1 and HR 2. The χ² critical value for one degree of freedom is the decision threshold that was varied to obtain the curves. The values used are χ² = [0, 0.001, 0.004, 0.016, 0.102, 0.455, 1.32, 2.706, 3.841, 5.024, 6.635, 10.828, 15.137, 19.511], which correspond to confidence levels of CL = [0%, 2.5%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, 97.5%, 99%, 99.9%, 99.99%, 99.999%]. The blue and red points in Figure 4 show the ROC values at 90% for each classifier.
Here we see that for all combinations, the predictive value is much better than random guessing. Furthermore, for confidence levels higher than 90%, using both ratios leads to better results, except for the MYTorus model. This can again be explained by the fact that the MYTorus model has a lower prevalence than borus02 or UXCLUMPY, so the method that is more likely to predict no variability will appear better. However, we reiterate that, in general, the prevalence will not be known, so using both ratios is likely to be a more robust classifier.
Notably, as the critical value is increased up to χ² > 10.828, corresponding to a confidence level of 99.9%, the FPR goes to zero while the TPR remains around or above 0.5. This means that by decreasing our sensitivity to variability (up to a point), we can reduce the number of false positives to almost zero and still detect more than half of the true positives.
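The threshold sweep behind the ROC curves can be sketched as follows; the pairing of a χ² value with a ground-truth label per observation pair is our own illustrative structure, not the authors' code.

```python
# Chi-square decision thresholds swept in the text (1 d.o.f.).
THRESHOLDS = [0.0, 0.001, 0.004, 0.016, 0.102, 0.455,
              1.32, 2.706, 3.841, 5.024, 6.635,
              10.828, 15.137, 19.511]

def roc_points(pairs):
    """pairs: list of (chi2_hr, truly_variable) tuples, one per
    observation pair. Returns an (FPR, TPR) point per threshold."""
    points = []
    for thr in THRESHOLDS:
        tp = sum(1 for c, v in pairs if c > thr and v)
        fp = sum(1 for c, v in pairs if c > thr and not v)
        fn = sum(1 for c, v in pairs if c <= thr and v)
        tn = sum(1 for c, v in pairs if c <= thr and not v)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((fpr, tpr))
    return points
```

With a well-separated toy sample, the point at the 2.706 threshold lands at (0, 1): no false positives, all true positives recovered.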

NuSTAR Results
The results for the three NuSTAR sources are shown in Figure 5. These plots show the best-fit N H value obtained from the UXCLUMPY model against the single hardness ratio defined in Section 3.3 for NuSTAR. As can be seen from the figures, Criterion 1 correctly predicts the variability in NGC 4388, and also the non-variability in 3C 105 and NGC 7319.
The predictions are also correct for all three sources when Criterion 2 is used. For 3C 105, the value for the HR fit is χ²_HR = 1.56, and the values for the N H fits are χ²_NH = 0.51, 0.43, and 1.25 for UXCLUMPY, MYTorus, and borus02, respectively. Similarly, for NGC 7319, χ²_HR = 0.02 and χ²_NH = 0.92, 0.89, 1.15. For the variable source NGC 4388, χ²_HR = 52.4 and χ²_NH = 120, 119, 94. Although the sample size is very small, the NuSTAR hardness ratio seems to be better at predicting N H variability. This would not be surprising, considering that the energy bands we are able to use with NuSTAR might be better aligned to detect changes in line-of-sight absorption for z ∼ 0 AGN with moderate obscuration. This could be because increasing N H only significantly affects the 3-8 keV band, which leads to a predictable increase in HR nu. On the other hand, for HR 1 and HR 2, an increase in N H affects both energy bands differently depending on the amount of absorption and reflection, leading to a less predictable change in the hardness ratios. Of course, we cannot make any definitive statements with only three sources; a larger sample of NuSTAR observations is needed to confirm this.

Summary and Conclusion
In this work, we introduced a method to predict variability in the line-of-sight N H of an AGN without having to perform difficult and time-consuming spectral modeling. This allows the user to quickly sift through many X-ray observations and flag the sources that are most likely to have experienced N H variability. These flagged sources can then be studied further by performing full spectral fitting to obtain accurate N H values.
To do this, we used variability in hardness ratio as a proxy for variability in N H. Two different hardness ratios were defined to account for a possible degeneracy in highly obscured scenarios. Two different criteria were used to determine whether observations are 'variable': Criterion 1 considers two observations variable if the 90% confidence intervals are inconsistent with each other; Criterion 2 considers the χ² fit assuming there is no variability.
We tested our prediction method on a sample of 12 sources with N H values determined through careful spectral modeling, and provided different interpretations of the results. We conclude that our method can be a useful tool for selecting samples of likely N H-variable AGN. Criterion 1 seems to be a good overall predictor, while Criterion 2 is not as good overall (at a 90% confidence level); however, the sensitivity can easily be adjusted to suit the requirements of a particular project, resulting in a very flexible tool. We reiterate that this method is not to be used as a substitute for measuring N H via spectral fitting; rather, it is only an indicator of variability between two observations. In a future paper, we will apply this method to a larger sample of sources with unknown N H values. We will flag the sources with variable HR and study those with careful spectral fitting.

Fig. 2. Calculated hardness ratios for data simulated using the borus02 model for a range of N H values. HR 2 continues to increase beyond N H ∼ 3 × 10^23 cm^−2, whereas HR 1 decreases. Neither is sensitive to changes in N H beyond ∼ 10^24 cm^−2, given the selected average torus properties.

Fig. 3. Shape of the response for the ACIS-S and pn cameras on an absolute (red) and normalized (black) scale. Chandra has a much lower response in the hard band (4-10 keV) than pn, which needs to be corrected. These data were simulated in the same way as in Figure 1.

Fig. 4. ROC curves for all three models, considering both HR 1 & HR 2 and HR 1 alone. The blue and red dots correspond to the values at the 90% confidence level for each combination. The grey dotted line represents a theoretical classifier with no predictive value.

Fig. 5. Direct comparison of the best-fit N H values from UXCLUMPY to the NuSTAR hardness ratio. As can be seen, the hardness ratio is able to predict the N H variability in NGC 4388 and the non-variability in 3C 105 and NGC 7319.

Fig. A.1. Results for 3C 452 and 3C 105. The boxes represent the 90% confidence intervals for HR 1 and HR 2. The bar on the right shows the 90% confidence interval for the modeled N H,los with UXCLUMPY, which we take to be the "true" column density.

Fig. A.2. Results for NGC 788 and NGC 3281. The boxes represent the 90% confidence intervals for HR 1 and HR 2. The bar on the right shows the 90% confidence interval for the modeled N H,los with UXCLUMPY, which we take to be the "true" column density.

Fig. A.3. Results for IC 4518A and NGC 612. The boxes represent the 90% confidence intervals for HR 1 and HR 2. The bar on the right shows the 90% confidence interval for the modeled N H,los with UXCLUMPY, which we take to be the "true" column density.

Fig. A.4. Results for NGC 7319 and NGC 4388. The boxes represent the 90% confidence intervals for HR 1 and HR 2. The bar on the right shows the 90% confidence interval for the modeled N H,los with UXCLUMPY, which we take to be the "true" column density.

Fig. A.5. Results for 3C 445 and NGC 835. The boxes represent the 90% confidence intervals for HR 1 and HR 2. The bar on the right shows the 90% confidence interval for the modeled N H,los with UXCLUMPY, which we take to be the "true" column density.

Fig. A.6. Results for NGC 833 and 4C+29.30. The boxes represent the 90% confidence intervals for HR 1 and HR 2. The bar on the right shows the 90% confidence interval for the modeled N H,los with UXCLUMPY, which we take to be the "true" column density.

Fig. B.1. Confusion matrices for all three models using Criterion 1 (the overlap method), considering HR 1 & HR 2. These show that the method is fairly good at classifying sources as variable or not variable.

Fig. B.2. Confusion matrices for all three models using Criterion 2 at a 90% confidence level, considering HR 1 & HR 2. These show that the method is good at avoiding false negatives, meaning that most of the variable sources in a sample will be flagged. However, this comes at the expense of flagging as variable more sources that are not variable (top right).

Fig. B.3. Confusion matrices for all three models using Criterion 2 at a 99.999% confidence level, considering HR 1 & HR 2. These show that the method is good at avoiding false positives, meaning that almost none of the non-variable sources in a sample will be flagged. However, this comes at the expense of not selecting a larger number of variable sources (bottom left).

Table 1. Sample details for the Chandra and XMM-Newton data. The best-fit N H values for all three models are in units of 10^24 cm^−2. For details on the observations, see TA23.

Table 2. Sample details for the NuSTAR data. The best-fit N H values are in units of 10^24 cm^−2. For details on the observations, see TA23.

Table 3. Accuracies of Criterion 1 and Criterion 2 in determining variability, using N H values from each of the three models. Predictions are based on variation in HR 1 alone, HR 2 alone, and either HR 1 or HR 2. The prevalence for each model is also shown.

Table 4. Same as Table 3, but with F_2-measures instead of accuracies.