Concordance: In-flight Calibration of X-Ray Telescopes without Absolute References

We describe a process for cross-calibrating the effective areas of X-ray telescopes that observe common targets. The targets are not assumed to be “standard candles” in the classic sense, in that we assume that the source fluxes have well-defined, but a priori unknown values. Using a technique developed by Chen et al. that involves a statistical method called shrinkage estimation, we determine effective area correction factors for each instrument that bring estimated fluxes into the best agreement, consistent with prior knowledge of their effective areas. We expand the technique to allow unique priors on systematic uncertainties in effective areas for each X-ray astronomy instrument and to allow correlations between effective areas in different energy bands. We demonstrate the method with several data sets from various X-ray telescopes.

A common problem is assessing how much any particular instrument's effective area should be adjusted so that the measurements agree. Ordinary weighting of measurements based on photon-counting statistics would give the observations with large effective areas and exposure times the greatest influence on the result, without consideration of possible systematic errors. The overarching goal of IACHEC is thus to bring the competing adjustments to instrumental effective areas into concordance, while simultaneously including prior knowledge of possible systematic errors.
Here, we further develop the "shrinkage" method pioneered by Chen et al. (2019), hereafter referred to as Paper I, to compute objective corrections to the effective areas of several high-energy instruments. See Section 2.1 for the model setup and notation. The method is extended to account for systematic uncertainties specific to each instrument and to incorporate systematic correlations across passbands. We present the method here in some generality, without the details of the mathematical machinery presented in Paper I (Section 2.2), along with the extensions we have added (Section 2.3), and apply it to a variety of data sets (Section 3). We present the results in Section 4, along with simulation studies that aim to validate the method, and discuss next steps in Section 5. We note that while we examine the case of X-ray telescopes in this paper, the method is extendable to most types of telescopes.

Background
We start with an idealized calibration data set in which M objects are observed by each of N instruments, obtaining photon counts c_ij, where j = 1, ..., M indexes the sources and i = 1, ..., N indexes the instruments. Each observation is described by a set of observational parameters (e.g., exposure time) that we encapsulate in a matrix T = {T_ij}.
Denoting the true effective area of instrument i by A_i and the true flux of source j by F_j, the expected counts C_ij for each object/detector combination are given by

C_ij = T_ij A_i F_j,  (1)

where T_ij has units of seconds × counts per photon, F_j has units of photons per unit area per second, and A_i has units of area.
We assume for now that the true exposure factors have negligible error and equal the observed values, i.e., T_ij = t_ij. We follow the notation of Paper I by using lower case to indicate measured quantities and upper case to indicate the "true" values to be estimated. Note that, in fact, the multiplicative constant T_ij contains not only the exposure time but also other factors that can be calculated precisely for any given observation. We aim to estimate F_j using the calibration data, i.e., the observed counts c_ij, and an external (prior) estimate of the effective area, a_i, for each instrument. A naive procedure simply substitutes each quantity in Eq. (1) with its measured counterpart to obtain the "estimating equation"

c_ij = t_ij a_i f_ij,  (2)

and solves Eq. (2) for f_ij for each instrument-source combination, thus yielding N different estimates of each flux. The resulting estimator f_ij = c_ij/(a_i t_ij) is a ratio estimator, which is known in the statistical literature to be both seriously biased and highly variable. More precisely, the variability in the denominator can cause large uncertainty (consider dividing by a value close to zero). Furthermore, the average, (1/N) Σ_{i=1}^{N} f_ij, is a biased estimator of F_j. We return to the issues associated with ratio estimators in §4.4.1. We can do much better by analyzing all the data together using more principled and sophisticated statistical methods, such as the one given in Paper I. We can then achieve our goal of obtaining better estimates of the instrumental effective areas, A_i, that bring our flux estimates, f_ij, closest to the F_j in some strict statistical sense. Fig. 1 gives a schematic representation of our goal.
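The bias of the ratio estimator is easy to demonstrate numerically. The following is a minimal sketch with hypothetical numbers (not drawn from any of the data sets analyzed here), in which the prior effective-area estimate a scatters around the true value with a 10% systematic uncertainty:

```python
import numpy as np

rng = np.random.default_rng(42)

# True values (hypothetical): one source, one instrument.
A_true, F_true, t = 100.0, 0.05, 1000.0
C = t * A_true * F_true  # expected counts

n_sim = 100_000
# Observed counts are Poisson; the prior effective-area estimate a_i
# scatters around A_true with a 10% systematic uncertainty.
c = rng.poisson(C, size=n_sim)
a = A_true * rng.normal(1.0, 0.10, size=n_sim)

f = c / (a * t)  # naive ratio estimator of F

# E[1/a] > 1/E[a] (Jensen's inequality), so f overestimates F on average.
bias = f.mean() / F_true - 1.0
print(f"mean relative bias of f: {bias:+.3f}")  # positive, roughly +1% for 10% scatter
```

The positive bias grows with the fractional uncertainty in the denominator, which is why simply averaging the N instrument-specific ratio estimates does not recover F_j.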
In practice, estimating a flux from an observation is not as simple an operation as merely counting events. A strictly correct approach takes into consideration the response of the instrument, especially when attempting to measure the flux in a specific bandpass. It is often necessary to use a forward-folding method, such as that implemented in xspec, which takes a count spectrum as input, with the effective area and response functions provided externally (and accurately). Here, we approximate this process using Eq. (1) because the flux (as defined here) is often a robustly computed quantity for any given observation. In fact, we do not actually compute c_ij but a proxy for c_ij, as described below.
We now carefully examine the flux measurement process to justify using the simplistic model represented by Eq. (1). Source j is assumed to have a photon spectrum n_E(Θ_j) in units of photons per unit area per unit energy at energy E. Each Θ_j represents a vector of spectral parameters for a source, which we assume to include an overall normalization n_j with the same units as n_E, so that q(E; θ_j) ≡ n_E(Θ_j)/n_j defines the spectral shape as a function of the remaining spectral parameters, θ_j. In the case of a

Figure 1. The goal of Concordance, illustrated schematically. The schematic supposes 3 sources and 3 instruments, and plots expected count rates, C_ij/T_ij, on the vertical axis against true and estimated fluxes on the horizontal axis. The three instruments are represented by different symbols, with solid symbols representing the naive estimates of F_j, i.e., f_ij = c_ij/(T_ij a_i), and open symbols representing the same but with a_i replaced by the true effective area, A_i. The estimates are systematically biased because the a_i are inaccurate estimates of A_i; compare the solid and dashed lines. (In the plot, EA is used as an abbreviation for effective area.) Our aim is to estimate the A_i values that yield the best agreement between instruments for each source.
spectrometer measuring an emission line, the bandpass may be so narrow that q is well approximated by a delta function, in which case F_j = n_j. Otherwise, the flux in a band from E_1 to E_2 is

F_j = n_j ∫_{E_1}^{E_2} q(E; θ_j) dE,  (3)

giving

F_j = n_j q̄_j ΔE,  (4)

where ΔE = E_2 − E_1 and q̄_j is the average of q(E; θ_j) over the band.
For most X-ray telescopes, the instrumental effective area can be separated into two parts. One part is the geometric area of the optics, A^g_i, with losses due to mechanical obscuration and the reflections and transmissions of the optics (including filters) given by r_i(E). The other part is the quantum efficiency of the detector, Q_i(E), which gives the probability of detecting a photon. We characterize the (true) effective area of instrument i as a function of energy in the bandpass of interest by

Ã_i(E) = A^g_i r_i(E) Q_i(E) ≡ A_i α_i(E),  (5)

where α_i(E) gives the shape of the effective area in the band and is defined such that its integral over the band is unity, i.e.,

∫_{E_1}^{E_2} α_i(E) dE = 1.  (6)

Consequently,

A_i = ∫_{E_1}^{E_2} Ã_i(E) dE  (7)

is the scale of the effective area and does not depend on E. We take this approach because the shape of the effective area through the band is generally better known than its absolute value.
The detector provides counts in K channels that are related to the true photon energy, E, via a response function Φ_k(E). A given observation has an accurately determined exposure time t_ij that sets the expected count in channel k:

C_ijk = t_ij ∫ dE Φ_k(E) Ã_i(E) n_E(Θ_j)  (8)
      = t_ij A_i n_j ∫ dE Φ_k(E) α_i(E) q(E; θ_j)  (9)
      = t_ij A_i F_j ∫ dE Φ_k(E) α_i(E) q(E; θ_j) / (q̄_j ΔE),  (10)

where q̄_j is the band average of q(E; θ_j), ΔE = E_2 − E_1, and Eq. (10) follows from Eq. (4). It is important to point out that Eq. (8) is well known in the astronomy literature, but Eq. (10) is an innovative way of simplifying the expressions that is essential for tackling the current problem. Finally, the counts associated with the energy band (E_1, E_2) are mostly in a range of channels (k_1, k_2), chosen so that we have a reliable estimate of the flux in the band of interest, giving the expected count relevant to a particular flux measurement,

C_ij = Σ_{k=k_1}^{k_2} C_ijk = t_ij A_i F_j Σ_{k=k_1}^{k_2} ∫ dE Φ_k(E) α_i(E) q(E; θ_j) / (q̄_j ΔE)  (11)
     = T_ij A_i F_j,  (12)

which defines T_ij in terms of t_ij, a normalized sum over the response function, and the two shape functions. This is the formal basis for Eq. (1).
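The construction of T_ij from the exposure time, the response functions, and the two shape functions can be sketched numerically. Everything below is a toy example (power-law spectral shape, flat effective-area shape, Gaussian channel responses), not the actual calibration products of any instrument:

```python
import numpy as np

# Toy ingredients (all hypothetical): a power-law spectral shape q(E),
# a flat effective-area shape alpha(E) on [E1, E2], and Gaussian
# response functions Phi_k(E) for K detector channels.
E1, E2 = 0.5, 2.0                 # keV, band of interest
E = np.linspace(0.3, 3.0, 3000)
dE = E[1] - E[0]

q = E**-1.7                       # spectral shape (unnormalized)
in_band = (E >= E1) & (E <= E2)

alpha = np.where(in_band, 1.0, 0.0)
alpha /= np.sum(alpha) * dE       # integral of alpha over the band is unity

K = 40
centers = np.linspace(0.3, 3.0, K)
Phi = np.exp(-0.5 * ((E[None, :] - centers[:, None]) / 0.08) ** 2)  # (K, nE)

qbar = np.sum(q[in_band]) * dE / (E2 - E1)   # band-averaged spectral shape
t_ij = 10_000.0                              # exposure time, s

# T_ij = t_ij * sum_k  int Phi_k(E) alpha(E) q(E) dE / (qbar * DeltaE),
# summed over the channels (k1, k2) covering the band.
k1, k2 = np.searchsorted(centers, [E1, E2])
channel_integrals = np.sum(Phi[k1:k2] * alpha[None, :] * q[None, :], axis=1) * dE
T_ij = t_ij * channel_integrals.sum() / (qbar * (E2 - E1))
print(f"T_ij = {T_ij:.1f}")
```

In this toy setup T_ij depends only on t_ij and on the normalized shapes, not on the overall scale A_i or flux F_j, which is the independence property relied upon below.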
In actual data analysis, the observed counts, c_ijk, not the expected counts, C_ijk, are input to the iterative routine that estimates the fluxes, f_ij, and, again, the estimated effective area, a_i, must be used because the true effective area is unknown. Thus, in analogy to Eq. (2), we replace C_ij, A_i, and F_j in Eq. (12) with their observed counterparts to obtain the estimating equation

c_ij = T_ij a_i f_ij.  (13)

Finally, naive, instrument-specific flux estimates are obtained by solving Eq. (13), which is essentially a full derivation of the ratio estimator.
It is important to our analysis that the T_ij values be independent of A_i and F_j. There are two circumstances that may cause a problem in this regard: 1) when there is some nonlinearity in the detector response, such as pileup, where Q_i(E) depends on the source flux, and 2) when α_i(E) is highly nonlocal and the bandpass of integration in Eq. (8) is large, involving portions of the spectrum where q(E; θ_j) or α_i(E) are poorly determined. By choosing instrument data in the linear regime, avoiding pileup, and restricting the bandpasses of interest, we mitigate these potential issues.
The goal of our statistical modeling is to determine the best estimates of A_i and F_j consistent with the data, c_ij, with uncertainties σ_ij. If all the effective areas were precisely correct, i.e., if A_i = a_i, then we could estimate the F_j via a relatively trivial regression model. However, we know that there are systematic errors in our estimated effective areas because observations show that the f_ij do not cluster about any given value within their individual uncertainties. We addressed this problem in Paper I by introducing estimates of the systematic errors on the estimated effective areas and applying a statistical method called shrinkage estimation.

Statistical Model and Shrinkage Estimates
Paper I proposes a linear regression model for the log count rates, denoted y_ij ≡ log(c_ij/T_ij), such that

y_ij = B_i + G_j − σ_i²/2 + e_ij,  (15)

where B_i ≡ log A_i and G_j ≡ log F_j are the quantities to be estimated, and the e_ij are noise terms that are assumed normally distributed with mean 0 and (known) variance σ_i². The −σ_i²/2 term in Eq. (15) is a half-variance correction that is included to maintain the multiplicative mean modeling in Eq. (12). This correction ensures that E(c_ij) = C_ij = T_ij A_i F_j because if log x ∼ N(µ, v), then E(x) = e^(µ+0.5v). Paper I adopts a Bayesian hierarchical modeling approach to estimate the unknown quantities B_i, G_j, and σ_i². This involves setting independent Gaussian prior distributions for the B_i, with prior mean b_i = log a_i and prior variance τ_i²; setting independent flat priors for the G_j; and setting independent conjugate priors for the σ_i². Paper I goes on to show how a Hamiltonian Monte Carlo algorithm can be used to obtain a sample from the joint posterior distribution of all unknown quantities in Model (15) and cross-checks this computation with a blocked Gibbs sampler. The resulting Monte Carlo sample can be transformed back to the effective areas and source fluxes on their original scale to obtain their posterior distributions, estimates, and error bars.
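The role of the half-variance correction can be checked numerically. This is a minimal sketch of the log-normal mean identity, not the Paper I code:

```python
import numpy as np

rng = np.random.default_rng(0)

# If log x ~ N(mu, v), then E[x] = exp(mu + v/2); subtracting v/2 on the
# log scale therefore keeps the multiplicative mean model intact.
mu, v = 1.3, 0.4
x = rng.lognormal(mean=mu - v / 2, sigma=np.sqrt(v), size=1_000_000)
print(x.mean(), np.exp(mu))  # both close to e^1.3, about 3.67
```

Without the −v/2 shift, the sample mean of x would instead converge to e^(µ+v/2), biasing the implied count rate upward.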
A Bayesian perspective allows us to combine multiple sources of information: in this case, the information from the calibration data, the y_ij, and from the prior estimates of the effective areas, the a_i. By replacing the estimated effective areas with prior distributions that reflect their uncertainty, we are able to update the estimated effective areas and their error bars in light of the calibration data. This approach provides improved estimates (e.g., in terms of mean squared error) of the effective areas and the fluxes simultaneously. As a result, we obtain estimates of the sources' true fluxes that combine the instrument-specific estimates in a statistically principled manner.
The updated estimates of the effective areas are weighted averages of their priors and the best values based on the current calibration data. In the context of Model (15), we work on the log scale. The estimates of B_i and G_j are given by weighted averages of the form

B̂_i = W_i (ȳ′_i· − Ḡ) + (1 − W_i) b_i,  (16)

where y′_ij = y_ij + 0.5σ_i², ȳ′_i· is the average of the y′_ij over the sources observed by instrument i, Ḡ is the corresponding average of the estimated G_j (which satisfy analogous weighted-average equations), and the weights are given by

W_i = τ_i² / (τ_i² + σ_i²/M).  (17)

If the prior estimate of the effective area of a particular instrument is very precise relative to its calibration data, i.e., τ_i² ≪ σ_i²/M, then W_i ≈ 0, and the updated estimate of that instrument's effective area is dominated by the prior estimate, resulting in B̂_i ≈ b_i. In contrast, if the calibration data are much more precise, then the weights are near unity and the updated estimate of the effective area is dominated by the calibration data, giving B̂_i ≈ ȳ′_i· − Ḡ. The estimates B̂_i in Eq. (16) are often called "shrinkage estimates" due to their historical use for "shrinking" several estimates together toward a common prior mean (Efron & Morris 1972, 1973) when, for example, the b_i are all the same. Because the prior means, b_i, are different for different instruments, the B̂_i are simply a sensible combination of the prior knowledge captured by b_i and the data represented by ȳ′_i·, weighted by their respective precisions, which are the reciprocals of their variances (assuming the σ_i are known). This combination allows our model to weigh the prior estimate of the effective area for a given instrument against deviations between the observed fluxes of the same sources from different instruments, and ultimately to obtain the joint estimates of the true fluxes and effective areas that are most consistent with the calibration data and the priors on the effective areas.
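This precision weighting is easy to sketch. All numbers below are hypothetical, and the sketch assumes the standard shrinkage-weight form W_i = τ_i²/(τ_i² + σ_i²/M); it is an illustration of the limiting behavior described above, not the Paper I implementation:

```python
import numpy as np

# Shrinkage of log effective areas toward their priors (illustrative).
M = 20                                   # number of sources
tau = np.array([0.02, 0.05, 0.10])       # prior sd of log effective area
sigma = np.array([0.08, 0.08, 0.08])     # per-observation noise sd
b = np.zeros(3)                          # prior means (relative log scale)
data_term = np.full(3, 0.03)             # what the data alone suggest (ybar - Gbar)

W = tau**2 / (tau**2 + sigma**2 / M)
B = W * data_term + (1 - W) * b

for i, (w, bi) in enumerate(zip(W, B)):
    print(f"instrument {i}: W = {w:.2f}, shrinkage estimate B = {bi:+.4f}")
# The tightest prior (tau = 0.02) pulls its estimate most strongly toward b_i.
```

Even though the data suggest the same 3% correction for all three instruments, the instrument with the most trusted ground calibration keeps the smallest correction.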
Paper I further describes how to handle the case where not all sources are observed by all instruments, and presents a robust version of Model (15) that allows for outliers among the measured fluxes (or source counts) by replacing the log-Normal error model with a log-t error model.

Extensions of the Model
Paper I proposed modeling calibration data using Model (15) and its extensions, derived computational methods for fitting these models, and validated their statistical properties. However, application of Model (15) to real data requires relaxing some of its basic assumptions. Here, we illustrate how this is accomplished via two extensions to the method.

Heterogeneous Uncertainties in Effective Area Priors
IACHEC scientists recognize that the quality of ground-based calibrations varies significantly from instrument to instrument, resulting in perceived differences in the reliabilities of the estimated effective areas. The formalism laid out in Paper I allows for instrument-specific prior distributions for the B_i, as explained in Section 2.2, given by Gaussian distributions with instrument-specific variances τ_i². In the numerical examples in Paper I, however, the τ_i² were set to τ² for each i, essentially assuming that all modelled calibrations are equally uncertain in percentage terms. Here we allow for heterogeneous τ_i² values, as covered by the theory given in Paper I. At IACHEC meetings in 2017, 2018, and 2019 (Madsen et al. 2019, 2020), we asked instrument calibration scientists to specify values of τ_i for their instruments in each of a specific set of bandpasses. These values are given in Tables 1 and 2; instruments with significant effective area below 1 keV appear in Table 1 and other instruments appear in Table 2.
a The τ_i values are given as percentages. The ellipses indicate bandpasses where the instrument has a negligible effective area.
Of course, in practice it is difficult even for experts to quantify the τ_i precisely. Thus, it is important that we examine the sensitivity of our results to the specified values. Often, however, there is a body of experience and expert knowledge on the reliability of ground-based standards that allows rough estimation of systematic errors.

Prior Correlations among Effective Areas
The second extension allows for correlations between the effective areas in different energy bands for each instrument. In Paper I, we treated different energy bands as separate (independent) instruments, while in reality their effective areas can be strongly correlated. Continuing to work on the log relative scale given in Eq. (23), we denote the (log) effective area as a function of the energy band by B(E, ξ), where ξ parameterizes the effective area and includes quantities such as the geometric area, filter thicknesses, and chemical compositions that are initially estimated during ground calibration. Uncertainties in ξ are quantified via the prior distribution, p(ξ); examples generated and used for ACIS analyses can be found in Drake et al. (2006); Kashyap et al. (2008); Lee et al. (2011); and Xu et al. (2014). We suppress the subscript i throughout this section for notational simplicity because we are restricting consideration to an arbitrary instrument.
Following the discussion in Section 2.2, we specify the prior distribution on (the logarithm of) the effective areas of U energy bands, {B(E_u, ξ)} for u = 1, ..., U, as a multivariate Gaussian distribution with expected values, β_u, and variances, Λ_u, for each energy band u, and with correlations, ρ_uv, between all pairs of distinct energy bands, u ≠ v. Among the β_u, the Λ_u, and the ρ_uv, only the correlations, ρ_uv (or, more precisely, the Monte Carlo estimates of the ρ_uv), are used in our data analyses. To distinguish the Monte Carlo estimates of the prior means and variances of the B_i obtained here from those we actually use, we introduce new notation for these quantities that differs from that used in Paper I and in Section 2.2. The current methods for computing Λ_u are still somewhat experimental, so we rely instead on the τ_i values from the IACHEC scientists.
For a given instrument and energy band u, the expectation is

β_u = E[B(E_u, ξ)] = ∫ B(E_u, ξ) p(ξ) dξ.  (18)

In practice, calibration scientists set a_i to be the prior estimate of A_i (on the original scale) based on their best information and experience. Transforming to a logarithmic scale, we might set our prior estimate of B_i to log a_i, which is the choice of prior mean we suggest in Section 2.2. Eq. (18) can be used in the absence of such intuition or if we prefer to use a parameterized model for B. (This is particularly relevant for the correlations, since we have less intuition for them.) In this case, a reasonable strategy is to proceed via Monte Carlo integration of Equation (18). More precisely, we obtain a calibration sample, {B^(k)(E_u), k = 1, ..., K}, that quantifies the prior uncertainty in B(E_u, ξ), for example by obtaining a sample of size K from p(ξ) and computing

B^(k)(E_u) = B(E_u, ξ^(k)).  (19)

The Monte Carlo version of Eq. (18) is then

β̂_u = (1/K) Σ_{k=1}^{K} B^(k)(E_u).  (20)

The prior variance of B(E_u, ξ) and its Monte Carlo estimate are

Λ_u = Var[B(E_u, ξ)]  and  Λ̂_u = (1/K) Σ_{k=1}^{K} [B^(k)(E_u) − β̂_u]²,  (21)

respectively. Finally, the prior correlation between B(E_u, ξ) and B(E_v, ξ) is the covariance normalized by the respective standard deviations,

ρ_uv = Cov[B(E_u, ξ), B(E_v, ξ)] / √(Λ_u Λ_v).  (22)
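The Monte Carlo estimates of the prior moments can be sketched as follows. The calibration sample here is a toy construction (a made-up linear mapping from ξ to the log effective areas of two bands), chosen only so that a shared dependence on ξ induces a correlation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical calibration sample: K draws of the log effective area in
# U = 2 energy bands, generated from a toy parameterization xi.
K, U = 5000, 2
xi = rng.normal(size=(K, 3))  # e.g., geometric area, filter thickness, ...

# Toy mapping B(E_u, xi): the shared xi[:, 0] term induces correlation.
B = np.empty((K, U))
B[:, 0] = 0.02 * xi[:, 0] + 0.03 * xi[:, 1]
B[:, 1] = 0.02 * xi[:, 0] + 0.01 * xi[:, 2]

beta_hat = B.mean(axis=0)                 # Monte Carlo prior means
Lambda_hat = B.var(axis=0)                # Monte Carlo prior variances
rho_hat = np.corrcoef(B, rowvar=False)    # Monte Carlo prior correlations

print("beta_hat =", beta_hat)
print("rho(1,2) =", rho_hat[0, 1])
```

Only the off-diagonal entries of `rho_hat` are carried forward into the analysis, mirroring the choice above of using the ρ_uv but not the Λ_u.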

Practical Implementation
This section discusses the practical implementation of our methods, specifically the normalization of the observed counts/fluxes and the computation of the correlation matrices.

Normalization in Practice
Because X-ray data analysis packages such as xspec return the f_ij and their uncertainties, it is convenient to rewrite Eq. (15) in terms of

y_ij ≡ log(f_ij / f̄) = B̃_i + G′_j − σ_i²/2 + e_ij,  (24)

where f̄ is a fiducial flux (usually the maximum of the f_ij) used to normalize the data to the range [0, 1], B̃_i = log(A_i/a_i), and G′_j = G_j − log f̄. Model (24) is functionally equivalent to Model (15). This definition of G′_j, normalized by f̄, which depends on the data, is introduced only for computational convenience and does not affect the model or its interpretation. Technically, using a data-dependent "parameter", here G′_j, implies a data-dependent prior distribution, which is generally not legitimate from a Bayesian viewpoint. However, because using a flat prior on G_j is the same as using a flat prior on G′_j = G_j − log f̄, there is no actual effect in our implementation. Thus, Model (24) has the same form as Model (15), so we can embed it into the Bayesian hierarchical model described in Paper I to obtain the full posterior distribution, estimates, and error bars for the B̃_i and G′_j. Paper I suggests setting b_i = 0 (because B̃_i = 0 implies that A_i = a_i).

Deriving Correlations in Practice
We proceed by computing numerous instances of instrument effective areas that are varied in controlled ways dictated by current knowledge of the uncertainties in calibration. The basis of the method is to generate a so-called calibration sample of areas that represents the range of uncertainties of the effective area, including all the correlations between different energies. The approach to generating the calibration samples for the different instruments we study here is common to all instruments, with some additional complexity built into the Chandra samples. We describe the method in brief below and refer the reader to Drake et al. (2006) for a more complete description.
We devise a "perturbation function" that comprises piecewise cubic segments that stretch between the natural absorption edges of the different materials encountered along the optical path for a given instrument. This function varies about unity by random amounts but is constrained within fixed limits, based on the specified calibration uncertainties, by cubic functions whose parameters are randomly drawn from a truncated Gaussian distribution. The perturbation function is applied as a multiplicative factor to the different subassembly component contributions to the effective area, which are considered on a case-by-case basis. For instance, in the case of Chandra/ACIS, six plausible mirror effective areas are used; uncertainties in the optical blocking filter and contamination transmittances are modeled by altering the optical depth of each chemical component within its known uncertainties and recomputing the ensemble transmittance; and CCD quantum efficiencies are computed for different realizations of the depletion depth and SiO2 layers. Details of this process are given by Drake et al. (2006, see also Drake et al. 2021, in preparation).
Chandra/HETGS and Chandra/LETGS use a similar approach, with additional perturbation functions applied for the transmission grating diffraction efficiencies. For the other instruments considered here, the perturbation function approach alone is used.
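A toy version of such a perturbation function can be sketched as follows. This is an illustration only, not the Drake et al. (2006) code: the edge energies are approximate, the truncation at ±2σ is an assumption, and the cubic segments are taken to have zero slope at each edge for simplicity:

```python
import numpy as np

rng = np.random.default_rng(7)

# Draw values at the absorption-edge energies from a truncated Gaussian,
# then join them with cubic segments that are flat at each edge.
edges = np.array([0.28, 0.54, 1.56, 1.84, 10.0])  # approx. C, O, Al, Si K edges (keV)
sigma = 0.05                                       # fractional calibration uncertainty

def truncated_gaussian(n, sigma, limit, rng):
    """Draw N(1, sigma) values, redrawing any outside 1 +/- limit."""
    vals = rng.normal(1.0, sigma, size=n)
    bad = np.abs(vals - 1.0) > limit
    while bad.any():
        vals[bad] = rng.normal(1.0, sigma, size=bad.sum())
        bad = np.abs(vals - 1.0) > limit
    return vals

node_vals = truncated_gaussian(len(edges), sigma, 2 * sigma, rng)

def perturbation(E):
    """Piecewise-cubic multiplicative perturbation, smooth between edges."""
    E = np.atleast_1d(E)
    idx = np.clip(np.searchsorted(edges, E) - 1, 0, len(edges) - 2)
    x0, x1 = edges[idx], edges[idx + 1]
    t = (E - x0) / (x1 - x0)
    h = 3 * t**2 - 2 * t**3  # cubic with zero slope at both endpoints
    return node_vals[idx] + (node_vals[idx + 1] - node_vals[idx]) * h

E = np.linspace(0.3, 9.9, 1000)
p = perturbation(E)
print(f"perturbation range: [{p.min():.3f}, {p.max():.3f}]")
```

Repeating the draw of `node_vals` many times and multiplying the perturbation into a nominal effective area yields a calibration sample of curves from which the band correlations of §2.3.2 can be estimated.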

OBSERVATIONS AND DATA PROCESSING
Three data sets are considered in Paper I and we add a fourth in this paper. Here, we detail how the data sets are handled and the required data processing.

Supernova Remnant 1E0102.2-7219
As in Paper I, the fluxes of the emission line complexes of O and Ne in the X-ray spectra of SNR 1E0102.2−7219 are taken from the detailed comparison of 13 instruments by Plucinsky et al. (2017). Briefly, the spectra of each non-dispersive instrument are fit with a model with five free parameters: an overall normalization and four emission line fluxes of O VII, O VIII, Ne IX, and Ne X. Because the emission lines of the same element have similar energies, their effective areas are comparable and highly correlated, so we combined the O and Ne line fluxes to create two fluxes for each instrument. In our statistical analysis, these two fluxes are treated as "sources": one for O and another for Ne. The data are normalized to the O or Ne fluxes obtained by the XMM-Newton pn instrument. By requiring that each instrument analysis uses the same model, except primarily for the strengths of the emission lines, the q(E; θ_j) values do not depend on the instrument, nor on the measured line fluxes.
Table 3 shows the τ_i values assigned to the effective areas for each instrument considered. The values were taken from Table 1 using the bandpass that covers the lines of interest: 0.54-0.80 keV for O and 0.8-1.2 keV for Ne. For the correlation matrix, there is only one off-diagonal term, which we set to 0.88 for the ACIS instruments and 0.82 for the XMM instruments, as derived in §2.3.2.

Sources from the 2XMM Catalog
We select a sample of X-ray sources from the 2nd European Photon Imaging Camera (EPIC) Serendipitous Source (2XMM) Catalog (Watson et al. 2009). EPIC consists of three X-ray cameras with CCD sensors mounted on the ESA spacecraft XMM-Newton (Jansen et al. 2001); the EPIC-pn (Strüder et al. 2001) and two EPIC-MOS (Turner et al. 2001) cameras observe celestial sources quasi-simultaneously within their co-aligned fields of view. For this analysis, v14.0 of the Science Analysis System (SAS; Gabriel et al. 2004) is used, as well as the calibration files available in 2016. A description of the data reduction and spectral extraction procedure appears in Read et al. (2014). Soft, medium, and hard bands are defined to be the 0.5-1.5, 1.5-2.5, and 2.5-10 keV bands, respectively. Due to variability, different observations of the same source are treated as separate sources, for a total of 41 observations of 35 distinct sources. The normalizing (maximum) fluxes in the soft, medium, and hard bands are 0.138, 0.000701, and 0.00223 photons cm^-2 s^-1, those of the brightest sources in their respective lists. The data are provided in Tables A.1-A.3 in the Appendix. There are two primary features of the analysis procedure that make the results suitable for Concordance analysis: 1) the count spectra for the different instruments (pn, MOS1, MOS2) are fit simultaneously to power laws, so that the spectral slopes, θ_j = Γ_j (where n_E[Θ_j] = n_j E^−Γ_j), do not depend on the instrument combination, and 2) the sources are faint enough that pileup is not an issue. Table 4 gives the τ_i values assigned to the effective areas for the pn and MOS instruments and the three bandpasses. The values are taken from Table 1 using the 0.54-0.8 and 0.8-1.2 keV τ_i values for the soft band, the 1.2-2.2 keV τ_i value for the medium band, and an average of the 2.2-10 keV τ_i values for the hard band. Table 5 gives the correlation matrix values, ρ_mn, computed for the pn and MOS instruments and the three bandpasses used
in the 2XMM catalog, while Table 6 provides these values for the bandpasses used in the XCAL analysis. For simplicity, we assume that the ρ_mn are the same for each instrument. Another set of EPIC spectra used to validate the model is the so-called "XMM-Newton Cross-Calibration" (XCAL) sample. This is a sample of radio-loud Active Galactic Nuclei (AGN), primarily blazars, observed routinely by XMM-Newton in the framework of its in-flight calibration program (Guainazzi et al. 2015). As with the 2XMM sample described in §3.2, the sources are variable. In this case, there are more blazars and high-signal observations, for a total of 108, 103, and 94 observations of 22 distinct sources in the soft, medium, and hard bands that exceeded a flux limit criterion without highly discrepant fluxes between the three instruments. The normalizing (maximum) fluxes in the soft, medium, and hard bands are set to 0.126, 0.0156, and 0.0154 photons cm^-2 s^-1, those of the brightest sources in their respective lists. Data are provided in Tables A.4-A.6 in the Appendix. As with the 2XMM sources, the pn, MOS1, and MOS2 data were fit simultaneously to power-law spectra so that Γ_j is the same for each instrument. However, compared to the 2XMM sources, the XCAL sources are bright, often exceeding the count rate threshold beyond which the fraction of events affected by pile-up is no longer negligible (Jethwa et al. 2015). Spatial regions on the detector affected by pile-up are removed by excising the core of the telescope Point Spread Function (PSF) up to an observation-dependent radius. This radius is determined on the basis of the ratio between non-X-ray diagonal and standard X-ray "patterns" (a measure of the event shape in the CCD) in EPIC-MOS (Jethwa et al.
2015), and by visual inspection of the pattern distribution curves using the SAS task epatplot in EPIC-pn. This "PSF core excising method", while unavoidable if the highest possible fidelity of event spectral calibration is to be retained, may introduce excess variance in the f_ij via systematic uncertainty in the energy-dependent correction for the fraction of events scattered into or out of the annular spectral extraction region by the PSF (the so-called "Encircled Energy Correction" fraction). Values for τ_i and ρ_mn are the same as for the 2XMM sample.

Active Binary Capella
Capella (α Aur AB; G1III+G8III; 13 pc) is a spectroscopic binary that is the brightest line-dominated source accessible to non-Solar X-ray missions. It is remarkably steady for a coronal source, having never exhibited significant flaring. While it does vary over timescales of months, it does not show any evidence of flux variability over timescales of weeks or less. Consequently, it has often been used as a calibration target, in particular with Chandra. It has been observed several times with different detector and grating combinations in close proximity (see Table A.7 in the Appendix). These observations allow us to carry out an assessment of the internal cross-calibration of the Chandra grating spectrometers.
We estimated the total fluxes in each of several strong lines: the highly-ionized lines of Fe XVII (at 15 Å and 17 Å), whose formation temperatures overlap the peak emission measures of Capella, and the hydrogenic lines of Ne X (12.13 Å) and O VIII (18.96 Å). For the purposes of this calculation, we treat each of the four emission lines as different sources. We then form 21 epoch groups, comprised of observations that are within 0.1 yr of each other, giving a total of 84 sources. Similarly, the +1 and −1 grating orders are treated distinctly for each of four grating/detector combinations, ACIS-S/HEG, ACIS-S/MEG, ACIS-S/LEG, and HRC-S/LEG, for a total of eight instruments. The fluxes were normalized to the maximum values for each emission line: 28.02, 60.23, 49.66, and 23.33 × 10^-13 erg s^-1 cm^-2 for O VIII, Fe XVII 17 Å, Fe XVII 15 Å, and Ne X, respectively. We use CIAO v4.11 to extract the dispersed spectra and compute the effective areas using the contamination corrections as in CALDB v4.8.0.1.
The values of τ i are taken from Table 8 and the correlation matrix is given in Table 9.

RESULTS
Here we present results from new measurements and extensions to the results in Paper I. In each case, we generate 10,000 Monte Carlo replicates from the respective posterior distributions as the basis for our statistical inferences.

1E0102
These data provide an illustration of a case where many instruments obtain data on the same source, shown in Figs. 2 and 3. The effect of allowing heterogeneous τ values is apparent in both cases. Generally, when the prior distribution on an instrument's effective area is more uncertain than average, giving a relatively large value of τ_i, the data for that instrument are given less weight, so the posterior estimate of its effective area is more likely to deviate from the prior estimate than when all instruments have equally uncertain prior estimates. In addition, the posterior range of the deviation is more likely to be large when τ_i is larger. The Suzaku results for the O lines all show this effect. When effective areas are correlated between the O and Ne data sets, the ACIS-I3 point is particularly affected due to the discrepant results obtained when the Ne and O data are considered independently.

XMM Samples
Figures 4 and 5 show the results of the Concordance analysis for the two XMM-Newton data sets. In this example, there are many sources and few instruments, in contrast with the 1E0102 data set. The 2XMM results show a high degree of consistency between the instruments, consistently favoring 3-5% increases to the effective areas of the MOS detectors across all bands and a corresponding slight decrease to the pn effective area. With the use of individualized τ values, the Concordance analysis drives the pn effective areas toward the prior, as one might expect given the significantly smaller τ assigned to the pn compared to the MOS detectors.
The XCAL sample shows trends similar to those of the 2XMM sample, with a more significant indication that the MOS2 detector's effective area should be increased 2-3% more than that of the MOS1 detector.

Capella Line Fluxes for Chandra Grating Spectrometers
Results from the Concordance analysis as applied to the Capella data are shown in Fig. 6. There are several features of interest. First, the effective area corrections for the LETGS (ALEG and HLEG) are generally negative, while those for the HETGS (HEG and MEG) are generally positive. These corrections are consistent with preliminary results on independent data in which the instruments are cross-calibrated with alternating observations of Mk 421. Second, the +1 and −1 orders generally agree well for all instruments and wavelengths. Third, when the effective area correlations are included, the posterior effective areas for the longer wavelengths (O VIII and Fe XVII λ17) deviate more strongly from their priors.

Method Validation and Assessment
Paper I includes a series of numerical studies that explore the statistical properties of our method. For example, Figure 2 of Paper I illustrates that our posterior distributions of the effective areas cover the true effective areas in a simulation study. Figure 7 of Paper I then contrasts the estimated 95% intervals for log-fluxes constructed using the standard instrument-specific estimates with the combined estimate based on our posterior distribution, illustrating how our Bayesian process achieves a single consistent estimate for each flux, with smaller errors than the standard estimates. Here we consider additional ways to evaluate the robustness of our method's results, supplementing the simulations performed in Paper I.

Simulation Studies
The method developed and applied in Paper I produces Bayesian posterior distributions for each estimated quantity. The main quantities of interest here are the fractional corrections to instrument effective areas, given by log A_i − log a_i, and the fractional corrections to the estimated fluxes of sources, given by log F_j − log f_ij.
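With posterior draws in hand, these corrections are simple elementwise differences on the log scale. The arrays below are hypothetical stand-ins for actual posterior draws and prior means, used only to show the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical prior means of three instrument effective areas
a = np.array([120.0, 95.0, 60.0])

# mock posterior draws of A_i: 10,000 Monte Carlo replicates per instrument
A_draws = a * np.exp(rng.normal(0.05, 0.02, size=(10_000, 3)))

# fractional corrections to effective areas, log A_i - log a_i
frac_corr = np.log(A_draws) - np.log(a)

# posterior summary: 2.5%, 50%, 97.5% quantiles per instrument
summary = np.percentile(frac_corr, [2.5, 50.0, 97.5], axis=0)
```

The same pattern gives the flux corrections log F_j − log f_ij from posterior draws of F_j and the instrument-specific estimates f_ij.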
We demonstrate that the Concordance method yields accurate and reliable estimates of A_i and F_j with a simulation study. The simulation involves 40 simulated sources observed by each of five instruments. We set the prior means of the effective areas of the instruments to differ from their actual effective areas by log A_i − log a_i = [0, 1, −1, 2, −2], for i = 1, ..., 5, respectively. For example, for instrument 4, the true effective area is systematically higher than the prior mean by a factor of e^2, resulting in flux estimates that are systematically too high compared to the true values. The τ_i values are all 1.0 in this simulation, indicating large uncertainties in the prior estimates of the effective areas, and the measurement uncertainties (i.e., σ_ij) are all set to 0.5 on the log scale, similarly indicating large uncertainties, except for instrument 5, for which σ = 0.1. This setup is designed to test the robustness of the Concordance method to data from an instrument with high signal/noise but a systematically biased effective area. We replicated this entire setup 200 times and processed each replicate with our Concordance method. A representative replicate is shown in Fig. 7.
The simulations demonstrate that the Concordance method provides source flux estimates that are substantially better than would be obtained by simply using the prior means as estimates of the effective areas. The replicate simulations indicate that the 95% equal-tailed posterior intervals cover the true values of the effective areas and source fluxes over 99% of the time. Thus, we not only obtain better estimates of the source fluxes, but also estimate the effective area corrections well. Note that the instrument-specific flux estimates can deviate substantially from the true values for any given source, so that when using a set of such estimates, the weighted average ratio estimator for the flux (i.e., (Σ_i f_ij/σ_ij²)/(Σ_i 1/σ_ij²)) is generally biased toward the instrument with the highest signal/noise, regardless of the accuracy of that instrument's effective area.
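The generative step of this simulation can be sketched as follows, treating each instrument-specific log-flux estimate as the true log-flux plus the effective-area bias plus Gaussian noise. This is a simplification of the Poisson-count model of Paper I, and all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 40                                   # instruments, sources
bias = np.array([0.0, 1.0, -1.0, 2.0, -2.0])   # true log A_i - log a_i
sigma = np.array([0.5, 0.5, 0.5, 0.5, 0.1])    # per-instrument sigma_ij
logF = rng.normal(1.0, 0.5, size=N)            # true log source fluxes

# instrument-specific log-flux estimates inherit the effective-area bias
logf = logF[None, :] + bias[:, None] + rng.normal(0.0, sigma[:, None], size=(M, N))
```

Averaged over sources, instrument 4's estimates come out high by roughly a factor of e^2 and instrument 5's low by a factor of e^2, as described in the text.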
With an additional pair of simulations, we also quantified the improvement that can be obtained with the Concordance method. In the first case, we simulated M = 3 instruments with A_i/a_i = [1, 1, 0.9] for i = [1, 2, 3], τ_i = 0.05, and N = 20 sources with true fluxes all equal to 1. There were 200 independent simulations and analyses for each setup. The sources were assumed to have good signal/noise, as might be expected for calibration observations: σ_i = 0.03 (i.e., the counts c_ij were drawn from a Poisson distribution with a mean of about 1100). The second case is the same as the first, except that observations with instrument 3 have higher statistical precision: σ_3 = 0.003. Source fluxes were estimated for each simulation using the Concordance method and also using the above-mentioned weighted average of ratio estimators, (Σ_i f_ij/σ_ij²)/(Σ_i 1/σ_ij²). The 95% uncertainty bounds on the flux estimates and the fraction of cases in which the true flux is included within the uncertainty bounds are shown in Table 7. The Concordance estimator generated uncertainty intervals that were accurate and covered the ground truth, in contrast to the ratio estimator, which, despite nominally smaller confidence intervals, generated biased estimates that did not cover the true fluxes. The situation was worse in the second case, where the Concordance intervals did not change appreciably but the ratio estimators were biased low by ≈10%, reflecting the higher statistical weight given to an instrument with a biased estimate of its effective area. Indeed, in this latter case there was only one source (of 20) in only one simulation (of 200) for which the ratio estimator confidence interval included the true value. This pair of simulation setups illustrates the robustness of the Concordance method to erroneous effective area priors, showing that it is appropriate for calibration work where robustness and accuracy are highly valued.
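The ratio estimator's bias toward a high-precision but miscalibrated instrument can be illustrated directly. The sketch below uses the second setup with the measurement noise omitted for clarity, so only the systematic effect remains:

```python
import numpy as np

def ratio_estimate(f, sigma):
    """Inverse-variance weighted average over instruments i of the
    instrument-specific flux estimates f[i, j], for each source j."""
    w = 1.0 / sigma**2
    return (w[:, None] * f).sum(axis=0) / w.sum()

# second setup: instrument 3 reports fluxes biased low by 10% (A/a = 0.9)
# but with ten times the precision of the other two instruments
f = np.array([[1.0], [1.0], [0.9]])     # noise-free flux estimates, one source
sigma = np.array([0.03, 0.03, 0.003])

est = ratio_estimate(f, sigma)          # dominated by instrument 3
```

Here `est[0]` is approximately 0.902: the weighted average is pulled almost all the way to the biased instrument's value, reproducing the ≈10% low bias described above.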

Posterior Histograms
We have found that the posterior distributions of the effective area corrections are typically well described by Gaussians, as shown in Fig. 8, parts A-C. These three examples were randomly chosen from the dozens of such histograms generated in our analyses of the data from §4.1-4.3. Occasionally, however, there are histograms that are not obviously Gaussian, so we also show three "bad" examples. In one case (Fig. 8, part D) there is a distinct "notch" in one side of the distribution, and in two cases (Fig. 8, parts E and F) there is noticeable skew, with tails to large fractional corrections. These cases were quite rare but warn that there may be inconsistencies in the underlying data. One known source of error that is not accounted for in our analysis is the shape of the response function, Φ_k(E). For instruments like ACIS and the EPIC detectors, the low-energy response is somewhat uncertain and difficult to calibrate.
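A simple numerical check for the skewed "bad" cases is the sample skewness of the posterior draws. In the sketch below, mock Gaussian and right-skewed draws stand in for real posterior samples; the threshold for flagging a histogram would need to be tuned in practice:

```python
import numpy as np

def skewness(draws):
    """Sample skewness; values near 0 are consistent with a Gaussian,
    while large positive values flag a tail toward large corrections."""
    z = (draws - draws.mean()) / draws.std()
    return (z**3).mean()

rng = np.random.default_rng(1)
gaussian_draws = rng.normal(0.0, 0.02, 10_000)   # a well-behaved posterior
skewed_draws = rng.gamma(2.0, 0.01, 10_000)      # a long right tail
```

A gamma(2) sample has skewness near 1.4, well separated from the near-zero skewness of the Gaussian case, so even this crude statistic distinguishes panels like E and F from A-C.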

Sensitivity to Uncertainties in Priors
The specification of priors typically comes under scrutiny in applied Bayesian analysis. Researchers commonly conduct sensitivity analyses to study how the outcome responds to small perturbations of the priors. In our setting, the sensitivity of the results to the τ values is revealed by the comparison between heterogeneous τ values and the two homogeneous τ choices. For the correlation matrix in the prior distribution, however, we adopted Monte Carlo estimates, which are subject to random variation. We therefore undertook a simple sensitivity test in this paper, which is feasible for any user of the Concordance tools. Namely, for the Capella data, instead of adopting the full correlation matrices given in Tables 9 and 10, we kept only the correlation between the two Fe bands, zeroing the correlations between the Fe bands and the others as well as the correlations among the other bands (Ne and O). Other variations of the correlation matrix can be tested with the same procedure and similar analysis; a thorough sensitivity analysis would require extensive testing on a carefully designed set of variations of the prior distributions. Figure 9 shows the results of applying this variation to the Capella data. In Fig. 9, the O and Ne results are nicely aligned between the correlated and uncorrelated effective-area cases. This is not true in Fig. 6, where O and Ne remain correlated with each other and with the other bands, especially for HEG+. Furthermore, while the adjustments for the two Fe channels are similar across the two figures, the adjustments for Ne are very different. The resulting effective-area adjustments not only deviate more significantly from zero but also have smaller error bars when correlations are taken into account. This makes intuitive sense: the benefit of accounting for correlations among effective areas is sharper, more informative estimates.
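The restricted prior used in this test can be built by zeroing every off-diagonal correlation except the Fe-Fe entry. The matrix below is an illustrative stand-in, not the published values from Tables 9 and 10; the band order follows the Capella analysis (Ne X, Fe XVII λ15, Fe XVII λ17, O VIII):

```python
import numpy as np

def keep_fe_only(rho, fe_idx=(1, 2)):
    """Return a copy of the prior correlation matrix with all
    off-diagonal entries zeroed except between the two Fe bands."""
    out = np.eye(len(rho))
    i, j = fe_idx
    out[i, j] = out[j, i] = rho[i, j]
    return out

# hypothetical full prior correlation matrix (not the published values)
rho_full = np.array([
    [1.0, 0.6, 0.5, 0.3],
    [0.6, 1.0, 0.8, 0.4],
    [0.5, 0.8, 1.0, 0.4],
    [0.3, 0.4, 0.4, 1.0],
])

rho_fe = keep_fe_only(rho_full)
```

The result stays symmetric with unit diagonal, so it remains a valid correlation matrix to feed back into the same Concordance analysis.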

CONCLUSION AND DIRECTIONS FOR DEVELOPMENT
These data sets provided an excellent foundation for the Concordance project, whose goal is to provide quantitative and objective evidence for effective area adjustments that improve agreement between instrument measurements. The process applied here is available for use in studies such as those we have undertaken. Several avenues remain for extending this particular implementation of the Concordance analysis.

Correlations between source bandpass fluxes
Several types of calibration sources have simple spectra, which is why they are often used in cross-calibration. Examples are isolated neutron stars with blackbody spectra, blazars with power-law spectra, and supernova remnants and clusters of galaxies with thermal spectra. To the extent that these spectra can be characterized by only a few parameters, such as a power-law slope, the flux in one band is closely related to that in an adjacent band. Furthermore, many types of sources have smoothly continuous spectra: their spectral fluxes are tightly correlated on small scales. Modeling the many bandpasses of a blazar spectrum with a series of power laws with different slopes would lead to unphysical discontinuities at bandpass boundaries. It would thus be advantageous to exploit this astrophysical knowledge and include spectral-band correlations due to spectral continuity and simplicity. The XMM-Newton-Chandra blazar XCAL sample is an excellent data set to examine next; it involves three Chandra configurations and all four XMM-Newton X-ray detectors, covering the energy range from 0.1 to 10 keV with simultaneous observations of active galaxies obtained over 20 years of operation. Preliminary results have been reported at various IACHEC meetings.
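The claim that a few-parameter spectrum ties adjacent bands together can be made concrete for a power law: once the normalization and slope are fixed by one band, the flux in any other band follows. This is a generic sketch, not code from the Concordance implementation:

```python
import numpy as np

def band_flux(norm, gamma, e_lo, e_hi):
    """Photon flux of a power law N(E) = norm * E**(-gamma),
    integrated from e_lo to e_hi (keV)."""
    if abs(gamma - 1.0) < 1e-12:
        return norm * np.log(e_hi / e_lo)   # special case: integral of 1/E
    return norm * (e_hi**(1.0 - gamma) - e_lo**(1.0 - gamma)) / (1.0 - gamma)

# with gamma = 2 and norm = 1, the 1-2 keV flux fixes the 2-4 keV flux
soft = band_flux(1.0, 2.0, 1.0, 2.0)   # 0.5
hard = band_flux(1.0, 2.0, 2.0, 4.0)   # 0.25
```

Any prior on the effective areas that ignores this deterministic link between `soft` and `hard` discards information; encoding it as a cross-band correlation is the extension proposed here.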

Secular variations of instrument responses
While many sources may well vary erratically, instrument behavior is often subject to gradual degradation. With adequate modeling of many observations, one avenue to explore is how to link instrument effective areas over time within the Concordance framework.

Nonlocal instrument responses
Definite difficulties are encountered when the detector energy response Φ_k(E) has a non-Gaussian component, a broad asymmetry, or bimodality, because systematic errors in the response function can then appear in an apparently unrelated bandpass. Response function errors may be responsible for some of the artifacts in our posterior histograms (see §4.4.1). For example, the response functions of solid-state detectors have escape peaks that can generate events at an apparent energy significantly different from that of the incoming photon. Modeling the effects of systematic errors in response functions is possible in principle, especially with methods such as those used to determine the effective correlation function (see §2.3.2). One approach for dealing with this issue would be to expand the Concordance mathematical model to include a term accounting for the variance of the T_ij values.

ACKNOWLEDGMENTS
Support for this work was provided in part by the National Aeronautics and Space Administration (NASA) through the Smithsonian Astrophysical Observatory (SAO) contract SV3-73016 to MIT for support of the Chandra X-Ray Center (CXC), which is operated by SAO for and on behalf of NASA under contract NAS8-03060. Support was also provided by NASA under contract NAS8-39073 to SAO. This work was conducted in collaboration with the CHASC International Astrostatistics Center. CHASC is supported by NSF DMS-18-11308, DMS-18-11083, and DMS-18-11661. DvD was supported in part by Marie Skłodowska-Curie RISE grants (H2020-MSCA-RISE-2015-691164, H2020-MSCA-RISE-2019-873089) provided by the European Commission.

Figure 2 .
Figure 2. Results of the Concordance analysis for the data from SNR 1E0102 for the combination of fluxes of the lines of O VII and O VIII. The τ = 0.025 (black) and 0.05 (yellow) results are the same as given in Paper I and are shown to elucidate the effects of including heterogeneous τ values (blue) and adding effective area correlations (red). The error bars represent the 90% (5%-95%) confidence regions on the posterior estimate of A_i/a_i, as defined in §2.2. When effective areas are correlated between the O and Ne data sets, the ACIS-I3 point is particularly affected because of the discrepant results obtained when the Ne and O data are considered independently.

Figure 3 .
Figure 3. Same as Fig. 2, except for the combination of fluxes of the lines of Ne IX and Ne X (see text). Unlike the case for the O lines, the posterior estimates of the effective area tend to be more stable in this band and relatively independent of the uncertainties in the effective area priors.

Figure 4 .
Figure 4. Concordance results for the 2XMM sample. Results are color-coded as in Fig. 2. When the τ values are allowed to vary by instrument (the "heterogeneous" case), the posterior for the pn centers on the prior, due to the smaller value of τ than that used for the MOS detectors. At the same time, higher effective areas for the MOS detectors are indicated across all bands.

Figure 5 .
Figure 5. Same as Fig. 4, except for the XCAL sample. These results are generally consistent with those from the 2XMM sample, but with a somewhat stronger indication that the MOS2 effective area should be increased relative to MOS1.

Figure 6 .
Figure 6. Concordance analysis using measurements of four emission lines with the Chandra grating spectrometers. Emission lines used in the various panels are: a) Ne X, b) Fe XVII λ15, c) Fe XVII λ17, d) O VIII. Color coding of results is as in Fig. 2. Features to note are 1) the effective area corrections for the LETGS (ALEG and HLEG) are generally negative while those of the HETGS (HEG and MEG) are generally positive, and 2) the +1 and −1 orders generally agree well.

Figure 7.
Figure 7. Posterior distributions of the logarithms of the effective area corrections (top row) and of the logarithms of the source fluxes (bottom row, for five sources) for one of 200 replicate simulations, each involving five instruments and 40 sources. In the simulation, all sources have a true value of log F_j = 1, but the prior means of the instrument effective areas are off by factors of exp([0, 1, −1, 2, −2]) for i = 1, ..., 5. The top row shows that the Concordance method generates posterior distributions of the effective area corrections that are well centered on the true values for each instrument (vertical solid lines in various colors). The bottom row shows that the posterior distributions of the source fluxes are well centered on the true values (vertical solid black lines), even while the instrument-specific estimates based on the prior means of the effective areas (vertical dotted lines, colored to match the instruments in the top row) can be individually erroneous by large factors.

Table 3 .
Heterogeneous τ_i Values for the 1E0102 Analysis a

Table 4 .
Heterogeneous τ Values for 2XMM and XCAL
a Values for τ are percentages for each combination of instrument and line complex, using τ values from Table 1.

Table 5 .
Correlation matrix for 2XMM Analyses

Table 6 .
Correlation matrix for XCAL Analyses

Table 7 .
Results from Two Concordance Simulations
Columns: Simulation Setup a; Flux Estimation 95% Flux Range b; r95 c
a The prior for the effective area of instrument 3 is 10% higher than its true value in both setups; see Section 4.1 for details.
b Average 95% confidence intervals for source flux estimates; the true fluxes for all sources are set to 1.
c Fraction of flux estimates covering true fluxes at the 95% confidence level out of 200 simulations of 20 sources each.

Table 8 .
Values of τ_i for Capella Analyses a
Columns: Line; Ne X λ12; Fe XVII λ15; Fe XVII λ17; O VIII λ19

Table 9 .
ACIS Correlation matrix used for Capella
Columns: Line; Ne X λ12; Fe XVII λ15; Fe XVII λ17; O VIII λ19

Table 10 .
HRC Correlation matrix used for Capella
Columns: Line; Ne X λ12; Fe XVII λ15; Fe XVII λ17; O VIII λ19

Table A.1.
2XMM Concordance Fluxes - Soft Band a

Table A.5.
XCAL Concordance Fluxes - Medium Band a
a Fluxes are normalized to 0.126 photons cm^-2 s^-1.