The Pantheon+ Analysis: SuperCal-Fragilistic Cross Calibration, Retrained SALT2 Light Curve Model, and Calibration Systematic Uncertainty

We present here a re-calibration of the photometric systems used in the Pantheon+ sample of Type Ia supernovae (SNe Ia) including those used for the SH0ES distance-ladder measurement of H$_0$. We utilize the large and uniform sky coverage of the public Pan-STARRS stellar photometry catalog to cross-calibrate against tertiary standards released by individual SN Ia surveys. The most significant updates over the `SuperCal' cross-calibration used for the previous Pantheon and SH0ES analyses are: 1) expansion of the number of photometric systems (now 25) and filters (now 105), 2) solving for all filter offsets in all systems simultaneously in order to produce a calibration uncertainty covariance matrix that can be used in cosmological-model constraints, and 3) accounting for the change in the fundamental flux calibration of the HST CALSPEC standards from previous versions, on the order of $1.5\%$ over a $\Delta \lambda$ of 4000~\AA. The re-calibration of samples used for light-curve fitting has historically been decoupled from the retraining of the light-curve model. Here, we are able to retrain the SALT2 model using this new calibration and find that the change in the model, coupled with the change to the calibration of the light-curves themselves, causes a net distance modulus change ($d\mu/dz$) of 0.04 mag over the redshift range $0<z<1$. We introduce a new formalism to determine the systematic impact on cosmological inference by propagating the covariance in fitted calibration offsets through retraining simultaneously with light-curve fitting, and find a total calibration uncertainty impact of $\sigma_w=0.013$, which is roughly half the size of the sample statistical uncertainty. Similarly, we find the systematic SN calibration contribution to the SH0ES H$_0$ uncertainty to be less than 0.2~km/s/Mpc, suggesting that SN Ia calibration cannot resolve the current level of the `Hubble Tension'.


INTRODUCTION
Type Ia supernovae (SNe Ia) are a critical tool to measure the expansion history of the universe. They are particularly useful for measuring the recent cosmic acceleration, and thus the equation-of-state of dark energy w and the current expansion rate H$_0$. In cosmological analyses with SNe Ia, calibration of the photometric system is typically one of the largest systematics in the error budget (e.g. Scolnic et al. 2018; Jones et al. 2019). The calibration errors are survey and filter dependent, and typically manifest as redshift-dependent changes in distance because 1) different surveys cover different redshift ranges and 2) SNe redshift into different observer-frame wavelengths. In order to maximize statistical leverage and minimize the impact of calibration errors when constraining cosmological parameters, recent analyses have combined SNe from multiple different photometric systems. In this paper, we perform an up-to-date recalibration of photometric systems used in the Pantheon+ sample (Scolnic et al. 2021) and cosmological analysis (Brout et al. in prep), and propagate these changes through light-curve model training/fitting and to cosmological inference.
Photometric calibration is required in two critically important components of SN Ia cosmological analyses. First, the calibration of different SN light-curves in a 'training library' must be accounted for in order to build the spectral time-series model that will be used to fit light-curve parameters of a larger photometric sample. Second, the calibration of light-curves in the full sample must be accounted for to apply the model, fit for light-curve parameters, and recover distance estimates. Importantly, the calibration of light-curves used in the training library should be self-consistent with the calibration of the light-curves used in the larger sample, but historically this has not always been the case. Recent cosmological analyses of SNe Ia (e.g. Brout et al. 2019) used the SALT2 model from Betoule et al. (2014) (hereafter B14) because the SALT2 model has not been available for retraining. Therefore, all current analyses have not benefited from improved calibration since B14. However, because of recent work to update and make available SALT2 retraining code (Taylor et al. 2021; Kenworthy et al. 2021), it is now possible to retrain the light-curve model with the same calibration used in fitting.
The Pantheon+ sample (Scolnic et al. 2021, hereafter S21) compiles data from 25 different photometric systems. To perform the cross calibration, we follow the framework described in Scolnic et al. (2015) (hereafter SuperCal), which used Pan-STARRS (PS1, Chambers et al. 2016) photometry of tertiary standard stars to recalibrate each photometric system, as PS1 covers 3π of the sky and has sufficient overlap with each survey. An update to the SuperCal process was presented in Currie et al. (2020) (hereafter, 'Excalibur'), which followed the same premise but implemented a number of changes, including transferring every system onto its natural system and fitting for filter transformations. In this work we implement a number of the recommendations from Excalibur, but retain the simplicity of SuperCal: no additional transformations to the natural system, no measurements across the field-of-view of each camera, and a focus on simultaneously fitting photometric magnitude offsets (zeropoints) rather than fitting filter transformations.
This paper is a companion to a suite of papers (e.g. S21, Carr et al. 2021; Peterson et al. 2021; Popovic et al. 2021; Dhawan et al. 2020) leading up to the Pantheon+ cosmological analysis (Brout et al in prep), which contains the measurements of cosmic acceleration, dark energy, and dark matter, and to the SH0ES distance-ladder Hubble constant analysis (Riess et al. 2022). This work focuses solely on the cross-calibration of the samples and the associated systematic uncertainty. In Section 2 we describe the data sample and overview the suite of calibration systems used by various SN analyses. In Section 3 we describe the re-calibration process and compare with past results. In Section 4, we discuss the light-curve model retraining. In Section 5, we propagate these changes to the impact on cosmological inference and produce a new systematic error budget for calibration. In Sections 6 & 7, we present our discussions and conclusions.

Individual Survey Calibration
Different photometric analyses have calibrated their surveys through a variety of paths. Most recent analyses of SNe Ia have tied the absolute calibration of their photometric systems to the AB system (Oke & Gunn 1983; Fukugita et al. 1996), which has mostly replaced the historical use of absolute calibration to the Vega flux standard (Colina & Bohlin 1994). Many older SN samples have alternatively used tertiary standards from Landolt (1992) or Smith et al. (2002), which themselves can be externally tied to a fundamental calibration like the AB system. In the AB system, broadband magnitudes are defined as

$$m_{\rm AB} = -2.5\,\log_{10}\left(\frac{\int h\nu^{-1}\,p(\nu)\,f_\nu\,d\nu}{\int h\nu^{-1}\,p(\nu)\,3631\,{\rm Jy}\,d\nu}\right) \quad (1)$$

where p(ν) is the transmission function of a given filter, f_ν is the flux per unit frequency from an object in Jy, and 3631 Jy corresponds to the flux of a monochromatic 0th-magnitude object. For systems calibrated to the Landolt system, linear photometric transformations for each filter are used to bring standard magnitudes of tertiary stars to the natural-system magnitudes following

$$m^{f}_{\rm nat} = m^{f}_{\rm std} + \ell_f \times C \quad (2)$$

where $\ell_f$ is the color coefficient for filter f and C is the color from the standard magnitudes (i.e. B − V).
For both AB and Landolt/Vega calibration methods, specific passband magnitude offsets are computed following

$$m_{\rm AB} = m_{\rm nat} + \Delta_{\rm AB} \quad (3)$$

where $\Delta_{\rm AB}$ is the offset for the particular filter that brings the system magnitude to an AB magnitude, and where we have dropped the index f for convenience. If a photometric system is defined on AB, then these offsets $\Delta_{\rm AB}$ are found explicitly by comparing to primary standard stars.
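Equation 1 lends itself to a direct numerical check. The sketch below (with a toy bandpass and frequency grid, not any survey's actual filter) evaluates the synthetic AB magnitude of a flat-spectrum source; the photon-counting weight 1/(hν) appears in both integrals, so Planck's constant cancels:

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal integration (avoids version-specific numpy names)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def ab_mag(nu, p_nu, f_nu_jy):
    """Synthetic AB magnitude in the sense of Eq. 1: p_nu is the filter
    transmission and f_nu_jy the source flux density in Jy. Only the
    1/nu factor of the photon-counting weight is kept, since h cancels."""
    num = _trapz(p_nu * f_nu_jy / nu, nu)
    den = _trapz(p_nu * 3631.0 / nu, nu)
    return -2.5 * np.log10(num / den)

# Sanity check: a source with constant f_nu = 3631 Jy is 0 mag in any band.
nu = np.linspace(4e14, 8e14, 500)                 # Hz
p = np.exp(-0.5 * ((nu - 6e14) / 5e13) ** 2)      # toy Gaussian bandpass
print(ab_mag(nu, p, np.full_like(nu, 3631.0)))
```

A source ten times fainter than the AB reference returns 2.5 mag, as expected from the logarithmic definition.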
All systems calibrated here rely on HST CALSPEC standards (Bohlin 1996) as updated in Bohlin et al. (2020). The CALSPEC spectra were obtained with STIS and NICMOS observations and have an associated uncertainty of ∼1 mmag/1000 Å from 3000 Å to 15000 Å (Bohlin 2014). The spectra of flux standards are used in two ways in this analysis: 1) to establish the calibration of each system by tying observed photometry to spectrophotometry of HST CALSPEC standards, and 2) to serve as a representative sample of stars for determining the transformation functions between different optical systems and filters, thus facilitating cross calibration using observed stellar magnitudes. In this work we do not alter the CALSPEC spectra to assess CALSPEC-related systematic uncertainties. For the CALSPEC systematic uncertainty and its impact on cosmological analysis, see Brout et al. (2022).

Synthetic Data
First, while many surveys used in this work have already been calibrated to HST CALSPEC standards, the CALSPEC spectra themselves have been updated and improved significantly over the years (e.g. Bohlin 1996, 2007, 2014; Bohlin & Landolt 2015; Bohlin & Deustua 2019; Bohlin et al. 2020).
Due to the updates in CALSPEC spectra, we homogenize the initial, independent calibrations (Eqs. 1-3) prior to performing any cross calibration, by recalibrating the PS1, SDSS, SNLS, and DES surveys using their provided observations of CALSPEC standards and the latest spectral models of these CALSPECs as given in Bohlin et al. (2020). These offsets that homogenize PS1, SDSS, SNLS, and DES are also included in Appendix Table D.
The magnitude of these changes for typical Bessell BVRI and Sloan griz filters is shown in Fig. 1, which presents the difference in the synthetic stellar magnitudes of CALSPEC standards between different versions of the CALSPEC spectra. In this work we compare our results to the previous SuperCal calibration, so we examine the contributions due to differences arising from updates to the CALSPECs. The most recent and improved stisnic 007/008 and stiswfcnic 002 versions of the CALSPECs result in a 1.5-2% change from g/B to I/z, ∼ 3× larger than the expected systematic uncertainty of the CALSPEC calibration of ∼ 0.5% over 7000 Å. These changes in the absolute calibration due to the update of the CALSPEC standards have the largest impact in our analysis when comparing to the previous Betoule et al. (2014) and SuperCal calibrations. Since this is a change in the reference, it affects the inferred zeropoint offsets of all SN samples.
We also utilize synthetic spectral libraries to compute the transformations between photometric systems. For this, we use two synthetic libraries: the HST CALSPEC library, as discussed above, and the NGSL library. While the CALSPEC library has been continually updated, there is only a single release of the NGSL library (Koleva & Vazdekis 2012). There are 68 (370) total stars with spectra from CALSPEC (NGSL) usable for our measurements, and 74% of them have g − i < 0.2, limiting the coverage of the color range that we require in order to be in the range of typical field stars. For our cross-calibration measurements, we propagate both the CALSPEC and NGSL libraries and weight them equally in the fitting.

Table 1. List of the data samples in Pantheon+, whether they are recalibrated in this work, and whether they were previously recalibrated in SuperCal. (a) SuperCal did attempt to recalibrate these surveys, but the offsets were insignificant and therefore not used in the Pantheon analysis.

Description of Observed Data and Photometric Systems
In this analysis, we have compiled photometry from 25 different photometric systems and 105 different filters. There are a number of details that are critical to the reproducibility of analyses with a given photometric system. We summarize these details in Table 2. Following SuperCal, we use the public PS1 survey photometric catalogs (Chambers et al. 2016) to cross-calibrate against each individual survey. The level of relative calibration across the sky reported by PS1 is ∼ 5 mmag (Schlafly et al. 2016). We use the latest public release (DR2) all-sky coverage to perform the cross-calibration.

PS1 & Foundation
In this analysis, we utilize 3 different versions of photometry from the Pan-STARRS telescope.
1. PS1 Public: The aforementioned public DR2 catalog. For this sample, we use aperture magnitudes as suggested by Currie et al. (2020), as they are more robust to non-linearity than point-spread-function (PSF) photometry. These magnitudes are used as the intermediary to perform the cross calibration between all surveys.
2. PS1 SNe: The set of stars that were used to calibrate the original PS1 SN Ia images (Rest et al. 2014; Scolnic et al. 2014). The stellar magnitudes are based on the absolute calibration in Tonry et al. (2012a) and are not adjusted based on the later re-calibration for PS1 alone in SuperCal. These magnitudes are used to determine calibration offsets for the PS1 SNe.
3. Foundation: The catalog used for the Foundation SN Ia sample (Foley et al. 2018) taken with the PS1 telescope. These magnitudes are used to determine calibration offsets for the Foundation SNe.

Carnegie Supernova Project (CSP)
The calibration of each band was rederived in Krisciunas et al. (2017b) and Krisciunas et al. (2020). This work was done after SuperCal was released, which relied on Stritzinger et al. (2010a). It is therefore difficult to directly compare the zeropoint corrections found in this study to those found in SuperCal. We compared the photometry of the same tertiary stars in the standard system presented by Stritzinger et al. (2010a) and Krisciunas et al. (2020) and find mean (median) offsets in g, r, i, B, V between the 2010 release and the 2020 erratum of order 1-2%; these are shown in Appendix Fig. 11. It is unclear what caused those offsets, and we remark that we do not see similar offsets for the SN photometry, which are all below 0.01 mag. We therefore include additional systematic uncertainty in the calibration of CSP SNe Ia (Section 4.3).

CfA 1 & 2 and other Heterogeneous Low-z SNe Ia
Due to the limited number of stars (< 20 for CfA1; < 50 for CfA2), we are unable to produce an accurate cross-calibration for these surveys. SuperCal attempted to recalibrate these two samples, but the offsets found did not deviate significantly from zero due to high uncertainties, so Scolnic et al. (2018) did not apply the corrections from SuperCal. We also note that two of the light curves from CfA1 (SN 1994ae and SN 1995al) were recalibrated in Riess et al. (2005, 2009), as they have higher importance due to their use as Cepheid calibrators.
As described in S21, there are other heterogeneous low-z datasets compiled for SH0ES of typically O(1) SN light-curves (e.g. Milne et al. 2010; Krisciunas et al. 2017a; Stritzinger et al. 2010b; Tsvetkov & Elenin 2010); often the stellar photometry is not provided or the number of stars is inadequate for cross-calibration. For these, because we cannot cross-calibrate, we assume a roughly 3× larger calibration uncertainty (20 mmag) than the typical uncertainties reported by other surveys, as discussed in Section 4.

CfA3 and CfA4
The CfA3 and CfA4 samples were taken on the F. L. Whipple Observatory 1.2m telescope's cameras: 4Shooter and Keplercam. For CfA3-4Shooter, the UBVRI passbands were published in Hicken et al. (2009a) and accounted for atmospheric transmission. Hicken et al. (2012a) stated that the first period of CfA4 (CfA4p1) and CfA3-Keplercam shared the same photometric system. The CfA4p1 filter functions were released unofficially via private communication (Cramer et al., in prep). For this reason we use the CfA4p1 filters for the CfA3-Keplercam stars and SNe. Neither the CfA4p1 nor the CfA4p2 filters account for atmospheric transmission, and this was not accounted for in SuperCal. For this analysis, however, we apply a MODTRAN atmospheric transmission following Stubbs & Tonry (2012), assuming a typical airmass of 1.2 and typical water vapor and aerosols at Mt. Hopkins.

AB Surveys: SDSS, SNLS, & DES
We re-derive the AB pre-offsets (Appendix B) for SDSS and DES. For SNLS, we adopt the same changes to the AB offsets as SDSS because Betoule et al. (2014) did not provide the final observed magnitudes of the CALSPEC standards, and instead presented observations across the focal plane. As Betoule et al. (2014) used the same version of CALSPEC photometry (v003), we employ the same shift for SNLS as for SDSS, which we assume to be reliable to ∼ 2 mmag per band. For DES, there are two calibrations: one for the SNe that were analyzed on an older FGCM star catalog for the DES-SN3YR sample, and a second for the upcoming DES 5-year FGCM star catalog upon which the unpublished DES-SN5YR photometric sample will be based; the DES collaboration intends to utilize the Fragilistic calibration solution determined here in their cosmological analyses. The offsets applied are given in Appendix Table D.

SOUSA
Samples from SOUSA have not been included in previous compilation analyses like Pantheon or Betoule et al. (2014). SOUSA has not yet had a publication releasing stars or SNe; however, via private communication with Peter Brown we received a release of the stellar photometry performed with the same pipeline as that of the SN photometry (Brown et al. 2014). Because SOUSA was calibrated via the Vega system, we use the most recent CALSPEC release of Vega (Alpha Lyr stis010) for the determination of the initial calibration.

LOSS
There are two different data releases from the LOSS survey: (LOSS1; Ganeshalingam et al. 2010) and (LOSS2; Stahl et al. 2019). Each of the data releases includes SNe observed by the KAIT and Nickel telescopes, and the telescopes go through a series of 'configurations' in which the throughput for each filter differs. Ganeshalingam et al. (2010) use configurations from KAIT1-4 and Nickel1-2; Stahl et al. (2019) use configurations from KAIT3-4 and Nickel1-2. Data from these systems have not been included in previous compilation analyses like Pantheon or Betoule et al. (2014). Ganeshalingam et al. (2010) provides all tertiary standards in the Landolt system, and then provides the transformations to convert to the natural system; these are given in Appendix A and applied. For Nickel1 and Nickel2, the transformation appears to be valid only over a small color range, as it produces a slope in color-color space different from what is expected. Therefore we restrict the synthetic and data color ranges to ±0.1 around the median stellar color of g − i = 0.75 on a typical image. Stahl et al. (2019) use the public Pan-STARRS catalogs as their tertiary standards, after calibrating the PS1 photometry of stars to the Landolt system following the transformation given in Tonry et al. (2012a). This transformation was not specific to the filters used in Stahl et al. (2019), but rather to generic BVRI filters. We repeat this procedure as described in Stahl et al. (2019) to recreate the actual tertiary catalog they used, and use this to perform our cross-calibration.

Complete Nearby Supernova Sample
The Complete Nearby (Redshift less than 0.02) Supernova Sample (CNIa0.02), a followup survey of ASAS-SN discovered transients, released BVri filter throughputs in Chen et al. (2020) but did not provide stars. Since they performed the same calibration procedure as Stahl et al. (2019), we utilize the stars from Stahl et al. (2019) in combination with the released CNIa0.02 filter throughput curves.

HST SN Ia samples
As HST observations of CALSPEC standards are themselves used to define the absolute calibration presented in Bohlin et al. (2020), we do not recalibrate the HST SNe Ia in our cross-calibration. From past findings, we adjust the photometric zeropoints of NICMOS F105W photometry fainter by 0.068 mag and F160W fainter by 0.023 mag as the result of three net changes: 1) updated NICMOS zeropoints relative to the original calibration used in Riess et al. (2007), 2) the Rubin et al. (2015) check on the low-count-rate zeropoint, and 3) an update to Rubin et al. (2015) based on the revision of WFC3 zeropoints between 2012 and 2020. The NICMOS count-rate non-linearity is built into the recalibration of Rubin et al. (2015). Additionally, following Rubin et al. (2015), we use zeropoint errors of 0.022 and 0.023 mag for F105W and F160W respectively.

Procedure
For the given set of stars that overlap between PS1 and each other survey (S), the expected difference in magnitude between a PS1 filter (PS1_b1) and a given survey filter (Obs^S_b) is expressed as:

$${\rm Obs}^{\rm PS1}_{b1} - {\rm Obs}^{S}_{b} = {\rm Synth}^{\rm PS1}_{b1} - {\rm Synth}^{S}_{b} + C^{S}_{b}\times\left({\rm Obs}^{\rm PS1}_{b2} - {\rm Obs}^{\rm PS1}_{b3}\right) + \Delta^{\rm PS1}_{b1} - \Delta^{S}_{b} \quad (4)$$

where we define the observed stellar magnitude differences (left-hand side of Equation 4) as R^Obs_b, and we define the differences from spectrophotometry of synthetic standards (right-hand side of Equation 4), including our fitted magnitude offsets for each filter, as R^Synth_b. The vectors of pre-computed synthetic magnitudes from CALSPEC and NGSL standards are Synth^PS1_b1 for PS1 bands and Synth^S_b for each survey filter, and the transformation from synthetic magnitudes is determined from the color slope C^S_b for a specific survey filter. The observed overlapping tertiary standards are used as follows. PS1 Public tertiary colors are computed as Obs^PS1_b2 − Obs^PS1_b3. The synthetic transformation combined with the observed tertiary PS1 colors thus facilitates the comparison between observed overlapping tertiary star magnitudes for each survey filter being calibrated, Obs^S_b, and the closest PS1 Public filter, Obs^PS1_b1.
The offsets for PS1 Public filters (∆^PS1_b1, ∆^PS1_b2, and ∆^PS1_b3) and the specific survey filter (∆^S_b) are floated with priors (described below) and fit simultaneously. Floating the PS1 Public offsets facilitates a simultaneous solution and covariance between all assessed filters. There are 105 filters analyzed here, 4 of which are the public PS1 cross-calibration filters in griz. There are therefore 101 equations written in the form of Eq. 4, and 105 free parameters. Degeneracies between parameters are broken with the survey calibration priors explained further below.
In SuperCal, there was the option to fit the linear color transformations (C^S_b) separately for the observed and synthetic sequences, thereby attempting to discern possible discrepancies in the mean wavelength of the effective filter. Due to the complexity of the simultaneous fit, we instead fix the slope to that of the synthetic sequence for survey-filters found to have consistent slopes between data and synthetic. Systematic uncertainties due to this decision are discussed in Section 6. The only band with > 3σ difference in data/synthetic slopes was PS1 g band when comparing to SDSS g (+5σ), SNLS g (+5.5σ), CfA1 B (+3.5σ), CfA3S B (+9σ), CfA4p1 B (+5σ), KAIT B (+4.5σ), and SOUSA B (+7σ). This was mitigated by shifting the PS1 g filter transmission by +30 Å to bring all into better agreement. This shifting of the filter transmission is a correction on top of what was done in Tonry et al. (2012a), who performed a polynomial correction on the entire PS1 throughput to match the predicted spectrophotometry of the HST CALSPEC standards with their observed PS1 magnitudes.
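To illustrate the scale of such a filter-transmission shift, the sketch below (a toy Gaussian bandpass and power-law spectrum; the real PS1 g throughput and stellar SEDs differ) recomputes a synthetic magnitude after displacing the filter by +30 Å:

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal integration (avoids version-specific numpy names)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def synth_mag(wave, trans, f_lam):
    """Synthetic magnitude (arbitrary zeropoint) with photon-counting
    weighting, i.e. integrating lambda * T(lambda) * F_lambda."""
    return -2.5 * np.log10(_trapz(trans * f_lam * wave, wave) /
                           _trapz(trans * wave, wave))

wave = np.linspace(3500.0, 6500.0, 3000)                # Angstrom
trans = np.exp(-0.5 * ((wave - 4866.0) / 400.0) ** 2)   # toy g-like band
f_lam = (wave / 5000.0) ** -2                           # toy stellar slope

# Displace the filter +30 A redward: T'(w) = T(w - 30).
trans_shift = np.interp(wave - 30.0, wave, trans)
dm = synth_mag(wave, trans_shift, f_lam) - synth_mag(wave, trans, f_lam)
print(f"{dm * 1000:.2f} mmag change for a +30 A filter shift")
```

For a blue-sloped source the shifted band samples fainter flux, so the synthetic magnitude changes at the few-mmag level, comparable to the zeropoint offsets being fit.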
We fit a solution that minimizes the χ² differences between observed magnitudes from PS1 filters and each survey-filter following:

$$\chi^2 = \sum_{k}^{101} W_S \sum_{t}^{N_t} \frac{\left(R^{\rm Obs}_{b_k,t} - R^{\rm Synth}_{b_k}\right)^2}{\sigma^{S\,2}_{b_k,t} + f^{S\,2}_{b_k}} + \sum_{k}^{105}\left(\frac{\Delta_k}{\sigma^{p}_{k}}\right)^2 \quad (5)$$

where the summation is over the k bands (101 when not including the 4 public PS1 bands used to perform the comparisons themselves) and the N_t overlapping tertiaries. The uncertainties are determined from the photometric scatter σ^S_bk and the designated error floor f^S_bk. Priors (σ^p_k) centered at zero offset are included for each survey-filter. We use the emcee MCMC sampling library described by Foreman-Mackey et al. (2019) to minimize χ² (Equation 5) and to obtain an uncertainty covariance matrix.
There are three additional components of Eq. 5 required to compute a robust solution and uncertainties: 1) Survey-filter zeropoint priors (σ^p_k) representing the prior knowledge/confidence in the original calibration of each system (after updating their calibrations to the latest CALSPEC as shown in Appendix Table D). Priors are centered at zero. For the rolling SN surveys with modern and reliable calibrations (PS1/Foundation, DES, SDSS, and SNLS) we utilize priors of 0.00 ± 0.01 mag. We do not apply zeropoint priors to any other samples, so as to ensure that the calibration solution is primarily determined by the more modern all-sky surveys. Unlike SuperCal, which fixed the calibration of the public PS1 catalog (σ^p_k = 0) used to perform the cross calibration, here we place conservative priors of width 0.02 mag on the public PS1 filters to facilitate covariance between all of the surveys in the fitting process.
2) Relative survey weights (W_S) to account for photometric systems that appear multiple times but are the same telescope recalibrated over time (e.g. KAIT 1-4 each receive W_S = 1/4 in Eq. 5). 3) Photometric uncertainty floors (f^S_bk) to account for the observed scatter of the brightest tertiary standards that is not explained by photometric uncertainties alone. PS1/Foundation, DES, SDSS, and SNLS are given photometric uncertainty floors of 0.005 mag, and the remaining low-z surveys are given 0.01 mag (except for CNIa0.02, for which we did not have tertiary stars and apply a 0.02 mag floor).
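Because Eq. 5 is quadratic in the zeropoint offsets, the simultaneous solution and its covariance can be illustrated with a linear-Gaussian toy model (a numpy-only stand-in for the emcee sampling used in the analysis; the filter counts, scatters, and prior widths below are illustrative, not the actual survey values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy version of Eqs. 4-5: each survey filter k supplies a measured
# difference r = delta_ps1 - delta_k + noise, with Gaussian zeropoint
# priors of width sigma_p on every offset.
n = 5                                         # survey filters
true = rng.normal(0.0, 0.01, n + 1)           # [delta_ps1, delta_1..delta_n]
sigma, sigma_p = 0.005, 0.02                  # scatter and prior width (mag)

A = np.zeros((n, n + 1))
A[:, 0] = 1.0                                 # +delta_ps1 in every equation
A[np.arange(n), np.arange(1, n + 1)] = -1.0   # -delta_k
r = A @ true + rng.normal(0.0, sigma, n)

# Gaussian posterior: cov = (A^T C^-1 A + P^-1)^-1, mean = cov A^T C^-1 r
Cinv = np.eye(n) / sigma**2
Pinv = np.eye(n + 1) / sigma_p**2
cov = np.linalg.inv(A.T @ Cinv @ A + Pinv)
mean = cov @ A.T @ Cinv @ r
print(np.round(mean, 4))                      # fitted offsets
print(np.round(np.sqrt(np.diag(cov)), 4))     # their uncertainties
```

Without the prior term the normal matrix is singular, because a common shift applied to all offsets leaves every observed difference unchanged; the priors break exactly this degeneracy, mirroring the role of the survey calibration priors described above, and the resulting covariance carries the correlated off-diagonal terms between filters.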

Data Preparation
For each survey we match astrometric positions of the tertiary catalogs with those in PS1 to within < 1 arcsec. We avoid potential errors from blending by only choosing isolated stars with no other star (m < 22 mag) within 15 arcsec.
For each of the stars in the spectral libraries we integrate the spectrum with the throughput of each of the passbands of the individual surveys to determine synthetic magnitudes. To facilitate comparison between the observed sequence of stars and the synthetic magnitudes, we correct the observed stellar magnitudes for Milky Way extinction using the known positions of the stars. The extinction values and attenuation curves are queried from IRSA 4 for the maps from Schlafly & Finkbeiner (2011) and are interpolated at the mean wavelength of each filter.
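A minimal sketch of this correction step follows; the attenuation-curve table and bandpass below are illustrative placeholders, not the Schlafly & Finkbeiner (2011) values queried from IRSA:

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal integration (avoids version-specific numpy names)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def mean_wavelength(wave, trans):
    """Transmission-weighted mean wavelength of a filter."""
    return _trapz(wave * trans, wave) / _trapz(trans, wave)

def deredden(m_obs, wave, trans, ebv, curve, rv=3.1):
    """Correct an observed magnitude for Milky Way extinction by
    interpolating a tabulated A(lambda)/A(V) attenuation curve at the
    filter's mean wavelength, with A(V) = R_V * E(B-V)."""
    curve_wave, curve_ratio = curve
    ratio = np.interp(mean_wavelength(wave, trans), curve_wave, curve_ratio)
    return m_obs - ratio * rv * ebv

# Illustrative attenuation curve, decreasing toward the red.
curve = (np.array([3000.0, 4500.0, 6000.0, 9000.0]),
         np.array([1.60, 1.20, 0.85, 0.50]))
wave = np.linspace(4000.0, 5500.0, 200)
trans = np.exp(-0.5 * ((wave - 4750.0) / 300.0) ** 2)
print(deredden(15.0, wave, trans, ebv=0.03, curve=curve))
```

For this toy g-like band and E(B−V) = 0.03, the correction is on the order of 0.1 mag, far larger than the mmag-level offsets being fit, which is why the extinction correction must be applied before comparing observed and synthetic sequences.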
For comparison with robust PS1 magnitudes, we adopt a brightness cut on the PS1 magnitudes, eliminating stars brighter than [14.8, 14.9, 15.1, 14.6] in [g, r, i, z] because of the concerns of non-linearity noted in Schlafly et al. (2012). For the PS1 star catalog we also place a cut requiring that PS1 g is brighter than 19th magnitude in order to avoid Malmquist bias in the tertiary star selection. Lastly, we choose a specific color range (0.25 < g − i < 1.0) of the PS1 catalog stars used in the analysis to maximize statistics, minimize dispersion, and ensure linearity of the transformation in the synthetic library in the region of comparison for color transformations (the exceptions are the LOSS-Nickel and CNIa0.02 datasets, for which we use narrow color ranges, as discussed above in Sections 2.10 & 2.11).

Calibration Solution Results
The best-fit offsets for each of the 105 filters are given in Table 3. Overall, for the subset of filters that were also calibrated in SuperCal, we find relatively good agreement (Fig. 2), and the observed differences between Fragilistic and SuperCal can be traced to changes in the filter throughputs, the reliance of surveys on specific CALSPECs that have been redefined, or the novel inclusion of individual filter uncertainties in the simultaneous fit. The largest difference we find is for CSP V band, which, due to numerous changes by the CSP collaboration, had different filter transmissions defined for use in this paper and in SuperCal. The derived offsets will be made available in machine-readable format upon journal acceptance.

Table 3. Fragilistic best-fit calibration solution. The individual survey-filter offsets and uncertainties are given in mags.

The covariances between the filter zeropoints are shown in Figure 3. The covariance matrix will also be downloadable in machine-readable format. The diagonal terms of the covariance, the statistical uncertainties on the calibration offsets, are typically bounded by the error floors given for each survey. Off-diagonal covariance terms between different surveys are largest for the newer surveys with limited calibration stars (i.e. between CNIa0.02 and LOSS or SOUSA). Low-z survey B band constraints are also often amongst those with the largest covariances because they have minimal overlap with the PS1 bands used for cross-calibration.

RE-TRAINING THE SALT2 MODEL
Given a new calibration for the samples of light-curves used in SALT2 training, one should retrain the SALT2 model to avoid model calibration systematics biasing the resulting SALT2 light-curve fit results, as demonstrated in Taylor et al. (2021), which establishes the up-to-date infrastructure to do so. We note that most previous analyses (e.g. Brout et al. 2019; Jones et al. 2018) assumed a large additional systematic uncertainty from the fact that they did not retrain the SALT2 surface. Here we describe both the retraining as well as the systematic uncertainty that arises from the cross-calibration solution described by the Fragilistic covariance matrix.

Training of the Nominal SALT2 model with the Fragilistic Calibration Solution.
With the new calibration zeropoints determined here and the updated filter functions, we retrain the SALT2 model following Taylor et al. (2021), which uses the algorithms established in Guy et al. (2010). We denote our newly trained SALT2 surface 'B21'. Previously used surfaces include the model trained for use in the Joint Light Curve Analysis but also used in the original Pantheon analysis, denoted 'B14', and the model trained in Taylor et al. (2021) on the original SuperCal calibration solution, denoted 'T21'. While we have not re-calibrated any rest-frame u band filters due to lack of overlap with PS1 wavelengths, u band is used in the SALT2 training for SDSS and CSP, with recalibration zeropoint offsets set to zero. We note the impact of this choice in Section 6 and Appendix B. We present the fractional differences between B21, T21, and B14 in the average spectral energy distribution for a fiducial (x1 = 0, c = 0) SN Ia (M0) and in the color law (CL) in Fig. 4. The rest-frame wavelength bounds (3000 Å, 7000 Å), within which the model is used for Pantheon+ light-curve fits, are shown by vertical dashed lines. The largest differences occur in the M0 surface component, particularly at the UV end, although not over the region that is used in the light-curve fitting. Over the 3000-7000 Å range we find a slope of ∼ 2.5% between B21 and B14 and a slope of ∼ 1.5% between B21 and T21.

Distances and Cosmology
We evaluate the impact of our Fragilistic calibration and newly retrained SALT2 model by comparing fits of SN light-curve photometry for the three aforementioned SALT2 models (B21, T21, and B14). To determine distances we use the traditional Tripp (1998) estimator,

$$\mu = m_b - M + \alpha x_1 - \beta c$$

where m_b is the converted peak magnitude of the SN Ia from the fitted flux normalization x_0, M is the fiducial absolute magnitude of a Type Ia SN, x_1 is the stretch parameter, c is the color parameter, and α and β are correlation coefficients that minimize the scatter in the standardized luminosities. We fit a flat wCDM cosmological model to the observed distances.
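The Tripp standardization is simple to write down in code; the nuisance-parameter values below are typical of recent analyses and purely illustrative, not the fitted Pantheon+ values:

```python
def tripp_mu(mb, x1, c, alpha=0.14, beta=3.1, M=-19.36):
    """Tripp (1998) distance modulus: standardized distances are larger
    (more luminous SNe) for broader (x1 > 0) and bluer (c < 0) events.
    alpha, beta, and M here are illustrative defaults."""
    return mb - M + alpha * x1 - beta * c

# A fiducial (x1 = 0, c = 0) SN with peak magnitude m_b = 24.0:
print(tripp_mu(24.0, 0.0, 0.0))   # distance modulus in mag
```

A 0.01 mag zeropoint error in m_b maps directly into a 0.01 mag distance-modulus error, which is why the filter-offset covariance above matters for cosmology.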
To better understand the sensitivity of the re-training to the new calibration, we performed two sets of tests. First, we fixed the M0 surface to that of B14 and only replaced the color law (CL) with the fitted CL for B21. We then performed the opposite, changing only the M0 surface to B21 and keeping the CL fixed to that of B14. We find that the sensitivity to cosmology, i.e. the redshift dependence of inferred distances, is dominated by the M0 surface.
The slope in the M0 component seen in Fig. 4 directly results in redshift-dependent effects on distance (due to different observer-frame filters being used), and thus a systematic sensitivity to cosmological parameters. In Figure 5, the percentage differences in M0 are shown for several SALT2 models, including several systematic surfaces (B14-syst*) provided by Betoule et al. (2014). We also design by hand M0 surfaces that exhibit fixed slopes (both positive and negative; 'tilted up' and 'tilted down') relative to B14 over the wavelength range 4500 to 6000 Å and propagate all of these models to cosmological parameter inference. We find that recovered w values clearly correlate with both the size and the direction of the slope in wavelength, and this is confirmed by our 'tilted' surfaces. The largest change to the M0 surface can be traced to the recent update of the fundamental calibration of CALSPEC standards in Bohlin et al. (2020).
For the second set of tests we perturb each filter in the SALT2 training individually and compute the resulting sensitivity of the fitted cosmological parameter w. The most sensitive perturbations for measurements of w are shown in Table 4. All of the most sensitive perturbations are related to filter zeropoints, even though we also performed perturbations to the effective mean wavelengths of each filter in the training.

Table 4. Filter zeropoint sensitivity of cosmological constraints (δw/0.01 mag) for 0.01 mag perturbations in the SALT2 training. Only the top 10 most sensitive perturbations are shown. Sensitivity to the mean wavelength of each filter was also examined, but none fall within the top 10.

We note that bias-correction simulations using the SALT2 model are often used to correct observables for expected biases. The impact of changing the bias-correction simulations to use the new SALT2 model is < 0.1% in w. The reason for this small impact is that, in the determination of the bias-correction sample, the same SALT2 model is used in both the simulation and the fitting. The data themselves do not benefit from this cancellation, so it is the fitting of the data that dominates the impact of changing SALT2. We therefore do not include bias corrections in estimating the impact of systematic uncertainties here.

Systematic Uncertainty
In order to quantify the impact of calibration uncertainties on SN distances, we develop a novel technique in which we both retrain the SALT2 model and apply zeropoint offsets in the light-curve fitting simultaneously. We utilize the cross-calibration covariance matrix obtained in the Fragilistic MCMC fitting process (Fig. 3). To do so, we perform a Cholesky decomposition of the covariance matrix to create 9 mock realizations of calibrations with correlated survey zeropoints, as well as uncorrelated effective mean-wavelength shifts of each filter used in training (see Appendix Table 7). We then train 9 SALT2 surfaces on these realizations; the resulting 9 SALT2 M0 components and color laws are shown in Appendix Figure 8. The fractional differences between the 9 systematic-variant M0 components and the nominal component corresponding to the best-fit Fragilistic offsets (horizontal line) resemble in scale the differences seen between the original SALT2 calibration (B14) and Fragilistic (Appendix Fig. 8).
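Drawing correlated zeropoint realizations from a covariance matrix via Cholesky decomposition can be sketched as follows. The 3×3 covariance below is a hypothetical stand-in; the real Fragilistic matrix spans 105 filters across 25 systems.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 3-filter zeropoint covariance (mag^2), stand-in for the
# full Fragilistic cross-calibration covariance matrix.
cov = np.array([[4e-6, 1e-6, 0.0],
                [1e-6, 4e-6, 1e-6],
                [0.0,  1e-6, 4e-6]])

L = np.linalg.cholesky(cov)  # lower-triangular factor: cov = L @ L.T

n_realizations = 9
# Each row is one correlated set of zeropoint offsets (mag) to apply in
# both the SALT2 retraining and the light-curve fitting.
offsets = rng.standard_normal((n_realizations, cov.shape[0])) @ L.T
```

Multiplying standard-normal draws by the Cholesky factor imprints exactly the target covariance on the mock offsets, which is why the 9 realizations carry the survey-to-survey correlations of the fit.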
Converting the covariance of Fig. 3, which is in filter-magnitude space, into a distance-based covariance is non-trivial from first principles. Instead, to compute a distance covariance that can be used to constrain cosmological models, we apply the same set of mock zeropoint offsets in the light-curve fitting and propagate the differences in both the 9 SALT2 surfaces and the zeropoints to differences in cosmological distances. This is shown in Appendix Fig. 9. Finally, following Conley et al. (2010), we propagate the resulting set of 9 Hubble diagrams to a distance × distance covariance matrix.
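The propagation of the 9 residual Hubble diagrams to a distance × distance covariance in the style of Conley et al. (2010) can be sketched as below; the residual arrays are illustrative random numbers, and the $\sigma_k = 1/3$ scaling matches the 9-realization construction described in the text.

```python
import numpy as np

def build_syst_covariance(delta_mu, sigma_k):
    """Systematic distance covariance:
    C_ij = sum_k sigma_k^2 * dmu_k[i] * dmu_k[j].

    delta_mu : (n_syst, n_sn) array of distance-modulus residuals, one row
               per systematic realization (here n_syst = 9)
    sigma_k  : per-systematic scale (1/3 for the 9 correlated realizations,
               so the quadrature sum of the vectors is ~1)
    """
    delta_mu = np.asarray(delta_mu)
    return sigma_k**2 * delta_mu.T @ delta_mu

# Illustrative: 9 realizations x 4 SNe of small (0.01 mag) residuals
rng = np.random.default_rng(0)
dmu = 0.01 * rng.standard_normal((9, 4))
C_syst = build_syst_covariance(dmu, sigma_k=1.0 / 3.0)
```

By construction the result is symmetric and positive semi-definite, as a covariance must be.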
Specifically,
$$C_{ij} = \sum_{k} \sigma_k^2 \, \Delta\mu_{z_i,k} \, \Delta\mu_{z_j,k}, \qquad (7)$$
where the summation is over the systematics (k), $\Delta\mu_{z_i}$ are the residuals in distance for the SNe fitted between each SALT2 model, and $\sigma_k$ is 1/3 such that when the 9 systematic vectors are added in quadrature, they sum to ∼1. The resulting matrix is utilized in cosmology fitting as discussed in Section 5.3. In addition, we build covariance matrices for several other calibration-related systematics, again following Eq. 7 and summing over each systematic perturbation to the analysis described in Table 5. As there have been numerous significant updates to CALSPEC over the years, as described in Section 2.2, we adopt a 3× larger systematic uncertainty than is described in Bohlin et al. (2020) and than was adopted in the original Pantheon cosmological analysis. Additionally, we adopt a systematic of 1/3 of the difference in distances derived between the B14 and B21 SALT2 surfaces, in order to conservatively account for possible systematics from the SALT2 model training process. Lastly, we include an additional conservative systematic uncertainty due to the re-calibration of the CSP tertiary standard stars.

Table 5. The systematic uncertainties associated with calibration for the Pantheon+ analysis (Brout et al. in prep) and their scaled contribution in the building of the covariance matrix ($\sigma_k$ of Eq. 7). The uncertainty on CALSPEC modeling has been tripled ($\sigma_k = 3$) in comparison to the original Pantheon. 'Extra SALT2' and 'CSP Cal.' refer to the fact that SALT2 and CSP are both already included in 'Survey Cal.', but in order to be conservative we include extra sources of systematics. Systematic uncertainties on dark energy ($\sigma_w$) are computed when combining with Planck Collaboration et al. (2018).

Agreement of Survey Hubble Residuals
To assess the level of improvement with the Fragilistic solution, we compare the survey offsets in the Hubble diagram. As shown in Fig. 6, we calculate the weighted average of the Hubble residuals for each survey over the redshift range 0.01 < z < 0.4 relative to the best-fit cosmology (χ²/N_dof = 17.6/17). We cut at a maximum redshift of 0.4 to eliminate sensitivity to cosmological signals. We also note that we do not expect a fully independent distribution of survey offsets because some SNe have been observed by multiple surveys. We also report the offsets for the subset of surveys calibrated in SuperCal (χ²/N_dof = 11.1/11). We find good agreement with the original Pantheon sample and calibration, with the exception of CSP, which had its tertiary star catalog updated in Krisciunas et al. (2020) and for which we add additional systematic uncertainty (last row of Table 5). Small differences in Hubble residuals for surveys that have not been recalibrated (e.g. CfA1) are due to changes in the SALT2 training. Surveys in Fig. 6 are ordered from top to bottom by mean survey redshift, and no trends as a function of redshift are seen. We note that while the CfA1 survey offset shown in Fig. 6 is for the Hubble-flow SN photometry from Riess et al. (1999), the H_0 Cepheid calibrators from CfA1 (SN 1994ae and SN 1995al) were recalibrated in Riess et al. (2005, 2009).
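The per-survey weighted-average residual and its consistency check can be sketched as follows; the residual values and survey names are illustrative, not the actual Pantheon+ measurements.

```python
import numpy as np

def survey_offset(residuals, errors):
    """Inverse-variance weighted mean Hubble residual and its uncertainty."""
    w = 1.0 / np.asarray(errors) ** 2
    mean = np.sum(w * np.asarray(residuals)) / np.sum(w)
    return mean, np.sqrt(1.0 / np.sum(w))

def offsets_chi2(means, sigmas):
    """Chi-square of the survey offsets about zero residual."""
    return float(np.sum((np.asarray(means) / np.asarray(sigmas)) ** 2))

# Illustrative: Hubble residuals (mag) and errors for two mock surveys
data = {
    "SurveyA": (np.array([0.02, -0.01, 0.03]), np.array([0.05, 0.06, 0.05])),
    "SurveyB": (np.array([-0.02, 0.00]), np.array([0.04, 0.04])),
}
stats = {name: survey_offset(r, e) for name, (r, e) in data.items()}
chi2 = offsets_chi2([m for m, _ in stats.values()],
                    [s for _, s in stats.values()])
```

A χ²/N_dof near unity, as quoted in the text, indicates the survey offsets scatter consistently with their uncertainties.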

Impact on Distances and Cosmology
In Fig. 7, we show the impact on recovered distances for the subset of ∼700 SNe that were calibrated for the JLA analysis. We find a difference in w of +0.035 using the B21 SALT2 surface presented in this work versus the B14 surface used in the JLA/Pantheon analysis (when applied to the same data). We can trace nearly the entirety of this shift to the change in the M0 component, rather than to any other change in the components of the SALT2 model. We also find a difference in w of +0.025 from the update to the Fragilistic calibration solution and its effect on the survey zeropoints in light-curve fitting. Distance-modulus residuals relative to JLA are shown in Fig. 7 for three cases: 1) calibration zeropoints (ZPT) changed to Fragilistic (no SALT2 retraining), 2) SALT2 retrained to the B21 model (no calibration offsets changed), and 3) both Fragilistic zeropoints and SALT2 B21 retraining applied simultaneously. Distances are shown as a function of redshift, z, in order to understand the impact on cosmological-model constraints. For the SALT2 retraining alone, we find a slope relative to B14 of ∼0.04 mag over Δz = 1. For the updated zeropoints, we find a significant offset between the low-z and high-z samples, but no significant redshift-dependent slope beyond z = 0.1.
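The redshift-dependent slope quoted above can be estimated with a simple linear fit to the residual differences; the synthetic residuals below have a built-in 0.04 mag slope purely for illustration.

```python
import numpy as np

def residual_slope(z, dmu):
    """Least-squares slope d(Delta mu)/dz of Hubble-residual differences."""
    slope, intercept = np.polyfit(z, dmu, 1)
    return slope

# Illustrative: noiseless toy residuals with a 0.04 mag slope over 0 < z < 1,
# mimicking the ~0.04 mag/unit-z trend from the SALT2 retraining
z = np.linspace(0.01, 1.0, 50)
dmu = 0.04 * z
```

With real data the residuals would carry measurement noise, and a weighted fit (or the full covariance) would be used instead of an unweighted `polyfit`.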
For these combinations we also report the recovered dark-energy equation of state w relative to our replication of the original B14 analysis in Table 6. We note again that these w-differences are only for the subset of ∼700 SNe that were calibrated for JLA. We also provide the same combination of changes to the analysis, but relative to the SuperCal calibration (Scolnic & Kessler 2016) and the SALT2 model trained on SuperCal offsets (T21). We find that changing both the SALT2 model and the zeropoints simultaneously results in twice as large a difference in recovered cosmology as changing only one component at a time (as did the original Pantheon). Relative to the original Pantheon, when accounting for the updated zeropoints and SALT2 model simultaneously, we find Δw = +0.07 (0.064 − 0.009) on the subset of SNe that have been calibrated by both JLA and Fragilistic.

Figure 6. Weighted-average Hubble-residual offsets from the best-fit ΛCDM model for SNe observed by each survey. All SNe have been corrected for expected biases in observable parameters (described in Brout et al. in prep). The red points represent the residuals using the SuperCal calibration (Scolnic & Kessler 2016); the blue points represent the offsets after applying the Fragilistic calibration. Where no SuperCal point is given, that survey was not included in the SuperCal calibration. Surveys are ordered on the y-axis by increasing sample mean redshift (from bottom to top).

Systematics Impact on Cosmological Inference
We analyze constraints on cosmological parameters before and after including the systematic covariance matrix, and we find that the covariance inflates the posterior uncertainty on w ($\sigma_w^{\rm sys}$) by 0.013 after combining with Planck Collaboration et al. (2018). The breakdown for each component of the calibration systematics analyzed in this work can be found in Table 5. We note that this methodology accounts for correlated SALT2 training and photometry zeropoints simultaneously, as described in Section 4.3. This systematic uncertainty also accounts for novel, conservative sources of systematics.

DISCUSSION
In previous analyses such as Scolnic et al. (2018), Jones et al. (2019), and Brout et al. (2019), the systematics due to SALT2 retraining were decoupled from the shifts in calibration zeropoints. With our formalism, we have simultaneously retrained SALT2 and fit light curves with our updated calibration. This was not done in previous analyses like Scolnic et al. (2018) or Brout et al. (2019) because the SALT2 training code was neither easily available nor modifiable. As seen in Sec. 5, the net change to w from the combined re-fitting with new zeropoints and re-training of the SALT2 model is 0.06 relative to the Joint Light Curve Analysis and 0.07 relative to the original Pantheon analysis. While the calibration solution in this work largely agrees with that found in SuperCal, the large difference in w results from the original Pantheon analysis not retraining the SALT2 model on their calibration, and the difference in trained SALT2 models can be traced largely to the 1.5% update in the CALSPEC modeling. For this reason, in the Pantheon+ cosmological analysis we have tripled the associated systematic uncertainty in the CALSPEC modeling.

Figure 7. The net change in the Tripp distance-modulus values as given by Eq. 6 due to re-fitting the light curves with 1) the new B21 calibration while keeping the JLA SALT2 model (blue), 2) the re-trained SALT2 B21 model while keeping the JLA zeropoints for the fitting (red), and 3) neither re-training the SALT2 model nor updating the calibration zeropoints (black).
The systematic uncertainty due to the propagation of uncertainties in our joint solution presented in Sec. 5.3 is 0.013 in w. As the reported shift in w relative to previous analyses is significant compared to the typical statistical precision on w and the reported systematic uncertainty, it is important to understand what drives this sensitivity. We can see in Table 6 that the re-calibration and re-training cause roughly equal changes to w.
The net contribution to the systematic uncertainty found here is small in comparison to the older methodology utilized in the original Pantheon, which treated the SALT2 training and the filter zeropoints separately and computed systematic uncertainties for filter zeropoints in an uncorrelated manner (as individual linear perturbations). Additionally, the original Pantheon adopted an extra uncertainty because it did not retrain the SALT2 model for its updated calibration (SuperCal).
For this analysis, we followed the systematic-covariance-matrix formalism established in Conley et al. (2010). We both retrain the model and refit the light curves using the updated Fragilistic calibration solution, and propagate calibration uncertainties through to cosmology. Some systematics (for example SN intrinsic variations) can be 'self-calibrated' down in size simply with larger datasets, whereas calibration-related systematics benefit less from self-calibration; thus the effort to reduce them with improved calibration measurements and accurate accounting of their uncertainties is important.
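Once built, the systematic covariance enters cosmology fitting through the Hubble-residual chi-square, combined with the statistical covariance in the Conley et al. (2010) style. A minimal sketch, with illustrative 3-SN arrays:

```python
import numpy as np

def hubble_chi2(residuals, cov_stat, cov_syst):
    """chi2 = r^T (C_stat + C_syst)^{-1} r for Hubble residuals r."""
    cov = cov_stat + cov_syst
    # Solve the linear system rather than forming the explicit inverse.
    return float(residuals @ np.linalg.solve(cov, residuals))

# Illustrative: 3 SNe with 0.1 mag diagonal statistical errors and a
# fully correlated toy systematic term
r = np.array([0.02, -0.01, 0.03])
c_stat = np.diag([0.1, 0.1, 0.1]) ** 2
c_syst = np.full((3, 3), 1e-4)
chi2 = hubble_chi2(r, c_stat, c_syst)
```

The off-diagonal systematic terms are what allow correlated calibration errors to shift all distances coherently, which is why they matter for w even when each individual zeropoint uncertainty is small.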
We do find significant sensitivity to the inclusion of the observed-frame U-band measurements in the SALT2 training. As PS1 only covers griz, we are unable to recalibrate the observed-frame U band, which is included for the CfA3S, CfA3K, and CfA4 surveys. While the original Pantheon chose not to include observed-frame U band in the fitting of light curves, it was still used in the training of the SALT2 model. This is not strictly necessary to determine the model for rest-frame U, because rest-frame U is redshifted to longer observer-frame wavelengths at high redshift, so the model can be calibrated via optical-band measurements of higher-z SNe in the training. We test this in further detail in Appendix B.
While this paper was being written, a new training code, SaltShaker, was developed as part of SALT3 (Kenworthy et al. 2021). It improves on the SALT2 model through better estimation of uncertainties, better separation of color and light-curve stretch, and a training sample over 2× larger. Work is ongoing to implement it in standard SN Ia cosmology frameworks, but the sensitivity to the U band discussed above motivates the adoption of surfaces with a longer wavelength range calibrated by higher-redshift data. Until this happens, we advocate for the removal of observer-frame U, which corresponds to z > 0.8 and, for the Pantheon+ sample, is largely SNLS.
In Section 5 we gave the impact on w resulting from the changes to the calibration, but we are also able to determine the impact on H_0. We find the impact from all sources of calibration to be at the level of 0.2 km/s/Mpc, subdominant relative to the total statistical uncertainty on H_0 of ∼1 km/s/Mpc in the companion SH0ES paper (Riess et al. 2022). This is due to the use of SNe from the same surveys in both the second and third rungs of the distance ladder, and to the use of many of the same photometric systems for both the Cepheid calibrators and the Hubble-flow SNe overlapping in redshift at low z. This effect is quantified in Brownsberger et al. (2021). While Brownsberger et al. (2021) only consider the impact of single, gray offsets per survey, they find a similar 0.2 km/s/Mpc limit on the uncertainties when excluding prior information on the survey zeropoints. While our gray offsets are much smaller than their worst-case scenario, because we include both color-dependent correlations and a retrained SALT2 model (as Brownsberger et al. 2021 did not), we find that the sensitivity to the cosmological parameters w and Ω_M is even stronger than that given in Brownsberger et al. (2021). Currie et al. (2020) studied two effects that we did not: variations across the focal plane for each survey, as well as the impact of filter shifts (as did Betoule et al. 2014). Currie et al. (2020) find that while the focal-plane variations can improve scatter, the net changes in distance are small; they are therefore not included here. We did not include filter shifts because of the complexity of doubling the degrees of freedom in our simultaneous fit and the non-linearity this would introduce in the computation of the likelihood.

Table 6. The changes to the recovered value of w due to the changes described in this analysis when combining with CMB data. This is run only on the subset of CfA, CSP, SDSS, and SNLS SNe that was calibrated in the Joint Light Curve Analysis.
Despite this, we took care to examine filters with large differences between their observed and synthetic slopes, and we still varied the uncertainty on each filter curve in the SALT2 training systematics, as shown in Table 7, albeit without correlations with the zeropoints.
The other way to improve this analysis is with more calibration data. While future surveys (e.g. the Vera C. Rubin Observatory LSST and the Nancy Grace Roman Space Telescope) can move away from this cross-calibration method by using a single photometric system, this is not currently feasible for measuring H_0. While we are fortunate that new surveys with many more stars and better calibration are coming (such as DES, ZTF, LSST, and Roman), any improvement for H_0 will be relatively small due to the necessity of using SNe from older surveys, which over the last 20 years have measured the rare nearby SNe required for Cepheid observations.

CONCLUSION
In this analysis, we compiled the information needed to calibrate 25 different photometric systems used for the upcoming Pantheon+ and SH0ES analyses. We measure calibration zeropoints for a total of 105 filters. We find relatively good agreement with SuperCal, though we add an additional 8 photometric systems and account for updated CALSPEC modeling and updated CSP and CfA filter throughputs.
We derive a full covariance matrix between the filter zeropoints, the first of its kind. We then retrain the SALT2 model based on our fitted zeropoints and use the covariance matrix to create 9 realizations based on correlated, perturbed calibration offsets.
We find a change in w compared to the original Pantheon of 0.07 and a 30% reduction in the contribution of calibration to the systematic uncertainty with our new method.

[Appendix figure caption, right panel: Differences in inferred distances, assuming the same Tripp standardization coefficients α = 0.15 and β = 3.1, between fits with and without rest-frame U band in the SALT2 training. Differences are largest beginning at z = 0.8; a vertical line is placed at z = 0.8, where a quality cut is made in the Pantheon+ sample in order to be insensitive to this potential systematic.]

Table 22. AB recalibration for AB surveys that were calibrated on old CALSPEC models. These offsets are applied prior to performing cross-calibration.