Direct T_e-based Metallicities of z = 2–9 Galaxies with JWST/NIRSpec: Empirical Metallicity Calibrations Applicable from Reionization to Cosmic Noon

We report detections of the [O iii]λ4364 auroral emission line for 16 galaxies at z = 2.1–8.7, measured from JWST/NIRSpec observations obtained as part of the Cosmic Evolution Early Release Science (CEERS) survey program. We combine this CEERS sample with 9 objects from the literature at z = 4−9 with auroral-line detections from JWST/NIRSpec and 21 galaxies at z = 1.4−3.7 with auroral-line detections from ground-based spectroscopy. We derive electron temperature (T_e) and direct-method oxygen abundances for the combined sample of 46 star-forming galaxies at z = 1.4−8.7. We use these measurements to construct the first high-redshift empirical T_e-based metallicity calibrations for the strong-line ratios [O iii]/Hβ, [O ii]/Hβ, R23 = ([O iii]+[O ii])/Hβ, [O iii]/[O ii], and [Ne iii]/[O ii]. These new calibrations are valid over 12+log(O/H) = 7.4−8.3 and can be applied to samples of star-forming galaxies at z = 2−9, leading to an improvement in the accuracy of metallicity determinations at Cosmic Noon and in the Epoch of Reionization. The high-redshift strong-line relations are offset from calibrations based on typical z ∼ 0 galaxies or H ii regions, reflecting the known evolution of ionization conditions between z ∼ 0 and z ∼ 2. Deep spectroscopic programs with JWST/NIRSpec promise to improve statistics at the low and high ends of the metallicity range covered by the current sample, as well as to improve the detection rate of [N ii]λ6585 and thus allow the future assessment of N-based indicators. These new high-redshift calibrations will enable accurate characterizations of metallicity scaling relations at high redshift, improving our understanding of feedback and baryon cycling in the early Universe.


INTRODUCTION
The abundance of heavy elements relative to hydrogen, or metallicity, is a fundamental property of galaxies that traces the combined effects of stellar mass buildup and gas flows that add or remove mass and metals from systems. Theoretical models of galaxy evolution describe how the gas-phase metallicity of the interstellar medium (ISM), traced by the oxygen abundance O/H, is set by the relative strength of the star-formation rate (SFR), mass inflow rate, and mass outflow rate (e.g., Davé et al. 2012; Lilly et al. 2013; Torrey et al. 2019).
A major goal of modern astronomy is thus to robustly characterize how metallicity scales with galaxy properties, including stellar mass (M*) and SFR (the mass-metallicity relation, MZR, and the fundamental metallicity relation, FMR), and how such scaling relations change with redshift, in order to understand gas flows and baryonic mass assembly across cosmic history.
Sensitive near-infrared spectrographs on large ground-based telescopes and HST have provided measurements of rest-optical line ratios at z ∼ 1 − 4, enabling metallicity studies to be carried out in the first half of cosmic history. Such studies have generally found that metallicity decreases at fixed M* with increasing redshift (e.g., Erb et al. 2006; Sanders et al. 2015, 2021; Papovich et al. 2022; Strom et al. 2022) and that the FMR does not strongly evolve out to z ∼ 3.5 (Cresci et al. 2019; Sanders et al. 2020, 2021). While spectroscopic metallicity samples at z ∼ 2 − 3 now comprise hundreds of galaxies such that the statistical precision is high, there is significant systematic uncertainty on derived metallicities due to the unknown form of the relations between rest-optical strong-line ratios and O/H at high redshift. The lack of robust high-redshift metallicity calibrations likewise limits conclusions drawn from the fast-growing archive of rest-optical spectra from JWST for galaxies at z > 4 and reaching deep into the epoch of reionization (e.g., Shapley et al. 2023a; Sanders et al. 2023a; Nakajima et al. 2023; Matthee et al. 2022; Bunker et al. 2023).
Empirical calibrations between rest-optical line ratios and metallicity can be constructed using samples for which O/H has been derived using the robust "direct method" that is based on electron temperature (T_e) determinations. In this approach, T_e is calculated from the flux ratio of a faint auroral emission line (e.g., [O iii]λ4364) to a bright line from the same ion ([O iii]λ5008), leveraging the fact that the two transitions arise from different upper energy levels. This temperature can then be used to calculate the emissivity of various transitions to convert dust-corrected flux ratios of O ionic lines to H recombination lines (i.e., [O iii]/Hβ and [O ii]/Hβ) into O/H. Metallicity calibrations can then be constructed by fitting functional forms to the relations between different line ratios and direct-method O/H. This approach has been used to construct many different metallicity calibrations based on local H ii regions and z ∼ 0 star-forming galaxies that form the basis of MZR and FMR studies carried out on large samples (e.g., Pettini & Pagel 2004; Maiolino et al. 2008; Marino et al. 2013; Curti et al. 2017, 2020).
There is now significant evidence that, at fixed O/H, the ionization conditions of the ISM evolve from "normal" conditions at z ∼ 0 toward a more extreme state at z ∼ 2 associated with a harder ionizing spectrum due to the α-enhanced chemical abundance patterns of young stars and elevated electron densities (e.g., Steidel et al. 2014, 2016; Sanders et al. 2016a, 2020; Strom et al. 2017, 2018; Shapley et al. 2015, 2019; Runco et al. 2021; Topping et al. 2020a,b; Cullen et al. 2021). Since the ionization conditions set the shape of metallicity calibrations, an important consequence of these results is that metallicity calibrations are expected to change between z ∼ 0 and z ∼ 2. Accordingly, calibrations based on typical z ∼ 0 sources should not be applied to high-redshift samples. To address this issue, T_e-based calibrations were constructed using low-redshift galaxies with extreme line-ratio or SFR properties similar to those of typical high-redshift samples, assuming that matching in such properties selects sources with the same ISM ionization conditions present at high redshift (Bian et al. 2018; Pérez-Montero et al. 2021; Nakajima et al. 2022). However, the validity of these "analog" calibrations must ultimately be tested directly at high redshift.
The most robust resolution to this problem is to construct high-redshift metallicity calibrations using direct metallicity and strong-line measurements of high-redshift galaxies themselves. Based on deep ground-based spectroscopy of bright high-redshift line emitters, a sample of ∼ 20 star-forming galaxies at z = 1.4 − 3.7 with auroral-line detections and direct-method metallicities has been assembled, representing many nights of 8 − 10 meter telescope time (Villar-Martín et al. 2004; Brammer et al. 2012; Christensen et al. 2012a,b; Stark et al. 2013, 2014; Bayliss et al. 2014; James et al. 2014; Sanders et al. 2016b; Kojima et al. 2017; Berg et al. 2018; Patrício et al. 2018; Gburek et al. 2019, 2022; Sanders et al. 2020, 2023b). Sanders et al. (2020) showed that, on average, this sample deviates from calibrations based on normal z ∼ 0 sources, but is well matched by the local analog calibrations of Bian et al. (2018). However, both the sample size and the fidelity of individual measurements of this ground-based sample fall short of what is required to construct new calibrations. As discussed in Sanders et al. (2023b), this shortcoming is primarily due to the limited sensitivity of current ground-based near-infrared spectrographs, the highly wavelength-dependent sky background, and the limited near-infrared wavelength ranges accessible through atmospheric transmission windows. Accordingly, a significant expansion of the z > 1 auroral-line sample is not feasible with current ground-based facilities.
JWST now provides the spectroscopic capabilities to overcome all of the challenges described above and obtain a large sample of high-redshift galaxies with direct-method metallicities for the first time, paving the way toward the first robust high-redshift metallicity calibrations. Upon commencing science operations, the NIRSpec instrument onboard JWST immediately demonstrated the ability to detect auroral emission lines of distant galaxies in the Early Release Observations (ERO) targeting the SMACS 0723 cluster field. The ERO spectra revealed detections of [O iii]λ4364 for three galaxies at z > 7.5 for which T_e-based metallicities were reported (Curti et al. 2023; Brinchmann 2022; Schaerer et al. 2022; Arellano-Córdova et al. 2022; Taylor et al. 2022; Trump et al. 2022; Tacchella et al. 2022). NIRSpec observations from the Cosmic Evolution Early Release Science (CEERS) and GLASS Early Release Science (ERS) programs have yielded direct-method metallicities for several additional sources at z = 4 − 9 (Taylor et al. 2022; Nakajima et al. 2023; Tang et al. 2023; Jones et al. 2023). However, these early studies were limited to a small number of JWST targets and furthermore did not integrate all of the existing ground-based T_e data at z ∼ 1 − 4.
In this paper, we report detections of T_e-sensitive auroral emission lines for 16 galaxies at z = 2.1 − 8.7 measured from medium-resolution JWST/NIRSpec spectroscopy from the CEERS survey, which we use to derive robust gas-phase oxygen abundances using the direct method. Of these detections, 11 are new while 5 have been previously reported (Nakajima et al. 2023; Tang et al. 2023). We combine this sample with 9 sources at z = 4 − 8.5 from the literature with detections of auroral lines from other JWST ERO and ERS programs and 21 targets at z = 1.4 − 3.6 drawn from the literature with direct-method metallicities from ground-based spectroscopy. We use the resulting sample of 46 galaxies at z = 1 − 9 to derive the first empirical high-redshift metallicity calibrations that enable a robust translation of rest-optical strong-line ratios into gas-phase oxygen abundance. These relations are valid from Cosmic Noon into the Epoch of Reionization, and over the metallicity range 12+log(O/H) = 7.0 − 8.4.

This paper is organized as follows. We describe the observations, data reduction, and measurements in Section 2. In Section 3, we calculate physical properties including electron density, electron temperature, and direct-method O/H. We derive empirical metallicity calibrations in Section 4. Finally, in Section 5, we discuss these new calibrations in the context of existing literature calibrations and summarize our conclusions.

Observations and data reduction
This analysis uses publicly available JWST/NIRSpec Micro-Shutter Array (MSA) spectroscopic data from the CEERS survey (Program ID: 1345; Finkelstein et al. 2022a,b; Finkelstein et al., in prep.; Arrabal Haro et al., in prep.). These data include observations of 6 pointings in the AEGIS field with the G140M/F100LP, G235M/F170LP, and G395M/F290LP grating/filter configurations, providing continuous wavelength coverage (excepting the chip gap) spanning 1 − 5 µm at a spectral resolution of R ∼ 1000. At each pointing, the total on-source integration in each configuration was 3107 s.
The data were reduced to produce two-dimensional (2D) spectra, one-dimensional (1D) flux-calibrated spectra were extracted, and a slit-loss correction was applied as described in Shapley et al. (2023b), Reddy et al. (2023), and Sanders et al. (2023a). Out of 318 total unique targets, spectroscopic redshifts were measured for 252 sources. Eight sources were identified as active galactic nuclei (AGNs) based on the presence of broad emission lines or large [N ii]/Hα ratios (log(N2) > −0.3). The remaining targets are assumed to have emission lines predominantly powered by star formation. Stellar population properties were inferred with the spectral energy distribution (SED) fitting code Fast (Kriek et al. 2009) by fitting the flexible stellar population synthesis models of Conroy et al. (2009) to public multi-wavelength photometry. We assumed a delayed-τ star-formation history and either solar stellar metallicity and the Calzetti et al. (2000) attenuation curve or sub-solar metallicity and the SMC attenuation curve (Gordon et al. 2003) based on the redshift and stellar mass of the source, as detailed in Shapley et al. (2023b). For roughly one third of the sample for which reduced JWST NIRCam imaging was available, models were fit to combined HST and JWST/NIRCam photometry. For the remaining two thirds, the 3D-HST survey photometric catalogs comprising HST, Spitzer, and ground-based imaging were used (Momcheva et al. 2016; Skelton et al. 2014). In both cases, the observed photometric measurements were corrected for the contribution from emission lines using the measured line fluxes from the JWST/NIRSpec spectra (Sanders et al. 2023a).

Band-to-band flux calibration
Emission-line fluxes were measured from 1D science spectra by fitting Gaussian models on top of the continuum defined by the best-fit stellar population model as described in Sanders et al. (2023a). This analysis requires robust emission-line ratios to calculate the reddening correction, T_e, and O/H, and for accurate strong-line ratios, some of which are widely separated in wavelength (i.e., O2, R23, O32, N2O2). While the absolute flux calibration has no impact on the results in this paper, achieving an accurate relative flux calibration is of key importance. To ensure accurate line ratios, we first seek to use line flux measurements from within the same grating whenever possible. If a line falls in the region of overlapping wavelength coverage between two gratings, we use the line flux measured in the same grating as the other feature(s) in a line ratio. However, some line ratios for a subset of targets necessitate comparing line fluxes measured in different gratings depending on the redshift of the target and the wavelength separation of the lines in each ratio. We thus took particular care with the relative flux calibration between gratings to minimize offsets between grating configurations using the following method for each target in our sample.
First, if one or more emission lines are measured in both neighboring gratings and are detected at > 5σ significance in both gratings, we use the ratio of the measured overlapping line flux(es) to place spectra in the two gratings on the same relative flux scale. If no lines are detected in the overlap region but Hα falls in the redder grating while Hβ and higher-order Balmer lines fall in the bluer grating, then we scale the redder grating such that the Hα flux matches the expected observed (i.e., reddened) flux based on the brightness and ratios of the bluer Balmer lines. Finally, if neither of the above cases applies, we integrate the continuum in the overlapping wavelength region between the two gratings and scale according to the ratio of the integrated fluxes. In all cases, we scale the G140M and G395M spectra to match the G235M spectrum since the G235M configuration covers an overlapping wavelength range with both of the other two settings. We also calculate the uncertainty on the scaling factors. If a line ratio includes features measured in different gratings, then the uncertainty on the scaling factor is propagated into the final uncertainty on the line ratio along with the flux measurement uncertainties. If a line ratio instead compares lines measured in the same grating, then the error on the line ratio is calculated only from the measurement uncertainty on each line flux.
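The first scaling case and the error propagation described above can be sketched as follows. This is a minimal illustration assuming independent Gaussian errors and first-order propagation; the function names and the example fluxes are hypothetical and are not taken from the CEERS reduction.

```python
import numpy as np

def grating_scale(flux_ref, err_ref, flux_other, err_other):
    """Scale factor that places the 'other' grating on the reference
    (here G235M) flux scale, using one line measured in both gratings."""
    scale = flux_ref / flux_other
    frac = np.hypot(err_ref / flux_ref, err_other / flux_other)
    return scale, scale * frac

def cross_grating_ratio(f_num, e_num, f_den, e_den, scale, e_scale):
    """Line ratio with the numerator in the scaled grating and the
    denominator in the reference grating; the scale-factor error is
    added in quadrature to the two flux errors."""
    ratio = scale * f_num / f_den
    frac = np.sqrt((e_num / f_num) ** 2
                   + (e_den / f_den) ** 2
                   + (e_scale / scale) ** 2)
    return ratio, ratio * frac

# Hypothetical fluxes (arbitrary units) for a line seen in both gratings.
scale, e_scale = grating_scale(10.0, 0.5, 8.0, 0.4)
ratio, e_ratio = cross_grating_ratio(4.0, 0.4, 10.0, 0.5, scale, e_scale)
```

For a ratio of two lines in the same grating, the same function applies with scale = 1 and e_scale = 0, which reduces to the flux-only error described in the text.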
It is notable that for all objects in our sample the ratios O3, Ne3O2, N2, O3N2, and [O iii]λ4364/[O iii]λ5008 are unaffected by band-to-band uncertainties. All of these besides O3N2 only compare lines measured in the same grating, while for O3N2 any scaling factors cancel out since the numerator and denominator respectively include lines from a single grating. We further determined the reddening correction (see Sec. 3.1 below) using only the subset of H Balmer recombination lines falling in the same grating as Hβ. As such, multi-grating flux calibration uncertainties have no effect on these line ratios or our derived T_e and 12+log(O2+/H) values. The line ratios impacted by band-to-band uncertainties include O2, R23, and O32 for only 4/16 targets and N2O2 for all targets with coverage of both [N ii] and [O ii]. We thus find that systematics related to the relative flux calibration between grating configurations do not significantly impact our results.

CEERS auroral-line sample selection
We selected objects from the full CEERS/NIRSpec sample with detections of the [O iii]λ4364 emission line using the following process. We first selected all non-AGN sources with a measured [O iii]λ4364 signal-to-noise ratio of S/N ≥ 2. We then visually inspected the 2D and 1D spectra of the resulting 58 targets to select those with robust [O iii]λ4364 by ensuring that the line is identifiable in both 2D and 1D spectra, falls at the expected wavelength according to the redshift measured from brighter lines, falls at the same spatial position in the 2D spectra as brighter lines, is morphologically well-behaved in 2D, and is not narrower than the instrumental resolution (i.e., excluding single-pixel noise spikes). This selection results in a sample of 16 [O iii]λ4364-emitters spanning z = 2.16 − 8.68 with a median redshift of 4.6 (Table 1). The redshift distribution of this sample is shown by the green histogram in Figure 1. The observed emission-line fluxes of these galaxies are presented in Table 3 in Appendix A. Figure 2 shows the region of the 2D and 1D spectra covering Hγ and [O iii]λ4364 for these 16 galaxies. The [O iii]λ4364 significance spans 2.4σ to 6.1σ with a median S/N of 4.2.
Two of the CEERS [O iii]λ4364-emitters, 11088 (z = 3.302) and 3788 (z = 2.295), also display detections of the auroral [O ii]λλ7322,7332 emission-line doublet, shown in Figure 3. The first detections of the [O ii] auroral lines at high redshift were recently reported by Sanders et al. (2023b). To our knowledge, these new detections represent only the second time [O ii]λλ7322,7332 has been reported beyond the low-redshift universe. We will use these [O ii] auroral lines to constrain T_e in the low-ionization nebular zone for these two objects.

Literature JWST auroral-line sample
To supplement the CEERS auroral sample, we selected 9 additional targets from the literature with T_e-based metallicities and detected auroral emission lines from JWST spectroscopy. Four of these targets have R ∼ 1000 NIRSpec data from the ERO program targeting the SMACS 0723 cluster. We use the published line fluxes from Curti et al. (2023) for ERO IDs 4590, 6355, and 10612, and line measurements from Nakajima et al. (2023) for ERO ID 5144. Five additional sources have published auroral-line detections measured from R ∼ 2700 NIRSpec observations from the GLASS ERS program (Treu et al. 2022). We utilize line measurements from Nakajima et al. (2023) for GLASS IDs 100003, 10021, 150029, and 160133, and the line fluxes from Jones et al. (2023) for GLASS ID 150008. The JWST literature auroral-line sample has redshifts spanning z = 4.01 − 8.50 with a median redshift of 7.29, the distribution of which is displayed by the red histogram in Figure 1. The detected auroral line is O iii]λ1666 for GLASS 150008 and [O iii]λ4364 for the 8 other JWST literature galaxies. These literature auroral lines have S/N = 3.3 − 9.6 with a median significance of 6.0.

Ground-based auroral-line sample
We also include a sample of galaxies at z > 1 with auroral-line detections from ground-based spectroscopy. This sample is predominantly made up of the sample presented in Sanders et al. (2020) that totals 18 targets including several from literature sources (Villar-Martín et al. 2004; Brammer et al. 2012; Christensen et al. 2012a,b; Stark et al. 2013, 2014; Bayliss et al. 2014; James et al. 2014; Sanders et al. 2016b; Kojima et al. 2017; Berg et al. 2018). We supplement this sample with an [O iii]λ4364-detected galaxy from Gburek et al. (2019) and two galaxies with detected [O ii] auroral emission lines presented in Sanders et al. (2023b). We do not include composite spectra (e.g., Steidel et al. 2016; Gburek et al. 2022), but instead limit to individual sources. The ground-based auroral-line sample thus includes 21 galaxies with a redshift distribution shown by the blue histogram in Figure 1, spanning z = 1.42 − 3.63. This ground-based sample includes 8 galaxies with [O iii]λ4364 detections, 11 with O iii]λ1666 detections, and 2 with [O ii]λλ7322,7332 detections. For the ground-based sample, we adopt the reddening-corrected line ratios, T_e, and direct-method metallicities calculated by Sanders et al. (2020) and Sanders et al. (2023b) that were derived using a methodology consistent with this work.

Combined high-redshift auroral-line sample
To obtain sufficient statistics to construct new empirical high-redshift metallicity calibrations, we combine the CEERS, JWST literature, and ground-based literature auroral-line samples, resulting in a combined high-redshift auroral-line sample consisting of 46 unique sources with T_e-based metallicities. The gray histogram in Figure 1 shows the redshift distribution of the combined sample spanning z = 1.4 − 8.7, which has a median redshift of z_med = 3.63. The currently available data do not suggest strong evolution of ISM ionization conditions or metallicity calibrations over z ∼ 2 − 9, implying that galaxies over this large redshift range may be used in a single calibrating sample (see Sec. 5.2; Sanders et al. 2023a). Of these 46 galaxies, all have detected O3, 39 have detected O2, 39 have detected R23, 39 have detected O32, 31 have detected Ne3O2, and 12 have detected N2, O3N2, and N2O2. This sample is more than twice the size of the largest high-redshift auroral-line compilation assembled to date (Sanders et al. 2023b).

Table 1. Properties of the CEERS auroral-line sample.

DERIVED PHYSICAL PROPERTIES
In this section, we describe how physical properties including dust reddening, emission-line ratios, electron density and temperature, and oxygen abundances were calculated for the CEERS and literature JWST/NIRSpec auroral-line samples.

Dust reddening, SFR, and emission-line ratios
A robust correction for dust reddening is required for accurate T_e and O/H inferences. We derived the nebular reddening, E(B−V)_gas, using the observed ratios of H Balmer recombination lines including Hα, Hβ, Hγ, and Hδ assuming the Milky Way (MW) extinction law of Cardelli et al. (1989). The nebular attenuation curve derived directly for z ∼ 2 star-forming galaxies is consistent with the MW curve (Reddy et al. 2020). Furthermore, analysis of Balmer and Paschen line ratios from JWST at z = 1 − 3 does not indicate significant deviation from the MW law (Reddy et al. 2023). To reduce uncertainty due to the relative flux calibration between gratings (Sec. 2.2), we only used Balmer line fluxes measured in the same grating as Hβ to derive the nebular reddening. The set of lines employed for each of the CEERS auroral-line emitters is reported in Table 1. For the JWST literature sources, we do not have complete information about whether the reported Balmer lines were measured in different spectroscopic configurations and simply used the subset of Hα, Hβ, Hγ, and Hδ that are detected at > 3σ.
E(B−V)_gas was calculated via a χ² minimization routine that simultaneously fits to the set of available ratios out of Hα/Hβ, Hγ/Hβ, and Hδ/Hβ, taking into account the uncertainty on each observed ratio. The intrinsic ratios were calculated with pyneb (Luridiana et al. 2015) assuming T_e = 15,000 K, a typical value for our sample. The derived E(B−V)_gas values, reported in Table 1, were then used to dust-correct the observed line fluxes based on their rest-frame wavelengths assuming the Cardelli et al. (1989) extinction law. The sample is generally not significantly dusty, with E(B−V)_gas < 0.3 for the vast majority of targets, such that our results are not strongly dependent on the reddening correction.
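The χ² fit to the Balmer ratios can be illustrated with a short sketch. The k(λ) values below are approximate Cardelli et al. (1989) values for R_V = 3.1, and the intrinsic ratios are textbook Case B values near 10^4 K; both are assumptions for illustration, whereas the text computes intrinsic ratios with pyneb at 15,000 K. The grid search and the function names are likewise hypothetical choices.

```python
import numpy as np

# Approximate k(lambda) = A(lambda)/E(B-V) for the Cardelli et al. (1989)
# law with R_V = 3.1 (assumed values for illustration).
K_CCM = {"Ha": 2.53, "Hb": 3.61, "Hg": 4.17, "Hd": 4.44}
# Textbook Case B ratios relative to Hbeta near 10^4 K (placeholders;
# the text uses pyneb at T_e = 15,000 K instead).
INTRINSIC = {"Ha": 2.86, "Hg": 0.468, "Hd": 0.259}

def model_ratio(line, ebv):
    """Reddened line/Hbeta ratio for a given E(B-V)."""
    dk = K_CCM[line] - K_CCM["Hb"]
    return INTRINSIC[line] * 10.0 ** (-0.4 * dk * ebv)

def fit_ebv(observed, grid=np.linspace(0.0, 2.0, 2001)):
    """observed maps line name -> (ratio to Hbeta, 1-sigma error).
    Returns the grid value of E(B-V) that minimizes chi^2 over all
    available ratios simultaneously."""
    chi2 = np.zeros_like(grid)
    for line, (ratio, sigma) in observed.items():
        chi2 += ((ratio - model_ratio(line, grid)) / sigma) ** 2
    return grid[np.argmin(chi2)]

def deredden(flux, k_lambda, ebv):
    """Dust-correct an observed flux at a wavelength with curve value k_lambda."""
    return flux * 10.0 ** (0.4 * k_lambda * ebv)

# Self-consistency check on synthetic ratios reddened by E(B-V) = 0.2.
obs = {line: (model_ratio(line, 0.2), 0.05) for line in INTRINSIC}
ebv_fit = fit_ebv(obs)
```

Restricting the `observed` dictionary to lines measured in the same grating as Hβ mirrors the selection described in the preceding paragraph.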
SFRs were calculated using dust-corrected Hα luminosity when Hα was covered; otherwise, dust-corrected Hβ luminosity was employed. We adopt the conversion factor from Balmer line luminosity to SFR based on Z* = 0.001 BPASS binary stellar population synthesis models (Eldridge et al. 2017) appropriate for moderate- and low-metallicity high-redshift systems (Reddy et al. 2022; Shapley et al. 2023b). SFR as a function of M* is shown in Figure 4 for the CEERS, JWST literature, and ground-based auroral-line samples, color-coded by redshift. The vast majority of the auroral-line detected high-redshift galaxies lie above the mean star-forming main sequence at their respective epoch (Speagle et al. 2014). This bias toward high specific SFR (sSFR = SFR/M*) is a result of selecting sources based on detections of weak emission lines.
Emission-line ratios were calculated using the reddening-corrected line fluxes. Due to the close proximity of the involved lines, the ratios O3, Ne3O2, N2, and O3N2 have virtually no dependence on the reddening correction. In contrast, the final R23, O2, O32, N2O2, [O iii]λ4364/[O iii]λ5008, and [O ii]λλ7322,7332/[O ii]λ3728 ratios depend on the inferred E(B−V)_gas. We define a detection of a strong-line ratio as the case where all lines in that ratio are detected at ≥ 3σ significance. In the case that one or more lines in a ratio do not meet this criterion, we calculate 3σ limits when possible. The vast majority of the combined high-redshift sample (≥ 39/46) is detected in the O3, O2, R23, and O32 ratios, while Ne3O2 is detected for 31/46 sources. As such, the detected sample for line ratios based on α-element metals (i.e., O and Ne) is reasonably representative of the full combined auroral-line sample.
[N ii]λ6585 is covered in the spectra of 11/25 JWST targets. With JWST/NIRSpec, Hα and [N ii] can only be accessed out to z ≈ 6.7. Nine of the JWST CEERS and literature sources are at z > 6.7 such that [N ii] was not covered, while for 5 other objects [N ii] fell in the chip gap (CEERS 1651), fell off the detector due to the position on the MSA mask, or was not reported in the literature reference. Of the JWST targets with coverage, [N ii] is detected for 6 galaxies. This low detection rate is likely due to the low [N ii]/Hα ratios that appear to be typical of metal-poor high-redshift galaxies (Sanders et al. 2020, 2023b). Consequently, only 12/46 sources in the combined sample have detections of the N2, O3N2, and N2O2 ratios.

Electron density
The electron density, n_e, is derived from the ratio of the components of the [...] for which n_e ranges from the low-density limit to 2900 cm^−3 with a median value of 280 cm^−3 (Sanders et al. 2020, 2023b). If the 6 JWST sources with n_e constraints are included, the median density for 23/46 objects in the combined sample is 278 cm^−3. This value is in good agreement with the typical electron density found for large samples of z ∼ 2 − 3 star-forming galaxies of 250 − 300 cm^−3 (Sanders et al. 2016a; Strom et al. 2017). Accordingly, we assume n_e = 300 cm^−3 for the T_e and abundance ratio calculations described below. The exact assumed value has negligible impact on the metallicity results since n_e variation changes derived T_e values by ≲ 1% when n_e < 3000 cm^−3 (Sanders et al. 2020).

Electron temperature
For all but one JWST target, the electron temperature of the high-ionization O2+ zone of the nebula, T_e([O iii]), was calculated from the [O iii]λ4364/[O iii]λ5008 ratio. We used pyneb with the O2+ collision strengths from Storey et al. (2014). For GLASS 150008, [O iii]λ4364 fell in the chip gap, but O iii]λ1666 was significantly detected (Jones et al. 2023). We use the O iii]λ1666/[O iii]λ5008 ratio to calculate T_e for this galaxy. We use the Aggarwal & Keenan (1999) collision strengths for the O iii]λ1666 calculation since it requires a 6-level atom while Storey et al. (2014) only include 5 levels, following the approach used in Sanders et al. (2020). The derived T_e and O/H values change by much less than 1σ if we instead use Aggarwal & Keenan (1999) for the entire sample. In the CEERS sample, T_e([O iii]) ranges from 11,000 K to 28,000 K, as reported in Table 1. Four CEERS targets have very high T_e in excess of 20,000 K (though with large uncertainties of ∼ 3,500 K), as hot as extremely metal-poor (< 0.1 Z⊙) local galaxies (e.g., Izotov et al. 2012, 2018) but similar to what has been inferred from early JWST spectra at z ∼ 6 − 8 (e.g., Curti et al. 2023; Schaerer et al. 2022; Arellano-Córdova et al. 2022).
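To give a feel for how the auroral-to-strong line ratio maps onto temperature, the sketch below inverts the classic analytic approximation from Osterbrock & Ferland for the [O iii] (λ4959+λ5007)/λ4363 ratio. This is a swapped-in textbook formula, not the pyneb calculation with Storey et al. (2014) collision strengths used in the text, so the resulting temperatures will differ somewhat from the paper's values. The theoretical λ5007/λ4959 ratio of 2.98 and the function names are assumptions of this illustration.

```python
import math

def oiii_ratio_model(t_e, n_e=300.0):
    """Approximate (4959+5007)/4363 flux ratio at temperature t_e [K]
    and density n_e [cm^-3] (Osterbrock & Ferland approximation)."""
    return (7.90 * math.exp(3.29e4 / t_e)
            / (1.0 + 4.5e-4 * n_e / math.sqrt(t_e)))

def te_from_auroral(r_4364_5008, n_e=300.0, lo=5.0e3, hi=6.0e4):
    """Solve for T_e given the dust-corrected 4364/5008 flux ratio.
    The 4959 line is restored assuming 5007/4959 = 2.98, and the model
    is inverted by bisection (it decreases monotonically with T_e here)."""
    target = (1.0 + 1.0 / 2.98) / r_4364_5008
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if oiii_ratio_model(mid, n_e) > target:
            lo = mid  # model ratio too large, so the gas must be hotter
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: build the ratio expected at 15,000 K and recover T_e.
r_15k = (1.0 + 1.0 / 2.98) / oiii_ratio_model(15000.0)
te_recovered = te_from_auroral(r_15k)
```

In this approximation, changing n_e from 300 to 3000 cm^−3 shifts the recovered T_e by a small amount, in line with the weak density dependence noted in the electron density section.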
The electron temperature in the low-ionization O+ zone, T_e([O ii]), is required to compute O+/H. Two objects in the CEERS sample (11088 and 3788) have detections of [O ii]λλ7322,7332 (Fig. 3), for which we derive T_e([O ii]) with pyneb using the O+ collision strengths of Kisielius et al. (2009). These targets represent the first high-redshift sources with direct constraints on T_e in both the low- and high-ionization zones. We find

Ionic and total oxygen abundances
Ionic and total oxygen abundances were calculated using pyneb with the collision strengths of Storey et al. (2014) [...] indicating that the high-redshift galaxies in this sample are relatively metal-poor.

Uncertainties on derived properties
Uncertainties on E(B−V)_gas, emission-line ratios, n_e, T_e, and abundance ratios were calculated by perturbing the observed line fluxes according to the measured uncertainties and recalculating all of the properties based on the new realization of line strengths. This process was repeated 500 times to sample the distribution of each property, and the 1σ error was inferred from the 68th-percentile bounds on each quantity. Uncertainties of line ratios comparing lines measured in different NIRSpec gratings additionally include the uncertainty on the relative flux calibration between gratings (Sec. 2.2).
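The perturb-and-recompute procedure can be sketched generically as below. This is an illustration under stated assumptions: flux errors are taken to be independent and Gaussian, the 68% bounds are interpreted as the 16th and 84th percentiles, and the `derive` callback and example fluxes are hypothetical stand-ins for the full chain of reddening, T_e, and abundance calculations.

```python
import numpy as np

def monte_carlo_interval(fluxes, errors, derive, n_real=500, seed=0):
    """Perturb line fluxes by their 1-sigma errors n_real times,
    recompute the derived quantity each time, and return the median
    with lower and upper 1-sigma errors (16th/84th percentiles)."""
    rng = np.random.default_rng(seed)
    fluxes = np.asarray(fluxes, dtype=float)
    errors = np.asarray(errors, dtype=float)
    draws = rng.normal(fluxes, errors, size=(n_real, fluxes.size))
    values = np.array([derive(d) for d in draws])
    lo, med, hi = np.percentile(values, [16, 50, 84])
    return med, med - lo, hi - med

# Hypothetical example: log of a two-line flux ratio.
def log_ratio(f):
    return np.log10(f[1] / f[0])

med, err_lo, err_hi = monte_carlo_interval([100.0, 10.0], [2.0, 1.0], log_ratio)
```

Because every derived property is recomputed from the same perturbed set of fluxes, correlations between quantities (for example between E(B−V)_gas and T_e) are carried through automatically.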

EMPIRICAL HIGH-REDSHIFT METALLICITY CALIBRATIONS
We now use the direct-method metallicities for the combined high-redshift sample of 46 galaxies at z = 1.4 − 8.7 to construct the first empirical T_e-based metallicity calibrations derived directly from high-redshift sources. Figure 5 shows the strong-line ratios O3, O2, R23, O32, Ne3O2, N2, O3N2, and N2O2 as a function of direct-method O/H. O3 and R23 are known to be double-valued in z = 0 samples and local analogs, with a turnover point at roughly 12+log(O/H) ∼ 8.0 (e.g., Curti et al. 2020; Bian et al. 2018). The high-redshift data are consistent with such a shape, in particular showing a dropoff toward lower O3 and R23 with decreasing metallicity at 12+log(O/H) < 7.6. We do not observe any signs of flattening in the O2, O32, and Ne3O2 vs. O/H diagrams. A Spearman correlation test on the detected sources in each of these panels indicates the presence of significant correlations, with a correlation coefficient of ρ_s = 0.625 and a p-value of 2.1 × 10^−5 for O2, ρ_s = −0.497 and p-value = 1.3 × 10^−3 for O32, and ρ_s = −0.578 and p-value = 6.6 × 10^−4 for Ne3O2. As discussed above, [N ii] measurements were not available for most of the sample and only 12 sources have [N ii] detections, such that the statistics are poor for N2, O3N2, and N2O2. With the limited data available, no clear trends in the N-based line ratios are apparent as a function of O/H.
To further discern how these strong-line ratios depend on metallicity, we calculated medians in bins of O/H including only the galaxies with line ratio detections in each panel. We aim to have 12 − 15 galaxies per bin to obtain robust sample averages. Accordingly, we use 3 bins for O3, O2, R23, and O32; 2 bins for Ne3O2; and 1 bin for the N-based line ratios. The binned medians make the turnover of O3 and R23 clear, while suggesting monotonic metallicity dependence for O32 and Ne3O2.
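A binned-median calculation of this kind might look like the following sketch. The text specifies only the target number of galaxies per bin, so the equal-number binning in sorted O/H used here is an assumption, as are the function name and the synthetic data.

```python
import numpy as np

def binned_medians(oh12, log_ratio, n_bins):
    """Split the detected sources into n_bins equal-number bins in
    12+log(O/H) and return (median O/H, median log ratio, N) per bin."""
    oh12 = np.asarray(oh12, dtype=float)
    log_ratio = np.asarray(log_ratio, dtype=float)
    order = np.argsort(oh12)
    chunks = np.array_split(order, n_bins)
    return [(float(np.median(oh12[c])), float(np.median(log_ratio[c])), len(c))
            for c in chunks]

# Synthetic example with 39 detections, matching the O2/O32 sample size.
rng = np.random.default_rng(1)
oh = rng.uniform(7.2, 8.4, 39)
logr = 0.3 - 0.8 * (oh - 8.0) + rng.normal(0.0, 0.2, 39)
bins = binned_medians(oh, logr, n_bins=3)
```

With 39 detections and 3 bins, each bin holds 13 galaxies, inside the 12 − 15 range the text aims for.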
We fit each strong-line ratio as a function of metallicity, adopting polynomial functional forms of different orders, represented as

$$\log(R) = \sum_{i=0}^{n} c_i\,x^{i},$$

where x = 12 + log(O/H) − 8.0 (so that x = 0 corresponds to roughly one-fifth of the solar oxygen abundance), n is the polynomial order, R is the strong-line ratio, and the coefficients c_i are determined from fitting. For each line ratio, fits are carried out on the subsample that is detected in that ratio. We adopt a second-order polynomial for O3 and R23, and a first-order (i.e., linear) relation for O2, O32, and Ne3O2, motivated by the trends in the binned medians as well as the shape of calibrations based on these line ratios in the local universe (e.g., Curti et al. 2020; Bian et al. 2018). The low number of detections and lack of a clear trend preclude fitting calibrations for N2, O3N2, and N2O2.
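As an illustration of how a polynomial calibration in x = 12 + log(O/H) − 8.0 is evaluated and inverted, consider the sketch below. It assumes the fitted quantity is log(R), and the coefficients are hypothetical placeholders rather than the Table 2 values. The default validity window follows the 7.0 − 8.4 range quoted in the text and can be changed.

```python
import numpy as np

def calib_log_ratio(oh12, coeffs):
    """Evaluate log(R) = sum_i c_i x^i with x = 12+log(O/H) - 8.0.
    coeffs is ordered (c_0, c_1, ...)."""
    x = np.asarray(oh12, dtype=float) - 8.0
    return sum(c * x ** i for i, c in enumerate(coeffs))

def invert_calibration(log_ratio, coeffs, valid=(7.0, 8.4)):
    """Return every 12+log(O/H) inside the validity range that
    reproduces the observed log ratio. A second-order calibration
    can return two solutions (lower and upper branch)."""
    poly = np.array(coeffs[::-1], dtype=float)  # highest order first
    poly[-1] -= log_ratio
    roots = np.roots(poly)
    oh12 = roots[np.isreal(roots)].real + 8.0
    return np.sort(oh12[(oh12 >= valid[0]) & (oh12 <= valid[1])])

# Hypothetical coefficients, for illustration only.
linear = (0.5, -1.0)          # a monotonic, O32-like relation
quadratic = (0.8, 0.0, -1.0)  # a double-valued, O3-like relation

oh_lin = invert_calibration(1.0, linear)
oh_quad = invert_calibration(0.76, quadratic)
```

The two solutions returned for the quadratic case make the double-valued nature of O3 and R23 explicit. In practice a monotonic ratio such as O32 or Ne3O2 is one way to choose between branches.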
The best-fit calibrations are derived as follows. For each line ratio, we fit the individually detected sources using an orthogonal distance regression (ODR). While ODR fitting can include inverse-variance weighting in both variables, we do not weight according to the uncertainties while fitting. Strong-line calibrations are known to have large intrinsic scatter due to the variation of physical properties such as ionization parameter at fixed metallicity (e.g., Pilyugin & Grebel 2016). This scatter is typically 0.1 − 0.3 dex in line ratio at fixed O/H in local calibration samples, larger than the measurement uncertainty for most of the high-redshift sample. Because the intrinsic physical scatter remains large, targets with very small error bars should not be weighted more heavily; doing so would bias the fit toward a few high-S/N objects. Preventing this bias is especially important since the uncertainties on R and O/H vary widely across our sample. However, the measurement uncertainties should still propagate into the errors on the final best-fit coefficients. To achieve this while preventing an unrealistic over-weighting of objects with very high S/N, we perturb the data points according to their uncertainties 500 times and fit each of the realizations. From the 500 best-fit relations, we compute the median R as a function of O/H and then infer the final best-fit coefficients by fitting the functional form to the resulting median curve.
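The perturb-and-refit procedure can be sketched for the linear case as below. This is an illustrative stand-in rather than the fitting code used for the paper: it uses the closed-form unweighted orthogonal fit of a straight line, assumes Gaussian uncertainties, and refits the median curve with the same line-fitting routine. The second-order fits for O3 and R23 would require a general ODR solver. All names and data are hypothetical.

```python
import random
from statistics import mean, median


def odr_line(x, y):
    """Unweighted orthogonal-distance fit of y = c0 + c1*x (closed form)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    c1 = (syy - sxx + ((syy - sxx) ** 2 + 4 * sxy ** 2) ** 0.5) / (2 * sxy)
    return my - c1 * mx, c1


def monte_carlo_calibration(x, y, sx, sy, n_real=500, seed=0):
    """Perturb each point by its uncertainties, fit every realization,
    take the median curve on a grid, then fit a line to that curve."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_real):
        xp = [rng.gauss(a, s) for a, s in zip(x, sx)]
        yp = [rng.gauss(b, s) for b, s in zip(y, sy)]
        fits.append(odr_line(xp, yp))
    x_min, x_max = min(x), max(x)
    grid = [x_min + i * (x_max - x_min) / 49 for i in range(50)]
    med_curve = [median(c0 + c1 * g for c0, c1 in fits) for g in grid]
    return odr_line(grid, med_curve), fits


# Invented data: x = 12+log(O/H) - 8.0, y = log(ratio), 0.01 dex errors.
demo_x = [-0.8, -0.5, -0.2, 0.0, 0.2, 0.4]
demo_y = [0.5 - 0.8 * v for v in demo_x]
demo_err = [0.01] * len(demo_x)
(demo_c0, demo_c1), demo_fits = monte_carlo_calibration(
    demo_x, demo_y, demo_err, demo_err)
```

The spread of the individual realizations in `demo_fits` at each grid point is what would supply a confidence interval on the relation, while the unweighted fit keeps any single high-S/N point from dominating.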
The best-fit calibrations for O3, O2, R23, O32, and Ne3O2 are shown by the black lines in Figure 5, and the best-fit coefficients are reported in Table 2. The best-fit relations derived in this way show good agreement with the binned medians. In contrast, fitting with inverse-variance weighting failed to match the median trends due to the effect described above. The gray shaded regions show the 1σ confidence interval on the best-fit calibrations, derived from the 500 realizations. These calibrations are valid over the range 12+log(O/H) = 7.0 − 8.4. The typical uncertainty of these calibrations due to measurement uncertainties, parameterized as the uncertainty in R at fixed O/H, is ≈ 0.05 dex.
We calculate the intrinsic scatter in line ratio at fixed O/H by computing a χ² statistic that includes the measurement uncertainty in both parameters as well as an intrinsic scatter term:

$$\chi^2 = \sum \frac{\left[R_{\rm obs} - F_R({\rm O/H}_{\rm obs})\right]^2}{\sigma_{R,{\rm obs}}^2 + \dot{F}_R^2\,\sigma_{{\rm O/H},{\rm obs}}^2 + \sigma_{R,{\rm int}}^2}$$

where F R is the best-fit calibration for ratio R, R obs is the observed line ratio, σ R,obs is the measurement uncertainty on R, Ḟ R is the derivative of F R evaluated at the O/H of the source, σ O/H,obs is the measurement uncertainty on O/H, σ R,int is the intrinsic scatter term, and the sum is evaluated over all objects with a detection for R. We then vary the intrinsic scatter term to find the value of σ R,int for which the reduced χ² is equal to unity. The inferred intrinsic scatters are reported in Table 2, ranging from 0.06 to 0.29 dex. These values are generally similar to what has been found for z ∼ 0 galaxies and H ii regions, for which O3 and R23 also show smaller intrinsic scatter than O32 and Ne3O2 at fixed O/H (e.g., Maiolino et al. 2008; Curti et al. 2020), though this is at least partly due to the fact that O3 and R23 span a smaller dynamic range than O32 and Ne3O2 in our sample. For the linear fits of O2, O32, and Ne3O2, we can convert the intrinsic scatter in R to an intrinsic scatter in O/H at fixed strong-line ratio of σ O/H,int = 0.20, 0.25, and 0.24 dex, respectively.
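The intrinsic-scatter solution described above can be sketched with a simple bisection, since the reduced χ² decreases monotonically as the intrinsic term grows. The sketch makes two assumptions that are not stated in the text: the degrees of freedom are taken as the number of sources minus the number of fitted coefficients, and the conversion to scatter in O/H for a linear relation divides by the absolute slope. Function names and data are hypothetical.

```python
def reduced_chi2(sig_int, resid, sig_r, sig_oh, slope, n_params):
    """Reduced chi^2 with measurement errors in both axes plus an
    intrinsic scatter term added in quadrature."""
    chi2 = sum(
        r ** 2 / (sr ** 2 + (m * so) ** 2 + sig_int ** 2)
        for r, sr, so, m in zip(resid, sig_r, sig_oh, slope)
    )
    return chi2 / (len(resid) - n_params)


def solve_intrinsic_scatter(resid, sig_r, sig_oh, slope, n_params,
                            upper=2.0, tol=1e-8):
    """Find sig_int such that the reduced chi^2 equals unity."""
    args = (resid, sig_r, sig_oh, slope, n_params)
    if reduced_chi2(0.0, *args) <= 1.0:
        return 0.0  # measurement errors alone already explain the scatter
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reduced_chi2(mid, *args) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scatter_in_oh(sig_r_int, c1):
    """Assumed conversion for a linear relation: sigma_O/H = sigma_R / |c1|."""
    return sig_r_int / abs(c1)


# Invented residuals (dex) about a linear relation with slope -0.8.
demo_resid = [0.2, -0.2, 0.2, -0.2]
demo_sig_int = solve_intrinsic_scatter(
    demo_resid, [0.1] * 4, [0.0] * 4, [-0.8] * 4, n_params=2)
```

For a second-order relation the `slope` entries would be the local derivative of the calibration evaluated at each source's O/H.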

Comparison to literature calibrations
We compare the new empirical high-redshift calibrations derived in Sec. 4 to existing calibrations in the literature based on representative z ∼ 0 sources and extreme local galaxies that are analogs of high-redshift systems, as well as semi-empirical calibrations derived by applying photoionization modeling to high-redshift spectroscopic samples. Figure 6 compares our high-redshift calibrations (black lines) and binned medians (orange diamonds) to the literature strong-line calibrations.

Normal z ∼ 0 calibrations
We first compare to "normal" local-universe calibrations based on z = 0 H ii regions and/or z ∼ 0 star-forming galaxies, shown as solid colored lines in Fig. 6 (Maiolino et al. 2008; Curti et al. 2017, 2020; Sanders et al. 2021; Nakajima et al. 2022). Here, the term normal corresponds to calibrations for which the calibrating sample has ISM ionization conditions representative of what is typical at z ∼ 0, defined either by the average properties of nearby H ii regions or by galaxies falling on the z ∼ 0 star-forming main sequence. The high-redshift calibrations are distinct from normal local calibrations in several ways. The z ∼ 0 calibrations fail to reach the high O3 and R23 values that are common among the high-redshift sample, and generally have lower O3 and R23 at fixed O/H relative to the high-redshift calibrations. For the ionization-sensitive ratios, the high-redshift calibrations have higher O32 and Ne3O2 than the local calibrations at fixed O/H. At fixed O/H, O2 may be slightly lower at high redshift than at z ∼ 0, although the distinction is less clear than for O3, R23, O32, and Ne3O2. These offsets indicate that H ii regions in high-redshift galaxies are more highly ionized than their local counterparts at fixed metallicity. This result is consistent with many studies that have concluded that the evolution of line ratio excitation sequences between z ∼ 0 and z ∼ 2 − 3 arises because high-redshift galaxies have harder ionizing spectra at fixed O/H (e.g., Steidel et al. 2016; Strom et al. 2018; Shapley et al. 2019; Sanders et al. 2020; Topping et al. 2020a,b; Jeong et al. 2020; Cullen et al. 2021; Runco et al. 2021). The difference between the z ∼ 0 and new high-redshift calibrations shows how a different set of ionization conditions results in a change in the form of the metallicity calibrations. We conclude that normal local-universe calibrations should not be applied to high-redshift samples and will yield biased metallicity inferences if they are used at z ≳ 2. For O2, O32, and Ne3O2, local calibrations tend to underestimate the metallicity of high-redshift samples by ∼ 0.1 − 0.4 dex, consistent with what was found in Sanders et al. (2021). For O3 and R23, the direction of the bias will depend on whether an object is on the lower or upper branch.

Local analog calibrations
The use of calibrations based on extreme local galaxies that have properties analogous to high-redshift galaxies has recently become commonplace among high-redshift metallicity studies employing strong-line methods (e.g., Sanders et al. 2020, 2021; Wang et al. 2022; Matthee et al. 2022; Li et al. 2022; Nakajima et al. 2023). The dashed lines in Figure 6 show different empirical local analog calibrations (Jones et al. 2015; Bian et al. 2018; Pérez-Montero et al. 2021; Nakajima et al. 2022). Jones et al. (2015) fit their calibration to [O iii]λ4364-detected z ∼ 0 galaxies from SDSS that have high sSFRs due to this selection. We note also that the low-metallicity (12+log(O/H) ≲ 8.1) data used by Curti et al. (2020) comprise individual [O iii]λ4364-detected SDSS galaxies that mostly fall near the z ∼ 2 star-forming main sequence (Sanders et al. 2021), explaining why the Curti et al. (2020) calibrations tend to match the local analogs at low metallicity.

Figure 6. Comparison of the best-fit high-redshift calibrations (black lines) and high-redshift binned medians (orange diamonds) to a selection of strong-line calibrations from the literature, including those calibrated to "normal" z ∼ 0 star-forming galaxies and H ii regions (Curti et al. 2020; Maiolino et al. 2008; Nakajima et al. 2022; Sanders et al. 2021); extreme local galaxies that are analogs of high-redshift galaxies (Bian et al. 2018; Pérez-Montero et al. 2021; Nakajima et al. 2022; Jones et al. 2015); and calibrations based on the application of photoionization model fitting to strong-line samples at z ∼ 1 − 3 (Papovich et al. 2022; Strom et al. 2018). For the line ratios involving [N ii] in the bottom row, we show the full set of high-redshift points since robust calibrations could not be fit with existing data for N2, O3N2, and N2O2.
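We find that the local analog calibrations perform much better than the normal z ∼ 0 calibrations in matching the high-redshift sample, consistent with what was found using ground-based T e metallicities for ∼ 20 galaxies at z ∼ 2 by Sanders et al. (2020). One challenge in using local analogs is that their metallicity range is typically limited, which restricts the usable range of the calibration unless one relies on extrapolation. For example, the Bian et al. (2018) sample spans 12+log(O/H) = 7.8 − 8.4, failing to reach low enough metallicities to be relevant for many of the z > 6 or low-mass z ∼ 2 galaxies that now have JWST spectra. The Nakajima et al. (2022) analog calibrations only cover very low metallicities at 12+log(O/H) < 8.0. In contrast, our z = 2 − 9 auroral-line sample spans 12+log(O/H) = 7.0 − 8.4, offering useful calibrations over a much wider total metallicity range. The local analog calibrations match our new calibrations relatively well in O32 and Ne3O2, though there is some deviation at 12+log(O/H) 7.8, where the Nakajima et al. (2022) relation is flatter while the Bian et al. (2018) relation is steeper. For O3 and R23, the local analog calibrations tend to fall off more steeply in both the lower and upper branch than the high-redshift calibrations. Our new calibrations, derived directly from a large T e-based sample at z = 2 − 9, offer a more robust route to accurate strong-line metallicities in the early universe than relying on the indirect analog connection.

5.1.3. Model-based high-redshift calibrations
Strom et al. (2018) and Papovich et al. (2022) constructed semi-empirical calibrations using z ∼ 1 − 3 star-forming galaxies by first employing photoionization model grids to derive metallicities of the samples and then fitting strong-line ratio vs. metallicity relations to the resulting distributions. While these model-based calibrations reach high enough R23 values to match the peak of the high-redshift sample used in this work, the turnover point is shifted 0.2 − 0.3 dex higher in metallicity (top right panel of Fig. 6). Likewise, the Strom et al. (2018) N2 and O3N2 calibrations are shifted toward higher O/H at fixed line ratio relative to the high-redshift median. These model-based calibrations do not reproduce metallicities on the empirical T e scale.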

Nitrogen-based indicators at high redshift
The utility of line ratios including [N ii] to derive metallicities at high redshift has been a subject of debate since it was pointed out that N/O may evolve with redshift at fixed O/H or be less tightly coupled to O/H at high redshifts (e.g., Masters et al. 2014, 2016; Steidel et al. 2014, 2016; Sanders et al. 2016a; Strom et al. 2017, 2018, 2022; Hayden-Pawson et al. 2022; Sanders et al. 2023b). Interestingly, strong rest-UV N iii] and N iv] lines recently reported in the spectrum of GN-z11 at z = 10.6 potentially imply a super-solar N/O ratio despite an O/H of only ∼ 10% solar (Bunker et al. 2023; Cameron et al. 2023). Despite early JWST observations more than doubling the existing high-redshift auroral-line sample, [N ii] is only detected for 12/46 objects, with [N ii] upper limits dominating at 12+log(O/H) < 7.9. The inherent weakness of [N ii] in metal-poor high-redshift systems by itself suggests that N-based calibrations are far less useful than those based on O and Ne line ratios at z > 2. Comparing the median of the 12 [N ii]-detected sources to the literature calibrations, we find that the existing calibrations have lower N2 at fixed O/H on average than the high-redshift sample. The median O3N2, on the other hand, matches existing T e-based calibrations well, with the apparent evolution toward higher O3 and N2 at fixed O/H canceling out. Focusing on N2O2, we find that the [N ii]-detected high-redshift galaxies have higher N2O2 at fixed O/H than both z ∼ 0 normal and analog calibrations. Since N2O2 correlates tightly with N/O (e.g., Pérez-Montero & Contini 2009; Strom et al. 2017), this offset suggests that the N/O vs. O/H relation found at z ∼ 0 does not hold in the same form at high redshift. A significantly expanded sample of high-redshift galaxies with both T e measurements and [N ii] detections is required to robustly assess the N-based indicators. Until such a sample is available, metallicity indicators based on [N ii] should be used with great caution at high redshifts.

5.2. Applicability of the calibrations over z = 2 − 9
A question that must be addressed is whether a single set of calibrations can truly yield accurate results over the wide redshift range of our sample, spanning z = 2 − 9. We investigated residuals around the best-fit calibrations as a function of redshift, and did not find any significant trends. Figure 7 shows the residuals in R23 relative to our best-fit calibration for the combined auroral-line sample, displaying an essentially flat distribution. This result suggests that these calibrations are equally accurate at z ∼ 2 and z ∼ 8. Points in Fig. 7 are color-coded by O/H. We further find that, within this sample, metallicity is fairly evenly distributed as a function of redshift such that the fit in a particular metallicity range is not dominated by galaxies from the lower or higher end of the redshift range.
A single set of metallicity calibrations could hold over z = 2 − 9 if the typical ionization conditions in H ii regions do not significantly evolve over this redshift interval. Using a sample of 164 star-forming galaxies at z = 2 − 9 from CEERS, in Sanders et al. (2023a) we recently showed that galaxies at z = 2.7 − 6.5 fall on the same excitation sequence as z = 2.0 − 2.7 galaxies in the [O iii]/Hβ vs. [N ii]/Hα and [S ii]/Hα "BPT" diagrams and the O32 vs. R23 diagram. This result suggests that, within the constraints offered by the admittedly limited current JWST spectroscopic samples, ISM ionization conditions do not significantly change between z ∼ 2 and z ∼ 6. In contrast, clear evolution in these excitation sequences is present between z ∼ 0 and z ∼ 2, demonstrating distinct ionization conditions that manifest as distinct line ratio vs. metallicity sequences as shown in this work. More JWST spectroscopy is ultimately needed to confidently answer whether a single calibration set applies at z ∼ 2 and in the epoch of reionization, requiring tighter constraints on line ratio excitation sequences across this redshift range and more T e measurements such that the sample could be subdivided into multiple redshift bins.

Appropriate uses of the calibrations
To ensure robust metallicity inferences, it is important that strong-line calibrations like the ones constructed in this work are used appropriately. First, strong-line calibrations should ideally only be used on samples that fall within the same line ratio and metallicity range as the calibration sample; otherwise an uncertain extrapolation is required. For our new high-redshift calibrations, the valid range is 12+log(O/H) = 7.0 − 8.4. Second, it is clear in Fig. 5 and in Table 2 that the intrinsic scatter of these relations is large. This is generally true of all strong-line calibrations, whether local or high redshift, because the correlations between nebular metallicity and properties including ionization parameter, density, ionizing spectral shape, and N/O have significant scatter (e.g., Pérez-Montero & Contini 2009; Sanders et al. 2016a; Pérez-Montero 2014). Consequently, a metallicity derived via the strong-line method for a single object necessarily carries a large uncertainty. However, the average metallicity of a sample, potentially in multiple bins, can be determined with a high degree of accuracy, since the uncertainty due to intrinsic scatter is reduced by a factor of √N for a sample of N objects. Metallicities derived for single galaxies using our new calibrations should thus include the uncertainty due to the intrinsic scatter. Ideally, studies of the MZR and FMR in the high-redshift universe should utilize sufficiently large samples to statistically reduce the effects of this intrinsic scatter and obtain accurate mean relations. Finally, we caution against the use of these calibrations (or normal z ∼ 0 calibrations) with samples at intermediate redshifts (i.e., z = 0.5 − 1.5), which appear to have ionization conditions distinct from those at z ∼ 0 but less extreme than those at z ∼ 2 (e.g., Shapley et al. 2019; Hirtenstein et al. 2021).
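The two usage rules above (stay within the calibrated range, and account for intrinsic scatter) can be illustrated with a short sketch. The linear coefficients are placeholders rather than the Table 2 values, the function names are hypothetical, and the 0.25 dex scatter is the O32 value of σ O/H,int quoted in the text.

```python
import math


def metallicity_from_linear(log_r, c0, c1, pivot=8.0, valid=(7.0, 8.4)):
    """Invert log(R) = c0 + c1*x, x = 12+log(O/H) - pivot, and flag
    whether the result lies inside the calibrated metallicity range."""
    oh12 = pivot + (log_r - c0) / c1
    return oh12, valid[0] <= oh12 <= valid[1]


def sample_mean_uncertainty(sigma_int_oh, n_objects):
    """Uncertainty on a sample-mean metallicity from intrinsic scatter
    alone, which falls as 1/sqrt(N)."""
    return sigma_int_oh / math.sqrt(n_objects)


# Placeholder coefficients for illustration only.
demo_c0, demo_c1 = 0.5, -0.8
demo_oh, demo_ok = metallicity_from_linear(0.66, demo_c0, demo_c1)
demo_extrap_oh, demo_extrap_ok = metallicity_from_linear(1.5, demo_c0, demo_c1)

# Single object vs. a hypothetical bin of 25 objects, sigma = 0.25 dex.
demo_single = sample_mean_uncertainty(0.25, 1)
demo_binned = sample_mean_uncertainty(0.25, 25)
```

In this example a single galaxy carries the full 0.25 dex of intrinsic scatter, while the mean of a 25-object bin is uncertain by 0.05 dex from that term, before measurement errors are added.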

Areas of improvement for high-redshift calibrations
While the new calibrations presented here represent a major step forward for high-redshift metallicity determinations, there remain clear directions to improve these calibrations with additional observations. The metallicity ranges encompassing the lower and upper O3 and R23 branches at 12+log(O/H) ≲ 7.5 and 12+log(O/H) ≳ 8.3 are not well populated compared to the O3 and R23 peak where the majority of the sample lies. The O3 and R23 peak is the region in which [O iii]λ4364 will be brightest relative to Balmer lines like Hβ and Hα, such that this is the easiest metallicity range in which to detect [O iii]λ4364. Improving statistics for [O iii]λ4364 at both low and high metallicities requires deeper spectroscopy than the CEERS/NIRSpec medium-grating observations. Since this program had on-source integration times of ∼ 1 hour per grating, significant improvement in the limiting line flux will be achieved in JWST/NIRSpec programs featuring several-hour integrations. There is additional promise for obtaining more T e constraints at high metallicity (12+log(O/H) ≳ 8.4 = 0.5 Z⊙) using the low-ionization [O ii]λλ7322,7332 auroral lines, which should be detectable at higher metallicities and lower T e than [O iii]λ4364.
The paucity of [N ii] detections is another clear weakness of the current high-redshift auroral-line sample, preventing the robust assessment and formulation of calibrations for N-based line ratios that are among the most common metallicity indicators used for local-universe samples. Deeper spectroscopy is again the solution: achieving reasonable completeness in [N ii] for a sample similar to the one used in this study requires sufficient sensitivity to detect lines 30 times weaker than Hα (see also Sanders et al. 2023a; Shapley et al. 2023a).
A final and significant outstanding question is whether typical main-sequence galaxies at these redshifts follow the same calibration relations as the objects with detected auroral lines in the current sample. Selecting samples based on the detection of very faint auroral emission lines necessarily introduces a bias, typically toward high-SFR and high-[O iii] equivalent width sources. Sanders et al. (2020) demonstrated that this is indeed the case for the ground-based z ∼ 2 − 3 auroral-line sample. We find that the JWST auroral-line emitters at z ∼ 2 − 6 likewise fall above the mean star-forming main sequence at these redshifts (Fig. 4; Speagle et al. 2014). At z > 6, more JWST spectroscopy and imaging is required to characterize the typical star-forming population before we can robustly assess how representative (or not) the T e sample is. JWST shows great promise to address all of these shortcomings in the next few years.

SUMMARY AND CONCLUSIONS
We report detections of the temperature-sensitive [O iii]λ4364 auroral emission line for 16 galaxies at z = 2.1 − 8.7 from the CEERS survey, measured from medium-resolution JWST/NIRSpec observations. The [O ii]λλ7322,7332 auroral emission line was also detected for two of these sources, the first high-redshift galaxies with constraints on T e in both the low- and high-ionization zones. We combine the CEERS sample with 9 galaxies at z = 4 − 9 from the literature with auroral-line detections from JWST/NIRSpec and 21 objects with auroral-line detections from ground-based spectroscopy at z = 1 − 4. The combined high-redshift auroral-line sample comprises 46 star-forming galaxies at z = 1.4 − 8.7, more than doubling the sample size from ground-based observations over the past decade. We calculate T e and direct-method oxygen abundances for this sample, and construct the first T e-based empirical strong-line metallicity calibrations based purely on high-redshift galaxies (Fig. 5). These calibrations, presented in Table 2, are valid over the metallicity range 12+log(O/H) = 7.0 − 8.4.
Our new calibrations, derived directly from observations of high-redshift sources, represent a significant step forward in our ability to derive accurate metallicities in the early universe. Studies of the MZR and FMR at high redshifts no longer need to rely on local-universe metallicity calibrations or the indirect approach in which extreme local galaxies are assumed to be analogs of high-redshift systems. These measurements also provide important empirical tests for any theoretical photoionization model-based methods of deriving metallicities at high redshift.
We compared these calibrations to strong-line calibrations from the literature, finding that the high-redshift calibrations have higher O3, R23, O32, and Ne3O2 line ratios at fixed O/H relative to normal z ∼ 0 calibrations. This redshift evolution of strong-line calibration functions is driven by evolving ISM ionization conditions between z ∼ 0 and z ≥ 2 identified in studies of star-forming galaxies at z ∼ 2 − 3 (e.g., Steidel et al. 2016; Shapley et al. 2019; Topping et al. 2020a). Local analog calibrations display much better agreement with the high-redshift data for these line ratios and in particular reach the high O3 and R23 values that normal z ∼ 0 calibrations fail to reproduce, but still do not display consistent agreement with the high-redshift calibrations across all line ratios and regions of parameter space.
The current high-redshift T e sample features only 12 detections of the [N ii]λ6585 emission line, which enters some of the most commonly used metallicity indicators at z ∼ 0, including the N2 and O3N2 ratios. Consequently, calibrations of N-based indicators cannot yet be robustly assessed at high redshift. The current sample is also lacking good statistics at both very low (12+log(O/H) ≲ 7.5) and high (12+log(O/H) ≳ 8.4) metallicities. Deep spectroscopy with several hours of integration per pointing with JWST/NIRSpec promises to address both of these shortcomings by detecting fainter [O iii]λ4364 lines at low metallicity and low-ionization [O ii]λλ7322,7332 lines at high metallicity, while also increasing the detection rate of [N ii], which has proven to be very faint in z > 4 systems.
The new high-redshift metallicity calibrations presented in this work will yield an immediate improvement to strong-line metallicities in existing and future z > 2 spectroscopic samples.They are applicable over a red-

Figure 1. Redshift distribution of the combined high-redshift auroral-line sample (gray) and the constituent samples from CEERS JWST/NIRSpec observations (green), additional JWST/NIRSpec auroral-line emitters in the literature (red), and sources with auroral-line detections from ground-based spectroscopy (blue).

Figure 2. 1D and 2D spectra displaying the detected [O iii]λ4364 emission lines and Hγ for the 16 galaxies in the CEERS auroral-line sample. The black line displays the 1D science spectrum, while the gray shaded region shows the 1D error spectrum. The blue, green, and red solid lines display the best-fit continuum, Hγ line profile, and [O iii]λ4364 line profile, respectively. The dotted vertical lines show the rest-frame wavelengths of these transitions.

Figure 3. 1D and 2D spectra showing detections of [O ii]λλ7322,7332 for two CEERS targets.The blue line shows the best-fit continuum model, while the red line shows the combined fit to [O ii]λ7322 and [O ii]λ7332.

Figure 4. SFR vs. M* for the CEERS, JWST literature, and ground-based auroral-line samples, color-coded by redshift. The dashed lines show the mean star-forming main sequence parameterization of Speagle et al. (2014) evaluated at z = 2, z = 4, and z = 6 on the same color scale. All SFRs are derived from dust-corrected Hα or Hβ luminosity. Literature data and the Speagle et al. (2014) relation have been converted to a Chabrier (2003) IMF, and their SFRs have been lowered by 0.34 dex to account for the difference between the solar-metallicity conversion factors used in those works and the low-metallicity BPASS binary conversion factor employed here (Shapley et al. 2023b).
[S ii]λλ6718,6733 doublet when both lines are detected. The spectral resolution of R ∼ 1000 offered by the medium-resolution NIRSpec gratings is insufficient to resolve the components of the [O ii]λλ3727,3730 doublet, such that n e cannot be reliably constrained using [O ii] in the CEERS or ERO SMACS 0723 data, though observations taken with the R ∼ 2700 high-resolution NIRSpec gratings used in GLASS are sufficient. If the spectral resolution is too low to Nyquist sample the separation between the [O ii] doublet members, the inferred doublet ratio is biased toward unity (Sanders et al. 2016a). The pyneb Python package was used to calculate n e ([S ii]) using the S + collision strengths of Tayal & Zatsarinny (2010) assuming T e = 15,000 K, though this calculation is very weakly dependent on T e. Both components of the [S ii] doublet were detected with S/N ≥ 3 for 4 objects in the CEERS auroral-line sample, with the inferred densities ranging from the low-density limit to n e ≈ 1000 cm^−3 with large uncertainties (Table 1). Isobe et al. (2023) report n e ([O ii]) for GLASS 150029 and 160133, finding values of 158 cm^−3 and 234 cm^−3, respectively, yielding a total of 6/25 JWST sources in our samples with n e estimates. The same authors also report [O ii] densities for several of the CEERS auroral-line targets, but these n e constraints are unreliable because the separate components of the [O ii] doublet are not resolved at R ∼ 1000. In the ground-based auroral-line sample, 17/21 sources have n e measurements based on [O ii] from R ≥ 3000 spectra or [S ii],

T e ([O ii]) of $9710^{+880}_{-850}$ K for 11088 and $12{,}140^{+2940}_{-2790}$ K for 3788. These values are generally similar to their high-ionization temperatures of T e ([O iii]) ≈ 12,000 K.
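A rough check of the resolution argument made above for the [O ii] doublet: the sketch compares the nominal doublet separation with the width of one resolution element, λ/R. This simple criterion is an assumption adopted for illustration and is looser than the Nyquist-sampling condition mentioned in the text. The wavelengths are the nominal 3727 and 3730 Å labels, and the function names are hypothetical.

```python
def resolution_element(wavelength, resolving_power):
    """Width of one spectral resolution element, lambda / R."""
    return wavelength / resolving_power


def doublet_resolved(wave_a, wave_b, resolving_power):
    """True if the line separation exceeds one resolution element at the
    mean wavelength. Redshift cancels, since both scale by (1 + z)."""
    mean_wave = 0.5 * (wave_a + wave_b)
    separation = abs(wave_b - wave_a)
    return separation > resolution_element(mean_wave, resolving_power)


demo_medium = doublet_resolved(3727.0, 3730.0, 1000)   # medium gratings
demo_high = doublet_resolved(3727.0, 3730.0, 2700)     # high-res gratings
demo_sii = doublet_resolved(6718.0, 6733.0, 1000)      # [S ii] doublet
```

Under this criterion the [O ii] components blend at R ∼ 1000 but separate at R ∼ 2700, while the wider [S ii] doublet is separated even at R ∼ 1000, in line with the choice of density diagnostics described above.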
For the vast majority of the JWST sample, a low-ionization auroral line (e.g., [O ii]λλ7322,7332) is not detected such that T e ([O ii]) cannot be directly calculated. Following the common practice in local-universe abundance studies, we adopt a parameterized relation of T e ([O ii]) as a function of T e ([O iii]). We use the relation of Campbell et al. (1986):

$$T_e(\mathrm{O\,II}) = 0.7 \times T_e(\mathrm{O\,III}) + 3000\ \mathrm{K} \tag{9}$$

We use this relation to infer T e ([O ii]) for all objects in the sample with only direct T e ([O iii]) measurements. Of the two objects with measured T e ([O ii]) and T e ([O iii]), 11088 is offset 1.3σ from this line, while 3788 is consistent at the < 1σ level, suggesting that this relation is reasonable. However, there is notable uncertainty about the form of the T e ([O ii])−T e ([O iii]) relation even at z = 0, and the relation appears to have a large intrinsic scatter (Rogers et al. 2021). Our results do not significantly change if we instead assume T e ([O ii]) = T e ([O iii]). In the ground-based literature sources, T e ([O iii]) is based on [O iii]λ4364 for 8 objects and O iii]λ1666 for 11 objects. The remaining two ground-based objects have T e ([O ii]) measurements from [O ii]λλ7322,7332. The same T e ([O ii])−T e ([O iii]) relation was adopted in the ground-based literature analyses.
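The adopted temperature relation is simple enough to state as a one-line function. The sketch below only encodes the Campbell et al. (1986) relation quoted above; the function name is hypothetical, and the 12,000 K input is the approximate T e ([O iii]) mentioned in the text rather than a measurement for a specific object.

```python
def te_oii_from_te_oiii(te_oiii_kelvin):
    """Low-ionization zone temperature from the high-ionization one,
    T_e([O II]) = 0.7 * T_e([O III]) + 3000 K."""
    return 0.7 * te_oiii_kelvin + 3000.0


# For T_e([O III]) of about 12,000 K the relation predicts 11,400 K.
demo_te_oii = te_oii_from_te_oiii(12000.0)

# The two zones have equal temperature at 10,000 K; the relation gives
# a cooler low-ionization zone above that and a hotter one below it.
demo_crossover = te_oii_from_te_oiii(10000.0)
```

Because the slope is below unity, switching to the alternative assumption T e ([O ii]) = T e ([O iii]) changes the inferred low-ionization temperature most for the hottest objects.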

Figure 5. Strong-line ratios vs. direct-method oxygen abundance. The CEERS, JWST literature, and ground-based T e samples are shown in green, red, and blue, respectively. Orange diamonds represent median values in bins of O/H for the combined high-redshift sample. The solid black lines in the top two rows display the best-fit calibrations (Table 2 and equation 11). The gray shaded region shows the 1σ confidence interval on the best-fit relations.
analog calibrations only cover very low metallicities at 12+log(O/H) < 8.0. In contrast, our z = 2 − 9 auroral-line sample spans 12+log(O/H) = 7.0 − 8.4, offering useful calibrations over a much wider total metallicity range. The local analog calibrations match our new calibrations relatively well in O32 and Ne3O2, though there is some deviation at 12+log(O/H) 7.8, where the Nakajima et al. (2022) relation is flatter while the Bian et al. (2018) relation is steeper. For O3 and R23, the local analog calibrations tend to fall off more steeply in both the lower and upper branch than the high-redshift calibrations. Our new calibrations, derived directly from a large T e-based sample at z = 2 − 9, offer a more robust route to accurate strong-line metallicities in the early universe than relying on the indirect analog connection.

5.1.3. Model-based high-redshift calibrations
Strom et al. (2018) and Papovich et al. (

Figure 7. Residuals in R23 relative to the best-fit R23 calibration (Table 2) vs. spectroscopic redshift for the combined high-redshift sample, color-coded by O/H.The residuals have no systematic dependence on redshift.
for O 2+ and Kisielius et al. (2009) for O +. We assume that all O is in either the O 2+ or O + states inside H ii regions, such that

$$\frac{\rm O}{\rm H} = \frac{\rm O^{+}}{\rm H^{+}} + \frac{\rm O^{2+}}{\rm H^{+}}$$

O 2+ /H + is inferred from [O iii]λ5008/Hβ using T e ([O iii]), while O + /H + is derived from the dust-corrected [O ii]λ3728/Hβ ratio assuming the directly constrained T e ([O ii]) when available or T e ([O ii]) calculated with equation 9 otherwise. Two JWST targets lack [O ii]λ3728 coverage: CEERS 1651 and GLASS 150008. For CEERS 1651, [O ii] falls in the chip gap in the G235M observations, while the position of GLASS 150008 on the NIRSpec MSA mask was such that [O ii] fell off the detector (Jones et al. 2023). For the metallicity calculations of these two targets, we infer the dust-corrected [O ii]λ3728 flux from the dust-corrected [O iii]λ5008 flux assuming O32 = 5, a typical value for the JWST auroral-line sources, while O32 values uniformly distributed between 2 and 10 were adopted in the uncertainty calculations. The direct-method oxygen abundances are reported in Table 1 for the CEERS auroral-line sample. Metallicities of the CEERS objects range from 12+log(O/H) = 7.1 − 8.3 with a median value of 12+log(O/H) = 7.8 (0.13 Z⊙),
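The bookkeeping in this passage can be sketched as follows. The ionic abundances are taken as inputs, since deriving them requires temperature-dependent emissivities that are not reproduced here. The solar value 12+log(O/H) = 8.69 is an assumption chosen because it reproduces the solar fractions quoted in the text (7.8 ↔ 0.13 Z⊙, 8.4 ↔ 0.5 Z⊙). Function names and numerical inputs are hypothetical.

```python
import math
import random

SOLAR_OH12 = 8.69  # assumed solar 12+log(O/H); matches the quoted fractions


def oh12_from_ionic(o_plus, o_2plus):
    """Total oxygen abundance, assuming all O is O+ or O2+."""
    return 12.0 + math.log10(o_plus + o_2plus)


def solar_fraction(oh12, solar=SOLAR_OH12):
    """Linear metallicity in solar units for a given 12+log(O/H)."""
    return 10.0 ** (oh12 - solar)


def oii_flux_from_o32(oiii_flux, o32=5.0):
    """Stand-in [O II] flux when the line lacks coverage, from an
    assumed O32 = [O III]/[O II]."""
    return oiii_flux / o32


def draw_oii_fluxes(oiii_flux, n_draws, seed=0):
    """Monte Carlo [O II] fluxes with O32 drawn uniformly from 2 to 10."""
    rng = random.Random(seed)
    return [oiii_flux / rng.uniform(2.0, 10.0) for _ in range(n_draws)]


# Invented ionic abundances, for illustration only.
demo_oh12 = oh12_from_ionic(o_plus=1.0e-5, o_2plus=5.3e-5)
demo_z = solar_fraction(7.8)
demo_draws = draw_oii_fluxes(oiii_flux=10.0, n_draws=200)
```

Each drawn [O II] flux would be propagated through the O + /H + calculation so that the unknown O32 widens the final O/H uncertainty for the two targets without [O ii] coverage.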

Table 2. Best-fit coefficients for the high-redshift strong-line metallicity calibrations (equation 11 and Fig. 5).
Notes. Estimate of the formal uncertainty in R at fixed O/H of the best-fit calibration. d: Intrinsic scatter in R at fixed O/H about the best-fit calibration, after accounting for measurement uncertainties.