Evolution of the Mass–Metallicity Relation from Redshift z ≈ 8 to the Local Universe

A tight positive correlation between the stellar mass and the gas-phase metallicity of galaxies has been observed at low redshifts. The redshift evolution of this correlation can strongly constrain theories of galaxy evolution. The advent of JWST allows probing the mass–metallicity relation at redshifts far beyond what was previously accessible. Here we report the discovery of two emission line galaxies at redshifts 8.15 and 8.16 in JWST NIRCam imaging and NIRSpec spectroscopy of targets gravitationally lensed by the cluster RX J2129.4+0005. We measure their metallicities and stellar masses along with nine additional galaxies at 7.2 < z spec < 9.5 to report the first quantitative statistical inference of the mass–metallicity relation at z ≈ 8. We measure ∼0.9 dex evolution in the normalization of the mass–metallicity relation from z ≈ 8 to the local universe; at a fixed stellar mass, galaxies are 8 times less metal enriched at z ≈ 8 compared to the present day. Our inferred normalization is in agreement with the predictions of FIRE simulations. Our inferred slope of the mass–metallicity relation is similar to or slightly shallower than that predicted by FIRE or observed at lower redshifts. We compare the z ≈ 8 galaxies to extremely low-metallicity analog candidates in the local universe, finding that they are generally distinct from extreme emission line galaxies or “green peas,” but are similar in strong emission line ratios and metallicities to “blueberry galaxies.” Despite this similarity, at a fixed stellar mass, the z ≈ 8 galaxies have systematically lower metallicities compared to blueberry galaxies.


INTRODUCTION
The gas-phase metallicity of a galaxy measures its current state of chemical enrichment, holding a record of its star-formation history (SFH), gas infall, feedback, and merger history.These mechanisms are not identical for galaxies of different stellar mass at a given redshift, as evidenced by the positive empirical correlation between the gas-phase metallicity and stellar mass: the massmetallicity relation (van den Bergh 1968;Peimbert & Spinrad 1970;Lequeux et al. 1979).This correlation has been extensively studied in the local Universe with numerous works deriving a tight mass-metallicity relation that spans five decades of stellar mass from 10 7 M ⊙ to 10 12 M ⊙ and only starts to saturate in metallicity at M ⋆ > 10 10 M ⊙ (Tremonti et al. 2004;Lee et al. 2006;van Zee & Haynes 2006;Kewley & Ellison 2008;Mannucci et al. 2010;Berg et al. 2012a;Pérez-Montero et al. 2013;Pilyugin et al. 2013;Andrews & Martini 2013;Haurberg et al. 2013Haurberg et al. , 2015;;Lian et al. 2015;Ly et al. 2016;Blanc et al. 2019;Maiolino & Mannucci 2019;Curti et al. 2020;Sanders et al. 2021).
Despite these challenges, several groups have attempted to measure gas-phase metallicities at redshifts beyond 3.5 through alternative methods.Faisst et al. (2016) used a calibration of the rest-ultraviolet (UV) absorption lines to estimate the metallicity in three mass bins at z ≈ 5 from stacked spectra of a sample of 3.5 < z < 6.0 galaxies, detected in the cosmic evolution survey (COSMOS; Scoville et al. 2007) and spectroscopically confirmed with the Deep Imaging Multi-object Spectrograph (DEIMOS; Faber et al. 2003).Jones et al. (2020) presented a calibration of the ALMA-accessible (the Atacama Large Millimeter/submillimeter Array) far-infrared [O III]λ88µm emission-line intensity as a direct-method (i.e., calibrated against the "direct T e method") metallicity estimator, and used it to measure the metallicities of a sample of six galaxies (five with mass measurements) at z ≈ 8.However, these studies could not significantly constrain the mass-metallicity relation at high redshifts because of the small sample size and large statistical and/or systematic uncertainties (see Maiolino & Mannucci 2019, for a discussion on systematic uncertainties).
Tuned to reproduce the mass-metallicity relation at z < 3.5, theoretical models and simulations of galaxy evolution have predicted the shape and normalization of the mass-metallicity relation at higher redshifts.Ma et al. (2016) inferred the mass-metallicity relation and its evolution up to z = 6 from the FIRE simulations and demonstrated reasonable agreement with the observed relation and its evolution up to z = 3 for a broad range in stellar mass.Ma et al. (2016) concluded that the redshift evolution of the mass-metallicity relation coincides with the redshift evolution of the stellar mass fraction (see, e.g., Behroozi et al. 2013Behroozi et al. , 2019Behroozi et al. , 2020)), potentially pointing toward a universal relation between the stellar mass, gas mass, and metallicities.Although pending empirical confirmation, their results can be extrapolated to redshifts beyond the current observational limits.Similar conclusions have been made based on EAGLE (Schaye et al. 2015;Lagos et al. 2016;De Rossi et al. 2017), Illustris TNG (Torrey et al. 2019), and FirstLight (Langan et al. 2020) simulations and reproduced by semi-analytic models (see, e.g., Hirschmann et al. 2016;Ucci et al. 2023).
Furthermore, theoretical models and simulations have identified the stellar and active galactic nucleus (AGN) feedback-driven outflows, the metal content of the outflows in comparison to the interstellar medium (ISM), the shape and evolution of the stellar initial mass function (IMF), and the dependency of stellar yields on redshift and galaxy stellar mass as the primary drivers of shape and normalization of the mass-metallicity relation (see, e.g., Lian et al. 2018, and references therein).Probing the mass-metallicity relation at z ≈ 8 and beyond is of critical importance in characterizing the mechanisms deriving the shape and redshift evolution of the mass-metallicity relation, because this is the epoch when galaxies are expected to have much simpler star-formation histories (SFHs), feedback histories, and merger histories which allow for a more robust comparison with galaxy evolution theoretical models and simulations.
The NIRSpec instrument (Jakobsen et al. 2022) onboard JWST has already demonstrated tremendous capability in spectroscopically confirming the high-redshift NIRCam-selected candidates with relative ease (see, e.g., Carnall et al. 2023;Williams et al. 2022;Roberts-Borsani et al. 2022;Morishita et al. 2022).For the first time, NIRSpec enables high signal-to-noise ratio (S/N) detections of the rest-frame optical metallicity diagnostic emission lines with high spectral resolution; this has resulted in "direct T e " or "strong line" metallicity measurements of a growing sample of galaxies at z ≈ 8 and beyond (see, e.g., Curti et al. 2022;Schaerer et al. 2022;Williams et al. 2022).
In this work, we present the discovery of two galaxies detected in the field of the foreground lensing cluster RX J2129.4+0005, in imaging and spectroscopy acquired as part of a Director's Discretionary program (DD-2767; PI P. Kelly) to observe a strongly lensed background supernova.They have spectroscopic redshifts of z = 8.16 (RX2129-ID11002) and 8.15 (RX2129-ID11022), based on emission lines detected with NIRSpec prism observations.We obtain gas-phase metallicity measurements for these galaxies using restframe optical emission-line metallicity indicators.We combine these new measurements with literature JWST and ALMA metallicity measurements of galaxies at z ≈ 8 to construct a sample of eleven galaxies with "direct T e ", "strong line", or far-infrared emission line metallicity measurements at this redshift.We measure the stellar masses of the entire sample of eleven galaxies, and for the first time significantly constrain both the slope and the normalization of the mass-metallicity relation at z ≈ 8 as well as the evolution of its normalization from z ≈ 8 to the present day.
Young, low-metallicity galaxies in the nearby universe have been proposed as analogs of high-redshift galaxies.In particular, the so-called "extreme emission-line galaxies" (EELGs), "green peas," and "blueberry galaxies" are interesting candidates for having properties similar to those that are being revealed at high redshift.EELGs (Amorín et al. 2015) were identified in zCOSMOS as z < 1 galaxies with strong emission lines; higher redshift EELGs (z ≈ 3) were proposed to be analogs of very high redshift galaxies (Amorín et al. 2017).Green peas (Cardamone et al. 2009;Yang et al. 2017b) are compact SDSS galaxies with strong [O III] in the range 0.14 < z < 0.36; their properties are very similar to those of EELGs.Blueberry galaxies are similar to green peas, but are selected to be at low redshifts (z < 0.05) and hence probe fainter luminousities and lower stellar masses (Yang et al. 2017a).The first three JWST NIRSpec-identified galaxies at z ≈ 8 have been dis-cussed in this context and have been likened individually to green peas or blueberry galaxies (Schaerer et al. 2022;Taylor et al. 2022;Rhoads et al. 2022;Katz et al. 2023).For the six JWST detected galaxies at z ≈ 8, we find that their emission-line properties are very similar to those of blueberry galaxies as a population.However, we find that the z ≈ 8 galaxies stand out from the blueberry galaxies in the mass-metallicity diagram.At a given metallicity, z ≈ 8 galaxies have higher stellar masses than blueberry galaxies or green peas.
2. SAMPLE of z ≈ 8 GALAXIES With the addition of strong line metallicity 1 measurements for the RX2129-ID11002 and RX2129-ID11022 galaxies presented in this work (see Section 3.2 for metallicity measurements), we can construct a sample of eleven z ≈ 8 galaxies with available multiband photometry and metallicities measured through either the direct T e method or other empirical methods calibrated against this method (i.e., the strong line method and the Jones et al. (2020) calibration of the [O III]λ88µm emission-line intensity; see Section 3.2 for more details on both methods).This sample includes the RX2129-ID11027 galaxy presented in Williams et al. (2022), the three galaxies detected in the field towards the SMACS J0723.3−7327galaxy cluster (Carnall et al. 2023), and the pre-JWST sample compiled by Jones et al. (2020).In this Section we provide an overview of this sample.The spectroscopic redshifts, magnification factors, and original literature references for the entire sample are available in Table 4.

RX2129 galaxies
The imaging of the RX J2129.4+0005galaxy cluster (RX2129 for short) was obtained as part of the DD-2767 program (PI: Kelly) with the JWST NIRCam instrument in the F115W, F150W, F200W, F277W, F356W, and F444W filters.We present a color-composite image in Figure 1.The details of NIRCam observations and data reduction are presented in Williams et al. (2022).Spectra for a sample of high redshift galaxy candidates, identified using the EAZY (Brammer et al. 2008)  We do not expect these galaxies to be multiply imaged; the lensing magnifications are presented in Table 2.
the same DD program.The NIRSpec spectra were obtained in multi-object spectroscopy mode with the prism disperser, which provides wavelength coverage from 0.6 µm to 5.3 µm.The spectral resolution ranges from R ≈ 50 at the blue end to R ≈ 400 at the red end.Based on this spectra, three candidates were confirmed at z spec > 8: RX2129-ID11002, RX2129-ID11022, and RX2129-ID11027.
The photometry of RX2129-ID11002 and RX2129-ID11022 are presented in Table 1; their color-composite images are presented in the smaller panels of Figure 1.We measure the lensing magnification of these galaxies based on the model presented in Williams et al. (2022) (see also Caminha et al. 2019;Jauzac et al. 2021).This model is constructed using glafic (Oguri 2010(Oguri , 2021)), and the measured magnifications are further confirmed with the Zitrin-parametric code (Zitrin et al. 2015).The

Å]
Figure 5. Same as Figure 4, but for the RX2129-ID11022 galaxy at zspec = 8.15.The 1D spectrum is shown in Figure 3.We also present the NIRSpec spectra of RX2129-ID11002 and RX2129-ID11022 and establish their spectroscopic redshifts.The NIRSpec data of RX2129-ID11002 and RX2129-ID11022 were reduced following the method described in Williams et al. (2022); we measure z spec = 8.16 ± 0.01 and z spec = 8.15 ± 0.01, respectively.The reduced spectra of these galaxies are presented in Figures 2 and 3. We present their strong line analysis and metallicity measurements in Section 3.
The NIRCam photometry and NIRSpec spectra of the RX2129-ID11027 galaxy were reduced and presented in Williams et al. (2022), measuring z spec = 9.51 ± 0.01.Williams et al. (2022) also reported the strong line analysis and metallicity measurement of this galaxy.The ionization properties of all three RX2129 galaxies, including their UV magnitudes, UV slopes, escape fractions of ionizing radiation, and ionizing photon production efficiencies are reported in Lin et al. (2023).

Pre-JWST sample
Among the z ≈ 8 galaxies that have been spectroscopically confirmed prior to the launch of JWST, metallicities for six galaxies have been measured by Jones et al. (2020) using the ALMA-measured intensity of the [O III]λ88µm emission line (see Section 3.2 for more details).Multi-band photometry for five of these galaxies are available in the literature (see Table 4 for references).In the case of the BDF-3299 galaxy, the 6th galaxy in Jones et al. (2020) sample, there is a significant spatial offset between the far-infrared emission line and the restframe UV continuum.Therefore the measured metallicity does not correspond to the same region probed by photometry.Moreover, this galaxy has been detected in only one photometry band (Vanzella et al. 2011) which is not sufficient for an accurate stellar mass measurement; therefore we do not include it in our sample.

Line intensity measurement
We measure the intensity of emission lines in the NIR-Spec 1D spectrum of RX2129-ID11002 and RX2129-ID11022 using the Penalized PiXel-Fitting package (pPXF; Cappellari & Emsellem 2004;Cappellari 2017Cappellari , 2022)).pPXF adopts a maximum penalized likelihood method (Merritt 1997) to subtract the stellar continuum by modelling it with a stellar population and measures the line fluxes by fitting them with Gaussian profiles.We use the same pPXF setup as described in Williams et al. (2022), with the MILES stellar library (Sánchez-Blázquez et al. 2006;Falcón-Barroso et al. 2011).The pPXF Gaussian fits to the emission lines of RX2129-ID11002 and RX2129-ID11022 are shown in Figures 4  and 5, respectively.The measured emission line fluxes are reported in Table 2.
The [O III]λλ4959, 5007 Å doublet is resolved in our NIRSpec spectra.We fit each of the Å flux ratios of 3.07 ± 0.19 and 1.79 ± 0.48 for RX2129-ID11002 and RX2129-ID11022, respectively.The measured ratio for RX2129-ID11002 is in very good agreement with the 2.98 value set by atomic physics (Storey & Zeippen 2000).While this is not the case for RX2129-ID11022, it can be explained by the much lower S/N of the observed spectrum of this galaxy.We investigate this further by re-running pPXF with the ratio of [O III]λλ4959, 5007 Å doublet fixed to the 2.98 value set by atomic physics.The measured Table 2. Strong emission line flux measurements from the NIRSpec 1D spectrum of the RX2129-ID11002 and RX2129-ID11022 galaxies (see Figures 2 and 3).The flux and 1σ uncertainties are reported in units of 10 −19 erg s −1 cm −2 .These measurements are not corrected for lensing magnification.In the bottom row we report the gas-phase metallicities measured using the strong line method from Izotov et al. (2019) (see Section 3.2).λλ3727, 3729 3.51 ± 1 a Based on the model constructed using glafic (Oguri 2010(Oguri , 2021)), but also confirmed using the Zitrin-parametric code (Zitrin et al. 2015); see Williams et al. (2022) for a detail description of our lens model.b Includes both the statistical and systematic 1σ uncertainty c 1σ upper limit flux of [O III]λλ4959, 5007 Å doublet as well as the strong line metallicity measurement for RX2129-ID11002 remain intact.This setup results in measuring 2±17% less [O III]λλ4959, 5007 Å doublet flux for RX2129-ID11022, but only decreases the 1σ metallicity upper limit (see Section 3.2) for this galaxy by 0.21 dex.By re-fitting the mass-metallicity relation, we confirm that this does not affect its best-fit normalization and slope (see Section 5).

Emission line
In this Section we compare the emission line properties of our sample of z ≈ 8 galaxies with those of extremely low-metallicity analog candidates in the local Universe.Izotov et al. (2019) suggested the use of two emission line diagnostic diagrams to select extremely low-metallicity galaxies 2 : [O III]λ5007 Å/Hβ vs [O II]λλ3727, 3729 Å/Hβ and O32 vs (R23−0.08O32).This was motivated by their calibration of the "strong line" metallicity measurement method, where metallicity is calculated as a function of O32 and R23 (see Equation 1 and Section 3.2).Here, we compare the locations of z ≈ 8 NIR-Spec emission line galaxies on these diagnostic diagrams with those of the proposed low-metallicity local Universe analogs: EELGs, green peas, and blueberry galaxies.
Figure 6 shows the [O III]λ5007 Å/Hβ flux ratio plotted against the [O II]λλ3727, 3729 Å/Hβ flux ratio for the galaxies in our z ≈ 8 sample (large colored data points) for which these line intensity measurements are available; this only includes the six NIRSpec emission linedetected galaxies in RX2129 and SMACS0723.For the RX2129-ID11027 galaxy we use the line ratios as calculated in Williams et al. (2022).For the SMACS0723 galaxies we use the line ratios from Curti et al. (2022).We do not adopt any extinction correction for the z ≈ 8 galaxies, consistent with the negligible extinction reported for the RX2129-ID11027 and SMACS0723 galaxies (see Williams et al. 2022;Curti et al. 2022, respectively) as well as our photometry analysis of RX2129-ID11002 and RX2129-ID11022 in Section 4. Figure 7 shows the O32 plotted against (R23−0.08O32)for the galaxies in our z ≈ 8 sample (large colored data points).
The high [O III]λ5007 Å/[O II]λλ3727, 3729 Å ratio of these z ≈ 8 galaxies is typical of EELGs.In Figures 6  and 7, for comparison we also include the EELGs from the z ≲ 1 sample compiled by Amorín et al. (2015) from the zCOSMOS spectroscopic follow up survey (Lilly et al. 2007) of the COSMOS field (Scoville et al. 2007).These authors report extinction-uncorrected emission line flux measurements as well as the reddening constant c(Hβ) derived from either the Hα/Hβ or Hγ/Hβ ratios where available or spectral energy distribution fitting otherwise.We correct for extinction assuming a Cardelli et al. (1989) extinction law with R V = 3.1.
Young low-metallicity galaxies in the local Universe such as the green peas and blueberry galaxies (see Cardamone et al. 2009;Yang et al. 2017b,a) are proposed as spectroscopic analogs of the high redshift galaxies.Both samples are selected as extreme emission line galaxies with systematically low metallicities at a fixed stellar mass.In Figure 6 we also include the sample of green peas compiled in Yang et al. (2017b) as well as the blueberry galaxies from Yang et al. (2017a).
Figures 6 and 7 also show the 1σ and 2σ confidence intervals (determined using the seaborn package; Waskom 2021) for the sample of z ≈ 8 galaxies (dashed red contours) as well as the zCOSMOS EELGs (shaded green region), both calculated by weighting each entry in the sample by its 1σ line ratio uncertainties.We note that the confidence interval of the z ≈ 8 sample  The small dark-green data points show the measurements for the extreme emission line galaxies (EELGs) at zspec ≲ 1 from the zCOSMOS survey.The dashed red contours indicate the weighted 1σ and 2σ confidence intervals for the z ≈ 8 sample; since the weighted confidence intervals for this sample are dominated by the tight constraints on the SMACS0723-ID6355 galaxy we also show the unweighted 1σ and 2σ confidence intervals as the shaded red region.The shaded green region indicates the 1σ and 2σ confidence interval for the zspec ≲ 1 EELGs sample.Moreover, we show the blueberry (small purple data points) and green pea (small light-green data points) galaxies from Yang et al. (2017a,b), confirming their remarkable strong emission line similarities (see also Figure 7) to the emission line-detected galaxies at z ≈ 8; this is especially the case for the blueberry galaxies, almost all of which lie within the 2σ credible interval of the z ≈ 8 galaxies.
is dominated by the SMACS0723-ID6355 galaxy which has much smaller line ratio uncertainties compared to the rest of this sample; therefore we also show the unweighted 1σ and 2σ confidence intervals for the z ≈ 8 sample (shaded red region).The lack of overlap between the confidence interval regions of z ≈ 8 galaxies and EELGs, even at the 2σ level, strongly suggests that these galaxies are drawn from intrinsically different populations.
Blueberry galaxies, and to a lesser degree green peas, show significant similarities to the z ≈ 8 galaxies, with almost all the blueberry galaxies and ∼ half the green peas occupying the region within the 2σ confidence interval of z ≈ 8 galaxies.This suggests that these galaxies have similar emission line features to those of the z ≈ 8 galaxies, as well as similarly low metallicities.In Section 6.2 we will further investigate this similarity in the context of the mass-metallicity relation, where the addition of the stellar mass parameter may potentially distinguish the blueberries (and green peas) from z ≈ 8 emission line galaxies.
(1) 10 0 10 1 R23 − 0.08O32 10 −1 10 0 10 1 10 2 O32 weighted 1σ and 2σ regions for zCOSMOS EELGs unweighted 1σ and 2σ regions for z ≈ 8 galaxies weighted 1σ and 2σ regions for z ≈ 8 galaxies zCOSMOS EELGs at 0 < z < 1 Yang+2017 Blueberries at z ∼ 0 Yang+2017 Green Peas at 0.10 < z < 0.36 Å) for the 6 galaxies at z ≈ 8 with available NIRSpec strong emission line measurements (large colored data points).The shaded red region (dashed red line) show the unweighted (weighted) 1σ and 2σ confidence intervals of the z ≈ 8 sample.For comparison we show the zCOSMOS extreme emission line galaxies (EELGs) at zspec ≲ 1 (small dark-green data points) as well as their weighted 1σ and 2σ confidence intervals.We also show the blueberry (small purple data points) and green pea (small light-green data points) galaxies from Yang et al. (2017a,b).Both here and in Figure 6, blueberry galaxies (and green peas to a lesser degree) occupy a region similar to z ≈ 8 emission line-detected galaxies, which indicates that they have remarkably similar strong emission line features and metallicities.However, as shown in Figure 11, the addition of the stellar mass parameter can distinguish these local analog candidates from z ≈ 8 emission line galaxies in the context of the mass-metallicity relation.
This choice is motivated by the lack of significant detection of the [O III]λ4363 Å emission line in both galaxies (which is required for the direct T e method) and is shown to measure accurate oxygen abundances for lowmetallicity EELGs (for a full discussion see Izotov et al. 2019).The measured metallicities of RX2129-ID11002 and RX2129-ID11022 are reported in Table 2.In this Table , we also report the total metallicity measurement uncertainty which includes both the statistical and systematic uncertainties; the former is the propagation of line flux measurement uncertainties and the latter is the 0.05 systematic uncertainty of the Izotov et al. (2019) "strong line" metallicity calibration.
For the remaining galaxies in our z ≈ 8 sample we use the metallicity measurements as reported in the literature.The metallicity of the RX2129-ID11027 galaxy was measured in Williams et al. (2022) using the "strong line" method.The metallicities of the SMACS0723 galaxies were measured in Curti et al. (2022) and Schaerer et al. (2022) using the direct T e method, with significant discrepancies in the reported values.The main source of discrepancy seems to be the method used to reduce the NIRSpec data.We adopt the values reported by Curti et al. (2022) since the NIRSpec data used in this work had undergone extra reduction beyond the level 2 data products available on MAST (The Barbara A. Mikulski Archive for Space Telescopes) through the NIRSpec GTO pipeline (NIR-Spec/GTO collaboration, in preparation).Nevertheless, in Section 5 we investigate the effect of adopting the values reported in Schaerer et al. (2022).
We note that the metallicities of these six NIR-Spec emission line galaxies are not measured using the same method.This is forced by the lack of significant [O III]λ4363 Å detections for the three RX2129 galaxies, preventing the application of the direct T e method.Alternatively, the strong line method can be used for the entire sample of NIRSpec emission line galaxies to achieve homogeneous metallicity measurements.This is presented in Appendix B, where the effect of adopting "strong line" instead of "direct T e " metallicities for the SMACS0723 galaxies on the best-fit mass-metallicity relation is investigated in detail.In Appendix B we show that although the measured metallicities for two of the SMACS0723 galaxies differ slightly if the strong line method is used, the normalization and the slope of the best-fit mass-metallicity relation remain intact (see also Section 5).Hence, throughout the rest of this work we adopt the direct T e metallicity measurements for the SMACS0723 galaxies since this method is believed to yield more accurate metallicity measurements.
The metallicities of the pre-JWST sample were measured in Jones et al. (2020).These authors used a combination of the nebular [O III]λ88µm emission line and photometrically-measured star-formation rate (SFR) as a direct-method metallicity estimator (i.e.calibrated against the direct T e method).They report that their calibration yields 12 + log O/H values that are systematically offset by +0.2 from the direct T e method; we correct for this offset before adopting their measured metallicities.The offset-corrected values are reported in Table 4.
As demonstrated in Jones et al. (2020), [O III]λ88µmmeasured metallicities are in general not as accurate as metallicities measured through the rest-optical emission lines, namely the strong line and the direct T e methods.In particular, [O III]λ88µm-measured metallicities have a 0.2 dex 1σ scatter (after correction for the offset mentioned above) around the direct T e values.This is not a major source of concern for constraining the best-fit mass-metallicity relation since in this work both the statistical and the (large) systematic uncertainties of the [O III]λ88µm-measured metallicities are taken into account.This holds, if there is no significant systematic offset (beyond the +0.2 dex value reported in Jones et al. 2020, which has been corrected above) between the [O III]λ88µm-measured metallicities and those measured through rest-optical emission lines.
We note that both the strong line and the [O III]λ88µm metallicity diagnostics are calibrated against the direct T e method at relatively low redshifts and in the local Universe, respectively.Future high S/N NIRSpec observations of high redshift emission line galaxies capable of detecting the [O III]λ4363 Å line as well as the [O III]λλ4959, 5007 Å and [O II]λλ3727, 3729 Å doublets can be used to assess the strong line calibration against the direct T e method at these redshifts.This was investigated in Appendix B for the three SMACS0723 galaxies for which both metallicity diagnostics can be used; however, robust conclusions require larger samples.Similarly, NIRSpec follow-up observations of high redshift [O III]λ88µm emitters (e.g., see GO 1840) as well as ALMA follow-up observations of NIRSpec emission line galaxies will enable evaluating the calibration of [O III]λ88µm-measured metallicities against the restoptical emission line methods at these redshifts, and in particular investigating if there is any systematic offset beyond the +0.2 dex value measured at the local Universe.

PHOTOMETRY ANALYSIS
In this Section we use prospector (Johnson et al. 2021) to fit the available photometry of the galaxies in our z ≈ 8 sample (see Section 2) and infer their stellar masses.prospector explores the posterior probability distributions of stellar population parameters to find the values that best fit the observed photometry.The spectra for each set of drawn stellar population parameters is derived from FSPS (Flexible Stellar Population Synthesis; Conroy et al. 2009;Conroy & Gunn 2010), accessed through the Foreman-Mackey et al. ( 2014) python bindings.Here we use the built-in dynesty sampler (Speagle 2020; Koposov et al. 2022), a python-based sampler adopting the dynamic nested sampling method developed by Higson et al. (2019).
The prospector setup used in this work closely resembles that used in Johnson et al. (2021) for fitting the measured photometry of GN-z11, the highest redshift (z spec = 10.6)spectroscopically-confirmed galaxy to date (Oesch et al. 2016) 4 .We fix the redshift to the spectroscopically measured value.The star-formation history (SFH) is modelled nonparametrically with 5 temporal bins.The last bin spans 0-10 Myr (lookback time).The remaining bins are evenly spaced in log(lookback time) up to the maximum allowed age of the galaxy as determined by its spectroscopic redshift and the earliest possible onset of star-formation, which we assume to be at z = 35.Our stellar population free parameters include the total formed stellar mass (M ⋆, formed ); the stellar metallicity (Z ⋆ ); the nebular metallicity (Z neb ); the nebular ionization parameter (U neb , indicating the ratio of ionizing photons to the total hydrogen density); and the parameters controlling the dust attenuation and the intergalactic medium (IGM) attenuation.
We adopt the dust attenuation curve from Kriek & Conroy (2013) and include a diffuse dust component for the entire galaxy as well as a birth-cloud component for young stars.These two components are modelled with three free parameters: the diffuse dust optical depth at 5500 Å (τ V ); the ratio of the birth-cloud optical depth to the diffuse dust optical depth (r dust ); and the dust index (Γ dust ) controlling the power-law slope of the attenuation curve.The intergalactic medium (IGM) attenuation is included as a free parameter because the rest-frame photometry at wavelengths below 1216 Å is affected by IGM attenuation; prospector adopts the redshift-dependent IGM attenuation model suggested by Madau (1995).The only free parameter is the IGM fac-tor (f IGM ) which determines the normalization of Madau (1995) model.
The top panels in Figures 8 and 9 show the observed photometry and the best-fit (maximum a posteriori solution) spectra, respectively for RX2129-ID11002 and RX2129-ID11022.The 6 small panels at the bottom right of each Figure show the posterior probability distribution functions (PDFs) for a selection of free parameters.The dotted lines show the assumed priors.The bottom left panel shows the SFH.Similar Figures for the remaining 9 galaxies in our sample are available in Appendix C. The stellar mass posterior PDFs in these Figures show the total formed stellar mass without subtracting the accumulated mass of dead stars; before further analysis, we correct this by applying the correction factors calculated by prospector.In Table 4 we report the best-fit (log probability-weighted 50th percentile of the posterior PDF) surviving stellar mass and its 1σ uncertainty (16th and 84th percentiles); however, throughout this work the full posterior PDFs are used whenever stellar mass measurements are needed.
Our best-fit stellar mass measurements for the pre-JWST sample agree within 1σ with the lensing-corrected values used in Jones et al. (2020) which are adopted from Roberts-Borsani et al. (2020).These authors fit the photometry and the ALMA measurements of the [O III]λ88µm emission intensity and dust continuum with two-component SED models.The first component is a young starburst with strong nebular emission lines that contribute most of the flux in broadband photometry and determines the [O III]λ88µm emission.The second component is a more mature stellar population that does not necessarily dominate the photometry but dominates the dust continuum detected in ALMA Band 7 and constitutes the majority of stellar mass.The authors show that unlike the models which fit the SFH with a single parameterized young component, these two-component models can simultaneously reproduce the dust continuum constraints and the broad-band photometry especially for MACS1149-JD1 and A2744-YD4.Based on a log-likelihood comparison between their two-component and single-component fits, the authors conclude that the two-component models provide superior fits to the data.This is further validated by our measurements which strongly rule out the values inferred by their single-component SED fits.We do not measure a significant systematic offset between our mass measurements and those of Roberts-Borsani et al. (2020) (< 0.1 dex if the B14-65666 galaxy, which has 1 dex errorbars in Roberts-Borsani et al. 2020, is excluded).
We note that there is a significant ∼ 1 dex discrepancy between the stellar mass measurements of SMACS0723 galaxies reported in the literature (see, e.g., Curti et al. 2022;Schaerer et al. 2022;Tacchella et al. 2022;Carnall et al. 2023).The NIRCam photometry used in Curti et al. (2022) and Schaerer et al. (2022) were calibrated using the earlier versions of the calibration reference files (before the jwst 0989.map),where flux calibration offsets as high as 0.2 mag exist between different filters (see, e.g., Boyer et al. 2022).This offset can potentially bias the inferred stellar mass, as is implied by the systematically higher values measured in both studies compared to Carnall et al. (2023) despite the relatively similar adopted SFH models.The photometry used in Carnall et al. (2023) has undergone extensive flux calibration as detailed in Appendix C of Donnan et al. (2023), and is believed to be better calibrated compared to the calibrations achieved using the early NIRCam calibration reference files.
The photometry used in our analysis is the same as that used in Carnall et al. (2023).Nevertheless, the stellar mass measurements reported in Carnall et al. (2023) are systematically lower than our measurements by > 0.7 dex.These authors fit the photometry with a single-component parameterized (delayed exponential) SFH.As shown in Roberts-Borsani et al. ( 2020) (see their Table 3), such SFH models do not account for the more mature stellar population which constitutes the overwhelming majority of stellar mass.Roberts-Borsani et al. ( 2020) suggest that depending on the SFH such single-component models can underestimate the stellar mass by as much as 1.5 dex, well in line with the large offsets between our measurements and those of Carnall et al. (2023).
Our stellar mass measurements for SMACS0723-ID4590 and SMACS0723-ID10612 are consistent with the values inferred in Tacchella et al. (2022), where both the NIRCam photometry and NIRSpec spectrum are simultaneously fitted to infer the stellar mass.This agreement is expected since Tacchella et al. ( 2022) used a prospector setup similar to our setup; in particular, these authors adopted a nonparametric SFH model.Compared to our measurements, Tacchella et al. (2022) measured a slightly higher (< 0.5 dex) stellar mass for SMACS0723-ID6355.

MASS-METALLICITY RELATION
In this Section we combine the metallicity measurements from Section 3.2 and the stellar mass measurements from Section 4 (plotted as the colored data points in Figure 10) to infer the mass-metallicity relation at z ≈ 8.We use the method described in Appendix A.2 to fit the distribution of masses and metallicities with a linear relation of the form (adopted from Ma et al. 2016 where we search for the best-fit normalization Z g,10 (gasphase metallicity at 10 10 M ⊙ stellar mass) and slope γ g .The method adopted here (see Appendix A.2 for details) is best suited if it cannot be safely assumed that there are no outlier data point with severely underestimated uncertainties.As discussed in Section 3.2, this is likely the case for the measured metallicities.We find the bestfit values of γ g = 0.24 +0.13 −0.14 and Z g,10 = −1.08 +0.19 −0.16 .If the slope is fixed to the value found at lower redshifts (γ g = 0.30; Sanders et al. 2021), we find Z g,10 = −0.98 +0.09 −0.15 .These best-fit values are summarized in Table 3.
Following the discussion in Section 3.2, we investigate if using the Schaerer et al. ( 2022) metallicity measurements for the SMACS0723 galaxies, instead of the values adopted in this work from Curti et al. (2022), can significantly affect our results (see Taylor et al. 2022, for a discussion on the different metallicity determinations of the three SMACS0723 galaxies).Although this results in inferring a slightly shallower bestfit slope (γ g = 0.21 +0.15  −0.10 ), we do not report any meaningful change in its 1σ credible region.Similarly, the best-fit normalization does not change meaningfully.Moreover, we assess if defining the normalization at a stellar mass other than M ⋆ = 10 10 M ⊙ , which is the standard at lower redshifts, can affect our results.Equation 2 explicitly assumes that the best-fit line passes from Z g,10 at M ⋆ = 10 10 M ⊙ , which is ∼ 1 dex higher than the average stellar mass of our z ≈ 8 sample: M ⋆ = 10 8.8 M ⊙ .We modify this relation to instead infer the normalization at M ⋆ = 10 8.8 M ⊙ (by replacing 10 with 8.8 in Equation 2).The results remain intact; we find a best-fit Z g,8.8 = −1.33 +0.10 −0.09 (which corresponds to Z g,10 = −1.06+0.10 −0.09 ) and γ g = 0.21 +0.15 −0.13 , both in very good agreement with the results found adopting the standard normalization given by Equation 2. In the discussion of Section 6 and the rest of this work we adopt the best-fit mass-metallicity relation that was inferred above using the metallicities reported in Curti et al. (2022), normalized at the standard stellar mass of M ⋆ = 10 10 M ⊙ (see Table 3).
We note that the rest-frame optical emission line ratios of z ≈ 8 NIRSpec emission line-detected galaxies might be subject to contribution from AGN activity, which can potentially bias their measured metallicities.This speculation is particularly rooted in the increasing number of quasars detected at redshifts beyond z = 6.5 (see, e.g., Mortlock et al. 2011;Mazzucchelli et al. 2017;Bañados et al. 2018;Wang et al. 2018Wang et al. , 2019Wang et al. , 2021;;Matsuoka et al. 2019a,b;Reed et al. 2019;Yang et al. 2019, 2020, for the confirmed quasars, and Furtak et al. 2022, for a z ≈ 7.7 candidate).For instance, Brinchmann (2022) argues that SMACS0723-ID6355 hosts a narrow-line AGN.Interestingly, this is the source with the most notable deviation from our best-fit mass-metallicity relation, as can be seen in Figure 10.This is further emphasized by our method for finding the best-fit mass-metallicity relation, identifying the mass/metallicity measurements of this source as outliers (66 +21 −11 % probability; see Appendix A.2 for more details).As detailed in Appendix A.2, our method for fitting the mass-metallicity relation objectively prunes the outliers.Therefore, a few metallicity measurement outliers with severely underestimated uncertainties caused by AGNs are not expected to bias the inferred massmetallicity relation.Significant biases can only be expected if a major fraction of our sources are affected by AGN activity.This is unlikely to be the case, especially for the M ⋆ < 10 9 M ⊙ galaxies (which constitute the majority of the discovered z ≈ 8 NIRSpec emission line galaxies), a large fraction of which are not expected to host AGNs (e.g., see Habouzit et al. 2019) or host AGNs that are massive/active enough to significantly affect the rest-optical/UV spectrum (Volonteri et al. 2023).
There is a clear separation in stellar mass between the NIRSpec emission line-detected galaxies (log M ⋆,mean /M ⊙ = 8.3 +0.1 −0.5 ; round data points in Figure 10) and the pre-JWST galaxies (log M ⋆,mean /M ⊙ = 9.8 +0.2 −0.4 ; squared data points in Figure 10).The metallicities at the lower mass end are measured using the rest-optical emission lines and at the higher mass end using the far-infrared [O III]λ88µm emission line.This dichotomy might affect the inferred normalization and slope of the mass-metallicity relation.We check if the inferred normalization is affected by separately fitting mass-metallicity relations to each mass range, with slopes fixed to the value found at lower redshifts (γ g = 0.30).We find Z g,10 = −0.93+0.11  −0.16 for the six NIR-Spec emission line-detected galaxies at the lower mass end, and Z g,10 = −1.09+0.18  −0.23 for the pre-JWST galaxies at the higher mass end, well within 1σ uncertainties of one another.At this point, we cannot robustly evaluate the degree to which the inferred slope might be affected.This will become more clear with NIRSpec coverage of the higher mass end or with ALMA follow-up observations of NIRSpec emission line galaxies, providing more accurate calibrations of the [O III]λ88µm metallicity di- Figure 10.Mass-metallicity relation at z ≈ 8. Colored data points show the distribution of the measured masses and metallicities for the sources in our sample of z ≈ 8 galaxies.The solid red line and the shaded pink region respectively show the best-fit z ≈ 8 mass-metallicity relation and its 1σ uncertainty, if the mass-metallicity relation is fitted with a free slope.The solid orange line and the shaded orange region respectively show the best-fit z ≈ 8 mass-metallicity relation and its 1σ uncertainty, as inferred by fixing the slope to the empirical value at lower redshifts, γg = 0.3.The solid black line show the predicted mass-metallicity relation at z = 8 based on FIRE simulations (Ma et al. 2016, see), showing remarkable agreement with our findings.For comparison, we also show the best-fit mass-metallicity relation at lower redshifts.The green, blue, and dark purple lines respectively show the best-fit mass-metallicity relation at z ≈ 3.3, 2.3, and 0 inferred by Sanders et al. (2021) based on the data from MOSDEF survey.The light purple line shows the best-fit line at z ≈ 0 from Curti et al. (2020).There is a significant ∼ 0.9 dex evolution in the normalization of the mass-metallicity relation from z ≈ 8 to the local Universe; on average galaxies are 8 times more metal enriched at z ≈ 0 compared to z ≈ 8.The evolution persists by 0.5 and 0.4 dex with respect to the average galaxies at z ≈ 2.3 and z ≈ 3.3, respectively.The dashed section of each solid line indicates extrapolation beyond the investigated stellar mass range of the corresponding study.
agnostic against the rest-optical diagnostics at z ≈ 8 (see the discussion in Section 3.2).

Evolution of the mass-metallicity relation
The best-fit z ≈ 8 mass-metallicity relation as well as its 1σ uncertainty are shown in Figure 10 as the solid red line and the shaded pink region, respectively.First, we compare the normalization of the best-fit massmetallicity relation at z ≈ 8 with empirical constraints at lower redshifts, out to z ≈ 3.3, based on the results of Sanders et al. (2021).Our inferred normalization Z g,10 = −1.08 +0.19  −0.16 indicates a substantial ∼ 0.9 dex redshift evolution, i.e., decrease in metallicity at a fixed stellar mass, with respect to the z ≈ 0 measurements5 , where Z g,10 = −0.18(solid purple line in Figure 10 Yang+2017 Green Peas at 0.10 < z < 0.36 Figure 11.Mass-metallicity relation at z ≈ 8 compared with the local analog candidates, the blueberry and green pea galaxies.The large colored data points show the measured mass and metallicity of the galaxies in our z ≈ 8 sample; the solid orange (solid red) line shows the best-fit mass-metallicity relation with a fixed (free) slope and the shaded orange (shaded pink) region show its 1σ uncertainty region.Blueberry galaxies and green peas are shown with the small purple and light green data points, respectively.Only the green peas which lie within the 2σ confidence interval of the z ≈ 8 galaxies in the [O III]λ5007 Å/Hβ vs [O II]λλ3727, 3729 Å/Hβ and O32 vs R23−0.08×O32metallicity diagnostic plots (Figures 6 and 7, respectively) are shown.We plot all the blueberry galaxies, because Figures 6 and 7 indicate that they have similar metallicities to the z ≈ 8 emission line galaxies.This Figure shows that although blueberry galaxies and green peas have similar metallicities to the z ≈ 8 galaxies, this degeneracy is broken down by considering the mass-metallicity relation; at a fixed stellar mass, green peas and blue berries are at systematically higher metallicity compare to z ≈ 8 galaxies.light pink line shows the best-fit z ≈ 0 mass-metallicity relation from Curti et al. 2020, which extends to lower stellar masses).The inferred normalization at z ≈ 8 also indicates significant redshift evolution with respect to z ≈ 2.3 or z ≈ 3.3 (solid blue and solid green lines, respectively), where Z g,10 = −0.49and −0.59, respectively.
The slope γ g = 0.24 +0.13 −0.14 of the inferred massmetallicity relation at z ≈ 8 is slightly shallower than the slope measured at z ≈ 0, 2.3, or 3.3 (0.28, 0.30, 0.29, respectively; see Sanders et al. 2021), but they are consistent within 1σ uncertainties.To investigate this further, we adopted a fixed slope of γ g = 0.3 (consistent with the lower redshift observations) in Equation 2and repeated the analysis of Section 5 to find the best-fit normalization.This does not affect our inferred normalization Z g,10 = −0.98 +0.09 −0.15 , still showing substantial evolution with respect to lower redshifts; this fit and its 1σ uncertainty are shown as the solid orange line and the shaded orange region in Figure 10.Ma et al. (2016) inferred the mass-metallicity relation out to z = 6 from the simulated galaxies in the FIRE project, showing good agreement with the available empirical measurements out to z ≈ 3.5.Their inferred redshift evolution of Z g,10 can be extrapolated beyond z = 6.They predict Z g,10 = −0.98 at z = 6 and Z g,10 = −1.02 at extrapolated z = 8, both in 1σ agreement with our measured normalization at z ≈ 8. Never-theless, our measurement mildly favors the extrapolated normalization at z = 8.Similar to the case of empirical measurements at lower redshifts, our measurement at z ≈ 8 favors a shallower slope of the mass-metallicity relation compared to the Ma et al. (2016) predictions (γ g = 0.35), but the measurements are in agreement within 1σ uncertainties.The predicted z = 8 massmetallicity relation of Ma et al. (2016) is shown as the solid black line in Figure 10.

Comparison with the low-redshift analogs
Figures 6 and 7 show that z ≈ 8 emission line galaxies possess strong emission line features that are generally distinct from extreme emission lines galaxies (EELGs) at z < 1, but are similar to blueberry galaxies and, to some degree, green peas.Based on these plots we expect the z ≈ 8 galaxies to have similar metallicities to blueberry galaxies and a subset of green peas.This might suggest blueberry galaxies (and green peas to a lesser degree) as local analogs to the z ≈ 8 emission line galaxies.However, this analogy should be further considered in the context of mass-metallicity diagram.
Figure 11 shows the distribution of the measured stellar masses and metallicities of our sample of z ≈ 8 galaxies (large colored data points), as well as the 1σ uncertainty of their distribution around the best-fit massmetallicity relation (shaded pink and shaded orange regions, respectively for a free and fixed slope, see Section 6.1).The small purple data points show the distribution of blueberry galaxies, all of which show similar strong emission line features to those of z ≈ 8 emission line galaxies based on Figures 6 and 7. We also show a subsample of green peas (small green data points), consisting of the green peas that lie within the 2σ credible interval of the z ≈ 8 galaxies in Figures 6 and 7.Both the blueberry galaxies and green peas have metallicities similar to the z ≈ 8 galaxies, as expected.However, at a fixed stellar mass, the z ≈ 8 galaxies populate a region with lower metallicity compared to green peas and blueberry galaxies, and hence stand out in the massmetallicity diagram.
In Figure 11, we also show the Berg et al. (2012b) z ≈ 0 mass-metallicity relation for dwarf galaxies in the Spitzer Local Volume Legacy (LVL) survey (Dale et al. 2009).The Berg et al. (2012b) mass-metallicity relation is also consistent with the distribution of low metallicity galaxies from the Dark Energy Survey (Lin et al. 2022).We converted the stellar masses reported in Berg et al. (2012b) from Salpeter IMF to Chabrier IMF.This is not a representative sample of galaxies at z ≈ 0 but rather biased toward lower metallicities (e.g., see the discussion in Curti et al. 2020).Nevertheless, it is interesting to note that the Berg et al. (2012b) mass-metallicity relation coincides with the location of blueberries in this diagram.Despite the lower normalisation of this relation compared to the representative mass-metallicity relation at z ≈ 0 (the Curti et al. (2020) relation is shown as the solid pink line in Figure 11), it is systematically at higher normalization than that of the z ≈ 8 galaxies.
There are remarkable strong emission line and metallicity similarities between the z ≈ 8 emission line galaxies and extremely low metallicity local analog candidates, especially blueberry galaxies.However, the persistent systematically lower metallicities of z ≈ 8 galaxies at a fixed stellar mass with respect to the local analog candidates suggests potential differences in their star formation, feedback, and enrichment histories.In Langeroodi & Hjorth (2023), we further investigate the metallicity deficiency of high redshift galaxies in the context of the fundamental metallicity relation, showing that the ultraviolet compactness can be used as a tracer of lowest metallicity galaxies.

CONCLUSION
We present the JWST NIRCam photometry and NIR-Spec spectra of two galaxies at z ≈ 8.15 detected in the field towards the lensing cluster RX J2129.4+0005.We used these galaxies as well as 9 other galaxies at 7 < z < 9 from the literature to compile a sample of 11 galaxies at z ≈ 8 with stellar mass and gas-phase metallicity measurements.Using this sample we establish the mass-metallicity relation at z ≈ 8.
The normalization of the mass-metallicity relation has evolved significantly from z ≈ 8 to the local Universe; metallicity at a fixed stellar mass has increased significantly from z ≈ 8 to z ≈ 0. We compared our results with the mass-metallicity relation inferred by Sanders et al. (2021) at z ≈ 0. The normalization of our best-fit mass-metallicity relation at z ≈ 8 is ∼ 0.9 dex lower than the normalization at z ≈ 0; galaxies are on average 8 times less metal enriched at z ≈ 8 compared to the local Universe.Furthermore, the evolution persists by ∼ 0.5 dex and ∼ 0.4 dex respectively, compared to the z ≈ 2.3, 3.3 empirical results of Sanders et al. (2021).The galaxies observed at z ≈ 8 are on average half as enriched as the galaxies at z ≈ 3.3, the highest redshift up to which the mass-metallicity relation has been probed prior to JWST.This implies a remarkably rapid enrichment epoch in the early Universe, when in less than 3.5% of the lifetime of an average galaxy (< 450 Myr at z ≈ 8, assuming the galaxy starts forming at z = 20) almost 10% of its enrichment has already happened.
In general, our results agree well with the evolution of the mass-metallicity relation as predicted by the FIRE simulations (Ma et al. 2016).Our measured normalization of the mass-metallicity relation at z ≈ 8 agrees within a few 0.01 dex with Ma et al. (2016) predictions, well below the statistical uncertainty of our measurement.
We tested the particular case where we did not fix the slope of the best-fit mass-metallicity relation to the slope suggested based on simulations or lower redshift observations.In this case, our inferred slope (γ g = 0.24) is slightly shallower than the measured slope at lower redshift (γ g = 0.3) or the slope predicted by simulation (γ g = 0.35).However, we cannot rule out these slopes, since they are within the 1σ uncertainty of our measurement.Compiling larger samples of z ≈ 8 galaxies will address this further.
We compared the z ≈ 8 galaxies with potential analogs in the low redshift Universe, based on the [O III]λ5007 Å/Hβ vs [O II]λλ3727, 3729 Å/Hβ and the O32 vs (R23-0.08O32)diagnostic plots.We find that galaxies detected at z ≈ 8 have emission line features that are distinct from extreme emission line galaxies at z ≈ 0 − 1, and have systematically lower metallicities.However, there seems to be remarkable similarities in the emission line features of the blueberry galaxies (and to some degree green peas) and the z ≈ 8 emission line galaxies.We investigated this further in the context of the mass-metallicity diagram: at a fixed stellar mass, the z ≈ 8 galaxies have systematically lower metallicities compared to blueberry galaxies and therefore stand out in the mass-metallicity diagram.
We thank the anonymous referee whose comments helped improve the robustness of our analysis and conclusions.We thank Gabe Brammer for his identification of the high-redshift galaxy targets we present during the mask design process, and his contributions to their analysis.We also  The JWST NIRCam and NIRSpec data presented/used in this paper were obtained from the Mikulski Archive for Space Telescopes (MAST) at the Space Telescope Science Institute.The specific observations analyzed can be accessed via DOI.(Conroy et al. 2009;Conroy & Gunn 2010), glafic (Oguri 2010(Oguri , 2021)), prospector (Johnson et al. 2021), pPXF (Cappellari & Emsellem 2004;Cappellari 2017Cappellari , 2022)), seaborn (Waskom 2021), Zitrin-parametric code (Zitrin et al. 2015) ln p( q We adopt a flat prior in the range [0, 1] on P b , and flat priors in ranges [6, 9] and [0, 6] on Y b and V b motivated by the range of 12 + log(O/H) i,truth .We marginalize over the nuisance parameters, {q i } N i=1 , P b , Y b , V b , to report the best-fit γ g and Z g,10 .The strength of this method is that, apart from the parameters of interest, it also constraints the posterior PDF of a given data point being an outlier (q i ).

B. STRONG LINE METHOD METALLICITY MEASUREMENTS FOR THE SMACS0723 GALAXIES
As mentioned in Section 3.2, the methods used to measure the metallicities of our sample of NIRSpec emission linedetected galaxies are not homogeneous: the direct T e method was used to measure the metallicities of the SMACS0723 galaxies, while the strong line method was used for the RX2129 galaxies.Here we measure the metallicities of the SMACS0723 galaxies using the strong line method in order to build a sample with more homogeneously measured metallicities: Six NIRSpec emission line galaxies with strong line method metallicity measurements and five pre-JWST galaxies with ALMA [O III]λ88µm emission line metallicity measurements.We use this sample to repeat the analysis of Section 5 and find the best-fit mass-metallicity relation for the multiple setups considered in this work: adopting Equation 2 with a fixed slope (γ g = 0.3); adopting Equation 2 with a free slope; and adopting Equation 2 with a free slope with the normalization defined at 10 8.8 M ⊙ (the average stellar mass of our sample) instead of 10 10 M ⊙ (see Section 5 for more details on the motivation for this setup).
Using the strong line method, we measure 12+log(O/H) = 7.19±0.14,7.80±0.010,and 7.64±0.04for SMACS0723-ID4590, SMACS0723-ID6355, and SMACS0723-ID10612, respectively.We compare the strong line measurements with the direct T e measurements in Figure 12.SMACS0723-ID6355 is the galaxy where the strong line method and the direct T e method are most in disagreement.Interestingly this is the galaxy that is most offset from our best-fit massmetallicity relation (see Figure 10), most confidently identified as an outlier metallicity measurement with severely Figure 12. "Strong line" metallicity measurements plotted against the "direct Te" metallicity measurements (from Curti et al. 2022) for the SMACS0723 galaxies.Although metallicities for individual galaxies differ slightly depending on the adopted metallicity measurement method, we confirm that the overall effect on the best-fit mass-metallicity relation is negligible.underestimated uncertainties by the algorithm we used to fit the mass-metallicity relation (see Appendix A.2), and also is suggested to likely host an AGN by Brinchmann (2022).
Using these new metallicity measurements and following the method used in Section 5 (see also Appendix A.2), we measure a best-fit Z g,10 (normalization) = −1.03+0.10 −0.10 for the mass-metallicity relation if the slope is fixed to γ g = 0.3; this is in very good agreement with the normalization measured for a similar setup in Section 5 where the direct T e metallicities for the SMACS0723 galaxies are adopted.Similarly, we measure Z g,10 = −1.05+0.13  −0.15 and γ g = 0.24 +0.11 −0.11 if the slope is free, and Z g,8.8 = −1.31+0.09 −0.08 and γ g = 0.23 +0.13 −0.15 if the slope is free and the normalization is measured at 10 8.8 M ⊙ , both in very good agreement with the values reported in Section 5 where the direct T e metallicities for the SMACS0723 galaxies are adopted.Hence, although the metallicities for individual SMACS0723 galaxies differ slightly if the strong line method is used instead of the direct T e method, the overall effect on the best-fit mass-metallicity relation seems negligible.
C. BEST-FIT STELLAR POPULATIONS TO OUR SAMPLE OF z ≈ 8 GALAXIES In this Appendix we show the best-fit photometry and spectrum, the posterior PDFs of the stellar populations, and the SFHs for the remaining 9 sources in our sample of z ≈ 8 galaxies; the same plots for the RX2129-ID11002 and RX2129-ID11022 galaxies were shown in Section 4 (Figures 8 and 9).

Figure 2 .
Figure 2. NIRSpec 2D (top panel) and 1D (bottom panel) spectra of the RX2129-ID11002 galaxy at zspec = 8.16.The red dashed lines show the identified emission lines, and the thin grey lines show the 1σ uncertainties.The fits to the emission lines and continuum are shown in Figure 4, and the emission line flux measurements are presented in Table2.The spectrum is not corrected for lensing magnification.

Figure 3 .
Figure 3. NIRSpec 2D (top panel) and 1D spectra (bottom panel) of the RX2129-ID11022 galaxy at zspec = 8.15.The red dashed lines show the identified emission lines, and the thin grey lines show the 1σ uncertainties.The fits to the emission lines and continuum are shown in Figure 5, and the emission line flux measurements are presented in Table2.The spectrum is not corrected for lensing magnification.The blue part of the spectrum is missing for this galaxy because the NIRSpec MSA configuration caused the spectrum to only partially fall on the detector.

Figure 4 .
Figure 4. Emission lines fits (using pPXF) to the NIRSpec 1D spectrum of the RX2129-ID11002 (see Figure 2) galaxy at zspec = 8.16.The dashed orange line shows the stellar continuum fit and the dashed blue line shows the Gaussian fits to the emission lines; the solid purple line shows the combined continuum plus emission lines fit.The line widths are fixed to the spectral resolution of NIRSpec prism at the observed wavelength.The measured emission line fluxes are reported in Table2.

Figure 6 .
Figure 6.[O III]λ5007 Å/Hβ line flux ratio plotted against [O II]λλ3727, 3729 Å/Hβ line flux ratio for 6 galaxies at zspec ≈ 8 (large colored data points), as inferred from the JWST NIRSpec observations of the RX2129 and SMACS0723 lensing clusters.The small dark-green data points show the measurements for the extreme emission line galaxies (EELGs) at zspec ≲ 1 from the zCOSMOS survey.The dashed red contours indicate the weighted 1σ and 2σ confidence intervals for the z ≈ 8 sample; since the weighted confidence intervals for this sample are dominated by the tight constraints on the SMACS0723-ID6355 galaxy we also show the unweighted 1σ and 2σ confidence intervals as the shaded red region.The shaded green region indicates the 1σ and 2σ confidence interval for the zspec ≲ 1 EELGs sample.Moreover, we show the blueberry (small purple data points) and green pea (small light-green data points) galaxies fromYang et al. (2017a,b), confirming their remarkable strong emission line similarities (see also Figure7) to the emission line-detected galaxies at z ≈ 8; this is especially the case for the blueberry galaxies, almost all of which lie within the 2σ credible interval of the z ≈ 8 galaxies.

Figure 7 .
Figure 7. Strong line metallicity indicator comparison.This Figure shows O32 ([O III]λ5007 Å/[O II]λλ3727, 3729 Å) plotted against R23−0.08×O32(R23×Hβ = [O II]λλ3727, 3729 Å+ [O III]λ4959 Å+ [OIII]λ5007 Å) for the 6 galaxies at z ≈ 8 with available NIRSpec strong emission line measurements (large colored data points).The shaded red region (dashed red line) show the unweighted (weighted) 1σ and 2σ confidence intervals of the z ≈ 8 sample.For comparison we show the zCOSMOS extreme emission line galaxies (EELGs) at zspec ≲ 1 (small dark-green data points) as well as their weighted 1σ and 2σ confidence intervals.We also show the blueberry (small purple data points) and green pea (small light-green data points) galaxies fromYang et al. (2017a,b).Both here and in Figure6, blueberry galaxies (and green peas to a lesser degree) occupy a region similar to z ≈ 8 emission line-detected galaxies, which indicates that they have remarkably similar strong emission line features and metallicities.However, as shown in Figure11, the addition of the stellar mass parameter can distinguish these local analog candidates from z ≈ 8 emission line galaxies in the context of the mass-metallicity relation.

Figure 8 .
Figure8.SED-fitting results for the RX2129-ID11002 galaxy.Top panel shows the observed (orange circles) and best-fit photometry (green squares) as well as the best-fit spectra (green line).The six smaller panels on the bottom right show the probability distribution functions (PDFs) of the stellar population synthesis parameters.The dotted lines show the assumed priors.The stellar mass here refers to the total stellar mass formed before correcting for the dead stars.Bottom left panel shows the star formation history (SFH) modelled nonparametrically with 5 temporal bins.All the parameters are corrected for lensing magnification.

Figure 9 .
Figure 9. Same as Figure 8, but showing the SED-fitting results for the RX2129-ID11022 galaxy.

Table 2 .
The spectrum is not corrected for lensing magnification.The blue part of the spectrum is missing for this galaxy because the NIRSpec MSA configuration caused the spectrum to only partially fall on the detector.

Table 4 .
Jones et al. 2020mass (lensing corrected) and metallicity for our z ≈ 8 sample.Only the statistical uncertainty; an extra ±0.2 dex systematic uncertainty should be considered (seeJones et al. 2020).
appreciate Program Coordinator Patricia Royle, and Program Scientists Armin Rest, Diane Karakala, and Patrick Ogle for their efforts with short turnaround that made the follow-up observations a success.D.L. and J.H. were supported by a VILLUM FONDEN Investigator grant (project number 16599).P.L.K. is supported by NSF grant AST-1908823 and anticipated funding from JWST DD-2767.A.Z. acknowledges support by Grant No. 2020750 from the United States-Israel Binational Science Foundation (BSF) and Grant No. 2109066 from the United States National Science Foundation (NSF), and by the Ministry of Science & Technology, Israel.J.M.D. acknowledges the support of projects PGC2018-101814-B-100 and MDM-2017-0765.A.V.F. is grateful for financial assistance from the Christopher R. Redlich Fund and numerous individual donors.The UCSC team is supported in part by NSF grant AST-1815935, the Gordon & Betty Moore Foundation, and by a fellowship from the David and Lucile Packard Foundation to R.J.F.M.O.acknowl-