The Heavy Metal Survey: Star Formation Constraints and Dynamical Masses of 21 Massive Quiescent Galaxies at z = 1.3–2.3

In this paper, we present the Heavy Metal Survey, which obtained ultradeep medium-resolution spectra of 21 massive quiescent galaxies at 1.3 < z < 2.3 with Keck/LRIS and MOSFIRE. With integration times of up to 16 hr per band per galaxy, we observe numerous Balmer and metal absorption lines in atmospheric windows. We successfully derive spectroscopic redshifts for all 21 galaxies, and for 19 we also measure stellar velocity dispersions (σ_v), ages, and elemental abundances, as detailed in an accompanying paper. Except for one emission-line active galactic nucleus, all galaxies are confirmed as quiescent through their faint or absent Hα emission and evolved stellar spectra. For most galaxies exhibiting faint Hα, elevated [N ii]/Hα suggests a non-star-forming origin. We calculate dynamical masses (M_dyn) by combining σ_v with structural parameters obtained from the Hubble Space Telescope COSMOS(-DASH) survey and compare them with stellar masses (M_*) derived using spectrophotometric modeling, considering various assumptions. For a fixed initial mass function (IMF), we observe a strong correlation between M_dyn/M_* and σ_v. This correlation may suggest that a varying IMF, with high-σ_v galaxies being more bottom heavy, was already in place at z ∼ 2. When implementing the σ_v-dependent IMF found in the cores of nearby early-type galaxies and correcting for biases in our stellar mass and size measurements, we find a low scatter in M_dyn/M_* of 0.14 dex. However, these assumptions result in unphysical stellar masses, which exceed the dynamical masses by 34%. This tension suggests that distant quiescent galaxies do not simply grow inside-out into today's massive early-type galaxies and that the evolution is more complicated.


INTRODUCTION
The majority of stars in today's universe live in early-type galaxies with quiescent stellar populations (e.g., Muzzin et al. 2013a). These galaxies are massive and large, exhibit little-to-no rotation, and are thought to have formed the majority of their stars at high redshifts (e.g., Thomas et al. 2005; McDermid et al. 2015). Nonetheless, despite the wealth of information from low-redshift studies, the formation histories of massive early-type galaxies are still poorly understood.
To quantify the growth of massive galaxies and understand the physical processes driving this evolution, it is imperative to directly observe them during their early stages. Such studies find that massive galaxies with quiescent stellar populations already existed when the Universe was only a fraction of its current age. These distant quiescent galaxies were first identified almost two decades ago (e.g., Franx et al. 2003; Cimatti et al. 2004; Glazebrook et al. 2004) and have been found to dominate the massive end of the galaxy distribution out to z ∼ 2.5 (e.g., Muzzin et al. 2013a; Tomczak et al. 2014; McLeod et al. 2021). Galaxy formation models originally failed to predict this quiescent galaxy population and, almost two decades later, are still struggling to explain its presence.
Our poor understanding of this galaxy population is primarily due to the difficulty of obtaining high-quality spectra. Quiescent galaxies typically do not have bright emission lines, and thus we rely on faint stellar absorption features to measure redshifts and learn about their stellar, chemical, and kinematic properties. Obtaining such spectra is even more challenging at z ≳ 1, as the bulk of the stellar spectrum is shifted to near-IR wavelengths.
In recent years, spectroscopic studies have pushed to even higher redshifts (z > 3; e.g., Glazebrook et al. 2017; Schreiber et al. 2018; Tanaka et al. 2019; Forrest et al. 2020; Esdaile et al. 2021; Carnall et al. 2023; Antwi-Danso et al. 2023). At the same time, deeper observations have enabled the first measurements of chemical abundances and resolved stellar kinematics at z > 2. These initial studies show intriguing results and demonstrate the power of using such measurements to gain insights into the formation mechanisms of distant quiescent galaxies. First, these galaxies have old ages and extreme chemical abundance patterns (Kriek et al. 2016; Jafariyazani et al. 2020), indicating that they formed their stars in early vigorous bursts, followed by an efficient quenching process. Second, they appear to be rotationally supported (Newman et al. 2015, 2018a; Toft et al. 2017).
However, these studies are based on a few very massive and/or lensed galaxies, and many questions remain. We do not know how these galaxies became so massive at such early epochs, when and how fast they formed and assembled their mass, whether they are supported by rotation or random motions, which physical processes are responsible for halting their star formation, and how they evolved into the massive early-type galaxies in today's universe. Addressing these questions requires statistical samples of distant quiescent galaxies with ultradeep spectra covering several Balmer and metal absorption lines, which enable stellar, chemical, and kinematic measurements.
In order to obtain such a spectroscopic galaxy sample, we have conducted the Heavy Metal Survey with MOSFIRE (McLean et al. 2012) and LRIS (Oke et al. 1995) on the Keck I telescope. The Heavy Metal survey observes 21 "bright" quiescent galaxies selected to be in two redshift intervals, 1.30 < z < 1.50 and 1.92 < z < 2.28, as well as many more star-forming and fainter quiescent galaxies at similar redshifts. With integration times of up to 16 hr per filter per mask, we observe numerous Balmer and metal absorption lines. While this distant quiescent galaxy sample is not as large as the sample by Belli et al. (2017a, 2019; ∼30 galaxies at z > 1.35), it is unique for its wavelength coverage and is the only survey so far that obtains ultradeep spectra at rest-frame ∼4800–5400 Å for a sample of distant quiescent galaxies. This wavelength range targets the strongest α-element absorption line (i.e., Mgb) in the rest-frame optical, as well as several prominent Fe lines.
In this paper we present our survey design and observational strategy, data reduction and overview (Section 2), methods to derive spectral properties (Section 3), and characteristics of the galaxy sample (Section 4), and we discuss the implications of our findings for galaxy evolution studies (Section 5). The primary science application of this data set, the chemical abundance measurements, will be presented in an accompanying paper (Beverage et al. 2023b). The spectra and chemical abundances for the primary galaxies in the first Heavy Metal mask were also presented in Kriek et al. (2019). Several other science applications, including molecular gas properties and active galactic nuclei (AGN) outflows, will be presented in future papers (K. Suess et al. in preparation; Y. Ma et al. in preparation).
Figure 1 (caption): Each row represents a different spectral feature, as indicated on the left. The color of the feature reflects the primary origin of the chemical element, as indicated in the bottom-right box. The gray bars indicate whether a feature is visible in a certain filter, with the different shades of gray corresponding to the different filters (as indicated in the top right) and the gradations for each filter indicating the throughput. The Heavy Metal low (1.30 < z < 1.50) and high (1.92 < z < 2.28) redshift intervals are indicated by the dashed and dotted vertical lines, respectively.
Throughout this work we assume a ΛCDM cosmology with Ω_m = 0.3, Ω_Λ = 0.7, and H_0 = 70 km s⁻¹ Mpc⁻¹. All magnitudes are given in the AB-magnitude system (Oke & Gunn 1983). The wavelengths of all emission and absorption lines are given in vacuum.

Survey Design
The Heavy Metal survey aims to study the formation histories of massive quiescent galaxies using stellar, chemical, and kinematic measurements. Achieving this goal requires (i) a statistically significant sample of ∼20 distant quiescent galaxies with (ii) ultradeep rest-frame optical spectroscopy covering several hydrogen, iron, and α-element absorption features, and (iii) ancillary datasets including ultradeep multiwavelength photometry and high-resolution imaging.
To that end, we executed the Heavy Metal survey in the overlapping area of the UltraVISTA (McCracken et al. 2012), COSMOS (Scoville et al. 2007), and COSMOS-DASH (Momcheva et al. 2017; Mowla et al. 2018) surveys, using the LRIS and MOSFIRE spectrometers on the Keck I telescope. The UltraVISTA survey provides deep multiwavelength photometry, while the F814W and F160W imaging from COSMOS and COSMOS-DASH reveals the rest-frame optical structures of distant galaxies. For our selection we used the COSMOS UltraVISTA v4.1 catalog by Muzzin et al. (2013b). Quiescent galaxies were identified by their rest-frame U − V and V − J colors (e.g., Wuyts et al. 2007; Williams et al. 2009). In this work, we use the UVJ criteria by Muzzin et al. (2013a).
We select the targets to be at 1.30 < z < 1.50 or 1.92 < z < 2.28. These specific redshift intervals are chosen such that we observe Mg i at 5178 Å and several Fe i and Balmer absorption lines in atmospheric windows, as illustrated in Figure 1. Furthermore, by using two redshift intervals, combined with deep spectroscopic surveys at lower redshifts such as LEGA-C at 0.5 < z < 1.0 (van der Wel et al. 2016, 2021), we can study evolutionary trends. For the 1.30 < z < 1.50 galaxies, we use LRIS-RED and MOSFIRE J-band to observe the 4000 Å break region and the region around Mg i at 5178 Å, respectively. For the 1.92 < z < 2.28 galaxies, we target these same regions with MOSFIRE in the J and H bands. We also obtained shallower spectra in the H and K bands for the low- and high-redshift masks, respectively, to obtain additional constraints on several emission lines (i.e., Hα, [N ii]).
Both LRIS and MOSFIRE are among the most efficient spectrographs at their respective wavelengths. Nonetheless, even with unprecedented integration times, only the brightest galaxies are within reach. The 1.30 < z < 1.50 galaxies are selected to be brighter than J = 21.6, and the galaxies at 1.92 < z < 2.28 are selected to be brighter than H = 21.8. These magnitude limits, combined with the long integration times (see next section), ensure sufficient signal-to-noise ratios (S/N) to facilitate the anticipated science. There are ∼100 and ∼50 quiescent galaxy candidates that meet our criteria in the low- and high-redshift intervals, respectively.
The large survey area enabled us to identify four pointings for which we observe at least five bright distant quiescent galaxies simultaneously. Two pointings target galaxies at 1.30 < z < 1.50 and the other two target galaxies at 1.92 < z < 2.28. In total, we have 21 primary targets. No other pointing allowed the observation of at least five primary targets within a single MOSFIRE field of view. The four pointings are shown in Figure 2, in comparison to the photometric coverage in HST/WFC3-IR F160W. In Table 1 we list the mask parameters of all LRIS and MOSFIRE masks. This sample size is an improvement of an order of magnitude compared to the two galaxies at z ∼ 2 for which spectra of comparable depth and wavelength coverage were previously available (Kriek et al. 2006; Jafariyazani et al. 2020).
Table 1, note a: In 2021A one of the red detectors was not working. To observe all galaxies, we used two masks, which differed by 180 degrees, and only one of the detectors. The masks were observed for 3.5 and 3.75 hr. In 2021B a new detector had been installed, and we reobserved the original mask for 2.75 hr. Hence, the integration times varied per galaxy, depending on the location in the mask.
The remaining slits were placed on fainter quiescent and star-forming galaxies. We prioritized galaxies at similar redshifts. For the Heavy Metal 3 and 4 masks, we also added quiescent galaxies at z ∼ 1.4, though for these galaxies we lack LRIS observations, which target the most prominent absorption lines for these redshifts.

Observing Strategy
The Heavy Metal survey was executed over eight semesters, ranging from 2016B to 2021B. In total, 26 nights were allocated, though half of the nights were lost due to bad weather or technical problems. The primary goal of the Heavy Metal survey is to measure faint absorption lines, in particular around 5000 Å. This region is targeted by MOSFIRE J-band and H-band for the z ∼ 1.4 and z ∼ 2.1 pointings, respectively. We require integration times of ∼12 and ∼16 hr, respectively, for z ∼ 1.4 (J-band) and z ∼ 2.1 (H-band), and used our best imaging conditions for these observations. Second priority is the Balmer/4000 Å break region, which has more prominent features and thus requires slightly shorter integration times. This region was observed for ∼4 and ∼12 hr, respectively, with LRIS and MOSFIRE J-band for the z ∼ 1.4 and z ∼ 2.1 pointings. Finally, for all four pointings we took shorter integrations (∼1-2 hr) of the wavelength regions around Hα, to assess whether the galaxies have any nebular line emission. This wavelength region is observed with MOSFIRE H-band and K-band for z ∼ 1.4 and z ∼ 2.1, respectively. These observations were planned to be taken under our least-favorable seeing conditions. In Table 1, we summarize the observing settings and integration times for all masks and filters.
The MOSFIRE slits were configured with a width of 0.7″ and have a minimum length of 7″. The LRIS slits were 1″ wide, with a minimum length of 10″. For all masks we used a minimum of five stars for the alignment. With MOSFIRE, the galaxies were observed using an ABA′B′ dither pattern, and with the longer LRIS slits we used an ABC dither pattern. Both dither patterns are preferred over an ABBA dither pattern as they result in better background subtraction and higher S/N (see Appendix A in Kriek et al. 2016).
In all masks we observed at least one star in a slit. These "slit star" observations have three advantages. First, they enable us to monitor the seeing and possible drifts while observing. Second, the profiles and positions of the slit stars aided the data reduction, such that we could accurately register and weight the individual science frames. Third, the slit star was used in the flux calibration, as explained in Kriek et al. (2016) and in the next section.

Data Reduction
The MOSFIRE data are reduced using a custom software package that was originally developed for the MOSDEF survey (Kriek et al. 2015). This package is fully automated and takes a single parameter file as input, which specifies the mask and target name, the directories of the raw frames and mask files, the filter to be reduced, and the path and name of the photometric catalog to be used for the flux calibration. The first step is to read in all headers and identify the science and calibration frames. Next, a master dome flat frame is made, which is used to correct all science frames for pixel-to-pixel sensitivity variations and to trace the edges of all spectra. Next, we do an initial background subtraction of all science frames by subtracting the average of the previous and following frame. For the first and last science exposure, we only use one adjacent science frame as sky frame. The next step is to derive the wavelength solution using bright isolated sky lines. For this step we use the edge solutions from the master flat frame. This procedure is fully automatic, as the position of the slit gives us a rough position of where to expect the sky lines. For the K band we also use the arc lamp frames, allowing for an offset (i.e., flexure) between the sky lines and the arc lines. The final ingredient for the rectification is the position of the galaxies in the spectra. The exact position is a combination of the assigned dither position and the observed drift (about 1 pixel hr⁻¹; see Kriek et al. 2015). We use the wavelength and edge solution to derive this position in all science frames for the slit star. Thus, for each frame we collapse the slit star spectrum along the wavelength direction and measure the position, FWHM of the seeing, and throughput. This position, combined with the wavelength and edge solutions, now gives us a transformation from the raw to the reduced frame for each science exposure.
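The initial background subtraction described above can be illustrated with a minimal sketch. This is not the MOSDEF pipeline code: the function name `subtract_adjacent_sky` and the representation of each exposure as a flat list of pixel values are assumptions made purely for illustration.

```python
def subtract_adjacent_sky(frames):
    """Subtract from each frame the mean of its neighbouring frames.

    `frames` is a time-ordered list of exposures, each a flat list of
    pixel values (a hypothetical stand-in for a 2D detector image).
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames to build a sky estimate")
    cleaned = []
    for i, frame in enumerate(frames):
        # The first and last exposures have only one adjacent frame.
        neighbours = [frames[j] for j in (i - 1, i + 1) if 0 <= j < n]
        sky = [sum(pix) / len(neighbours) for pix in zip(*neighbours)]
        cleaned.append([f - s for f, s in zip(frame, sky)])
    return cleaned
```

Because consecutive exposures are taken at different dither positions, the object falls on different pixels in the neighbouring frames, so the sky estimate at the object position is largely free of object flux.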
Using the transformations derived in the previous step, we now perform an additional background subtraction on the (unrectified) science frames. We do this step before resampling, so we can better model the remaining sky. We run L.A.Cosmic (van Dokkum 2001) on the cleaned frames and combine the cosmic-ray map with the available MOSFIRE bad pixel map. The "cleaned" frames are now resampled to the final frame in a single transformation. We apply this same transformation to the sky and mask frames for each science exposure. Finally, we combine all science frames for each galaxy and filter, while weighting the frames using the throughput and seeing, and excluding all masked pixels. We also make a final weight map for each object and filter, as well as two noise frames, one based on the frame-to-frame variations and one on the sky and read-out noise. For more details on these steps, see Kriek et al. (2015).
All spectra are calibrated for the relative response using telluric standards. Instead of observing new telluric standards for each science exposure, we make use of the library collected by the MOSDEF survey. For each mask and filter we construct a response spectrum from multiple telluric standards observed at similar airmass, combined with the stellar spectrum of a star of the same spectral type. The telluric spectra are reduced using a procedure similar to that used for the science spectra. See Kriek et al. (2015) for more information on the construction of the response spectra and the motivation for using this procedure.
Lastly, we generate one-dimensional (1D) science and error spectra for both primary and filler targets through an optimal weighting technique, as outlined by Horne (1986), followed by absolute flux calibration. The MOSDEF software initially conducts absolute flux calibration for each galaxy by applying a scaling factor that is derived by comparing the 1D spectrum of a slit star to its integrated photometry. This step effectively performs a slit-loss correction for point sources. However, for all primary galaxies we detect the stellar continuum, and thus we directly scale the spectra to their respective broadband photometry (see Sect. 3.1).
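As an illustration of the optimal weighting idea of Horne (1986), the sketch below extracts the flux for a single wavelength column as a variance-weighted sum over the spatial profile. It is a simplified sketch, not the survey's extraction code: it assumes no masked pixels, and the function name `optimal_extract` and its arguments are hypothetical.

```python
def optimal_extract(data, variance, profile):
    """Optimally weighted flux and variance for one wavelength column.

    data, variance, profile: equal-length lists along the slit.
    The spatial profile is normalised to unit sum before use.
    Assumes no masked pixels (a simplification for this sketch).
    """
    total = sum(profile)
    p = [x / total for x in profile]
    # Weighted sums: sum(P * D / V) and sum(P^2 / V).
    num = sum(pi * di / vi for pi, di, vi in zip(p, data, variance))
    den = sum(pi * pi / vi for pi, vi in zip(p, variance))
    flux = num / den
    flux_var = 1.0 / den  # valid for a unit-sum profile without masking
    return flux, flux_var
```

Pixels where the profile is high and the variance is low dominate the sum, which is what raises the S/N relative to a simple boxcar extraction.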
For the LRIS reduction we follow a procedure similar to that for the MOSFIRE spectra. The only major difference is the calibration, as we do not have a library of telluric standards. To correct for atmospheric transmission features, we use the slit star spectrum combined with a theoretical sky spectrum. Furthermore, we calibrate each 1D science spectrum individually using the photometric data in the overlapping wavelength regime. See Kriek et al. (2019) for more information.

Data Overview
In Figures 3 and 4 we present an overview of the UltraVISTA photometric spectral energy distributions (SEDs; Muzzin et al. 2013b; left column), 1D spectra (middle two columns), and the F160W image from COSMOS-DASH (Mowla et al. 2018; Momcheva et al. 2017) for all primary targets. The positions of the LRIS (yellow) and MOSFIRE (blue) slits are shown in the images as well.
Figure 3 shows the galaxies in Heavy Metal 1 and 2, targeting z ∼ 1.4. For these two masks, we show the LRIS and MOSFIRE J-band spectra, all shifted (in wavelength only) to rest frame. We observe multiple Balmer absorption lines (yellow dotted lines) and the two Ca ii lines (green) around 4000 Å for all 11 galaxies. Most galaxies also show clear Mg i and several Fe i lines (red) in their MOSFIRE spectra. None of the targets show Balmer emission lines in their LRIS and MOSFIRE-J spectra. Nonetheless, three galaxies have either [O ii] or [O iii] in emission. We will further discuss these emission lines in Section 4.
Figure 4 shows the primary quiescent galaxies at z ∼ 2.1 targeted by the Heavy Metal 3 and 4 masks. Instead of LRIS and MOSFIRE J-band, we now show MOSFIRE J-band and H-band in the middle two columns. Two of the primary targets (59375 and 60736) scatter out of the intended redshift range (1.92 < z < 2.28), and thus their spectra do not cover all targeted absorption lines. Nonetheless, we detect additional absorption lines, such as Na, for these galaxies.
Considering the remaining eight targets, six of them show several Balmer absorption and metal lines in their spectra. Two galaxies, 55878 and 59449, do not show any clear absorption lines, but their emission lines do reveal their redshifts. Galaxy 55878 has strong asymmetric emission lines, most likely originating from an AGN. This galaxy will be discussed in detail in Y. Ma et al. (in preparation). Galaxy 59449 shows two [O iii] emission lines in its spectrum. We would have expected to detect some continuum features, and thus the line and continuum emission may not originate from the same galaxy. However, we could not identify a redshift solution just from the continuum emission.
In Figure 5, we present spectra in the Hα region for all primary targets. To illustrate whether Hα is detected, we zoom in on a small wavelength region and show the continuum-subtracted 1D spectra (see Sect. 3.2). This spectral range does not encompass critical absorption features; these observations were taken to assess whether the galaxies have any Hα emission. Hence, the spectra acquired in this band are shallower compared to the deeper spectra shown in Figures 3 and 4 (refer to Table 1 for details). It is worth noting that the spectrum of 59375 is significantly deeper, as Hα falls within the H band, where ultradeep observations were conducted, rather than the K band. Galaxy 60736's spectrum is not included, as there is no coverage of Hα.
Except for 55878 (an AGN), none of the galaxies exhibits strong Hα emission in their 2D or 1D spectra. Nonetheless, several galaxies show very faint Hα and [N ii] emission lines, in particular after the continuum removal, as this step corrects for the underlying Balmer absorption feature. In Section 3.2 we describe our methodology to measure all Hα lines in order to derive constraints on the star formation rates (SFRs).
Figure 5. The 2D and 1D spectra in wavelength regions around the Hα spectral feature for all primary Heavy Metal galaxies. For each 1D spectrum the continuum has been removed. The 1D spectra are shown in black, binned to 3 (unmasked) pixels, and in gray we show the corresponding error spectrum. The yellow fit presents the best-fit emission-line model to Hα and the two [N ii] lines, with the 68% uncertainty shown by the shaded yellow region. The vertical red dotted lines indicate the location of the three emission lines. Galaxy 55878 is the only one with strong Hα in emission and will be discussed in Y. Ma et al. (in preparation). We have 8 additional galaxies with marginal (> 3σ) Hα detections (see Table 2) and 11 galaxies for which we derive upper limits on the Hα flux. One galaxy (59375) has a significantly deeper spectrum; due to an incorrect photometric redshift, the line was observed in a different filter (H) than expected (K).

METHODOLOGY
In this section, we outline the methods we employed to determine the spectral, photometric, and structural properties of the Heavy Metal galaxies. We begin by deriving the spectroscopic redshifts, emission-line fluxes, stellar population characteristics, and rest-frame UVJ colors for both the primary and filler galaxies (Sect. 3.1). In Section 3.2, we detail our approach to measuring the Hα emission-line fluxes and subsequently calculating the SFRs for our primary, quiescent targets. Lastly, in Section 3.3, we present the methodology used to derive the galaxy structures and estimate dynamical masses for the primary Heavy Metal galaxies.

Redshifts and stellar population properties
For all primary quiescent galaxies, we derive a spectroscopic redshift and stellar population properties by simultaneously fitting the spectra and the UltraVISTA photometry with the Flexible Stellar Population Synthesis models (FSPS; Conroy et al. 2009; Conroy & Gunn 2010). We assume an exponentially delayed star formation history, the average Kriek & Conroy (2013) dust attenuation law, and the Chabrier (2003) initial mass function (IMF). We use a custom version of the fast fitting code (Kriek et al. 2009), in which the automatic scaling of the spectra to the photometry has been improved.
To facilitate comparison with the full galaxy distribution from which the galaxies are selected, we assume solar metallicity. Because fast does not fit for the absorption-line broadening, we fit binned spectra.
For galaxy HM4-55878 the emission lines are very strong and affect the broadband spectral shape. Thus, for this galaxy we first correct the photometry for the emission-line fluxes (see Sect. 5). Furthermore, we mask the wavelength regions affected by emission lines while fitting. The strong lines also affect the absolute calibration of our spectra, and our default method does not work (see Sect. 2.3). Instead, for this galaxy we use the filter curves and integrated broadband magnitudes, corrected for the partial overlap between the spectra and filter curve.
The resulting best-fit redshifts, stellar masses, SFRs, and magnitudes of dust attenuation (A_V) are listed in Table 2. The typical uncertainties on the stellar mass, SFR, and A_V are 0.1 dex, 0.2 dex, and 0.1 mag, respectively. These uncertainties include the flux uncertainties as well as variations in the various assumptions (except for the IMF) and the stellar population synthesis model (FSPS; Bruzual & Charlot 2003; Maraston 2005). The best-fit models are shown in Figures 3 and 4. For display purposes, we show the original models convolved to the velocity dispersion of the spectra, as derived by Beverage et al. (2023b).
While most stellar continuum fits look reasonable, there are a few exceptions. First, for HM1-213931 the fit is quite poor, probably because it is a blended spectrum of multiple galaxies that have a velocity offset. Though we cannot deblend the spectra of the sources, we find a different spectrum when assuming different weighting profiles for the extraction (see Beverage et al. 2023b for more information on the implications). Second, for HM4-59449 we do not see any clear absorption lines, and the redshift is based on the faint [O iii] emission lines.
For the filler galaxies, we derive spectroscopic redshifts by fitting the emission lines. The majority of the filler targets show multiple emission lines, resulting in robust spectroscopic redshifts. For the z ∼ 1.4 masks, we observed different filler galaxies for the LRIS and MOSFIRE masks. This strategy results in a larger number of filler galaxies, but a lower success rate of confirming the spectroscopic redshift. For the z ∼ 2.1 masks, we use the same filler galaxies in the different settings and thus had more wavelength coverage to detect possible spectral features.
Finally, for all galaxies in the observed Heavy Metal masks we determine rest-frame U, V, and J colors using the EAzY code (Brammer et al. 2008). When available, we assume the spectroscopic redshift; otherwise we adopt the photometric redshifts provided by Muzzin et al. (2013b). Each rest-frame magnitude is determined individually, using a fit to just the surrounding photometric datapoints. Hence, the colors are not based on the best-fit stellar population model to the full spectrum.

Hα SFR measurements
For the primary quiescent targets, we measure the Hα emission-line flux. We use the best-fit stellar population model convolved to the best-fit velocity dispersion as the continuum model. This approach ensures that we incorporate the underlying Balmer absorption. To derive the fluxes and correct for emission-line blending, we fit Hα and the two [N ii] lines at 6548 Å and 6584 Å simultaneously. For our model spectrum, we use three Gaussians with the same velocity dispersion and a fixed ratio between the two [N ii] lines of a factor of 3. The redshift is fixed to the best-fit absorption-line redshift. If none of the lines are clearly visible, we require that the velocity dispersion of the emission lines does not exceed the stellar velocity dispersion by more than 1σ. The minimum allowed velocity dispersion is set by the spectral resolution.
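The emission-line model described above (three Gaussians sharing one velocity dispersion, with the [N ii] doublet ratio fixed to 3) can be sketched as follows. This is only an illustrative sketch of the model function, not the fitting code used in the paper; the function names and the vacuum rest wavelengths in `LINES` are assumptions that are not quoted in the text.

```python
import math

C_KMS = 299792.458  # speed of light [km/s]

# Vacuum rest wavelengths in Angstrom (assumed values, not given in the text).
LINES = {"NII_6548": 6549.86, "Halpha": 6564.61, "NII_6584": 6585.27}


def gaussian(wave, flux, center, sigma):
    """Gaussian line profile with integrated flux `flux`."""
    norm = flux / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((wave - center) / sigma) ** 2)


def line_model(wave, z, sigma_kms, f_halpha, f_nii_6584):
    """Halpha + [N II] flux density at observed wavelength `wave`."""
    fluxes = {
        "Halpha": f_halpha,
        "NII_6584": f_nii_6584,
        "NII_6548": f_nii_6584 / 3.0,  # doublet flux ratio fixed to 3
    }
    total = 0.0
    for name, rest in LINES.items():
        center = rest * (1.0 + z)  # redshift fixed, e.g. to the absorption-line value
        sigma_aa = center * sigma_kms / C_KMS  # same velocity width for all lines
        total += gaussian(wave, fluxes[name], center, sigma_aa)
    return total
```

With the redshift fixed, the free parameters reduce to the two line fluxes and the shared velocity dispersion, which is what makes the deblending of Hα and [N ii] tractable at low S/N.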
We derive the uncertainties on the emission-line flux measurements using Monte Carlo simulations. We make 500 realizations of the spectrum around Hα, by perturbing the fluxes following the error spectrum. For each realization we also allow for variations in the subtracted continuum, assuming an uncertainty on the Hα absorption line strength of 5%. For each realization we fit all three lines using the same method as for the actual spectrum. We derive the 16% and 84% confidence intervals on the emission-line fluxes from the resulting distribution. In Figure 5 we show these fits and confidence intervals for all 20 galaxies with coverage of Hα. For galaxies that do not have a 3σ detection for Hα, we derive the 3σ upper limit. All values are given in Table 3.
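The Monte Carlo procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function `monte_carlo_interval` and the generic `measure` callable (standing in for the three-line fit) are hypothetical, pixel errors are treated as independent Gaussians, and the additional 5% perturbation of the continuum absorption strength is not included.

```python
import random


def monte_carlo_interval(flux, err, measure, n_real=500, seed=None):
    """16th and 84th percentiles of `measure` over perturbed spectra.

    flux, err: equal-length lists (spectrum and 1-sigma error spectrum).
    measure:   callable that returns a scalar (e.g. a fitted line flux)
               for a given realisation of the spectrum.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_real):
        # Draw each pixel from a Gaussian centred on the data value.
        realisation = [rng.gauss(f, e) for f, e in zip(flux, err)]
        values.append(measure(realisation))
    values.sort()
    lo = values[int(round(0.16 * (n_real - 1)))]
    hi = values[int(round(0.84 * (n_real - 1)))]
    return lo, hi
```

In practice `measure` would rerun the same emission-line fit that is applied to the observed spectrum, so that the resulting interval reflects both the noise and any degeneracy between the lines.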
For galaxy 213947 we have to do additional masking to derive the Hα flux, as the 2D spectrum partially overlaps with the (negative) spectrum of a close galaxy. This nearby galaxy only has emission lines and no continuum emission, and thus only a small wavelength range is affected. For this galaxy, we therefore mask the wavelengths that are contaminated by the emission lines of the nearby galaxy.
We convert the integrated Hα flux to the integrated luminosity using the spectroscopic redshift. In order to correct this line for dust attenuation, we ideally would use the Balmer decrement (Hα/Hβ). However, with the exception of 55878, Hβ is too faint to yield a useful Balmer decrement measurement, and thus we use the stellar attenuation for the dust correction instead. We do note, however, that nearly all galaxies have a best-fit A_V = 0. For 55878, we do use the Balmer decrement (Hα/Hβ = 4.43) for the dust correction. Finally, we adopt the conversion by Kennicutt (1998) for solar metallicity and a Kroupa (2001) IMF (which is comparable to the Chabrier 2003 IMF) to derive SFRs for all galaxies with Hα coverage (see Table 3). The majority of the galaxies do not have detected Hα emission, and thus we derive a 3σ upper limit on the SFR.
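The chain from Hα flux to SFR can be sketched as below, using the flat ΛCDM cosmology adopted in this work (Ω_m = 0.3, Ω_Λ = 0.7, H_0 = 70 km s⁻¹ Mpc⁻¹). This is a hedged sketch rather than the code used in the paper: the function names are hypothetical, the attenuation at Hα (`a_halpha`, in magnitudes) is taken as an input because the conversion from A_V depends on the attenuation curve, and the luminosity-to-SFR coefficient `conv` must be supplied by the user since its numerical value for the adopted IMF is not quoted in the text.

```python
import math

C_KMS = 299792.458                 # speed of light [km/s]
MPC_IN_CM = 3.0856775814913673e24  # 1 Mpc in cm


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, omega_l=0.7, steps=1000):
    """Luminosity distance [Mpc] in flat LambdaCDM (Simpson's rule; `steps` even)."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    comoving = (C_KMS / h0) * total * h / 3.0
    return (1.0 + z) * comoving


def halpha_sfr(flux_cgs, z, a_halpha, conv):
    """SFR [Msun/yr] from an Halpha flux [erg/s/cm^2].

    a_halpha: attenuation at Halpha in magnitudes (0 for no dust).
    conv:     SFR per unit Halpha luminosity [Msun/yr per erg/s];
              user-supplied, IMF dependent.
    """
    d_cm = luminosity_distance_mpc(z) * MPC_IN_CM
    lum = 4.0 * math.pi * d_cm ** 2 * flux_cgs
    lum_corr = lum * 10.0 ** (0.4 * a_halpha)  # dust correction
    return conv * lum_corr


# Illustration only: 7.9e-42 is the Kennicutt (1998) coefficient for a
# Salpeter IMF; a Kroupa or Chabrier IMF implies a lower coefficient.
example_sfr = halpha_sfr(1e-17, 2.0, 0.0, 7.9e-42)
```

The same function applies to upper limits: passing the 3σ flux limit returns the corresponding 3σ limit on the SFR.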
Two galaxies stand out in Figure 5. First, galaxy 55878 has very strong emission lines, and we will further discuss this galaxy in Sect. 4.2. Second, HM4-59375 stands out because, despite the low Hα flux and resulting SFR, the galaxy has significantly detected emission lines. For this galaxy the spectroscopic redshift of z_spec = 1.552 is significantly lower than the photometric redshift used in the selection. Hence, Hα does not fall in the K band, as it does for the other galaxies in its targeted redshift regime, but in the H band, for which the observations are significantly deeper. For galaxy 60736, the spectroscopic redshift falls outside the selection window, and thus we have no coverage of the Hα wavelength region (see Fig. 1). Hence, this galaxy is missing from Figure 5.

Structural Measurements and Dynamical Masses
The Heavy Metal pointings overlap with the COSMOS/ACS-F814W (Scoville et al. 2007) and the COSMOS-DASH/WFC3-IR-F160W imaging (Momcheva et al. 2017; Mowla et al. 2018; Cutler et al. 2022), enabling structural measurements. We derive galaxy sizes from the F814W images by fitting single-component Sérsic models with Galfit (Peng et al. 2002), following the technique described in Beverage et al. (2021). For the COSMOS-DASH imaging, we adopt the structural measurements by Cutler et al. (2022).
For the z ∼ 1.4 galaxies, we use both the F814W and F160W structural parameters in our analyses, listed in Table 4. We derive the structural parameters (R_e,major, n, q) at rest-frame 5000 Å using interpolation. For HM1-213947, the F160W image results in a bad Galfit fit, and thus for this galaxy we only use F814W. We use Equation 1 by van der Wel et al. (2014) to correct the size to rest-frame 5000 Å. For the z ∼ 2.1 galaxies we only use the F160W measurements, as these galaxies are not or barely detected in F814W. We also correct these size measurements, standardizing them to the rest-frame 5000 Å wavelength (see Table 4), following van der Wel et al. (2014). These corrections are generally subtle, fluctuating within the range of −0.01 to 0.009. For three galaxies, no F160W size measurements were available, as either the fit failed or there was no coverage. For three additional galaxies, the fit was qualified as "bad". We nonetheless use these structural measurements in the subsequent analysis for the two galaxies (HM3-103236 and HM4-56163) for which no ACS measurements are available, but we flag these galaxies in the subsequent figures. For these galaxies no uncertainties are available in the catalogs by Cutler et al. (2022), and thus we adopt uncertainties of 25%, ±1.0, and ±0.1 for R_e, n, and q, respectively.
We also use the Galfit parameters to refine our stellar mass measurements and ensure their consistency with the other structural measurements. For this refinement, we derive a mass correction factor by comparing the integrated magnitude from Galfit with the magnitude from the corresponding filter band in the photometric catalog. For the F814W and F160W filters, which are absent from the UltraVISTA catalog by Muzzin et al. (2013b), we compute the magnitudes by integrating the best-fit fast model over the respective filter curves. For the z ∼ 2.1 galaxies, we combine the correction factors from F160W and F814W following their proximity to rest-frame 5000 Å. The Galfit magnitudes are typically fainter than the catalog magnitudes by 0.094 mag, resulting in an average mass correction of −0.038 dex. However, for some galaxies, in particular blended systems such as HM1-213931, the mass corrections can be as large as −0.28 dex. The corrected masses (M * ,c ) for the galaxies with structural measurements are listed in Table 4.
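The relation between the magnitude offset and the mass correction can be illustrated with a minimal sketch. It assumes that the stellar mass scales linearly with the flux in the comparison band at fixed M/L, which is our reading of the procedure rather than a statement of the exact implementation; the function name and the example magnitudes are hypothetical.

```python
def mass_correction_dex(m_galfit, m_catalog):
    """Shift in log10(M*) implied by rescaling the catalog flux to the Galfit flux.

    Assumption (not stated explicitly in the text): M* scales linearly with
    the flux in the comparison band, i.e. the mass-to-light ratio is fixed.
    """
    # A magnitude difference dm corresponds to a flux ratio of 10**(-0.4 * dm),
    # so the shift in log10(flux), and hence in log10(M*), is -0.4 * dm.
    return -0.4 * (m_galfit - m_catalog)


# Hypothetical magnitudes reproducing the typical offset quoted in the text:
# Galfit fainter than the catalog by 0.094 mag.
correction = mass_correction_dex(m_galfit=21.094, m_catalog=21.000)
print(f"{correction:.3f} dex")
```

Under this assumption a 0.094 mag offset maps onto roughly −0.038 dex, in line with the average correction quoted above.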
The Heavy Metal spectra yield velocity dispersion measurements (σ v ) for all but two galaxies, as described in our accompanying paper (Beverage et al. 2023b). These measurements are derived using the absorption-line fitter (alf) code (Conroy & van Dokkum 2012; Choi et al. 2016; Conroy et al. 2018). Beverage et al. (2023a) show that the alf velocity dispersions are in perfect agreement with those found by ppxf (Cappellari & Emsellem 2004) for a large sample of z ∼ 0.7 quiescent galaxies. We increase the measured velocity dispersions (σ v ) by 4% to obtain the velocity dispersion within 1 r e (σ v,e ) (see van de Sande et al. 2013).
The velocity dispersions and structural measurements together enable an estimate of the dynamical mass. We still have a poor understanding of the internal stellar dynamics of these galaxies. Spatially resolved studies of three lensed distant quiescent galaxies have hinted at the presence of rotational support to varying degrees (Newman et al. 2015, 2018a; Toft et al. 2017). Nonetheless, given our limited knowledge and to facilitate comparison with similar works, here we define the dynamical mass as

$$M_{\rm dyn} = \frac{\beta(n)\,\sigma_{v,e}^{2}\,R_{e}}{G},$$

with β(n) = 8.87 − 0.831n + 0.0241n², the virial constant for a spherical isotropic model described by an R^{1/n} profile. For R e we take the circularized radius (R e = R e,major √q) at a rest-frame wavelength of 5000 Å.
The resulting dynamical masses are listed in Table 4.
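As a rough illustration of this estimate, the sketch below evaluates the virial mass, β(n) σ v,e ² R e / G, using the virial coefficient β(n), the circularized radius, and the 4% aperture correction described above. The input values are hypothetical and do not correspond to any galaxy in Table 4; the function names and unit choices (kpc, km s −1 , solar masses) are ours.

```python
import math

# Gravitational constant in kpc (km/s)^2 / Msun.
G_KPC = 4.30091e-6


def beta(n):
    """Sersic-dependent virial coefficient quoted in the text."""
    return 8.87 - 0.831 * n + 0.0241 * n**2


def dynamical_mass(sigma_v, r_e_major_kpc, q, n, aperture_factor=1.04):
    """Virial mass estimate in solar masses.

    sigma_v         : measured velocity dispersion in km/s
    r_e_major_kpc   : semi-major-axis effective radius at rest-frame 5000 A, in kpc
    q               : axis ratio b/a
    n               : Sersic index
    aperture_factor : 4% increase to obtain the dispersion within one r_e
    """
    sigma_e = aperture_factor * sigma_v
    r_e_circ = r_e_major_kpc * math.sqrt(q)  # circularized radius
    return beta(n) * sigma_e**2 * r_e_circ / G_KPC


# Hypothetical galaxy: sigma_v = 250 km/s, R_e,major = 2 kpc, q = 0.7, n = 4.
m_dyn = dynamical_mass(250.0, 2.0, 0.7, 4.0)
print(f"log10(M_dyn / Msun) = {math.log10(m_dyn):.2f}")
```

Because the mass scales with the square of the dispersion, a 10% error in σ v changes the estimate by about 0.08 dex, whereas the dependence on size and Sérsic index is weaker.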

RESULTS
While quiescent galaxies have been studied extensively throughout cosmic time, the majority of these investigations have relied on photometric data. The absence of spectroscopic information may lead to biases in our photometric redshifts, stellar masses, and stellar population properties. Consequently, our studies of the buildup and growth of galaxies over cosmic time may be biased as well. The Heavy Metal survey provides redshifts for a significant sample of distant quiescent galaxies, resulting in more accurate stellar population properties. Additionally, the presence of absorption lines facilitates kinematic and chemical composition studies, while emission lines offer an alternative avenue for examining their star formation characteristics. In Section 4.1 we examine our galaxy sample and compare it with the parent galaxy sample from which the spectroscopic sample was drawn. In Section 4.2 we present the star formation properties of the primary Heavy Metal galaxies and assess whether they indeed have quiescent stellar populations. Moving to Section 4.3, we discuss their structural properties, and finally, in Section 4.4 we compare the stellar masses to the dynamical masses.

Galaxy sample and success rate
Our primary galaxy sample is selected to have quiescent stellar populations, be relatively bright, and fall in two redshift intervals, 1.30 < z < 1.50 and 1.92 < z < 2.28. In Figure 6 we show the photometric versus spectroscopic redshifts of the primary (circles) and filler (squares) galaxies, as well as the distribution of the spectroscopic redshifts. Most primary galaxies fall in or very close to the selection windows, and their photometric and spectroscopic redshifts agree well, with a normalized median absolute deviation in ∆z/(1 + z spec ) of σ nmad = 0.014. The only exception is HM4-59375, which has a significantly lower redshift than predicted by the photometry. This figure also shows the filler galaxies. These galaxies are drawn from a broader redshift distribution, though galaxies at similar redshifts were prioritized. The scatter for the filler galaxies is slightly larger, with σ nmad = 0.017, which may be explained by their fainter magnitudes.
Figure 6. Photometric vs. spectroscopic redshift for all spectroscopically confirmed primary (circles) and filler (squares) targets in the Heavy Metal survey. We measured a spectroscopic redshift for all primary targets. For the filler galaxies the success rate was much lower, at 65%. The normalized median absolute deviations (σ nmad , Brammer et al. 2008) between the photometric and spectroscopic redshifts are 0.017 and 0.014 for the filler and primary galaxies, respectively. The shaded areas present the targeted redshift intervals used in the selection. Two of the primary targets scattered out of the targeted redshift interval.
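For reference, the σ nmad statistic used to compare photometric and spectroscopic redshifts can be computed as in the following minimal sketch. It assumes the definition of Brammer et al. (2008), σ nmad = 1.48 × median(|∆z − median(∆z)| / (1 + z spec )); the redshift values in the example are invented for illustration and are not survey data.

```python
from statistics import median


def sigma_nmad(z_phot, z_spec):
    """Normalized median absolute deviation of photometric redshift errors.

    Assumes the Brammer et al. (2008) definition:
    1.48 * median(|dz - median(dz)| / (1 + z_spec)), with dz = z_phot - z_spec.
    """
    dz = [zp - zs for zp, zs in zip(z_phot, z_spec)]
    dz_med = median(dz)
    return 1.48 * median(abs(d - dz_med) / (1.0 + zs) for d, zs in zip(dz, z_spec))


# Invented example redshifts (not from the survey).
z_spec = [1.40, 1.42, 2.16, 2.23, 1.55]
z_phot = [1.38, 1.45, 2.10, 2.25, 1.95]
print(f"sigma_nmad = {sigma_nmad(z_phot, z_spec):.3f}")
```

Because the statistic is built from medians, a single catastrophic outlier such as the last entry in this toy example has little influence on the result, which is why it is commonly preferred over a standard deviation for photometric redshift comparisons.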
The histogram in Figure 6 shows that the spectroscopic redshifts of the primary and filler targets are clustered, and several potential overdensities may exist, specifically at z ∼ 1.40 (HM1), z ∼ 1.42 (HM2), z ∼ 2.16 (HM4), and z ∼ 2.23 (HM3). This finding is not surprising, as we specifically selected pointings for which we can observe multiple quiescent galaxies in one field of view. A further investigation into these overdensities is beyond the scope of this paper. Nevertheless, when interpreting our results, it is important to keep in mind that the environments in which our galaxies reside may not be typical for distant quiescent galaxies.
In Figure 7 we show all primary and filler targets in magnitude versus redshift and rest-frame U − V versus V − J space. The left panels show the galaxies in Heavy Metal 1 and 2, while the right panels show galaxies in Heavy Metal 3 and 4. The boxes in the top panels enclosed by the dotted lines indicate the primary target selection in terms of magnitude and redshift. In contrast to Figure 6, here we show both the confirmed filler galaxies (large squares) and the filler galaxies for which we did not measure a spectroscopic redshift (small squares). The top-left box in the bottom panels enclosed by the solid lines indicates our quiescent galaxy selection (red symbols; Muzzin et al. 2013a). Galaxies outside the box are generally identified as star-forming galaxies (blue symbols).
While we measure spectroscopic redshifts for all primary targets, for the filler targets the success rate is lower, at 71% (42/59) and 53% (17/32) for the z ∼ 1.4 masks and z ∼ 2.1 masks, respectively. There are several reasons for the lower success rate of the fillers. First, for Heavy Metal 1 and 2, most fillers are targeted by only one of MOSFIRE or LRIS. Second, many fillers are faint quiescent targets, for which we do not detect clear absorption lines. The few faint quiescent fillers that are confirmed all have emission lines in their spectra. However, in the Heavy Metal 3 and 4 masks, there are several quiescent filler targets at z ∼ 1.4 that are as bright as the faintest primary targets. Unfortunately, for these galaxies we do not capture the 4000 Å region, which is crucial for spectroscopic redshift measurements. Finally, for several star-forming fillers, the emission lines may fall outside the atmospheric windows. For example, we find no confirmed star-forming galaxies below z = 2 in the Heavy Metal 3 and 4 masks.
Based on the photometric redshifts, all primary targets were initially selected to be quiescent. However, when rederiving the rest-frame colors using the spectroscopic redshifts, two of the primary targets (HM3-107590 and HM4-55878) shift just outside the quiescent box. Given their location, though, we expect these galaxies to be post-starburst or young quiescent galaxies (e.g., Whitaker et al. 2012; Belli et al. 2019; Suess et al. 2021; Park et al. 2023), and thus still have quiescent populations. We will further assess their star-formation properties in the next section.
Figure 7. Apparent magnitude vs. redshift (top panels) and rest-frame U − V vs. V − J colors (bottom panels) for all galaxies observed in the LRIS and MOSFIRE masks. The Heavy Metal 1 and 2 masks (left panels) primarily targeted bright (J < 21.6) quiescent galaxies at 1.3 < z < 1.5. The Heavy Metal 3 and 4 masks (right panels) primarily targeted bright (H < 21.8) quiescent galaxies at 1.92 < z < 2.28. Quiescent galaxies (red symbols) were selected by their red U − V and blue V − J colors, as indicated by the selection box. However, several primary galaxies scattered out of the boxes when including the spectral information. The fillers are star-forming (blue symbols) and fainter quiescent galaxies at similar or higher/lower redshifts. The filler galaxies for which a spectroscopic redshift was measured are indicated by the larger symbols. We also show the parent UltraVISTA galaxy distribution from which the samples were drawn. The U V J panels only include the UltraVISTA galaxies in the targeted redshift intervals.
Finally, we compare our primary galaxies to the parent galaxy distributions at 1.30 < z < 1.50 and 1.92 < z < 2.28 from which the targets are drawn. At z ∼ 1.4 the primary targets do sample nearly the full distribution along the quiescent sequence, though there is a bias toward bluer colors. The quiescent galaxies at z ∼ 2.1 span a larger range along the quiescent sequence, but on average are also biased toward the bluer and younger systems. This bias is expected, as our bright magnitude limit favors galaxies with lower mass-to-light ratios (M/L), which are generally bluer and younger. Obtaining a more representative sample would require significantly longer integration times and larger surveys, and thus necessitates more efficient telescopes and spectrographs, such as NIRSpec on JWST.

Star formation constraints
All primary Heavy Metal galaxies are selected to have quiescent stellar populations based on their rest-frame U V J colors. In this section, we assess whether these galaxies indeed have low SFRs using both the stellar continuum emission and their emission-line properties.
Figure 8. SFR derived from the Hα emission lines vs. the best-fit SED SFR and stellar mass for all primary galaxies with coverage of Hα. Orange and red symbols represent galaxies at z ∼ 1.4 (Heavy Metal 1 and 2) and z ∼ 2.1 (Heavy Metal 3 and 4), respectively. For galaxies without detected Hα we show a 3σ upper limit. For all but two galaxies with detected Hα, [N ii]/Hα > 0.45 (white pluses), implying that the Hα flux is not dominated by star formation. Thus, for these galaxies, the Hα SFR is overestimated and more comparable to an upper limit. Consequently, the Hα SFRs (and limits) are larger than the SED SFRs, as illustrated in the left panel. In the right panel, we compare the Hα SFRs with the star-forming main sequence from Leja et al. (2022) at z ∼ 1.4 (red shaded area) and z ∼ 2.1 (orange shaded area). Except for HM4-55878, which has bright emission lines originating from a luminous AGN, all galaxies are significantly below the star-forming main sequence. When using SED SFRs, they would shift to even lower values.
The SFRs derived from fitting the stellar spectra and photometry with SPS models are listed in Table 2. Except for HM4-55878, all primary galaxies have best-fit SFRs of < 1 M ⊙ yr −1 . HM4-55878 has a significantly higher SFR than expected based on its U V J colors and the initial photometric analysis. This disparity is attributed to the influence of strong emission lines on the broadband SED. In our fitting procedure, we first adjusted the broadband photometry to account for the impact of these lines, as outlined in Section 3.1.
When examining the Hα SFRs, we find a similar result. With the exception of HM4-55878, the primary galaxies exhibit either very faint or undetectable Hα emission. Among the nine galaxies where Hα is detected at > 3σ, seven display [N ii]/Hα ratios exceeding 0.45, implying that star formation is likely not the primary ionization source (e.g., Baldwin et al. 1981; Kauffmann et al. 2003; Kewley et al. 2006). Our study supports previous findings that high [N ii]/Hα ratios are common in distant quiescent galaxies (e.g., Kriek et al. 2007; Newman et al. 2018b; Belli et al. 2017b). Although such line ratios are commonly associated with photoionization by AGNs (e.g., Kauffmann et al. 2003; Kewley et al. 2006), in quiescent galaxies they are thought to originate from photoionization by hot evolved stars, including post-asymptotic giant branch stars (e.g., Yan & Blanton 2012; Belfiore et al. 2016). For the majority of the galaxies, we do not have a meaningful measurement of [O iii]/Hβ, and thus we cannot further assess the origin of the line emission in our sample. Only HM4-55878 has a significant detection for all lines, with its line ratios suggesting an AGN (Y. Ma et al. in preparation). HM1-213931 and HM4-56163 have [N ii]/Hα < 0.45, and thus star formation is likely the dominant ionization source. These galaxies have low SFRs of 4-5 M ⊙ yr −1 , but the uncertainties are significant. For HM1-213931 we also do not see a clear emission line in the 2D spectrum (see Fig. 5).
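The classification logic described in this paragraph can be summarized in a short sketch. It only encodes the two thresholds quoted in the text (a 3σ detection limit on Hα and [N ii]/Hα = 0.45); the function name, return labels, and example fluxes are hypothetical, and the sketch is not the measurement pipeline used for the survey.

```python
def classify_halpha(f_ha, err_ha, f_nii, ratio_cut=0.45, snr_cut=3.0):
    """Classify an H-alpha measurement following the thresholds in the text.

    Returns a (label, flux) tuple, where flux is the value to convert to an SFR:
      "upper_limit"      : H-alpha undetected; flux is the 3-sigma limit
      "non_star_forming" : [N II]/H-alpha above the cut; the resulting SFR
                           should be treated as an upper limit
      "star_forming"     : [N II]/H-alpha below the cut
    Fluxes may be in any consistent units.
    """
    if f_ha < snr_cut * err_ha:
        return "upper_limit", snr_cut * err_ha
    if f_nii / f_ha > ratio_cut:
        return "non_star_forming", f_ha
    return "star_forming", f_ha


# Hypothetical fluxes in arbitrary units.
print(classify_halpha(f_ha=1.0, err_ha=0.5, f_nii=0.6))
print(classify_halpha(f_ha=10.0, err_ha=1.0, f_nii=6.0))
print(classify_halpha(f_ha=10.0, err_ha=1.0, f_nii=3.0))
```

A single ratio cannot separate AGN photoionization from ionization by hot evolved stars, which is why the text notes that [O iii]/Hβ would be needed to go further.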
In Figure 8 (left panel) we compare the two SFR measurements. For consistency, both measurements assume solar metallicity and a similar IMF (Kroupa vs. Chabrier). For galaxies that have no detected Hα emission, we show the 3σ upper limit (triangles). For galaxies with detected Hα, we mark with a plus those for which star formation is not the primary ionization mechanism. For these galaxies the SFRs are overestimated, and the values should be regarded as upper limits. Additional attenuation toward H ii regions, however, could have resulted in an underestimation of the Hα SFRs, as indicated by the arrow (right panel).
Figure 9. The left panel shows the effective radius (major axis) at rest-frame 5000 Å vs. stellar mass for the primary Heavy Metal galaxies. Red and orange symbols depict galaxies at z ∼ 1.4 and z ∼ 2.1, respectively. Stars and circles represent post-starburst (V − J < 0.7) and older quiescent (V − J > 0.7) galaxies, respectively. The white crosses indicate the galaxies with bad Galfit fits. Heavy Metal galaxies at z ∼ 1.4 exhibit a bias toward smaller sizes, likely due to our selection criteria favoring younger galaxies. This bias indeed diminishes when excluding post-starburst galaxies. The right panels show how both M stellar and R e,major relate to the velocity dispersion (σ v,e ). While the post-starburst galaxies are smaller, their velocity dispersions are comparable to those of older quiescent galaxies of the same stellar mass.
Figure 8 shows that all SFR upper limits from Hα are consistent with the SED SFRs. The galaxies for which Hα does not originate from star formation are all located above the 1-to-1 line as well. Only for HM1-213931 and HM4-56163, for which Hα is thought to originate from star formation, are the two SFRs inconsistent. HM1-213931 seems to be a merger of several galaxies (see Fig. 3), and thus it may not be surprising to find a low SFR of 4 ± 1 M ⊙ yr −1 . In particular, different physical regions for the stellar and nebular components may explain the discrepant values. The low [N ii]/Hα ratio implies low metallicity, which makes it more likely that the star formation is either fueled by low-metallicity infalling gas or associated with a nearby smaller galaxy. Belli et al. (2017b) find similarly low levels of (metal-poor) star formation activity in distant quiescent galaxies, which they attribute to rejuvenation events due to minor mergers or inflowing gas. Higher spatial resolution spectra will be needed to examine the different components and further assess this galaxy.
In the right panel of Figure 8 we show the Hα SFRs versus stellar mass, in comparison to the star-forming main sequence (ridge) from Leja et al. (2022) at similar redshifts (and for a similar IMF). Except for the AGN HM4-55878, all primary quiescent targets are significantly below the star-forming main sequence at their redshift. Furthermore, the majority of these data points are upper limits, either because Hα is undetected or because Hα does not originate from star formation. Hence, except for HM4-55878, all galaxies indeed have strongly suppressed star formation.

Galaxy structures
Quiescent galaxies follow a size-mass relationship, where galaxies with greater mass or luminosity exhibit larger effective radii (e.g., Kormendy 1977; Shen et al. 2003). This relationship evolves over cosmic time, with galaxies at greater distances appearing more compact (e.g., Trujillo et al. 2006; van Dokkum et al. 2008; van der Wel et al. 2014; Mowla et al. 2018; Suess et al. 2019a,b). In Figure 9, we compare the half-light radii at rest-frame 5000 Å of the Heavy Metal galaxies with the average size-mass relation at z = 1.25, z = 1.75, and z = 2.25 as reported by Mowla et al. (2018) (using the same IMF) for a large representative sample of massive quiescent galaxies. To ensure consistency with prior research, we consider the major axis (noncircularized) as the half-light radius. In contrast to Figure 8, here we use the stellar masses that are corrected using the Galfit magnitudes, to make them consistent with the size measurements (see Sect. 3.3). Mowla et al. (2018) also applied this correction. When comparing the galaxies at z ∼ 1.4 to the relations at z = 1.25 and z = 1.75 (Fig. 9), we find that, on average, the Heavy Metal galaxies are smaller. This trend can likely be attributed to our selection criteria favoring quiescent galaxies with lower M/L, indicative of younger ages. Several studies have indeed highlighted that younger quiescent galaxies have smaller half-light radii than their older counterparts of equivalent mass (e.g., Whitaker et al. 2012; Belli et al. 2015; Yano et al. 2016; Almaini et al. 2017; Maltby et al. 2018; Wu et al. 2020; Suess et al. 2020, 2021; Setton et al. 2022). When excluding the youngest galaxies (stars), as identified by their blue V − J (< 0.7) colors (e.g., Belli et al. 2019; Beverage et al. 2021), we find good agreement between the relations by Mowla et al. (2018) and the Heavy Metal galaxies at z ∼ 1.4. The sizes of the z ∼ 2.1 Heavy Metal galaxies are more challenging to compare, as there are only five galaxies with robust size measurements, of which two have blue V − J colors. Our primary quiescent galaxies also have similar sizes to the spectroscopic galaxy sample by Belli et al. (2017a), when including galaxies at similar redshifts (1.35 < z < 2.45).
Figure 10. Left: dynamical vs. stellar mass for all primary Heavy Metal galaxies for which sizes and stellar velocity dispersion could be measured, assuming a Chabrier (2003) IMF. The white crosses indicate the galaxies with bad Galfit fits. We also show the galaxies by Belli et al. (2017a) with comparable redshifts (z > 1.35). Right: the difference between the dynamical and stellar mass vs. the velocity dispersion (σ v,e ), age of the galaxy, axis ratio b/a, and Sérsic index n. The ages, adopted from Beverage et al. (2023b), are derived using alf and present the luminosity-weighted age. In all panels, the galaxies are color coded by their effective radii at rest-frame 5000 Å (major axis). For the majority of the galaxies, the dynamical mass exceeds the stellar mass, with a median dark matter fraction of 28%. Three galaxies have stellar masses that exceed their dynamical masses. Interestingly, three galaxies with low M dyn /M stellar are the smallest and youngest galaxies in the sample.
In Figure 9 we also show the stellar masses and sizes in relation to their velocity dispersions. These panels show that the youngest galaxies, despite their small sizes, have velocity dispersions similar to those of older galaxies of similar mass. The Heavy Metal galaxies follow a distribution roughly similar to that of the sample by Belli et al. (2017a) in both diagrams.

Comparison of dynamical and stellar masses
The combination of deep Keck spectra with high-resolution HST imaging enables dynamical mass measurements for the majority of the primary Heavy Metal galaxies, as listed in Table 4. In addition to the stellar content, the dynamical mass also includes the dark matter and gas components. Thus, in theory, the dynamical masses should give us insights into these dark components. In practice, however, this is extremely challenging, as both the stellar and dynamical mass measurements rely on many assumptions (see Sect. 3.3). Nonetheless, the boundary condition that the stellar mass should not exceed the dynamical mass provides an independent check on our stellar mass measurements, and may give us insights into the assumptions that went into our mass estimates.
Figure 11. … Symbols are as in Figure 10. The galaxies are color coded according to their velocity dispersion following the color bar in Figure 12. The vertical light-gray stripes indicate the full range and the median is indicated by the dark-gray horizontal bar. Right: the distribution in M dyn /M * ,c for the individual galaxies for all ten assumption sets. The galaxies are ordered by increasing velocity dispersion from left to right. Each vertical stripe corresponds to a separate galaxy and each color to a different assumption set.
Figure 12. … Treu et al. (2010) for the cores of nearby early-type galaxies, subsolar metallicity, and half-mass radii. For this combination of assumptions, the scatter in M dyn /M * ,c is smallest. However, the stellar mass exceeds the dynamical mass by 34%. This tension may imply that distant quiescent galaxies do not simply grow inside-out into present-day massive early-type galaxies.
In Figure 10 we show the dynamical vs. stellar mass for the primary Heavy Metal galaxies. For the majority of the galaxies, the dynamical mass exceeds the stellar mass, with a median dark matter fraction of 28%. For two galaxies the dynamical masses are below their stellar masses, with one galaxy (HM3-107590) being off by > 3σ. These galaxies are among the smallest (based on the light-weighted size) and youngest, as shown in the top-right panel of Figure 10. Interestingly, Runco et al. (2022) found a similar result for a post-starburst galaxy at redshift z = 1.89, with the stellar mass also being significantly larger than the dynamical mass. Cappellari (2023) also found a trend with age for z ∼ 0.7 galaxies in the LEGA-C survey, with the younger galaxies having lower dynamical-to-stellar mass ratios.
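To make the link between the mass ratio and the quoted dark matter fraction explicit, the sketch below assumes the simple definition f DM = 1 − M * /M dyn , i.e., that any gas contribution is neglected and that both masses refer to the same aperture. This definition is our assumption for illustration; the input masses are hypothetical.

```python
def dark_matter_fraction(log_m_dyn, log_m_star):
    """Dark matter fraction implied by log10 dynamical and stellar masses.

    Assumes f_DM = 1 - M*/M_dyn (gas neglected, same aperture for both masses).
    A negative value flags the unphysical case M* > M_dyn.
    """
    return 1.0 - 10.0 ** (log_m_star - log_m_dyn)


# Hypothetical galaxy with log M_dyn = 11.20 and log M* = 11.06.
print(f"f_DM = {dark_matter_fraction(11.20, 11.06):.2f}")

# Hypothetical galaxy whose stellar mass exceeds its dynamical mass.
print(f"f_DM = {dark_matter_fraction(10.90, 11.00):.2f}")
```

Under this definition, a dark matter fraction of 28% corresponds to M dyn /M * ≈ 1.39, or about 0.14 dex.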
In order to assess how the masses compare when adopting different assumptions, and to understand why some galaxies have M * ,c > M dyn , we discuss the different assumptions below. First, when deriving the dynamical mass, we circularize the effective radius and use a Sérsic-dependent virial coefficient β(n). If we had not circularized the radius, M dyn /M * ,c would increase by 0.023 dex and the scatter would increase from 0.232 to 0.255 (assumption set 2, Fig. 11 and Table 5). As an alternative to circularizing, we also explore the axis ratio correction by van der Wel et al. (2022), which is implemented using an additional virial coefficient K(q). This combination increases the median M dyn /M * ,c by 0.12 dex as well as the scatter (assumption set 3). Assuming a virial constant of 5 would also increase the scatter in M dyn /M * ,c , but the median M dyn /M * ,c would decrease by 0.15 dex (assumption set 4).
Second, in our dynamical mass measurement, we assume that the galaxies are pressure supported. However, if they are (partially) rotationally supported, our dynamical mass measurement would be off. For example, for HM3-107590 the low dynamical mass could be explained by a face-on view or by a strong misalignment of the slit and the major axis of the galaxy. For such cases, part of the velocity field would not be included in our dispersion measurement and we would underestimate the mass. We check for this possibility by examining M dyn /M * ,c as a function of the Sérsic index and axis ratio (b/a) in the right panels of Figure 10. HM3-107590 is nearly round and the velocity dispersion is indeed lower compared to galaxies of similar mass (Fig. 9). The Sérsic index appears at odds with the galaxy being a disk, though this measurement is quite uncertain as this galaxy is just barely resolved.
In this context, it is interesting to note that Belli et al. (2017a) find higher M dyn /M * ,c for galaxies with low Sérsic indices (n < 2.5) and low axis ratios, and interpret this finding as evidence for a significant contribution of rotational motion. We do not see any indications that galaxies with the highest M dyn /M * ,c preferentially have low n and low b/a. However, in contrast to Belli et al. (2017a), we circularize our effective radii when deriving the dynamical mass, which lowers M dyn /M * ,c for galaxies with low axis ratios, and thus we partially account for inclination effects. Improving upon our simplified approach requires a forward modeling method, preferentially combined with spatially resolved spectroscopy, allowing for dynamical models with different levels of rotational support and correcting for inclination and aperture effects (e.g., Price et al. 2016, 2020; van Houdt et al. 2021; de Graaff et al. 2023).
Third, dynamical masses depend on size measurements, which may be biased by stellar population gradients. Distant quiescent galaxies have redder centers, with the gradient being stronger in galaxies that are more massive, older, and at lower redshifts (e.g., Mosleh et al. 2017; Suess et al. 2019a,b, 2020, 2021; Miller et al. 2023). By applying the average size corrections by Suess et al. (2019a, 2021; ∼0.2 dex and ∼0.1 dex for the older quiescent galaxies at z ∼ 1.4 and z ∼ 2.1, respectively, and no corrections for post-starburst galaxies), we find that the median M dyn /M * ,c decreases by 0.16 dex (see Fig. 11, assumption set 5). The color gradient correction also reduces the scatter in M dyn /M * ,c . The M dyn of the two youngest galaxies remain unaffected, as post-starburst galaxies tend to display uniform color gradients (e.g., Setton et al. 2020; Suess et al. 2020, 2021). HM1-217249, the sole post-starburst galaxy with robust size measurements in both F814W and F160W, supports this trend. Thus, stellar population gradients do not explain the low M dyn /M * ,c of the few post-starburst galaxies. Instead, they further lower the median inferred dark matter fraction of our full distant quiescent galaxy sample. Finally, size underestimation could occur due to the presence of an AGN, although the full SEDs and spectra provide limited room for a power-law continuum contribution.
The stellar masses could also be biased. First, as we assume a simple delayed exponential star-formation history, we likely miss older and low M/L stellar populations in our stellar mass. This "outshining" problem has been discussed in many works (e.g., Papovich et al. 2001; Wuyts et al. 2007; Leja et al. 2019; Giménez-Arteaga et al. 2023). However, for distant massive quiescent galaxies this effect is small, and thus the fast and Prospector masses (Johnson et al. 2021), assuming nonparameterized star formation histories, are very similar (Leja et al. 2019). Second, we assume solar metallicity while the galaxies, on average, have subsolar iron abundances (Beverage et al. 2023b). Assuming a half-solar metallicity (Z=0.0096) would increase the median stellar masses by 13% (see Fig. 11) and decrease the scatter in M dyn /M * ,c by 0.035. Third, we assume a Chabrier (2003) IMF, which, similar to a Kroupa IMF, is relatively bottom light. Assuming a Salpeter (1955) IMF would increase the stellar masses by 0.2 dex (Fig. 11), such that they exceed M dyn for the majority of the galaxies. Thus, the IMF assumption causes the largest (systematic) uncertainty in our stellar mass estimates (see also Wang et al. 2023). Combining both stellar mass effects and the color gradient correction (assumption set 8 in Fig. 11) would lead to stellar masses vastly exceeding the dynamical masses for nearly all galaxies. Hence, given our dynamical masses, we infer that a Chabrier IMF is more likely than a Salpeter IMF for distant quiescent galaxies. We will further explore the IMF in the next section.
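The size of these systematic shifts can be put on a common scale with a small sketch. The offsets are the values quoted in the text; treating the color gradient correction as a pure shift in M dyn , and applying a median shift to an individual ratio, are simplifying assumptions made here for illustration. The starting ratio is derived from the quoted 28% median dark matter fraction under the assumed definition f DM = 1 − M * /M dyn .

```python
import math

# Offsets quoted in the text, expressed in dex.
DLOG_MSTAR_SALPETER = 0.2                  # Chabrier -> Salpeter IMF
DLOG_MSTAR_HALF_SOLAR = math.log10(1.13)   # +13% in stellar mass
DLOG_MDYN_COLOR_GRADIENT = -0.16           # median shift in M_dyn/M* (treated as a shift in M_dyn)


def shifted_ratio(ratio, dlog_mstar=0.0, dlog_mdyn=0.0):
    """Return M_dyn/M* after shifting log10(M*) and log10(M_dyn) by the given offsets."""
    return ratio * 10.0 ** (dlog_mdyn - dlog_mstar)


# Starting point: ratio implied by a 28% dark matter fraction,
# assuming f_DM = 1 - M*/M_dyn (our assumption).
baseline = 1.0 / (1.0 - 0.28)

print(f"baseline:            {baseline:.2f}")
print(f"Salpeter IMF:        {shifted_ratio(baseline, dlog_mstar=DLOG_MSTAR_SALPETER):.2f}")
print(f"half-solar Z:        {shifted_ratio(baseline, dlog_mstar=DLOG_MSTAR_HALF_SOLAR):.2f}")
print(f"color gradient corr: {shifted_ratio(baseline, dlog_mdyn=DLOG_MDYN_COLOR_GRADIENT):.2f}")
```

In this simplified picture the Salpeter offset alone is enough to push a typical ratio below unity, which illustrates why the IMF is singled out as the dominant systematic.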

Implications for Photometric Studies
While the number of distant star-forming galaxies with spectroscopic redshifts has increased tremendously in the past decade (e.g., Steidel et al. 2014; Wisnioski et al. 2015; Kriek et al. 2015), the number of distant quiescent galaxies with spectroscopic redshifts or other spectroscopic information is still very small. Detecting absorption lines requires significantly longer integration times than observing nebular emission lines. Thus, the majority of studies of quiescent galaxies over cosmic time, including the buildup of the stellar mass function, still rely on photometric data.
Our study presents a reassuring picture. The initial photometric redshifts of our primary quiescent galaxies agree well with their spectroscopic redshifts (see Sect. 4.1) and nearly all galaxies have quiescent stellar populations with their SFRs significantly below the star-forming main sequence (see Sect. 4.2). We do find, however, that photometric redshifts become less accurate beyond z = 2. Furthermore, we show that for one galaxy, the contribution from strong AGN emission lines mimics the shape of a quiescent galaxy. Schreiber et al. (2018) show that the success rate of the U V J selection criteria further declines to about 80% when going to 3 < z < 4. Forrest et al. (2020) present a less optimistic picture, with a spectroscopic confirmation rate of about 50% for quiescent galaxy candidates beyond z = 3. Nonetheless, out to z ∼ 2, we do not expect that mass functions of quiescent galaxies will be strongly biased by incorrect photometric redshifts or quiescent galaxy classifications.

Implications for the evolution of massive quiescent galaxies
One popular explanation for the size evolution of quiescent galaxies over cosmic time is growth by minor mergers. This scenario is supported by the finding that the central mass densities of quiescent galaxies remain roughly constant, while their (blue) outskirts are building up over time (e.g., Bezanson et al. 2009; van Dokkum et al. 2010; Barro et al. 2017; Suess et al. 2021). Furthermore, distant quiescent galaxies have many small companions (Newman et al. 2012; Suess et al. 2023). Thus, in this scenario, distant quiescent galaxies are the cores of massive galaxies today. These same cores are also found to have a bottom-heavy IMF (e.g., Treu et al. 2010; Conroy & van Dokkum 2012), with the highest velocity dispersion galaxies having a larger excess of low-mass stars. Thus, for this inside-out growth scenario, the IMF in the high-dispersion galaxies should already be bottom heavy at these early times.
In Figure 10 we indeed find that M dyn /M * ,c correlates with σ v,e , which could imply that the IMF may be more bottom heavy in higher-dispersion galaxies. This trend was already visible in the M dyn − M * diagrams in several distant quiescent galaxy studies (e.g., van de Sande et al. 2013; Belli et al. 2017a; Forrest et al. 2022), and discussed in detail in Mendel et al. (2020). Mendel et al. (2020) argue that this trend is due to a varying IMF and that the IMF-σ v relation was already in place at these early times.
To further assess this theory, we show M dyn /M * ,c assuming a σ v -dependent IMF for our distant quiescent galaxies in Figure 11 (assumption set 9). We use the relation by Treu et al. (2010, Equation 4), in which the IMF is more bottom heavy than the Salpeter IMF for galaxies with σ v > 250 km s −1 . This IMF results in a median M dyn /M * ,c of 1.033. The scatter in M dyn /M * ,c is strongly reduced, which is expected as we are (partially) removing the trend with σ v,e . Interestingly, the scatter is smallest when also assuming subsolar metallicity (Z=0.0096) and correcting for color gradients (Fig. 11, assumption set 10). However, this assumption set results in stellar masses that exceed the dynamical masses for all but one galaxy, with a median M dyn /M * ,c of 0.75 (Fig. 12). Thus, our dynamical masses may suggest that compact distant quiescent galaxies do not "passively" evolve into the cores of massive elliptical galaxies today and that the evolution is more complicated (e.g., Wellons et al. 2015). Major mergers (with galaxies with a different IMF) and/or late-time central star formation could have affected the average IMF in today's cores.
We come to a similar conclusion based on our elemental abundance measurements presented in our accompanying paper (Beverage et al. 2023b). The iron abundances in distant quiescent galaxies are much lower than those found in the cores of nearby massive early-type galaxies (Gu et al. 2022). Neither minor mergers nor progenitor bias can explain this evolution, and thus late-time star formation and/or major mergers are needed to explain the increase in iron abundance. We do note, though, that minor mergers are still needed to explain the structural and size evolution of massive quiescent galaxies over cosmic time.
Interestingly, van Dokkum et al. (2023) came to the opposite conclusion, based on a perfect lensing system (see also Mercier et al. 2023). They find that a bottom-heavy IMF must already be in place for a distant quiescent galaxy at z ∼ 1.9, because the stellar mass, assuming the Chabrier (2003) IMF, would lead to an unrealistically large dark matter fraction within the Einstein radius. Spectroscopic redshifts for both galaxies, as well as a dynamical mass measurement for the quiescent lens galaxy, would be needed for a direct comparison with our results.
To further unravel this puzzle, we need progress on several fronts. First, we need to measure stellar population gradients and half-mass radii for our spectroscopic samples. This should preferably be done from spectroscopic data, as age, metallicity, and dust gradients result in different M/L gradients (e.g., van de Sande et al. 2015). We would also have to redetermine the stellar masses, taking into account these stellar population gradients. Second, we need to resolve the kinematics of distant quiescent galaxies, such that we can model their stellar dynamics. Third, we need a direct spectroscopic measurement of the IMF in distant quiescent galaxies (using gravity-sensitive absorption features; e.g., van Dokkum & Conroy 2010), to obtain more accurate stellar masses and understand whether the bottom-heavy IMF was already in place at these early times. Finally, we need larger samples of galaxy spectra. JWST will enable advances in all these areas and has already collected spectra of a handful of distant quiescent galaxies (Nanayakkara et al. 2022; Carnall et al. 2023; Marchesini et al. 2023; D'Eugenio et al. 2023; Belli et al. 2023).

SUMMARY
In this paper, we present an overview of the Heavy Metal survey, an ultradeep rest-frame optical spectroscopic survey of 21 distant quiescent galaxy candidates at 1.4 ≲ z ≲ 2.2. The Heavy Metal survey was executed with MOSFIRE and LRIS on the Keck I telescope and overlaps with the UltraVISTA and COSMOS-DASH surveys. Our primary targets were selected across two redshift intervals, 1.30 < z < 1.50 and 1.92 < z < 2.28, allowing the observation of multiple Balmer and metal (Ca, Mg, Fe) absorption lines in atmospheric windows. The extensive sky coverage enabled pointings in which we observe 5–6 "bright" quiescent candidates each, with two pointings per redshift interval. The remaining slits were placed on fainter quiescent and star-forming galaxies at similar redshifts. The z ∼ 1.4 and z ∼ 2.1 targets were observed for a total of ∼18 and ∼32 hr, respectively. The Heavy Metal survey is unique for its wavelength coverage and presents the first statistical sample of z ≳ 1.4 quiescent galaxies with ultradeep spectra covering rest-frame ∼3700–5400 Å.
We measure spectroscopic redshifts for all primary targets, and nearly all show clear Balmer and metal absorption lines in their spectra. Twenty of the 21 quiescent candidates indeed have quiescent stellar populations; the SFRs determined from Hα and spectrophotometric fitting are both significantly below the star-forming main sequence. For 11 of the 20 quiescent galaxies, we detect no Hα and derive upper limits on the SFR from Hα. For nine targets, we do detect faint Hα emission, but seven of them have emission-line ratios indicating that star formation is not the primary ionization source; instead, they may be powered by hot evolved stars or low-luminosity AGNs. Hence, for these galaxies the Hα SFRs are more comparable to upper limits as well. For the remaining two galaxies with detected Hα, the SFRs are very low, and for one of them [N ii]/Hα suggests that the star formation is likely associated with a nearby smaller galaxy. Finally, one of the quiescent candidates turned out to be an AGN, with strong (asymmetric) emission lines mimicking the SED shape of a quiescent galaxy. This galaxy will be discussed in detail in Y. Ma et al. (2024, in preparation).
The primary goal of the Heavy Metal survey is to measure chemical compositions and ages from the stellar absorption-line spectra. These measurements are discussed in our accompanying paper (Beverage et al. 2023b). The stellar population fitting, presented in that paper, also yields accurate stellar velocity dispersion measurements for 19 out of the 21 primary galaxies. These measurements, combined with the structural parameters derived from HST F814W and F160W imaging, enable us to derive dynamical masses for the majority of the primary Heavy Metal galaxies.
In this paper, we compare our dynamical masses with the stellar masses from spectrophotometric modeling, considering various assumptions for both masses. Interestingly, for a fixed IMF, $M_{\rm dyn}/M_*$ shows a positive correlation with $\sigma_v$. This correlation may suggest that a varying IMF, which is more bottom heavy for high-$\sigma_v$ galaxies, was already in place at these early times (see also Mendel et al. 2020). When implementing the $\sigma_v$-dependent IMF found in the cores of nearby massive early-type galaxies, and also correcting for biases in our stellar mass and size measurements, we find a low scatter in $M_{\rm dyn}/M_*$ of only 0.14 dex and a median $M_{\rm dyn}/M_*$ of 0.75. Thus, for these assumptions, the stellar mass measurements exceed the dynamical masses for nearly all quiescent galaxies. This result may imply that distant quiescent galaxies do not simply grow inside-out into massive early-type galaxies in today's Universe and that late-time evolution (major mergers and/or late-time star formation) may be needed. In Beverage et al. (2023b) we come to a similar conclusion based on the difference in iron abundance between our distant quiescent galaxies and the cores of nearby massive early-type galaxies.
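As a rough conversion, using only the rounded values quoted above, the median ratio and the logarithmic scatter translate into linear factors as

$$
\frac{M_*}{M_{\rm dyn}} = \frac{1}{0.75} \approx 1.33, \qquad 10^{0.14} \approx 1.38 .
$$

That is, the median stellar mass exceeds the dynamical mass by roughly a third, and a scatter of 0.14 dex corresponds to a factor of about 1.4 in $M_{\rm dyn}/M_*$.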
In order to fully characterize the distant quiescent galaxy population and solve this possible tension with the studies of the cores in nearby massive galaxies, we need to make progress on several fronts. First, we need a statistical sample of distant quiescent galaxies with resolved stellar kinematics, ages, elemental abundances, and robust stellar mass profiles. Moreover, we need to directly measure the IMF in distant quiescent galaxies using gravity-sensitive absorption lines. JWST will be able to make progress on all these fronts and thus will be transformative for our understanding of the formation histories of distant quiescent galaxies and their evolutionary link to the massive early-type galaxies in the present-day Universe.
We acknowledge support from NSF AAG grants AST-1908748 and 1909942. C.C. acknowledges support from NSF grant AST-131547. The authors wish to recognize and acknowledge the very significant cultural role and reverence that the summit of Maunakea has always had within the indigenous Hawaiian community. We are most fortunate to have the opportunity to conduct observations from this mountain.

Figure 1. LRIS and MOSFIRE visibility of various rest-frame optical absorption features as a function of redshift. Each row represents a different spectral feature, as indicated on the left. The color of the feature reflects the primary origin of the chemical element, as indicated in the bottom-right box. The gray bars indicate whether a feature is visible in a certain filter, with the different shades of gray corresponding to the different filters (as indicated in the top right) and the gradations for each filter indicating the throughput. The Heavy Metal low (1.30 < z < 1.50) and high (1.92 < z < 2.28) redshift intervals are indicated by the dashed and dotted vertical lines, respectively.

Figure 2. Footprints of the Heavy Metal observations in the larger COSMOS field. The middle panel shows the weight map of all publicly available HST/F160W imaging, constructed by the COSMOS-DASH collaboration (Momcheva et al. 2017). The dark-blue contiguous area represents the CANDELS survey (Koekemoer et al. 2011; Grogin et al. 2011). The three larger and lighter stripes represent the shallower COSMOS-DASH survey, which overlaps with the deep UltraVISTA stripes. The smaller dark-gray rectangles represent the MOSFIRE field of view for all four Heavy Metal pointings. For Heavy Metal 1 and 2, both targeting lower redshifts (z ∼ 1.4), we also show the LRIS field of view by the larger, light-gray rectangles. For each pointing we show the zoom-in panels to the left or right of the primary panel. In the zoom panels, we show the COSMOS-DASH/F160W images and indicate the primary (red circles) and filler (blue circles) targets.

Figure 3. Overview of UltraVISTA photometric SEDs (left), spectra (middle), and HST-F160W images (right) of distant quiescent galaxies at z ∼ 1.4. The LRIS (4–6 hr) and MOSFIRE J-band spectra (12 hr) are shown in the middle-left and middle-right columns, respectively. The spectra are binned by 15 and 10 pixels, respectively, such that each bin corresponds to ∼5 Å in the rest frame. Flux densities ($f_\lambda$) are in $10^{-18}$ erg s$^{-1}$ cm$^{-2}$ Å$^{-1}$. The extents of the LRIS and MOSFIRE J-band panels are indicated by the light- and middle-gray rectangles in the left panels (arrows indicate that the full range exceeds the panel). The best-fit FSPS models to the combined photometry and spectra are shown in gray (left panels) and red (middle panels). Prominent spectral lines are indicated by the dotted vertical lines. The orientations of the MOSFIRE (blue) and LRIS (orange) slits are indicated in the right panels.

Figure 4. Overview of UltraVISTA photometric SEDs (left), spectra (middle), and HST-F160W images (right) of distant quiescent galaxies at z ∼ 2.1. The MOSFIRE J-band (12–14 hr) and H-band spectra (16–17 hr) are shown in the middle-left and middle-right columns, respectively. The spectra are binned by 13 and 10 pixels, respectively, such that each bin corresponds to ∼5 Å in the rest frame. Flux densities ($f_\lambda$) are in $10^{-18}$ erg s$^{-1}$ cm$^{-2}$ Å$^{-1}$, and the extents of the MOSFIRE J-band and H-band panels are indicated by the light- and middle-gray rectangles in the left panels. The best-fit FSPS models to the combined photometry and spectra are shown in gray (left panels) and red (middle panels). Prominent spectral lines are indicated by the dotted vertical lines.
Figure 5. The 2D and 1D spectra in wavelength regions around the Hα spectral feature for all primary Heavy Metal galaxies. For each 1D spectrum the continuum has been removed. The 1D spectra are shown in black, binned to 3 (unmasked) pixels, and in gray we show the corresponding error spectrum. The yellow curve presents the best-fit emission-line model to Hα and the two [N ii] lines, with the 68% uncertainty shown by the shaded yellow region. The vertical red dotted lines indicate the locations of the three emission lines. Galaxy 55878 is the only one with strong Hα in emission and will be discussed in Y. Ma et al. (2024, in preparation). We have 8 additional galaxies with marginal (>3σ) Hα detections (see Table 2) and 11 galaxies for which we derive upper limits on the Hα flux. One galaxy (59375) has a significantly deeper spectrum; due to an incorrect photometric redshift, the line was observed in a different filter (H) than expected (K).

a Adopted from Cutler et al. (2022).
b Stellar masses corrected for the total magnitude difference between the photometric catalog and Galfit. The typical uncertainties are 0.1 dex, excluding variations in the IMF.
c Adopted from Beverage et al. (2023b).
Figure 6. Photometric vs. spectroscopic redshift for all spectroscopically confirmed primary (circles) and filler (squares) targets in the Heavy Metal survey. We measured a spectroscopic redshift for all primary targets. For the filler galaxies the success rate was much lower, at 65%. The normalized median absolute deviations ($\sigma_{\rm NMAD}$; Brammer et al. 2008) between the photometric and spectroscopic redshifts are 0.017 and 0.014 for the filler and primary galaxies, respectively. The shaded areas present the targeted redshift intervals used in the selection. Two of the primary targets scattered out of the targeted redshift interval.
Figure 7. Apparent magnitude vs. redshift (top panels) and rest-frame U − V vs. V − J colors (bottom panels) for all galaxies observed in the LRIS and MOSFIRE masks. The Heavy Metal 1 and 2 masks (left panels) primarily targeted bright (J < 21.6) quiescent galaxies at 1.3 < z < 1.5. The Heavy Metal 3 and 4 masks (right panels) primarily targeted bright (H < 21.8) quiescent galaxies at 1.92 < z < 2.28. Quiescent galaxies (red symbols) were selected by their red U − V and blue V − J colors, as indicated by the selection box. However, several primary galaxies scattered out of the boxes when including the spectral information. The fillers are star-forming (blue symbols) and fainter quiescent galaxies at similar or higher/lower redshifts. The filler galaxies for which a spectroscopic redshift was measured are indicated by the larger symbols. We also show the parent UltraVISTA galaxy distribution from which the samples were drawn. The UVJ panels only include the UltraVISTA galaxies in the targeted redshift intervals.
Figure 8. SFR derived from the Hα emission lines vs. the best-fit SED SFR and stellar mass for all primary galaxies with coverage of Hα. Orange and red symbols represent galaxies at z ∼ 1.4 (Heavy Metal 1 and 2) and z ∼ 2.1 (Heavy Metal 3 and 4), respectively. For galaxies without detected Hα we show a 3σ upper limit. For all but two galaxies with detected Hα, [N ii]/Hα > 0.45 (white pluses), implying that the Hα flux is not dominated by star formation. Thus, for these galaxies, the Hα SFR is overestimated and more comparable to an upper limit. Consequently, the Hα SFRs (and limits) are larger than the SED SFRs, as illustrated in the left panel. In the right panel, we compare the Hα SFRs with the star-forming main sequence from Leja et al. (2022) at z ∼ 1.4 (red shaded area) and z ∼ 2.1 (orange shaded area). Except for HM4-55878, which has bright emission lines originating from a luminous AGN, all galaxies are significantly below the star-forming main sequence. When using SED SFRs, they would shift to even lower values.
Figure 10. Left: dynamical vs. stellar mass for all primary Heavy Metal galaxies for which sizes and stellar velocity dispersions could be measured, assuming a Chabrier (2003) IMF. The white crosses indicate the galaxies with bad Galfit fits. We also show the galaxies by Belli et al. (2017a) with comparable redshifts (z > 1.35). Right: the difference between the dynamical and stellar mass vs. the velocity dispersion ($\sigma_{v,e}$), age of the galaxy, axis ratio b/a, and Sérsic index n. The ages, adopted from Beverage et al. (2023b), are derived using alf and represent luminosity-weighted ages. In all panels, the galaxies are color coded by their effective radii at rest-frame 5000 Å (major axis). For the majority of the galaxies, the dynamical mass exceeds the stellar mass, with a median dark matter fraction of 28%. Three galaxies have stellar masses that exceed their dynamical masses. Interestingly, three galaxies with low $M_{\rm dyn}/M_*$ are the smallest and youngest galaxies in the sample.

Figure 11. Left: the distribution in $M_{\rm dyn}/M_{*,c}$ for different assumptions when calculating the dynamical and stellar mass (see Table 5). Symbols are the same as in Figure 10. The galaxies are color coded according to their velocity dispersion following the color bar in Figure 12. The vertical light-gray stripes indicate the full range, and the median is indicated by the dark-gray horizontal bar. Right: the distribution in $M_{\rm dyn}/M_{*,c}$ for the individual galaxies for all ten assumption sets. The galaxies are ordered by increasing velocity dispersion from left to right. Each vertical stripe corresponds to a separate galaxy and each color to a different assumption set.

Figure 12. Dynamical vs. stellar mass when adopting the $\sigma_v$-dependent IMF by Treu et al. (2010) for the cores of nearby early-type galaxies, subsolar metallicity, and half-mass radii. For this combination of assumptions, the scatter in $M_{\rm dyn}/M_{*,c}$ is smallest. However, the stellar mass exceeds the dynamical mass by 34%. This tension may imply that distant quiescent galaxies do not simply grow inside-out into present-day massive early-type galaxies.

Table 1. Overview of Observations and Data

Table 2. Overview of primary quiescent galaxy sample
a Typical uncertainty on stellar mass is 0.1 dex.
b Typical uncertainty on SFR is 0.2 dex.
c Typical uncertainty on $A_V$ is 0.1 mag.

Table 3. Emission-line properties

Table 4. Structural and kinematic properties

Table 5. Dynamical-to-stellar mass ratios for varying assumptions