Detecting and Characterizing Young Quasars. I. Systemic Redshifts and Proximity Zone Measurements

In a multiwavelength survey of 13 quasars at 5.8 ≲ z ≲ 6.5, which were preselected to be potentially young, we find five objects with extremely small proximity zone sizes that may imply UV-luminous quasar lifetimes of ≲100,000 yr. Proximity zones are regions of enhanced transmitted flux in the vicinity of quasars that are sensitive to the quasars’ lifetimes because the intergalactic gas has a finite response time to their radiation. We combine submillimeter observations from the Atacama Large Millimetre Array and the NOrthern Extended Millimeter Array, as well as deep optical and near-infrared spectra from the medium-resolution spectrograph on the Very Large Telescope and on the Keck telescopes, in order to identify and characterize these new young quasars, which provide valuable clues about the accretion behavior of supermassive black holes in the early universe and pose challenges on current black hole formation models to explain the rapid formation of billion-solar-mass black holes. We measure the quasars’ systemic redshifts, black hole masses, Eddington ratios, emission-line luminosities, and star formation rates of their host galaxies. Combined with previous results, we estimate the fraction of young objects within the high-redshift quasar population at large to be 5% ≲ f young ≲ 10%. One of the young objects, PSO J158–14, shows a very bright dust continuum flux (F cont = 3.46 ± 0.02 mJy), indicating a highly starbursting host galaxy with a star formation rate of approximately 1420 M ⊙ yr−1.


Introduction
High-redshift quasars host central supermassive black holes (SMBHs) with masses exceeding M BH ∼10 9 -10 10 M e as early as 1 Gyr after the big bang (e.g., Mortlock et al. 2011;Venemans et al. 2013;Wu et al. 2015;Mazzucchelli et al. 2017;Bañados et al. 2018;Onoue et al. 2019). How these SMBHs form and grow in such short amounts of cosmic time remains an unanswered question. Assuming Eddington-limited accretion rates and a constant supply of fueling material, SMBHs grow exponentially during the quasar's lifetime t Q , i.e., ⎛ The initial mass M seed denotes the mass of the black hole before the onset of quasar activity. The lifetime or the age of a quasar t Q is defined such that the onset of quasar activity happened at a time −t Q in the past. The e-folding time, or "Salpeter" time, t S (Salpeter 1964), describes the characteristic timescale on which the black hole growth is believed to occur, i.e., where ò denotes the radiative efficiency of the accretion, which is assumed to be about 10% in thin accretion disk models (Shakura & Sunyaev 1973), and L bol describes the bolometric luminosity of the quasar with a theoretical upper limit of the Eddington luminosity L edd . It requires at least 16 e-foldings, i.e., 7×10 8 yr, in order to grow a billion-solar-mass black hole from an initial stellar remnant black hole seed with M seed ∼100 M e , even if they accrete continuously at the Eddington limit (e.g., Volonteri 2010Volonteri , 2012. However, it is currently unknown whether quasars obey this exponential light curve, or if other physics related to the triggering of quasar activity and the supply of fuel complicate this simple picture, giving rise to much more complex light curves (e.g., Di Matteo et al. 2005;Hopkins et al. 2005;Springel et al. 2005;Novak et al. 2011;Davies et al. 2020). These timescales required for the growth of SMBHs are comparable to the age of the universe at z6. Nevertheless, at these high redshifts, more than 200 quasars have been discovered in the last decade (e.g., Venemans et al. 2015;Bañados et al. 2016;Mazzucchelli et al. 2017;Reed et al. 2019;Wang et al. 2019;Yang et al. 2019b), many of which host billion-solar-mass black holes. Thus, massive initial seeds in excess of stellar remnants, i.e., M seed 1000 M e (e.g., Lodato & Natarajan 2006;Visbal et al. 2014;Habouzit et al. 2016;Schauer et al. 2017), or radiatively inefficient accretion rates with ò0.01-0.001 have been invoked (e.g., Volonteri et al. 2015;Trakhtenbrot et al. 2017;Davies et al. 2019), which would reduce the quasar lifetime required to grow the SMBHs.
Measurements of quasar lifetimes have proven to be challenging. At low redshifts, i.e., z∼2-4, the quasar lifetime can be constrained by comparing the number density of quasars to their host dark matter halo abundance inferred from clustering studies (Haiman & Hui 2001;Martini & Weinberg 2001;Martini 2004;White et al. 2008). However, to date, this method has yielded only weak constraints on t Q ∼10 6 -10 9 yr owing to uncertainties in how quasars populate dark matter halos (Shen et al. 2009;White et al. 2012;Conroy & White 2013;Cen & Safarzadeh 2015). Following the "Soltan" argument (Soltan 1982), which states that the luminosity function of quasars as a function of redshift reflects the gas accretion history of local remnant black holes, Yu & Tremaine (2002) estimated the mean lifetime of luminous quasars from local early-type galaxies to be t Q ∼10 7 -10 8 yr. Further constraints on quasar activity on timescales between ∼10 5 and 10 7 yr have been set by measuring an ionization "echo" of the quasar, which denotes the time lag between changes in the quasar's ionization rate and the corresponding changes in the opacity of the surrounding intergalactic medium (IGM; Adelberger 2004;Hennawi et al. 2006;Schmidt et al. 2017;Bosman et al. 2020). A recent compilation of studies on the timescales governing the growth of SMBHs can be found in Inayoshi et al. (2019).
We recently showed how the extent of the proximity zones around high-redshift quasars provides a new and independent constraint on the lifetime of quasars (Eilers et al. 2017a(Eilers et al. , 2018bDavies et al. 2019;Khrykin et al. 2019). These regions of enhanced transmitted flux within the Lyα forest in the immediate vicinity of the quasars have been ionized by the quasar's intense radiation itself (e.g., Bajtlik et al. 1988;Haiman & Cen 2001;Wyithe et al. 2005;Bolton & Haehnelt 2007a;Lidz et al. 2007;Bolton et al. 2011;Keating et al. 2015) and are sensitive to the lifetime of the quasars because intergalactic gas has a finite response time to the quasars' radiation (e.g., Khrykin et al. 2016;Eilers et al. 2017a;Davies et al. 2020). The equilibration timescale t eq describes the time when the IGM has reached ionization equilibrium with the ionizing photons emitted by the quasar, i.e., » Gt eq H 1 I , where Γ H I denotes the total photoionization rate from the quasar as well as the ultraviolet background. However, the quasar's radiation dominates the radiation field within the proximity zone, which has been observationally defined as the location at which the smoothed, continuum-normalized transmitted flux drops below the 10% level (Fan et al. 2006). A photoionization rate of Γ H I ≈10 −12 s −1 from the quasar's radiation at the "edge" of the proximity zone at z≈6 leads to an equilibration timescale of t eq ≈3×10 4 yr (Davies et al. 2020).
Applying this new method to a data set of 31 quasar spectra at 5.8z6.5 (Eilers et al. 2018a), we discovered an unexpected population of quasars with significantly smaller proximity zones than expected, which are likely to be very young, i.e., t Q 10 4 -10 5 yr (Eilers et al. 2017a(Eilers et al. , 2018b. These three young quasars provide valuable clues to the accretion behavior of SMBHs and pose significant challenges on current black hole formation models that require much longer lifetimes to explain the growth of SMBHs.
This study aims to determine the fraction of such young objects within the quasar population at large and to establish a statistically uniform and significant sample of young quasars, which will then enable us to search for any spectral or environmental signatures that might distinguish these young objects from the whole quasar population. To this end, we conduct preliminary measurements of the proximity zones R p of 122 quasars at 5.6z6.5. However, these preliminary measurements have large uncertainties due to their imprecise redshift estimate, which constitute the largest source of uncertainty for proximity zone measurements.
From this sample we select the best young quasar "candidates" whose preliminary measurements of their proximity zones are very small and thus they potentially indicate very short quasar lifetimes. For these young candidates we conduct a multiwavelength survey which we present in this paper, in order to obtain measurements of the quasars' systemic redshifts and precisely measure the extents of their proximity zones. In a subsequent paper (A.-C. Eilers et al. 2020, in preparation, hereafter PaperII) we will use these measurements to derive constraints on the quasars' lifetimes and study the dependence of spectral properties with quasar age.
Throughout this paper, we assume a flat ΛCDM cosmology of h=0.685, Ω m =0.3, and Ω Λ =0.7, which is consistent within the 1σerror bars with Planck Collaboration et al. (2018).

Quasar Sample
We target quasars that are likely to have short lifetimes, as indicated by a very small proximity zone. To this end we analyzed the spectra of 122 quasars at 5.6z6.5, which were taken with a variety of different telescopes and instruments, and thus cover different wavelength ranges, have different spectral resolutions, and different signal-to-noise ratios (S/Ns; see Willott et al. 2009;Bañados et al. 2016;Reed et al. 2017;Wang et al. 2017;Farina et al. 2019, for details). Based on these spectra we conducted preliminary estimates of the quasars' proximity zone sizes (see Sections 4.3 and 4.4 for details on the procedure for measuring proximity zones). However, these measurements have large uncertainties up to Δv∼1000 km s −1 , i.e., ΔR p ∼1.5 proper Mpc (pMpc), due to the highly uncertain redshift estimates, which are based on template fitting of broad rest-frame UV emission lines that can be displaced from the systemic redshift due to strong internal motions or winds.
In order to compare the proximity zone sizes of quasars with different luminosities, we normalized these preliminary proximity zone estimates to the same absolute magnitude of M 1450 =−27 (Fan et al. 2006;Bolton & Haehnelt 2007b;Eilers et al. 2017a). This eliminates the dependence of the zone sizes on the quasars' luminosities, because brighter quasars emit more ionizing radiation and are thus expected to have a larger proximity zone for a given quasar age. The exact procedure is described in detail in Section 4.4. These "corrected" proximity zones R p,corr of our sample span a range of 0.6pMpcR p,corr 12.1 pMpc, with a mean of á ñ » R 5.4 p,corr pMpc. We excluded any objects that showed clear evidence for a premature truncation of the proximity zones due to associated absorption systems, such as proximate damped Lyα absorption systems (pDLAs), which we identify by searching for ionic metal absorption lines in the quasar continuum that are associated with the absorber. Such quasars with associated absorption systems spuriously resemble quasars with small proximity zones. Additionally, we exclude quasars with clear broad absorption line (BALs) features that might contaminate the proximity zones.
We then choose the 10% of quasars from this sample with the smallest proximity zone measurements for follow-up multiwavelength observations. These 12 objects all exhibit proximity zones with R p,corr <2 pMpc. One additional object, CFHQS J2229+1457, has been added to this sample for follow-up observations. This quasar was previously identified to have a small proximity zone, but the available spectrum obtained with the Low Resolution Imaging Spectrometer on Keck did not cover the near-infrared wavelengths, nor did it have sufficient resolution to securely exclude any premature truncation of its proximity zone (Eilers et al. 2017a).

Multiwavelength Data Set
We intend to measure the systemic redshifts by means of submillimeter observations of emission lines arising from the cold gas reservoir of the quasar host galaxies with the Atacama Large Millimetre Array (ALMA), Section 3.1, and the NOrthern Extended Millimeter Array (NOEMA) at the Institute de Radioastronomie Millimétrique (IRAM), Section 3.2. These emission lines provide a tenfold improvement on the quasars' systemic redshifts, because they do not suffer from possible displacements due to strong internal motions or winds in the quasars' broad-line regions (BLR). Furthermore, we obtained deep optical (VIS) and near-infrared (NIR) spectra with the medium-resolution spectrographs X-Shooter on the Very Large Telescope (VLT), Section 3.3, as well as optical spectra for quasars located in the Northern hemisphere with the Deep Imaging Multi-Object Spectrograph (DEIMOS) on the Keck telescopes, Section 3.4, to search for associated absorption systems that might have contaminated and prematurely truncated the quasars' proximity zones. Table 1 shows a summary of the observations presented in this paper.

ALMA Observations
The ALMA data consist of short (∼15-20 minutes on source) observations centered on the optical/NIR coordinates of the quasars. The tuning frequency of the spectral windows was chosen such that two neighboring windows encompass the expected observed frequency of the [C II] emission line at 158 μm (ν rest =1900.548 GHz) based on preliminary redshift estimates. The other two spectral windows cover the dust continuum emission. Observations were carried out in May 2018 with the array in a compact configuration C43−2, resulting in images with ∼1″ spatial resolution. Thus, the size of the [C II]-emitting region is comparable to the expected sizes of the quasar host galaxies and hence the sources are likely unresolved (Walter et al. 2009).
The data were processed with the default calibration procedure making use of the CASA pipeline (McMullin et al. 2007), version 5.1.2. The data cubes were then imaged with the CASA command tclean using Briggs cleaning and a robust parameter of 2 (natural weighting), in order to maximize the S/N of our observations. The mean rms noise is 0.35 mJy beam −1 per 30 MHz bin.
The map of the continuum emission is calculated by averaging the two line-free spectral windows. We obtain the emission-line map by subtracting the continuum emission by applying the CASA command uvcontsub and afterwards collapsing the data cube within a narrow frequency range (Δν=0.4 GHz for PSO J158-14 due to its broad [C II] line and Δν=0.25 GHz for all other objects) around the peak frequency of the emission line. All images of the collapsed and Note.The columns show the name of the quasar, its coordinates R.A. and decl. in the J2000 epoch, as well as the instrument with which the data were taken, the exposure time on the source, and the program ID of the observation.
continuum-subtracted line maps are shown in the top six panels in Figure 1. The dust continuum maps are shown in Figure A1.

NOEMA Observations
Three quasars from our sample located in the northern hemisphere have been observed with the 10 antennas of NOEMA in compact configuration (10C) in 2018 December and 2019 January. Taking advantage of the PolyFix correlator, we could simultaneously collect data in the upper and lower sidebands with a total bandwidth of 15.4 GHz. The data were calibrated and reduced making use of the GILDAS routine clic (Gildas Team 2013). The visibilities were imaged using the software mapping as part of the GILDAS suite. We adopt natural weighting, which results in a synthesized beam size of 0 6×0 9 at the higher tuning frequency ν obs =270.49 GHz for PSO J265 +41 and 1 6×2 2 at a lower tuning frequency of ν obs =101.14 GHz for SDSS J1143+3808, for which we observed the CO(6−5) line at 3 mm (ν rest =691.473 GHz) and CO(5−4) (ν rest =576.267 GHz) emission lines instead of [C II]. The imaged cubes are rebinned on a spectral axis with 50 km s −1 wide channels. The rms noise for SDSS J1143 +3808 is 0.21 mJy beam −1 per 50 km s −1 bin, whereas the other two observations for PSO J261+19 and PSO J265+41 have a larger rms noise of 1.18 mJy beam −1 and 1.01 mJy beam −1 per 50 km s −1 bin, respectively. The continuum flux is estimated using the line-free channels and subtracted from the cubes. The bottom three panels of Figure 1 show the continuum-subtracted line maps obtained with NOEMA.

VLT/X-Shooter Observations
Most quasar spectra observed with X-Shooter on the VLT were obtained on 2018 August 18th and 19th, in visitor mode (program ID: 101.B-02720). The data of four quasar spectra, i.e., PSO J056-16, PSO J158-14, CFHQS J2100-1715, and PSO J359-06, were acquired between January 2016 and May 2017 and are taken from the ESO archive 10 (Chehade et al. 2018; E. P. Farina et al. 2020, in preparation). We obtained multiple exposures of 1200 s each, with the 0 6 × 11″ slit in the NIR and the 0 9 × 11″ slit in the VIS. The VIS observations are binned 2 × 2 in spectral and spatial directions. Using a 0 9 slit width, we obtain a spectral resolution of R ≈ 8900 in the visible wavelength regime and R ≈ 8100 in the NIR arm for a 0 6 slit. 11 The wavelength range covers 5500 Å  λ obs  22000 Å. We dithered the different exposures along the slit to allow for image differencing in the data reduction step (see Section 3.4). One object, PSO J158-14, has been observed with the wider 0 9 slit in the NIR (R ≈ 5600) and the K-band blocking filter, which results in a reduced wavelength coverage of Å Å l   5500 20250 obs .

Keck/DEIMOS Observations
The two quasar spectra taken with the DEIMOS instrument at the Nasmyth focus on the Keck II telescope were observed in May and September 2017. For each object we acquired three exposures of 1200 s each. In the case of SDSS J1143+3808, we used a custom-made slitmask with a 1″ slit and the 830G grating, resulting in a pixel scale of Δλ≈0.47 Å and a spectral resolution of R≈2500. For PSO J265+41 we used the same grating, but the LongMirr slitmask with a narrower (0 7) slit, resulting in a slightly higher resolution. The grating was tilted to a central wavelength of 8400 Å, resulting in a wavelength coverage of 6530 Åλ obs 10350 Å.
All optical and NIR spectroscopic data were reduced by applying standard data reduction techniques with the newly developed open source Python spectroscopic data reduction package PypeIt (Prochaska et al. 2020a(Prochaska et al. , 2020b. The reduction procedure includes sky subtraction, which was performed on the 2D images by including both image differencing between dithered exposures (whenever these were available) and a B-spline fitting procedure. In order to then extract the 1D spectra, the optimal spectrum extraction technique is applied (Horne 1986). The individual 1D spectra are flux calibrated using the standard stars LTT 3218 (for spectra observed with VLT/X-Shooter) and G191B2B or Feige 110 (for spectra taken with Keck/DEIMOS). Finally, the fluxed 1D spectra are stacked and a telluric model is fitted to the stacked spectra using telluric model grids produced from the Line-By-Line Radiative Transfer Model (LBLRTM; 12 Clough et al. 2005;Gullikson et al. 2014), resulting in the final spectra. Figures 2 and 3 show the final reduced optical and NIR spectra for all quasars in our sample. We apply a running 20 pixel filter when showing the spectra and noise vectors with the average flux computed using inverse-variance weights.

Analysis
In this section we analyze our multiwavelength data set to measure the quasars' systemic redshifts (Sections 4.1 and 4.2) and estimate their optical continuum emission (Section 4.3), in order to measure the extents of their proximity zones (Section 4.4). We determine further properties of the quasars, such as the star formation rates (SFRs) of the host galaxies (Section 4.5), as well as the black hole masses and the Eddington ratios of the accretion.

Systemic Redshifts of Quasar Host Galaxies
The most precise estimates of the systemic redshifts of quasars are based on the narrow atomic or molecular emission lines arising from the gas reservoir within the quasars' host galaxy. We estimate the systemic redshifts primarily by means of the [C II] emission line, which is the dominant coolant of the ISM. For one quasar in our sample, SDSS J1143+3803, this line was outside of the observable frequency range of NOEMA, and thus we observed the CO(6−5) and CO(5−4) emission lines, which usually are some of the brightest molecular CO transitions in quasars (e.g., Carilli & Walter 2013;Yang et al. 2019a). Because these lines arise from the host galaxy itself, they provide a much more precise redshift estimate compared to rest-frame UV emission lines that arise from the BLR around quasars and may suffer from strong internal motions or winds, potentially displacing the emission-line centers from the systemic redshift (e.g., Richards et al. 2002;Venemans et al. 2016;Mazzucchelli et al. 2017).
We extract the 1D spectra from the continuum-subtracted data cubes at the position of the brightest emission from the source. We then fit the [C II] or CO emission lines assuming a Gaussian line shape. We apply the Markov Chain Monte Carlo (MCMC) affine-invariant ensemble sampler emcee (Foreman-Mackey et al. 2013) with flat priors for the amplitude Aä[0, 20] mJy, the width σä[0, 1] GHz of the emission line, as well as the peak frequency ν obs . We adopt the median of the resulting posterior probability distribution as the best parameter estimate. We take the peak of the Gaussian fit as the best estimate for the systemic redshift of the quasar with an uncertainty arising from the 68th percentile of the posterior probability distribution. Figure 2. Spectra of the quasars in our sample. All spectra (black) are observed with VLT/X-Shooter, besides the spectra of the two quasars PSO J265+41 and SDSS J1143+3808, which have been observed with Keck/DEIMOS. Regions of large telluric absorption have been masked with gray shaded regions. The dashed red lines indicate the location of various emission lines, i.e., Lyα at 1215.7 Å in the rest frame, the C IV doublet at 1548 and 1550 Å, as well as the Mg II emission line at 2798.7 Å. All spectra and noise vectors (gray) have been inverse-variance smoothed with a 20 pixel filter. The red frames indicate quasars that exhibit very small proximity zones, i.e., R p,corr 2 pMpc.
The extracted spectra and the corresponding best fit to the submillimeter emission lines are shown in the lower panels of each object in Figure 1. Table 2 shows the estimated parameters derived from the submillimeter emission lines of the quasars. Note that all derived flux values only include statistical uncertainties, but ignore the ∼10% systematic uncertainty that comes from calibrating interferometric data. 13 Note that in one case (PSO J261+19) no clear detection of an emission line associated with the quasar host galaxy could be found, and we extracted the spectrum at the nominal position of the target based on optical/NIR data. Because we also do not detect any continuum emission from this source (see Figure A1 in Appendix A), this nondetection is likely explained by a very faint emission that is below the detection limit of our 2.5 hr exposure with NOEMA, i.e., F cont <0.05 mJy. If the nondetection of the [C II] emission line could be explained by a wrong redshift estimate that would have shifted the emission line outside of the observable frequency range, the offset between its systemic redshift and the redshift estimate based on its Mg II emission line (see Section 4.2) would have to be Δv>4730 km s −1 . Such large velocity shifts have not been reported in the literature, and thus this possibility seems unlikely.

Mg II Redshifts, Black Hole Masses, and Eddington Ratios
For the subset of quasars for which we did not obtain a submillimeter redshift measurement (VDES J0323-4701, VDESJ0330-4025, and PSO J261+19), we estimate their redshift by means of the Mg II emission line at λ2798.7 Å observable in the NIR spectra. The Mg II emission arises within the BLR of the quasars and may suffer from velocity shifts with respect to the systemic redshift. However, for the majority of quasars, we only expect modest velocity shifts (Richards et al. 2002;Venemans et al. 2016;Mazzucchelli et al. 2017), and thus calculate a redshift estimate z Mg II from the peak of the line. To this end, we model the quasar emission within the wavelength region around the Mg II emission line, i.e., 2100 Åλ rest 3089 Å, as a superposition of a power-law continuum with slope α arising from the quasar's accretion disk, a scaled template spectrum of the iron lines Fe II and Fe III, f λ, iron , within the BLR, as well as a single Gaussian to model the Mg II emission line, i.e., where a 0 , a 1 , and a 2 denote the amplitudes of the individual components. We apply the iron template spectrum from Vestergaard & Wilkes (2001), which has been derived from a narrow emission-line quasar, and convolve it with a Gaussian kernel with FWHM≈FWHM Mg II to mimic the quasars' broad emission lines. We estimate the free parameters of the fit by means of the MCMC sampler emcee, assuming again flat priors and adopting the median of the posterior probability distribution as our best estimate. From the peak of the Mg II emission line, we can then derive the redshift estimate z Mg II . All fits to the Mg II emission lines are shown in Figure B1 in Appendix B. 14 The bolometric luminosity is estimated based on the quasars' absolute magnitudes M 1450 and the bolometric correction by Runnoe et al. (2012 , Table 3), which has a scatter of approximately 0.3 dex (see their Figure 5). In order to estimate the mass of the central SMBHs, we derive the monochromatic luminosity L λ , 3000 Å from the bolometric luminosity via L bol =5.15×3000 Å L 3000 Å (Richards et al. 2006), and infer the FWHM of the Mg II line from the single-epoch NIR spectra. Assuming that the dynamics in the quasar's BLR are dominated by the gravitational pull of the black hole, the virial theorem can be applied, and thus we estimate the mass of the black hole by means of the scaling relation   Knowing the black hole masses, we can derive the Eddington luminosity L Edd of the quasars, as well as the Eddington ratio of their accretion, i.e., λ Edd =L bol /L Edd . All measurements of the NIR properties are shown in Table 3.

Quasar Continuum Estimates
Measurements of proximity zone sizes require a prediction for the underlying quasar continua. We estimate the quasar continua by means of a principal component analysis (PCA) that decomposes a set of training spectra into an orthogonal basis (Suzuki et al. 2005;Pâris et al. 2011;Davies et al. 2018). Following Davies et al. (2018), we construct a PCA decomposition based on 12,764 training spectra from the SDSS BOSS sample in the logarithmic flux space, such that each quasar spectrum f λ can be approximated via is the mean logarithmic flux and A i are the PCA components, weighted by the coefficients a i . The logarithmic space has been chosen, because variations of the power-law quasar continuum are more naturally described by a multiplicative component rather than additive components (e.g., Lee et al. 2012).
Because quasar spectra at high redshifts suffer from significant absorption bluewards of the Lyα emission line due to residual neutral hydrogen in the IGM, we follow the approach of previous works (Suzuki et al. 2005;Pâris et al. 2011), and estimate the PCA coefficients only on the red side of the spectra. To this end, we construct a set of 10 "red" PCA components R i between 1220 Åλ rest 2850 Å, as well as a set of 6 "blue" PCA components B j between 1175 Åλ rest <1220 Å. We then determine the best estimate for the coefficients for the set of red PCA components r i by fitting them to the red side of the quasar spectra, which we first normalize to unity at λ rest =(1290±2.5) Å. All spectral regions that show contamination by BAL features are masked when estimating the quasar continua. Note that whenever we have no NIR data available, the wavelength range is truncated to 1220 Åλ rest 1470 Å.
The best set of estimated red coefficients r i are then projected onto a set of blue coefficients b j for the blue PCA components by means of a projection matrix P ij determined from the training spectra, i.e., The quasar spectra as well as their best estimated continuum model for both the red and blue wavelength sides are shown in Figures C1 and C2 in Appendix C. The predicted continua match the data overall well. However, while estimates of the IGM neutral gas fraction for which this continuum fitting machinery was originally developed (Davies et al. 2018) critically depend on precise continuum estimates, the proximity zone measurements are more robust with respect to uncertainties in the continuum fit (Eilers et al. 2017a(Eilers et al. , 2017b). The continuum model is predicted to be biased by less than 1% with continuum uncertainties of less than 10% in the wavelength range of interest (see Davies et al. 2018, Figure 9), which only influences the proximity zone size measurement very mildly by áD ñ » R 0.02 p pMpc on average. More details on the continuum uncertainties and their influence on proximity zone measurements can be found in Appendix C.1.

Proximity Zone Sizes
In order to estimate the sizes of the quasars' proximity zones, we adopt the standard definition applied in previous studies (Fan et al. 2006;Willott et al. 2007Willott et al. , 2010Carilli et al. 2010;Venemans et al. 2015;Eilers et al. 2017a;Mazzucchelli et al. 2017). Namely, we normalize the quasar spectra by their estimated continuum emission and smooth the continuumnormalized flux with a 20 Å wide (in the observed wavelength frame) boxcar function, which corresponds to a smoothing scale of approximately 1pMpc or 700 km s −1 at z∼6. The location at which the smoothed continuum-normalized flux drops below the 10% level marks the extent of the proximity zone R p . All continuum-normalized quasar spectra and their proximity zones are shown in Figure 4.
For the estimate of the proximity zone sizes, we take the best available redshift estimate (see Sections 4.1 and 4.2). Note that the redshift estimates based on the Mg II emission line have a systematic blueshift and a systematic uncertainty compared to the systemic redshift estimate based on submillimeter emission  (6), which is the dominant source of uncertainty on the proximity zone measurements. We also account conservatively for a systematic uncertainty of Δv=100 km s −1 on the systemic redshifts based on submillimeter estimates, which corresponds to Δz≈0.0024 at z≈6. The size of the proximity zone also depends on the luminosity of quasars (e.g., Fan et al. 2006;Bolton & Haehnelt 2007b;Davies et al. 2020), as more luminous quasars emit more ionizing radiation at any given quasar age, and thus we normalize the proximity zone measurements to the same fiducial absolute luminosity of M 1450 =−27. Theoretically, the dependency between the proximity zones and the rate of ionizing photons emitted by the quasar  g N is  µ g R N p 1 2 (Bolton & Haehnelt 2007b;Davies et al. 2020). However, due to additional heating from the reionization of He II within the proximity zone, this relation might differ in practice. Using a radiative transfer simulation, we found that a relation between the observed proximity zone sizes and quasar luminosity of ( ) best eliminates the dependency on the quasars' luminosity and yields "luminosity-corrected" proximity zone measurements R p,corr . Note that this scaling relation depends only marginally on the ionization state of the ambient IGM surrounding the quasars (Eilers et al. 2017a). All proximity zone measurements, as well as the luminosity-corrected estimates, are presented in Table 4.

Search for Associated Absorption Systems
We carefully search for any associated absorption systems in the quasar spectra that might prematurely truncate the quasars' proximity zones. This would be the case if a self-shielding Lyman limit system (LLS) is located within 1000 km s −1 of the quasar, around the edge of its proximity zone (e.g., D'Odorico et al. 2018; Bañados et al. 2019). Thus, we search for strong low-ionization metal absorption lines redwards of the Lyα emission line, which we would expect to find if a selfshielding absorption system is present.
To this end, we place a hypothetical absorption system at the end of each quasar's proximity zone and stack the spectrum at the location, where low-ionization metal absorption lines, i.e., Si IIλ1260, Si IIλ1304, O Iλ1302, and C IIλ1334, would fall. We compare the stacked spectrum to a composite spectrum of 20 LLSs by Fumagalli et al. (2011) in Figures 5 and 6.
The spectrum of PSO J056-16 ( Figure 5) shows a proximate damped Lyα absorption (pDLA) system in front of the quasar along our line of sight at z abs ≈5.9369, i.e., with a velocity offset of Δv≈1297 km s −1 . This system clearly shows lowionization absorption lines; it has a high column density (a fit by eye indicates N H I 10 20 cm −2 ) and is thus optically thick, causing a premature truncation of the quasar's proximity zone. We searched the dust continuum map of this quasar shown in Figure A1 in Appendix A for the presence of a second continuum source that could be associated with the pDLA, but we did not detect any other sources in the vicinity of this quasar, presumably due to their low continuum luminosity.
We do not find evidence for proximate self-shielding absorption systems truncating the proximity zones in the remaining quasar spectra. Figure 6 shows a hypothetical absorption system in the spectrum of CFHQS J2100-1715. The stacked spectrum at the location where the low-ionization metal absorption lines would fall does not reveal any evidence for the presence of such an absorption system. The same figures for all remaining quasars with small proximity zones are shown in Appendix D.
We exclude the quasar PSO J056-16 from any further analysis of its proximity zone due to its pDLA. Furthermore, two quasars in our sample, i.e., PSO J239-07 and PSO J265 +41, will be excluded from any further analyses of their proximity zones, because they exhibit BAL features in their optical/NIR spectra (see Figure 3), which might contaminate or prematurely truncate their proximity zones.

Star Formation Rates
An estimate of the SFR within the quasars' host galaxies can be obtained by means of the total line fluxes F line of the submillimeter emission lines, as well as via the dust continuum flux F cont . To calculate these fluxes, we sum the flux of all pixels within a radius of 2″ around the source in the emissionline map and the continuum map, respectively. We then convert the integrated line fluxes F line into line luminosities by where D L represents the luminosity distance and ν obs is the mean observed frequency of the line (see, e.g., Carilli & Walter 2013 This relation has been derived based on z>0.5 galaxies and has an estimated scatter of 0.4 dex. An alternative method to estimate the SFR within the quasars' host galaxies is based on the dust continuum. To this end, we estimate the dust continuum emission as a modified blackbody (e.g., Dunne et al. 2000;Beelen et al. 2006), i.e., with a dust temperature of T dust =47±3 K, and the opacity law κ ν (β)=0.77(ν/352 GHz) β cm 2 g −1 with the (dust) emissivity index β=1.6±0.1. We obtain an estimate of the dust mass by means of the dust continuum flux F cont , i.e., where B(ν, T dust ) is the Planck function and ν the rest-frame frequency (Venemans et al. 2012). The IR luminosity L IR is estimated by integrating Equation (10) over the solid angle and between 3 and 1100 μm in the rest frame (e.g., Kennicutt & Evans 2012). We obtain a dust-based SFR estimate following Kennicutt & Evans (2012) via the scaling relation All derived quantities are shown in Table 5. Note that the statistical uncertainties on the SFR estimates are small, but in practice the errors are dominated by systematic uncertainties arising from the scatter of 0.4-0.5 dex in the scaling relations (Equations (9) and (12)), as well as assumptions about the dust temperature and the emissivity index, which can influence the derived SFRs by a factor of 2-3. For more details on the systematic uncertainties on these measurements, we refer the reader to Section 4.1 in Venemans et al. (2018).

Notes on Individual Objects
In Figure 7 we show all proximity zone measurements that are not prematurely truncated or potentially contaminated by BAL features, as a function of the quasars' absolute magnitude M 1450 . All quasars in our data sample show smaller proximity zone sizes than the expected average given their magnitude, which results from our selection criteria aiming to target young quasars. Five quasars that show no associated absorption systems or broad absorption lines exhibit extremely small proximity zones with R p,corr <2 pMpc. These five quasars plus two from our previous study (Eilers et al. 2017a), which are marked with boxes in Figure 7, indicate very short quasar lifetimes, i.e., t Q 10 5 yr, which will be analyzed in more detail in PaperII. Figure 8 shows the bolometric luminosity as a function of the quasars' black hole masses. Our new measurements are compared to a low-redshift quasar sample of 75,000 objects from the SDSS Data Release 7 (DR7; Shen et al. 2011;Wang et al. 2015), as well as to several other z5.8 quasars (Jiang et al. 2007;Willott et al. 2010;De Rosa et al. 2011Wu et al. 2015;Mazzucchelli et al. 2017;Bañados et al. 2018). The quasars with very small proximity zones marked with red boxes do not populate a special region in parameter space.

PSO J004+17
This object has a very small proximity zone of R p = 1.16±0.15 pMpc (R p,corr =1.71±0.22 pMpc). Although we find a high-ion absorption system within the quasar's proximity zone, it is not optically thick and thus could not have prematurely truncated the proximity zone (see Figure D1). The redshift of the absorption system coincides with the quasar's systemic redshift and might thus be due to the circumgalactic medium of the quasar's host galaxy. The dust continuum map shown in Figure A1 Figure D2). It also has the largest black hole mass within our sample, i.e., M BH = (4.96±0.51)×10 9 M e .
PSO J158-14 This quasar, discovered both by Chehade et al. (2018) and E. Bañados et al. (2020, in preparation), shows a small proximity zone of R p =1.91±0.14 pMpc (R p,corr =1.63±0.12 pMpc). Interestingly, the quasar has a very strong dust continuum emission, i.e., F cont =3.46±0.02 mJy. The derived star formation rate of approximately 1420 M e yr −1 suggests the presence of a coeval starburst with the SMBH growth. Furthermore, we estimate a large bolometric luminosity of log L bol /erg s −1 =47.31 and a high Eddington ratio of λ Edd =1.01±0.08. The Mg II emission line is highly blueshifted with respect to the systemic redshift of the quasar, i.e., Δv=−673±49 km s −1 , suggesting strong internal motions within the BLR.
CFHQS J2100-1715 This quasar exhibits the smallest proximity zone R p = 0.37±0.14 pMpc (R p,corr =0.66±0.25 pMpc) detected to date with no signs of contamination from associated absorption systems (see Figure 6). Decarli et al. (2017) reported the detection of a companion galaxy at a projected separation of ∼60 kpc, suggesting that the two objects might be at an early stage of interaction. Its spectrum shows a very red spectral slope (Willott et al. 2009).

CFHQS J2229+1457
This quasar's small proximity zone has been confirmed in this study, i.e., R p =0.47±0.14 pMpc (R p,corr =1.12±0.33 pMpc). Its X-Shooter spectrum does not reveal any associated absorption systems that could truncate or contaminate its proximity zone, although the S/N of the data redwards of the Lyα emission line is still very low (see Figure D4). Note that the black hole mass measurement for this object should be taken with caution, because the Mg II emission line falls on top of a telluric feature. Our measurement differs from a previous measurement by Willott et al. (2010) who made use of a lowerresolution spectrum (R≈520) observed with NIRI/Gemini by more than one order of magnitude. Note.The columns show the name of the quasar, the estimated IR luminosity, and dust mass, as well as the inferred SFR of the quasars' host galaxies by means of the  Our new measurements are shown as colored data points, whereas the gray square data points shows those from previous work (Eilers et al. 2017a). The gray dashed line and shaded region represent the median and 68th percentile of the distribution of 400 simulated proximity zones from radiative transfer simulations at z=6 and = t log 7.5 yr Q (see Davies et al. 2016;Eilers et al. 2017a, for details), respectively. The seven quasars with extremely small proximity zones are indicated by the black boxes. The black error bar in the bottom right indicates the systematic uncertainty on the R p measurements with submillimeter redshifts, assuming a systematic uncertainty on the submillimeter redshift estimates of Δv=100 km s −1 . This object is the second highest-redshift quasar in our data sample. Its Mg II emission line is highly blueshifted, i.e., Δv=−1021±143 km s −1 , with respect to the [C II] systemic redshift.

PSO J011+09
PSO J056-16 As shown in Figure 5, this quasar's proximity zone has been prematurely truncated due to a pDLA along our line of sight. Thus, we exclude this object from any further analysis of its proximity zone. The estimated large Eddington ratio of λ Edd =1.26±0.08 indicates that this quasar has a high accretion rate.
VDES J0323-4701 It has been speculated that this quasar as well as VDES J0330-4025 might lie in an overdense region of the universe, because they are located within 10°on the sky (Reed et al. 2017). The new, very similar redshift estimates for both objects based on their Mg II emission lines of z Mg II ≈6.241 with a very small velocity difference of Δv≈50 km s −1 supports this hypothesis. The estimated Eddington ratio is very high, i.e., λ Edd =1.76±1.24. However, the NIR spectrum of this object and in particular the Mg II emission line have a very low S/N and thus these estimates have large uncertainties and should be taken with caution.

SDSS J1143+3803
Based on the new redshift estimate from the CO(6−5) emission line (z CO(6−5) ≈5.8366), which is significantly higher than the preliminary estimate from the Lyα emission line (z Lyα ≈5.805), this quasar has the largest proximity zone (R p =3.93±0.63 pMpc) in our sample. Unfortunately, the end of its zone falls right in between the two DEIMOS detectors, which are separated by a Δλ=11 Å wide gap. Thus, we adopted the middle of the detector gap as the best estimate of the proximity zone size and the width of the gap as the uncertainty on R p .
PSO J239-07 Although this quasar exhibits a very small proximity zone, i.e., R p =1.29±0.14 pMpc, the BAL features detected in its optical/NIR spectrum might have contaminated its zone. Thus, we exclude this object from any further analysis. The dust continuum map shown in Figure A1 shows two other continuum sources in the vicinity of this quasar.
PSO J261+19 This quasar is the highest-redshift object in our sample. We could not detect any line nor continuum emission associated with its host galaxy (see Figures 1 and A1), indicating that its emission is very faint, i.e., F cont <0.05 mJy.
PSO J265+41 This object has been discovered by E. Bañados et al. (2020, in preparation). It is a BAL quasar, and has thus been eliminated from our analysis about proximity zones, as its absorption features might contaminate the proximity zone. This quasar shows a very bright [C II] line (F line =9.20±0.50 Jy km s −1 ), as well as a bright dust continuum emission (F cont =3.61± 0.07), suggesting a highly star-forming (SFR1470 M e yr −1 ) quasar host galaxy.

Fraction of Young Quasars
In our previous study we discovered three young quasars in a parent sample of 31. This implies a fraction of young quasars within the quasar population at large of f young ≈3/31≈10% (Eilers et al. 2017a). For a majority of objects in this sample we had precise redshift estimates and good spectroscopic data, and hence the young population within this data set is likely to be complete.
Assuming that the five objects mentioned in Section 4.6.1 all have short quasar lifetimes, we now know a total of seven young objects. Please be reminded that one of the objects analyzed here was already previously identified to have a short quasar lifetime (Eilers et al. 2017a). Given a total quasar sample of 153 objects, i.e., the combined 122 quasars from the parent sample of this study (Section 2) and the 31 objects from Eilers et al. (2017a), we obtain an estimated fraction of young quasars f young ≈7/153≈5%. However, this estimate likely represents a conservative lower limit, because there might still be more quasars with short lifetimes within the remaining sample for which we did not conduct follow-up observations and thus no precise redshift estimates exist to date. Additionally, quasars exhibiting BAL features, which can make up to ∼40% of the quasar population (e.g., Allen et al. 2011), could also be young, but we will not be able to estimate their lifetime by means of their proximity zone sizes.
Thus, we conclude that the fraction of young quasars within the whole quasar population in the early universe is 5%f young 10%.

Summary and Conclusions
We perform a multiwavelength analysis to systematically detect and characterize high-redshift quasars that are likely to be very young, as indicated by their small proximity zones. We analyze 13 quasars at 5.8z6.5 and determine precise redshift estimates by means of their [C II] or CO(6−5) emission lines arising from the cold gas of the host galaxy observed with ALMA and NOEMA, or based on their Mg II emission line from the NIR spectra observed with VLT/X-Shooter, if no submillimeter data are available. These new redshift estimates allow us to precisely measure the size of the proximity zones of the quasars, which we will use in a subsequent follow-up paper to determine their lifetimes (PaperII). Additionally, the deep optical and NIR spectra we obtained from VLT/ X-Shooter and Keck/DEIMOS for the quasar sample allow us to exclude a contamination of the proximity zone due to associated absorption systems or broad absorption lines, and enable measurements of the black hole masses as well as the Eddington ratio of their accretion.
The main results of this study are: 1. We find five quasars (PSO J004+17, VDES J0330-4025, PSO J158-14, CFHQS J2100-1715, and CFHQS J2229 +1457) that exhibit particularly small proximity zones, i.e., R p,corr 2 pMpc, and thus likely indicate very short quasar lifetimes, i.e., t Q 10 5 yr. One of these five objects, CFHQS J2229+1457, has previously been identified as a very young quasar (Eilers et al. 2017a). 2. The quasar CFHQS J2100-1715 exhibits the smallest proximity zone detected to date, i.e., R p,corr =0.66± 0.25 pMpc. Additionally, the detection of a companion galaxy (Decarli et al. 2017) might indicate that this system is at an early stage of interaction. 3. Our previous work revealed three young quasars in a sample of 31 objects (Eilers et al. 2017a). For this study, we analyzed 122 quasar spectra and chose the most promising 13 candidates to follow up. We discover five young quasars in this data set, one of which was previously known. This allows us to constrain the fraction of young quasars within the high-redshift quasar population at large to be 5%f young 10%. 4. We determine the spectral properties of the quasars in our sample, such as black hole masses, the velocity shifts of the emission lines, and the Eddington ratios of the accretion of the SMBHs. For three objects in the sample we measure large Eddington ratios, i.e., λ Edd 1, which indicate high mass accretion rates at the Eddington limit. The estimated black hole masses, derived from fitting the single-epoch Mg II region, vary between M BH ≈3× 10 8 -5×10 9 M e . 5. We measure dust continuum fluxes and find two particularly bright quasars, i.e., PSO J158-14 (which is likely to have a very short lifetime) and PSO J261+41, with F cont =3.46±0.02 mJy and F cont =3.61± 0.07 mJy, respectively. The inferred high star formation rates of 1400 M e yr −1 suggest the presence of a starburst within their host galaxies coeval to the SMBH growth. These quasars represent ideal targets for highresolution imaging with ALMA to study star formation and the ISM in the very early universe and to search for any signs of mergers, outflows, and feedback.
Our analysis presents a first step toward a systematic study of the lifetimes of high-redshift quasars based on their proximity zone sizes. In future work we will determine the lifetime estimates for this quasar sample, as well as a statistical estimate of the lifetime of the quasar population at large by means of the estimated fraction of young quasars. This will enable us to analyze the evolution of quasar and host galaxy properties with the quasar lifetime and to further study the accretion behavior of SMBHs in the early universe. Figure A1. Dust continuum maps. The six quasars in the top and middle rows show ALMA observations, whereas the bottom row shows our NOEMA data. Each panel is 25″×25″ in size. Solid and dashed lines show the ±2σ, 4σ, 6σ, 8σ, and 10σ isophotes. Mg II Emission-line Fits In order to measure the black hole masses of the quasar, we fit the width of the Mg II emission line as described in Section 4.2. The best fits to the Mg II emission lines are shown in Figure B1. Figure B1. Best fits to the Mg II emission lines. We show all quasars in our sample that have NIR spectral coverage, and their Mg II line does not fall into a region of high telluric absorption. The spectra around Mg II at λ rest =2798 Å and the corresponding noise vector have been inverse-variance smoothed with a 20 pixel filter and are shown in black and gray, respectively. The fit to the quasar emission (red curve), as well as its individual components, is shown by the colored curves, i.e., a power-law continuum (blue dashed), the smoothed iron template (green), and a Gaussian for the Mg II emission line. Regions with large telluric absorption have been masked by the gray regions. The faint red lines show draws from the posterior distribution. The red frames show quasars that exhibit very small proximity zones and short lifetimes. uncertainties in the relationship between red-side and blue-side features, as well as the inability of the PCA model to exactly reproduce a given spectrum. Following Davies et al. (2018), we estimate that the mean of the error (i.e., the bias) is ò C ≈1%, whereas the uncertainty of the prediction can be up to s  C ≈10%.
In order to test the influence of these uncertainties in the quasar continuum estimate on the measurements of the proximity zones, we draw samples of the continuum with Gaussian uncertainties added according to s  C , as shown in Figure C3 for an example spectrum. We then calculate the proximity zones for each draw as described in Section 4.4 and estimate the error on R p due to uncertainties in the continuum estimate, i.e., ΔR p ≈0.01-0.03 pMpc, for all quasars for which NIR spectral coverage was available to construct the PCA model. For quasars without NIR coverage, we had to use a truncated PCA model, which results in slightly larger uncertainties and thus the error on R p increases to ΔR p ≈0.05-0.25 pMpc. These uncertainties are still small compared to the R p measurements.

Appendix D Absorption Systems
In Figures D1-D4 we show the spectra of all quasars with very small proximity zones and a hypothetical absorption system. We do not see any evidence for a premature truncation of an associated absorption system. Figure C3. Example for analyzing the effects of the quasar continuum uncertainties on the proximity zone measurements. Top: spectrum of PSO 158-14 (black) with 100 draws from the quasar continuum estimate including Gaussian noise (blue). Bottom: continuum-normalized quasar spectrum and the smoothed flux from the different draws of the continuum estimate, showing that the influence of continuum uncertainties is very small on the location of R p , where the smoothed flux drops below the 10% level (yellow).
However, the spectrum of PSO J004+17 shown in Figure D1 shows an associated absorption system located within the quasar's proximity zone directly at the systemic redshift of the quasar, i.e., z abs ≈5.8165, possibly due to the circumgalactic medium of the host galaxy itself. However, this system only shows high-ionization absorption lines, such as the doublets N V at the rest-frame wavelengths λ=1238 Å and λ=1242 Å, and Si IV at λ=1393 Å and λ=1402 Å. We do not find any evidence for low-ionization lines, indicating that the absorption system is unlikely to be self-shielding. Additionally, the spectrum shows clear flux transmission bluewards of the absorption system, which indicates that the proximity zone extends beyond this absorber. Thus, we exclude a premature truncation of the quasar's proximity zone due to this absorption system.
The spectrum of PSO J158-14 shows a potential absorption line close to the location of Si IIλ 1260 of the hypothetical absorption system. However, we do not find evidence for any other low-ionization lines that could be associated with an absorption system, which should be present if there was indeed such a system, and thus conclude that the line likely belongs to a foreground absorber at lower redshift.
We note that the S/N of the spectrum of CFHQS J2229 +1457 redwards of the Lyα emission line is still very low due to the quasar's faint continuum emission. We do not see any evidence for an absorption system that might truncate the proximity zone (see Figure D4); however, the low data quality does not allow us to securely rule out this option. Figure D1. Same as Figure 6 but for PSO J004+17. For this object we also show the locations of high-ionization absorption, as these show evidence for an absorption system at the systemic redshift of the quasar (gray dashed lines). Wavelength regions (shown in light gray) around very prominent skylines, as well as around lowredshift absorption systems, have been masked to avoid biases in the stack.

23
The Astrophysical Journal, 900:37 (25pp), 2020 September 1 Eilers et al.  Note. The columns show the name of the quasar, its best systemic redshift estimate with its systematic uncertainty, the emission line it is derived from, its absolute magnitude M 1450 , as well as the size of the proximity zone and its magnitude corrected value. The last column indicates whether the quasar has broad absorption lines (BALs) or associated absorption systems such as a proximate damped Lyα system (pDLA), which might have contaminated the proximity zones.